update#1
Closed
nanofatdog wants to merge 157 commits into
Closed
Conversation
Give resolvePythonModuleMember the same absolute-dotted-path fallback that resolveModuleImportToFile already uses, so a `module.func()` call after `from pkg import module` / `import pkg.module as module` records its `calls` edge. Adds a regression test and a CHANGELOG entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a resolution-phase pass (goCrossFileMethodContainsEdges) that links a Go method to its same-named receiver type within the same package (= directory), so a method declared in a different file from its `type` is no longer orphaned from the struct. Runs before goImplementsEdges so cross-file methods also count toward interface satisfaction (#584). Adds a regression test + CHANGELOG entry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mar (#237) (#717) Vendor tree-sitter-c-sharp 0.23.5 (ABI 15) for C#, replacing the bundled ABI-13 build that dropped primary-constructor classes. Adds native primary-ctor parsing, primary-ctor parameter dependency edges, return-type extraction via the renamed `returns` field, and a preParse that blanks `#if` directive lines the new grammar mis-parses inside enum bodies. Validated on MediatR / eShopOnWeb / Newtonsoft.Json + full suite. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…383) (#722) Spring `application.{properties,yml}` keys (and Shopify Liquid `{% schema %}` blocks) were storing the config VALUE in the node docstring, and `codegraph_explore`'s source section re-read the raw `key = value` line off disk — so a secret committed to a config file (DB password, API key, JDBC URL with embedded credentials) could be pushed into an agent's context via explore/node output without the agent ever opening the file. Config-leaf nodes (`kind: 'constant'` in a config language) now surface the KEY only, via a shared `isConfigLeafNode` predicate applied at both surfacing paths: the value is dropped from extraction, `getCode`/`includeCode` returns the key instead of the file line, and explore excludes config leaves from source rendering. The predicate can't match real code (real constants are ts/java/go/…), so `@Value`/`@ConfigurationProperties` resolution and impact are unaffected. Adds a regression test asserting a planted secret never appears in `codegraph_explore` / `codegraph_node` output while the keys still resolve. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ot reads (#527) (#724) * fix(security): resolve symlinks in path validation to block out-of-root reads (#527) validatePathWithinRoot was purely lexical (path.resolve + startsWith), so an in-repo symlink whose logical path is inside the project root but whose real target escapes it passed validation — and both content-serving read sinks (codegraph_node includeCode, codegraph_explore source) then readFileSync'd it, leaking out-of-root file contents (e.g. ~/.ssh, /etc) to the agent. Add a realpath layer: after the lexical check, resolve symlinks on both the candidate path and the root and re-compare, rejecting anything whose real path escapes the root. An in-root symlink is still allowed (no over-blocking). Comparison is case-insensitive on Windows (NTFS + realpath casing). Not-yet- existing paths (ENOENT) fall back to the lexical result so about-to-be-written files still validate; other resolution errors reject. Removes the dead, never-called isPathWithinRoot / isPathWithinRootReal helpers (the latter a footgun — it returned true on realpath failure). Adds RED->GREEN tests: in->out file/dir symlinks rejected, in->in allowed, ../ rejected, ENOENT allowed, plus an end-to-end test proving getCode no longer serves an out-of-root file reached through a dir symlink. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(changelog): note the #527 symlink path-escape fix --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "Attached to shared daemon" line is benign INFO, but it was written to stderr — and MCP hosts render all server stderr at error level (and append an `undefined` data field), so on every session start a healthy attach showed up as `[error] … undefined`. It is now gated behind CODEGRAPH_MCP_LOG_ATTACH=1: silent by default, opt-in for debugging daemon attach. Both attach sites (runProxy + connectWithHello) route through one helper. The daemon integration tests opt the harness into the log so their attach assertions still observe a successful attach. Re-applies the approach from #640 by @mturac. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…w node mode (#733) * feat(mcp): steer agents to codegraph during implementation, not just Q&A Two changes targeting agents that reach for Read during edits instead of codegraph: 1. Reframe the agent-facing steering (server-instructions + codegraph_node/explore descriptions): drop "consult BEFORE ... not during"; position codegraph_node as the Read upgrade for a named symbol (verbatim current on-disk source, safe to Edit from, + caller/callee trail), explore PRIMARY / node SECONDARY, with the "cached intelligence — better context, fewer tokens" framing. 2. File-view mode: codegraph_node now accepts a `file` with no `symbol` and returns that file's symbol map + graph role (its dependents), plus verbatim bodies with includeCode — so it can displace a path-keyed Read, not just a symbol lookup. Resolves a path or basename; dedups nested members; budget-capped. To be A/B'd on an implementation task before shipping (per the retrieval doctrine: steering changes must be measured, not assumed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(changelog): note codegraph_node file-view + implementation steering --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sh (#734) Running scripts/agent-eval against a `claude -p` spawned from within a Claude Code session (nested, e.g. from a Bash tool call) makes the codegraph MCP attach unreliable: the server is healthy (full handshake ~165ms) but the nested client marks it status:"pending"/0-tools under CPU/timing contention, so the agent silently runs with no codegraph. NO_DAEMON + `< /dev/null` don't fix it — it's the nested client, not the server. Documented in CLAUDE.md's validation methodology. Adds ab-new-vs-baseline.sh: A/Bs a retrieval/steering change as new-build vs baseline-build (both codegraph-on, isolating the change — vs run-all.sh's with-vs-without), on a throwaway copy of an indexed repo. Run it in a real terminal. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ock (#735) Corrects the "run non-nested only" conclusion from #734. The codegraph server is healthy (handshake ~165ms); the flakiness is that on a multi-step implementation task the agent dives into Read/grep before codegraph finishes its ~2-3s startup (worse under nested CPU contention), so it runs with no codegraph. Fix: pre-warm a persistent daemon (high idle timeout) + skip the startup re-exec (CODEGRAPH_WASM_RELAUNCHED=1) so claude connects before the agent's first turn. claude's init snapshot can show status:"pending" even when it then connects — judge by actual codegraph usage, not the init line. ab-new-vs-baseline.sh now bakes in the pre-warm + skip-re-exec. Validated: a clean A/B showed the new build's agent used codegraph 2x / 5 Reads vs the baseline's 0 / 8 on the same fully-implemented task. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…it, byte-parity (#738) Makes codegraph_node a drop-in faster Read for indexed source files (file-read mode: <n>\t<line> like Read, offset/limit, + blast-radius header; symbolsOnly for the map). Fixes the old file-view dropping imports/line-numbers. #383/#527 preserved. Validated by A/B: explore/node already return source + line numbers, so Read=0 when used. Includes the A/B eval harness scripts. Full suite green (1270).
…iases (#634) (#740) TypeScript service/RPC contracts written as a tuple of generic types — `type List = [Service<'query_apply_record', Req, Resp>, …]` — carry their names only as string-literal type arguments, so static extraction never indexed them and `codegraph query query_apply_record` returned nothing. Add a narrow TS/TSX type-alias pass that emits each tuple entry's string-literal name as a `method` node under the alias (qualifiedName `List::query_apply_record`), making it searchable. Scope is limited to a direct literal arg of a generic that is a direct tuple element, with a valid-identifier filter — so utility types (Pick/Omit/Record), deeper nested generics, and route paths produce no noise. Bumps EXTRACTION_VERSION so existing indexes get a re-index hint. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#636) (#741) Two environments that share one working tree — most concretely Windows and WSL — can't safely share a single `.codegraph/`: the daemon lockfile records a platform-specific pid + socket (named pipe vs Unix socket), and SQLite locking across the WSL2/Windows filesystem boundary is unreliable, so two daemons over one index risks corruption. Add a `CODEGRAPH_DIR` env var (default `.codegraph`) that overrides the per-project data directory name, so each environment keeps its own index in the same tree (e.g. `CODEGRAPH_DIR=.codegraph-win` on Windows). The name is resolved live and validated (rejects separators / `..` / absolute, falling back to the default with a one-time stderr warning). Indexing and file-watching now skip ANY `.codegraph-*` sibling so neither side trips over the other's data. Routes the previously-hardcoded `.codegraph` literals (db path, lockfile, error log, watcher ignore, file-scan skip, installer) through the resolver. No extraction-version bump — index content is unchanged. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…645) (#742) A C++ method call whose receiver is another call's result — `Foo::instance().bar()`, `WidgetFactory::create().draw()`, `openSession()->run()`, or the same stored in an `auto` local first — lost the receiver's type during extraction. The callee degraded to a bare method name, so when two classes shared a method name the call silently resolved to whichever was indexed first (or not at all), corrupting callers / impact / trace with a plausible-but-wrong edge. Three parts: - Capture C++ return types (new nodes.return_type column, schema v5): the function_definition's `type` field, normalized — smart-pointer pointee unwrapped, void/primitives dropped. - Preserve the inner-call receiver in extraction: a C/C++ field_expression whose receiver is itself a call is encoded `inner().method` instead of dropping to the bare name. Other languages keep the existing behavior. - New resolution strategy (matchCppCallChain): infer the receiver's class from the inner call's return type, then resolve AND validate the method on it. Handles singletons/accessors, factories returning a different type, free-function factories, make_unique/make_shared/new/direct construction, single-level member chains, and namespace-qualified inner calls. A wrong inference yields no edge, never a wrong one. EXTRACTION_VERSION 2->3 (re-index to populate return types). Validated on the issue repro + spdlog: node count stable (no explosion), deterministic, and ~100 pre-existing wrong `.size()`-style edges removed. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#660) (#663) PHP's importTypes only captured namespace_use_declaration, so include/require(_once) — the dependency mechanism in procedural and script-style PHP — never produced edges. callers, impact, and trace missed the entire file-include graph; only namespace `use` became a dependency edge. Capture the four include/require expression types and emit file→file imports edges, reusing the path-based resolution that C/C++ #include already goes through. Only static string-literal paths are resolved (relative to the including file); dynamic forms (include $var, require __DIR__ . '/x', interpolated strings) are skipped. Include PATHS are distinguished from namespace `use` symbols by shape: a path contains '/' or '.', which PHP identifiers and FQNs never do. A path-shaped include that doesn't resolve to a known project file is left unresolved and does NOT fall back to the symbol name-matcher, which would otherwise mis-connect "inc/db.php" to an unrelated db.php elsewhere — a wrong edge is worse than a missing one. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Colby McHenry <me@colbymchenry.com>
…ore (#682) (#743) A .gitignore transparently encrypted in place by corporate DLP / endpoint software (UTF-16 header + ciphertext), or one containing a pattern the `ignore` library can't compile to a regex (`\[` -> "Unterminated character class"), crashed the entire sync/index. The throw is LAZY — it surfaces at match time (`ig.ignores()`), not `.add()` — so the existing add-time try/catch never caught it, and the error never named the offending file. Read .gitignore defensively: skip a file that isn't valid UTF-8 text whole (NUL byte or fatal UTF-8 decode), drop only the individual uncompilable patterns from a text one (probe-compile, then per-line fallback), and warn with the file path. Indexing continues either way. The watcher inherits the fix via buildDefaultIgnore. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e file (#693) (#744) A function called only from an anonymous func_literal at package level — a cobra `RunE: func(){…}` handler, a goroutine literal, a callback closure stored in a `var` — had its call leak to the FILE node, because the Go var-initializer walk ran with an empty scope. So `callers`/`impact` showed the function with a file (or no meaningful) caller, unlike JS/TS where an arrow-in-const becomes a named node whose calls attribute correctly. Scope the Go top-level var/const initializer walk to the declared symbol, so a call nested in any func_literal initializer (struct field, slice/map, nested closure) attributes to the enclosing var. EXTRACTION_VERSION 3->4 (re-index to pick up the corrected attribution). Validated on cli/cli (858 Go files): node/edge counts identical, file-level dependents byte-identical (no regression), and 62 top-level-closure calls correctly moved from file-attributed to var-attributed. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…720) (#745) A multi-word PascalCase query token — typically a project name a user includes (`SuperBizAgent backend routes`) — splits into sub-tokens (superbizagent / super / biz / agent) that ALL match the same path segment, so path relevance summed +5 four times for one concept. In a mixed-stack repo that ~doubled every score of the lexically-matching stack's file, burying the stack the query was about. Score path relevance per original query WORD instead: a word matches a path level if any of its sub-tokens do, and counts once — while still splitting the word (via extractSearchTerms on the original case) so it matches across naming conventions (`getUserName` → `get_user_name`). Distinct words each still contribute. Partial fix: this removes the dominant path over-counting (backend rises from absent-in-top-6 to parity on the reporter's repro). The residual lexical edge from the project name in the FTS class-name match + dir match is a deeper down-weighting change, tracked separately. No re-index needed (query-time). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#748) The per-word path fix (#745) brought the backend to parity but not above: the project name still gave the lexically-matching stack a residual dir match + an FTS class-name match, so a backend query that included the project name still ranked the frontend at/above the backend. Derive the project name from go.mod module / package.json name / repo dir, and treat a query word matching it as non-discriminative: drop it from path relevance and from codegraph_explore's PascalCase type-disambiguation bias (reporter's suggestions #1/#2) — unless it's the only query word, so a bare project-name search still scores. Narrow by construction: the down-weighting fires ONLY when a query word matches the derived project name (≥5 chars), so every query that doesn't name the project is byte-identical. On the reporter's repro the backend controllers now top a backend question that includes the project name; queries without it, bare project-name queries, and normal symbol queries are unchanged. Query-time only (no re-index). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…)` (#608) (#749) A method called through a PHP fluent static factory — `ApiClient::for($c)->createOrder()`, the canonical Laravel per-credential/per-tenant client idiom — produced no `calls` edge: the receiver of `->createOrder` is the `Cls::for(...)` static call, whose result type was never recovered, so the edge was dropped and `codegraph_callers` returned nothing. Same shape as the C++ singleton/factory fix (#645), reusing its return_type column + the chained-call mechanism: - Capture PHP return types (getReturnType): `: self` / `: static` / `$this` stored as the `self` marker, a concrete `: Type` as its short name, primitives/unions dropped. - Encode the chained scoped-call receiver as `Cls::for().method` so the resolver can split it (PHP-gated, in extractCall). - New matchPhpCallChain: look up the factory's return type (`self` → the factory's own class; concrete → that class), then resolve AND validate the method on it — a wrong inference yields no edge, never a wrong one. EXTRACTION_VERSION 4->5 (re-index to populate PHP return types + chained edges). Validated on koel (1383 PHP files): node count identical (no explosion), 0 edges lost, +80 chained-call edges recovered; synthetic tests cover the self-factory, concrete-return, namespace, decoy, and absent-method cases. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…() (#750) (#751) A Java method called through a static factory or fluent chain — `Foo.getInstance().bar()`, `Config.create(opts).build()` — lost the receiver's type, so the chained method either didn't resolve at all or (when a same-named method existed on an unrelated class) attached to whichever class was indexed first. Ports the #645 (C++) / #608 (PHP) 3-part mechanism: - Part 1: capture Java return types in the extractor (skip void/primitives/arrays, unwrap generics, strip package qualifier). - Part 2: encode a chained-call receiver as `inner().method` with normalized empty parens, so factory calls that take arguments still split. - Part 3: matchJavaCallChain resolves the chained method on the factory's return type, validated via resolveMethodOnType so a wrong inference yields NO edge (never a wrong one). Validated: synthetic decoy + absent-method safety tests; real-repo A/B on google/guava (3,227 files) — node count identical (no explosion), 0 edges lost, +1,507 unique chained edges recovered, precision spot-checked verbatim (Splitter.on().split(), CacheBuilder.newBuilder().recordStats(), GraphBuilder.directed().build(), nested MultimapBuilder.linkedHashKeys().arrayListValues()). EXTRACTION_VERSION 5 -> 6. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…).bar() (#750) (#752) A Kotlin method called through a companion-object factory, fluent chain, or constructor — `Foo.getInstance().bar()`, `Config.create(opts).build()`, `STMTransaction(f).commit()` — dropped the receiver to a BARE method name, which then name-matched a same-named method on an unrelated class (a wrong edge) or failed to resolve. Ports the #645/#608 mechanism to Kotlin: - Part 1: capture Kotlin return types in the extractor. tree-sitter-kotlin exposes no field names, so the return type is read positionally (the type node after function_value_parameters); inferred/Unit/Nothing returns yield none. - Part 2: encode a CLASS/companion-factory call-receiver chain as `inner().method`. Gated to a capitalized receiver (`Foo.getInstance()` / `Foo(args)`) so instance chains (`list.filter{}.map{}`) keep their bare-name behavior — re-encoding those would only drop the edge, regressing recall in fluent codebases. - Part 3: generalize matchJavaCallChain -> matchDottedCallChain (shared by the JVM dot-notation languages); resolve the method on the factory's return type, or on the constructed class for a Kotlin `Foo(args).method()` receiver. Validated via resolveMethodOnType, so a wrong inference yields NO edge. Validated: synthetic decoy + args + absent-method safety tests; full suite green; real-repo A/B on arrow-kt/arrow (734 .kt) — node count identical (no explosion), +49 validated-correct chained edges, and the removed edges are wrong bare-name guesses the fix correctly stops emitting (419/438 from test/doc files; the 18 from product code are stdlib `.apply{}`, self-loops, and bare-name mismatches) — a net precision improvement, ~0 correct product edges lost. Java path unchanged (constructor branch is Kotlin-gated). EXTRACTION_VERSION 6 -> 7. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…750) (#753) A C# method called through a static factory or fluent chain — `Foo.Create().Bar()`, `JObject.Parse(s).Property(...)`, `Instant.FromUtc(...).InZone(zone)` — lost the receiver's type, so the chained method didn't resolve and the call was invisible to callers/impact/trace. Ports the #645/#608 mechanism to C# (additive, like Java #751): - Part 1: capture C# return types in the extractor, reading the `returns` field (`static Foo Create()` -> `Foo`); predefined/array/generic/nullable/namespaced types are normalized or skipped. - Part 2: encode a chained `member_access_expression` receiver (`Foo.Create(args).Bar()`) as `inner().Bar` with normalized empty parens, so factory calls that take arguments still split. Non-chained member calls keep their existing `recv.Method` text. - Part 3: resolve via the shared matchDottedCallChain (now Java/Kotlin/C#), validated by resolveMethodOnType so a wrong inference yields NO edge. Known limitation (safe): C# extension-method chains don't resolve, since the method lives on the extension class, not the receiver's type — no edge, never a wrong one. Validated: synthetic decoy + args + absent-method safety tests; full suite green; real-repo A/B on Newtonsoft.Json (945 .cs: +3, 0 lost) and nodatime (488 .cs: +73, 0 lost) — node count identical (no explosion), 0 edges lost, precision spot-checked verbatim (Instant.FromUtc().InZone(), Offset.FromHoursAndMinutes().Plus(), OffsetDateTimePattern.CreateWithInvariantCulture().WithTwoDigitYearMax()). EXTRACTION_VERSION 7 -> 8. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…754) * feat(resolution): conformance-aware chained-method resolution (#750) A chained static-factory/fluent call whose method lives on a SUPERTYPE the receiver conforms to — a protocol-extension method (Swift), an interface default method, or an inherited superclass method — now resolves. resolveMethodOnType falls back to walking the return type's implements/extends edges (via the new context.getSupertypes) when the method isn't a direct member. Because those edges don't exist during the single-pass resolution, a second pass (resolveChainedCallsViaConformance) re-resolves the deferred chained refs after edges are built. Still validated, so a wrong inference yields no edge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(changelog): conformance-aware chained-method resolution (#750) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nsion naming (#750) (#755) Completes Swift in the #750 chained-call series (after Java #751, Kotlin #752, C# #753, conformance #754). Two parts: 1. Swift chained-call resolution (the #645/#608 mechanism): capture Swift return types (positional, member types -> last segment), encode capitalized-receiver chains `Foo.make().draw()` / `Foo(args).draw()`, resolve+validate via the shared matchDottedCallChain (+ constructor branch). Fixes the decoy wrong-edge bug where a chained method dropped to a bare name and attached to a same-named method on an unrelated class. 2. Nested-type extension naming fix: `extension KF.Builder: KFOptionSetter` parsed as a class_declaration named `KF.Builder` (dot) — inconsistent with the type's own declaration `KF::Builder` (name `Builder`) — so the extension's conformances and members were invisible to a chained call on the type. A Swift resolveName now names a nested-type extension by its last segment (`Builder`), so its `implements`/`extends` edges and methods are found by the supertype walk (conformance #754) and the simple-name method match. Validated: synthetic decoy + args + constructor + absent-method tests; full suite green; nested-extension repro (`KF.url().onSuccess()` resolves via conformance to the protocol method). Real-repo A/B vs main (conformance) — Alamofire and Kingfisher both **0 added / 0 removed, node count unchanged**: NEUTRAL and SAFE. The prior -168 Kingfisher regression (from the naming inconsistency) is eliminated; Swift's unique-named fluent methods already resolved by bare name, so the chain path lands the same edges — the value here is decoy-collision correctness, the nested-extension naming fix, and consistency with the other four languages. EXTRACTION_VERSION 9 -> 10. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#750) (#757) A Rust call through a chained associated function — `Foo::new().bar()`, `Foo::with(cfg).build()` — dropped the receiver to a bare method name, which then attached to a same-named method on an unrelated type (a wrong edge) or didn't resolve. Ports the #645/#608 mechanism for Rust's `::` receivers: - Part 1: capture Rust return types; `-> Self` yields the `self` marker (resolved to the impl's own type, like PHP), references/generics are unwrapped/reduced. - Part 2: encode an associated-function chain (`Foo::new().bar`), gated to a scoped_identifier receiver so instance chains (`x.foo().bar()`) keep bare-name. - Part 3: resolve via matchScopedCallChain (PHP's `::` resolver, generalized), validated by resolveMethodOnType. Wire Rust into the conformance second pass (matchScopedCallChain variant) so a chained method provided by a trait the type implements (`impl Trait for Type` → existing implements edges) resolves too. Validated: synthetic decoy + args + Self + trait-default-conformance + absent safety tests; full suite green (lone failure is the known-flaky #662 daemon test, passes in isolation). Real-repo A/B vs main: clap (329 .rs) a net precision win — **+937 added (96% correct builder methods), 622 wrong->right retargets** (`Command::new().arg()` was mis-resolving to `ArgGroup::arg`, now `Command::arg`), +162 net unique edges; the pure-drops are largely wrong bare-name edges the fix correctly stops emitting. tokio-rs/bytes 0/0 (no regression). Known limit: the single-hop mechanism re-encodes only the first hop of a chain (deeper hops keep bare-name) — clap's unusually deep builder chains are partly covered. EXTRACTION_VERSION 10 -> 11. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#760) * fix(go): resolve chained factory-function calls New().Method() (#750) A Go call through a chained factory function — `New().Method()`, `With(cfg).Build()` — dropped the receiver to a bare method name, which then attached to a same-named method on an unrelated type (a wrong edge) or didn't resolve. Ports the #645/#608 mechanism for Go's bare-factory receivers: - Part 1: capture Go return types; a pointer `*Foo` -> `Foo`, a multi-return `(*Foo, error)` -> its first result, qualified `pkg.Foo` -> `Foo`. - Part 2: encode a bare-factory chain (`New().Method`), gated to an `identifier` receiver so instance chains (`obj.Method().Other()`) keep bare-name. - Part 3: matchDottedCallChain bare-inner Go branch looks up the FUNCTION's return type, then resolves+validates the method on it. Wired into the conformance pass so a method promoted from an embedded struct (`type Widget struct{ Base }` -> the existing `extends` edge) resolves. FALLBACK: when the inner isn't a resolvable function (a package-level VARIABLE holding a function value, e.g. gin's `engine()`), fall back to bare-name so the edge isn't dropped. Validated: synthetic decoy + args + multi-return + embedded-conformance + absent safety tests (4/4); full suite green. Real-repo A/B on gin (99 .go): pre-fallback -40 = 25 wrong self-loops removed (good) + 15 correct `Engine::ServeHTTP` dropped (gin's ginS variable-factory `engine()`); the fallback recovers the 15. gin A/B re-confirm with the fallback is PENDING (local index flakiness, not a code issue). EXTRACTION_VERSION 11 -> 12. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(go): stop the chained-call fallback from looping the batched resolver The Go variable-inner fallback (for chains like `engine().ServeHTTP()` whose inner is a package-level var, not a factory function) resolved the method via a synthetic bare-name ref and propagated THAT ref as `.original`. Its `referenceName` was the bare `ServeHTTP`, not the stored `engine().ServeHTTP`, so `resolveAndPersistBatched`'s keyed `deleteSpecificResolvedReferences` no-oped, the offset-0 batch never drained, and the loop re-resolved + re-inserted the same rows forever — a runaway that grew a 99-file repo (gin) to 5,050,206 edges / 1.4 GB before filling the disk. - name-matcher.ts: tie the bare-name match back to the original `ref` so the batch-cleanup delete matches the stored row and the loop drains. - index.ts: add a non-progress guard to resolveAndPersistBatched — if the unresolved_refs table doesn't shrink after a batch, stop instead of growing the graph without bound (defense-in-depth for any future keyed-delete mismatch). - resolution.test.ts: regression test for the variable-inner chain — asserts the fallback edge resolves AND the edge count stays bounded (no explosion). gin A/B (post-fix): db 5.8 MB / 3,699 calls edges; net-zero unique-edge diff vs main (the fallback recovers the dropped edges, adds no wrong ones). Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ar() (#750) (#761) Ports the #645 (C++) / #608 (PHP) chained-receiver mechanism to Scala. A call whose receiver is itself a call — `Foo.create().bar()` (companion factory), `Builder(cfg).bar()` (case-class apply), or a fluent chain — used to drop the receiver to a bare `bar`, which name-matched a same-named method on an unrelated type. The most common wrong edge was a stdlib `Option`/`Iterator` `.map`/`.flatMap`/ `.foreach` mis-attributed onto the project's own same-named class. - scala.ts: `getReturnType` reads the `return_type` field — generic `List[Foo]` → container `List`, qualified `pkg.Foo` → `Foo`, `this.type` left undefined. - tree-sitter.ts: re-encode `Foo.create().bar` when the inner call's receiver chain starts with a capital (companion factory / case-class apply); instance chains (`list.map().filter()`) stay bare. - name-matcher.ts: `scala` joins the dotted-chain gate + CONSTRUCTS_VIA_BARE_CALL (case-class `apply` constructs the class); resolveMethodOnType validates, so a non-conventional `apply` returning another type yields no edge, not a wrong one. - index.ts: `scala` joins CHAIN_LANGUAGES so trait-inherited methods resolve via the conformance second pass. Validation: 4 synthetic tests (factory+decoy, case-class apply, trait conformance, absent-method safety). Real-repo A/B on gatling (750 Scala files): +14 / -59 unique edges — all corrections. The +14 are retargets (e.g. `HttpProtocolBuilder(cfg).baseUrl` now resolves to HttpProtocolBuilder::baseUrl, not the same-named private BaseUrlSupport helper); the -59 are wrong edges removed (stdlib Option/Iterator monad calls mis-tied to the project's Validation::*, self-loops, decoy collisions) — zero genuine factory chains dropped (verified: gatling has no real Validation.success().map() chains). db stable at 40 MB. EXTRACTION_VERSION 12→13. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate().bar() (#750) (#762) Ports the #645/#608 chained-receiver mechanism to Dart, plus makes Dart factory and named constructors first-class so their chains can resolve at all. A call whose receiver is itself a call — `Foo.create().bar()` (static factory or factory/named constructor) — used to drop the receiver to a bare `bar`, which name-matched a same-named method on an unrelated type (commonly a stdlib `Option`/`Iterator` `.map`/`.where` mis-tied to the project's own class). - dart.ts: extractBareCall now re-encodes `Foo.create().bar` when the chain starts with a capitalized type; getReturnType captures the return type (generic `List<Foo>` → `List`); factory (`factory Foo.create()`) and named (`Foo._()`) constructors are indexed as `Foo::create` / `Foo::_` with return type = the class (via resolveName + getReturnType + constructor_signature in methodTypes). - The UNNAMED ctor `Foo()` is deliberately NOT extracted (isMisparsedFunction), so plain construction stays an `instantiates` edge to the class rather than a call to a phantom `Foo::Foo` method. - dartCtorInfo validates a "constructor" against the enclosing class name, so a method tree-sitter MISPARSES as a constructor — `@override (A, B) m()`, where the annotation swallows the record return type and `m()` looks like a one-id constructor_signature — is still extracted as the method it is (regression found on localsend; covered by a new test). - name-matcher.ts / index.ts: `dart` joins the dotted-chain gate, CONSTRUCTS_VIA_BARE_CALL (case construction), and CHAIN_LANGUAGES (conformance for superclass/mixin methods). resolveMethodOnType validates, so a wrong inference yields no edge. Validation: 7 synthetic tests (static factory, factory/named ctor, construction, conformance, absent-method safety, the misparse regression, instantiation-not- hijacked). Real-repo A/B on localsend (368 Dart files): hand-written +17/-10 — all corrections (the -10 = 7 wrong stdlib/extension misattributions removed + 3 ctor source-renames), plus additive factory/named-ctor call resolution. Instantiation preserved; no node explosion. EXTRACTION_VERSION 13->14. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) (#786) Ports the #645/#608 chained-receiver mechanism to Objective-C. A message send whose receiver is itself a message send — `[[Foo create] doIt]` — used to drop the receiver, so `doIt` name-matched a same-named method on an unrelated class (commonly a test helper's `init` or an Apple-SDK method). - objc.ts: getReturnType reads the method's `method_type`, SKIPPING nullability / ARC qualifiers (`nonnull instancetype` must yield instancetype, not `nonnull`). - tree-sitter.ts: the message_expression branch now re-encodes a chained send `[[Foo create] doIt]` as `Foo.create().doIt` when the inner receiver is a capitalized class and the outer selector is unary. - name-matcher.ts: `objc` joins the dotted-chain gate + CHAIN_LANGUAGES. A class-message factory returns an instance of the RECEIVER class by convention (`instancetype`), so when the factory's own return type isn't recoverable (`alloc`/`new`/`shared…` return instancetype, or aren't user nodes), the receiver's type is the class itself — this resolves the ubiquitous `[[X alloc] init]` and singleton chains. resolveMethodOnType validates against the class and its supertypes, so a wrong inference yields no edge. Validation: 4 synthetic tests (factory+decoy, superclass conformance, absent-method safety, the nonnull-instancetype singleton). Real-repo A/B on SDWebImage (208 files): +35 / -75 — all corrections (the -75 are wrong `init` mis-matches to a test helper / wrong class, retargeted to the right class's init in the +35, plus 2 Apple-SDK chains on unindexed classes). db stable, no node explosion. EXTRACTION_VERSION 14->15. Full suite green. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nism (#750) (#787) A checked-in design doc for the #645/#608/#750 chained-call mechanism — the permanent, discoverable record the work previously lacked (it lived only in git history, the tracking issue, and an untracked scratch handoff). Covers the 3-part mechanism, the three shared resolvers + receiver styles, the per-language coverage matrix (12 shipped with A/B results), the conformance pass, and the full 21-language README classification (incl. why TypeScript + Luau were skipped and Pascal is blocked). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ory` (#936) (#941) When the indexed root is a directory an enclosing git repo ignores, `git ls-files --directory` collapses the whole cwd to a single literal `./` entry. That sentinel reached the `ignore` matcher, which rejects it ("path should be a `path.relative()`d string, but got "./""), aborting buildScopeIgnore — the one ignore-building call in FileWatcher.start(). So the MCP daemon's startWatching() threw, was caught as "Failed to open project", and auto-sync never started: the index silently went stale until a manual `codegraph sync` (CODEGRAPH_NO_DAEMON=1 was the only workaround). Filter the `./`/`.` self-entry wherever we consume `--directory` output (listIgnoredDirs + the untracked-dir loop in discoverEmbeddedRepoRoots). Semantically correct, not just a crash guard: `./` means "the whole cwd", never a nested repo to recurse into. Not platform-specific (reported on Codex/Windows, reproduced on macOS): the trigger is git state, not the OS. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… duplicates (#945) (#947) A worktree of a submodule points its `.git` into `.git/modules/<module>/worktrees/<name>`, but `classifyGitDir` only matched the top-level `.git/worktrees/` shape — so submodule worktrees fell through to "embedded" and every symbol they shared with the real submodule checkout got indexed twice (one report: ~28% of the index was duplicates, inflating both query results and the DB). Broaden the worktree detector to allow the optional `modules/<module>` segment. The submodule's own checkout (`.git/modules/<module>`, no `worktrees/`) is unaffected and stays indexed as distinct code. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… class misparse (#946) (#948) A C++ class/struct annotated with an export/visibility macro — `class MYLIB_EXPORT Foo : public Bar { … }` — makes tree-sitter read `class MYLIB_EXPORT` as an elaborated type specifier and the whole declaration as a `function_definition` named after the class, spanning the entire body. That phantom `function` polluted callers/impact/blast-radius and skewed kind stats. Detect the misparse structurally in cppExtractor.isMisparsedFunction — a function_definition whose `type` field is a *bodyless* class/struct specifier (the elaborated-type macro) and whose declarator is not a function_declarator — and drop the bogus node, matching how macro-prefixed C prototypes are already handled. The body is mangled by the same misparse and is unrecoverable. Precise enough to leave genuine code alone: `struct P { int x; } makeP() {}` (real inline-defined return type, has a field list) and `class Foo f() {}` (elaborated return type on a real function, has a function_declarator) are untouched. The leading macro alone triggers the misparse; a base clause is not required. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 20, and expand language/framework coverage - Benchmark table reordered to lead with tool calls, time, and file reads (the universal wins); cost and tokens moved right with a note that savings are scale-dependent, not a headline claim - README/introduction/quickstart/installation messaging updated to "surgical context · fewer tool calls · faster answers" framing, dropping the "16% cheaper" headline - Node engine floor raised from 18 to 20 in CLAUDE.md, package.json description updated - `codegraph init` now creates and indexes in one step; the `-i` flag is retired (still accepted as a no-op) - CLI reference expanded with new commands: `explore`, `node`, `unlock`, `daemon`, `telemetry`, `upgrade`, `version`, `help` - MCP server docs clarified: single `codegraph_explore` tool exposed by default, others unlisted but re-enableable via `CODEGRAPH_MCP_TOOLS` - Language support adds Objective-C, Astro, and R; framework routes adds Play, Vue Router/Nuxt, and Astro - API reference documents lower-level exports and embedding requirements (Node 22.5+ for `node:sqlite`) - Troubleshooting adds WSL/Windows dual-checkout guidance - How-it-works updated: SQLite backend is now Node's built-in `node:sqlite` in WAL mode, not better-sqlite3/WASM
Added an image and a note on cost savings for CodeGraph.
…path (#766) (#949) Change detection's git fast path (collectGitStatus) consumed `git status` output with only an isSourceFile filter, on the assumption that git already omits ignored paths. It doesn't: gitignore is a no-op for *tracked* files, and the built-in default excludes (vendor/, node_modules/) aren't gitignore at all. So a tracked file inside a committed dependency dir, or under a .gitignored dir, surfaced as a change the full index never tracks — `codegraph status` reported phantom pending changes that `sync` (a filtered filesystem reconcile) never cleared, and the public getChangedFiles() API returned the same wrong list. Apply buildDefaultIgnore(repoDir) per recursion level, matching repo-relative paths — structurally equivalent to the full-index path's ScopeIgnore (each embedded repo judged by its own rules) with no extra git subprocess calls. Deletions stay unfiltered: getChangedFiles acts on one only when the path is already tracked in the DB, where removal is always correct, and that lets a newly-excluded dir's stale rows clean themselves up. Unblocks #699 (an .ignore overlay inherits this leak unless change detection consults the same matcher as enumeration). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…concile (#905) (#950) On a very large repo (the report is a ~93k-file / 5.7GB-DB Java monorepo) the first MCP `tools/call` after a fresh `serve --mcp` could hang for 10+ minutes with zero output, and with the liveness watchdog on, the daemon was SIGKILLed mid-query instead. Root cause: the post-open catch-up reconcile that the first tool call is gated on does ~2*N synchronous `fs.existsSync`/`fs.statSync` calls plus a load-all-files query in two non-yielding loops. On a huge repo that wedges the event loop for minutes, which (a) trips the 60s watchdog (it SIGKILLs a process whose loop stops turning) and (b) blocks the first call the whole time. Two complementary fixes: - Make the reconcile yield. `ExtractionOrchestrator.sync()` now uses the yielding `scanDirectoryAsync`, and both O(files) reconcile loops `await setImmediate` every SYNC_RECONCILE_YIELD_INTERVAL (1000) files. The loop can no longer wedge the main thread, so the watchdog stays fed and the socket / any concurrent read stays responsive while a big reconcile runs. Results are unchanged — only yield points are added. - Time-box the catch-up gate. The first `tools/call` now waits on the reconcile for at most CODEGRAPH_CATCHUP_GATE_TIMEOUT_MS (default 3000ms), then serves and lets the reconcile finish in the background (which now yields, so the served call runs concurrently). `=0` restores the old unbounded wait. On a normal repo the reconcile finishes well under the budget, so behavior is unchanged. Tests: adds two time-box cases to mcp-catchup-gate (serves promptly when the reconcile runs long; `=0` restores the unbounded wait). Full suite green (1655 passed). Validated end-to-end through the real daemon: first call returns at the ~3s time-box instead of waiting an injected 8s reconcile; no-delay control unchanged; `=0` opt-out waits the full reconcile. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…951) MCP tool results used Markdown ATX headings (##/###/####) for section headers — the status summary, each search hit, every file section in an exploration — which Markdown-rendering clients (e.g. the Claude Code VSCode extension) blow up to H1–H4 font size, filling the transcript with oversized lines (worst on search/explore, where the noise scales with result count). Swap them all for bold labels, which render at body size while keeping the same structure. CLI/TTY output (ContextBuilder) is unchanged — the issue notes it's fine. The format is parse-coupled, so kept in sync: - The explore truncation boundary and the offload chunker (reasoning/reasoner.ts) both key off the per-file header, now a unique `**`-prefixed marker emitted via a shared fileSectionHeader() helper. - Updated the offload strip regexes and switched the opt-in report-style prompt off ATX headings (same client, same rendering issue). - Updated test helpers (sectionFor, sourcedFiles, the callers section-boundary scan) that scanned the old markers. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) (#953) Lombok generates getters/setters, builder(), equals/hashCode/toString, and the @slf4j log field at compile time, so they never appear in the source AST. Static extraction missed them entirely, so a bean.getName() / User.builder() / log.info() call resolved to nothing and call-chain analysis broke silently — the agent would conclude the method didn't exist. Add a synthesizeMembers hook on LanguageExtractor, called at the end of class extraction (class still on the scope stack, real members already extracted), and a Java implementation that synthesizes the mechanical members for @Getter, @Setter, @DaTa, @value, @Builder/@SuperBuilder, @tostring, @EqualsAndHashCode, and the @log* family. Each node is anchored on the field/class name-token leaf (so it pulls in no spurious value-reference scope), marked with a `lombok` decorator and a docstring naming the generating annotation, and never overrides a member the source already declares. Methods and fields are deduped separately since they're distinct namespaces in Java (a boolean field `isRunning` and its generated getter `isRunning()` coexist). Deliberately not synthesized: constructors (new X() already links via instantiates, and overloaded @NoArgs/@AllArgs/@RequiredArgs ctors would collide on a synthetic node id), fluent builder setters, and @accessors(fluent=true). Validated on eladmin (274 Java files, Lombok-heavy): 100% accessor precision (878/878 map to a real field), 722 previously-broken calls now resolve; spring-petclinic (no Lombok) control synthesizes nothing. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
C/C++ polymorphism is the function pointer: a struct fn-pointer field, concrete
functions registered into it through a table (`{"add", cmd_add}`), a designated
initializer (`.handler = on_open`), or an assignment, then dispatched indirectly
(`p->fn(argv)`). Static extraction captures neither the registration→field
binding nor the indirect call, so the dispatcher→handler edge was missing — git's
run_builtin looked like it called nothing, a vtable's implementations had no
callers, and the hook_demo.c in the issue was unreachable.
Add a resolution-layer synthesizer keyed by (struct type, fn-pointer field). It
reads source (the established Celery/Sidekiq/Spring pattern — C extraction has no
struct fields or indirect-call edges to build on) in passes: collect fn-pointer
typedefs, parse struct field layouts, collect registrations (positional matched
by field index, designated, and assignment), propagate field←field assignments
(so a generic hook slot reassigned from a registry — the hook_demo.c
`h->func = found->fn` shape — inherits the registry field's handlers), then link
each indirect dispatch site to the registered handlers. Receiver type resolves
from the enclosing function's params/locals, falling back to a field name unique
to one struct. Covers both the command-table idiom (git, redis) and the
ops-struct/vtable idiom (curl content-encoders, protocol handlers).
Pure edge synthesis (no node growth); high precision via the (struct, field) key.
Validated: git 502 edges (run_builtin→cmd_* plus git_hash_algo/archiver/reftable
vtables), redis 357 (dictType.hashFunction, connection + reply-object vtables),
curl 478 (Curl_cwtype.do_init → deflate/gzip/brotli/zstd); 0 non-function targets
on all three; node-stable; 0 on the lua control (its {name,fn} tables register
into the Lua VM, with no C indirect call to bridge). Full suite 1665 pass.
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…son (#906) (#955) The extension → language table was hardcoded, so a codebase using a non-standard extension for a supported language (e.g. `.dota_lua` for Lua) had those files silently skipped — no way to opt them in short of patching the source. Add an opt-in, project-scoped `codegraph.json` at the repo root: { "extensions": { ".dota_lua": "lua", ".tpl": "php" } } Mappings merge on top of the built-in defaults and take precedence (so a built-in can be re-pointed, e.g. `.h` → `cpp`). Absent or malformed config is the zero-config default — byte-identical to prior behavior; an invalid target language or unparseable file is warned-and-skipped, never fatal. Implementation: - New `src/project-config.ts` — `loadExtensionOverrides(rootDir)`, validated against `isLanguageSupported`, mtime-cached per root. - `detectLanguage` / `isSourceFile` gain an optional `overrides` arg (omitting it is the existing behavior). - Overrides threaded per-operation through every extraction call site (scan/walk gates, git change-detection, grammar selection, extraction, the file watcher), resolved from the project root — no process-global state, so the multi-project daemon stays isolated. The parse worker receives the resolved language in its message. Tests: 13 new cases (unit, loader validation/normalization/caching, and a full-index integration proving a custom-extension file is extracted while the zero-config path indexes nothing). Worker path smoke-tested via the built CLI. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…int outside the repo (#935) (#956) The directory walk deliberately follows an in-root symlink whose target lives outside the repo root (the standard Dota custom-game layout, where `game/` and `content/` link into the SDK tree) and enumerates the files under it. But the read path then rejected every one of them via the strict symlink-escape guard, logging `Path traversal blocked in batch reader` and indexing nothing — discovery and the reader disagreed. Add an opt-in `allowSymlinkEscape` to validatePathWithinRoot that waives only the realpath-escape rejection (the lexical `../` guard still applies) and pass it at the three indexing read sites (batch reader, indexFile, indexFileWithContent). The content-serving sinks (ContextBuilder, MCP tools) keep the strict guard, so this stays inside the #527 model: indexing now follows the symlink, getCode still refuses to serve out-of-root contents. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hods (#747) (#957) GoFrame's standard router binds routes reflectively (group.Bind(ctrl)): the path and method live in a g.Meta struct tag on a request type, and the controller method that serves it is matched by that request type at runtime — so there was no path string and no edge from a route to its handler, and "where is this route handled / where are routes bound to controllers?" could only be answered lexically (issue #720's report). - frameworks/goframe.ts: detect gogf/gf in go.mod, extract each path-bearing g.Meta into a route node (requires path:, so response mime:-only tags are skipped), encoding the package-qualified request type for the join. - goframe-synthesizer.ts: join each route -> the controller method whose signature takes that request type — NOT by name (DeptSearchReq is served by List) — keyed pkg.Type to disambiguate the many identical bare names a large app defines one-per-module, with an addon-root tiebreak for cloned demo addons. Edge kind calls, provenance heuristic, synthesizedBy goframe-route, surfaced as a dynamic-dispatch hop in codegraph_explore. Validated on real repos: gf-demo-user 7/7, gfast 65/68 (3 genuinely handler-less), hotgo 242/247 (98%) — 100% precision (0 non-controller handlers, 0 core/addon cross-binding), node count stable. Agent A/B (gfast, sonnet/high, 2 runs/arm): with codegraph 1 explore call / 0 Read / ~20s vs without 7.5 Read avg + grep-hunting for the non-existent literal route string / ~42s; same correct answer. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
) (#965) * perf(resolution): resolve imports to definitions, not sibling import nodes (#915) "Resolving refs" crawled (tens of minutes) on large projects — most painfully ones mixing a big front-end and back-end. An external package or module imported across hundreds/thousands of files (react, a shared UI package, Python logging/typing) is re-declared as an `import` node in every importing file, so its unresolved import ref fell through to the exact-name matcher, which scored all K same-named import nodes via findBestMatch — K refs x K candidates = O(K^2) per package, producing only meaningless import->import edges. Fix: exclude `import`-kind nodes as name-match targets (they're statements, not definitions; real import->definition resolution is the import resolver's job). Plus two safe constant-factor wins in findBestMatch: hoist the per-candidate ref.filePath split, and skip cross-language candidates when a same-language one exists (provably the same winner — same-language scores >=50, cross-language maxes at 35). Measured: superset (Py+TS) candidates scored 7.5M -> 833K (9x), non-import edges preserved (+1618 now resolve to real defs), ~22K useless import->import edges removed; kubernetes (Go) computePathProximity 37.2s -> 5.0s; synthetic 8k-file mixed repo (K=4000) resolution 16.0s -> 1.7s. Full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: correct stale better-sqlite3/wasm references to node:sqlite The SQLite backend has been Node's built-in node:sqlite (real SQLite, WAL + FTS5, from the bundled runtime) for a while — there is no native build step and no node-sqlite3-wasm fallback. README and the docs site were already updated; this catches the stragglers: - CLAUDE.md: the src/db/ backend description and the sqlite-backend test note. - src/db/index.ts, src/mcp/tools.ts: two code comments that still blamed "the wasm backend" for non-WAL behavior (reworded to "when WAL isn't in effect"). Leaves tree-sitter grammar wasm (web-tree-sitter / --liftoff-only) untouched — that's a different, still-current use of wasm. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(telemetry): drop the dead sqlite_backend field (schema v2) node:sqlite is now the only backend, so the `index` event's `sqlite_backend` field was a constant ("native") carrying no signal — and the `install` event never actually sent it. Remove the field and the backendKind() helper, bump the telemetry SCHEMA_VERSION 1 -> 2, and update TELEMETRY.md + docs/design/telemetry.md. The ingest worker is deliberately left tolerant: `index` doesn't require the field and schema_version validates as nonNegInt(99), so v2 events ingest fine and old clients still sending v1 + sqlite_backend keep validating too. Added a legacy comment there explaining it's safe to drop once old-client share is negligible. telemetry.test.ts: the assertion pinning schema_version and a stale-claim fixture line updated 1 -> 2. All telemetry tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… monorepo-aware (#964) (#966) The MCP server gated tool availability on whether the server root had a .codegraph/ index, so in a monorepo where only sub-projects are indexed the agent saw zero tools — and couldn't reach an indexed sub-project even by projectPath. A session started before `codegraph init` also never surfaced the tools afterward. The Claude front-load hook had the mirror gap: it only walked UP for an index, so it stayed silent at a monorepo root. MCP server: - Always expose the tool surface; when the root isn't indexed, send a per-project instructions variant (pass projectPath) instead of the "inactive" note. Safety comes from response SHAPE (success-shaped guidance, never isError), not from hiding tools. - Reword the no-default-project guidance to be per-project, not per-session, and sharpen the projectPath schema description. Front-load hook (UserPromptSubmit): - Scan DOWN (bounded depth, workspace-root-gated) for indexed sub-projects and shape the injection by topology: front-load the one the prompt names, nudge about the rest, or list them when ambiguous. Verified: full suite (1703 passed); a live two-package monorepo run confirms the hook front-loads the correct sub-project with no cross-package leakage. The front-load's net speed effect is the existing multi-file-vs-single-file tradeoff, unchanged by this work. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
[skip ci] Auto-generated by Release workflow.
…overy (#970, #976) (#980) #514 (v1.0.0) began walking into gitignored directories to discover and index the git repos nested inside them. That broke users who rely on .gitignore to exclude a directory: a gitignored folder of cloned reference repos blew graphs up (one report went 10k to 500k edges, #976) and stalled indexing on multi-gigabyte trees of clones (#970). Respect .gitignore by default again. Discovering embedded repos inside a gitignored directory is now opt-in via codegraph.json: { "includeIgnored": ["packages/", "services/"] } The single choke point findIgnoredEmbeddedRepos now returns nothing unless a gitignored dir matches the project's includeIgnored patterns, and the matcher is threaded from the scan root through the full-index, incremental-sync, and watcher-scope paths. Downstream ScopeIgnore and the watcher are unchanged: they key off the discovered embedded roots, so gating discovery fixes the indexer, sync, and watcher together. Untracked embedded repos (#193) stay indexed by default. This restores the super-repo-of-clones behavior (#622, #699) for the people who want it, while making the default match what every other tool (and CodeGraph's own git ls-files foundation) does: .gitignore excludes. project-config.ts now parses codegraph.json once (loadParsedConfig) and exposes loadIncludeIgnoredPatterns alongside the existing extension map. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #980 merged after v1.1.0 was already promoted and published, so the squash-merge's 3-way merge auto-placed its CHANGELOG entry under the released [1.1.0] section. Move it to [Unreleased] so the published 1.1.0 notes stay accurate and the next release (1.1.1) promotes the fix. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#974) (#983) The client-facing MCP proxy could exit with "Transport closed" when its connection to the shared daemon hit a socket 'error' with no listener attached — common on WSL2 /mnt (DrvFs), where AF_UNIX is flaky. The global fatal handler turned that uncaughtException into process.exit(1), which the MCP client saw as a bare transport close even though the index was healthy. proxy.ts now keeps an 'error' listener on the daemon socket for its whole life (and skips a socket destroyed in the connect window), so a stray error degrades to the existing in-process fallback instead of crashing. daemon.ts releases the lockfile it acquired when it fails to bind, so the next launch doesn't spin on a stale lock (the duplicate serve --mcp pileup). No default behavior change for anyone; WSL /mnt users who still hit trouble can set CODEGRAPH_NO_DAEMON=1 to skip the shared daemon entirely. Validated on macOS (unit + live serve probe) and Linux (Docker, --init): 64/64 across the daemon/socket/lifecycle suites, incl. real AF_UNIX. Closes #974 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
[skip ci] Auto-generated by Release workflow.
…rent-call timeouts (#1002) The shared daemon served every session on one event loop with synchronous node:sqlite. codegraph_explore is CPU-bound work stitched together by microtask awaits, so N concurrent explores keep the microtask queue continuously full and starve the macrotask phases — timers AND socket I/O. The transport freezes: no response can flush until the whole batch drains, so with ~10 subagents on a large repo clients routinely time out (reported via X by @symbolic2020). Move the heavy read-tool dispatch onto a worker-thread pool. Each worker holds its own WAL read connection (verified: a worker reader sees the main writer's committed catch-up/watcher writes); the single watcher/writer, the catch-up gate, codegraph_status, and the staleness/worktree notices stay on the main thread. Concurrent reads now run in true parallel up to core count and the main loop stays free for the MCP transport, so responses flush incrementally instead of all-at-once after the batch drains. Enabled for the shared daemon only; direct (single-stdio-client) mode is unchanged. - crash recovery: respawn + retry-once, with a circuit breaker that falls back to in-process dispatch if workers can't run on this platform - graceful backstop: an overloaded pool returns success-shaped "busy, retry" guidance, never isError (so it can't teach the agent to abandon codegraph) - pending-aware growth + capped concurrent cold-starts avoid a startup thundering herd (N simultaneous module-loads + DB opens could stall the loop) - config: CODEGRAPH_QUERY_POOL_SIZE (default clamp(cores-1, 1, 16); 0 disables → in-process), CODEGRAPH_QUERY_BUSY_TIMEOUT_MS (default 45s) 10 concurrent explores on vscode (10.5k files): 31s → ~9s, staggered flush, 0 timeouts, byte-identical output; scales with cores (≈3.3× on 8, 1.8× on 2). Full suite passes plus 10 new query-pool tests (fake-worker injection so the scheduling logic is covered without spawning threads). Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onditional-compilation & bare arrays (#991) (#1003) * feat(c/c++): resolve macro-built function-pointer command tables (#991) C/C++ commands dispatched through macro-built function-pointer tables were dead-ends in the graph: redis' `call` never showed up as a caller of any command (`c->cmd->proc(c)`), because the table is generated into a #included `.def`, the handler is buried inside `MAKE_CMD(...)`, the struct type is itself a macro alias, the `proc` field uses a function-TYPE typedef, and the receiver is a chained field access. #954 deferred exactly this shape. Six composable additions to c-fnptr-synthesizer.ts close it: - function-type typedefs (`typedef RET T(...)` + `T *f`) flag the field as a function pointer; - multi-declarator fields (`struct redisCommand *cmd, *last`) each count as a slot/type (needed for positional alignment and the chain walk); - chained/array receivers (`c->cmd->proc`) resolve through field types across all same-named struct layouts (redis has two unrelated `client` structs); - `#include "x"` directives are followed (from raw source) so a non-indexed `.def` is read as a registration unit with the includer's effective macro env; - function-like + object-like macros are expanded (params->args, type aliases) before positional/designated registration; - a macro that expands to a brace-wrapped element (sqlite `FUNCTION(...)`) has one outer brace layer peeled. Validated on two independent macro-table lineages at 100% target precision: redis (209 commands via redisCommand.proc, `call`->every command) and sqlite (69 FuncDef.xSFunc targets). No regression on the controls: git (cmd_struct.fn, 138 builtins), curl (Curl_cftype.*), lua (0). 0 non-function targets across all five; +3 synthetic fixtures; full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(c/c++): resolve conditional-compilation command tables (vim) (#991) Vim's `:ex` and normal-mode command tables are the hardest fn-pointer-table shape: the struct is defined INLINE with the array, the whole thing is behind `#ifdef DO_DECLARE_EXCMD`/`DO_DECLARE_NVCMD` (switched on by the includer), built by a macro the file conditionally redefines (`EXCMD`/`NVCMD` = the table element under the switch, a bare enum id otherwise), and dispatched by a parenthesized array subscript through a file-scope table: `(cmdnames[i].cmd_func)(&ea)`. Four more composable additions on top of the macro-table work: - a focused `#ifdef`/`#ifndef`/`#if defined`/`#else`/`#elif`/`#endif` evaluator drops inactive arms (unevaluable `#if EXPR` keeps its body); an indexed header is re-scanned in an includer's context only when that includer #defines a switch the header guards, with the include's macros re-read from the resolved text (the plain last-wins parse picks the wrong, enum, arm); - inline `struct TAG {…} var[] = {…}` tables whose struct never became a node are parsed in place and registered; - array-subscript receivers (`tbl[i].f`) strip the subscript and resolve the base through a global-var → struct-type map; - an optional `)` before the call covers the parenthesized `(….f)(args)` form. Validated on vim: 273 `:ex` commands (`do_one_cmd`→every command) + 67 normal-mode commands, 0 non-function targets, 0 cross-table misroute (registering both tables is what stops `normal_cmd`'s `nv_cmds[i].cmd_func` from falling back to the `cmdname` owner of the shared field name). Controls unchanged at 0 non-function (redis/sqlite/git/curl gain coverage from array/global dispatch, lua still 0); +1 synthetic fixture; full suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(c/c++): resolve bare arrays of function pointers (#991) The C/C++ fn-pointer synthesizer keyed everything on (struct type, fn-pointer field), so a dispatch through a bare array of function pointers — no struct, no field — was unbridged: an opcode/handler table like `static op_t *opcodes[256] = {nop,…}` invoked `opcodes[op](…)` left every handler with zero callers. Closes the last #991 deferred item. Keyed by the array VARIABLE name (a new `arrayReg`, parallel to the struct `reg`). Registration detects an array whose element type is a function typedef — a function-TYPE typedef element (`opcode_t *ops[]`, the `*` making it an array of pointers) or a function-pointer typedef element (`zend_rc_dtor_func_t t[]`) — and reads its literal entries, whether positional (`fn`/`&fn`), designated by index (`[IDX]=fn`), or cast-wrapped (`(cast)fn`). Dispatch is `tbl[i](…)` / `(*tbl[i])(…)`, gated on `tbl` being a known fn-pointer array (the precision anchor); the fan-out reaches the whole set (a runtime subscript hits any entry), like a command table. The same-file table wins on a name collision, so two file-local `static opcodes[256]` (SameBoy's CPU + disassembler) never cross. The fn-pointer typedef/field regexes now also tolerate a calling-convention macro before the `*` (`(ZEND_FASTCALL *name)`), which hardens the existing struct-field path too. Validated on two independent lineages: SameBoy (GB emulator) — 147 edges via `opcodes[]`, 0 cross-file leak; php-src (Zend) — 54 edges across 7 tables in the designated+cast+CC-typedef form. Control: lua 0 — its `lua_CFunction searchers[]` is pushed into the VM, never C-dispatched, so the call-gate fires nothing. No regression on the #991 corpus: redis (835) / sqlite (683) struct edges byte-identical, git +3 / curl +20 legitimate new bare-array edges, vim 433 with all guards holding; 0 non-function targets across all. + 4 fixtures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#1004) The UserPromptSubmit hook's structural-prompt gate was English-only, so a structural question written in Chinese — or any non-Latin script — silently injected nothing: JS `\b` is ASCII-only and never matches between Han characters, so the keyword regex couldn't fire (and couldn't be extended in place). To the user the hook looked unwired, with no error to explain why. Make the gate language-aware, split into tested helpers in directory.ts: - hasStructuralKeyword: English (\b-guarded) + CJK structural keywords. - extractCodeTokens: identifier-shaped tokens (camelCase / snake_case / name() / a.b) in any language — verified against the index via getNodesByName before firing, so a tech brand like `JavaScript` that looks like a symbol but isn't one here doesn't inject ~16KB of spurious context. - isStructuralPrompt: the cheap candidate gate (keyword OR code-token). Adds 21 unit tests for the gate (previously untested) covering the reporter's verification table plus the false-positive guards. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ject (#993) (#1007) When the server runs with no default project to fall back to — a gateway server started outside any repo, or a monorepo root whose .codegraph/ indexes live only in sub-projects — every tool call must carry an explicit projectPath. Previously projectPath was always optional, so an agent talking to such a server would omit it, get success-shaped "pass projectPath" guidance, and not reliably retry; the user had to nudge it by hand. getTools() now marks projectPath required in the exposed tool schemas on the no-default-project branch (a high-salience channel clients surface/validate, unlike the instructions prose the reporter found too weak). When a default project is open, projectPath stays optional and a bare call falls back to it. The fix lives at the MCP schema layer, not the Claude-only front-load hook: the hook is local-filesystem-based and never runs for the reporter (they're on AGENTS.md / Codex-opencode). The proxy/getStaticTools path is untouched — index.ts forces direct mode whenever resolveDaemonRoot is null, so the no-default case never reaches the proxy. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…; add exclude config + index watchdogs (#999) (#1009) Three fixes for a repo that commits a large JS/TS theme/SDK (Metronic under static/, ~1,600 tracked files): 1. A SECOND "Resolving refs" quadratic that #915 didn't cover. #915 capped import-name collisions; this caps method-name collisions (init/update/render re-declared on every widget), which flow through matchMethodCall Strategy 3 and findBestMatch instead. New AMBIGUOUS_NAME_CEILING (default 500, env CODEGRAPH_AMBIGUOUS_NAME_CEILING): above it the fuzzy strategies decline rather than score K candidates — no proximity score can pick the one true target among thousands anyway. Resolving drops from O(K^2) to linear in refs (e.g. 900-file synthetic: 28.7s -> 3.4s), edge counts unchanged, and the cap never fires on normal repos (max real method-collision ~40). 2. A new `exclude` array in codegraph.json keeps git-TRACKED paths out of the index, which .gitignore can't do (enumeration is `git ls-files`). Mirrors the existing includeIgnored plumbing across the git, sync, and non-git-walk paths. 3. `index`/`init` now install the #850 liveness + #277 ppid watchdogs (which were serve-only), so a wedged or orphaned indexer self-terminates instead of pinning a core. The --liftoff-only relaunch's spawnSync can't forward signals, so killing the parent shim used to orphan the worker. Tests: ubiquitous-name ceiling, exclude (incl. tracked-file exclusion on git + non-git), orphan self-termination (POSIX), and ppid-parser units. Shared the ppid parsers out of mcp/index.ts into mcp/ppid-watchdog.ts. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#1009 added `exclude` (keep git-tracked dirs out of the index) but didn't document it. Add an "Excluding a tracked directory" section to the site config page (parallel to includeIgnored) and a brief note + example to the README, covering the committed-theme/SDK case .gitignore can't handle.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.