update #1

Closed
nanofatdog wants to merge 157 commits into nanofatdog:main from colbymchenry:main

Conversation

@nanofatdog

Owner

No description provided.

colbymchenry and others added 30 commits June 7, 2026 02:11
Give resolvePythonModuleMember the same absolute-dotted-path fallback that
resolveModuleImportToFile already uses, so a `module.func()` call after
`from pkg import module` / `import pkg.module as module` records its `calls`
edge. Adds a regression test and a CHANGELOG entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a resolution-phase pass (goCrossFileMethodContainsEdges) that links a Go
method to its same-named receiver type within the same package (= directory),
so a method declared in a different file from its `type` is no longer orphaned
from the struct. Runs before goImplementsEdges so cross-file methods also count
toward interface satisfaction (#584). Adds a regression test + CHANGELOG entry.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mar (#237) (#717)

Vendor tree-sitter-c-sharp 0.23.5 (ABI 15) for C#, replacing the bundled ABI-13
build that dropped primary-constructor classes. Adds native primary-ctor
parsing, primary-ctor parameter dependency edges, return-type extraction via the
renamed `returns` field, and a preParse that blanks `#if` directive lines the
new grammar mis-parses inside enum bodies. Validated on MediatR / eShopOnWeb /
Newtonsoft.Json + full suite.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…383) (#722)

Spring `application.{properties,yml}` keys (and Shopify Liquid `{% schema %}`
blocks) were storing the config VALUE in the node docstring, and
`codegraph_explore`'s source section re-read the raw `key = value` line off
disk — so a secret committed to a config file (DB password, API key, JDBC URL
with embedded credentials) could be pushed into an agent's context via
explore/node output without the agent ever opening the file.

Config-leaf nodes (`kind: 'constant'` in a config language) now surface the KEY
only, via a shared `isConfigLeafNode` predicate applied at both surfacing
paths: the value is dropped from extraction, `getCode`/`includeCode` returns
the key instead of the file line, and explore excludes config leaves from
source rendering. The predicate can't match real code (real constants are
ts/java/go/…), so `@Value`/`@ConfigurationProperties` resolution and impact are
unaffected. Adds a regression test asserting a planted secret never appears in
`codegraph_explore` / `codegraph_node` output while the keys still resolve.
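
A minimal sketch of the key-only surfacing described above, under stated assumptions: the `GraphNode` shape, the `CONFIG_LANGUAGES` set, and the `surfaceCode` helper are hypothetical stand-ins, and only the `isConfigLeafNode` name and the `kind: 'constant'` condition come from the message itself.

```typescript
// Hypothetical node shape; the real codegraph node carries more fields.
interface GraphNode {
  kind: string;
  language: string;
  name: string; // for a config leaf, this is the key
}

// Assumed set of config languages, for illustration only.
const CONFIG_LANGUAGES = new Set(["properties", "yaml", "liquid"]);

// A config leaf is a `constant` node that lives in a config language.
// Real constants (ts/java/go/...) never match because of the language check.
function isConfigLeafNode(node: GraphNode): boolean {
  return node.kind === "constant" && CONFIG_LANGUAGES.has(node.language);
}

// Surfacing path: a config leaf yields its key, anything else the file line.
function surfaceCode(node: GraphNode, rawLine: string): string {
  return isConfigLeafNode(node) ? node.name : rawLine;
}
```

Applying one predicate at every surfacing path is what keeps the value out of both explore and node output, so a new path only has to call the same check.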

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ot reads (#527) (#724)

* fix(security): resolve symlinks in path validation to block out-of-root reads (#527)

validatePathWithinRoot was purely lexical (path.resolve + startsWith), so an
in-repo symlink whose logical path is inside the project root but whose real
target escapes it passed validation — and both content-serving read sinks
(codegraph_node includeCode, codegraph_explore source) then readFileSync'd it,
leaking out-of-root file contents (e.g. ~/.ssh, /etc) to the agent.

Add a realpath layer: after the lexical check, resolve symlinks on both the
candidate path and the root and re-compare, rejecting anything whose real path
escapes the root. An in-root symlink is still allowed (no over-blocking).
Comparison is case-insensitive on Windows (NTFS + realpath casing). Not-yet-
existing paths (ENOENT) fall back to the lexical result so about-to-be-written
files still validate; other resolution errors reject.

Removes the dead, never-called isPathWithinRoot / isPathWithinRootReal helpers
(the latter a footgun — it returned true on realpath failure). Adds RED->GREEN
tests: in->out file/dir symlinks rejected, in->in allowed, ../ rejected, ENOENT
allowed, plus an end-to-end test proving getCode no longer serves an out-of-root
file reached through a dir symlink.
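
The two-layer check above can be sketched as follows. This is an illustration and not the project's code: the `isInside` helper, the `string | null` return convention, and the parameter names are assumptions; only the function name `validatePathWithinRoot` and the lexical-then-realpath order come from the message.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: true when `candidate` equals `root` or sits beneath it.
function isInside(root: string, candidate: string, caseInsensitive: boolean): boolean {
  const norm = (p: string): string => (caseInsensitive ? p.toLowerCase() : p);
  const r = norm(root);
  const c = norm(candidate);
  return c === r || c.startsWith(r.endsWith(path.sep) ? r : r + path.sep);
}

// Returns the resolved path when it is allowed, or null when it is rejected.
function validatePathWithinRoot(root: string, relPath: string): string | null {
  const caseInsensitive = process.platform === "win32";
  const absRoot = path.resolve(root);
  const lexical = path.resolve(absRoot, relPath);

  // Layer 1: purely lexical check, catches "../" escapes.
  if (!isInside(absRoot, lexical, caseInsensitive)) return null;

  // Layer 2: resolve symlinks on both sides and compare again.
  try {
    const realRoot = fs.realpathSync(absRoot);
    const realTarget = fs.realpathSync(lexical);
    return isInside(realRoot, realTarget, caseInsensitive) ? lexical : null;
  } catch (err) {
    // A path that does not exist yet keeps the lexical verdict;
    // any other resolution failure is a rejection.
    const code = (err as NodeJS.ErrnoException).code;
    return code === "ENOENT" ? lexical : null;
  }
}
```

Resolving the root as well as the candidate matters when the root itself is reached through a symlink; comparing a real target against a logical root would reject every path.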

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note the #527 symlink path-escape fix

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "Attached to shared daemon" line is benign INFO, but it was written to
stderr — and MCP hosts render all server stderr at error level (and append an
`undefined` data field), so on every session start a healthy attach showed up
as `[error] … undefined`. It is now gated behind CODEGRAPH_MCP_LOG_ATTACH=1:
silent by default, opt-in for debugging daemon attach. Both attach sites
(runProxy + connectWithHello) route through one helper. The daemon integration
tests opt the harness into the log so their attach assertions still observe a
successful attach.

Re-applies the approach from #640 by @mturac.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…w node mode (#733)

* feat(mcp): steer agents to codegraph during implementation, not just Q&A

Two changes targeting agents that reach for Read during edits instead of codegraph:

1. Reframe the agent-facing steering (server-instructions + codegraph_node/explore
   descriptions): drop "consult BEFORE ... not during"; position codegraph_node as
   the Read upgrade for a named symbol (verbatim current on-disk source, safe to
   Edit from, + caller/callee trail), explore PRIMARY / node SECONDARY, with the
   "cached intelligence — better context, fewer tokens" framing.

2. File-view mode: codegraph_node now accepts a `file` with no `symbol` and returns
   that file's symbol map + graph role (its dependents), plus verbatim bodies with
   includeCode — so it can displace a path-keyed Read, not just a symbol lookup.
   Resolves a path or basename; dedups nested members; budget-capped.

To be A/B'd on an implementation task before shipping (per the retrieval doctrine:
steering changes must be measured, not assumed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): note codegraph_node file-view + implementation steering

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sh (#734)

Running scripts/agent-eval against a `claude -p` spawned from within a Claude
Code session (nested, e.g. from a Bash tool call) makes the codegraph MCP
attach unreliable: the server is healthy (full handshake ~165ms) but the
nested client marks it status:"pending"/0-tools under CPU/timing contention,
so the agent silently runs with no codegraph. NO_DAEMON + `< /dev/null` don't
fix it — it's the nested client, not the server. Documented in CLAUDE.md's
validation methodology.

Adds ab-new-vs-baseline.sh: A/Bs a retrieval/steering change as new-build vs
baseline-build (both codegraph-on, isolating the change — vs run-all.sh's
with-vs-without), on a throwaway copy of an indexed repo. Run it in a real
terminal.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ock (#735)

Corrects the "run non-nested only" conclusion from #734. The codegraph server
is healthy (handshake ~165ms); the flakiness is that on a multi-step
implementation task the agent dives into Read/grep before codegraph finishes
its ~2-3s startup (worse under nested CPU contention), so it runs with no
codegraph. Fix: pre-warm a persistent daemon (high idle timeout) + skip the
startup re-exec (CODEGRAPH_WASM_RELAUNCHED=1) so claude connects before the
agent's first turn. claude's init snapshot can show status:"pending" even when
it then connects — judge by actual codegraph usage, not the init line.

ab-new-vs-baseline.sh now bakes in the pre-warm + skip-re-exec. Validated: in a
clean A/B on the same fully-implemented task, the new build's agent used
codegraph 2x with 5 Reads, versus 0 codegraph uses and 8 Reads for the baseline.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…it, byte-parity (#738)

Makes codegraph_node a drop-in, faster Read for indexed source files. File-read
mode emits `<n>\t<line>` like Read, supports offset/limit, and adds a
blast-radius header; symbolsOnly returns only the symbol map. Fixes the old
file-view dropping imports and line numbers. The #383 and #527 protections are
preserved. Validated by A/B: explore/node already return source + line numbers,
so Read=0 when used. Includes the A/B eval harness scripts. Full suite green
(1270).
…iases (#634) (#740)

TypeScript service/RPC contracts written as a tuple of generic types —
`type List = [Service<'query_apply_record', Req, Resp>, …]` — carry their
names only as string-literal type arguments, so static extraction never
indexed them and `codegraph query query_apply_record` returned nothing.

Add a narrow TS/TSX type-alias pass that emits each tuple entry's
string-literal name as a `method` node under the alias (qualifiedName
`List::query_apply_record`), making it searchable. Scope is limited to a
direct literal arg of a generic that is a direct tuple element, with a
valid-identifier filter — so utility types (Pick/Omit/Record), deeper
nested generics, and route paths produce no noise.

Bumps EXTRACTION_VERSION so existing indexes get a re-index hint.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#636) (#741)

Two environments that share one working tree — most concretely Windows
and WSL — can't safely share a single `.codegraph/`: the daemon lockfile
records a platform-specific pid + socket (named pipe vs Unix socket), and
SQLite locking across the WSL2/Windows filesystem boundary is unreliable,
so two daemons over one index risks corruption.

Add a `CODEGRAPH_DIR` env var (default `.codegraph`) that overrides the
per-project data directory name, so each environment keeps its own index
in the same tree (e.g. `CODEGRAPH_DIR=.codegraph-win` on Windows). The
name is resolved live and validated (rejects separators / `..` / absolute,
falling back to the default with a one-time stderr warning). Indexing and
file-watching now skip ANY `.codegraph-*` sibling so neither side trips
over the other's data.

Routes the previously-hardcoded `.codegraph` literals (db path, lockfile,
error log, watcher ignore, file-scan skip, installer) through the
resolver. No extraction-version bump — index content is unchanged.
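
A sketch of the name resolution and the sibling-skip rule described above. The function names `resolveCodegraphDir` and `isCodegraphDataDir`, the warning text, and the injectable `env` parameter are assumptions made for illustration; the default, the rejection rules, and the `.codegraph-*` skip come from the message.

```typescript
import * as path from "node:path";

const DEFAULT_DIR = ".codegraph";
let warned = false;

// Resolved on every call ("live"), so a changed environment is picked up.
function resolveCodegraphDir(
  env: Record<string, string | undefined> = process.env,
): string {
  const raw = env["CODEGRAPH_DIR"];
  if (raw === undefined || raw === "") return DEFAULT_DIR;

  // Reject separators, "..", and absolute paths: this is a directory NAME.
  const invalid =
    raw.includes("/") || raw.includes("\\") || raw.includes("..") || path.isAbsolute(raw);
  if (invalid) {
    if (!warned) {
      warned = true; // warn once, then stay quiet
      console.error(`Ignoring invalid CODEGRAPH_DIR "${raw}"; using ${DEFAULT_DIR}`);
    }
    return DEFAULT_DIR;
  }
  return raw;
}

// Indexing and watching skip the default dir and any `.codegraph-*` sibling.
function isCodegraphDataDir(name: string): boolean {
  return name === DEFAULT_DIR || name.startsWith(".codegraph-");
}
```

Under this sketch, `CODEGRAPH_DIR=.codegraph-win` on Windows and the default on WSL give each environment its own index, and each side skips the other's directory while scanning.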

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…645) (#742)

A C++ method call whose receiver is another call's result — `Foo::instance().bar()`,
`WidgetFactory::create().draw()`, `openSession()->run()`, or the same stored in an
`auto` local first — lost the receiver's type during extraction. The callee degraded
to a bare method name, so when two classes shared a method name the call silently
resolved to whichever was indexed first (or not at all), corrupting callers / impact /
trace with a plausible-but-wrong edge.

Three parts:
- Capture C++ return types (new nodes.return_type column, schema v5): the
  function_definition's `type` field, normalized — smart-pointer pointee unwrapped,
  void/primitives dropped.
- Preserve the inner-call receiver in extraction: a C/C++ field_expression whose
  receiver is itself a call is encoded `inner().method` instead of dropping to the
  bare name. Other languages keep the existing behavior.
- New resolution strategy (matchCppCallChain): infer the receiver's class from the
  inner call's return type, then resolve AND validate the method on it. Handles
  singletons/accessors, factories returning a different type, free-function
  factories, make_unique/make_shared/new/direct construction, single-level member
  chains, and namespace-qualified inner calls. A wrong inference yields no edge,
  never a wrong one.
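
The resolve-and-validate step can be illustrated with a small in-memory model. Everything here is a hypothetical simplification: the `Index` shape, `splitChain`, and `resolveChainedCall` are not the project's API (the real strategy is `matchCppCallChain` over the SQLite graph); the sketch only shows why a failed inference produces no edge.

```typescript
// Hypothetical in-memory index standing in for the graph database.
interface FunctionInfo {
  returnType?: string;
}
interface Index {
  functions: Map<string, FunctionInfo>; // qualified name -> info, e.g. "Foo::instance"
  methods: Map<string, Set<string>>; // class name -> declared method names
}

// Split an encoded callee "inner().method" into its two halves.
function splitChain(callee: string): { inner: string; method: string } | null {
  const i = callee.lastIndexOf("().");
  if (i <= 0) return null;
  return { inner: callee.slice(0, i), method: callee.slice(i + 3) };
}

// Infer the receiver class from the inner call's return type, then validate
// that the method exists on that class. Any failure yields null (no edge).
function resolveChainedCall(callee: string, index: Index): string | null {
  const chain = splitChain(callee);
  if (!chain) return null;
  const returnType = index.functions.get(chain.inner)?.returnType;
  if (!returnType) return null;
  return index.methods.get(returnType)?.has(chain.method)
    ? `${returnType}::${chain.method}`
    : null;
}
```

With two classes that both declare `bar`, `Foo::instance().bar` resolves to `Foo::bar` through the return type instead of to whichever class was indexed first.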

EXTRACTION_VERSION 2->3 (re-index to populate return types).

Validated on the issue repro + spdlog: node count stable (no explosion),
deterministic, and ~100 pre-existing wrong `.size()`-style edges removed.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#660) (#663)

PHP's importTypes only captured namespace_use_declaration, so
include/require(_once) — the dependency mechanism in procedural and
script-style PHP — never produced edges. callers, impact, and trace
missed the entire file-include graph; only namespace `use` became a
dependency edge.

Capture the four include/require expression types and emit file→file
imports edges, reusing the path-based resolution that C/C++ #include
already goes through. Only static string-literal paths are resolved
(relative to the including file); dynamic forms (include $var,
require __DIR__ . '/x', interpolated strings) are skipped.

Include PATHS are distinguished from namespace `use` symbols by shape: a
path contains '/' or '.', which PHP identifiers and FQNs never do. A
path-shaped include that doesn't resolve to a known project file is left
unresolved and does NOT fall back to the symbol name-matcher, which would
otherwise mis-connect "inc/db.php" to an unrelated db.php elsewhere — a
wrong edge is worse than a missing one.
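
A sketch of the shape test and the no-fallback rule, assuming POSIX-style project-relative paths. `isIncludePath` and `resolveInclude` are hypothetical names, and the `knownFiles` set stands in for the indexed file list.

```typescript
import * as path from "node:path";

// A path contains '/' or '.'; PHP identifiers and FQNs (App\Models\User) do not.
function isIncludePath(ref: string): boolean {
  return ref.includes("/") || ref.includes(".");
}

// Resolve a static include literal relative to the including file.
// Returns null when the target is not a known project file; the caller must
// NOT fall back to name matching in that case.
function resolveInclude(
  includingFile: string,
  literal: string,
  knownFiles: Set<string>,
): string | null {
  if (!isIncludePath(literal)) return null;
  const candidate = path.posix.normalize(
    path.posix.join(path.posix.dirname(includingFile), literal),
  );
  return knownFiles.has(candidate) ? candidate : null;
}
```

Returning null for an unresolved path is the point of the design: `inc/db.php` included from `app/index.php` either matches `app/inc/db.php` exactly or produces no edge.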

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Colby McHenry <me@colbymchenry.com>
…ore (#682) (#743)

A .gitignore transparently encrypted in place by corporate DLP / endpoint
software (UTF-16 header + ciphertext), or one containing a pattern the
`ignore` library can't compile to a regex (`\[` -> "Unterminated character
class"), crashed the entire sync/index. The throw is LAZY — it surfaces at
match time (`ig.ignores()`), not `.add()` — so the existing add-time
try/catch never caught it, and the error never named the offending file.

Read .gitignore defensively: skip a file that isn't valid UTF-8 text whole
(NUL byte or fatal UTF-8 decode), drop only the individual uncompilable
patterns from a text one (probe-compile, then per-line fallback), and warn
with the file path. Indexing continues either way. The watcher inherits the
fix via buildDefaultIgnore.
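
The defensive read can be sketched in two steps. Both function names are hypothetical, and the `compiles` callback is an injected stand-in for a probe against the `ignore` library, whose API is deliberately not reproduced here.

```typescript
// Step 1: decode the file, or return null when it is not plain UTF-8 text
// (a NUL byte, or a byte sequence that fails strict UTF-8 decoding).
function decodeTextOrNull(bytes: Uint8Array): string | null {
  if (bytes.includes(0)) return null;
  try {
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return null;
  }
}

// Step 2: keep only the patterns the matcher can compile. `compiles` is an
// assumed probe supplied by the caller; it must not throw.
function keepCompilablePatterns(
  text: string,
  compiles: (pattern: string) => boolean,
): { kept: string[]; dropped: string[] } {
  const kept: string[] = [];
  const dropped: string[] = [];
  for (const line of text.split(/\r?\n/)) {
    const pattern = line.trim();
    if (pattern === "" || pattern.startsWith("#")) continue; // blank or comment
    (compiles(pattern) ? kept : dropped).push(pattern);
  }
  return { kept, dropped };
}
```

A caller would warn with the file path whenever step 1 returns null or step 2 reports dropped patterns, then continue indexing with whatever was kept.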

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e file (#693) (#744)

A function called only from an anonymous func_literal at package level — a
cobra `RunE: func(){…}` handler, a goroutine literal, a callback closure
stored in a `var` — had its call leak to the FILE node, because the Go
var-initializer walk ran with an empty scope. So `callers`/`impact` showed
the function with a file (or no meaningful) caller, unlike JS/TS where an
arrow-in-const becomes a named node whose calls attribute correctly.

Scope the Go top-level var/const initializer walk to the declared symbol, so
a call nested in any func_literal initializer (struct field, slice/map,
nested closure) attributes to the enclosing var. EXTRACTION_VERSION 3->4
(re-index to pick up the corrected attribution).

Validated on cli/cli (858 Go files): node/edge counts identical, file-level
dependents byte-identical (no regression), and 62 top-level-closure calls
correctly moved from file-attributed to var-attributed.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…720) (#745)

A multi-word PascalCase query token — typically a project name a user
includes (`SuperBizAgent backend routes`) — splits into sub-tokens
(superbizagent / super / biz / agent) that ALL match the same path segment,
so path relevance summed +5 four times for one concept. In a mixed-stack
repo that ~doubled every score of the lexically-matching stack's file,
burying the stack the query was about.

Score path relevance per original query WORD instead: a word matches a path
level if any of its sub-tokens do, and counts once — while still splitting
the word (via extractSearchTerms on the original case) so it matches across
naming conventions (`getUserName` → `get_user_name`). Distinct words each
still contribute.
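
A sketch of per-word scoring under simplifying assumptions: `subTokens` is a hypothetical stand-in for `extractSearchTerms`, matching is plain substring containment on path segments, and the `+5` bonus is the only constant taken from the message.

```typescript
// Hypothetical splitter: the whole word plus its camelCase / PascalCase /
// snake_case parts, all lowercased and de-duplicated.
function subTokens(word: string): string[] {
  const parts = word
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .split(/[\s_-]+/)
    .filter((p) => p.length > 0)
    .map((p) => p.toLowerCase());
  return Array.from(new Set([word.toLowerCase(), ...parts]));
}

const PATH_MATCH_BONUS = 5;

// Each query WORD contributes at most once, however many sub-tokens hit.
function pathRelevance(query: string, filePath: string): number {
  const segments = filePath.toLowerCase().split("/");
  let score = 0;
  for (const word of query.split(/\s+/).filter((w) => w.length > 0)) {
    const tokens = subTokens(word);
    const hit = segments.some((seg) => tokens.some((t) => seg.includes(t)));
    if (hit) score += PATH_MATCH_BONUS;
  }
  return score;
}
```

In this model `SuperBizAgent` adds 5 to a path under `superbizagent-web/` rather than 20, while `backend` and `routes` can still add their own 5 each on a path that contains them.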

Partial fix: this removes the dominant path over-counting (backend rises
from absent-in-top-6 to parity on the reporter's repro). The residual lexical
edge from the project name in the FTS class-name match + dir match is a deeper
down-weighting change, tracked separately. No re-index needed (query-time).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#748)

The per-word path fix (#745) brought the backend to parity but not above:
the project name still gave the lexically-matching stack a residual dir
match + an FTS class-name match, so a backend query that included the
project name still ranked the frontend at/above the backend.

Derive the project name from go.mod module / package.json name / repo dir,
and treat a query word matching it as non-discriminative: drop it from path
relevance and from codegraph_explore's PascalCase type-disambiguation bias
(reporter's suggestions #1/#2) — unless it's the only query word, so a bare
project-name search still scores.

Narrow by construction: the down-weighting fires ONLY when a query word
matches the derived project name (≥5 chars), so every query that doesn't
name the project is byte-identical. On the reporter's repro the backend
controllers now top a backend question that includes the project name;
queries without it, bare project-name queries, and normal symbol queries
are unchanged. Query-time only (no re-index).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…)` (#608) (#749)

A method called through a PHP fluent static factory — `ApiClient::for($c)->createOrder()`,
the canonical Laravel per-credential/per-tenant client idiom — produced no
`calls` edge: the receiver of `->createOrder` is the `Cls::for(...)` static
call, whose result type was never recovered, so the edge was dropped and
`codegraph_callers` returned nothing.

Same shape as the C++ singleton/factory fix (#645), reusing its return_type
column + the chained-call mechanism:
- Capture PHP return types (getReturnType): `: self` / `: static` / `$this`
  stored as the `self` marker, a concrete `: Type` as its short name,
  primitives/unions dropped.
- Encode the chained scoped-call receiver as `Cls::for().method` so the
  resolver can split it (PHP-gated, in extractCall).
- New matchPhpCallChain: look up the factory's return type (`self` → the
  factory's own class; concrete → that class), then resolve AND validate the
  method on it — a wrong inference yields no edge, never a wrong one.

EXTRACTION_VERSION 4->5 (re-index to populate PHP return types + chained edges).

Validated on koel (1383 PHP files): node count identical (no explosion),
0 edges lost, +80 chained-call edges recovered; synthetic tests cover the
self-factory, concrete-return, namespace, decoy, and absent-method cases.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…() (#750) (#751)

A Java method called through a static factory or fluent chain — `Foo.getInstance().bar()`,
`Config.create(opts).build()` — lost the receiver's type, so the chained method either
didn't resolve at all or (when a same-named method existed on an unrelated class) attached
to whichever class was indexed first. Ports the #645 (C++) / #608 (PHP) 3-part mechanism:

- Part 1: capture Java return types in the extractor (skip void/primitives/arrays,
  unwrap generics, strip package qualifier).
- Part 2: encode a chained-call receiver as `inner().method` with normalized empty
  parens, so factory calls that take arguments still split.
- Part 3: matchJavaCallChain resolves the chained method on the factory's return type,
  validated via resolveMethodOnType so a wrong inference yields NO edge (never a wrong one).

Validated: synthetic decoy + absent-method safety tests; real-repo A/B on google/guava
(3,227 files) — node count identical (no explosion), 0 edges lost, +1,507 unique chained
edges recovered, precision spot-checked verbatim (Splitter.on().split(),
CacheBuilder.newBuilder().recordStats(), GraphBuilder.directed().build(), nested
MultimapBuilder.linkedHashKeys().arrayListValues()). EXTRACTION_VERSION 5 -> 6.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…).bar() (#750) (#752)

A Kotlin method called through a companion-object factory, fluent chain, or
constructor — `Foo.getInstance().bar()`, `Config.create(opts).build()`,
`STMTransaction(f).commit()` — dropped the receiver to a BARE method name, which
then name-matched a same-named method on an unrelated class (a wrong edge) or
failed to resolve. Ports the #645/#608 mechanism to Kotlin:

- Part 1: capture Kotlin return types in the extractor. tree-sitter-kotlin
  exposes no field names, so the return type is read positionally (the type node
  after function_value_parameters); inferred/Unit/Nothing returns yield none.
- Part 2: encode a CLASS/companion-factory call-receiver chain as `inner().method`.
  Gated to a capitalized receiver (`Foo.getInstance()` / `Foo(args)`) so instance
  chains (`list.filter{}.map{}`) keep their bare-name behavior — re-encoding those
  would only drop the edge, regressing recall in fluent codebases.
- Part 3: generalize matchJavaCallChain -> matchDottedCallChain (shared by the JVM
  dot-notation languages); resolve the method on the factory's return type, or on
  the constructed class for a Kotlin `Foo(args).method()` receiver. Validated via
  resolveMethodOnType, so a wrong inference yields NO edge.

Validated: synthetic decoy + args + absent-method safety tests; full suite green;
real-repo A/B on arrow-kt/arrow (734 .kt) — node count identical (no explosion),
+49 validated-correct chained edges, and the removed edges are wrong bare-name
guesses the fix correctly stops emitting (419/438 from test/doc files; the 18
from product code are stdlib `.apply{}`, self-loops, and bare-name mismatches) —
a net precision improvement, ~0 correct product edges lost. Java path unchanged
(constructor branch is Kotlin-gated). EXTRACTION_VERSION 6 -> 7.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…750) (#753)

A C# method called through a static factory or fluent chain —
`Foo.Create().Bar()`, `JObject.Parse(s).Property(...)`,
`Instant.FromUtc(...).InZone(zone)` — lost the receiver's type, so the chained
method didn't resolve and the call was invisible to callers/impact/trace. Ports
the #645/#608 mechanism to C# (additive, like Java #751):

- Part 1: capture C# return types in the extractor, reading the `returns` field
  (`static Foo Create()` -> `Foo`); predefined/array/generic/nullable/namespaced
  types are normalized or skipped.
- Part 2: encode a chained `member_access_expression` receiver
  (`Foo.Create(args).Bar()`) as `inner().Bar` with normalized empty parens, so
  factory calls that take arguments still split. Non-chained member calls keep
  their existing `recv.Method` text.
- Part 3: resolve via the shared matchDottedCallChain (now Java/Kotlin/C#),
  validated by resolveMethodOnType so a wrong inference yields NO edge.

Known limitation (safe): C# extension-method chains don't resolve, since the
method lives on the extension class, not the receiver's type — no edge, never a
wrong one.

Validated: synthetic decoy + args + absent-method safety tests; full suite green;
real-repo A/B on Newtonsoft.Json (945 .cs: +3, 0 lost) and nodatime (488 .cs:
+73, 0 lost) — node count identical (no explosion), 0 edges lost, precision
spot-checked verbatim (Instant.FromUtc().InZone(), Offset.FromHoursAndMinutes().Plus(),
OffsetDateTimePattern.CreateWithInvariantCulture().WithTwoDigitYearMax()).
EXTRACTION_VERSION 7 -> 8.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…754)

* feat(resolution): conformance-aware chained-method resolution (#750)

A chained static-factory/fluent call whose method lives on a SUPERTYPE the
receiver conforms to — a protocol-extension method (Swift), an interface default
method, or an inherited superclass method — now resolves. resolveMethodOnType
falls back to walking the return type's implements/extends edges (via the new
context.getSupertypes) when the method isn't a direct member. Because those edges
don't exist during the single-pass resolution, a second pass
(resolveChainedCallsViaConformance) re-resolves the deferred chained refs after
edges are built. Still validated, so a wrong inference yields no edge.
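
The supertype fallback amounts to a breadth-first walk over implements/extends edges. The `TypeGraph` shape below is a hypothetical in-memory model, and this `resolveMethodOnType` is a sketch that borrows the name from the message, not the project's implementation.

```typescript
// Hypothetical model of the type graph after edges are built.
interface TypeGraph {
  methods: Map<string, Set<string>>; // type -> directly declared method names
  supertypes: Map<string, string[]>; // type -> implements/extends targets
}

// Look for the method on the type itself, then on each supertype in
// breadth-first order. Returns null when no type in the chain declares it.
function resolveMethodOnType(
  type: string,
  method: string,
  graph: TypeGraph,
): string | null {
  const seen = new Set<string>();
  const queue: string[] = [type];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    if (seen.has(current)) continue; // guards against cyclic edges
    seen.add(current);
    if (graph.methods.get(current)?.has(method)) return `${current}::${method}`;
    queue.push(...(graph.supertypes.get(current) ?? []));
  }
  return null;
}
```

Because the walk needs the `supertypes` edges to exist, it can only run once edge construction has finished, which is why the message defers these refs to a second pass.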

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(changelog): conformance-aware chained-method resolution (#750)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nsion naming (#750) (#755)

Completes Swift in the #750 chained-call series (after Java #751, Kotlin #752,
C# #753, conformance #754). Two parts:

1. Swift chained-call resolution (the #645/#608 mechanism): capture Swift return
   types (positional, member types -> last segment), encode capitalized-receiver
   chains `Foo.make().draw()` / `Foo(args).draw()`, resolve+validate via the
   shared matchDottedCallChain (+ constructor branch). Fixes the decoy wrong-edge
   bug where a chained method dropped to a bare name and attached to a same-named
   method on an unrelated class.

2. Nested-type extension naming fix: `extension KF.Builder: KFOptionSetter` parsed
   as a class_declaration named `KF.Builder` (dot) — inconsistent with the type's
   own declaration `KF::Builder` (name `Builder`) — so the extension's conformances
   and members were invisible to a chained call on the type. A Swift resolveName
   now names a nested-type extension by its last segment (`Builder`), so its
   `implements`/`extends` edges and methods are found by the supertype walk
   (conformance #754) and the simple-name method match.

Validated: synthetic decoy + args + constructor + absent-method tests; full suite
green; nested-extension repro (`KF.url().onSuccess()` resolves via conformance to
the protocol method). Real-repo A/B vs main (conformance) — Alamofire and
Kingfisher both **0 added / 0 removed, node count unchanged**: NEUTRAL and SAFE.
The prior -168 Kingfisher regression (from the naming inconsistency) is eliminated;
Swift's unique-named fluent methods already resolved by bare name, so the chain
path lands the same edges — the value here is decoy-collision correctness, the
nested-extension naming fix, and consistency with the other four languages.
EXTRACTION_VERSION 9 -> 10.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#750) (#757)

A Rust call through a chained associated function — `Foo::new().bar()`,
`Foo::with(cfg).build()` — dropped the receiver to a bare method name, which
then attached to a same-named method on an unrelated type (a wrong edge) or
didn't resolve. Ports the #645/#608 mechanism for Rust's `::` receivers:

- Part 1: capture Rust return types; `-> Self` yields the `self` marker (resolved
  to the impl's own type, like PHP), references/generics are unwrapped/reduced.
- Part 2: encode an associated-function chain (`Foo::new().bar`), gated to a
  scoped_identifier receiver so instance chains (`x.foo().bar()`) keep bare-name.
- Part 3: resolve via matchScopedCallChain (PHP's `::` resolver, generalized),
  validated by resolveMethodOnType. Wire Rust into the conformance second pass
  (matchScopedCallChain variant) so a chained method provided by a trait the type
  implements (`impl Trait for Type` → existing implements edges) resolves too.

Validated: synthetic decoy + args + Self + trait-default-conformance + absent
safety tests; full suite green (lone failure is the known-flaky #662 daemon test,
passes in isolation). Real-repo A/B vs main: clap (329 .rs) a net precision win —
**+937 added (96% correct builder methods), 622 wrong->right retargets**
(`Command::new().arg()` was mis-resolving to `ArgGroup::arg`, now `Command::arg`),
+162 net unique edges; the pure-drops are largely wrong bare-name edges the fix
correctly stops emitting. tokio-rs/bytes 0/0 (no regression). Known limit: the
single-hop mechanism re-encodes only the first hop of a chain (deeper hops keep
bare-name) — clap's unusually deep builder chains are partly covered.
EXTRACTION_VERSION 10 -> 11.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#760)

* fix(go): resolve chained factory-function calls New().Method() (#750)

A Go call through a chained factory function — `New().Method()`,
`With(cfg).Build()` — dropped the receiver to a bare method name, which then
attached to a same-named method on an unrelated type (a wrong edge) or didn't
resolve. Ports the #645/#608 mechanism for Go's bare-factory receivers:

- Part 1: capture Go return types; a pointer `*Foo` -> `Foo`, a multi-return
  `(*Foo, error)` -> its first result, qualified `pkg.Foo` -> `Foo`.
- Part 2: encode a bare-factory chain (`New().Method`), gated to an `identifier`
  receiver so instance chains (`obj.Method().Other()`) keep bare-name.
- Part 3: matchDottedCallChain bare-inner Go branch looks up the FUNCTION's
  return type, then resolves+validates the method on it. Wired into the
  conformance pass so a method promoted from an embedded struct (`type Widget
  struct{ Base }` -> the existing `extends` edge) resolves. FALLBACK: when the
  inner isn't a resolvable function (a package-level VARIABLE holding a function
  value, e.g. gin's `engine()`), fall back to bare-name so the edge isn't dropped.

Validated: synthetic decoy + args + multi-return + embedded-conformance + absent
safety tests (4/4); full suite green. Real-repo A/B on gin (99 .go): pre-fallback
-40 = 25 wrong self-loops removed (good) + 15 correct `Engine::ServeHTTP` dropped
(gin's ginS variable-factory `engine()`); the fallback recovers the 15. gin A/B
re-confirm with the fallback is PENDING (local index flakiness, not a code issue).
EXTRACTION_VERSION 11 -> 12.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(go): stop the chained-call fallback from looping the batched resolver

The Go variable-inner fallback (for chains like `engine().ServeHTTP()` whose
inner is a package-level var, not a factory function) resolved the method via
a synthetic bare-name ref and propagated THAT ref as `.original`. Its
`referenceName` was the bare `ServeHTTP`, not the stored `engine().ServeHTTP`,
so `resolveAndPersistBatched`'s keyed `deleteSpecificResolvedReferences` no-oped,
the offset-0 batch never drained, and the loop re-resolved + re-inserted the
same rows forever — a runaway that grew a 99-file repo (gin) to 5,050,206 edges
/ 1.4 GB before filling the disk.

- name-matcher.ts: tie the bare-name match back to the original `ref` so the
  batch-cleanup delete matches the stored row and the loop drains.
- index.ts: add a non-progress guard to resolveAndPersistBatched — if the
  unresolved_refs table doesn't shrink after a batch, stop instead of growing
  the graph without bound (defense-in-depth for any future keyed-delete mismatch).
- resolution.test.ts: regression test for the variable-inner chain — asserts the
  fallback edge resolves AND the edge count stays bounded (no explosion).
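
The non-progress guard can be sketched against a hypothetical store interface. `RefStore`, its method names, and `resolveAllBatched` are assumptions for illustration; the real function is `resolveAndPersistBatched` in index.ts.

```typescript
// Hypothetical store; method names are illustrative only.
interface RefStore {
  countUnresolved(): number;
  takeBatch(size: number): string[]; // always reads from offset 0
  resolveAndDelete(refs: string[]): void; // deletes the rows it resolved
}

// Drain the unresolved refs in batches, stopping as soon as a batch fails to
// shrink the table, so a delete that silently no-ops cannot loop forever.
function resolveAllBatched(
  store: RefStore,
  batchSize: number,
): { batches: number; stalled: boolean } {
  let batches = 0;
  let remaining = store.countUnresolved();
  while (remaining > 0) {
    const batch = store.takeBatch(batchSize);
    if (batch.length === 0) break;
    store.resolveAndDelete(batch);
    batches++;
    const after = store.countUnresolved();
    if (after >= remaining) return { batches, stalled: true }; // no progress
    remaining = after;
  }
  return { batches, stalled: false };
}
```

The guard does not fix a keyed-delete mismatch; it only bounds the damage to one wasted batch instead of unbounded graph growth.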

gin A/B (post-fix): db 5.8 MB / 3,699 calls edges; net-zero unique-edge diff vs
main (the fallback recovers the dropped edges, adds no wrong ones). Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ar() (#750) (#761)

Ports the #645 (C++) / #608 (PHP) chained-receiver mechanism to Scala. A call
whose receiver is itself a call — `Foo.create().bar()` (companion factory),
`Builder(cfg).bar()` (case-class apply), or a fluent chain — used to drop the
receiver to a bare `bar`, which name-matched a same-named method on an unrelated
type. The most common wrong edge was a stdlib `Option`/`Iterator` `.map`/`.flatMap`/
`.foreach` mis-attributed onto the project's own same-named class.

- scala.ts: `getReturnType` reads the `return_type` field — generic `List[Foo]`
  → container `List`, qualified `pkg.Foo` → `Foo`, `this.type` left undefined.
- tree-sitter.ts: re-encode `Foo.create().bar` when the inner call's receiver chain
  starts with a capital (companion factory / case-class apply); instance chains
  (`list.map().filter()`) stay bare.
- name-matcher.ts: `scala` joins the dotted-chain gate + CONSTRUCTS_VIA_BARE_CALL
  (case-class `apply` constructs the class); resolveMethodOnType validates, so a
  non-conventional `apply` returning another type yields no edge, not a wrong one.
- index.ts: `scala` joins CHAIN_LANGUAGES so trait-inherited methods resolve via
  the conformance second pass.
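
The three return-type cases listed for scala.ts can be sketched as a small string normalizer. This is an illustration under assumptions: the real `getReturnType` reads a tree-sitter field rather than raw text, and `normalizeScalaReturnType` is a hypothetical name.

```typescript
// Normalizes a Scala return-type annotation to the bare type name used for
// chain resolution. Returns undefined when the type is not recoverable.
function normalizeScalaReturnType(annotation: string): string | undefined {
  const text = annotation.trim();
  if (text === "" || text === "this.type") return undefined;
  // Generic: keep the container, drop the type arguments (List[Foo] -> List).
  const bracket = text.indexOf("[");
  const head = bracket === -1 ? text : text.slice(0, bracket);
  // Qualified: keep the last segment (pkg.Foo -> Foo).
  const last = (head.split(".").pop() ?? "").trim();
  return last === "" ? undefined : last;
}

const generic = normalizeScalaReturnType("List[Foo]");
const qualified = normalizeScalaReturnType("pkg.Foo");
const selfType = normalizeScalaReturnType("this.type");
```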

Validation: 4 synthetic tests (factory+decoy, case-class apply, trait conformance,
absent-method safety). Real-repo A/B on gatling (750 Scala files): +14 / -59 unique
edges — all corrections. The +14 are retargets (e.g. `HttpProtocolBuilder(cfg).baseUrl`
now resolves to HttpProtocolBuilder::baseUrl, not the same-named private BaseUrlSupport
helper); the -59 are wrong edges removed (stdlib Option/Iterator monad calls
mis-tied to the project's Validation::*, self-loops, decoy collisions) — zero genuine
factory chains dropped (verified: gatling has no real Validation.success().map() chains).
db stable at 40 MB. EXTRACTION_VERSION 12→13. Full suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ate().bar() (#750) (#762)

Ports the #645/#608 chained-receiver mechanism to Dart, plus makes Dart factory
and named constructors first-class so their chains can resolve at all. A call
whose receiver is itself a call — `Foo.create().bar()` (static factory or
factory/named constructor) — used to drop the receiver to a bare `bar`, which
name-matched a same-named method on an unrelated type (commonly a stdlib
`Option`/`Iterator` `.map`/`.where` mis-tied to the project's own class).

- dart.ts: extractBareCall now re-encodes `Foo.create().bar` when the chain
  starts with a capitalized type; getReturnType captures the return type (generic
  `List<Foo>` → `List`); factory (`factory Foo.create()`) and named (`Foo._()`)
  constructors are indexed as `Foo::create` / `Foo::_` with return type = the
  class (via resolveName + getReturnType + constructor_signature in methodTypes).
- The UNNAMED ctor `Foo()` is deliberately NOT extracted (isMisparsedFunction),
  so plain construction stays an `instantiates` edge to the class rather than a
  call to a phantom `Foo::Foo` method.
- dartCtorInfo validates a "constructor" against the enclosing class name, so a
  method tree-sitter MISPARSES as a constructor — `@override (A, B) m()`, where
  the annotation swallows the record return type and `m()` looks like a one-id
  constructor_signature — is still extracted as the method it is (regression
  found on localsend; covered by a new test).
- name-matcher.ts / index.ts: `dart` joins the dotted-chain gate,
  CONSTRUCTS_VIA_BARE_CALL (case construction), and CHAIN_LANGUAGES (conformance
  for superclass/mixin methods). resolveMethodOnType validates, so a wrong
  inference yields no edge.

Validation: 7 synthetic tests (static factory, factory/named ctor, construction,
conformance, absent-method safety, the misparse regression, instantiation-not-
hijacked). Real-repo A/B on localsend (368 Dart files): hand-written +17/-10 — all
corrections (the -10 = 7 wrong stdlib/extension misattributions removed + 3 ctor
source-renames), plus additive factory/named-ctor call resolution. Instantiation
preserved; no node explosion. EXTRACTION_VERSION 13->14. Full suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) (#786)

Ports the #645/#608 chained-receiver mechanism to Objective-C. A message send
whose receiver is itself a message send — `[[Foo create] doIt]` — used to drop
the receiver, so `doIt` name-matched a same-named method on an unrelated class
(commonly a test helper's `init` or an Apple-SDK method).

- objc.ts: getReturnType reads the method's `method_type`, SKIPPING nullability /
  ARC qualifiers (`nonnull instancetype` must yield instancetype, not `nonnull`).
- tree-sitter.ts: the message_expression branch now re-encodes a chained send
  `[[Foo create] doIt]` as `Foo.create().doIt` when the inner receiver is a
  capitalized class and the outer selector is unary.
- name-matcher.ts: `objc` joins the dotted-chain gate + CHAIN_LANGUAGES. A
  class-message factory returns an instance of the RECEIVER class by convention
  (`instancetype`), so when the factory's own return type isn't recoverable
  (`alloc`/`new`/`shared…` return instancetype, or aren't user nodes), the
  receiver's type is the class itself — this resolves the ubiquitous
  `[[X alloc] init]` and singleton chains. resolveMethodOnType validates against
  the class and its supertypes, so a wrong inference yields no edge.
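
A rough sketch of the qualifier skipping described for objc.ts. It works on the text inside the method's parentheses; the qualifier list is illustrative rather than exhaustive, and `objcReturnType` is a hypothetical name, not the extractor's actual function.

```typescript
// Qualifiers that may precede the actual type in an Objective-C method_type.
// Illustrative list only.
const OBJC_QUALIFIERS = new Set([
  "nonnull", "nullable", "null_unspecified",
  "_Nonnull", "_Nullable", "_Null_unspecified",
  "__kindof", "__strong", "__weak", "__unsafe_unretained", "__autoreleasing",
  "const",
]);

// methodType is the text between the parentheses,
// e.g. "nonnull instancetype" or "Foo *".
function objcReturnType(methodType: string): string | undefined {
  const tokens = methodType
    .replace(/\*/g, " ")          // pointer stars are not part of the name
    .split(/\s+/)
    .filter((t) => t.length > 0);
  // The first token that is not a qualifier is taken as the type name.
  return tokens.find((t) => !OBJC_QUALIFIERS.has(t));
}

const singleton = objcReturnType("nonnull instancetype");
const pointer = objcReturnType("nullable Foo *");
```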

Validation: 4 synthetic tests (factory+decoy, superclass conformance, absent-method
safety, the nonnull-instancetype singleton). Real-repo A/B on SDWebImage (208 files):
+35 / -75 — all corrections (the -75 are wrong `init` mis-matches to a test helper /
wrong class, retargeted to the right class's init in the +35, plus 2 Apple-SDK chains
on unindexed classes). db stable, no node explosion. EXTRACTION_VERSION 14->15.
Full suite green.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nism (#750) (#787)

A checked-in design doc for the #645/#608/#750 chained-call mechanism — the
permanent, discoverable record the work previously lacked (it lived only in git
history, the tracking issue, and an untracked scratch handoff). Covers the 3-part
mechanism, the three shared resolvers + receiver styles, the per-language coverage
matrix (12 shipped with A/B results), the conformance pass, and the full 21-language
README classification (incl. why TypeScript + Luau were skipped and Pascal is blocked).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
colbymchenry and others added 21 commits June 22, 2026 08:13
…ory` (#936) (#941)

When the indexed root is a directory an enclosing git repo ignores,
`git ls-files --directory` collapses the whole cwd to a single literal
`./` entry. That sentinel reached the `ignore` matcher, which rejects it
("path should be a `path.relative()`d string, but got "./""), aborting
buildScopeIgnore — the one ignore-building call in FileWatcher.start().
So the MCP daemon's startWatching() threw, was caught as "Failed to open
project", and auto-sync never started: the index silently went stale
until a manual `codegraph sync` (CODEGRAPH_NO_DAEMON=1 was the only
workaround).

Filter the `./`/`.` self-entry wherever we consume `--directory` output
(listIgnoredDirs + the untracked-dir loop in discoverEmbeddedRepoRoots).
Semantically correct, not just a crash guard: `./` means "the whole cwd",
never a nested repo to recurse into.
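
The filter amounts to something like the sketch below. `dropSelfEntry` is a hypothetical helper name; the real change lives in listIgnoredDirs and discoverEmbeddedRepoRoots.

```typescript
// Drops the "./" or "." self-entry that `git ls-files --directory` can emit
// when the whole working directory is ignored by an enclosing repo.
// Empty strings (from a trailing newline) are dropped as well.
function dropSelfEntry(entries: string[]): string[] {
  return entries.filter(
    (entry) => entry !== "./" && entry !== "." && entry !== "",
  );
}

// Whole cwd ignored: git reports a single "./" line, which must not reach
// the ignore matcher.
const collapsed = dropSelfEntry("./\n".split("\n"));

// Ordinary output is passed through unchanged.
const normal = dropSelfEntry("build/\nvendor/\n".split("\n"));
```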

Not platform-specific (reported on Codex/Windows, reproduced on macOS):
the trigger is git state, not the OS.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… duplicates (#945) (#947)

A worktree of a submodule points its `.git` into
`.git/modules/<module>/worktrees/<name>`, but `classifyGitDir` only matched
the top-level `.git/worktrees/` shape — so submodule worktrees fell through
to "embedded" and every symbol they shared with the real submodule checkout
got indexed twice (one report: ~28% of the index was duplicates, inflating
both query results and the DB). Broaden the worktree detector to allow the
optional `modules/<module>` segment. The submodule's own checkout
(`.git/modules/<module>`, no `worktrees/`) is unaffected and stays indexed as
distinct code.
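
The broadened detector could look roughly like this. It is a sketch: `isWorktreeGitDir` and the exact pattern are assumptions, not the regex in `classifyGitDir`.

```typescript
// True when a resolved gitdir path points at a worktree, either of the
// top-level repo (.git/worktrees/<name>) or of a submodule
// (.git/modules/<module>/worktrees/<name>).
function isWorktreeGitDir(gitdir: string): boolean {
  const normalized = gitdir.replace(/\\/g, "/"); // tolerate Windows separators
  return /\/\.git\/(?:modules\/.+\/)?worktrees\/[^/]+\/?$/.test(normalized);
}

const topLevel = isWorktreeGitDir("/repo/.git/worktrees/feature");
const submoduleWorktree = isWorktreeGitDir("/repo/.git/modules/libfoo/worktrees/feature");
const submoduleCheckout = isWorktreeGitDir("/repo/.git/modules/libfoo");
```

The submodule's own checkout has no `worktrees/` segment, so it does not match and stays indexed.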

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… class misparse (#946) (#948)

A C++ class/struct annotated with an export/visibility macro —
`class MYLIB_EXPORT Foo : public Bar { … }` — makes tree-sitter read
`class MYLIB_EXPORT` as an elaborated type specifier and the whole declaration
as a `function_definition` named after the class, spanning the entire body. That
phantom `function` polluted callers/impact/blast-radius and skewed kind stats.

Detect the misparse structurally in cppExtractor.isMisparsedFunction — a
function_definition whose `type` field is a *bodyless* class/struct specifier
(the elaborated-type macro) and whose declarator is not a function_declarator —
and drop the bogus node, matching how macro-prefixed C prototypes are already
handled. The body is mangled by the same misparse and is unrecoverable. Precise
enough to leave genuine code alone: `struct P { int x; } makeP() {}` (real
inline-defined return type, has a field list) and `class Foo f() {}` (elaborated
return type on a real function, has a function_declarator) are untouched. The
leading macro alone triggers the misparse; a base clause is not required.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… 20, and expand language/framework coverage

- Benchmark table reordered to lead with tool calls, time, and file reads (the universal wins); cost and tokens moved right with a note that savings are scale-dependent, not a headline claim
- README/introduction/quickstart/installation messaging updated to "surgical context · fewer tool calls · faster answers" framing, dropping the "16% cheaper" headline
- Node engine floor raised from 18 to 20 in CLAUDE.md, package.json description updated
- `codegraph init` now creates and indexes in one step; the `-i` flag is retired (still accepted as a no-op)
- CLI reference expanded with new commands: `explore`, `node`, `unlock`, `daemon`, `telemetry`, `upgrade`, `version`, `help`
- MCP server docs clarified: single `codegraph_explore` tool exposed by default, others unlisted but re-enableable via `CODEGRAPH_MCP_TOOLS`
- Language support adds Objective-C, Astro, and R; framework routes adds Play, Vue Router/Nuxt, and Astro
- API reference documents lower-level exports and embedding requirements (Node 22.5+ for `node:sqlite`)
- Troubleshooting adds WSL/Windows dual-checkout guidance
- How-it-works updated: SQLite backend is now Node's built-in `node:sqlite` in WAL mode, not better-sqlite3/WASM
Added an image and a note on cost savings for CodeGraph.
…path (#766) (#949)

Change detection's git fast path (collectGitStatus) consumed `git status`
output with only an isSourceFile filter, on the assumption that git already
omits ignored paths. It doesn't: gitignore is a no-op for *tracked* files, and
the built-in default excludes (vendor/, node_modules/) aren't gitignore at all.
So a tracked file inside a committed dependency dir, or under a .gitignored
dir, surfaced as a change the full index never tracks — `codegraph status`
reported phantom pending changes that `sync` (a filtered filesystem reconcile)
never cleared, and the public getChangedFiles() API returned the same wrong
list.

Apply buildDefaultIgnore(repoDir) per recursion level, matching repo-relative
paths — structurally equivalent to the full-index path's ScopeIgnore (each
embedded repo judged by its own rules) with no extra git subprocess calls.
Deletions stay unfiltered: getChangedFiles acts on one only when the path is
already tracked in the DB, where removal is always correct, and that lets a
newly-excluded dir's stale rows clean themselves up.

Unblocks #699 (an .ignore overlay inherits this leak unless change detection
consults the same matcher as enumeration).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…concile (#905) (#950)

On a very large repo (the report is a ~93k-file / 5.7GB-DB Java monorepo) the
first MCP `tools/call` after a fresh `serve --mcp` could hang for 10+ minutes
with zero output, and with the liveness watchdog on, the daemon was SIGKILLed
mid-query instead. Root cause: the post-open catch-up reconcile that the first
tool call is gated on does ~2*N synchronous `fs.existsSync`/`fs.statSync` calls
plus a load-all-files query in two non-yielding loops. On a huge repo that wedges
the event loop for minutes, which (a) trips the 60s watchdog (it SIGKILLs a
process whose loop stops turning) and (b) blocks the first call the whole time.

Two complementary fixes:

- Make the reconcile yield. `ExtractionOrchestrator.sync()` now uses the
  yielding `scanDirectoryAsync`, and both O(files) reconcile loops
  `await setImmediate` every SYNC_RECONCILE_YIELD_INTERVAL (1000) files. The loop
  can no longer wedge the main thread, so the watchdog stays fed and the socket /
  any concurrent read stays responsive while a big reconcile runs. Results are
  unchanged — only yield points are added.

- Time-box the catch-up gate. The first `tools/call` now waits on the reconcile
  for at most CODEGRAPH_CATCHUP_GATE_TIMEOUT_MS (default 3000ms), then serves and
  lets the reconcile finish in the background (which now yields, so the served
  call runs concurrently). `=0` restores the old unbounded wait. On a normal repo
  the reconcile finishes well under the budget, so behavior is unchanged.
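
Both fixes can be sketched in a few lines. This is an illustration under assumptions: `reconcile`, `gate`, and `yieldToEventLoop` are hypothetical names, and `setTimeout(…, 0)` is used as a portable stand-in for the `setImmediate` the commit describes.

```typescript
const YIELD_INTERVAL = 1000; // stands in for SYNC_RECONCILE_YIELD_INTERVAL

// Gives the event loop a turn so timers and socket I/O can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Visits every file, yielding every `interval` files. Returns the yield count.
async function reconcile(
  files: string[],
  visit: (file: string) => void,
  interval: number = YIELD_INTERVAL,
): Promise<number> {
  let seen = 0;
  let yields = 0;
  for (const file of files) {
    visit(file);
    seen++;
    if (seen % interval === 0) {
      await yieldToEventLoop();
      yields++;
    }
  }
  return yields;
}

// Waits for `work` at most `budgetMs`; a budget of 0 waits without bound.
async function gate(
  work: Promise<unknown>,
  budgetMs: number,
): Promise<"done" | "timed-out"> {
  const finished = work.then(() => "done" as const);
  if (budgetMs === 0) return finished;
  const timeout = new Promise<"timed-out">((resolve) =>
    setTimeout(() => resolve("timed-out"), budgetMs),
  );
  return Promise.race([finished, timeout]);
}
```

When the gate times out, the caller serves the request and leaves `work` running; because the reconcile yields, the two interleave instead of one blocking the other.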

Tests: adds two time-box cases to mcp-catchup-gate (serves promptly when the
reconcile runs long; `=0` restores the unbounded wait). Full suite green
(1655 passed). Validated end-to-end through the real daemon: first call returns
at the ~3s time-box instead of waiting an injected 8s reconcile; no-delay control
unchanged; `=0` opt-out waits the full reconcile.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…951)

MCP tool results used Markdown ATX headings (##/###/####) for section
headers — the status summary, each search hit, every file section in an
exploration — which Markdown-rendering clients (e.g. the Claude Code
VSCode extension) blow up to H1–H4 font size, filling the transcript with
oversized lines (worst on search/explore, where the noise scales with
result count). Swap them all for bold labels, which render at body size
while keeping the same structure. CLI/TTY output (ContextBuilder) is
unchanged — the issue notes it's fine.

The format is parse-coupled, so kept in sync:
- The explore truncation boundary and the offload chunker
  (reasoning/reasoner.ts) both key off the per-file header, now a unique
  `**`-prefixed marker emitted via a shared fileSectionHeader() helper.
- Updated the offload strip regexes and switched the opt-in report-style
  prompt off ATX headings (same client, same rendering issue).
- Updated test helpers (sectionFor, sourcedFiles, the callers
  section-boundary scan) that scanned the old markers.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…) (#953)

Lombok generates getters/setters, builder(), equals/hashCode/toString, and
the @Slf4j log field at compile time, so they never appear in the source AST.
Static extraction missed them entirely, so a bean.getName() / User.builder() /
log.info() call resolved to nothing and call-chain analysis broke silently —
the agent would conclude the method didn't exist.

Add a synthesizeMembers hook on LanguageExtractor, called at the end of class
extraction (class still on the scope stack, real members already extracted), and
a Java implementation that synthesizes the mechanical members for @Getter,
@Setter, @Data, @Value, @Builder/@SuperBuilder, @ToString, @EqualsAndHashCode,
and the @log* family. Each node is anchored on the field/class name-token leaf
(so it pulls in no spurious value-reference scope), marked with a `lombok`
decorator and a docstring naming the generating annotation, and never overrides
a member the source already declares. Methods and fields are deduped separately
since they're distinct namespaces in Java (a boolean field `isRunning` and its
generated getter `isRunning()` coexist).
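
As a rough illustration of the getter half of this, the sketch below applies Lombok's getter naming (`get` prefix, `is` for primitive `boolean`, no extra prefix when a boolean field already starts with `is` plus a capital) and skips any method the source already declares. `JavaField`, `getterName`, and `synthesizeGetters` are hypothetical names, not the hook's actual API.

```typescript
interface JavaField {
  name: string;
  type: string; // source text of the declared type, e.g. "String" or "boolean"
}

// Lombok-style getter name for a field.
function getterName(field: JavaField): string {
  const cap = field.name.charAt(0).toUpperCase() + field.name.slice(1);
  if (field.type !== "boolean") return "get" + cap;
  // A primitive boolean named isX keeps that name for its getter.
  return /^is[A-Z]/.test(field.name) ? field.name : "is" + cap;
}

// Returns the getters to synthesize. Methods are tracked in their own set,
// separate from field names, so a field `isRunning` does not block the
// getter `isRunning()`.
function synthesizeGetters(
  fields: JavaField[],
  declaredMethods: Set<string>,
): string[] {
  const methods = new Set(declaredMethods);
  const synthesized: string[] = [];
  for (const field of fields) {
    const name = getterName(field);
    if (methods.has(name)) continue; // never override a declared member
    methods.add(name);
    synthesized.push(name);
  }
  return synthesized;
}

const getters = synthesizeGetters(
  [
    { name: "name", type: "String" },
    { name: "isRunning", type: "boolean" },
    { name: "active", type: "boolean" },
  ],
  new Set(["getName"]), // the class already declares getName()
);
```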

Deliberately not synthesized: constructors (new X() already links via
instantiates, and overloaded @NoArgs/@AllArgs/@RequiredArgs ctors would collide
on a synthetic node id), fluent builder setters, and @Accessors(fluent=true).

Validated on eladmin (274 Java files, Lombok-heavy): 100% accessor precision
(878/878 map to a real field), 722 previously-broken calls now resolve;
spring-petclinic (no Lombok) control synthesizes nothing.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
C/C++ polymorphism is the function pointer: a struct fn-pointer field, concrete
functions registered into it through a table (`{"add", cmd_add}`), a designated
initializer (`.handler = on_open`), or an assignment, then dispatched indirectly
(`p->fn(argv)`). Static extraction captures neither the registration→field
binding nor the indirect call, so the dispatcher→handler edge was missing — git's
run_builtin looked like it called nothing, a vtable's implementations had no
callers, and the hook_demo.c in the issue was unreachable.

Add a resolution-layer synthesizer keyed by (struct type, fn-pointer field). It
reads source (the established Celery/Sidekiq/Spring pattern — C extraction has no
struct fields or indirect-call edges to build on) in passes: collect fn-pointer
typedefs, parse struct field layouts, collect registrations (positional matched
by field index, designated, and assignment), propagate field←field assignments
(so a generic hook slot reassigned from a registry — the hook_demo.c
`h->func = found->fn` shape — inherits the registry field's handlers), then link
each indirect dispatch site to the registered handlers. Receiver type resolves
from the enclosing function's params/locals, falling back to a field name unique
to one struct. Covers both the command-table idiom (git, redis) and the
ops-struct/vtable idiom (curl content-encoders, protocol handlers).

Pure edge synthesis (no node growth); high precision via the (struct, field) key.
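
The last three passes (collect registrations, propagate field←field assignments, link dispatch sites) can be sketched over already-parsed inputs. This is a simplified model, not the synthesizer itself: all type and function names are hypothetical, and the source-reading passes (typedefs, struct layouts, receiver typing) are assumed to have run already.

```typescript
interface Registration { struct: string; field: string; handler: string; }
interface FieldAssign { toStruct: string; toField: string; fromStruct: string; fromField: string; }
interface DispatchSite { caller: string; struct: string; field: string; }
interface Edge { from: string; to: string; }

// Slots are keyed by (struct type, fn-pointer field).
const slotKey = (struct: string, field: string): string => `${struct}.${field}`;

function synthesizeFnPointerEdges(
  registrations: Registration[],
  assigns: FieldAssign[],
  sites: DispatchSite[],
): Edge[] {
  const handlers = new Map<string, Set<string>>();
  const slot = (key: string): Set<string> => {
    let set = handlers.get(key);
    if (!set) {
      set = new Set<string>();
      handlers.set(key, set);
    }
    return set;
  };

  // Collect the handlers registered into each slot.
  for (const r of registrations) slot(slotKey(r.struct, r.field)).add(r.handler);

  // Propagate field<-field assignments until nothing changes.
  let changed = true;
  while (changed) {
    changed = false;
    for (const a of assigns) {
      const target = slot(slotKey(a.toStruct, a.toField));
      for (const h of slot(slotKey(a.fromStruct, a.fromField))) {
        if (!target.has(h)) {
          target.add(h);
          changed = true;
        }
      }
    }
  }

  // Link each indirect dispatch site to every handler in its slot.
  const edges: Edge[] = [];
  for (const s of sites) {
    for (const h of slot(slotKey(s.struct, s.field))) {
      edges.push({ from: s.caller, to: h });
    }
  }
  return edges;
}

// Command table plus a hook slot reassigned from a registry
// ("hook" / "registry" are made-up struct names).
const fnPointerEdges = synthesizeFnPointerEdges(
  [
    { struct: "cmd_struct", field: "fn", handler: "cmd_add" },
    { struct: "cmd_struct", field: "fn", handler: "cmd_commit" },
    { struct: "registry", field: "fn", handler: "on_open" },
  ],
  [{ toStruct: "hook", toField: "func", fromStruct: "registry", fromField: "fn" }],
  [
    { caller: "run_builtin", struct: "cmd_struct", field: "fn" },
    { caller: "run_hook", struct: "hook", field: "func" },
  ],
);
```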

Validated: git 502 edges (run_builtin→cmd_* plus git_hash_algo/archiver/reftable
vtables), redis 357 (dictType.hashFunction, connection + reply-object vtables),
curl 478 (Curl_cwtype.do_init → deflate/gzip/brotli/zstd); 0 non-function targets
on all three; node-stable; 0 on the lua control (its {name,fn} tables register
into the Lua VM, with no C indirect call to bridge). Full suite 1665 pass.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…son (#906) (#955)

The extension → language table was hardcoded, so a codebase using a
non-standard extension for a supported language (e.g. `.dota_lua` for Lua)
had those files silently skipped — no way to opt them in short of patching
the source.

Add an opt-in, project-scoped `codegraph.json` at the repo root:

    { "extensions": { ".dota_lua": "lua", ".tpl": "php" } }

Mappings merge on top of the built-in defaults and take precedence (so a
built-in can be re-pointed, e.g. `.h` → `cpp`). Absent or malformed config
is the zero-config default — byte-identical to prior behavior; an invalid
target language or unparseable file is warned-and-skipped, never fatal.
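
A minimal sketch of the merge-and-validate behavior described above. The built-in table and supported-language set are tiny illustrative stand-ins, and `mergeExtensionOverrides` is a hypothetical name; the real loader is `loadExtensionOverrides`, whose signature differs.

```typescript
// Illustrative stand-ins for the built-in table and the supported languages.
const BUILTIN_EXTENSIONS: Record<string, string> = { ".lua": "lua", ".php": "php", ".h": "c" };
const SUPPORTED_LANGUAGES = new Set(["lua", "php", "c", "cpp"]);

// Merges codegraph.json overrides on top of the built-ins. Any problem is
// reported through `warn` and skipped; nothing here throws.
function mergeExtensionOverrides(
  configText: string | undefined,
  warn: (message: string) => void,
): Record<string, string> {
  const merged: Record<string, string> = { ...BUILTIN_EXTENSIONS };
  if (configText === undefined) return merged; // no config file: defaults only

  let parsed: unknown;
  try {
    parsed = JSON.parse(configText);
  } catch {
    warn("codegraph.json is not valid JSON; using defaults");
    return merged;
  }

  const extensions = (parsed as { extensions?: unknown } | null)?.extensions;
  if (typeof extensions !== "object" || extensions === null || Array.isArray(extensions)) {
    return merged;
  }
  for (const [ext, lang] of Object.entries(extensions as Record<string, unknown>)) {
    if (typeof lang !== "string" || !SUPPORTED_LANGUAGES.has(lang)) {
      warn(`skipping ${ext}: unsupported target language`);
      continue;
    }
    merged[ext] = lang; // overrides take precedence over built-ins
  }
  return merged;
}

const overrideWarnings: string[] = [];
const mergedExtensions = mergeExtensionOverrides(
  '{ "extensions": { ".dota_lua": "lua", ".h": "cpp", ".x": "klingon" } }',
  (message) => { overrideWarnings.push(message); },
);
```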

Implementation:
- New `src/project-config.ts` — `loadExtensionOverrides(rootDir)`, validated
  against `isLanguageSupported`, mtime-cached per root.
- `detectLanguage` / `isSourceFile` gain an optional `overrides` arg
  (omitting it is the existing behavior).
- Overrides threaded per-operation through every extraction call site
  (scan/walk gates, git change-detection, grammar selection, extraction,
  the file watcher), resolved from the project root — no process-global
  state, so the multi-project daemon stays isolated. The parse worker
  receives the resolved language in its message.

Tests: 13 new cases (unit, loader validation/normalization/caching, and a
full-index integration proving a custom-extension file is extracted while
the zero-config path indexes nothing). Worker path smoke-tested via the
built CLI.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…int outside the repo (#935) (#956)

The directory walk deliberately follows an in-root symlink whose target
lives outside the repo root (the standard Dota custom-game layout, where
`game/` and `content/` link into the SDK tree) and enumerates the files
under it. But the read path then rejected every one of them via the
strict symlink-escape guard, logging `Path traversal blocked in batch
reader` and indexing nothing — discovery and the reader disagreed.

Add an opt-in `allowSymlinkEscape` to validatePathWithinRoot that waives
only the realpath-escape rejection (the lexical `../` guard still
applies) and pass it at the three indexing read sites (batch reader,
indexFile, indexFileWithContent). The content-serving sinks
(ContextBuilder, MCP tools) keep the strict guard, so this stays inside
the #527 model: indexing now follows the symlink, getCode still refuses
to serve out-of-root contents.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hods (#747) (#957)

GoFrame's standard router binds routes reflectively (group.Bind(ctrl)): the path
and method live in a g.Meta struct tag on a request type, and the controller
method that serves it is matched by that request type at runtime — so there was
no path string and no edge from a route to its handler, and "where is this route
handled / where are routes bound to controllers?" could only be answered
lexically (issue #720's report).

- frameworks/goframe.ts: detect gogf/gf in go.mod, extract each path-bearing
  g.Meta into a route node (requires path:, so response mime:-only tags are
  skipped), encoding the package-qualified request type for the join.
- goframe-synthesizer.ts: join each route -> the controller method whose
  signature takes that request type — NOT by name (DeptSearchReq is served by
  List) — keyed pkg.Type to disambiguate the many identical bare names a large
  app defines one-per-module, with an addon-root tiebreak for cloned demo addons.
  Edge kind calls, provenance heuristic, synthesizedBy goframe-route, surfaced as
  a dynamic-dispatch hop in codegraph_explore.
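
The join by request type, rather than by name, can be sketched as below. The shapes and the name `joinRoutes` are hypothetical, and the addon-root tiebreak is left out: this sketch simply declines to link when a request type is ambiguous.

```typescript
interface Route { method: string; path: string; requestType: string; }   // requestType is "pkg.Type"
interface ControllerMethod { name: string; requestType: string; }        // type of the request parameter

// Maps "METHOD path" to the controller method that takes the route's
// package-qualified request type.
function joinRoutes(routes: Route[], methods: ControllerMethod[]): Map<string, string> {
  const byRequestType = new Map<string, ControllerMethod[]>();
  for (const m of methods) {
    const list = byRequestType.get(m.requestType) ?? [];
    list.push(m);
    byRequestType.set(m.requestType, list);
  }
  const handlers = new Map<string, string>();
  for (const r of routes) {
    const matches = byRequestType.get(r.requestType) ?? [];
    const [only] = matches;
    // Link only when exactly one method takes this request type.
    if (matches.length === 1 && only) handlers.set(`${r.method} ${r.path}`, only.name);
  }
  return handlers;
}

// Two modules define the same bare name DeptSearchReq; the pkg prefix
// keeps them apart, and the handler name (List) differs from the type name.
const routeHandlers = joinRoutes(
  [
    { method: "GET", path: "/dept/list", requestType: "system.DeptSearchReq" },
    { method: "GET", path: "/demo/dept", requestType: "demo.DeptSearchReq" },
  ],
  [
    { name: "List", requestType: "system.DeptSearchReq" },
    { name: "Search", requestType: "demo.DeptSearchReq" },
  ],
);
```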

Validated on real repos: gf-demo-user 7/7, gfast 65/68 (3 genuinely
handler-less), hotgo 242/247 (98%) — 100% precision (0 non-controller handlers,
0 core/addon cross-binding), node count stable. Agent A/B (gfast, sonnet/high,
2 runs/arm): with codegraph 1 explore call / 0 Read / ~20s vs without 7.5 Read
avg + grep-hunting for the non-existent literal route string / ~42s; same correct
answer.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
) (#965)

* perf(resolution): resolve imports to definitions, not sibling import nodes (#915)

"Resolving refs" crawled (tens of minutes) on large projects — most painfully
ones mixing a big front-end and back-end. An external package or module imported
across hundreds/thousands of files (react, a shared UI package, Python
logging/typing) is re-declared as an `import` node in every importing file, so
its unresolved import ref fell through to the exact-name matcher, which scored
all K same-named import nodes via findBestMatch — K refs x K candidates = O(K^2)
per package, producing only meaningless import->import edges.

Fix: exclude `import`-kind nodes as name-match targets (they're statements, not
definitions; real import->definition resolution is the import resolver's job).
Plus two safe constant-factor wins in findBestMatch: hoist the per-candidate
ref.filePath split, and skip cross-language candidates when a same-language one
exists (provably the same winner — same-language scores >=50, cross-language
maxes at 35).
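
In sketch form, the candidate narrowing looks like this. `Candidate` and `narrowCandidates` are hypothetical names, and the scoring itself (findBestMatch) is not reproduced.

```typescript
interface Candidate { name: string; kind: string; language: string; }

// Narrows name-match candidates before they are scored.
function narrowCandidates(candidates: Candidate[], refLanguage: string): Candidate[] {
  // Import nodes are statements, not definitions: never a name-match target.
  const definitions = candidates.filter((c) => c.kind !== "import");
  // A same-language candidate always outscores a cross-language one, so when
  // any exist the cross-language candidates need not be scored at all.
  const sameLanguage = definitions.filter((c) => c.language === refLanguage);
  return sameLanguage.length > 0 ? sameLanguage : definitions;
}

const narrowed = narrowCandidates(
  [
    { name: "logging", kind: "import", language: "python" },
    { name: "logging", kind: "import", language: "python" },
    { name: "logging", kind: "module", language: "python" },
    { name: "logging", kind: "function", language: "typescript" },
  ],
  "python",
);
```

Dropping the K import nodes up front is what removes the K x K scoring work per shared package.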

Measured: superset (Py+TS) candidates scored 7.5M -> 833K (9x), non-import edges
preserved (+1618 now resolve to real defs), ~22K useless import->import edges
removed; kubernetes (Go) computePathProximity 37.2s -> 5.0s; synthetic 8k-file
mixed repo (K=4000) resolution 16.0s -> 1.7s. Full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: correct stale better-sqlite3/wasm references to node:sqlite

The SQLite backend has been Node's built-in node:sqlite (real SQLite, WAL + FTS5,
from the bundled runtime) for a while — there is no native build step and no
node-sqlite3-wasm fallback. README and the docs site were already updated; this
catches the stragglers:

- CLAUDE.md: the src/db/ backend description and the sqlite-backend test note.
- src/db/index.ts, src/mcp/tools.ts: two code comments that still blamed "the
  wasm backend" for non-WAL behavior (reworded to "when WAL isn't in effect").

Leaves tree-sitter grammar wasm (web-tree-sitter / --liftoff-only) untouched —
that's a different, still-current use of wasm.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(telemetry): drop the dead sqlite_backend field (schema v2)

node:sqlite is now the only backend, so the `index` event's `sqlite_backend`
field was a constant ("native") carrying no signal — and the `install` event
never actually sent it. Remove the field and the backendKind() helper, bump the
telemetry SCHEMA_VERSION 1 -> 2, and update TELEMETRY.md + docs/design/telemetry.md.

The ingest worker is deliberately left tolerant: `index` doesn't require the
field and schema_version validates as nonNegInt(99), so v2 events ingest fine and
old clients still sending v1 + sqlite_backend keep validating too. Added a legacy
comment there explaining it's safe to drop once old-client share is negligible.

telemetry.test.ts: the assertion pinning schema_version and a stale-claim fixture
line updated 1 -> 2. All telemetry tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… monorepo-aware (#964) (#966)

The MCP server gated tool availability on whether the server root had a
.codegraph/ index, so in a monorepo where only sub-projects are indexed the
agent saw zero tools — and couldn't reach an indexed sub-project even by
projectPath. A session started before `codegraph init` also never surfaced the
tools afterward. The Claude front-load hook had the mirror gap: it only walked
UP for an index, so it stayed silent at a monorepo root.

MCP server:
- Always expose the tool surface; when the root isn't indexed, send a
  per-project instructions variant (pass projectPath) instead of the
  "inactive" note. Safety comes from response SHAPE (success-shaped guidance,
  never isError), not from hiding tools.
- Reword the no-default-project guidance to be per-project, not per-session,
  and sharpen the projectPath schema description.

Front-load hook (UserPromptSubmit):
- Scan DOWN (bounded depth, workspace-root-gated) for indexed sub-projects and
  shape the injection by topology: front-load the one the prompt names, nudge
  about the rest, or list them when ambiguous.

Verified: full suite (1703 passed); a live two-package monorepo run confirms the
hook front-loads the correct sub-project with no cross-package leakage. The
front-load's net speed effect is the existing multi-file-vs-single-file
tradeoff, unchanged by this work.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
[skip ci] Auto-generated by Release workflow.
…overy (#970, #976) (#980)

#514 (v1.0.0) began walking into gitignored directories to discover and
index the git repos nested inside them. That broke users who rely on
.gitignore to exclude a directory: a gitignored folder of cloned
reference repos blew graphs up (one report went 10k to 500k edges, #976)
and stalled indexing on multi-gigabyte trees of clones (#970).

Respect .gitignore by default again. Discovering embedded repos inside a
gitignored directory is now opt-in via codegraph.json:

    { "includeIgnored": ["packages/", "services/"] }

The single choke point findIgnoredEmbeddedRepos now returns nothing
unless a gitignored dir matches the project's includeIgnored patterns,
and the matcher is threaded from the scan root through the full-index,
incremental-sync, and watcher-scope paths. Downstream ScopeIgnore and the
watcher are unchanged: they key off the discovered embedded roots, so
gating discovery fixes the indexer, sync, and watcher together. Untracked
embedded repos (#193) stay indexed by default.

This restores the super-repo-of-clones behavior (#622, #699) for the
people who want it, while making the default match what every other tool
(and CodeGraph's own git ls-files foundation) does: .gitignore excludes.

project-config.ts now parses codegraph.json once (loadParsedConfig) and
exposes loadIncludeIgnoredPatterns alongside the existing extension map.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR #980 merged after v1.1.0 was already promoted and published, so the
squash-merge's 3-way merge auto-placed its CHANGELOG entry under the
released [1.1.0] section. Move it to [Unreleased] so the published 1.1.0
notes stay accurate and the next release (1.1.1) promotes the fix.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#974) (#983)

The client-facing MCP proxy could exit with "Transport closed" when its
connection to the shared daemon hit a socket 'error' with no listener
attached — common on WSL2 /mnt (DrvFs), where AF_UNIX is flaky. The global
fatal handler turned that uncaughtException into process.exit(1), which the
MCP client saw as a bare transport close even though the index was healthy.

proxy.ts now keeps an 'error' listener on the daemon socket for its whole
life (and skips a socket destroyed in the connect window), so a stray error
degrades to the existing in-process fallback instead of crashing. daemon.ts
releases the lockfile it acquired when it fails to bind, so the next launch
doesn't spin on a stale lock (the duplicate serve --mcp pileup).
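
The underlying Node behavior is easy to reproduce: emitting 'error' on an EventEmitter with no 'error' listener throws. The sketch below uses a bare `EventEmitter` in place of the daemon socket (a `net.Socket` is an EventEmitter); `emitErrorSafely` and `state` are illustrative names, not code from proxy.ts.

```typescript
import { EventEmitter } from "node:events";

// Emits 'error' and reports whether the emit itself threw.
function emitErrorSafely(socket: EventEmitter, err: Error): "handled" | "threw" {
  try {
    socket.emit("error", err);
    return "handled";
  } catch {
    return "threw";
  }
}

const state = { fellBack: false };

const unguarded = new EventEmitter();
const guarded = new EventEmitter();
// Listener kept for the socket's whole life: a stray error becomes a fallback.
guarded.on("error", () => {
  state.fellBack = true;
});

const withoutListener = emitErrorSafely(unguarded, new Error("ECONNRESET"));
const withListener = emitErrorSafely(guarded, new Error("ECONNRESET"));
```

On a real socket the emit comes from the I/O layer rather than from a `try` block, so the throw surfaces as an uncaughtException, which is what the global fatal handler turned into an exit.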

No default behavior change for anyone; WSL /mnt users who still hit trouble
can set CODEGRAPH_NO_DAEMON=1 to skip the shared daemon entirely. Validated
on macOS (unit + live serve probe) and Linux (Docker, --init): 64/64 across
the daemon/socket/lifecycle suites, incl. real AF_UNIX.

Closes #974

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
github-actions Bot and others added 7 commits June 24, 2026 22:18
[skip ci] Auto-generated by Release workflow.
…rent-call timeouts (#1002)

The shared daemon served every session on one event loop with synchronous
node:sqlite. codegraph_explore is CPU-bound work stitched together by microtask
awaits, so N concurrent explores keep the microtask queue continuously full and
starve the macrotask phases — timers AND socket I/O. The transport freezes: no
response can flush until the whole batch drains, so with ~10 subagents on a large
repo clients routinely time out (reported via X by @symbolic2020).

Move the heavy read-tool dispatch onto a worker-thread pool. Each worker holds
its own WAL read connection (verified: a worker reader sees the main writer's
committed catch-up/watcher writes); the single watcher/writer, the catch-up gate,
codegraph_status, and the staleness/worktree notices stay on the main thread.
Concurrent reads now run in true parallel up to core count and the main loop
stays free for the MCP transport, so responses flush incrementally instead of
all-at-once after the batch drains. Enabled for the shared daemon only; direct
(single-stdio-client) mode is unchanged.

- crash recovery: respawn + retry-once, with a circuit breaker that falls back
  to in-process dispatch if workers can't run on this platform
- graceful backstop: an overloaded pool returns success-shaped "busy, retry"
  guidance, never isError (so it can't teach the agent to abandon codegraph)
- pending-aware growth + capped concurrent cold-starts avoid a startup
  thundering herd (N simultaneous module-loads + DB opens could stall the loop)
- config: CODEGRAPH_QUERY_POOL_SIZE (default clamp(cores-1, 1, 16); 0 disables
  → in-process), CODEGRAPH_QUERY_BUSY_TIMEOUT_MS (default 45s)
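
The pool-size default can be written out as below. `resolvePoolSize` is a hypothetical helper; how the real code treats an invalid value, and whether it clamps an explicit one, is not stated here, so this sketch assumes invalid input falls back to the default and explicit values are used as given.

```typescript
// Resolves the worker-pool size from the raw CODEGRAPH_QUERY_POOL_SIZE value
// and the core count. A result of 0 means "pool disabled, dispatch in-process".
function resolvePoolSize(raw: string | undefined, cores: number): number {
  const clamp = (n: number, lo: number, hi: number): number =>
    Math.min(hi, Math.max(lo, n));
  const fallback = clamp(cores - 1, 1, 16);
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  // Assumption: anything that is not a non-negative integer uses the default.
  if (!Number.isInteger(parsed) || parsed < 0) return fallback;
  return parsed;
}

const onEightCores = resolvePoolSize(undefined, 8);
const onOneCore = resolvePoolSize(undefined, 1);
const onBigBox = resolvePoolSize(undefined, 64);
const disabled = resolvePoolSize("0", 8);
```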

10 concurrent explores on vscode (10.5k files): 31s → ~9s, staggered flush,
0 timeouts, byte-identical output; scales with cores (≈3.3× on 8, 1.8× on 2).
Full suite passes plus 10 new query-pool tests (fake-worker injection so the
scheduling logic is covered without spawning threads).

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…onditional-compilation & bare arrays (#991) (#1003)

* feat(c/c++): resolve macro-built function-pointer command tables (#991)

C/C++ commands dispatched through macro-built function-pointer tables were
dead-ends in the graph: redis' `call` never showed up as a caller of any
command (`c->cmd->proc(c)`), because the table is generated into a #included
`.def`, the handler is buried inside `MAKE_CMD(...)`, the struct type is itself
a macro alias, the `proc` field uses a function-TYPE typedef, and the receiver
is a chained field access. #954 deferred exactly this shape.

Six composable additions to c-fnptr-synthesizer.ts close it:
- function-type typedefs (`typedef RET T(...)` + `T *f`) flag the field as a
  function pointer;
- multi-declarator fields (`struct redisCommand *cmd, *last`) each count as a
  slot/type (needed for positional alignment and the chain walk);
- chained/array receivers (`c->cmd->proc`) resolve through field types across
  all same-named struct layouts (redis has two unrelated `client` structs);
- `#include "x"` directives are followed (from raw source) so a non-indexed
  `.def` is read as a registration unit with the includer's effective macro env;
- function-like + object-like macros are expanded (params->args, type aliases)
  before positional/designated registration;
- a macro that expands to a brace-wrapped element (sqlite `FUNCTION(...)`) has
  one outer brace layer peeled.
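
Two of the additions above, function-like macro expansion (params->args) and peeling one outer brace layer, can be sketched roughly like this. All names here (`FnMacro`, `splitTopLevel`, `expandCall`, `peelBraces`) are hypothetical, the parameter list given to `MAKE_CMD` in the usage note is invented for illustration, and the sketch ignores commas inside string literals; it is not the code in c-fnptr-synthesizer.ts.

```typescript
interface FnMacro {
  params: string[]; // parameter names from the #define
  body: string;     // replacement text
}

// Split on commas that are not nested inside (), {} or [].
function splitTopLevel(text: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (const ch of text) {
    if (ch === "(" || ch === "{" || ch === "[") depth++;
    else if (ch === ")" || ch === "}" || ch === "]") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

// Substitute each parameter identifier in the body with its argument.
function expandCall(macro: FnMacro, argText: string): string {
  const args = splitTopLevel(argText);
  return macro.body.replace(/[A-Za-z_]\w*/g, (id) => {
    const i = macro.params.indexOf(id);
    const arg = i >= 0 ? args[i] : undefined;
    return arg !== undefined ? arg : id;
  });
}

// Peel exactly one outer brace layer, assuming the text is a single
// brace-wrapped element.
function peelBraces(element: string): string {
  const t = element.trim();
  return t.startsWith("{") && t.endsWith("}") ? t.slice(1, -1).trim() : t;
}
```

With a macro defined as `{ params: ["name", "proc"], body: "{ name, proc }" }`, expanding `MAKE_CMD("get", getCommand)` gives `{ "get", getCommand }`; peeling and splitting that leaves `getCommand` in the slot that positionally aligns with the function-pointer field.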

Validated on two independent macro-table lineages at 100% target precision:
redis (209 commands via redisCommand.proc, `call`->every command) and sqlite
(69 FuncDef.xSFunc targets). No regression on the controls: git (cmd_struct.fn,
138 builtins), curl (Curl_cftype.*), lua (0). 0 non-function targets across all
five; +3 synthetic fixtures; full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(c/c++): resolve conditional-compilation command tables (vim) (#991)

Vim's `:ex` and normal-mode command tables are the hardest fn-pointer-table
shape: the struct is defined INLINE with the array, the whole thing is behind
`#ifdef DO_DECLARE_EXCMD`/`DO_DECLARE_NVCMD` (switched on by the includer), built
by a macro the file conditionally redefines (`EXCMD`/`NVCMD` = the table element
under the switch, a bare enum id otherwise), and dispatched by a parenthesized
array subscript through a file-scope table: `(cmdnames[i].cmd_func)(&ea)`.

Four more composable additions on top of the macro-table work:
- a focused `#ifdef`/`#ifndef`/`#if defined`/`#else`/`#elif`/`#endif` evaluator
  drops inactive arms (unevaluable `#if EXPR` keeps its body); an indexed header
  is re-scanned in an includer's context only when that includer #defines a
  switch the header guards, with the include's macros re-read from the resolved
  text (the plain last-wins parse picks the wrong arm, the enum one);
- inline `struct TAG {…} var[] = {…}` tables whose struct never became a node are
  parsed in place and registered;
- array-subscript receivers (`tbl[i].f`) strip the subscript and resolve the
  base through a global-var → struct-type map;
- an optional `)` before the call covers the parenthesized `(….f)(args)` form.
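
A conditional evaluator of the kind described in the first bullet might look like the sketch below. `stripInactiveArms` is a hypothetical name, only `defined(X)` / `!defined(X)` expressions are evaluated, and dropping the `#else` arm of an unevaluable `#if` is an assumption (the commit message only says the `#if` body is kept).

```typescript
interface Frame {
  parentActive: boolean; // was the enclosing region active?
  taken: boolean;        // has an earlier arm of this group been kept?
  active: boolean;       // is the current arm kept?
}

// Returns true/false for an evaluable condition, undefined otherwise.
function evalCond(kind: string, rest: string, defined: Set<string>): boolean | undefined {
  if (kind === "ifdef") return defined.has(rest.trim());
  if (kind === "ifndef") return !defined.has(rest.trim());
  const m = /^\s*(!?)\s*defined\s*\(?\s*([A-Za-z_]\w*)\s*\)?\s*$/.exec(rest);
  if (!m) return undefined;
  const has = defined.has(m[2] ?? "");
  return m[1] === "!" ? !has : has;
}

// Keep only the lines that sit in active preprocessor arms.
function stripInactiveArms(source: string, defined: Set<string>): string {
  const stack: Frame[] = [];
  const out: string[] = [];
  const isActive = (): boolean => {
    const top = stack[stack.length - 1];
    return top === undefined || top.active;
  };
  for (const line of source.split("\n")) {
    const d = /^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b(.*)$/.exec(line);
    if (!d) {
      if (isActive()) out.push(line);
      continue;
    }
    const kind = d[1] ?? "";
    const rest = d[2] ?? "";
    if (kind === "ifdef" || kind === "ifndef" || kind === "if") {
      const parentActive = isActive();
      // An unevaluable `#if EXPR` keeps its body.
      const active = parentActive && (evalCond(kind, rest, defined) ?? true);
      stack.push({ parentActive, taken: active, active });
    } else if (kind === "elif" || kind === "else") {
      const top = stack[stack.length - 1];
      if (top === undefined) continue; // stray directive, ignore
      const cond = kind === "else" ? true : evalCond("if", rest, defined) ?? true;
      top.active = top.parentActive && !top.taken && cond;
      top.taken = top.taken || top.active;
    } else {
      stack.pop(); // endif
    }
  }
  return out.join("\n");
}
```

Running it over a header guarded by `#ifdef DO_DECLARE_EXCMD` with that name in `defined` keeps the table arm; with an empty set it keeps the `#else` (enum) arm instead, which is why the includer's switch has to be known before the macros are read.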

Validated on vim: 273 `:ex` commands (`do_one_cmd`→every command) + 67
normal-mode commands, 0 non-function targets, 0 cross-table misroutes (registering
both tables is what stops `normal_cmd`'s `nv_cmds[i].cmd_func` from falling back
to the `cmdname` owner of the shared field name). Controls unchanged at 0
non-function (redis/sqlite/git/curl gain coverage from array/global dispatch, lua
still 0); +1 synthetic fixture; full suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(c/c++): resolve bare arrays of function pointers (#991)

The C/C++ fn-pointer synthesizer keyed everything on (struct type,
fn-pointer field), so a dispatch through a bare array of function
pointers (no struct, no field) was unbridged: an opcode/handler table
like `static op_t *opcodes[256] = {nop,…}` invoked as `opcodes[op](…)` left
every handler with zero callers. Closes the last #991 deferred item.

Bare arrays are keyed by the array VARIABLE name (a new `arrayReg`, parallel to the
struct `reg`). Registration detects an array whose element type is a
function typedef — a function-TYPE typedef element (`opcode_t *ops[]`,
the `*` making it an array of pointers) or a function-pointer typedef
element (`zend_rc_dtor_func_t t[]`) — and reads its literal entries,
whether positional (`fn`/`&fn`), designated by index (`[IDX]=fn`), or
cast-wrapped (`(cast)fn`). Dispatch is `tbl[i](…)` / `(*tbl[i])(…)`,
gated on `tbl` being a known fn-pointer array (the precision anchor);
the fan-out reaches the whole set (a runtime subscript hits any entry),
like a command table. The same-file table wins on a name collision, so
two file-local `static opcodes[256]` (SameBoy's CPU + disassembler)
never cross. The fn-pointer typedef/field regexes now also tolerate a
calling-convention macro before the `*` (`(ZEND_FASTCALL *name)`), which
hardens the existing struct-field path too.
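
Reading the literal entries of such an array (positional, designated by index, or cast-wrapped) could be sketched as below. `parseArrayEntries` and `splitTopLevel` are hypothetical names, the sample initializer is invented, and only simple casts without nested parentheses are handled; the real registration logic may differ.

```typescript
// Split on commas that are not nested inside (), {} or [].
function splitTopLevel(text: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (const ch of text) {
    if (ch === "(" || ch === "{" || ch === "[") depth++;
    else if (ch === ")" || ch === "}" || ch === "]") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

// Extract handler names from the text between the braces of an
// array-of-function-pointers initializer.
function parseArrayEntries(initBody: string): string[] {
  const names: string[] = [];
  for (const raw of splitTopLevel(initBody)) {
    let entry = raw.trim();
    entry = entry.replace(/^\[[^\]]*\]\s*=\s*/, ""); // designated: [IDX] = fn
    entry = entry.replace(/^\([^()]*\)\s*/, "");      // cast-wrapped: (cast)fn
    entry = entry.replace(/^&\s*/, "");               // address-of: &fn
    // Keep plain identifiers only; NULL and numeric fillers are not handlers.
    if (/^[A-Za-z_]\w*$/.test(entry) && entry !== "NULL") names.push(entry);
  }
  return names;
}
```

Because a runtime subscript can hit any entry, a call such as `opcodes[op](…)` would then fan out to every name returned for the `opcodes` table.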

Validated on two independent lineages: SameBoy (GB emulator) — 147 edges
via `opcodes[]`, 0 cross-file leak; php-src (Zend) — 54 edges across 7
tables in the designated+cast+CC-typedef form. Control: lua 0 — its
`lua_CFunction searchers[]` is pushed into the VM, never C-dispatched, so
the call-gate fires nothing. No regression on the #991 corpus: redis
(835) / sqlite (683) struct edges byte-identical, git +3 / curl +20
legitimate new bare-array edges, vim 433 with all guards holding; 0
non-function targets across all. +4 fixtures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…#1004)

The UserPromptSubmit hook's structural-prompt gate was English-only, so a
structural question written in Chinese — or any non-Latin script — silently
injected nothing: JS `\b` is ASCII-only and never matches between Han
characters, so the keyword regex couldn't fire (and couldn't be extended in
place). To the user the hook looked unwired, with no error to explain why.

Make the gate language-aware, split into tested helpers in directory.ts:
- hasStructuralKeyword: English (\b-guarded) + CJK structural keywords.
- extractCodeTokens: identifier-shaped tokens (camelCase / snake_case /
  name() / a.b) in any language — verified against the index via
  getNodesByName before firing, so a tech brand like `JavaScript` that looks
  like a symbol but isn't one here doesn't inject ~16KB of spurious context.
- isStructuralPrompt: the cheap candidate gate (keyword OR code-token).
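
The root cause and the shape of the fix can be illustrated with a short sketch. The keyword lists below are illustrative assumptions, not the lists shipped in directory.ts, and `hasStructuralKeywordSketch` is a hypothetical stand-in for the real `hasStructuralKeyword`.

```typescript
// JS `\b` is a boundary between an ASCII word character and a non-word
// character. Han characters are non-word characters, so no boundary exists
// between two of them and this pattern cannot match inside Chinese text.
const naiveCjk = /\b调用\b/;

// English keywords stay \b-guarded; CJK keywords are matched as plain
// substrings because there is no word boundary to anchor on.
const ENGLISH_KEYWORDS = /\b(calls?|callers?|depends on|implements?)\b/i;
const CJK_KEYWORDS = /(调用|依赖|实现|继承)/;

function hasStructuralKeywordSketch(prompt: string): boolean {
  return ENGLISH_KEYWORDS.test(prompt) || CJK_KEYWORDS.test(prompt);
}
```

Here `naiveCjk.test("谁调用了这个函数")` is false even though the keyword is present, while `hasStructuralKeywordSketch` accepts both that prompt and an English one such as "who calls parseFile?".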

Adds 21 unit tests for the gate (previously untested) covering the reporter's
verification table plus the false-positive guards.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ject (#993) (#1007)

When the server runs with no default project to fall back to — a gateway
server started outside any repo, or a monorepo root whose .codegraph/
indexes live only in sub-projects — every tool call must carry an explicit
projectPath. Previously projectPath was always optional, so an agent talking
to such a server would omit it, get success-shaped "pass projectPath"
guidance, and not reliably retry; the user had to nudge it by hand.

getTools() now marks projectPath required in the exposed tool schemas on the
no-default-project branch (a high-salience channel clients surface/validate,
unlike the instructions prose the reporter found too weak). When a default
project is open, projectPath stays optional and a bare call falls back to it.
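
The schema switch can be pictured with a small sketch. `ToolSchema` and `withProjectPath` are hypothetical names and the description string is invented; the actual `getTools()` implementation is not shown in this message and may be structured differently.

```typescript
interface ToolSchema {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required: string[];
}

// Return a copy of the schema in which projectPath is required only when
// the server has no default project to fall back to.
function withProjectPath(base: ToolSchema, hasDefaultProject: boolean): ToolSchema {
  const required = base.required.filter((key) => key !== "projectPath");
  if (!hasDefaultProject) required.push("projectPath");
  return {
    ...base,
    properties: {
      ...base.properties,
      // Hypothetical description text.
      projectPath: { type: "string", description: "Absolute path of the project to query" },
    },
    required,
  };
}
```

A client that validates calls against the schema then rejects a bare call up front on the no-default branch, instead of relying on the agent to act on retry guidance.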

The fix lives at the MCP schema layer, not the Claude-only front-load hook:
the hook is local-filesystem-based and never runs for the reporter (they're
on AGENTS.md / Codex-opencode). The proxy/getStaticTools path is untouched —
index.ts forces direct mode whenever resolveDaemonRoot is null, so the
no-default case never reaches the proxy.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…; add exclude config + index watchdogs (#999) (#1009)

Three fixes for a repo that commits a large JS/TS theme/SDK (Metronic under
static/, ~1,600 tracked files):

1. A SECOND "Resolving refs" quadratic that #915 didn't cover. #915 capped
   import-name collisions; this caps method-name collisions (init/update/render
   re-declared on every widget), which flow through matchMethodCall Strategy 3
   and findBestMatch instead. New AMBIGUOUS_NAME_CEILING (default 500, env
   CODEGRAPH_AMBIGUOUS_NAME_CEILING): above it the fuzzy strategies decline
   rather than score K candidates — no proximity score can pick the one true
   target among thousands anyway. Resolving drops from O(K^2) to linear in refs
   (e.g. 900-file synthetic: 28.7s -> 3.4s), edge counts unchanged, and the cap
   never fires on normal repos (max real method-collision ~40).

2. A new `exclude` array in codegraph.json keeps git-TRACKED paths out of the
   index, which .gitignore can't do (enumeration is `git ls-files`). Mirrors the
   existing includeIgnored plumbing across the git, sync, and non-git-walk
   paths.

3. `index`/`init` now install the #850 liveness + #277 ppid watchdogs (which
   were serve-only), so a wedged or orphaned indexer self-terminates instead of
   pinning a core. The --liftoff-only relaunch's spawnSync can't forward
   signals, so killing the parent shim used to orphan the worker.
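
The ceiling from item 1 can be sketched as a guard in front of the scoring loop. `resolveCeiling` and `pickFuzzy` are hypothetical names and the scoring callback is a placeholder; this is not the code in matchMethodCall or findBestMatch.

```typescript
const DEFAULT_AMBIGUOUS_NAME_CEILING = 500; // default from the commit message

// Read the ceiling from a raw env value (CODEGRAPH_AMBIGUOUS_NAME_CEILING).
// Assumption: missing or invalid input falls back to the default.
function resolveCeiling(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : DEFAULT_AMBIGUOUS_NAME_CEILING;
}

// Score candidates only when the name is not ubiquitous; above the ceiling
// the fuzzy strategy declines rather than scoring every candidate.
function pickFuzzy<T>(candidates: T[], score: (c: T) => number, ceiling: number): T | undefined {
  if (candidates.length > ceiling) return undefined; // decline
  let best: T | undefined;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    const s = score(candidate);
    if (s > bestScore) {
      best = candidate;
      bestScore = s;
    }
  }
  return best;
}
```

Each reference to a name with K declarations then costs a constant-time length check instead of K score computations, which is where the drop from O(K^2) to linear in refs comes from.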

Tests: ubiquitous-name ceiling, exclude (incl. tracked-file exclusion on git +
non-git), orphan self-termination (POSIX), and ppid-parser units. The ppid
parsers moved out of mcp/index.ts into mcp/ppid-watchdog.ts so they can be shared.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#1009 added `exclude` (keep git-tracked dirs out of the index) but didn't
document it. Add an "Excluding a tracked directory" section to the site config
page (parallel to includeIgnored) and a brief note + example to the README,
covering the committed-theme/SDK case .gitignore can't handle.
@nanofatdog closed this Jun 27, 2026

4 participants