[bugfix] PathExpr: iterate atomic, context-independent RHS per-item (closes #798) by joewiz · Pull Request #6418 · eXist-db/exist

joewiz · 2026-05-30T16:31:05Z

[This response was co-authored with Claude Code. -Joe]

Summary

XPath 3.1 §3.3.5 (Path Operator) requires that in E1/E2, the right-hand step E2 is evaluated once for each item produced by E1 — even when E2 doesn't reference the context. eXist's PathExpr.eval short-circuited that iteration for context-independent atomic-returning right-hand sides over persistent inputs, collapsing the multiplicity.

The bug has been open since 2015 (#798); @line-o confirmed it still reproduces on develop earlier today.

Closes #798.

Reproducer (from #798, restated by @line-o in the latest comment)

let $data := <a><b/><b/></a>
let $doc := xmldb:store('/db', 'test.xml', $data)
return [
    $data//b/3,
    doc('/db/test.xml')//b/3
]

Before this PR: [(3, 3), 3] — in-memory iterates per item (2 results), persistent collapses (1 result).
After this PR: [(3, 3), (3, 3)] — both forms iterate, multiplicity preserved.

Root cause

PathExpr.eval (~line 261) chooses between per-item iteration of E2 and a single evaluation. In-memory inputs always iterate (inMemProcessing flag). Persistent inputs only iterate when E2 declares a CONTEXT_ITEM or CONTEXT_POSITION dependency.

The single-eval shortcut is sound when E2 returns nodes: the post-step result.removeDuplicates() at line ~313 absorbs the missing multiplicity, since each iteration would produce the same node-set. It's not sound when E2 returns atomics — there's no de-duplication step, the literal value is the same every iteration, and the missing iterations are exactly what carries the multiplicity required by §3.3.5.
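
To make the soundness argument concrete, here is a minimal toy model in plain Java. It is a sketch, not eXist's `PathExpr`: `PathStepModel`, `perItem`, `singleEval`, and `dedupe` are hypothetical names, nodes are modelled as strings, and a `LinkedHashSet` stands in for `removeDuplicates()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Toy model of E1/E2 (hypothetical; not eXist's PathExpr).
// E2 is modelled as a function from a context sequence to a result sequence.
final class PathStepModel {

    // Spec behaviour: evaluate E2 once per item of E1 and concatenate the results.
    static <T, R> List<R> perItem(List<T> e1, Function<List<T>, List<R>> e2) {
        final List<R> out = new ArrayList<>();
        for (final T item : e1) {
            out.addAll(e2.apply(List.of(item)));
        }
        return out;
    }

    // The shortcut: evaluate E2 exactly once against the whole result of E1.
    static <T, R> List<R> singleEval(List<T> e1, Function<List<T>, List<R>> e2) {
        return new ArrayList<>(e2.apply(e1));
    }

    // Stand-in for the post-step removeDuplicates() applied to node results.
    static <R> List<R> dedupe(List<R> nodes) {
        return new ArrayList<>(new LinkedHashSet<>(nodes));
    }

    public static void main(String[] args) {
        final List<String> bs = List.of("b1", "b2");

        // Node-returning E2 (like //b/..): every b maps to the same parent "a".
        final Function<List<String>, List<String>> parent = ctx -> {
            final List<String> result = new ArrayList<>();
            for (int i = 0; i < ctx.size(); i++) {
                result.add("a");
            }
            return result;
        };
        // De-duplication hides the difference between the two strategies.
        System.out.println(dedupe(perItem(bs, parent)));    // [a]
        System.out.println(dedupe(singleEval(bs, parent))); // [a]

        // Atomic-returning, context-independent E2 (like //b/3).
        final Function<List<String>, List<Integer>> three = ctx -> List.of(3);
        // Atomics are not de-duplicated, so the shortcut drops an item.
        System.out.println(perItem(bs, three));    // [3, 3]
        System.out.println(singleEval(bs, three)); // [3]
    }
}
```

In this model the two strategies are indistinguishable for the node case only because of the de-duplication step, which is the property the atomic case lacks.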

Fix

Three guards are added to the iterate/single-eval condition. Each one was needed because a broader form broke real callers; the test suite drove the design. The new condition also requires `currentContext != null && currentContext.hasMany()`.

+ // XPath 3.1 §3.3.5: E2 in E1/E2 must evaluate per item of E1. The else-branch
+ // shortcut is only sound for node-returning E2 (where removeDuplicates() absorbs
+ // missing iterations). For atomic-returning, context-independent E2 we must
+ // force iteration to preserve multiplicity. See #798.
+ final boolean stepReturnsNonNode = !Type.subTypeOf(expr.returnsType(), Type.NODE);
+ final boolean stepIsContextIndependent =
+         !Dependency.dependsOn(exprDeps, Dependency.CONTEXT_ITEM)
+         && !Dependency.dependsOn(exprDeps, Dependency.CONTEXT_POSITION)
+         && !Dependency.dependsOn(exprDeps, Dependency.CONTEXT_SET);
+ final boolean atomicRhsMustIterate = stepReturnsNonNode
+         && stepIsContextIndependent
+         && stepIdx > 0
+         && currentContext != null && currentContext.hasMany();
- if (inMemProcessing ||
+ if (inMemProcessing || atomicRhsMustIterate ||

| Guard | Why it's needed |
| --- | --- |
| `stepReturnsNonNode` | Node-axis RHS keeps the persistent fast-path; only atomic RHS needs forced iteration. |
| `stepIsContextIndependent` (no `CONTEXT_ITEM` / `CONTEXT_POSITION` / `CONTEXT_SET`) | Context-dependent atomic steps already take the existing iterate branch or, in the index-optimised `Predicate.selectByNodeSet` path, are intentionally single-evaluated against the full node-set. Forcing iteration there broke regex-predicate, date-predicate, and value-index optimisations in `mvn test`. |
| `stepIdx > 0` | Only applies to RHS positions. Without this guard, a bare atomic expression wrapped in a single-step `PathExpr` (e.g. the literal pattern arg of `matches(SPEAKER, '^HAM.*')`) would be iterated over the surrounding node-set and produce a many-item arg where one was expected. |
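
The combined condition can be read as a small truth table. The sketch below restates it in self-contained Java under some assumptions: `GuardModel` and `mustIterate` are hypothetical names, the boolean parameters stand in for the `Type.subTypeOf` and `Dependency.dependsOn` checks in the diff, `contextSize > 1` is assumed to correspond to `currentContext != null && currentContext.hasMany()`, and the step indexes in the examples are illustrative rather than taken from the parser.

```java
// Hypothetical restatement of the atomicRhsMustIterate condition from the diff.
final class GuardModel {

    static boolean mustIterate(boolean returnsNode, boolean dependsOnContext,
                               int stepIdx, int contextSize) {
        final boolean stepReturnsNonNode = !returnsNode;
        final boolean stepIsContextIndependent = !dependsOnContext;
        return stepReturnsNonNode
                && stepIsContextIndependent
                && stepIdx > 0          // RHS positions only
                && contextSize > 1;     // assumed meaning of hasMany()
    }

    public static void main(String[] args) {
        // Atomic literal on the RHS over two stored nodes (the #798 case).
        System.out.println(mustIterate(false, false, 2, 2)); // true
        // Node-returning RHS keeps the persistent fast-path.
        System.out.println(mustIterate(true, false, 1, 2));  // false
        // Bare literal wrapped in a single-step PathExpr (first step).
        System.out.println(mustIterate(false, false, 0, 2)); // false
        // Context-dependent atomic step: left to the existing branches.
        System.out.println(mustIterate(false, true, 1, 2));  // false
        // Single-item context: nothing to multiply.
        System.out.println(mustIterate(false, false, 1, 1)); // false
    }
}
```

Only the first case forces iteration; each of the other cases is switched off by exactly one guard.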

Test plan

New regression test (PathExprAtomicRhsTest, 4 cases):

- inMemoryAtomicRhsIteratesPerItem: baseline ((<a><b/><b/></a>)//b/3 → 2 items)
- persistentAtomicRhsIteratesPerItem: the bug (doc('…')//b/3 → 2 items, was 1)
- inMemoryAndPersistentAgree: sequence equality (both "3,3")
- nodeRhsStillDedupes: sanity that //b/.. still dedups to 1 (no regression in node-axis fast-path)

All 4 pass. The first 3 fail on develop.

xquery.xquery3.XQuery3Tests: 1030/1030 pass (1 pre-existing skip).

Full mvn test -pl exist-core on this branch: 3 failures — all unrelated infrastructure flakes confirmed by inspecting each stack:

- GetXMLResourceNoLockTest: Failed to bind to 0.0.0.0:8088 (another Docker container is holding the port on this machine)
- RecoverBinary2Test.storeAndRead: Collection /db/test/test2 should exist after store(), a long-running flaky storage test
- EvalWebSocketEndpointTest.cancellation: 30s WebSocket cancellation timeout

None of the three touch PathExpr or path semantics; the same three fail without this fix on the same machine.

Full XQTS HEAD before/after (exist-xqts-runner --xqts-version HEAD, i.e. the live qt3tests master / final 3.1 Rec + errata — not the older --xqts-version 3.1 archive). Per-test comparison on the 26,014 tests both runs actually measured (excluding tests dropped by batch-runner timeout):

              develop    fix    delta
pass           23405    23405      +0
fail            1427     1427      +0
error            128      128      +0
skip            1054     1054      +0

Zero per-test transitions. The aggregate headlines differ (23,437 → 23,666 pass) only because different test sets hit the batched-runner timeout each run — that's the same measurement noise we've seen in earlier XQTS comparisons.
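
The difference between aggregate headlines and per-test transitions can be illustrated with a short sketch. This is not the tooling used for the comparison above; `XqtsDiff`, `transitions`, and `countPass` are hypothetical names, and the test names and outcomes are made up for illustration. It compares only the tests present in both runs, so tests dropped by a timeout in one run cannot register as a change.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-test comparison of two result sets (test name -> outcome).
final class XqtsDiff {

    // Returns "before -> after" for every test measured in BOTH runs whose outcome changed.
    static Map<String, String> transitions(Map<String, String> before, Map<String, String> after) {
        final Map<String, String> changed = new TreeMap<>();
        for (final Map.Entry<String, String> e : before.entrySet()) {
            final String afterOutcome = after.get(e.getKey());
            if (afterOutcome != null && !afterOutcome.equals(e.getValue())) {
                changed.put(e.getKey(), e.getValue() + " -> " + afterOutcome);
            }
        }
        return changed;
    }

    // Aggregate headline: number of passing tests in one run.
    static long countPass(Map<String, String> run) {
        return run.values().stream().filter("pass"::equals).count();
    }

    public static void main(String[] args) {
        // t3 was dropped from the second run; t4 and t5 were dropped from the first.
        final Map<String, String> develop = Map.of("t1", "pass", "t2", "fail", "t3", "pass");
        final Map<String, String> fix = Map.of("t1", "pass", "t2", "fail", "t4", "pass", "t5", "pass");

        System.out.println(countPass(develop) + " -> " + countPass(fix)); // 2 -> 3
        System.out.println(transitions(develop, fix));                    // {}
    }
}
```

The headline pass count moves from 2 to 3 while the transition map stays empty, which is the same shape as the result reported above.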

Related

- Closes #798 (different path result from in-memory vs stored)
- Spec: XPath 3.1 §3.3.5 (Path Operator: "For each item in the result of evaluating E1, E2 is evaluated …")
- Distinct from #6409 ([bugfix] RootNode: declare CONTEXT_ITEM dependency), which was about a node-axis falsely claiming context-independence. This one is about the optimizer's persistent fast-path being wrong for atomic-RHS regardless of dependency.

@line-o

…loses eXist-db#798)

XPath 3.1 §3.3.5 (Path Operator) mandates that in E1/E2, E2 is evaluated for *each* item in E1's result, even when E2 doesn't reference the context. PathExpr.eval's persistent-input fast-path short-circuited that iteration for context-independent RHS, on the assumption that node-axis RHS gets de-duplicated by the post-step removeDuplicates() call so a single eval was equivalent. The assumption doesn't hold for *atomic*-returning RHS: there's no de-duplication for atomics, the literal value is the same every iteration, and the missing iterations are exactly what's needed to preserve multiplicity.

Reproducer from the 2015 issue, restated by @line-o today:

    let $data := <a><b/><b/></a>
    let $doc := xmldb:store('/db', 'test.xml', $data)
    return [
        $data//b/3,
        doc('/db/test.xml')//b/3
    ]

Before this commit: `[(3, 3), 3]` (in-memory iterates per item, persistent collapses to a single 3).
After this commit: `[(3, 3), (3, 3)]` (both forms iterate, multiplicity preserved on both code paths).

The fix adds three guards to the existing iterate/single-eval condition in PathExpr.eval (~line 287):

- stepReturnsNonNode: only kicks in when the RHS step's static return type isn't a node, so node-axis fast-paths (the perf-sensitive ones) keep their persistent shortcut.
- stepIsContextIndependent: only fires for steps that declare no CONTEXT_ITEM / CONTEXT_POSITION / CONTEXT_SET dependency. Steps with real context dependencies already take the existing iterate branch (e.g. matches(., "regex")) or, in the index-optimised Predicate.selectByNodeSet path, are intentionally evaluated once against the full node-set; we must not force iteration there.
- stepIdx > 0: only applies to RHS positions, not first steps. The parser wraps a bare atomic expression (e.g. the literal pattern arg of matches(SPEAKER, '^HAM.*')) in a single-step PathExpr whose currentContext is the surrounding node-set. Without this guard we'd iterate that wrapper's literal over the outer context and produce a many-item argument where one was expected.

Each guard was added because the broader form broke real callers in mvn test (DateTests cardinality errors on date predicates, OptimizerTest regex-predicate index optimisation, ValueIndexTest string-function index optimisation). With all three guards in place those callers pass.

Verification:

- PathExprAtomicRhsTest: 4 regression tests (issue reproducer, reverse order check, node-RHS dedup sanity).
- xquery.xquery3.XQuery3Tests: 1030/1030 pass.
- exist-core mvn test: 3 unrelated infrastructure flakes (GetXMLResourceNoLockTest port-bind from another container, RecoverBinary2Test storage flake, EvalWebSocketEndpointTest 30s WebSocket timeout). None touch PathExpr or path semantics; all pre-existing.
- XQTS HEAD before/after on 26,014 tests both runs measured: zero per-test transitions (23,405 pass / 1,427 fail / 128 error / 1,054 skip on both sides). Headline numbers differ only because different test sets hit the batched-runner timeout each run.

Closes eXist-db#798

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

line-o

Thank you!

duncdrum · 2026-05-30T17:34:39Z

I like the Design instruction 😉

joewiz requested a review from a team as a code owner May 30, 2026 16:31

line-o approved these changes May 30, 2026

line-o requested a review from a team May 30, 2026 16:40

duncdrum approved these changes May 30, 2026

duncdrum merged commit b990cd6 into eXist-db:develop May 30, 2026
9 checks passed

duncdrum added this to v7.0.0 May 30, 2026

github-project-automation Bot moved this to Done in v7.0.0 May 30, 2026

line-o added this to the eXist-7.0.0 milestone May 30, 2026

duncdrum merged 1 commit into eXist-db:develop from joewiz:bugfix/798-pathexpr-iterate-atomic-rhs
