Skip to content

[bugfix] fn:xml-to-json: enforce F&O 3.1 §17.4.2 structural validation (+10 XQTS HEAD)#6350

Open
joewiz wants to merge 5 commits into
eXist-db:developfrom
joewiz:bugfix/fn-xml-to-json-over-permissive-validation
Open

[bugfix] fn:xml-to-json: enforce F&O 3.1 §17.4.2 structural validation (+10 XQTS HEAD)#6350
joewiz wants to merge 5 commits into
eXist-db:developfrom
joewiz:bugfix/fn-xml-to-json-over-permissive-validation

Conversation

@joewiz
Copy link
Copy Markdown
Member

@joewiz joewiz commented May 11, 2026

Summary

Adds per-element-type structural validation to fn:xml-to-json as required by F&O 3.1 §17.4.2 (XML Representation of JSON) and §17.5.4. Before this change, develop silently accepted inputs that violate the spec's structural rules and produced a JSON result rather than raising FOJS0006.

This is a follow-up to #6342 (which closed the walk-from-doc-root sub-cluster of fn-xml-to-json HEAD failures). It closes the over-permissive-validation sub-cluster identified in the 2026-05-10 triage of the FOJS0006 cluster.

What changed

exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java:

  • Reject non-whitespace text node children of <map> and <array> elements (whitespace, comments, and processing instructions are still ignored per spec).
  • Reject element children of leaf-type elements (<string>, <number>, <boolean>, <null>).
  • Reject no-namespace attributes other than key, escaped-key, and escaped; reject any attribute in the http://www.w3.org/2005/xpath-functions namespace (the schema's anyAttribute namespace=\"##other\").
  • Require escaped and escaped-key attribute values to be lexically valid xs:boolean (true/false/1/0).
  • Reject element names outside the six allowed local names (map, array, string, number, boolean, null) at start-tag rather than only at end-tag.

Per W3C bug 29917 / qt3tests xml-to-json-065, the escaped attribute is tolerated on non-string elements (treated as a no-op); only the lexical value is enforced.

Foreign-namespace attributes remain ignored, matching the schema's anyAttribute namespace=\"##other\" rule on every element type.

exist-core/src/test/xquery/xquery3/xml-to-json.xql: adds 14 XQSuite regression tests covering each new validation path plus the whitespace-allowed and foreign-namespace-attribute-ignored cases.

Spec references

F&O 3.1 §17.4.2:

The XDM representation of a JSON value … must have the type annotations obtained by validating the untyped representation against the schema given in C.2 Schema for the result of fn:json-to-xml. If it is untyped, then it must be an XDM instance such that validation against this schema would succeed; with the proviso that all attributes other than those in no namespace or in namespace http://www.w3.org/2005/xpath-functions are ignored.

F&O 3.1 §17.5.4 Error Conditions:

A dynamic error is raised [err:FOJS0006] if the value of $input is not a document or element node or is not valid according to the schema for the XML representation of JSON, or if a map element has two children whose normalized key values are the same.

F&O 3.1 Appendix C.2 SchemastringType declares the escaped attribute; nullType, booleanType, numberType, arrayType, mapType each declare only <xs:anyAttribute processContents=\"skip\" namespace=\"##other\"/>.

XQTS HEAD delta

fn-xml-to-json test set:

Newly passing Newly failing
Tests xml-to-json-{033, 040, 042, 043, 044, 062, 063, 069, 081, 082} (10) — (0)
Violation kind Tests
Text child of <map> / <array> 081, 082
Disallowed no-namespace attribute (yek) 033
Attribute in xpath-functions namespace 069
Invalid xs:boolean value for escaped-key 062
Invalid xs:boolean value for escaped 063
Element child of <string> 040
Element child of <boolean> 042
Element child of <null> 043, 044

Baseline: develop @ a3865db (2026-05-11 XQ 3.1 HEAD canonical baseline).

Test plan

  • mvn test -pl exist-core -Dtest=xquery.xquery3.XQuery3Tests — 1025 pass (includes 14 new regression cases)
  • XQTS HEAD fn-xml-to-json re-run shows +10 newly passing, 0 regressions vs develop baseline
  • PMD/Codacy: NPathComplexity stays under the 200 threshold after extracting validateStartElement and validateTextInContext helpers; no new findings (only pre-existing UnusedLocalVariable at line 73 and SimplifyBooleanExpressions at line 207).

🤖 Generated with Claude Code

@joewiz joewiz requested a review from a team as a code owner May 11, 2026 15:37
Comment thread exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java Outdated
Comment thread exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java Outdated
Comment thread exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java Outdated
joewiz added a commit to joewiz/exist that referenced this pull request May 11, 2026
Addresses line-o's review comments on PR eXist-db#6350 (lines 267, 288, 329).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Fixed in 3576ffc874F&amp;OF&O at lines 267, 288, 329. Matches the bare-ampersand Javadoc convention used elsewhere in org.exist.xquery.functions.fn.* (e.g., FunSum, FunMin, FunParseIetfDate).

joewiz added a commit to joewiz/exist that referenced this pull request May 11, 2026
Addresses reinhapa's review on PR eXist-db#6350.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented May 11, 2026

[This response was co-authored with Claude Code. -Joe]

Done in 0f2b5b5751 — converted the START_ELEMENT switch at line 156 to arrow syntax. Kept the END_ELEMENT switch at line 179 unchanged for now (pre-existing, not touched by this PR's diff) — happy to convert it too if you'd like a consistency follow-up.

@line-o line-o added the xquery issue is related to xquery implementation label May 12, 2026
@line-o line-o added this to v7.0.0 May 27, 2026
joewiz and others added 4 commits June 1, 2026 00:12
Adds the per-element-type validation required by F&O 3.1 §17.4.2 (XML
Representation of JSON) and §17.5.4 (fn:xml-to-json):

- Reject non-whitespace text node children of <map>/<array> elements
  (whitespace, comments and PIs are still ignored per spec).
- Reject element children of leaf-type elements (<string>, <number>,
  <boolean>, <null>).
- Reject no-namespace attributes other than 'key', 'escaped-key',
  'escaped'; reject any attribute in the xpath-functions namespace
  (the schema's anyAttribute namespace="##other").
- Require 'escaped' and 'escaped-key' to hold a valid xs:boolean value.
- Reject element names outside the six allowed local names at start-tag
  rather than only at end-tag.

Per W3C bug 29917 / qt3tests xml-to-json-065, 'escaped' is tolerated on
non-string elements (treated as a no-op); only the lexical value is
enforced.

Foreign-namespace attributes remain ignored, matching the schema rule.

Closes the over-permissive-validation sub-cluster of fn-xml-to-json
FOJS0006 failures identified in the 2026-05-10 triage report
(predecessor PR eXist-db#6342 closed the walk-from-doc-root sub-cluster).

XQTS HEAD fn-xml-to-json: +10 newly passing, 0 regressions on
xml-to-json-{033, 040, 042, 043, 044, 062, 063, 069, 081, 082}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses line-o's review comments on PR eXist-db#6350 (lines 267, 288, 329).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses reinhapa's review on PR eXist-db#6350.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sal path

PR eXist-db#6342 (merged after this branch was first proposed) rewrote
fn:xml-to-json's element-input traversal from StAX to a DOM walk
(walk-from-doc-root antipattern fix) so the function now operates on
the input subtree directly. The structural-validation logic this PR
added in commit 3000e1b was wired into the StAX path, which is no
longer the primary entry for element inputs after eXist-db#6342, so 10 XQTS
xml-to-json cases that expect FOJS0006 began passing through to a
successful JSON serialisation again after the rebase onto develop.

Port the validation rules into the DOM path via three new helpers:

  - validateDomAttributes: rejects attributes in the
    xpath-functions namespace and no-namespace attributes outside
    the {key, escaped-key, escaped} allow-list; validates that
    escaped / escaped-key carry a valid xs:boolean lexical value
  - validateContainerChildren: rejects non-whitespace text children
    of map / array
  - validateNoElementChildren: rejects element children of leaf JSON
    elements (string, number, boolean, null)

Wired into writeJsonElement (attributes), writeJsonMap /
writeJsonArray (container children), and the four leaf writers.
writeJsonBoolean's signature picks up XPathException to match.

Tests touched by this commit (all assert FOJS0006, all now pass on
develop):
  xml-to-json-text-child-of-array
  xml-to-json-text-child-of-map
  xml-to-json-element-child-of-{boolean,null,number,string}
  xml-to-json-attribute-in-json-namespace
  xml-to-json-disallowed-no-ns-attribute
  xml-to-json-invalid-escaped-{value,key-value}

The StAX-path validation methods (validateStartElement,
validateTextInContext, validateAttributes, isJsonElementName,
isLeafElementName) and the dead nodeValueToJsonViaStream entry are
now unreachable; leaving them for a follow-up [refactor] commit so
this commit stays focused on the regression fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@joewiz joewiz force-pushed the bugfix/fn-xml-to-json-over-permissive-validation branch from 0f2b5b5 to 65437d6 Compare June 1, 2026 04:21
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented Jun 1, 2026

[This response was co-authored with Claude Code. -Joe]

Rebased + ported, now MERGEABLE. Two changes since your review:

  1. Rebased onto current develop (286 commits ahead). The only text conflicts were trivial — duplicate import, and a default-arrow comment differing in wording.

  2. The rebase exposed a semantic conflict: PR [bugfix] fn:xml-to-json: scope StAX reader to input element (+26 XQTS HEAD) #6342 (walk-from-doc-root antipattern fix for fn:xml-to-json, merged after your review) rewrote element-input traversal from StAX to a DOM walk. The structural-validation logic this PR added in 3000e1bbc7 was wired into the StAX path, so 10 XQTS cases that assert FOJS0006 (text-child-of-array/map, element-child-of-{boolean,null,number,string}, attribute-in-json-namespace, disallowed-no-ns-attribute, invalid-escaped-{value,key-value}) began passing through to a successful serialisation after the rebase.

Ported the validation into the DOM path via three new helpers (validateDomAttributes, validateContainerChildren, validateNoElementChildren) wired into writeJsonElement and the per-shape writers. All 1061 XQuery3 tests pass; the 10 above are back to throwing FOJS0006 as the tests expect. Commit: 65437d62b9.

The StAX-path validation methods (validateStartElement, validateTextInContext, validateAttributes, etc.) and the now-orphaned nodeValueToJsonViaStream entry are unreachable but left in place for a follow-up [refactor] so this commit stays focused on the regression fix. Happy to fold the cleanup in here if you'd rather.

Your earlier use switch expression nit on line 156 was addressed in 0f2b5b5751 the day after the review — that change rode through the rebase unchanged.

* Other-namespace attributes are ignored. The {@code escaped} /
* {@code escaped-key} values must be valid xs:boolean.
*/
private void validateDomAttributes(final org.w3c.dom.Element element, final String localName) throws XPathException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adress Codacy issue: The method 'validateDomAttributes(org.w3c.dom.Element, String)' has an NPath complexity of 578, current threshold is 200

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the last remaining codacy issue

@github-project-automation github-project-automation Bot moved this to Done in v7.0.0 Jun 1, 2026
@joewiz joewiz reopened this Jun 1, 2026
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented Jun 1, 2026

[This response was co-authored with Claude Code. -Joe]

@duncdrum quick clarification — I think this PR got closed by mistake right after #6351 was merged. The PR body of #6351 has the line Sibling fix: #6350 (fn:xml-to-json validation)., which I suspect you may have read as "this PR also lands the sibling work" rather than the intended "the sibling work continues over in #6350"?

Reopened so the work isn't lost. Patrick's CR was already addressed in the rebase + DOM-validation port; the PR is currently MERGEABLE with all checks green. Ready for another look when you have the time.

Per @line-o's 2026-06-01 review on PR eXist-db#6350 ("This is the last
remaining codacy issue") plus two additional findings my own pre-push
Codacy run surfaced when I went looking:

  - line 78 (UnusedLocalVariable: 'options'): drop the unused
    args[1] cast inside eval(). The TODO note is preserved at the
    same site so the implementation hook stays discoverable. Removing
    the cast slightly loosens the implicit type-assertion for args[1],
    but options handling is unimplemented today (and dispatch through
    a future helper will type-check then).

  - line 184 (NPathComplexity 578 on validateDomAttributes): the
    per-attribute branching multiplied across the for-loop. Extract
    validateOneAttribute(attr, localName) (per-attribute dispatcher),
    validateNoNamespaceAttribute (name-allow-list check), and
    requireValidXsBoolean (the escaped/escaped-key value check). The
    parent now has a clean iteration with a single delegate call; each
    helper has a small, fixed NPath that does not multiply with
    attribute count.

  - line 476 (SimplifyBooleanExpressions): replace
    `if (elementValueIsEscaped == true)` with the idiomatic
    `if (elementValueIsEscaped)`. Pre-existing in the StAX path
    (legacy code) but trivially in-scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@joewiz
Copy link
Copy Markdown
Member Author

joewiz commented Jun 1, 2026

[This response was co-authored with Claude Code. -Joe]

@line-o addressed in 03115ad3a9 — plus two additional findings my own pre-push Codacy run surfaced when I went looking, so the file is clean now:

  • validateDomAttributes NPath 578 (the one you flagged) — the per-attribute branching multiplied across the for-loop. Extracted three named helpers: validateOneAttribute (per-attribute dispatcher), validateNoNamespaceAttribute (name-allow-list check), requireValidXsBoolean (the escaped/escaped-key boolean check). Parent loop now has a single delegate call per iteration with a small fixed NPath that doesn't multiply with attribute count.

  • Line 78 UnusedLocalVariable: 'options' — dropped the unused args[1] cast inside eval(). Pre-existing but in scope. The TODO note is preserved at the same site so the implementation hook stays discoverable.

  • Line 476 SimplifyBooleanExpressionsif (elementValueIsEscaped == true)if (elementValueIsEscaped). Pre-existing in the StAX legacy path but trivially in-scope.

Local Codacy now reports zero findings on the file. 1064 tests pass (FunXmlToJsonTest 3, XQuery3Tests 1061, 1 pre-existing skipped).

I should have caught these myself before my prior pushes. The recurring failure mode was running Codacy on net-new files but not on edits to existing ones — strengthened the rule on my side (MANDATORY pre-push Codacy on every changed Java file, full clean state required), so this kind of round-trip should stop recurring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

xquery issue is related to xquery implementation

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants