[bugfix] fn:xml-to-json: enforce F&O 3.1 §17.4.2 structural validation (+10 XQTS HEAD) #6350
Addresses line-o's review comments on PR eXist-db#6350 (lines 267, 288, 329). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[This response was co-authored with Claude Code. -Joe] Fixed in
Addresses reinhapa's review on PR eXist-db#6350. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
[This response was co-authored with Claude Code. -Joe] Done in
Adds the per-element-type validation required by F&O 3.1 §17.4.2 (XML Representation of JSON) and §17.5.4 (fn:xml-to-json):

- Reject non-whitespace text node children of <map>/<array> elements (whitespace, comments and PIs are still ignored per spec).
- Reject element children of leaf-type elements (<string>, <number>, <boolean>, <null>).
- Reject no-namespace attributes other than 'key', 'escaped-key', 'escaped'; reject any attribute in the xpath-functions namespace (the schema's anyAttribute namespace="##other").
- Require 'escaped' and 'escaped-key' to hold a valid xs:boolean value.
- Reject element names outside the six allowed local names at start-tag rather than only at end-tag.

Per W3C bug 29917 / qt3tests xml-to-json-065, 'escaped' is tolerated on non-string elements (treated as a no-op); only the lexical value is enforced. Foreign-namespace attributes remain ignored, matching the schema rule.

Closes the over-permissive-validation sub-cluster of fn-xml-to-json FOJS0006 failures identified in the 2026-05-10 triage report (predecessor PR eXist-db#6342 closed the walk-from-doc-root sub-cluster).

XQTS HEAD fn-xml-to-json: +10 newly passing, 0 regressions on xml-to-json-{033, 040, 042, 043, 044, 062, 063, 069, 081, 082}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sal path

PR eXist-db#6342 (merged after this branch was first proposed) rewrote fn:xml-to-json's element-input traversal from StAX to a DOM walk (walk-from-doc-root antipattern fix) so the function now operates on the input subtree directly. The structural-validation logic this PR added in commit 3000e1b was wired into the StAX path, which is no longer the primary entry for element inputs after eXist-db#6342, so 10 XQTS xml-to-json cases that expect FOJS0006 began passing through to a successful JSON serialisation again after the rebase onto develop.

Port the validation rules into the DOM path via three new helpers:

- validateDomAttributes: rejects attributes in the xpath-functions namespace and no-namespace attributes outside the {key, escaped-key, escaped} allow-list; validates that escaped / escaped-key carry a valid xs:boolean lexical value
- validateContainerChildren: rejects non-whitespace text children of map / array
- validateNoElementChildren: rejects element children of leaf JSON elements (string, number, boolean, null)

Wired into writeJsonElement (attributes), writeJsonMap / writeJsonArray (container children), and the four leaf writers. writeJsonBoolean's signature picks up XPathException to match.

Tests touched by this commit (all assert FOJS0006, all now pass on develop):

- xml-to-json-text-child-of-array
- xml-to-json-text-child-of-map
- xml-to-json-element-child-of-{boolean,null,number,string}
- xml-to-json-attribute-in-json-namespace
- xml-to-json-disallowed-no-ns-attribute
- xml-to-json-invalid-escaped-{value,key-value}

The StAX-path validation methods (validateStartElement, validateTextInContext, validateAttributes, isJsonElementName, isLeafElementName) and the dead nodeValueToJsonViaStream entry are now unreachable; leaving them for a follow-up [refactor] commit so this commit stays focused on the regression fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
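For readers who want to see the child-node rules in isolation, here is a minimal, self-contained sketch of the container and leaf checks described in the commit message above. It is not the code in FunXmlToJson.java: the class name `DomChildValidationSketch`, the `Fojs0006Exception` type (a stand-in for an XPathException carrying FOJS0006), and the `parse` / `isRejected` helpers are all hypothetical, and the namespace check on element names is left out for brevity.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Illustrative sketch only; not the FunXmlToJson code from this PR.
final class DomChildValidationSketch {

    /** Hypothetical stand-in for an XPathException carrying FOJS0006. */
    static final class Fojs0006Exception extends RuntimeException {
        Fojs0006Exception(final String message) {
            super("FOJS0006: " + message);
        }
    }

    private static final Set<String> CONTAINERS = Set.of("map", "array");
    private static final Set<String> LEAVES = Set.of("string", "number", "boolean", "null");

    /** Dispatches on the local name and recurses into container children. */
    static void validate(final Element element) {
        final String name = element.getLocalName();
        if (LEAVES.contains(name)) {
            validateNoElementChildren(element);
        } else if (CONTAINERS.contains(name)) {
            validateContainerChildren(element);
            for (Node c = element.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c.getNodeType() == Node.ELEMENT_NODE) {
                    validate((Element) c);
                }
            }
        } else {
            throw new Fojs0006Exception("unexpected element <" + name + ">");
        }
    }

    /** map / array: text must be whitespace-only; comments and PIs are skipped. */
    static void validateContainerChildren(final Element container) {
        for (Node c = container.getFirstChild(); c != null; c = c.getNextSibling()) {
            final short type = c.getNodeType();
            final boolean isText = type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE;
            if (isText && !isXmlWhitespace(c.getNodeValue())) {
                throw new Fojs0006Exception("text child in <" + container.getLocalName() + ">");
            }
        }
    }

    /** string / number / boolean / null: no element children allowed. */
    static void validateNoElementChildren(final Element leaf) {
        for (Node c = leaf.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                throw new Fojs0006Exception("element child in <" + leaf.getLocalName() + ">");
            }
        }
    }

    /** True if the string contains only XML whitespace (space, tab, CR, LF). */
    static boolean isXmlWhitespace(final String s) {
        for (int i = 0; i < s.length(); i++) {
            final char ch = s.charAt(i);
            if (ch != ' ' && ch != '\t' && ch != '\n' && ch != '\r') {
                return false;
            }
        }
        return true;
    }

    /** Parses a namespace-aware DOM and returns the document element. */
    static Element parse(final String xml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (final Exception e) {
            throw new IllegalStateException("malformed sample input", e);
        }
    }

    /** Returns true when validation raises the FOJS0006 stand-in. */
    static boolean isRejected(final String xml) {
        try {
            validate(parse(xml));
            return false;
        } catch (final Fojs0006Exception e) {
            return true;
        }
    }

    public static void main(final String[] args) {
        final String ns = "http://www.w3.org/2005/xpath-functions";
        System.out.println(isRejected("<map xmlns='" + ns + "'> <string key='a'>x</string> </map>"));
        System.out.println(isRejected("<array xmlns='" + ns + "'>oops<number>1</number></array>"));
    }
}
```

Keeping the two child checks as separate small methods mirrors the split between container writers and leaf writers named in the commit message; how the real helpers report errors and handle namespaces may differ.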
0f2b5b5 to 65437d6
[This response was co-authored with Claude Code. -Joe] Rebased + ported, now MERGEABLE. Two changes since your review:
Ported the validation into the DOM path via three new helpers ( The StAX-path validation methods ( Your earlier |
 * Other-namespace attributes are ignored. The {@code escaped} /
 * {@code escaped-key} values must be valid xs:boolean.
 */
private void validateDomAttributes(final org.w3c.dom.Element element, final String localName) throws XPathException {
Address Codacy issue: The method 'validateDomAttributes(org.w3c.dom.Element, String)' has an NPath complexity of 578, current threshold is 200
This is the last remaining codacy issue
[This response was co-authored with Claude Code. -Joe] @duncdrum quick clarification: I think this PR got closed by mistake right after #6351 was merged. The PR body of #6351 has the line Reopened so the work isn't lost. Patrick's CR was already addressed in the rebase + DOM-validation port; the PR is currently MERGEABLE with all checks green. Ready for another look when you have the time.
Per @line-o's 2026-06-01 review on PR eXist-db#6350 ("This is the last remaining codacy issue") plus two additional findings my own pre-push Codacy run surfaced when I went looking:

- line 78 (UnusedLocalVariable: 'options'): drop the unused args[1] cast inside eval(). The TODO note is preserved at the same site so the implementation hook stays discoverable. Removing the cast slightly loosens the implicit type-assertion for args[1], but options handling is unimplemented today (and dispatch through a future helper will type-check then).
- line 184 (NPathComplexity 578 on validateDomAttributes): the per-attribute branching multiplied across the for-loop. Extract validateOneAttribute(attr, localName) (per-attribute dispatcher), validateNoNamespaceAttribute (name-allow-list check), and requireValidXsBoolean (the escaped/escaped-key value check). The parent now has a clean iteration with a single delegate call; each helper has a small, fixed NPath that does not multiply with attribute count.
- line 476 (SimplifyBooleanExpressions): replace `if (elementValueIsEscaped == true)` with the idiomatic `if (elementValueIsEscaped)`. Pre-existing in the StAX path (legacy code) but trivially in-scope.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
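The shape of that extraction can be sketched as follows. This is an illustration under assumptions, not the committed code: the helper names follow the commit message, but the class `AttributeValidationSketch`, the `Fojs0006Exception` stand-in, the `parse` / `isRejected` helpers, and the choice to trim the value before the xs:boolean check are mine. The sketch also drops the `localName` parameter, since `escaped` is tolerated on every element type.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Illustrative sketch only; not the FunXmlToJson code from this PR.
final class AttributeValidationSketch {

    /** Hypothetical stand-in for an XPathException carrying FOJS0006. */
    static final class Fojs0006Exception extends RuntimeException {
        Fojs0006Exception(final String message) {
            super("FOJS0006: " + message);
        }
    }

    static final String FN_NS = "http://www.w3.org/2005/xpath-functions";
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    private static final Set<String> ALLOWED_NO_NS = Set.of("key", "escaped-key", "escaped");
    private static final Set<String> XS_BOOLEAN_LEXICAL = Set.of("true", "false", "1", "0");

    /** Parent: a plain loop with a single delegate call per attribute. */
    static void validateAttributes(final Element element) {
        final NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            validateOneAttribute((Attr) attrs.item(i));
        }
    }

    /** Per-attribute dispatcher keyed on the attribute's namespace. */
    static void validateOneAttribute(final Attr attr) {
        final String ns = attr.getNamespaceURI();
        if (XMLNS_NS.equals(ns)) {
            return; // namespace declarations surface as Attr nodes in DOM; skip them
        }
        if (ns == null) {
            validateNoNamespaceAttribute(attr);
        } else if (FN_NS.equals(ns)) {
            throw new Fojs0006Exception("attribute in xpath-functions namespace: " + attr.getName());
        }
        // Attributes in any other namespace are ignored.
    }

    /** Name allow-list check for no-namespace attributes. */
    static void validateNoNamespaceAttribute(final Attr attr) {
        final String name = attr.getLocalName();
        if (!ALLOWED_NO_NS.contains(name)) {
            throw new Fojs0006Exception("disallowed attribute: " + name);
        }
        if (!"key".equals(name)) {
            requireValidXsBoolean(name, attr.getValue());
        }
    }

    /** escaped / escaped-key must be lexically valid xs:boolean. */
    static void requireValidXsBoolean(final String name, final String value) {
        // Assumption: surrounding whitespace is collapsed first (xs:boolean's whiteSpace facet).
        if (!XS_BOOLEAN_LEXICAL.contains(value.trim())) {
            throw new Fojs0006Exception("invalid xs:boolean for " + name + ": " + value);
        }
    }

    /** Parses a namespace-aware DOM and returns the document element. */
    static Element parse(final String xml) {
        try {
            final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (final Exception e) {
            throw new IllegalStateException("malformed sample input", e);
        }
    }

    /** Returns true when the root element's attributes raise the FOJS0006 stand-in. */
    static boolean isRejected(final String xml) {
        try {
            validateAttributes(parse(xml));
            return false;
        } catch (final Fojs0006Exception e) {
            return true;
        }
    }

    public static void main(final String[] args) {
        System.out.println(isRejected("<string xmlns='" + FN_NS + "' escaped='yes'>x</string>"));
        System.out.println(isRejected("<number xmlns='" + FN_NS + "' escaped='0'>1</number>"));
    }
}
```

Because each helper contains only a couple of branches and the loop body is a single call, the path count of the parent no longer grows with the product of the per-attribute branches, which is the effect the commit message describes.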
[This response was co-authored with Claude Code. -Joe] @line-o addressed in
Local Codacy now reports zero findings on the file. 1064 tests pass (

I should have caught these myself before my prior pushes. The recurring failure mode was running Codacy on net-new files but not on edits to existing ones. I have strengthened the rule on my side (MANDATORY pre-push Codacy on every changed Java file, full clean state required), so this kind of round-trip should stop recurring.
Summary
Adds per-element-type structural validation to `fn:xml-to-json` as required by F&O 3.1 §17.4.2 (XML Representation of JSON) and §17.5.4. Before this change, develop silently accepted inputs that violate the spec's structural rules and produced a JSON result rather than raising `FOJS0006`.

This is a follow-up to #6342 (which closed the walk-from-doc-root sub-cluster of `fn-xml-to-json` HEAD failures). It closes the over-permissive-validation sub-cluster identified in the 2026-05-10 triage of the FOJS0006 cluster.

What changed
`exist-core/src/main/java/org/exist/xquery/functions/fn/FunXmlToJson.java`:

- Reject non-whitespace text node children of `<map>` and `<array>` elements (whitespace, comments, and processing instructions are still ignored per spec).
- Reject element children of leaf-type elements (`<string>`, `<number>`, `<boolean>`, `<null>`).
- Reject no-namespace attributes other than `key`, `escaped-key`, and `escaped`; reject any attribute in the `http://www.w3.org/2005/xpath-functions` namespace (the schema's `anyAttribute namespace="##other"`).
- Require `escaped` and `escaped-key` attribute values to be lexically valid `xs:boolean` (`true`/`false`/`1`/`0`).
- Reject element names outside the six allowed local names (`map`, `array`, `string`, `number`, `boolean`, `null`) at start-tag rather than only at end-tag.

Per W3C bug 29917 / qt3tests `xml-to-json-065`, the `escaped` attribute is tolerated on non-string elements (treated as a no-op); only the lexical value is enforced.

Foreign-namespace attributes remain ignored, matching the schema's `anyAttribute namespace="##other"` rule on every element type.

`exist-core/src/test/xquery/xquery3/xml-to-json.xql`: adds 14 XQSuite regression tests covering each new validation path plus the whitespace-allowed and foreign-namespace-attribute-ignored cases.

Spec references
F&O 3.1 §17.4.2:
F&O 3.1 §17.5.4 Error Conditions:
F&O 3.1 Appendix C.2 Schema: `stringType` declares the `escaped` attribute; `nullType`, `booleanType`, `numberType`, `arrayType`, `mapType` each declare only `<xs:anyAttribute processContents="skip" namespace="##other"/>`.

XQTS HEAD delta
`fn-xml-to-json` test set: `xml-to-json-{033, 040, 042, 043, 044, 062, 063, 069, 081, 082}` (10)

These cases involve `<map>`/`<array>`, `yek`, the `xs:boolean` value for `escaped-key`, the `xs:boolean` value for `escaped`, `<string>`, `<boolean>`, and `<null>`.

Baseline: develop @ a3865db (2026-05-11 XQ 3.1 HEAD canonical baseline).
Test plan
- `mvn test -pl exist-core -Dtest=xquery.xquery3.XQuery3Tests`: 1025 pass (includes 14 new regression cases)
- `fn-xml-to-json` re-run shows +10 newly passing, 0 regressions vs develop baseline
- `validateStartElement` and `validateTextInContext` helpers; no new findings (only pre-existing `UnusedLocalVariable` at line 73 and `SimplifyBooleanExpressions` at line 207).

🤖 Generated with Claude Code