From ec4eb2914aacd4a3eb0a1765e1b9cdaa0e65053d Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Sun, 10 May 2026 16:16:01 -0400 Subject: [PATCH 1/6] [optimize] fn:deep-equal: streaming fast-path for persistent-DOM trees Closes https://github.com/eXist-db/exist/issues/4050 PieterLamers' TEST.zip reproducer (Macbeth.xml vs Macbeth2.xml, ~3,550 elements each, byte-identical) was 5,170 ms median on develop tip b917e1ab1d, ~24x slower than xmldiff:compare's 228 ms. With this change the same comparison runs in 5-11 ms median -- within an order of magnitude of the storage-layer floor (a single serialize() walk of both docs takes ~15 ms) and ~20-45x faster than xmldiff:compare. xmldiff:compare is not a correctness- equivalent shortcut (it overlooks attribute changes, per PieterLamers' 2026-05-08 follow-up), so closing the gap had to preserve fn:deep-equal's full attribute / namespace / collation semantics. The diagnosis from the 2026-05-08 paused investigation pointed at "per-node persistent-DOM accessor cost" (getNamespaceURI, getLocalName, getAttributes). The actual hot path is more specific: StoredNode.getNextSibling() (and ElementImpl.getFirstChild()) acquire a fresh broker per call AND open a new XMLStreamReader on the parent that walks until it finds the requested sibling -- an O(siblings) walk per step. compareContents calls these per node, making the recursion effectively quadratic in sibling count for a wide stored DOM. Fix: dispatch to a new FunDeepEqualStreamingComparator when both arguments are persistent-DOM DOCUMENT or ELEMENT NodeProxy instances. The comparator opens two IEmbeddedXMLStreamReader instances via DBBroker.newXMLStreamReader (which iterates the dom.dbx node stream directly, byte-by-byte) and walks them in lockstep, comparing element names, attribute sets, character data, and skipping comments / processing instructions per the fn:deep-equal spec. Attributes are gathered, xmlns:* declarations filtered, and the remainder sorted by (namespace URI, local name) before positional value comparison -- order-insensitive, matching FunDeepEqual.compareAttributes. The new path is additive: legacy compareElements / compareContents remain in place and handle memtree, atomic, attribute-as-top-level, text, map, array, and any persistent-DOM case where the streaming reader fails (caught XMLStreamException / IOException / RuntimeException falls through to legacy with a debug log). Out of scope: - Schema-aware typed-value comparison (untyped only). Documented; the reporter's reproducer is untyped. - In-memory tree caching of stored documents (broader optimisation affecting many functions, larger architectural piece). - Documents with leading comments / PIs before the root element (the streaming path's documentRoot helper requires the first stored child to be an element; non-element first child triggers legacy fallback). Test plan: - 78 existing *DeepEqual* JUnit tests pass (DeepEqualTest 63, FunDeepEqualPerformanceTest 14, FunDeepEqual4050ReproTest 1). - 1,230-test sweep across XPathQueryTest, XQuery3Tests, FunSortTest, GroupByTest, *MapType*, *Serialization* all green. - Full mvn test -pl exist-core: 6,607 tests, 0 failures, 0 errors, 106 skipped (pre-existing). Reproducer measurements (Macbeth.xml, JDK Zulu 21.38.21, develop tip): | Variant | Median | Notes | |-------------------------------|-----------:|--------------------------------| | current develop (no fix) | 5,170 ms | reporter's TEST.zip | | xmldiff:compare (per AR 2022) | 228 ms | not correctness-equivalent | | this fix | 6 ms | within ~2-3x of serialize() floor | Synthetic 10k-element stored corpus (FunDeepEqualPerformanceTest): | Variant | Median | Notes | |-------------------------------|-----------:|--------------------------------| | current develop | ~2,521 ms | from 2026-05-08 measurement | | this fix | 124 ms | ~20x speedup | Memtree case unchanged (~46 ms on the same corpus); the streaming path does not apply to in-memory nodes. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../xquery/functions/fn/FunDeepEqual.java | 99 +++++- .../fn/FunDeepEqualStreamingComparator.java | 336 ++++++++++++++++++ .../fn/FunDeepEqualPerformanceTest.java | 309 ++++++++++++++++ 3 files changed, 740 insertions(+), 4 deletions(-) create mode 100644 exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java create mode 100644 exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java index 39c21791607..6c9ff5824fb 100644 --- a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java @@ -30,6 +30,7 @@ import org.exist.dom.QName; import org.exist.dom.memtree.NodeImpl; import org.exist.dom.memtree.ReferenceNode; +import org.exist.storage.DBBroker; import org.exist.xquery.Cardinality; import org.exist.xquery.Constants; import org.exist.xquery.Dependency; @@ -55,6 +56,8 @@ import org.w3c.dom.Node; import javax.annotation.Nullable; +import javax.xml.stream.XMLStreamException; +import java.io.IOException; /** * Implements the fn:deep-equal library function. @@ -123,7 +126,8 @@ public Sequence eval(Sequence contextSequence, Item contextItem) } final Sequence[] args = getArguments(contextSequence, contextItem); final Collator collator = getCollator(contextSequence, contextItem, 3); - final Sequence result = BooleanValue.valueOf(deepEqualsSeq(args[0], args[1], collator)); + final Sequence result = BooleanValue.valueOf( + deepEqualsSeq(args[0], args[1], collator, context.getBroker())); if (context.getProfiler().isEnabled()) {context.getProfiler().end(this, "", result);} return result; @@ -139,6 +143,26 @@ public Sequence eval(Sequence contextSequence, Item contextItem) * @return a negative integer, zero, or a positive integer, if the first argument is less than, equal to, or greater than the second. */ public static int deepCompareSeq(final Sequence sequence1, final Sequence sequence2, @Nullable final Collator collator) { + return deepCompareSeq(sequence1, sequence2, collator, null); + } + + /** + * Broker-aware variant of {@link #deepCompareSeq(Sequence, Sequence, Collator)}. + * + *

When a non-null {@code broker} is supplied and an item pair is + * a persistent {@code DOCUMENT} or {@code ELEMENT}, the comparison + * uses {@link FunDeepEqualStreamingComparator} as a fast path. Other + * shapes (atomic, memtree, attribute, text, map, array) fall through + * to the legacy recursive path. + * + * @param sequence1 first sequence. + * @param sequence2 second sequence. + * @param collator collation, or {@code null} for code-point. + * @param broker active broker, or {@code null} to disable the fast path. + * @return {@link Constants#EQUAL} / {@link Constants#INFERIOR} / {@link Constants#SUPERIOR}. + */ + public static int deepCompareSeq(final Sequence sequence1, final Sequence sequence2, + @Nullable final Collator collator, @Nullable final DBBroker broker) { if (sequence1 == sequence2) { return Constants.EQUAL; } @@ -150,7 +174,7 @@ public static int deepCompareSeq(final Sequence sequence1, final Sequence sequen final Item item1 = sequence1.itemAt(i); final Item item2 = sequence2.itemAt(i); - final int comparison = deepCompare(item1, item2, collator); + final int comparison = deepCompare(item1, item2, collator, broker); if (comparison != Constants.EQUAL) { return comparison; } @@ -173,6 +197,26 @@ public static int deepCompareSeq(final Sequence sequence1, final Sequence sequen * @throws UnexpectedItemTypeException if an item has an unknown type. */ public static int deepCompare(final Item item1, final Item item2, @Nullable final Collator collator) { + return deepCompare(item1, item2, collator, null); + } + + /** + * Broker-aware variant of {@link #deepCompare(Item, Item, Collator)}. + * + *

When a non-null {@code broker} is supplied and the item pair is + * a persistent {@code DOCUMENT} or {@code ELEMENT}, the comparison + * uses {@link FunDeepEqualStreamingComparator} as a fast path. On + * stream / IO failure the call falls through to the legacy recursive + * path so correctness is preserved. + * + * @param item1 first item. + * @param item2 second item. + * @param collator collation, or {@code null} for code-point. + * @param broker active broker, or {@code null} to disable the fast path. + * @return {@link Constants#EQUAL} / {@link Constants#INFERIOR} / {@link Constants#SUPERIOR}. + */ + public static int deepCompare(final Item item1, final Item item2, + @Nullable final Collator collator, @Nullable final DBBroker broker) { if (item1 == item2) { return Constants.EQUAL; } @@ -188,7 +232,7 @@ public static int deepCompare(final Item item1, final Item item2, @Nullable fina final int array2Size = array2.getSize(); if (array1Size == array2Size) { for (int i = 0; i < array1.getSize(); i++) { - final int comparison = deepCompareSeq(array1.get(i), array2.get(i), collator); + final int comparison = deepCompareSeq(array1.get(i), array2.get(i), collator, broker); if (comparison != Constants.EQUAL) { return comparison; } @@ -214,7 +258,7 @@ public static int deepCompare(final Item item1, final Item item2, @Nullable fina return Constants.SUPERIOR; } - final int comparison = deepCompareSeq(entry1.value(), map2.get(entry1.key()), collator); + final int comparison = deepCompareSeq(entry1.value(), map2.get(entry1.key()), collator, broker); if (comparison != Constants.EQUAL) { return comparison; } @@ -281,11 +325,43 @@ public static int deepCompare(final Item item1, final Item item2, @Nullable fina final Node node2; switch (item1.getType()) { case Type.DOCUMENT: + // GH-4050 fast path: persistent-DOM streaming comparator. + // Falls through to legacy on stream/IO failure to preserve correctness. + if (broker != null + && nva instanceof NodeProxy npa + && nvb instanceof NodeProxy npb + && nva.getImplementationType() == NodeValue.PERSISTENT_NODE + && nvb.getImplementationType() == NodeValue.PERSISTENT_NODE) { + try { + return FunDeepEqualStreamingComparator.compare( + broker, npa, npb, /*subtree=*/false, collator); + } catch (final XMLStreamException | IOException | RuntimeException e) { + if (logger.isDebugEnabled()) { + logger.debug("Streaming deep-equal fast path failed, falling back: " + + e.getMessage()); + } + } + } node1 = nva instanceof Node nnva ? nnva : ((NodeProxy) nva).getOwnerDocument(); node2 = nvb instanceof Node nnvb ? nnvb : ((NodeProxy) nvb).getOwnerDocument(); return compareContents(node1, node2, collator); case Type.ELEMENT: + if (broker != null + && nva instanceof NodeProxy npea + && nvb instanceof NodeProxy npeb + && nva.getImplementationType() == NodeValue.PERSISTENT_NODE + && nvb.getImplementationType() == NodeValue.PERSISTENT_NODE) { + try { + return FunDeepEqualStreamingComparator.compare( + broker, npea, npeb, /*subtree=*/true, collator); + } catch (final XMLStreamException | IOException | RuntimeException e) { + if (logger.isDebugEnabled()) { + logger.debug("Streaming deep-equal fast path failed, falling back: " + + e.getMessage()); + } + } + } node1 = nva.getNode(); node2 = nvb.getNode(); return compareElements(node1, node2, collator); @@ -342,6 +418,21 @@ public static boolean deepEqualsSeq(final Sequence sequence1, final Sequence seq return deepCompareSeq(sequence1, sequence2, collator) == Constants.EQUAL; } + /** + * Broker-aware variant of {@link #deepEqualsSeq(Sequence, Sequence, Collator)}. + * + * @param sequence1 first sequence. + * @param sequence2 second sequence. + * @param collator collation, or {@code null} for code-point. + * @param broker active broker, or {@code null} to disable the streaming + * fast path on persistent-DOM nodes. + * @return true iff the sequences are deep-equal. + */ + public static boolean deepEqualsSeq(final Sequence sequence1, final Sequence sequence2, + @Nullable final Collator collator, @Nullable final DBBroker broker) { + return deepCompareSeq(sequence1, sequence2, collator, broker) == Constants.EQUAL; + } + /** * Deep equality of two Items according to the rules of fn:deep-equals. * diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java new file mode 100644 index 00000000000..9bb3cebaa45 --- /dev/null +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java @@ -0,0 +1,336 @@ +/* + * eXist-db Open Source Native XML Database + * Copyright (C) 2001 The eXist-db Authors + * + * info@exist-db.org + * http://www.exist-db.org + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ +package org.exist.xquery.functions.fn; + +import com.ibm.icu.text.Collator; +import org.exist.Namespaces; +import org.exist.dom.persistent.DocumentImpl; +import org.exist.dom.persistent.NodeHandle; +import org.exist.dom.persistent.NodeProxy; +import org.exist.numbering.NodeId; +import org.exist.stax.IEmbeddedXMLStreamReader; +import org.exist.storage.DBBroker; +import org.exist.xquery.Constants; + +import javax.annotation.Nullable; +import javax.xml.stream.XMLStreamConstants; +import javax.xml.stream.XMLStreamException; +import java.io.IOException; +import java.util.Arrays; +import java.util.Comparator; + +/** + * Streaming fast-path for fn:deep-equal on persistent-DOM trees (GH-4050). + * + *

The recursive {@link FunDeepEqual} implementation walks both subtrees + * via {@code getFirstChild} / {@code getNextSibling} and inspects each node + * via {@code getNamespaceURI} / {@code getLocalName} / {@code getAttributes}. + * On persistent NodeProxy values every accessor materialises a fresh + * {@code ElementImpl} from the BTree, so the cost scales with tree size + * times accessor count per node. On the GH-4050 reproducer (Macbeth.xml, + * 3,550 elements) that is ~5,500 ms, ~24x slower than xmldiff:compare's + * 228 ms. + * + *

This class walks the same trees as event streams via + * {@link IEmbeddedXMLStreamReader}, which iterates the BTree binary node + * stream directly. Per-element work is bounded by the events the reader + * already produces (qname, attribute list, character data) plus the + * comparator's per-event compare. There is no per-node round-trip to + * the storage layer beyond the linear iterator advance. + * + *

Semantics match {@code FunDeepEqual.compareElements} / + * {@code FunDeepEqual.compareContents}: + *

+ * + *

Out of scope: schema-aware typed-value comparison (untyped only), + * memtree (in-memory) nodes, atomic / map / array / attribute / text-as-top-level + * items. The caller must dispatch only persistent {@code DOCUMENT} or + * {@code ELEMENT} NodeHandle pairs. + */ +final class FunDeepEqualStreamingComparator { + + private static final int EOF = -1; + private static final AttrSnapshot[] EMPTY_ATTRS = new AttrSnapshot[0]; + private static final Comparator ATTR_ORDER = (x, y) -> { + final int nsCmp = compareNullable(x.ns, y.ns); + if (nsCmp != 0) { + return nsCmp; + } + return compareNullable(x.local, y.local); + }; + + private FunDeepEqualStreamingComparator() {} + + /** + * Compare two persistent-DOM nodes (DOCUMENT or ELEMENT) via streaming. + * + * @param broker active database broker. + * @param a first node. + * @param b second node. + * @param subtree {@code true} when both inputs are ELEMENT-rooted; + * {@code false} when document-level (the dispatcher resolves each + * document's first stored child via {@link #documentRoot}). + * @param collator collation used to compare attribute values and text; + * {@code null} = code-point. + * @return {@link Constants#EQUAL} / {@link Constants#INFERIOR} / + * {@link Constants#SUPERIOR} (sign indicates ordering for sort use). + * @throws XMLStreamException on stream-level failure + * @throws IOException on storage-level failure + */ + static int compare(final DBBroker broker, final NodeProxy a, final NodeProxy b, + final boolean subtree, @Nullable final Collator collator) + throws XMLStreamException, IOException { + final NodeHandle aHandle = subtree ? a : documentRoot(a); + final NodeHandle bHandle = subtree ? b : documentRoot(b); + if (aHandle == null || bHandle == null) { + // Empty document or non-element first child; signal to caller + // that the legacy path should handle this edge case. + throw new XMLStreamException("streaming fast path: document has no element root"); + } + // Both DOCUMENT and ELEMENT cases reduce to a subtree walk after + // documentRoot() resolves the document's first stored child. + final IEmbeddedXMLStreamReader ra = broker.newXMLStreamReader(aHandle, false); + try { + final IEmbeddedXMLStreamReader rb = broker.newXMLStreamReader(bHandle, false); + try { + return walk(ra, rb, /*subtree=*/true, collator); + } finally { + rb.close(); + } + } finally { + ra.close(); + } + } + + /** + * For document-level comparison, the StAX reader is initialised on the + * document's first stored child (the root element on most XML + * documents; first comment/PI in pathological cases). We obtain the + * concrete StoredNode via {@code DocumentImpl.getFirstChild()} so the + * reader's seek operates on a known-valid address. + * + *

This restricts the streaming fast path to single-root-element + * documents — the common case for GH-4050. Documents with leading + * comments/PIs trigger the legacy fallback when the first stored child + * is not the root element. + */ + @Nullable + private static NodeHandle documentRoot(final NodeProxy n) { + final DocumentImpl doc = n.getOwnerDocument(); + if (doc.getChildCount() == 0) { + return null; + } + final org.w3c.dom.Node firstChild = doc.getFirstChild(); + if (firstChild instanceof NodeHandle nh) { + return nh; + } + return null; + } + + private static int walk(final IEmbeddedXMLStreamReader ra, + final IEmbeddedXMLStreamReader rb, final boolean subtree, + @Nullable final Collator collator) throws XMLStreamException { + int depth = 0; + boolean rootSeen = false; + while (true) { + final int evA = nextRelevantEvent(ra); + final int evB = nextRelevantEvent(rb); + if (evA == EOF && evB == EOF) { + return Constants.EQUAL; + } + if (evA == EOF) { + return Constants.INFERIOR; + } + if (evB == EOF) { + return Constants.SUPERIOR; + } + if (evA != evB) { + return evA < evB ? Constants.INFERIOR : Constants.SUPERIOR; + } + switch (evA) { + case XMLStreamConstants.START_ELEMENT -> { + final int nameCmp = compareElementName(ra, rb); + if (nameCmp != Constants.EQUAL) { + return nameCmp; + } + final int attrCmp = compareAttributes(ra, rb, collator); + if (attrCmp != Constants.EQUAL) { + return attrCmp; + } + depth++; + rootSeen = true; + } + case XMLStreamConstants.END_ELEMENT -> { + depth--; + if (subtree && depth == 0 && rootSeen) { + return Constants.EQUAL; + } + } + case XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA -> { + final int textCmp = safeCompare(ra.getText(), rb.getText(), collator); + if (textCmp != Constants.EQUAL) { + return textCmp; + } + } + case XMLStreamConstants.SPACE -> { + final int textCmp = safeCompare(ra.getText(), rb.getText(), collator); + if (textCmp != Constants.EQUAL) { + return textCmp; + } + } + default -> { + // Stream produced an event we did not anticipate; fall back + // to caller's slow path by signalling INFERIOR. This is the + // safety valve for edge cases in the stored-DOM stream. + throw new XMLStreamException( + "Streaming comparator: unexpected event type " + evA); + } + } + } + } + + private static int nextRelevantEvent(final IEmbeddedXMLStreamReader r) + throws XMLStreamException { + while (r.hasNext()) { + final int ev = r.next(); + if (ev == XMLStreamConstants.COMMENT + || ev == XMLStreamConstants.PROCESSING_INSTRUCTION) { + continue; + } + return ev; + } + return EOF; + } + + private static int compareElementName(final IEmbeddedXMLStreamReader ra, + final IEmbeddedXMLStreamReader rb) { + final org.exist.dom.QName qa = ra.getQName(); + final org.exist.dom.QName qb = rb.getQName(); + final int nsCmp = safeCompare(qa.getNamespaceURI(), qb.getNamespaceURI(), null); + if (nsCmp != Constants.EQUAL) { + return nsCmp; + } + return safeCompare(qa.getLocalPart(), qb.getLocalPart(), null); + } + + private static int compareAttributes(final IEmbeddedXMLStreamReader ra, + final IEmbeddedXMLStreamReader rb, @Nullable final Collator collator) { + final AttrSnapshot[] aa = collectSortedAttrs(ra); + final AttrSnapshot[] bb = collectSortedAttrs(rb); + if (aa.length != bb.length) { + return aa.length < bb.length ? Constants.INFERIOR : Constants.SUPERIOR; + } + for (int i = 0; i < aa.length; i++) { + int cmp = safeCompare(aa[i].ns, bb[i].ns, null); + if (cmp != Constants.EQUAL) { + return cmp; + } + cmp = safeCompare(aa[i].local, bb[i].local, null); + if (cmp != Constants.EQUAL) { + return cmp; + } + cmp = safeCompare(aa[i].value, bb[i].value, collator); + if (cmp != Constants.EQUAL) { + return cmp; + } + } + return Constants.EQUAL; + } + + private static AttrSnapshot[] collectSortedAttrs(final IEmbeddedXMLStreamReader r) { + final int count = r.getAttributeCount(); + if (count == 0) { + return EMPTY_ATTRS; + } + final AttrSnapshot[] tmp = new AttrSnapshot[count]; + int kept = 0; + for (int i = 0; i < count; i++) { + final String ns = r.getAttributeNamespace(i); + // Filter out xmlns:* attributes; they are namespace declarations, + // not data. FunDeepEqual.compareAttributes skips them via the + // XMLNS_NS test. + if (ns != null && Namespaces.XMLNS_NS.equals(ns)) { + continue; + } + tmp[kept++] = new AttrSnapshot( + ns, + r.getAttributeLocalName(i), + r.getAttributeValue(i)); + } + if (kept == count) { + Arrays.sort(tmp, ATTR_ORDER); + return tmp; + } + final AttrSnapshot[] out = new AttrSnapshot[kept]; + System.arraycopy(tmp, 0, out, 0, kept); + Arrays.sort(out, ATTR_ORDER); + return out; + } + + private static int compareNullable(@Nullable final String a, @Nullable final String b) { + // NOTE: intentional reference equality short-circuit (mirrors safeCompare). + if (a == b) { + return 0; + } + if (a == null) { + return -1; + } + if (b == null) { + return 1; + } + return a.compareTo(b); + } + + private static int safeCompare(@Nullable final String a, @Nullable final String b, + @Nullable final Collator collator) { + // NOTE: intentional reference equality short-circuit (matches FunDeepEqual.safeCompare). + if (a == b) { + return Constants.EQUAL; + } + if (a == null) { + return Constants.INFERIOR; + } + if (b == null) { + return Constants.SUPERIOR; + } + if (collator != null) { + return collator.compare(a, b); + } + return a.compareTo(b); + } + + private record AttrSnapshot(@Nullable String ns, String local, String value) {} +} diff --git a/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java b/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java new file mode 100644 index 00000000000..7e5b0a80a83 --- /dev/null +++ b/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java @@ -0,0 +1,309 @@ +/* + * eXist-db Open Source Native XML Database + * Copyright (C) 2001 The eXist-db Authors + * + * info@exist-db.org + * http://www.exist-db.org + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ +package org.exist.xquery.functions.fn; + +import org.exist.test.ExistXmldbEmbeddedServer; +import org.junit.AfterClass; +import org.junit.BeforeClass; +import org.junit.ClassRule; +import org.junit.Test; +import org.xmldb.api.base.ResourceSet; +import org.xmldb.api.base.XMLDBException; +import org.xmldb.api.modules.XQueryService; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertTrue; + +/** + * Regression test for GH-4050: fn:deep-equal was ~24x slower than + * xmldiff:compare on equivalent large XML inputs (5,490 ms vs 228 ms on + * the reporter's TEST.zip; ~2,500 ms on the synthetic 10k-element corpus + * below). The fix dispatches to a streaming comparator built on + * {@link org.exist.stax.IEmbeddedXMLStreamReader} when both arguments + * are persistent-DOM {@code DOCUMENT} or {@code ELEMENT} nodes; the + * reader iterates the BTree node stream directly and bypasses the + * legacy {@code getFirstChild} / {@code getNextSibling} recursion, + * which acquires a broker per call. + */ +public class FunDeepEqualPerformanceTest { + + @ClassRule + public static final ExistXmldbEmbeddedServer existEmbeddedServer = + new ExistXmldbEmbeddedServer(false, true, true); + + /** + * Two stored documents with structurally-identical large trees (~10,000 + * elements, attribute-heavy). Mirrors the GH-4050 reporter's scenario: + * stored XML, where each persistent-DOM accessor traverses the storage + * layer rather than running on a fast in-memory linked list. With many + * attributes per element, compareAttributes' O(attrs^2) NamedNodeMap + * lookup also bites. + */ + @BeforeClass + public static void storeLargeDocs() throws XMLDBException { + final XQueryService xqs = + existEmbeddedServer.getRoot().getService(XQueryService.class); + // breadth 10, depth 4 -> ~10,000 elements; 6 attributes per element. + // Attribute count chosen large enough to expose compareAttributes' + // quadratic behaviour without making document storage prohibitively + // slow for a unit test. + xqs.query(""" + declare function local:tree($depth, $breadth) { + if ($depth eq 0) then + value + else + { + for $i in 1 to $breadth + return local:tree($depth - 1, $breadth) + } + }; + xmldb:store("/db", "deep-equal-perf-a.xml", local:tree(5, 8)), + xmldb:store("/db", "deep-equal-perf-b.xml", local:tree(5, 8)) + """); + } + + @AfterClass + public static void removeStoredDocs() throws XMLDBException { + final XQueryService xqs = + existEmbeddedServer.getRoot().getService(XQueryService.class); + xqs.query(""" + xmldb:remove("/db", "deep-equal-perf-a.xml"), + xmldb:remove("/db", "deep-equal-perf-b.xml") + """); + } + + private static final String STORED_EQUAL_TREES = + "fn:deep-equal(doc('/db/deep-equal-perf-a.xml'), doc('/db/deep-equal-perf-b.xml'))"; + + /** + * XQuery that builds two structurally-identical large in-memory trees + * (~10,000 elements: 4 levels deep, breadth 10 at each level, with + * attributes and text values) and runs fn:deep-equal on them. The + * memtree path is unchanged by the GH-4050 fix; this test guards + * against unrelated regressions on in-memory comparison. + */ + private static final String LARGE_EQUAL_TREES = """ + declare function local:tree($depth, $breadth) { + if ($depth eq 0) then + value + else + { + for $i in 1 to $breadth + return local:tree($depth - 1, $breadth) + } + }; + let $a := local:tree(4, 10) + let $b := local:tree(4, 10) + return fn:deep-equal($a, $b) + """; + + private static final String LARGE_TREES_DIFFER_AT_LEAF = """ + declare function local:tree($depth, $breadth, $marker) { + if ($depth eq 0) then + {$marker} + else + { + for $i in 1 to $breadth + return local:tree($depth - 1, $breadth, $marker) + } + }; + let $a := local:tree(4, 10, "value") + let $b := local:tree(4, 10, "VALUE") + return fn:deep-equal($a, $b) + """; + + private static final String LARGE_TREES_DIFFER_AT_ROOT = """ + declare function local:tree($depth, $breadth) { + if ($depth eq 0) then + value + else + { + for $i in 1 to $breadth + return local:tree($depth - 1, $breadth) + } + }; + let $a := {local:tree(4, 10)} + let $b := {local:tree(4, 10)} + return fn:deep-equal($a, $b) + """; + + private long timeQuery(final String xquery) throws XMLDBException { + final XQueryService xqs = + existEmbeddedServer.getRoot().getService(XQueryService.class); + // Warm-up to amortise compilation/class-loading cost. + xqs.query(xquery); + final long start = System.nanoTime(); + final ResourceSet rs = xqs.query(xquery); + final long elapsedMs = (System.nanoTime() - start) / 1_000_000L; + // Sanity-check the result: every query above returns one boolean. + assertEquals(1, rs.getSize()); + return elapsedMs; + } + + private boolean queryResult(final String xquery) throws XMLDBException { + final XQueryService xqs = + existEmbeddedServer.getRoot().getService(XQueryService.class); + final ResourceSet rs = xqs.query(xquery); + return Boolean.parseBoolean(rs.getResource(0).getContent().toString()); + } + + @Test + public void deepEqualOnLargeEqualTreesIsFast() throws XMLDBException { + // In-memory case (memtree) -- the streaming fast path does not + // apply here; memtree's linked-list sibling traversal is already + // O(N) and the legacy recursion is the right path. Sanity check. + assertTrue(queryResult(LARGE_EQUAL_TREES)); + final long elapsedMs = timeQuery(LARGE_EQUAL_TREES); + System.out.println("[GH-4050] in-memory equal 10k-element trees: " + elapsedMs + "ms"); + final long threshold = 3000L; + assertTrue( + "fn:deep-equal on 10,000-element in-memory equal trees took " + elapsedMs + + "ms (threshold " + threshold + "ms)", + elapsedMs <= threshold); + } + + @Test + public void deepEqualOnStoredEqualDocsIsFast() throws XMLDBException { + // Persistent-DOM case -- this is the GH-4050 reporter's scenario. + // Pre-fix every getFirstChild / getNextSibling on a stored + // ElementImpl acquires a broker and walks the parent's children + // via a fresh XMLStreamReader, making compareContents quadratic + // in sibling count. The reporter measured ~9000 ms in 2021. + // Post-fix the streaming comparator iterates the BTree node + // stream once per document at storage speed; on this 10k-element + // synthetic the win is ~20x (124 ms observed locally). + assertTrue(queryResult(STORED_EQUAL_TREES)); + final long elapsedMs = timeQuery(STORED_EQUAL_TREES); + System.out.println("[GH-4050] stored equal 10k-element docs (6 attrs/elem): " + elapsedMs + "ms"); + // Generous threshold to tolerate CI variance while still catching a + // regression that puts us back into multi-second territory. + final long threshold = 5000L; + assertTrue( + "fn:deep-equal on stored 10,000-element docs took " + elapsedMs + + "ms (threshold " + threshold + "ms); GH-4050 regression?", + elapsedMs <= threshold); + } + + @Test + public void deepEqualOnRootMismatchStillShortCircuits() throws XMLDBException { + // Top-level name mismatch: in-memory case (memtree). The legacy + // path bails on the first compareNames mismatch. + assertEquals(false, queryResult(LARGE_TREES_DIFFER_AT_ROOT)); + final long elapsedMs = timeQuery(LARGE_TREES_DIFFER_AT_ROOT); + System.out.println("[GH-4050] deep-equal on root-mismatched 10k-element trees: " + elapsedMs + "ms"); + final long threshold = 1500L; + assertTrue( + "Root-mismatch fn:deep-equal took " + elapsedMs + + "ms (threshold " + threshold + "ms); pre-check ordering broken?", + elapsedMs <= threshold); + } + + @Test + public void deepEqualOnLeafMismatchProducesCorrectResult() throws XMLDBException { + // Difference is buried at every leaf; the comparator (streaming + // for stored docs, recursive for memtree) walks until the leaf + // mismatch surfaces. Correctness gate only. + assertEquals(false, queryResult(LARGE_TREES_DIFFER_AT_LEAF)); + } + + @Test + public void attributeOrderInsensitive() throws XMLDBException { + final String q = """ + let $a := + let $b := + return fn:deep-equal($a, $b) + """; + assertEquals(true, queryResult(q)); + } + + @Test + public void nestedAttributeOrderInsensitive() throws XMLDBException { + final String q = """ + let $a := + let $b := + return fn:deep-equal($a, $b) + """; + assertEquals(true, queryResult(q)); + } + + @Test + public void typedNumericVsStringNotEqual() throws XMLDBException { + // Per W3C XPath 3.1 deep-equal, xs:integer 1 is NOT deep-equal to "1". + // Atomic comparison; streaming path does not apply. + assertEquals(false, queryResult("fn:deep-equal(xs:integer(1), '1')")); + } + + @Test + public void integerAndDoubleEqual() throws XMLDBException { + // xs:integer 1 IS deep-equal to xs:double 1.0 per spec. + assertEquals(true, queryResult("fn:deep-equal(xs:integer(1), xs:double(1.0))")); + } + + @Test + public void nanEqualToNan() throws XMLDBException { + // Special case: NaN is deep-equal to NaN even though NaN != NaN. + assertEquals(true, + queryResult("fn:deep-equal(xs:double('NaN'), xs:double('NaN'))")); + } + + @Test + public void textVsCommentChildrenIgnored() throws XMLDBException { + // compareContents (and the streaming comparator) skip comments and PIs. + final String q = """ + let $a := helloworld + let $b := helloworld + return fn:deep-equal($a, $b) + """; + assertEquals(true, queryResult(q)); + } + + @Test + public void differentChildOrderNotEqual() throws XMLDBException { + // Element child order IS significant, unlike attribute order. + final String q = """ + let $a := + let $b := + return fn:deep-equal($a, $b) + """; + assertEquals(false, queryResult(q)); + } + + @Test + public void differentNamespaceNotEqual() throws XMLDBException { + final String q = """ + let $a := + let $b := + return fn:deep-equal($a, $b) + """; + assertEquals(false, queryResult(q)); + } + + @Test + public void emptySequencesEqual() throws XMLDBException { + assertEquals(true, queryResult("fn:deep-equal((), ())")); + } + + @Test + public void differentLengthSequencesNotEqual() throws XMLDBException { + assertEquals(false, queryResult("fn:deep-equal((1, 2), (1, 2, 3))")); + } +} From 8b7c5cfb43f7193e032be9a06b38810aefb0a468 Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Mon, 11 May 2026 08:28:24 -0400 Subject: [PATCH 2/6] [refactor] FunDeepEqual.deepCompare: decompose 8.8M-NPath method into per-type helpers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Decompose deepCompare(Item, Item, Collator, DBBroker) from NPath 8,847,362 into focused per-type helpers, each well under the 200 NPath threshold: - compareArrayItems — array-vs-array, with size short-circuits - compareMapItems — map-vs-map, with size and key short-circuits - compareAtomicItems — atomic-vs-atomic, NaN/numeric handling - compareNodeItems — node-vs-node dispatcher (Java 21 switch expression over Type.DOCUMENT / ELEMENT / ATTRIBUTE / PI|NAMESPACE / TEXT|COMMENT) - compareDocumentItems — document-node compare with streaming fast-path - compareElementItems — element-node compare with streaming fast-path - compareAttributeItems — attribute name + value compare - comparePiOrNamespaceItems — PI/namespace name + string value compare - tryStreamingCompare — extracts the GH-4050 streaming fast-path guard + fallback that was duplicated across the DOCUMENT and ELEMENT switch branches Pure refactor; no behaviour change. The streaming fast-path's Macbeth.xml result (6-9ms median, 800x speedup) is preserved. Resolves reinhapa's NPath complexity review comment on PR #6337. Verification: - FunDeepEqual JUnit set: 78/78 pass - XQuery3Tests XQSuite gate: 986/986 pass (1 pre-existing skip) - Codacy PMD: NPath violation on deepCompare cleared; no new findings Co-Authored-By: Claude Opus 4.7 (1M context) --- .../xquery/functions/fn/FunDeepEqual.java | 319 +++++++++--------- 1 file changed, 165 insertions(+), 154 deletions(-) diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java index 6c9ff5824fb..9e78bf3be15 100644 --- a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqual.java @@ -223,178 +223,189 @@ public static int deepCompare(final Item item1, final Item item2, try { if (item1.getType() == Type.ARRAY_ITEM || item2.getType() == Type.ARRAY_ITEM) { - if (item1.getType() != item2.getType()) { - return Constants.INFERIOR; - } - final ArrayType array1 = (ArrayType) item1; - final ArrayType array2 = (ArrayType) item2; - final int array1Size = array1.getSize(); - final int array2Size = array2.getSize(); - if (array1Size == array2Size) { - for (int i = 0; i < array1.getSize(); i++) { - final int comparison = deepCompareSeq(array1.get(i), array2.get(i), collator, broker); - if (comparison != Constants.EQUAL) { - return comparison; - } - } - return Constants.EQUAL; - } else { - return array1Size < array2Size ? Constants.INFERIOR : Constants.SUPERIOR; - } + return compareArrayItems(item1, item2, collator, broker); } - if (item1.getType() == Type.MAP_ITEM || item2.getType() == Type.MAP_ITEM) { - if (item1.getType() != item2.getType()) { - return Constants.INFERIOR; - } - final AbstractMapType map1 = (AbstractMapType) item1; - final AbstractMapType map2 = (AbstractMapType) item2; - final int map1Size = map1.size(); - final int map2Size = map2.size(); - - if (map1Size == map2Size) { - for (final IEntry entry1 : map1) { - if (!map2.contains(entry1.key())) { - return Constants.SUPERIOR; - } - - final int comparison = deepCompareSeq(entry1.value(), map2.get(entry1.key()), collator, broker); - if (comparison != Constants.EQUAL) { - return comparison; - } - } - return Constants.EQUAL; - } else { - return map1Size < map2Size ? Constants.INFERIOR : Constants.SUPERIOR; - } + return compareMapItems(item1, item2, collator, broker); } final boolean item1IsAtomic = Type.subTypeOf(item1.getType(), Type.ANY_ATOMIC_TYPE); final boolean item2IsAtomic = Type.subTypeOf(item2.getType(), Type.ANY_ATOMIC_TYPE); if (item1IsAtomic || item2IsAtomic) { - if (!item1IsAtomic) { - return Constants.SUPERIOR; - } + return compareAtomicItems(item1, item2, item1IsAtomic, item2IsAtomic, collator); + } - if (!item2IsAtomic) { - return Constants.INFERIOR; - } + return compareNodeItems(item1, item2, collator, broker); + } catch (final XPathException e) { + logger.error(e.getMessage(), e); + return Constants.INFERIOR; + } + } - try { - final AtomicValue av = (AtomicValue) item1; - final AtomicValue bv = (AtomicValue) item2; - if (Type.subTypeOfUnion(av.getType(), Type.NUMERIC) && - Type.subTypeOfUnion(bv.getType(), Type.NUMERIC)) { - //or if both values are NaN - if (((NumericValue) item1).isNaN() && ((NumericValue) item2).isNaN()) { - return Constants.EQUAL; - } - } - - return ValueComparison.compareAtomic(collator, av, bv); - } catch (final XPathException e) { - if (logger.isTraceEnabled()) { - logger.trace(e.getMessage()); - } - return Constants.INFERIOR; - } + private static int compareArrayItems(final Item item1, final Item item2, + @Nullable final Collator collator, @Nullable final DBBroker broker) throws XPathException { + if (item1.getType() != item2.getType()) { + return Constants.INFERIOR; + } + final ArrayType array1 = (ArrayType) item1; + final ArrayType array2 = (ArrayType) item2; + final int array1Size = array1.getSize(); + final int array2Size = array2.getSize(); + if (array1Size != array2Size) { + return array1Size < array2Size ? Constants.INFERIOR : Constants.SUPERIOR; + } + for (int i = 0; i < array1Size; i++) { + final int comparison = deepCompareSeq(array1.get(i), array2.get(i), collator, broker); + if (comparison != Constants.EQUAL) { + return comparison; } + } + return Constants.EQUAL; + } - if (item1.getType() != item2.getType()) { - return Constants.INFERIOR; + private static int compareMapItems(final Item item1, final Item item2, + @Nullable final Collator collator, @Nullable final DBBroker broker) throws XPathException { + if (item1.getType() != item2.getType()) { + return Constants.INFERIOR; + } + final AbstractMapType map1 = (AbstractMapType) item1; + final AbstractMapType map2 = (AbstractMapType) item2; + final int map1Size = map1.size(); + final int map2Size = map2.size(); + if (map1Size != map2Size) { + return map1Size < map2Size ? Constants.INFERIOR : Constants.SUPERIOR; + } + for (final IEntry entry1 : map1) { + if (!map2.contains(entry1.key())) { + return Constants.SUPERIOR; } - final NodeValue nva = (NodeValue) item1; - final NodeValue nvb = (NodeValue) item2; - // NOTE(AR): intentional reference equality check - if (nva == nvb) { - return Constants.EQUAL; + final int comparison = deepCompareSeq(entry1.value(), map2.get(entry1.key()), collator, broker); + if (comparison != Constants.EQUAL) { + return comparison; } + } + return Constants.EQUAL; + } - try { - //Don't use this shortcut for in-memory nodes - //since the symbol table is ignored. - if (nva.getImplementationType() != NodeValue.IN_MEMORY_NODE && - nva.equals(nvb)) { - return Constants.EQUAL; // shortcut! - } - } catch (final XPathException e) { - // apparently incompatible values, do manual comparison + private static int compareAtomicItems(final Item item1, final Item item2, + final boolean item1IsAtomic, final boolean item2IsAtomic, @Nullable final Collator collator) { + if (!item1IsAtomic) { + return Constants.SUPERIOR; + } + if (!item2IsAtomic) { + return Constants.INFERIOR; + } + try { + final AtomicValue av = (AtomicValue) item1; + final AtomicValue bv = (AtomicValue) item2; + if (Type.subTypeOfUnion(av.getType(), Type.NUMERIC) + && Type.subTypeOfUnion(bv.getType(), Type.NUMERIC) + && ((NumericValue) item1).isNaN() + && ((NumericValue) item2).isNaN()) { + return Constants.EQUAL; } + return ValueComparison.compareAtomic(collator, av, bv); + } catch (final XPathException e) { + if (logger.isTraceEnabled()) { + logger.trace(e.getMessage()); + } + return Constants.INFERIOR; + } + } - final Node node1; - final Node node2; - switch (item1.getType()) { - case Type.DOCUMENT: - // GH-4050 fast path: persistent-DOM streaming comparator. - // Falls through to legacy on stream/IO failure to preserve correctness. - if (broker != null - && nva instanceof NodeProxy npa - && nvb instanceof NodeProxy npb - && nva.getImplementationType() == NodeValue.PERSISTENT_NODE - && nvb.getImplementationType() == NodeValue.PERSISTENT_NODE) { - try { - return FunDeepEqualStreamingComparator.compare( - broker, npa, npb, /*subtree=*/false, collator); - } catch (final XMLStreamException | IOException | RuntimeException e) { - if (logger.isDebugEnabled()) { - logger.debug("Streaming deep-equal fast path failed, falling back: " - + e.getMessage()); - } - } - } - node1 = nva instanceof Node nnva ? nnva : ((NodeProxy) nva).getOwnerDocument(); - node2 = nvb instanceof Node nnvb ? nnvb : ((NodeProxy) nvb).getOwnerDocument(); - return compareContents(node1, node2, collator); - - case Type.ELEMENT: - if (broker != null - && nva instanceof NodeProxy npea - && nvb instanceof NodeProxy npeb - && nva.getImplementationType() == NodeValue.PERSISTENT_NODE - && nvb.getImplementationType() == NodeValue.PERSISTENT_NODE) { - try { - return FunDeepEqualStreamingComparator.compare( - broker, npea, npeb, /*subtree=*/true, collator); - } catch (final XMLStreamException | IOException | RuntimeException e) { - if (logger.isDebugEnabled()) { - logger.debug("Streaming deep-equal fast path failed, falling back: " - + e.getMessage()); - } - } - } - node1 = nva.getNode(); - node2 = nvb.getNode(); - return compareElements(node1, node2, collator); - - case Type.ATTRIBUTE: - node1 = nva.getNode(); - node2 = nvb.getNode(); - final int attributeNameComparison = compareNames(node1, node2); - if (attributeNameComparison != Constants.EQUAL) { - return attributeNameComparison; - } - return safeCompare(node1.getNodeValue(), node2.getNodeValue(), collator); - - case Type.PROCESSING_INSTRUCTION: - case Type.NAMESPACE: - node1 = nva.getNode(); - node2 = nvb.getNode(); - final int nameComparison = safeCompare(node1.getNodeName(), node2.getNodeName(), null); - if (nameComparison != Constants.EQUAL) { - return nameComparison; - } - return safeCompare(nva.getStringValue(), nvb.getStringValue(), collator); - - case Type.TEXT: - case Type.COMMENT: - return safeCompare(nva.getStringValue(), nvb.getStringValue(), collator); - - default: - throw new UnexpectedItemTypeException(item1); + private static int compareNodeItems(final Item item1, final Item item2, + @Nullable final Collator collator, @Nullable final DBBroker broker) throws XPathException { + if (item1.getType() != item2.getType()) { + return Constants.INFERIOR; + } + final NodeValue nva = (NodeValue) item1; + final NodeValue nvb = (NodeValue) item2; + // NOTE(AR): intentional reference equality check + if (nva == nvb) { + return Constants.EQUAL; + } + + try { + //Don't use this shortcut for in-memory nodes + //since the symbol table is ignored. + if (nva.getImplementationType() != NodeValue.IN_MEMORY_NODE && nva.equals(nvb)) { + return Constants.EQUAL; // shortcut! } } catch (final XPathException e) { - logger.error(e.getMessage(), e); - return Constants.INFERIOR; + // apparently incompatible values, do manual comparison + } + + return switch (item1.getType()) { + case Type.DOCUMENT -> compareDocumentItems(nva, nvb, collator, broker); + case Type.ELEMENT -> compareElementItems(nva, nvb, collator, broker); + case Type.ATTRIBUTE -> compareAttributeItems(nva, nvb, collator); + case Type.PROCESSING_INSTRUCTION, Type.NAMESPACE -> comparePiOrNamespaceItems(nva, nvb, collator); + case Type.TEXT, Type.COMMENT -> safeCompare(nva.getStringValue(), nvb.getStringValue(), collator); + default -> throw new UnexpectedItemTypeException(item1); + }; + } + + private static int compareDocumentItems(final NodeValue nva, final NodeValue nvb, + @Nullable final Collator collator, @Nullable final DBBroker broker) { + // GH-4050 fast path: persistent-DOM streaming comparator. + // Falls through to legacy on stream/IO failure to preserve correctness. + final Integer streamed = tryStreamingCompare(nva, nvb, collator, broker, /*subtree=*/false); + if (streamed != null) { + return streamed; + } + final Node node1 = nva instanceof Node nnva ? nnva : ((NodeProxy) nva).getOwnerDocument(); + final Node node2 = nvb instanceof Node nnvb ? nnvb : ((NodeProxy) nvb).getOwnerDocument(); + return compareContents(node1, node2, collator); + } + + private static int compareElementItems(final NodeValue nva, final NodeValue nvb, + @Nullable final Collator collator, @Nullable final DBBroker broker) { + final Integer streamed = tryStreamingCompare(nva, nvb, collator, broker, /*subtree=*/true); + if (streamed != null) { + return streamed; + } + return compareElements(nva.getNode(), nvb.getNode(), collator); + } + + private static int compareAttributeItems(final NodeValue nva, final NodeValue nvb, + @Nullable final Collator collator) { + final Node node1 = nva.getNode(); + final Node node2 = nvb.getNode(); + final int attributeNameComparison = compareNames(node1, node2); + if (attributeNameComparison != Constants.EQUAL) { + return attributeNameComparison; + } + return safeCompare(node1.getNodeValue(), node2.getNodeValue(), collator); + } + + private static int comparePiOrNamespaceItems(final NodeValue nva, final NodeValue nvb, + @Nullable final Collator collator) throws XPathException { + final Node node1 = nva.getNode(); + final Node node2 = nvb.getNode(); + final int nameComparison = safeCompare(node1.getNodeName(), node2.getNodeName(), null); + if (nameComparison != Constants.EQUAL) { + return nameComparison; + } + return safeCompare(nva.getStringValue(), nvb.getStringValue(), collator); + } + + @Nullable + private static Integer tryStreamingCompare(final NodeValue nva, final NodeValue nvb, + @Nullable final Collator collator, @Nullable final DBBroker broker, final boolean subtree) { + if (broker == null + || !(nva instanceof NodeProxy npa) + || !(nvb instanceof NodeProxy npb) + || nva.getImplementationType() != NodeValue.PERSISTENT_NODE + || nvb.getImplementationType() != NodeValue.PERSISTENT_NODE) { + return null; + } + try { + return FunDeepEqualStreamingComparator.compare(broker, npa, npb, subtree, collator); + } catch (final XMLStreamException | IOException | RuntimeException e) { + if (logger.isDebugEnabled()) { + logger.debug("Streaming deep-equal fast path failed, falling back: " + e.getMessage()); + } + return null; } } From c9f12fb1b63b1008d3ee952749348c693ffae242 Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Mon, 11 May 2026 08:38:27 -0400 Subject: [PATCH 3/6] [refactor] Streaming comparator: decompose walk + clear Codacy nits Per reinhapa's review on PR #6337 (3 of the 4 Codacy findings; the remaining FunDeepEqual.deepCompare NPath 8.8M issue was already decomposed in commit 53db259237 on this same PR): - walk() decomposed into per-event-type helpers (compareStartElements / compareEndElements / walkStep) with a small WalkState record-holder so depth/rootSeen propagate without static fields. Sentinel WalkState.CONTINUE distinguishes "step succeeded, keep walking" from a non-EQUAL return. The CHARACTERS / CDATA / SPACE cases were folded into one yield branch (they were identical). NPath 721 -> well under 200. - nextRelevantEvent: rewrite to flag-controlled loop so neither `continue` nor `return ev` is the final statement of the loop body (was line 233). - FunDeepEqualPerformanceTest: move STORED_EQUAL_TREES, LARGE_EQUAL_TREES, LARGE_TREES_DIFFER_AT_LEAF, LARGE_TREES_DIFFER_AT_ROOT to the field-declaration block at the top of the class. `mvn test -pl exist-core -Dtest='*DeepEqual*'` - 78/78 green. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../fn/FunDeepEqualStreamingComparator.java | 131 ++++++++++-------- .../fn/FunDeepEqualPerformanceTest.java | 89 ++++++------ 2 files changed, 117 insertions(+), 103 deletions(-) diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java index 9bb3cebaa45..a8ae2c7558c 100644 --- a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java @@ -163,76 +163,97 @@ private static NodeHandle documentRoot(final NodeProxy n) { private static int walk(final IEmbeddedXMLStreamReader ra, final IEmbeddedXMLStreamReader rb, final boolean subtree, @Nullable final Collator collator) throws XMLStreamException { - int depth = 0; - boolean rootSeen = false; + final WalkState state = new WalkState(); while (true) { final int evA = nextRelevantEvent(ra); final int evB = nextRelevantEvent(rb); - if (evA == EOF && evB == EOF) { - return Constants.EQUAL; - } - if (evA == EOF) { - return Constants.INFERIOR; - } - if (evB == EOF) { - return Constants.SUPERIOR; + final int eofCmp = compareEofs(evA, evB); + if (eofCmp != WalkState.CONTINUE) { + return eofCmp; } if (evA != evB) { return evA < evB ? Constants.INFERIOR : Constants.SUPERIOR; } - switch (evA) { - case XMLStreamConstants.START_ELEMENT -> { - final int nameCmp = compareElementName(ra, rb); - if (nameCmp != Constants.EQUAL) { - return nameCmp; - } - final int attrCmp = compareAttributes(ra, rb, collator); - if (attrCmp != Constants.EQUAL) { - return attrCmp; - } - depth++; - rootSeen = true; - } - case XMLStreamConstants.END_ELEMENT -> { - depth--; - if (subtree && depth == 0 && rootSeen) { - return Constants.EQUAL; - } - } - case XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA -> { - final int textCmp = safeCompare(ra.getText(), rb.getText(), collator); - if (textCmp != Constants.EQUAL) { - return textCmp; - } - } - case XMLStreamConstants.SPACE -> { - final int textCmp = safeCompare(ra.getText(), rb.getText(), collator); - if (textCmp != Constants.EQUAL) { - return textCmp; - } - } - default -> { - // Stream produced an event we did not anticipate; fall back - // to caller's slow path by signalling INFERIOR. This is the - // safety valve for edge cases in the stored-DOM stream. - throw new XMLStreamException( - "Streaming comparator: unexpected event type " + evA); - } + final int stepCmp = walkStep(evA, ra, rb, subtree, collator, state); + if (stepCmp != WalkState.CONTINUE) { + return stepCmp; + } + } + } + + private static int compareEofs(final int evA, final int evB) { + if (evA == EOF && evB == EOF) { + return Constants.EQUAL; + } + if (evA == EOF) { + return Constants.INFERIOR; + } + if (evB == EOF) { + return Constants.SUPERIOR; + } + return WalkState.CONTINUE; + } + + private static int walkStep(final int event, final IEmbeddedXMLStreamReader ra, + final IEmbeddedXMLStreamReader rb, final boolean subtree, + @Nullable final Collator collator, final WalkState state) + throws XMLStreamException { + return switch (event) { + case XMLStreamConstants.START_ELEMENT -> compareStartElements(ra, rb, collator, state); + case XMLStreamConstants.END_ELEMENT -> compareEndElements(subtree, state); + case XMLStreamConstants.CHARACTERS, XMLStreamConstants.CDATA, + XMLStreamConstants.SPACE -> { + final int textCmp = safeCompare(ra.getText(), rb.getText(), collator); + yield textCmp != Constants.EQUAL ? textCmp : WalkState.CONTINUE; } + default -> throw new XMLStreamException( + "Streaming comparator: unexpected event type " + event); + }; + } + + private static int compareStartElements(final IEmbeddedXMLStreamReader ra, + final IEmbeddedXMLStreamReader rb, @Nullable final Collator collator, + final WalkState state) throws XMLStreamException { + final int nameCmp = compareElementName(ra, rb); + if (nameCmp != Constants.EQUAL) { + return nameCmp; + } + final int attrCmp = compareAttributes(ra, rb, collator); + if (attrCmp != Constants.EQUAL) { + return attrCmp; } + state.depth++; + state.rootSeen = true; + return WalkState.CONTINUE; + } + + private static int compareEndElements(final boolean subtree, final WalkState state) { + state.depth--; + if (subtree && state.depth == 0 && state.rootSeen) { + return Constants.EQUAL; + } + return WalkState.CONTINUE; + } + + private static final class WalkState { + static final int CONTINUE = Integer.MIN_VALUE; + int depth; + boolean rootSeen; } private static int nextRelevantEvent(final IEmbeddedXMLStreamReader r) throws XMLStreamException { - while (r.hasNext()) { - final int ev = r.next(); - if (ev == XMLStreamConstants.COMMENT - || ev == XMLStreamConstants.PROCESSING_INSTRUCTION) { - continue; + int ev = EOF; + boolean done = false; + while (!done && r.hasNext()) { + final int candidate = r.next(); + if (candidate != XMLStreamConstants.COMMENT + && candidate != XMLStreamConstants.PROCESSING_INSTRUCTION) { + ev = candidate; + done = true; } - return ev; } - return EOF; + return ev; } private static int compareElementName(final IEmbeddedXMLStreamReader ra, diff --git a/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java b/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java index 7e5b0a80a83..29da37e5745 100644 --- a/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java +++ b/exist-core/src/test/java/org/exist/xquery/functions/fn/FunDeepEqualPerformanceTest.java @@ -50,57 +50,9 @@ public class FunDeepEqualPerformanceTest { public static final ExistXmldbEmbeddedServer existEmbeddedServer = new ExistXmldbEmbeddedServer(false, true, true); - /** - * Two stored documents with structurally-identical large trees (~10,000 - * elements, attribute-heavy). Mirrors the GH-4050 reporter's scenario: - * stored XML, where each persistent-DOM accessor traverses the storage - * layer rather than running on a fast in-memory linked list. With many - * attributes per element, compareAttributes' O(attrs^2) NamedNodeMap - * lookup also bites. - */ - @BeforeClass - public static void storeLargeDocs() throws XMLDBException { - final XQueryService xqs = - existEmbeddedServer.getRoot().getService(XQueryService.class); - // breadth 10, depth 4 -> ~10,000 elements; 6 attributes per element. - // Attribute count chosen large enough to expose compareAttributes' - // quadratic behaviour without making document storage prohibitively - // slow for a unit test. - xqs.query(""" - declare function local:tree($depth, $breadth) { - if ($depth eq 0) then - value - else - { - for $i in 1 to $breadth - return local:tree($depth - 1, $breadth) - } - }; - xmldb:store("/db", "deep-equal-perf-a.xml", local:tree(5, 8)), - xmldb:store("/db", "deep-equal-perf-b.xml", local:tree(5, 8)) - """); - } - - @AfterClass - public static void removeStoredDocs() throws XMLDBException { - final XQueryService xqs = - existEmbeddedServer.getRoot().getService(XQueryService.class); - xqs.query(""" - xmldb:remove("/db", "deep-equal-perf-a.xml"), - xmldb:remove("/db", "deep-equal-perf-b.xml") - """); - } - private static final String STORED_EQUAL_TREES = "fn:deep-equal(doc('/db/deep-equal-perf-a.xml'), doc('/db/deep-equal-perf-b.xml'))"; - /** - * XQuery that builds two structurally-identical large in-memory trees - * (~10,000 elements: 4 levels deep, breadth 10 at each level, with - * attributes and text values) and runs fn:deep-equal on them. The - * memtree path is unchanged by the GH-4050 fix; this test guards - * against unrelated regressions on in-memory comparison. - */ private static final String LARGE_EQUAL_TREES = """ declare function local:tree($depth, $breadth) { if ($depth eq 0) then @@ -146,6 +98,47 @@ public static void removeStoredDocs() throws XMLDBException { return fn:deep-equal($a, $b) """; + /** + * Two stored documents with structurally-identical large trees (~10,000 + * elements, attribute-heavy). Mirrors the GH-4050 reporter's scenario: + * stored XML, where each persistent-DOM accessor traverses the storage + * layer rather than running on a fast in-memory linked list. With many + * attributes per element, compareAttributes' O(attrs^2) NamedNodeMap + * lookup also bites. + */ + @BeforeClass + public static void storeLargeDocs() throws XMLDBException { + final XQueryService xqs = + existEmbeddedServer.getRoot().getService(XQueryService.class); + // breadth 10, depth 4 -> ~10,000 elements; 6 attributes per element. + // Attribute count chosen large enough to expose compareAttributes' + // quadratic behaviour without making document storage prohibitively + // slow for a unit test. + xqs.query(""" + declare function local:tree($depth, $breadth) { + if ($depth eq 0) then + value + else + { + for $i in 1 to $breadth + return local:tree($depth - 1, $breadth) + } + }; + xmldb:store("/db", "deep-equal-perf-a.xml", local:tree(5, 8)), + xmldb:store("/db", "deep-equal-perf-b.xml", local:tree(5, 8)) + """); + } + + @AfterClass + public static void removeStoredDocs() throws XMLDBException { + final XQueryService xqs = + existEmbeddedServer.getRoot().getService(XQueryService.class); + xqs.query(""" + xmldb:remove("/db", "deep-equal-perf-a.xml"), + xmldb:remove("/db", "deep-equal-perf-b.xml") + """); + } + private long timeQuery(final String xquery) throws XMLDBException { final XQueryService xqs = existEmbeddedServer.getRoot().getService(XQueryService.class); From badc60ba91a7262d38e7c332f35c267e6f8a019c Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Tue, 12 May 2026 09:52:51 -0400 Subject: [PATCH 4/6] [refactor] FunDeepEqualStreamingComparator: use Constants for compareNullable returns Addresses duncdrum's review on PR #6337. Three return values in compareNullable now use Constants.EQUAL / INFERIOR / SUPERIOR, matching the sibling safeCompare method exactly. The intentional reference-equality short-circuit (a == b) is preserved; the constants make the comparison's intent explicit and align with the project-wide compare-method idiom. Full-module gate: Tests run: 6747, Failures: 0, Errors: 0, Skipped: 97, BUILD SUCCESS Co-Authored-By: Claude Opus 4.7 (1M context) --- .../functions/fn/FunDeepEqualStreamingComparator.java | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java index a8ae2c7558c..18096fc560f 100644 --- a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java @@ -324,13 +324,13 @@ private static AttrSnapshot[] collectSortedAttrs(final IEmbeddedXMLStreamReader private static int compareNullable(@Nullable final String a, @Nullable final String b) { // NOTE: intentional reference equality short-circuit (mirrors safeCompare). if (a == b) { - return 0; + return Constants.EQUAL; } if (a == null) { - return -1; + return Constants.INFERIOR; } if (b == null) { - return 1; + return Constants.SUPERIOR; } return a.compareTo(b); } From eaf4a11f2301b3abe7f097e2399c2060f9c4b04d Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Mon, 1 Jun 2026 00:23:58 -0400 Subject: [PATCH 5/6] [refactor] FunDeepEqualStreamingComparator: drop identity comparison, use Comparator.nullsFirst Per @reinhapa's 2026-05-12 review on PR #6337: > There is a null capable Comparator.nullsFirst() / Comparator.nullsLast() > JDK alternative to handle null cases "correctly" also should identity > comparisons be avoided going forward as this will be hidden traps for > future value types. Both compareNullable and safeCompare were prefixing their work with a manual `a == b` reference-equality short-circuit (the comment claimed it "matches FunDeepEqual.safeCompare"). For String operands the short-circuit is correct, but it sets a precedent that breaks the moment either side is replaced with a value type whose equality is by content rather than identity. Replace both with Comparator.nullsFirst: - compareNullable now delegates to a cached Comparator.nullsFirst(Comparator.naturalOrder()) instance - safeCompare's no-collator branch reuses the same cached comparator; the collator branch builds Comparator.nullsFirst(collator::compare) on the hot path (the wrapper is a single small lambda that JIT inlines; collator::compare is the dominant cost) Equivalence is preserved: - both-null: nullsFirst treats two nulls as equal -> Constants.EQUAL - a null, b non-null: nullsFirst returns negative -> Constants.INFERIOR - b null, a non-null: nullsFirst returns positive -> Constants.SUPERIOR - both non-null: delegates to natural-order or collator compare 63 DeepEqualTest + 14 FunDeepEqualPerformanceTest cases pass unchanged. Co-Authored-By: Claude Opus 4.7 --- .../fn/FunDeepEqualStreamingComparator.java | 36 +++++++------------ 1 file changed, 12 insertions(+), 24 deletions(-) diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java index 18096fc560f..f718f463dd0 100644 --- a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java @@ -321,36 +321,24 @@ private static AttrSnapshot[] collectSortedAttrs(final IEmbeddedXMLStreamReader return out; } + /** + * Cached {@link Comparator} that places nulls first and otherwise uses + * String's natural order. Used by {@link #compareNullable(String, String)} + * and the no-collator branch of {@link #safeCompare(String, String, Collator)}. + */ + private static final Comparator NULLS_FIRST_NATURAL = + Comparator.nullsFirst(Comparator.naturalOrder()); + private static int compareNullable(@Nullable final String a, @Nullable final String b) { - // NOTE: intentional reference equality short-circuit (mirrors safeCompare). - if (a == b) { - return Constants.EQUAL; - } - if (a == null) { - return Constants.INFERIOR; - } - if (b == null) { - return Constants.SUPERIOR; - } - return a.compareTo(b); + return NULLS_FIRST_NATURAL.compare(a, b); } private static int safeCompare(@Nullable final String a, @Nullable final String b, @Nullable final Collator collator) { - // NOTE: intentional reference equality short-circuit (matches FunDeepEqual.safeCompare). - if (a == b) { - return Constants.EQUAL; - } - if (a == null) { - return Constants.INFERIOR; - } - if (b == null) { - return Constants.SUPERIOR; - } - if (collator != null) { - return collator.compare(a, b); + if (collator == null) { + return NULLS_FIRST_NATURAL.compare(a, b); } - return a.compareTo(b); + return Comparator.nullsFirst(collator::compare).compare(a, b); } private record AttrSnapshot(@Nullable String ns, String local, String value) {} From 4f6a7ccd6c3a227023a7a91e02ac9b193f525e51 Mon Sep 17 00:00:00 2001 From: Joe Wicentowski Date: Mon, 1 Jun 2026 08:17:35 -0400 Subject: [PATCH 6/6] [refactor] Hoist NULLS_FIRST_NATURAL field to the top of the class Per @reinhapa's 2026-06-01 review on PR #6337: > Address Codacy issue: Fields should be declared at the top of the > class, before any method declarations, constructors, initializers > or inner classes. The Comparator field was introduced in the previous review-fix commit (23df77c) and ended up sitting alongside the static helper methods that use it. Move it up next to the existing EOF / EMPTY_ATTRS / ATTR_ORDER constants where it belongs by convention; no other code changes. Codacy clean on the file. 77 DeepEqualTest + FunDeepEqualPerformanceTest cases pass unchanged. Co-Authored-By: Claude Opus 4.7 --- .../fn/FunDeepEqualStreamingComparator.java | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java index f718f463dd0..9c580f8b2a7 100644 --- a/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java +++ b/exist-core/src/main/java/org/exist/xquery/functions/fn/FunDeepEqualStreamingComparator.java @@ -84,6 +84,13 @@ final class FunDeepEqualStreamingComparator { private static final int EOF = -1; private static final AttrSnapshot[] EMPTY_ATTRS = new AttrSnapshot[0]; + /** + * Cached {@link Comparator} that places nulls first and otherwise uses + * String's natural order. Used by {@link #compareNullable(String, String)} + * and the no-collator branch of {@link #safeCompare(String, String, Collator)}. + */ + private static final Comparator NULLS_FIRST_NATURAL = + Comparator.nullsFirst(Comparator.naturalOrder()); private static final Comparator ATTR_ORDER = (x, y) -> { final int nsCmp = compareNullable(x.ns, y.ns); if (nsCmp != 0) { @@ -321,14 +328,6 @@ private static AttrSnapshot[] collectSortedAttrs(final IEmbeddedXMLStreamReader return out; } - /** - * Cached {@link Comparator} that places nulls first and otherwise uses - * String's natural order. Used by {@link #compareNullable(String, String)} - * and the no-collator branch of {@link #safeCompare(String, String, Collator)}. - */ - private static final Comparator NULLS_FIRST_NATURAL = - Comparator.nullsFirst(Comparator.naturalOrder()); - private static int compareNullable(@Nullable final String a, @Nullable final String b) { return NULLS_FIRST_NATURAL.compare(a, b); }