[AURON #2257] Avoid URI reparsing in JNI Hadoop paths by zhtttylz · Pull Request #2264 · apache/auron

zhtttylz · 2026-05-12T03:56:00Z

Which issue does this PR close?

Rationale for this change

Auron's JNI Hadoop file wrappers currently reconstruct Hadoop paths with new Path(new URI(path)).
This round trip through java.net.URI does not preserve Hadoop Path(String) semantics before the path is passed back to FileSystem.

When a raw Hadoop path string contains a literal #, java.net.URI parsing treats everything after the # as a fragment, so the path handed to FileSystem is truncated.

For example, the intended path:

hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt

is opened as:

/auron-it-hdfs-rbf-repro/raw
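The truncation can be reproduced with java.net.URI alone, without any Hadoop classes. The minimal sketch below (UriFragmentDemo and its helpers are illustrative names, not Auron code) parses the example string the way the removed new URI(path) call did and prints the resulting pieces.

```java
import java.net.URI;

final class UriFragmentDemo {
    // Same parsing as new URI(raw): URI.create wraps the single-argument
    // constructor and rethrows URISyntaxException as IllegalArgumentException.
    static String pathAfterUriParse(String raw) {
        return URI.create(raw).getPath();
    }

    static String fragmentAfterUriParse(String raw) {
        return URI.create(raw).getFragment();
    }

    public static void main(String[] args) {
        String raw = "hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt";
        // The "#mini.txt" suffix is split off as a URI fragment ...
        System.out.println(fragmentAfterUriParse(raw)); // mini.txt
        // ... leaving only the truncated path component to be opened.
        System.out.println(pathAfterUriParse(raw));     // /auron-it-hdfs-rbf-repro/raw
    }
}
```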

What changes are included in this PR?

This PR stops reparsing Hadoop path strings through java.net.URI in JniBridge.
The path reconstruction is changed from:

- new Path(new URI(path))
+ new Path(path)

This preserves Hadoop Path(String) semantics.
Adds a regression test for JNI Hadoop file wrapper path handling when the path contains a literal #.
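For contrast, here is a rough sketch of what keeping Path(String) semantics means for a literal #. It assumes, based on our reading of Hadoop's Path, that Path(String) splits the string into scheme, authority and path itself and then builds its URI with the multi-argument java.net.URI constructor, which quotes a # inside the path instead of treating it as a fragment delimiter. MultiArgUriDemo and componentUri are hypothetical names, and only that final URI construction step is imitated here.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class MultiArgUriDemo {
    // Hypothetical helper. Assumption: Hadoop's Path(String) ends with a
    // multi-argument URI construction like this one, after splitting the
    // string into scheme, authority and path itself.
    static URI componentUri(String scheme, String authority, String path) {
        try {
            // The multi-argument constructor quotes '#' in the path
            // instead of reading it as a fragment delimiter.
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = componentUri("hdfs", "mycluster", "/auron-it-hdfs-rbf-repro/raw#mini.txt");
        System.out.println(uri.getPath());     // /auron-it-hdfs-rbf-repro/raw#mini.txt
        System.out.println(uri.getRawPath());  // /auron-it-hdfs-rbf-repro/raw%23mini.txt
        System.out.println(uri.getFragment()); // null
    }
}
```

In this model the decoded path keeps #mini.txt and the raw form carries it as %23, so no part of the file name is lost.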

Are there any user-facing changes?

This fixes a bug where Hadoop paths containing a literal # could be truncated.

No new APIs, configs, or migration steps are required.

How was this patch tested?

Ran the focused Java regression test:

mvn -pl auron-core -am -Pspark-3.5 -Pscala-2.12 -Ppre \
  -DskipBuildNative \
  -Dtest=org.apache.auron.jni.JniBridgeTest \
  test

Result:

Tests run: 1, Failures: 0, Errors: 0, Skipped: 0
BUILD SUCCESS

Copilot

Pull request overview

This PR fixes a bug in Auron’s JNI Hadoop file wrappers where paths containing a literal # could be truncated due to java.net.URI fragment parsing, and adds Java regression coverage to prevent recurrence.

Changes:

Adjusted JNI bridge path handling to avoid fragment truncation when # appears in the path string.
Added JniBridgeTest regression tests covering literal # handling and percent-encoding behavior.
Added a test-scoped Hadoop runtime dependency to support the new unit test.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `auron-core/src/main/java/org/apache/auron/jni/JniBridge.java` | Changes how input/output paths are converted to Hadoop `Path` objects to avoid `#` fragment truncation. |
| `auron-core/src/test/java/org/apache/auron/jni/JniBridgeTest.java` | Adds regression tests asserting `#` is preserved and that read/write path encoding behavior is stable. |
| `auron-core/pom.xml` | Adds `hadoop-client-runtime` as a test dependency to compile/run the new test. |


    public static FSDataInputWrapper openFileAsDataInputWrapper(FileSystem fs, String path) throws Exception {
-        // the path is a URI string, so we need to convert it to a URI object
-        return FSDataInputWrapper.wrap(fs.open(new Path(new URI(path))));
+        return FSDataInputWrapper.wrap(fs.open(toInputPath(path)));
    }


    public static FSDataOutputWrapper createFileAsDataOutputWrapper(FileSystem fs, String path) throws Exception {
-        return FSDataOutputWrapper.wrap(fs.create(new Path(new URI(path))));
+        return FSDataOutputWrapper.wrap(fs.create(new Path(path)));
+    }
+
+    private static Path toInputPath(String path) throws URISyntaxException {
+        String safePath = path.indexOf('#') >= 0 ? path.replace("#", "%23") : path;
+        return new Path(new URI(safePath));
    }
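One plausible reason the read path keeps a URI round trip: the removed comment describes the incoming path as a URI string, so other percent escapes may still need decoding. The sketch below copies the escaping logic of toInputPath from the diff into a plain-Java helper (decodedInputPath is a hypothetical name, and the URI's decoded path component stands in for what Hadoop's Path would hand to FileSystem) and runs it on the # example and on a percent-encoded space.

```java
import java.net.URI;

final class ToInputPathDemo {
    // Hypothetical stand-in for toInputPath: escape '#' the same way, parse
    // with the single-argument URI constructor, and return the decoded path.
    static String decodedInputPath(String path) {
        String safePath = path.indexOf('#') >= 0 ? path.replace("#", "%23") : path;
        return URI.create(safePath).getPath();
    }

    public static void main(String[] args) {
        // '#' survives: it is escaped to %23 and decoded back by getPath().
        System.out.println(decodedInputPath("hdfs://mycluster/auron-it-hdfs-rbf-repro/raw#mini.txt"));
        // prints /auron-it-hdfs-rbf-repro/raw#mini.txt

        // Other percent escapes are still decoded, e.g. %20 becomes a space.
        System.out.println(decodedInputPath("hdfs://mycluster/dir/a%20b.parquet"));
        // prints /dir/a b.parquet
    }
}
```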


ShreyeshArangath · 2026-05-15T22:47:18Z

+    }
+
+    private static Path toInputPath(String path) throws URISyntaxException {
+        String safePath = path.indexOf('#') >= 0 ? path.replace("#", "%23") : path;


This seems a little brittle; why do we need this? Can we do something like this?

public static FSDataInputWrapper openFileAsDataInputWrapper(FileSystem fs, String path) throws Exception {
    return FSDataInputWrapper.wrap(fs.open(new Path(path)));
}

yew1eb

The fix for the read path is inconsistent with the write path, and the toInputPath workaround has an edge-case bug.

yew1eb · 2026-05-16T07:29:33Z

+
+    private static Path toInputPath(String path) throws URISyntaxException {
+        String safePath = path.indexOf('#') >= 0 ? path.replace("#", "%23") : path;
+        return new Path(new URI(safePath));


Two issues here:

Inconsistent fix: createFileAsDataOutputWrapper was changed to new Path(path) (the correct simple fix), but openFileAsDataInputWrapper still goes through new URI(...) after escaping #. If new Path(path) is correct for writes, the same change should work for reads — the PR description itself says the fix is "change from new Path(new URI(path)) to new Path(path)". Why does the read path need a different approach?

Double-encoding bug: path.replace("#", "%23") will corrupt a path that already contains a literal %23 (i.e. a percent-encoded #) by turning it into %2523. If the simpler new Path(path) works for writes, applying it uniformly to reads as well would fix both issues at once.
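The %23 edge case can be checked with plain java.net.URI. In the sketch below (all helper names are hypothetical), String.replace only rewrites # characters, so a path that already contains %23 passes through escapeHash unchanged; the ambiguity appears in the next step, where URI decoding maps both raw%23mini.txt and raw#mini.txt to the same name. componentPathName approximates new Path(path), assuming Hadoop's Path(String) ends in the multi-argument URI constructor, which always quotes %.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class PercentEdgeCaseDemo {
    // Hypothetical copy of the toInputPath escaping step.
    static String escapeHash(String path) {
        return path.indexOf('#') >= 0 ? path.replace("#", "%23") : path;
    }

    // Decoded path after the single-argument URI parse used on the read path.
    static String readPathName(String path) {
        return URI.create(escapeHash(path)).getPath();
    }

    // Assumed Path(String)-like handling: the multi-argument constructor
    // always quotes '%', so "%23" stays literal text in the decoded path.
    static String componentPathName(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = "hdfs://mycluster/dir/raw%23mini.txt";
        String literal = "hdfs://mycluster/dir/raw#mini.txt";

        // replace() only rewrites '#', so an existing "%23" is left as-is ...
        System.out.println(escapeHash(encoded));   // hdfs://mycluster/dir/raw%23mini.txt
        // ... but URI decoding then maps both inputs to the same name.
        System.out.println(readPathName(encoded)); // /dir/raw#mini.txt
        System.out.println(readPathName(literal)); // /dir/raw#mini.txt
        // The component-based variant keeps "%23" as three literal characters.
        System.out.println(componentPathName("hdfs", "mycluster", "/dir/raw%23mini.txt"));
        // prints /dir/raw%23mini.txt
    }
}
```

Under that assumption, the uniform new Path(path) approach keeps the two names distinct, but it would also stop decoding escapes such as %20, so whether it is a drop-in replacement depends on what the native side actually passes in.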

[AURON #2257] Avoid URI reparsing in JNI Hadoop paths

a507d0e

github-actions Bot added build core labels May 12, 2026

zhtttylz marked this pull request as draft May 12, 2026 12:04

Fix JNI Hadoop path decoding

0c4fc47

zhtttylz marked this pull request as ready for review May 14, 2026 08:38

cxzl25 requested a review from Copilot May 14, 2026 08:58

Copilot started reviewing on behalf of cxzl25 May 14, 2026 08:58

Copilot AI reviewed May 14, 2026


ShreyeshArangath reviewed May 15, 2026


yew1eb reviewed May 16, 2026


zhtttylz wants to merge 2 commits into apache:master from zhtttylz:fix-hadoop-fs-path-hash

4 participants
