Skip to content

[CELEBORN-2317] Validate applicationId to prevent worker path traversal#3674

Open
afterincomparableyum wants to merge 1 commit into
apache:mainfrom
afterincomparableyum:celeborn-2317
Open

[CELEBORN-2317] Validate applicationId to prevent worker path traversal#3674
afterincomparableyum wants to merge 1 commit into
apache:mainfrom
afterincomparableyum:celeborn-2317

Conversation

@afterincomparableyum
Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

This change:

  • Adds Utils.validateAppId enforcing ^[A-Za-z0-9_-]+$, which matches Spark (application_<ts>_<n>, local-<ts>), Flink, and MR formats.
  • Calls it at every worker RPC entry point that takes an applicationId or shuffleKey from the wire: Controller (ReserveSlots, CommitFiles, DestroyWorkerSlots), PushDataHandler.handleCore, and the two checkAuth sites in FetchHandler.
  • Adds a canonical path containment check in StorageManager.createDiskFile (local disk branch) as defense in depth, before mkdirs() runs.

Why are the changes needed?

The worker builds local shuffle paths by concatenating applicationId received over RPC: <workingDir>/<appId>/<shuffleId>/<fileName>. The int32 and fileName is built from int id/epoch/mode, but it was never validated against a charset. With auth disabled, any client on the network could supply appId = "../foo" and have the worker mkdir, create, or delete files outside its working dir. With auth enabled, the SASL clientId == applicationId equality check in RpcEndpoint did not constrain format, so a tenant whose registered id contained .. could still escape.

Does this PR resolve a correctness bug?

Yes

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests/CI

@afterincomparableyum
Copy link
Copy Markdown
Contributor Author

I'll fix CI issues and push

The worker builds local shuffle paths by concatenating applicationId received over RPC: `<workingDir>/<appId>/<shuffleId>/<fileName>`. The int32 and fileName is built from int id/epoch/mode, but it was never validated against a charset. With auth disabled, any client on the network could supply `appId = "../foo"` and have the worker mkdir, create, or delete files outside its working dir. With auth enabled, the SASL clientId == applicationId equality check in RpcEndpoint did not constrain format, so a tenant whose registered id contained `..` could still escape.

This change:
  - Adds Utils.validateAppId enforcing `^[A-Za-z0-9_-]+$`, which matches
    Spark (`application_<ts>_<n>`, `local-<ts>`), Flink, and MR formats.
  - Calls it at every worker RPC entry point that takes an applicationId
    or shuffleKey from the wire: Controller (ReserveSlots, CommitFiles,
    DestroyWorkerSlots), PushDataHandler.handleCore, and the two
    checkAuth sites in FetchHandler.
  - Adds a canonical path containment check in
    StorageManager.createDiskFile (local disk branch) as defense in
    depth, before mkdirs() runs.
@RexXiong
Copy link
Copy Markdown
Contributor

RexXiong commented May 14, 2026

Thanks for the fix! The path traversal issue is real and the approach is solid.

One suggestion: I think the validateAppId check is only needed in Controller (ReserveSlots, CommitFiles, DestroyWorkerSlots). The reason is that PushData and Fetch operations require the shuffle to already be registered via ReserveSlots — if the appId was never accepted by the Controller, subsequent push/fetch calls will fail anyway because the shuffle metadata doesn't exist.

So the checks added in PushDataHandler.handleCore and FetchHandler are effectively redundant — a malicious appId can never reach the filesystem layer through those paths without first passing through the Controller.

That said, the canonical path containment check in StorageManager is worth keeping as defense-in-depth — it's cheap and guards against potential future RPC entry points that might bypass the Controller.


Reviewed with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants