Security: Prevent zip slip during archive extraction#5112
Security: Prevent zip slip during archive extraction#5112RishabhJain2018 wants to merge 11 commits into
Conversation
Validate zip member paths before extractall in the API, config utilities, and submission workers so archives cannot write outside the target directory. Co-authored-by: Rishabh Jain <rishabhjain2018@gmail.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR hardens ZIP extraction and file path handling against traversal attacks by introducing centralized validation utilities and routing all ZIP operations through them. A new ChangesZIP Extraction and Path-Traversal Security
Sequence DiagramsequenceDiagram
participant Client
participant create_challenge_endpoint
participant safe_extract_zip_file
participant safe_join_under_root
participant ChallengeConfigValidator
Client->>create_challenge_endpoint: POST challenge zip
create_challenge_endpoint->>safe_extract_zip_file: extract with member validation
safe_extract_zip_file->>safe_join_under_root: validate each member path
safe_join_under_root-->>safe_extract_zip_file: path safe or traversal error
safe_extract_zip_file-->>create_challenge_endpoint: extracted or BadZipFile
create_challenge_endpoint->>safe_join_under_root: resolve YAML path
safe_join_under_root-->>create_challenge_endpoint: safe path or PathTraversalError
create_challenge_endpoint->>ChallengeConfigValidator: load and validate config
ChallengeConfigValidator->>safe_join_under_root: resolve evaluation_script
ChallengeConfigValidator->>safe_join_under_root: resolve test annotations
ChallengeConfigValidator->>safe_join_under_root: resolve descriptions
ChallengeConfigValidator-->>create_challenge_endpoint: validated or unsafe path
create_challenge_endpoint-->>Client: HTTP 200 or HTTP 406
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/base/utils.py`:
- Around line 503-517: The ZIP extraction is protected but YAML-sourced paths
can still traverse out of the extracted tree; add a helper (e.g.,
safe_join_under_root(root, *parts)) that resolves and verifies the final path is
inside root (similar check to safe_extract_zip_file), then replace every place
that builds filesystem paths from YAML values (references to description,
evaluation_details, terms_and_conditions, submission_guidelines,
evaluation_script, test_annotation_file) to call
safe_join_under_root(extracted_root, yaml_value) before any
os.path.isfile/open/stat calls; ensure the helper raises on unsafe targets and
is used everywhere YAML-derived values are passed to isfile() or open().
In `@apps/challenges/challenge_config_utils.py`:
- Around line 96-103: The extract_zip_file function currently opens
zipfile.ZipFile(file_path, mode) into zip_ref and calls
safe_extract_zip_file(zip_ref, output_path) but only calls zip_ref.close() after
that, so if safe_extract_zip_file raises the handle remains open; wrap the
extraction call in a try/finally (or use a context manager) so zip_ref.close()
is always executed, i.e., ensure zip_ref is closed in the finally block even on
exceptions and return only after closing; reference symbols: extract_zip_file,
safe_extract_zip_file, zipfile.ZipFile, zip_ref.close().
In `@scripts/workers/remote_submission_worker.py`:
- Around line 137-140: The import of safe_extract_zip_file at module top-level
causes apps/base/utils.py (with Django/DRF top-level imports) to execute and
crash this worker; to fix, remove the top-level line "from base.utils import
safe_extract_zip_file" and instead perform a lazy/local import inside the
function that uses it (the code around zip_ref =
zipfile.ZipFile(download_location, "r") and safe_extract_zip_file(zip_ref,
extract_location)), or refactor safe_extract_zip_file into a small standalone
helper module with no Django dependencies and import that helper here; update
references to use the newly imported/local function name so extraction occurs
without importing Django during module import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: dd256f7a-b836-45ba-9cb2-0e94968634f5
📒 Files selected for processing (5)
apps/base/utils.pyapps/challenges/challenge_config_utils.pyapps/challenges/views.pyscripts/workers/remote_submission_worker.pyscripts/workers/submission_worker.py
| def safe_extract_zip_file(zip_ref, destination): | ||
| """ | ||
| Extract zip archive members while preventing path traversal (zip slip). | ||
| """ | ||
| destination_path = os.path.abspath(destination) | ||
| for member in zip_ref.namelist(): | ||
| member_path = os.path.abspath(os.path.join(destination_path, member)) | ||
| if not ( | ||
| member_path == destination_path | ||
| or member_path.startswith(destination_path + os.sep) | ||
| ): | ||
| raise zipfile.BadZipFile( | ||
| "Zip archive contains unsafe file paths." | ||
| ) | ||
| zip_ref.extractall(destination_path) |
There was a problem hiding this comment.
This still leaves a traversal path via YAML-referenced files.
safe_extract_zip_file() only validates archive member paths. In the challenge import flows, the extracted YAML is still used to build paths for files like description, evaluation_details, terms_and_conditions, submission_guidelines, evaluation_script, and test_annotation_file with plain join(..., yaml_value) calls. A config value containing ../ can still escape the extracted tree and turn this into an arbitrary file read even though extraction itself is now bounded. Please add a shared safe_join_under_root()-style helper and use it everywhere a YAML value becomes a filesystem path.
🔐 Suggested direction
+def safe_join_under_root(root, relative_path):
+ candidate = os.path.abspath(os.path.join(root, relative_path))
+ if not (
+ candidate == root
+ or candidate.startswith(root + os.sep)
+ ):
+ raise ValueError("Path escapes extraction root.")
+ return candidateUse that helper before every isfile() / open() on YAML-derived paths.
As per coding guidelines, apps/**: Django backend. Pay attention to: Safe handling of user-submitted files and Docker image references.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def safe_extract_zip_file(zip_ref, destination): | |
| """ | |
| Extract zip archive members while preventing path traversal (zip slip). | |
| """ | |
| destination_path = os.path.abspath(destination) | |
| for member in zip_ref.namelist(): | |
| member_path = os.path.abspath(os.path.join(destination_path, member)) | |
| if not ( | |
| member_path == destination_path | |
| or member_path.startswith(destination_path + os.sep) | |
| ): | |
| raise zipfile.BadZipFile( | |
| "Zip archive contains unsafe file paths." | |
| ) | |
| zip_ref.extractall(destination_path) | |
| def safe_extract_zip_file(zip_ref, destination): | |
| """ | |
| Extract zip archive members while preventing path traversal (zip slip). | |
| """ | |
| destination_path = os.path.abspath(destination) | |
| for member in zip_ref.namelist(): | |
| member_path = os.path.abspath(os.path.join(destination_path, member)) | |
| if not ( | |
| member_path == destination_path | |
| or member_path.startswith(destination_path + os.sep) | |
| ): | |
| raise zipfile.BadZipFile( | |
| "Zip archive contains unsafe file paths." | |
| ) | |
| zip_ref.extractall(destination_path) | |
| def safe_join_under_root(root, relative_path): | |
| candidate = os.path.abspath(os.path.join(root, relative_path)) | |
| if not ( | |
| candidate == root | |
| or candidate.startswith(root + os.sep) | |
| ): | |
| raise ValueError("Path escapes extraction root.") | |
| return candidate |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/base/utils.py` around lines 503 - 517, The ZIP extraction is protected
but YAML-sourced paths can still traverse out of the extracted tree; add a
helper (e.g., safe_join_under_root(root, *parts)) that resolves and verifies the
final path is inside root (similar check to safe_extract_zip_file), then replace
every place that builds filesystem paths from YAML values (references to
description, evaluation_details, terms_and_conditions, submission_guidelines,
evaluation_script, test_annotation_file) to call
safe_join_under_root(extracted_root, yaml_value) before any
os.path.isfile/open/stat calls; ensure the helper raises on unsafe targets and
is used everywhere YAML-derived values are passed to isfile() or open().
Move safe_extract_zip_file into base.zip_utils for workers without Django setup and close zip handles on extraction failure.
…on root. Add safe_join_under_root to block path traversal via challenge config values and ensure zip handles close on extraction failure in views and workers.
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/base/zip_utils.py`:
- Around line 13-19: The path validation in safe_join_under_root() function uses
os.path.abspath() with a string prefix check, which does not prevent
symlink-based path traversal attacks. Replace os.path.abspath() with
os.path.realpath() to resolve symlinks before validation, and replace the string
prefix check with os.path.commonpath() to robustly verify that the resolved
candidate path is contained within the root. Additionally, identify the
duplicate path validation logic in safe_extract_zip_file() (which the comment
indicates also applies to lines 26-34) and refactor it to reuse
safe_join_under_root() instead of reimplementing the same check, ensuring
consistent security enforcement across both functions.
In `@apps/challenges/challenge_config_utils.py`:
- Around line 943-959: The challenge_test_annotation_files list is not being
kept in sync with challenge_phases when a test annotation file is not found. In
the else branch after the isfile check for test_annotation_file_path, the code
appends an error message but does not append None to
self.files["challenge_test_annotation_files"], whereas other error branches do
append None. To maintain list alignment with challenge_phases and prevent
annotation files from being shifted onto wrong phases, append None to
self.files["challenge_test_annotation_files"] after appending the error message
in the else block when the annotation file is not found.
In `@apps/challenges/views.py`:
- Around line 1931-1934: The RuntimeError exceptions raised for unsafe file
paths in the PathTraversalError exception handler blocks are losing their error
message context because the outer exception handler at line 2175 does not read
or preserve the RuntimeError message text, causing requests to fall through to
the generic configuration error response instead of returning the specific
unsafe-path failure signal. Create a dedicated exception class (such as
UnsafeFilePathError) that inherits from an appropriate base exception, replace
both RuntimeError raises (in the blocks at lines 1931-1934 and 1947-1950) with
this new exception type, and update the outer handler at line 2175 to
specifically catch and handle this exception class while properly returning the
unsafe-path response and rolling back the transaction.
- Around line 1488-1490: The image lookup still bypasses the path-traversal
guard by using raw join() and isfile() calls without validating against
challenge_config_root. Apply the same root guard protection used for
challenge_config_root to the image file resolution to prevent malicious YAML
values like ../../shared/logo.png from traversing outside the extracted archive.
Ensure the resolved image path is validated to stay within the
challenge_config_root directory boundary before accessing it, using the same
guarding approach established for the configuration file handling.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: be10aef5-930d-40d8-a5c8-c31223a5d89a
📒 Files selected for processing (5)
apps/base/zip_utils.pyapps/challenges/challenge_config_utils.pyapps/challenges/views.pyscripts/workers/remote_submission_worker.pyscripts/workers/submission_worker.py
🚧 Files skipped from review as they are similar to previous changes (2)
- scripts/workers/submission_worker.py
- scripts/workers/remote_submission_worker.py
4076770 to
05ae194
Compare
Use realpath/commonpath for symlink-safe path checks, guard challenge image paths, keep annotation file lists aligned, return explicit unsafe-path errors, and add zip_utils unit tests.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/unit/base/test_zip_utils.py (1)
48-58: ⚡ Quick winAssert the all-or-nothing extraction contract.
This test only checks that
BadZipFileis raised. Becausesafe_extract_zip_file()is supposed to validate every member beforeextractall(), the test would still pass if a future change extracted safe members first and only then failed on the unsafe one. Add a mixed archive and assert nothing was written toextract_dir.Suggested test tightening
def test_rejects_unsafe_members(self): zip_path = os.path.join(self.root, "unsafe.zip") extract_dir = os.path.join(self.root, "unsafe-extracted") os.makedirs(extract_dir) with zipfile.ZipFile(zip_path, "w") as archive: + archive.writestr("ok.txt", "ok") archive.writestr("../evil.txt", "bad") with zipfile.ZipFile(zip_path, "r") as archive: with self.assertRaises(zipfile.BadZipFile): safe_extract_zip_file(archive, extract_dir) + + self.assertFalse(os.path.exists(os.path.join(extract_dir, "ok.txt")))As per coding guidelines,
**/test_*.pychanges should "Check that tests actually assert behavior, cover edge cases, and are not silently passing."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/base/test_zip_utils.py` around lines 48 - 58, The test_rejects_unsafe_members method only verifies that BadZipFile is raised but does not check that the all-or-nothing extraction contract is maintained. Strengthen the test by modifying the archive to contain both safe and unsafe members (e.g., add a safe file like "safe.txt" before the unsafe "../evil.txt" entry), then after the assertRaises block completes, add an assertion to verify that extract_dir is empty (using os.listdir or similar). This ensures the test fails if a future change extracts safe members before failing on the unsafe one, properly enforcing the all-or-nothing extraction guarantee.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/challenges/views.py`:
- Around line 2219-2223: The PathTraversalError exception handler returns a
response before reaching the shutil.rmtree(BASE_LOCATION) cleanup code, causing
the downloaded archive and extracted tree to remain on disk when a traversal
attempt is rejected, which creates a disk-space leak if the request is repeated.
Move the shutil.rmtree(BASE_LOCATION) cleanup out of the generic exception
branch by wrapping it in a finally block at the appropriate scope level, or use
a TemporaryDirectory wrapper, so the cleanup executes for all code paths
including the early return from the PathTraversalError handler, successful
executions, and other exception scenarios.
---
Nitpick comments:
In `@tests/unit/base/test_zip_utils.py`:
- Around line 48-58: The test_rejects_unsafe_members method only verifies that
BadZipFile is raised but does not check that the all-or-nothing extraction
contract is maintained. Strengthen the test by modifying the archive to contain
both safe and unsafe members (e.g., add a safe file like "safe.txt" before the
unsafe "../evil.txt" entry), then after the assertRaises block completes, add an
assertion to verify that extract_dir is empty (using os.listdir or similar).
This ensures the test fails if a future change extracts safe members before
failing on the unsafe one, properly enforcing the all-or-nothing extraction
guarantee.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: edbb1763-4266-4638-a3d5-673855f566ee
📒 Files selected for processing (4)
apps/base/zip_utils.pyapps/challenges/challenge_config_utils.pyapps/challenges/views.pytests/unit/base/test_zip_utils.py
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/base/zip_utils.py
- apps/challenges/challenge_config_utils.py
| except PathTraversalError: | ||
| response_data = { | ||
| "error": "Challenge configuration contains unsafe file paths." | ||
| } | ||
| return Response(response_data, status=status.HTTP_406_NOT_ACCEPTABLE) |
There was a problem hiding this comment.
Move temp-directory cleanup out of the generic-exception branch.
Line 2219 now returns before the only shutil.rmtree(BASE_LOCATION) block, so rejected traversal attempts leave the downloaded archive and extracted tree behind. Repeating that request is enough to turn the hardening path into a disk-space leak; cleanup needs to run from an outer finally or a TemporaryDirectory wrapper so it executes on success and every early return, not just the fallback except.
Suggested fix shape
- try:
+ try:
...
except PathTraversalError:
response_data = {
"error": "Challenge configuration contains unsafe file paths."
}
return Response(response_data, status=status.HTTP_406_NOT_ACCEPTABLE)
except: # noqa: E722
...
- finally:
- try:
- shutil.rmtree(BASE_LOCATION)
- logger.info("Zip folder is removed")
- except: # noqa: E722
- logger.exception(
- "Zip folder for challenge {} is not removed from {} "
- "location".format(challenge.pk, BASE_LOCATION)
- )
+ finally:
+ try:
+ shutil.rmtree(BASE_LOCATION)
+ logger.info("Zip folder is removed")
+ except Exception:
+ logger.exception(
+ "Zip folder is not removed from %s", BASE_LOCATION
+ )As per coding guidelines, apps/** changes should pay attention to "Safe handling of user-submitted files and Docker image references."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@apps/challenges/views.py` around lines 2219 - 2223, The PathTraversalError
exception handler returns a response before reaching the
shutil.rmtree(BASE_LOCATION) cleanup code, causing the downloaded archive and
extracted tree to remain on disk when a traversal attempt is rejected, which
creates a disk-space leak if the request is repeated. Move the
shutil.rmtree(BASE_LOCATION) cleanup out of the generic exception branch by
wrapping it in a finally block at the appropriate scope level, or use a
TemporaryDirectory wrapper, so the cleanup executes for all code paths including
the early return from the PathTraversalError handler, successful executions, and
other exception scenarios.
Source: Coding guidelines
Set challenge_config_location in config util tests and expect realpath- normalized extraction paths in remote submission worker tests.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5112 +/- ##
==========================================
+ Coverage 90.92% 90.96% +0.04%
==========================================
Files 110 110
Lines 8614 8644 +30
==========================================
+ Hits 7832 7863 +31
+ Misses 782 781 -1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report in Codecov by Harness.
🚀 New features to boost your workflow:
|
Cover PathTraversalError branches in zip_utils, challenge config validation, and create_challenge_using_zip_file so patch coverage meets Codecov thresholds.
Initialize test_annotation_file_path before phase validation so PathTraversalError handling does not leave the variable unset.
Summary
Mitigates zip slip (path traversal) when extracting challenge and submission archives.
Changes
safe_extract_zip_file()inapps/base/utils.pyto validate member paths before extraction.Security impact
Prevents malicious zip archives from writing files outside the intended extraction directory.
Slack Thread
Summary by CodeRabbit