
Multi-wacz upload processing#3323

Open
emma-sg wants to merge 28 commits into main from issue-2814-multi-wacz-upload

Conversation

@emma-sg emma-sg commented May 20, 2026

Member

Closes #2814

Changes

This adds an upload post-processing stage that runs after an item is uploaded: in the same request as the upload if the file is smaller than 50 MiB, or in a background job if the file is larger.
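
To illustrate the size-based dispatch described above, here is a minimal sketch. It is not the code in this PR: `INLINE_LIMIT_BYTES`, `should_process_inline`, and `handle_uploaded` are hypothetical names, and `process` and `enqueue` stand in for whatever coroutines actually run the processing or schedule the background job. Treating a file of exactly 50 MiB as "background" is also an assumption.

```python
import asyncio

# Assumed threshold, taken from the 50 MiB figure in the description.
INLINE_LIMIT_BYTES = 50 * 1024 * 1024


def should_process_inline(size_bytes: int) -> bool:
    """Return True when the upload is small enough to process in the upload request."""
    return size_bytes < INLINE_LIMIT_BYTES


async def handle_uploaded(size_bytes, process, enqueue):
    """Hypothetical dispatcher: `process` and `enqueue` are caller-supplied coroutines."""
    if should_process_inline(size_bytes):
        # Small file: run post-processing before the upload request returns.
        await process()
        return "inline"
    # Large file: hand off to a background job and return immediately.
    await enqueue()
    return "background"
```

In the inline case the caller only gets a response once processing has finished, which is why small uploads would never be observed in a separate processing state.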

This step inspects the WACZ file and, if it finds that the file is nested (that is, it contains multiple WACZ files), expands the nested WACZs into separate files that are attached to the archived item. This also allows pages to be ingested correctly for nested uploaded items.
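
As a rough illustration of the nested-file check, the sketch below relies on the fact that a WACZ is a ZIP archive and assumes that nested WACZs appear as ZIP members whose names end in `.wacz`. The actual detection in this PR may work differently. The function names are hypothetical, and the example reads everything into memory, which a real implementation handling large uploads would likely avoid.

```python
import io
import zipfile


def nested_wacz_names(data: bytes) -> list[str]:
    """List the members of a ZIP archive whose names end in .wacz."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(".wacz")]


def expand_nested(data: bytes) -> dict[str, bytes]:
    """Return {member name: bytes} for each nested WACZ, or {} for a plain WACZ."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {
            n: zf.read(n)
            for n in zf.namelist()
            if n.lower().endswith(".wacz")
        }
```

An empty result from `expand_nested` would mean the upload is an ordinary WACZ and can be kept as a single file.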

The upload completion webhook is sent after this processing completes, so for larger uploads webhooks may be sent later than they previously were.

Also adds a new "upload-processing" archived item state, which displays in most places on the frontend the same way as "uploaded" does, but with a pulsing dot icon & a loading indicator for the page count.

I made the deliberate choice here not to include processing uploads in the archived item selector dialog window for collections, since I believe the page calculation there relies on the archived items being correctly split/having pages in the DB by then.

Testing

The backend changes here are pretty thoroughly integration-tested.

To test manually:

  1. Upload a small multi-wacz file (there's one in the test data files), and check that the page count is correct after uploading. This shouldn't show a separate "upload processing" state.
  2. Upload a larger multi-wacz file (you can grab one from one of the collections on dev if you like), and check that it shows up after uploading with an "upload processing" state. When this completes, check that the page count is correct.

@emma-sg emma-sg force-pushed the issue-2814-multi-wacz-upload branch 3 times, most recently from 417e564 to 8822375 Compare June 17, 2026 14:31
@emma-sg emma-sg force-pushed the issue-2814-multi-wacz-upload branch 4 times, most recently from 89e579b to a88251b Compare June 23, 2026 19:34
@emma-sg emma-sg requested review from ikreymer and tw4l June 23, 2026 20:29
@emma-sg emma-sg marked this pull request as ready for review June 23, 2026 20:29
@emma-sg emma-sg force-pushed the issue-2814-multi-wacz-upload branch from a88251b to 548e84d Compare June 24, 2026 16:45

Development

Successfully merging this pull request may close these issues.

WACZ-files dowloaded from Browsertrix and then uploaded to Browsertrix using "Upload WACZ" contains 0 pages

2 participants