Multi-wacz upload processing #3323
Open
emma-sg wants to merge 28 commits into
- add and organize missing and new imports
- pass correct `child_wacz.filename` into `FilePreparer`
- add a bunch more logging and format
also delete original files after processing is complete
processing completes
Closes #2814
Changes
This adds an upload post-processing stage that runs after an item is uploaded: in the same request as the upload if the file is smaller than 50 MiB, or in a background job if the file is larger.
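As a rough illustration of the size-based dispatch described above, here is a minimal sketch. The names `INLINE_LIMIT_BYTES` and `choose_processing_mode` are hypothetical and not taken from this PR, and treating a file of exactly 50 MiB as a background job is an assumption.

```python
# Hypothetical names for illustration; not the identifiers used in this PR.
INLINE_LIMIT_BYTES = 50 * 1024 * 1024  # 50 MiB threshold from the description


def choose_processing_mode(file_size: int) -> str:
    """Return "inline" for uploads under the limit, otherwise "background".

    Assumption: a file of exactly 50 MiB goes to the background job.
    """
    return "inline" if file_size < INLINE_LIMIT_BYTES else "background"
```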
This step inspects the WACZ file and, if it finds that it is a nested file (with multiple WACZ files inside it), expands the nested WACZs into separate files that are attached to the archived item. This also allows pages to be correctly ingested for nested uploaded items.
The upload completion webhook is sent after this processing completes, so for larger uploads webhooks may be sent later than they previously would have been.
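To make the idea of nested-WACZ expansion concrete, here is a minimal sketch using only the Python standard library. It assumes that nested WACZs appear as `.wacz` members of the outer zip archive; the actual detection logic in this PR may differ, and `expand_nested_waczs` is a hypothetical helper name.

```python
import io
import zipfile


def expand_nested_waczs(data: bytes) -> dict[str, bytes]:
    """Map each nested `.wacz` member name to its raw bytes.

    Assumption: a nested upload is an outer zip whose members include
    one or more `.wacz` files. An ordinary WACZ yields an empty dict.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as outer:
        return {
            name: outer.read(name)
            for name in outer.namelist()
            if name.lower().endswith(".wacz")
        }
```

An empty result would mean the upload is treated as a single WACZ; a non-empty one gives the child files to store separately.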
Also adds a new "upload-processing" archived item state, which displays in most places on the frontend the same way as "uploaded" does, but with a pulsing dot icon & a loading indicator for the page count.
I made the deliberate choice here not to include processing uploads in the archived item selector dialog window for collections, since I believe the page calculation there relies on the archived items being correctly split/having pages in the DB by then.
Testing
The backend changes here are pretty thoroughly integration-tested.
To test manually: