Is your feature request related to a problem? Please describe.
Zip files always retain an index (the central directory) located separately from each entry's possibly-compressed data. This allows performing high-level split/merge operations without de/recompressing file contents, which benchmarks faster than serially iterating over each entry to extract, or serially iterating over each file to compress.
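To make the split/merge idea concrete, here is a minimal sketch using a toy archive model. The `ToyArchive` and `IndexEntry` types are hypothetical and are not the real zip format or this crate's API. The point is only that, with the index kept apart from the entry bytes, merging and splitting reduce to bulk byte moves plus offset arithmetic, and the stored (possibly compressed) bytes are never touched.

```rust
/// Toy stand-in for one record of a zip index (hypothetical, not the real format).
#[derive(Debug, Clone, PartialEq)]
pub struct IndexEntry {
    pub name: String,
    pub offset: usize, // where this entry's bytes start in `data`
    pub len: usize,    // how many (possibly compressed) bytes it occupies
}

/// Toy archive: raw entry bytes plus a separately stored index.
#[derive(Debug, Default, Clone)]
pub struct ToyArchive {
    pub data: Vec<u8>,
    pub index: Vec<IndexEntry>,
}

impl ToyArchive {
    /// Append one entry whose bytes are assumed to be already compressed.
    pub fn push(&mut self, name: &str, bytes: &[u8]) {
        self.index.push(IndexEntry {
            name: name.to_string(),
            offset: self.data.len(),
            len: bytes.len(),
        });
        self.data.extend_from_slice(bytes);
    }

    /// Look up an entry's stored bytes through the index.
    pub fn get(&self, name: &str) -> Option<&[u8]> {
        let e = self.index.iter().find(|e| e.name == name)?;
        Some(&self.data[e.offset..e.offset + e.len])
    }

    /// Merge `other` into `self`: one bulk byte copy, then shift each offset.
    pub fn merge(&mut self, other: &ToyArchive) {
        let base = self.data.len();
        self.data.extend_from_slice(&other.data);
        self.index.extend(other.index.iter().map(|e| IndexEntry {
            name: e.name.clone(),
            offset: e.offset + base,
            len: e.len,
        }));
    }

    /// Split off the entries from index position `at` onward.
    /// Assumes entries are stored contiguously in index order, as `push`
    /// and `merge` produce them. Panics if `at` exceeds the entry count.
    pub fn split_off(&mut self, at: usize) -> ToyArchive {
        let cut = self.index.get(at).map_or(self.data.len(), |e| e.offset);
        let data = self.data.split_off(cut);
        let index = self
            .index
            .split_off(at)
            .into_iter()
            .map(|e| IndexEntry { offset: e.offset - cut, ..e })
            .collect();
        ToyArchive { data, index }
    }
}
```

Neither `merge` nor `split_off` inspects the entry contents, which is why no de/recompression is needed in this model.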
Describe the solution you'd like
It's possible to extract zip files in parallel (see #72) as well as merge them to create archives in parallel (see discussion in #73).
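As a rough illustration of parallel creation, the sketch below compresses chunks of the input on separate threads and then merges the partial results in input order. Everything here is an assumption made for illustration: `Part`, `build_part`, `build_parallel`, and the run-length `toy_compress` are hypothetical names standing in for real archives and real compression, and none of this is the API proposed in #72 or #73.

```rust
use std::thread;

/// Hypothetical stand-in for real compression (simple run-length encoding).
fn toy_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        let mut run = 1usize;
        while i + run < input.len() && input[i + run] == b && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

/// One partial archive: entry bytes plus (name, offset, len) index records.
#[derive(Default)]
pub struct Part {
    pub data: Vec<u8>,
    pub index: Vec<(String, usize, usize)>,
}

/// Compress a chunk of files serially into one partial archive.
fn build_part(files: &[(String, Vec<u8>)]) -> Part {
    let mut part = Part::default();
    for (name, contents) in files {
        let bytes = toy_compress(contents);
        part.index.push((name.clone(), part.data.len(), bytes.len()));
        part.data.extend_from_slice(&bytes);
    }
    part
}

/// Build parts on up to `workers` threads, then merge them in input order.
pub fn build_parallel(files: &[(String, Vec<u8>)], workers: usize) -> Part {
    let w = workers.max(1);
    // Ceiling division, clamped to 1 because `chunks(0)` would panic.
    let chunk = ((files.len() + w - 1) / w).max(1);
    let parts: Vec<Part> = thread::scope(|s| {
        // Spawn every worker first, then join them in the same order.
        let handles: Vec<_> = files
            .chunks(chunk)
            .map(|c| s.spawn(move || build_part(c)))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    // The merge is serial but cheap: append bytes and shift index offsets.
    let mut merged = Part::default();
    for part in parts {
        let base = merged.data.len();
        merged.data.extend_from_slice(&part.data);
        merged
            .index
            .extend(part.index.into_iter().map(|(n, off, len)| (n, off + base, len)));
    }
    merged
}
```

In this model the expensive work (compression) runs on the worker threads, and the final merge only moves bytes that are already compressed.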
Describe alternatives you've considered
While parallel zip extraction as in #72 has likely been implemented elsewhere, to my knowledge the parallel split/merge technique in #73 (researched for pex-tool/pex#2175 and prototyped in https://github.com/cosmicexplorer/medusa-zip) has not been discussed or implemented before in other zip tooling (please let me know of any prior art for this!).
Additional context
TODO:
- Send bounds)
- ZipWriter::merge_contents() already works with a single io::copy() call.
- Bulk copy with rename avoids de/recompression of file data, but must edit each renamed local file header and therefore requires O(n) io::copy() calls.
- The zip crate should probably not get into the weeds of crawling the filesystem, which keeps medusa-zip useful as a separate crate, and ensures we don't add too much extraneous code to this one.
- ZipWriter::merge_contents() can be parallelized, and this is something the zip crate should be able to do.
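The difference between one `io::copy()` call and O(n) calls can be sketched with a toy entry layout. The layout and the functions `write_entry`, `merge_verbatim`, and `merge_renamed` are hypothetical and are not `ZipWriter::merge_contents()` itself. They only mimic the fact that a local file header embeds the entry name next to the entry data, so a rename has to rewrite every header while a plain merge can copy the whole stream at once.

```rust
use std::io::{self, Read, Write};

/// Serialize one entry in a toy layout:
/// [name_len: u16 LE][name][data_len: u32 LE][data].
/// Lengths are truncated to fit; this is a sketch, not a real header.
pub fn write_entry<W: Write>(dst: &mut W, name: &str, data: &[u8]) -> io::Result<()> {
    dst.write_all(&(name.len() as u16).to_le_bytes())?;
    dst.write_all(name.as_bytes())?;
    dst.write_all(&(data.len() as u32).to_le_bytes())?;
    dst.write_all(data)
}

/// Merge without renaming: the whole source goes through one `io::copy()` call.
/// Returns the number of `io::copy()` calls made.
pub fn merge_verbatim<R: Read, W: Write>(src: &mut R, dst: &mut W) -> io::Result<usize> {
    io::copy(src, dst)?;
    Ok(1)
}

/// Merge while prepending `prefix` to every name. Each header is rewritten and
/// each entry's data is copied separately, so n entries need n `io::copy()` calls.
/// Returns the number of `io::copy()` calls made.
pub fn merge_renamed<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    prefix: &str,
) -> io::Result<usize> {
    let mut copies = 0;
    loop {
        // EOF while reading the name length is treated as the end of the input.
        let mut len2 = [0u8; 2];
        match src.read_exact(&mut len2) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
        let mut name = vec![0u8; u16::from_le_bytes(len2) as usize];
        src.read_exact(&mut name)?;
        let mut len4 = [0u8; 4];
        src.read_exact(&mut len4)?;
        let data_len = u32::from_le_bytes(len4) as u64;

        // Write the edited header: new name length, prefix + old name, same data length.
        let new_len = (prefix.len() + name.len()) as u16;
        dst.write_all(&new_len.to_le_bytes())?;
        dst.write_all(prefix.as_bytes())?;
        dst.write_all(&name)?;
        dst.write_all(&len4)?;

        // Copy the (possibly compressed) data bytes untouched.
        let copied = io::copy(&mut src.by_ref().take(data_len), dst)?;
        if copied != data_len {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated entry"));
        }
        copies += 1;
    }
    Ok(copies)
}
```

For a source with two entries, `merge_verbatim` reports 1 copy and `merge_renamed` reports 2. In both cases the entry data is passed through without de/recompression.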