feat: parse kerchunk inline refs into inlined ChunkManifest entries #979
Merged
TomNicholas merged 5 commits into zarr-developers:main on Apr 24, 2026
Replaces the two tests that locked in NotImplementedError for inlined refs with in-memory tests that hand-craft a refs dict with one inlined chunk and one virtual chunk, and assert the exact bytes, offsets, and paths that should appear in the resulting ChunkManifest. Covers both kerchunk inline encodings (base64-prefixed and raw string) for the JSON parser, plus a parquet round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kerchunk represents inlined chunk data as either a raw string (interpreted as bytes) or a base64-encoded payload prefixed with 'base64:'. Previously the translator raised NotImplementedError for either; it now decodes both forms into a ChunkEntry with a 'data' field so the bytes flow through to ChunkManifest._inlined. Works for both the KerchunkJSONParser and KerchunkParquetParser since both share this translator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
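The two inline encodings described in this commit can be illustrated with a minimal sketch. This is not the translator code from the PR: `decode_inline_ref` is a hypothetical helper name, and encoding raw strings as UTF-8 is an assumption made for the illustration.

```python
import base64

# Prefix kerchunk uses to mark base64-encoded inline chunk data.
BASE64_PREFIX = "base64:"


def decode_inline_ref(ref: str) -> bytes:
    """Decode a kerchunk-style inline reference string into raw bytes.

    Hypothetical helper for illustration only.
    """
    if ref.startswith(BASE64_PREFIX):
        # Strip the prefix and decode the remaining base64 payload.
        return base64.b64decode(ref[len(BASE64_PREFIX):])
    # Raw-string form: the string itself is the data.
    # Assumption: it is converted to bytes with UTF-8.
    return ref.encode("utf-8")


encoded = BASE64_PREFIX + base64.b64encode(b"\x00\x01\x02").decode("ascii")
print(decode_inline_ref(encoded))
print(decode_inline_ref("hello"))
```

Keeping both branches in one function mirrors the point made above: a single decoding step can serve both parsers when they share the same translator.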
Constructs a refs dict with one base64-encoded inlined chunk plus one virtual chunk pointing at a file in a MemoryStore, parses it through KerchunkJSONParser, then awaits ManifestStore.get for each chunk key and asserts the bytes match. The inlined chunk is served directly from ChunkManifest._inlined while the virtual chunk is fetched via the object store registry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
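The difference between the two lookup paths in this test can be sketched with a toy, synchronous stand-in. Everything here is hypothetical (`FILES`, `REFS`, `get_chunk`, the chunk keys, and the `memory://data.bin` path); the real test awaits `ManifestStore.get` and resolves virtual chunks through the object store registry.

```python
import base64

# Hypothetical stand-in for a file held in an in-memory object store.
FILES = {"memory://data.bin": b"\x00\x01\x02\x03\x04\x05\x06\x07"}

# Kerchunk-style refs: an inline entry is a string, a virtual entry is
# a [path, offset, length] triple pointing into another file.
REFS = {
    "a/0": "base64:" + base64.b64encode(b"\xde\xad\xbe\xef").decode("ascii"),
    "a/1": ["memory://data.bin", 4, 4],
}


def get_chunk(key: str) -> bytes:
    """Return chunk bytes, serving inline data directly (toy sketch)."""
    ref = REFS[key]
    if isinstance(ref, str):
        # Inlined chunk: the bytes live in the reference itself.
        if ref.startswith("base64:"):
            return base64.b64decode(ref[len("base64:"):])
        return ref.encode("utf-8")  # assumption: raw strings are UTF-8
    # Virtual chunk: read the byte range from the referenced file.
    path, offset, length = ref
    return FILES[path][offset:offset + length]


print(get_chunk("a/0"))
print(get_chunk("a/1"))
```

The inline branch never touches `FILES`, which is the property the test checks by asserting the returned bytes for each key.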
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #979 +/- ##
=======================================
Coverage 89.91% 89.92%
=======================================
Files 33 33
Lines 2053 2054 +1
=======================================
+ Hits 1846 1847 +1
Misses 207 207
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 24, 2026
Summary
Teaches the kerchunk parsers (both JSON and Parquet) to decode inline references into `ChunkManifest._inlined` instead of raising `NotImplementedError`. Both the raw-string and `base64:`-prefixed forms of kerchunk inline data are supported, and the single translator change covers both parsers since they share the same pipeline.

Fixes the read half of #489: round-tripping a kerchunk-with-inlined-data file through `open_virtual_dataset(..., filetype="kerchunk")` now works. The write half (emitting inlined chunks from the kerchunk/icechunk writers) is deliberately left for a follow-up PR, so this PR mentions but does not close #489.

Only possible thanks to #938.
Test plan
- `manifest._inlined` bytes and `manifest.dict()`.
- `kerchunk.df.refs_to_dataframe`, same assertions.
- `ManifestStore.get` for both an inlined chunk key and a virtual chunk key returns the expected bytes.