feat: write inlined ChunkManifest entries to icechunk as native chunks (#981)
* Add failing test for writing inlined chunks to icechunk
The icechunk writer currently sends the INLINED_CHUNK_PATH sentinel
('__inlined__') straight into icechunk's set_virtual_refs_arr, which
rejects it as a malformed virtual URL. The new test writes a manifest
containing one inlined chunk plus one virtual chunk, commits, then
re-opens via xarray and asserts the values match end-to-end.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Support writing inlined ChunkManifest entries to icechunk
Icechunk's set_virtual_refs_arr rejects the INLINED_CHUNK_PATH sentinel
('__inlined__') as a malformed URL. write_manifest_to_icechunk now writes
inlined chunks first as native chunks via store.set, then rewrites those
positions to empty strings in the paths array before calling
set_virtual_refs_arr with the cleaned view. A cheap numpy-level check
skips the virtual-refs call entirely for all-inlined or all-missing
manifests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
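The write order described above can be sketched in plain Python. This is a minimal illustration, not the actual `write_manifest_to_icechunk` code: the helper name `write_manifest_sketch`, the flat list of paths, the `inlined` dict, the two callbacks standing in for `store.set` and `set_virtual_refs_arr`, and the example URL are all assumptions made for the sketch.

```python
INLINED_CHUNK_PATH = "__inlined__"  # sentinel quoted in the commit message


def write_manifest_sketch(paths, inlined, set_native, set_virtual_refs):
    """Hypothetical helper: route inlined entries to native writes.

    paths: flat list of chunk paths ("" means a missing chunk).
    inlined: dict mapping position -> bytes for inlined entries.
    set_native / set_virtual_refs: stand-in callbacks for the store calls.
    """
    cleaned = list(paths)
    for pos, path in enumerate(paths):
        if path == INLINED_CHUNK_PATH:
            set_native(pos, inlined[pos])  # write the bytes as a native chunk
            cleaned[pos] = ""  # hide the sentinel from the bulk call
    # Skip the bulk call when nothing virtual remains (all inlined or missing).
    if any(cleaned):
        set_virtual_refs(cleaned)
    return cleaned


# One inlined chunk plus one virtual chunk (the URL is made up).
native, bulk_calls = {}, []
cleaned = write_manifest_sketch(
    [INLINED_CHUNK_PATH, "s3://bucket/file.nc"],
    {0: b"\x01\x02"},
    native.__setitem__,
    bulk_calls.append,
)
```

In this sketch the bulk callback only ever sees empty strings or real paths, which is the property the commit relies on to keep icechunk from rejecting the sentinel.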
* Update broadcast description to reflect bytes replication
Broadcasting inlined chunks not only prepends singleton dims to their
keys, but also replicates the bytes (by reference) across every position
of an expanded axis, per the fix in #938.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
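The broadcast behaviour described above can be illustrated with a small sketch. It assumes inlined chunks are held in a dict keyed by chunk-grid index tuples; the function name `broadcast_inlined` and that representation are hypothetical and are not taken from the VirtualiZarr source.

```python
from itertools import product


def broadcast_inlined(inlined, old_shape, new_shape):
    """Hypothetical sketch: broadcast inlined-chunk keys to a new chunk grid.

    inlined: dict mapping chunk-index tuples to bytes.
    old_shape / new_shape: chunk-grid shapes before and after broadcasting.
    """
    n_new = len(new_shape) - len(old_shape)
    padded_shape = (1,) * n_new + tuple(old_shape)
    out = {}
    for key, data in inlined.items():
        padded_key = (0,) * n_new + tuple(key)  # prepend singleton dims
        # An axis of length 1 that grows is filled at every new position;
        # any other axis keeps the chunk's original index.
        axes = [
            range(new) if old == 1 and new > 1 else (k,)
            for k, old, new in zip(padded_key, padded_shape, new_shape)
        ]
        for new_key in product(*axes):
            out[new_key] = data  # same bytes object, no copy
    return out


expanded = broadcast_inlined({(0,): b"abc"}, old_shape=(1,), new_shape=(2, 1))
```

Because every new key maps to the same `bytes` object, the replication in this sketch costs no extra memory for the chunk data.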
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Inlined chunks participate in all manifest operations: concatenation and stacking shift their indices, broadcasting prepends singleton dimensions to their keys, equality compares the inlined bytes, pickling carries the data along (for Dask/multiprocessing), `ManifestStore` reads return them directly from memory, and `nbytes` includes their size.
+ Inlined chunks participate in all manifest operations: concatenation and stacking shift their indices, broadcasting both prepends singleton dimensions to their keys and replicates the bytes (by reference) across every position of an expanded axis, equality compares the inlined bytes, pickling carries the data along (for Dask/multiprocessing), `ManifestStore` reads return them directly from memory, and `nbytes` includes their size.
docs/releases.md (2 additions, 0 deletions)
@@ -6,6 +6,8 @@
- The kerchunk writer now serializes inlined `ChunkManifest` entries as kerchunk's `base64:`-prefixed inline form, rather than emitting broken `["__inlined__", 0, length]` triples. Together with the read-side support added in #979, this means a virtual dataset with inlined chunks can be round-tripped through both `to_kerchunk(format="json"/"parquet")` and the corresponding `KerchunkJSONParser`/`KerchunkParquetParser`.
By [Tom Nicholas](https://github.com/TomNicholas).
+ - The icechunk writer now handles `ChunkManifest` entries containing inlined chunk data. For arrays with no inlined chunks the existing fast bulk `set_virtual_refs_arr` path is unchanged; otherwise inlined positions are sent to icechunk as empty (missing) virtual refs and the inlined bytes are written separately as managed chunks. A virtual dataset with inlined chunks can now be `to_icechunk`'d and re-opened via `xr.open_zarr` without data loss.
+ By [Tom Nicholas](https://github.com/TomNicholas).
-`ChunkManifest` can now hold inlined chunks — raw chunk bytes carried directly in memory rather than as references to external files. Intended for parser authors (e.g., loading Kerchunk references with inlined data); not exposed via `loadable_variables`.