feat: write inlined ChunkManifest entries to icechunk as native chunks#981

Merged
TomNicholas merged 3 commits into zarr-developers:main from TomNicholas:icechunk-write-inlined-chunks
Apr 25, 2026

Conversation

@TomNicholas
Member

Summary

Teaches the icechunk writer to handle ChunkManifest entries containing inlined chunk data. Without this, write_manifest_to_icechunk sends the "__inlined__" sentinel straight into icechunk's set_virtual_refs_arr, which rejects it with `error parsing virtual ref URL: "inlined"`.

New control flow (single bulk set_virtual_refs_arr call; fast path unchanged):

  1. If the manifest has inlined chunks, write them first as native chunks — one store.set(chunk_key, buffer) per entry, all dispatched in parallel under a single sync(asyncio.gather(...)) bridge.
  2. Rewrite those positions in the paths array from "__inlined__" to "" so they're treated as skipped-by-Rust positions on the subsequent call.
  3. A cheap numpy-level (virtual_paths != "").any() check then skips the set_virtual_refs_arr call entirely for all-inlined or all-missing manifests (no .tolist() allocation, no Python→Rust round trip).
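The path-rewriting and skip logic in steps 2–3 can be sketched at the numpy level. This is an illustrative sketch, not the actual `write_manifest_to_icechunk` code; the sentinel value and the helper name `clean_paths` are assumptions based on the PR text.

```python
import numpy as np

# Sentinel used by the ChunkManifest for inlined entries (per the PR text)
INLINED_CHUNK_PATH = "__inlined__"


def clean_paths(virtual_paths: np.ndarray) -> tuple[np.ndarray, bool]:
    """Rewrite inlined-sentinel positions to "" and report whether any
    genuinely virtual refs remain, i.e. whether set_virtual_refs_arr
    needs to be called at all."""
    cleaned = virtual_paths.copy()
    # step 2: inlined positions become "" so Rust treats them as skipped
    cleaned[cleaned == INLINED_CHUNK_PATH] = ""
    # step 3: cheap numpy check -- no .tolist(), no Python->Rust round trip
    return cleaned, bool((cleaned != "").any())


paths = np.array(["s3://bucket/a.nc", INLINED_CHUNK_PATH, ""])
cleaned, any_virtual = clean_paths(paths)

# an all-inlined manifest would skip the virtual-refs call entirely
_, skip_needed = clean_paths(np.array([INLINED_CHUNK_PATH, INLINED_CHUNK_PATH]))
```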

Together with #979 and #980, this completes end-to-end round-tripping of inlined chunks through both kerchunk and icechunk.

Test plan

  • End-to-end test writes a ManifestArray with one inlined chunk plus one virtual chunk, commits, re-opens via xr.open_zarr, and asserts the expected values on both positions.
  • All 39 existing icechunk writer tests still pass unchanged (they exercise the pure-virtual fast path).

@TomNicholas TomNicholas added the Icechunk 🧊 Relates to Icechunk library / spec label Apr 24, 2026
The icechunk writer currently sends the INLINED_CHUNK_PATH sentinel
('__inlined__') straight into icechunk's set_virtual_refs_arr, which
rejects it as a malformed virtual URL. The new test writes a manifest
containing one inlined chunk plus one virtual chunk, commits, then
re-opens via xarray and asserts the values match end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.97%. Comparing base (08f95c3) to head (4634a5f).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
+ Coverage   89.94%   89.97%   +0.02%     
==========================================
  Files          33       33              
  Lines        2058     2064       +6     
==========================================
+ Hits         1851     1857       +6     
  Misses        207      207              
Files with missing lines Coverage Δ
virtualizarr/writers/icechunk.py 91.66% <100.00%> (+0.30%) ⬆️

@TomNicholas TomNicholas force-pushed the icechunk-write-inlined-chunks branch from 3707363 to 5ffd01f on April 24, 2026 at 19:19
Icechunk's set_virtual_refs_arr rejects the INLINED_CHUNK_PATH sentinel
('__inlined__') as a malformed URL. write_manifest_to_icechunk now writes
inlined chunks first as native chunks via store.set, then rewrites those
positions to empty strings in the paths array before calling
set_virtual_refs_arr with the cleaned view. A cheap numpy-level check
skips the virtual-refs call entirely for all-inlined or all-missing
manifests.
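The "writes inlined chunks first as native chunks" step above dispatches one `store.set` per entry, gathered under a single event-loop bridge. A minimal sketch of that pattern, using a toy in-memory store (the `DictStore` class and `write_inlined_chunks` helper are assumptions for illustration, not the real icechunk store API):

```python
import asyncio


async def write_inlined_chunks(store, inlined: dict[str, bytes]) -> None:
    # One store.set(chunk_key, buffer) per inlined entry, all dispatched
    # in parallel under a single gather (mirroring the sync(asyncio.gather(...))
    # bridge described in the PR).
    await asyncio.gather(
        *(store.set(key, buf) for key, buf in inlined.items())
    )


class DictStore:
    """Toy stand-in for a zarr-style async store (assumption)."""

    def __init__(self) -> None:
        self.data: dict[str, bytes] = {}

    async def set(self, key: str, buf: bytes) -> None:
        self.data[key] = buf


store = DictStore()
asyncio.run(
    write_inlined_chunks(store, {"var/c/0": b"\x01", "var/c/1": b"\x02"})
)
```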

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@TomNicholas TomNicholas force-pushed the icechunk-write-inlined-chunks branch from 5ffd01f to ab8c067 on April 24, 2026 at 19:55
Broadcasting inlined chunks not only prepends singleton dims to their
keys, but also replicates the bytes (by reference) across every position
of an expanded axis, per the fix in zarr-developers#938.
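The by-reference replication described above can be illustrated with a numpy object array: broadcasting to a larger shape makes the same bytes object appear at every position along the expanded axis, without copying the payloads. (This is a behavioral sketch, not the manifest's actual internal representation.)

```python
import numpy as np

# An object array holding inlined chunk payloads, shape (1, 2)
inlined = np.empty((1, 2), dtype=object)
inlined[0, 0] = b"chunk-a"
inlined[0, 1] = b"chunk-b"

# Broadcasting along the size-1 axis replicates each payload by
# reference: every row shares the same underlying bytes objects.
expanded = np.broadcast_to(inlined, (3, 2))
```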

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@TomNicholas TomNicholas merged commit f518a07 into zarr-developers:main Apr 25, 2026
17 checks passed

Labels

Icechunk 🧊 Relates to Icechunk library / spec
