Commit 3194b09
feat: generalize ChunkManifest to hold inline chunks (#938)
* feat: generalize ChunkManifest to hold native chunks
* Rename native to inlined
* Move docs to explanation
* Rename data to inlined_data
* Better sentinel values
* Improve required entry validation
* Add scalar test
* Revert changes that should be a separate PR
* Fix mypy: avoid narrowing StringDType on np.where reassignment
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Revert icechunk writer changes; handle inlined chunks in a follow-up PR
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Move inlined chunks docs into data_structures.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add failing tests for broadcasting manifests with inlined chunks
Broadcast should replicate inlined chunk bytes to every position along an
expanded axis, matching the behaviour already observed for virtual chunks.
Three of the four new tests fail under the current implementation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Replicate inlined chunks across expanded axes in broadcast_to
Previously _broadcast_manifest only prepended singleton dimensions to
inlined chunk keys, leaving a single dict entry even when np.broadcast_to
expanded an axis. Reads at the replicated positions would find the
INLINED_CHUNK_PATH sentinel in the paths array but miss the _inlined dict,
producing broken behaviour in ManifestStore.get.
Now we replicate each inlined entry to every target position along any
axis that was size 1 in the source, mirroring how the paths/offsets/lengths
arrays are broadcast. The bytes themselves are shared by reference, not
copied.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
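The replication described above can be sketched roughly as follows. This is an illustrative stand-in, not the code in `_broadcast_manifest`: the function name `broadcast_inlined`, the use of integer-tuple chunk keys, and the plain-dict representation are all assumptions made for the example.

```python
import itertools


def broadcast_inlined(inlined, source_shape, target_shape):
    """Replicate inlined chunk entries across a broadcast chunk grid.

    Hypothetical helper: `inlined` maps integer-tuple chunk keys to bytes,
    and the shapes are chunk-grid shapes (number of chunks per axis).
    """
    n_new = len(target_shape) - len(source_shape)
    if n_new < 0:
        raise ValueError("cannot broadcast to fewer dimensions")

    # Left-pad the source shape with 1s, following NumPy broadcasting rules.
    padded_shape = (1,) * n_new + tuple(source_shape)
    for src, tgt in zip(padded_shape, target_shape):
        if src != 1 and src != tgt:
            raise ValueError(f"cannot broadcast {source_shape} to {target_shape}")

    result = {}
    for key, data in inlined.items():
        padded_key = (0,) * n_new + tuple(key)
        # An axis of size 1 in the source expands to every target index;
        # any other axis keeps the entry's original index.
        per_axis = [
            range(tgt) if src == 1 else (idx,)
            for idx, src, tgt in zip(padded_key, padded_shape, target_shape)
        ]
        for new_key in itertools.product(*per_axis):
            result[new_key] = data  # same bytes object, shared by reference
    return result
```

Broadcasting a single inlined chunk from a grid of shape `(1,)` to `(3, 1)` yields three entries in this sketch, all pointing at the same bytes object.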
* Add tests for concat and stack with inlined chunks
Locks in the existing behaviour of _concat_manifests and _stack_manifests
for manifests containing inlined chunks: keys are shifted along the concat
axis or gain the stack-axis index, and bytes are shared by reference rather
than copied.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
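As a rough illustration of the key handling described above, the sketch below shifts or extends integer-tuple keys while reusing the same bytes objects. The helper names `concat_inlined` and `stack_inlined` and the dict-of-tuples representation are assumptions for the example, not the signatures of `_concat_manifests` or `_stack_manifests`.

```python
def concat_inlined(parts, axis):
    """Merge inlined-chunk dicts for manifests concatenated along `axis`.

    Hypothetical helper: `parts` is a list of (inlined_dict, chunk_grid_shape)
    pairs, in concatenation order.
    """
    merged = {}
    offset = 0
    for inlined, grid_shape in parts:
        for key, data in inlined.items():
            # Shift the index along the concat axis by the chunks already placed.
            new_key = key[:axis] + (key[axis] + offset,) + key[axis + 1:]
            merged[new_key] = data  # shared by reference, not copied
        offset += grid_shape[axis]
    return merged


def stack_inlined(inlined_dicts, axis):
    """Merge inlined-chunk dicts for manifests stacked along a new `axis`."""
    merged = {}
    for position, inlined in enumerate(inlined_dicts):
        for key, data in inlined.items():
            # Each key gains the manifest's position as its stack-axis index.
            new_key = key[:axis] + (position,) + key[axis:]
            merged[new_key] = data  # shared by reference, not copied
    return merged
```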
* Add bytes-identity test for broadcasting inlined chunks
Confirms replicated entries share the same bytes object rather than
allocating copies at each expanded position.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add failing test for ManifestArray equality with differing inlined bytes
When two ManifestArrays share paths/offsets/lengths but have different
inlined chunk data, ManifestArray.__eq__ falls through to its 'over-cautious'
fallback via ChunkManifest.elementwise_eq, which does not currently compare
inlined bytes. That triggers RuntimeWarning('Should not be possible to get here').
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Compare inlined bytes in ChunkManifest.elementwise_eq
Previously elementwise_eq only looked at paths/offsets/lengths, which all
agree for inlined chunks even when their bytes differ. That let two
ChunkManifests disagree per __eq__ but look identical per elementwise_eq,
tripping the 'Should not be possible to get here' branch in
ManifestArray.__eq__.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
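A minimal sketch of the idea, assuming each manifest is represented as a plain dict holding NumPy arrays for paths, offsets, and lengths plus an `inlined` dict keyed by integer tuples. The function name `elementwise_eq_sketch` and the `"<inlined>"` placeholder are hypothetical, not the library's actual sentinel or method body.

```python
import numpy as np


def elementwise_eq_sketch(a, b):
    """Compare two manifest-like dicts chunk by chunk, including inlined bytes."""
    # Start from the array comparison, which is all the old check looked at.
    eq = (
        (a["paths"] == b["paths"])
        & (a["offsets"] == b["offsets"])
        & (a["lengths"] == b["lengths"])
    )
    # Inlined chunks carry identical path/offset/length entries, so the
    # bytes must be compared separately at every inlined position.
    for key in a["inlined"].keys() | b["inlined"].keys():
        if a["inlined"].get(key) != b["inlined"].get(key):
            eq[key] = False
    return eq


def make_manifest(data):
    # "<inlined>" is a placeholder sentinel chosen for this example only.
    return {
        "paths": np.array(["<inlined>", "s3://bucket/file.nc"]),
        "offsets": np.array([0, 100]),
        "lengths": np.array([3, 50]),
        "inlined": {(0,): data},
    }
```

With this sketch, two manifests that differ only in the bytes of chunk `(0,)` compare unequal at that position and equal elsewhere.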
* Add ManifestStore read tests for inlined chunks
Covers the inlined-chunk branch in ManifestStore.get including byte-range
variants (RangeByteRequest, OffsetByteRequest, SuffixByteRequest), a mixed
manifest where inlined and virtual chunks are served from the same array,
and list_dir enumeration of inlined chunk keys.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
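Serving a byte range from an inlined chunk amounts to slicing the in-memory bytes. The sketch below uses local stand-in dataclasses modeled on the request type names mentioned above; their field names and the helper `slice_inlined` are assumptions for illustration and do not reproduce `ManifestStore.get`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeByteRequest:
    start: int
    end: int  # exclusive


@dataclass(frozen=True)
class OffsetByteRequest:
    offset: int


@dataclass(frozen=True)
class SuffixByteRequest:
    suffix: int


def slice_inlined(data, byte_range=None):
    """Return the requested portion of an inlined chunk's bytes."""
    if byte_range is None:
        return data
    if isinstance(byte_range, RangeByteRequest):
        return data[byte_range.start:byte_range.end]
    if isinstance(byte_range, OffsetByteRequest):
        return data[byte_range.offset:]
    if isinstance(byte_range, SuffixByteRequest):
        # Clamp at 0 so an oversized suffix returns the whole chunk, and a
        # zero-length suffix returns empty bytes rather than everything.
        return data[max(len(data) - byte_range.suffix, 0):]
    raise TypeError(f"unsupported byte range: {byte_range!r}")
```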
* Smoke test that to_virtual_variable preserves inlined chunks
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Reject ChunkManifest entries with extra keys
Validation previously used a subset check, which silently accepted entries
with unknown keys alongside the required path/offset/length. Now the entry
key set must match exactly one of the two valid shapes: virtual ({path,
offset, length}) or inlined ({path, offset, length, data}).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
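The difference between a subset check and an exact match can be sketched as below. The key names follow the two shapes listed above; the function name `classify_entry` and the error message are hypothetical and not taken from the library.

```python
VIRTUAL_KEYS = frozenset({"path", "offset", "length"})
INLINED_KEYS = VIRTUAL_KEYS | {"data"}


def classify_entry(entry):
    """Return "virtual" or "inlined", rejecting any other key set."""
    keys = frozenset(entry)
    # Exact equality, not `VIRTUAL_KEYS <= keys`, so unknown extra keys fail.
    if keys == VIRTUAL_KEYS:
        return "virtual"
    if keys == INLINED_KEYS:
        return "inlined"
    raise ValueError(
        f"entry keys {sorted(keys)} must be exactly {sorted(VIRTUAL_KEYS)} "
        f"or {sorted(INLINED_KEYS)}"
    )
```

Under the old subset check, an entry such as `{"path": ..., "offset": ..., "length": ..., "extra": ...}` would have passed; in this sketch it raises `ValueError`.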
* Document the three chunk states (virtual, missing, inlined) in a table
Calls out the path-value convention used by ChunkManifest entries so parser
authors have a single, discoverable reference for distinguishing virtual,
missing, and inlined chunks.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add release note for inlined chunks support
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: TomNicholas <tom@earthmover.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent e82ac27 commit 3194b09
8 files changed
Lines changed: 850 additions & 32 deletions
File tree
- docs
- virtualizarr
- manifests
- tests/test_manifests