Avoid memory copies during serialization #860
Open
Labels: Icechunk 🧊 (Relates to Icechunk library / spec), Kerchunk (Relating to the kerchunk library / specification itself), enhancement (New feature or request), performance, references formats (Storing byte range info on disk)
Same issue as earth-mover/icechunk#1574 but on the virtualizarr side. Part of #104.
tl;dr: we are creating an enormous number of Python objects during `vds.vz.to_icechunk()`.

We always knew our current implementation would be inefficient, but it's even worse than I had expected. This benchmark writes a single chunk manifest containing 10M chunk references. Those take up about 300MB in memory as numpy arrays (could be better but not terrible), but writing them to Icechunk takes 90s and 4GB(!).
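To see why the write path costs so much more than the arrays themselves, here is a rough sketch of the overhead of materializing one small Python object per chunk reference (the dict layout is illustrative, not the actual internal representation):

```python
import sys

# Hypothetical per-chunk reference as a plain Python dict, roughly what a
# naive per-reference serialization path materializes one of per chunk.
N = 1_000
refs = [{"path": "s3://bucket/key", "offset": i, "length": 100} for i in range(N)]

# Three fields whose raw payload would fit in ~24 bytes of array storage
# cost far more once wrapped in a Python dict (plus the boxed int/str
# values, which getsizeof doesn't even count here).
per_dict = sys.getsizeof(refs[0])
print(per_dict)  # well over 100 bytes per reference on CPython
```

Scaled to 10M references, that per-object overhead (plus the boxed values and list slots) plausibly accounts for the gap between ~300MB of numpy arrays and multiple GB during the write.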
I think if we pass the manifest in-memory using arrow arrays instead this should become much much better.
Actually, if we transform to arrow arrays (or maybe even just use arrow arrays from the start...), we could make use of this idea in `vds.vz.to_kerchunk()` as well.