
Commit 1c761b5

d-v-b and claude committed
docs(pipeline): document max_workers regression on small chunks
Added a Notes section to _resolve_max_workers explaining when threading helps and when it hurts:

- Large chunks (>= 1 MB): threading helps; the default is right.
- Small chunks (<= 64 KB): per-task pool overhead (~30-50 µs) dominates the per-chunk work, and threading slows things down 1.5-3x.
- Workaround: set codec_pipeline.max_workers=1 for small-chunk workloads.

Approximate breakeven: 256-512 KB per uncompressed chunk. Compressed chunks shift the threshold lower because decode is real CPU work.

No code change. Wiring an automatic threshold is deferred: 1 MB is a typical chunk size, and a hard cutoff would catch legitimate workloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
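The workaround the commit documents is a one-line config override. Below is a minimal usage sketch, assuming a zarr-python v3 install where the codec pipeline honors the codec_pipeline.max_workers key shown in this commit; the store path and array are illustrative placeholders.

    import zarr

    # Scoped override: disable codec-pipeline threading only while
    # reading a small-chunk array, then restore the default afterwards.
    with zarr.config.set({"codec_pipeline.max_workers": 1}):
        arr = zarr.open_array("data/small_chunks.zarr", mode="r")
        data = arr[:]  # chunks decode sequentially, avoiding per-task overhead

    # Or apply it process-wide for workloads dominated by small chunks:
    zarr.config.set({"codec_pipeline.max_workers": 1})

zarr.config is a donfig Config object, so set() also works as a context manager; the scoped form keeps the override from leaking into large-chunk code paths.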
1 parent faf44a0 commit 1c761b5

1 file changed

Lines changed: 15 additions & 0 deletions

File tree

src/zarr/core/codec_pipeline.py

@@ -45,6 +45,21 @@ def _resolve_max_workers() -> int:
 
     ``None`` means "auto" → ``os.cpu_count()`` (or 1 if unavailable).
     Values < 1 are clamped to 1 (sequential).
+
+    Notes
+    -----
+    The default (``None`` → ``cpu_count``) is tuned for large chunks
+    (≳ 1 MB encoded) where per-chunk decode + scatter is real work and
+    threading helps. For small chunks (≲ 64 KB) the per-task pool
+    overhead (≈ 30-50 µs submit + worker handoff) outweighs the work
+    and threading slows things down by 1.5-3x. If your workload uses
+    many small chunks, set ``codec_pipeline.max_workers=1`` explicitly:
+
+        zarr.config.set({"codec_pipeline.max_workers": 1})
+
+    Approximate breakeven on uncompressed reads: 256-512 KB per chunk.
+    Compressed chunks shift the threshold lower because decode is real
+    CPU work that benefits from parallelism.
     """
     import os as _os
 
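The hunk adds documentation only; the resolver body is unchanged and not shown. For orientation, here is a sketch of what the documented semantics imply (None → cpu_count, values < 1 clamped to 1). The lookup via zarr.config.get is an assumption, since the real body lies outside this hunk.

    import os

    import zarr

    def _resolve_max_workers() -> int:
        # "auto": None maps to one worker per CPU, or 1 if undetectable.
        value = zarr.config.get("codec_pipeline.max_workers", None)
        if value is None:
            return os.cpu_count() or 1
        # Values < 1 are clamped to 1, i.e. sequential execution.
        return max(1, int(value))

The quoted breakeven is also consistent with the quoted overhead: assuming a rough 10 GB/s copy bandwidth, a 64 KB uncompressed chunk is ~6 µs of work against ~30-50 µs of task overhead, while a 256-512 KB chunk is ~25-50 µs, about where the two cancel.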
