arrow-select: fuse inline Utf8View/BinaryView filter coalescing#9755
arrow-select: fuse inline Utf8View/BinaryView filter coalescing#9755ClSlaid wants to merge 4 commits intoapache:mainfrom
Conversation
Avoid materialising an intermediate filtered RecordBatch when coalescing sparse inline BinaryView batches, and reuse shared filter indices for primitive columns in mixed batches. This addresses issue apache#9143 and adds focused tests and benchmarks for single-column and mixed BinaryView filter paths. Signed-off-by: cl <cailue@apache.org>
|
/cc @alamb PTAL. |
|
run benchmark coalesce_kernels |
Split push_batch_with_filter into an explicit fused path classifier and dedicated inline BinaryView dispatch helpers, and use iterator-based extension in the primitive fast path. This keeps unsupported BinaryView layouts on the existing filter path while making the optimised routing for issue apache#9143 clearer to review and maintain. Signed-off-by: cl <cailue@apache.org>
|
Optimising BinaryView filter will give you a slight overall performance regression on that benchmark. @Dandandan 🫠 Warp up
I reran the relevant Results:
I also checked which path the benchmark is actually taking.
So the main improvement is in the fully inline |
| .as_primitive::<T>(); | ||
|
|
||
| if let Some(nulls) = s.nulls().filter(|nulls| nulls.null_count() > 0) { | ||
| for &idx in indices { |
There was a problem hiding this comment.
I think there should be a path based on iterator here as well?
|
In my earlier experiments, #8951 I also got good performance on my local machine but was regressing quite a bit on the shared benchmarks based on the @alamb runner (which was running on Intel CPUs). I got some improvements from increasing the selectivity threshold https://github.com/apache/arrow-rs/pull/8951/files#diff-49658ee26403ad8cc710620108072c017373bea671df8d2a83543b9de13d0e8aL42 and there are some optimizations we can do to the filter iterators to make them faster. |
|
run benchmarks coalesce_kernels |
|
🤖 Arrow criterion benchmark running (GKE) | trigger CPU Details (lscpu)Comparing fix-9143-binaryview-filter-coalescer (7907826) to d7d9ad3 (merge-base) diff File an issue against this benchmark runner |
|
🤖 Arrow criterion benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagebase (merge-base)
branch
File an issue against this benchmark runner |
Signed-off-by: cl <cailue@apache.org>
Collapse the cached Utf8View/BinaryView dispatch state into a single fused filter so unsupported schemas stay on the generic path with minimal overhead. This keeps the fused inline view path easier to follow while preserving the targeted coalescing fast path. Signed-off-by: cl <cailue@apache.org>
Summary
BinaryViewfilter and coalescing paths so primitive columns and inline views can be appended directly without materialising an intermediate filteredRecordBatchBinaryViewinputsBinaryViewfilter cases related to#9143Verification
cargo test -p arrow-select coalesce --libcargo clippy -p arrow-select --lib --tests -- -D warningscargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warningscargo bench -p arrow --bench coalesce_kernels --features test_utils -- --noplot single_binaryviewcargo bench -p arrow --bench coalesce_kernels --features test_utils -- --noplot mixed_binaryviewBenchmark Results
Measured against a clean
origin/mainworktree with the sameBinaryViewbenchmark additions. The figures below compare representative median times from the baseline worktree and this branch.Mixed primitive + BinaryView
mixed_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.001:23.16 ms->8.51 msmixed_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01:2.37 ms->1.31 msmixed_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.001:31.70 ms->14.33 msmixed_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.01:3.92 ms->2.44 msSingle BinaryView
single_binaryview, 8192, nulls: 0, selectivity: 0.01:4.86 ms->4.90 ms(roughly flat, slightly slower)single_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.001:34.72 ms->19.33 mssingle_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.01:3.46 ms->2.03 mssingle_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.01:5.93 ms->3.97 mssingle_binaryview (max_string_len=8), 8192, nulls: 0, selectivity: 0.8:597 µs->619 µs(regression)single_binaryview (max_string_len=8), 8192, nulls: 0.1, selectivity: 0.8:1.78 ms->1.79 ms(roughly flat, slightly slower)In short, this change substantially improves the mixed primitive + inline
BinaryViewpath that motivated#9143, while the single-columnBinaryViewbenchmarks still show trade-offs: sparse inline cases improve, but dense inline cases are slightly slower and the non-inline single-column path is effectively unchanged.Closes #9143.