BE-513: HashQL: Rework dynamic aggregate size estimation#8697
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
PR SummaryMedium Risk Overview Structs/tuples now sum materialized field sizes with cardinality fixed to Adds supporting math/utilities: Reviewed by Cursor Bugbot for commit 393f316. Bugbot is set up for automated code reviews on this repo. Configure here. |
d52594c to
ae2d3af
Compare
fb7f8ba to
5eeec53
Compare
Merging this PR will degrade performance by 11.34%
|
| Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|
| ❌ | entity_projections |
64.9 µs | 73.2 µs | -11.34% |
| ❌ | fibonacci_recursive[16] |
1.8 ms | 2.1 ms | -10.86% |
| ❌ | diamond |
16.2 µs | 18 µs | -10.03% |
| ❌ | fibonacci_recursive[24] |
85.9 ms | 96.4 ms | -10.9% |
Comparing bm/be-513-hashql-size-estimation-aggregate-cardinality-is-wrong-for (393f316) with main (a18ae88)2
Footnotes
-
56 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
-
No successful run was found on
bm/be-512-hashql-switchint-allow-cross-backend-transitions(43c449a) during the generation of this report, somain(a18ae88) was used instead as the comparison base. There might be some changes unrelated to this pull request in this report. ↩
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## bm/be-512-hashql-switchint-allow-cross-backend-transitions #8697 +/- ##
==============================================================================================
+ Coverage 68.56% 68.64% +0.08%
==============================================================================================
Files 1061 1061
Lines 106420 106748 +328
Branches 4986 4993 +7
==============================================================================================
+ Hits 72963 73277 +314
- Misses 32706 32717 +11
- Partials 751 754 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5eeec53 to
91d9bc3
Compare
598d416 to
a64c0f8
Compare
91d9bc3 to
f2fc415
Compare
a64c0f8 to
80db587
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 80db587. Configure here.
f2fc415 to
43c449a
Compare
80db587 to
393f316
Compare
Benchmark results
|
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1
|
Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba
|
Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |


🌟 What is the purpose of this PR?
The size estimation analysis previously treated all aggregate kinds (structs, tuples, lists, dicts, closures) identically — summing operand footprints and accumulating cardinality as if every aggregate were a flat collection. This was incorrect: a struct or tuple is a single composite value with cardinality 1, while a list or dict is a true collection whose cardinality equals its element count.
This PR introduces type-aware aggregate footprint evaluation. Structs and tuples now correctly report cardinality 1 with units equal to the sum of their fields' materialized sizes. Lists report per-element units (joined across elements) with cardinality equal to the element count. Dicts compute per-pair units (key + value combined) with cardinality equal to the pair count. Closures combine their function pointer and environment footprints into a single scalar value.
To support this, a
materialize()method is introduced onFootprintthat collapses a footprint's(units, cardinality)pair into a single total information estimate. This is needed when a value with its own cardinality (e.g. a list) is embedded as a field of a composite type — the field's contribution to the parent's units must account for the full information content of the nested value, not just its per-element size.🔍 What does this change?
RValue::Aggregatehandler ineval_rvaluewith a dedicatedeval_rvalue_aggregatemethod that dispatches onAggregateKind.materialize()d footprints of all operands and sets cardinality to 1.saturating_mul_add) and sets cardinality to the pair count.Footprint::materialize()which multiplies units by cardinality to produce a total information estimate, with case-specific handling for constant×constant (exact), affine units×constant cardinality (scale coefficients by cardinality upper bound), and affine×affine (element-wise coefficient multiplication as a linear under-approximation).Footprint::one(units)constructor for footprints with cardinality exactly 1.Estimate::saturating_coeff_mulfor element-wise coefficient multiplication between two estimates.InformationRange::saturating_mul_cardinalityfor range-level multiplication of information by cardinality, saturating to unbounded on overflow.Eval::into_footprintas a consuming counterpart toEval::as_ref.struct_aggregate_sums_operandsandtuple_aggregate_sums_operands, which previously reported cardinality 2 for a two-field struct/tuple; both now correctly report cardinality 1.🛡 What tests cover this?
list_aggregate_per_element_units,dict_aggregate_per_pair_units,tuple_many_fields_cardinality_one, andstruct_materializes_list_parameter, each with corresponding snapshot files.footprint.rscovering all fourmaterialize()branches: scalar identity, constant×constant, affine units×constant cardinality, constant units×affine cardinality, and both-affine same-parameter.range.rscoveringsaturating_mul_cardinality: exact multiplication, identity by 1, empty inputs, unbounded cardinality, and overflow to unbounded.❓ How to test this?
cargo test -p hashql-mirand confirm all tests pass.libs/@local/hashql/mir/tests/ui/pass/size-estimation/to verify the reportedunitsandcardinalityvalues match the expected semantics for each aggregate kind.