Rich t kid/introduce dict benchmarks by Rich-T-kid · Pull Request #21860 · apache/datafusion

Rich-T-kid · 2026-04-26T18:20:51Z

Which issue does this PR close?

This PR provides the benchmarks mentioned in #7647 & #9017

Works towards closing Materialize Dictionaries in Group Keys #7647.

Rationale for this change

Currently the benchmark suite doesn't have any dictionary-encoded tables with aggregations performed on them, which makes it difficult to prove performance improvements. For example, a separate PR I'm working on (#21765) is hard to validate because the existing benchmarks don't exercise this path. This PR attempts to close that gap.

What changes are included in this PR?

Adds a new dict benchmark to dfbench that measures group-by performance on dictionary-encoded columns across varying cardinality (5/10/25%), null rates (0/15%), and value types (Utf8 and List), covering both single and multi-column group-by scenarios.
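To make the benchmark dimensions above concrete, here is a minimal, standard-library-only sketch of a dictionary-encoded column with a configurable cardinality and null rate, plus a count-per-group pass over it. It is not the code in `benchmarks/src/dict.rs` and does not use Arrow or DataFusion types. The names `DictColumn`, `XorShift`, `make_dict_column`, and `group_counts` are hypothetical, and reading "cardinality" as the number of distinct values relative to the row count is an assumption, since the description does not say what the percentages are relative to.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column: each row
/// stores a small integer key into `values`, or `None` for a null row.
struct DictColumn {
    values: Vec<String>,
    keys: Vec<Option<u32>>,
}

/// Tiny deterministic xorshift generator so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Build `rows` rows whose dictionary holds `rows * cardinality_pct / 100`
/// distinct values (at least one), with roughly `null_pct` percent null rows.
/// Assumption: cardinality is measured relative to the row count.
fn make_dict_column(rows: usize, cardinality_pct: usize, null_pct: u64, seed: u64) -> DictColumn {
    let distinct = (rows * cardinality_pct / 100).max(1);
    let values = (0..distinct).map(|i| format!("value_{i}")).collect();
    // xorshift must not be seeded with zero.
    let mut rng = XorShift(seed.max(1));
    let keys = (0..rows)
        .map(|_| {
            if rng.next() % 100 < null_pct {
                None
            } else {
                Some((rng.next() % distinct as u64) as u32)
            }
        })
        .collect();
    DictColumn { values, keys }
}

/// Count rows per group by hashing the integer keys instead of the strings.
/// This is only valid because the dictionary values above are unique.
/// Null rows form their own group.
fn group_counts(col: &DictColumn) -> HashMap<Option<u32>, usize> {
    let mut counts = HashMap::new();
    for key in &col.keys {
        *counts.entry(*key).or_insert(0) += 1;
    }
    counts
}
```

Calling `make_dict_column(1000, 5, 0, 42)` yields a 50-entry dictionary with no nulls, and passing the result to `group_counts` produces at most 50 groups. Raising `null_pct` to 15 adds one extra group for the null rows.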

Are these changes tested?

--

Are there any user-facing changes?

no

Rich-T-kid · 2026-04-27T14:57:26Z

@alamb these are the benchmarks for #21765. Once they're merged I can run a comparison between the current implementation and my approach.

kumarUjjawal

There's something wrong with GitHub, so I am not able to post a comment on the line itself, but basically at line 372:

Is this check needed? `schema` is created from the same `query`, and `make_record_batch` always adds `dict_col2` when `query.col2` is `Some`. So this condition looks unreachable?

Rich-T-kid · 2026-05-06T15:58:25Z

Yea I agree. Removed it

Rich-T-kid · 2026-05-06T16:33:19Z

@kumarUjjawal linting error broke the CI, just pushed up a fix

kumarUjjawal

Looks good!

kumarUjjawal · 2026-05-07T05:49:15Z

Thank you @Rich-T-kid

Rich-T-kid force-pushed the rich-t-kid/Introduce-dict-benchmarks branch from 9c07966 to c465861 Compare April 26, 2026 18:30

Rich-T-kid added 2 commits April 26, 2026 14:37

introduce dictionary test

dff60d4

Revamp v1

7347a93

Rich-T-kid force-pushed the rich-t-kid/Introduce-dict-benchmarks branch from c465861 to 7347a93 Compare April 26, 2026 18:38

Rich-T-kid mentioned this pull request Apr 26, 2026

Optimize Dictionary groupings #21765

Open

Merge branch 'main' into rich-t-kid/Introduce-dict-benchmarks

bb648fa

Rich-T-kid mentioned this pull request May 3, 2026

Add benchmarks for dictionary path of new_group_values #22004

Open

kumarUjjawal reviewed May 5, 2026

Comment thread benchmarks/src/dict.rs

Comment thread benchmarks/src/dict.rs Outdated

Comment thread benchmarks/bench.sh

revised with PR comments

96961e2

Rich-T-kid requested a review from kumarUjjawal May 5, 2026 21:36

kumarUjjawal reviewed May 6, 2026

Rich-T-kid requested a review from kumarUjjawal May 6, 2026 15:58

remove un-needed check

5befe3f

Rich-T-kid force-pushed the rich-t-kid/Introduce-dict-benchmarks branch from f9e5ee5 to 5befe3f Compare May 6, 2026 16:31

kumarUjjawal approved these changes May 6, 2026

Merge branch 'main' into rich-t-kid/Introduce-dict-benchmarks

7ca4a70

kumarUjjawal added this pull request to the merge queue May 7, 2026

Merged via the queue into apache:main with commit 6b27d2d May 7, 2026
35 checks passed

kumarUjjawal merged 6 commits into apache:main from Rich-T-kid:rich-t-kid/Introduce-dict-benchmarks
