benchmark compactly::ans by droundy · Pull Request #131 · djkoloski/rust_serialization_benchmark

droundy · 2026-04-13T20:52:17Z

This adds support for the compactly crate, but I haven't been able to compile it due to #130, so I doubt that it is correct.

Technically, this benchmarks compactly::Ans, which is supposed to be faster at the cost of sometimes requiring one more byte of storage. If you wanted to support two versions of compactly, we could also test the arithmetic coding version, but that seems excessive.

mumbleskates

this doesn't compile right now. derives for the required trait are missing on a number of structs.

there are also stray changes made to the capnp generated code, which should not be included and will cause our build checks to fail.


droundy · 2026-04-14T13:33:20Z

As I look more closely at the code... are the tests run on randomly generated data? That is sort of a pathological case for compactly, which is focused on size rather than time.

mumbleskates · 2026-04-15T08:21:28Z

> are the tests run on randomly generated data?

i believe they all run against a fixed seed? or all against the same data, anyway.

there are test columns that report the resulting encoded size (as well as the size of the encoded data after compression with general-purpose compression algorithms), but if there is any kind of complicated compression going on, then yeah, it's going to be a very nuanced thing to measure. that said, given that every other library has to represent the exact same data, if you're able to uniquely represent it in fewer bytes and eliminate that redundancy, that's great. this is a good suite for general speed/size tradeoffs, not so much for showing off the ability to compress a compressible dataset.

mumbleskates · 2026-04-15T08:25:56Z

currently the PR is failing `cargo fmt`. The two main things we want to be able to run for the build to pass are `cargo fmt --check` and `cargo test --benches --no-default-features --features default-encoding-set`.

mumbleskates

needs `cargo fmt`

mumbleskates · 2026-04-15T08:34:21Z

note that your ability to build & test this might also be affected by the same problem #132 is trying to address right now, if you don't have a locally cached version of core2

droundy · 2026-04-15T22:53:05Z

I ran `cargo fmt` and pushed and am trying the test right now.

> On Wed, Apr 15, 2026 at 1:34 AM Kent Ross wrote:
> note that your ability to build & test this might also be affected by the same problem #132 is trying to address right now if you don't have a locally cached version of core2

-- David Roundy

droundy · 2026-04-15T23:02:48Z

> > are the tests run on randomly generated data?
>
> i believe they all run against a fixed seed? or all against the same data, anyway.
>
> there are test columns that report the resulting encoded size (as well as the size of the encoded data after compression with general-purpose compression algorithms), but if there is any kind of complicated compression going on, then yeah, it's going to be a very nuanced thing to measure. that said, given that every other library has to represent the exact same data, if you're able to uniquely represent it in fewer bytes and eliminate that redundancy, that's great. this is a good suite for general speed/size tradeoffs, not so much for showing off the ability to compress a compressible dataset.

Yeah, I see that the logs are not completely random, so presumably there is some compression to be had there. But how easily they compress obviously depends on how the random data is constructed...

It'll be fun to see. And also depressing to see how slow compactly currently is! :)

mumbleskates · 2026-04-16T00:44:12Z

you should be able to see both a) how compressible compactly's encoded data is altogether and b) how much compression it captures in its own processes vs how much is captured after the fact by zlib/zstd. it may make the most sense for its use case to compare compactly's encoding time to encoding+compression time for other formats

mumbleskates

ok this looks fine to me! 👍 i think some of the fields now annotated as "low cardinality" maybe aren't actually low cardinality in terms of their intent, but i'm also not an expert on what that actually does nor do i want to put down overly precious rules about the spirit of the benchmark or whatever. This library already has different aims than a lot of the others we benchmark so we'll see what it looks like.

droundy · 2026-04-16T13:43:17Z

> you should be able to see both a) how compressible compactly's encoded data is altogether and b) how much compression it captures in its own processes vs how much is captured after the fact by zlib/zstd. it may make the most sense for its use case to compare compactly's encoding time to encoding+compression time for other formats

Yes that sounds like the right way to look at it.

droundy · 2026-04-16T13:49:24Z

> ok this looks fine to me! 👍 i think some of the fields now annotated as "low cardinality" maybe aren't actually low cardinality in terms of their intent, but i'm also not an expert on what that actually does nor do i want to put down overly precious rules about the spirit of the benchmark or whatever. This library already has different aims than a lot of the others we benchmark so we'll see what it looks like.

I'm curious: which fields do you think might not be low cardinality? Roughly speaking, low cardinality in compactly just means low enough cardinality that it's worth holding copies of all values in memory in case we get repeats.
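To make the trade-off concrete, here is a minimal Rust sketch of the general idea: a dictionary-style encoder that keeps a copy of every distinct value it has seen and emits a back-reference when a value repeats. This is not compactly's implementation or API; `LowCardinalityEncoder`, `Token`, and `decode` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

/// A token emitted by the hypothetical encoder: either a value seen for the
/// first time, or the index of an earlier distinct value.
#[derive(Debug, PartialEq)]
enum Token {
    New(String),
    Repeat(usize),
}

/// Hypothetical low-cardinality encoder (not compactly's API). It keeps a
/// copy of every distinct value, so memory grows with the number of distinct
/// values; that only pays off when repeats are common.
#[derive(Default)]
struct LowCardinalityEncoder {
    // Maps each distinct value to the order in which it was first seen.
    seen: HashMap<String, usize>,
}

impl LowCardinalityEncoder {
    fn encode(&mut self, value: &str) -> Token {
        if let Some(&index) = self.seen.get(value) {
            // Already in memory: emit a reference instead of the full value.
            Token::Repeat(index)
        } else {
            // First occurrence: remember it and emit the literal value.
            let index = self.seen.len();
            self.seen.insert(value.to_string(), index);
            Token::New(value.to_string())
        }
    }

    fn distinct_values(&self) -> usize {
        self.seen.len()
    }
}

/// Rebuilds the original values by replaying the same first-seen table.
fn decode(tokens: &[Token]) -> Vec<String> {
    let mut table: Vec<String> = Vec::new();
    let mut out = Vec::with_capacity(tokens.len());
    for token in tokens {
        match token {
            Token::New(value) => {
                table.push(value.clone());
                out.push(value.clone());
            }
            Token::Repeat(index) => out.push(table[*index].clone()),
        }
    }
    out
}

fn main() {
    // Enum-like strings: few distinct values, many repeats.
    let methods = ["GET", "POST", "GET", "GET", "DELETE", "POST"];
    let mut enc = LowCardinalityEncoder::default();
    let tokens: Vec<Token> = methods.iter().map(|m| enc.encode(m)).collect();
    for token in &tokens {
        println!("{:?}", token);
    }
    println!("distinct values kept in memory: {}", enc.distinct_values());
    // Round trip: decoding reproduces the input sequence.
    assert_eq!(decode(&tokens), methods);
}
```

With enum-like strings such as HTTP methods the table stays small and most values become cheap `Repeat` tokens. With something like user IDs, nearly every value is new, so the table grows with the data while producing few repeats, which is the concern about user IDs raised in this thread.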

mumbleskates · 2026-04-16T23:01:00Z

if that's what it does then in this case it probably doesn't matter. i was just thinking that things like "user id" are often relatively high cardinality compared to enum-like strings.

droundy · 2026-04-17T02:27:44Z

> if that's what it does then in this case it probably doesn't matter. i was just thinking that things like "user id" are often relatively high cardinality compared to enum-like strings.

Ah that makes sense.

mumbleskates · 2026-04-18T01:20:08Z

well hey: it never encoded further-compressible data and handily won every size benchmark, so that looks like success to me :)

droundy added 6 commits April 13, 2026 13:48

benchmark compactly::ans

82788de

fix Cargo.toml (I could have sworn I already did this)

2980cda

remove change to Cargo.toml

2bee35b

add module

9610899

fix code just use default

4f74568

test with ANS encoding

749dc9a

mumbleskates requested changes Apr 13, 2026


droundy added 2 commits April 14, 2026 06:08

fix compile errors

ee0f1ad

switch back to ans, and revert accidental changes to capnp generated code

cb9f6c4

droundy requested a review from mumbleskates April 14, 2026 13:13

mumbleskates requested changes Apr 15, 2026


cargo fmt

6f3d55f

specify LowCardinality attribute for enum-like strings

6b53153

mumbleskates approved these changes Apr 16, 2026


mumbleskates merged commit 2065c46 into djkoloski:master Apr 17, 2026
1 check passed
