benchmark compactly::ans #131
mumbleskates
left a comment
this doesn't compile right now. derives for the required trait are missing on a number of structs.
there are also stray changes made to the capnp generated code, which should not be included and will cause our build checks to fail.
|
As I look more closely at the code... are the tests run on randomly generated data? That is sort of a pathological case for |
i believe they all run against a fixed seed? or all against the same data, anyway. there are test columns reflective of the resulting encoded size (as well as encoded data compressed with generalized compression algorithms), but if there is any kind of complicated compression going on then yeah it's going to be a very nuanced thing to measure. that said given that every other library has to represent the exact same data, if you're able to uniquely represent it in fewer bytes and eliminate that redundancy that's great. this is a good suite for general speed/size tradeoffs, not quite as much for showing the ability to compress a compressible dataset. |
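For readers unfamiliar with the fixed-seed idea mentioned above, here is a minimal sketch of why it makes runs comparable: the same seed always produces the same "random" data, so every library encodes identical input. The generator (`XorShift64`) and the `generate` helper are hypothetical names used only for illustration; this is not the benchmark suite's actual data generator.

```rust
/// A tiny xorshift64 generator, used here only to illustrate determinism.
/// Hypothetical stand-in; the benchmark's real generator may differ.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from an all-zero state.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}

/// Produce `n` pseudo-random values from a given seed.
fn generate(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = XorShift64::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}

fn main() {
    // Same seed, same data: every run (and every library) sees identical input.
    assert_eq!(generate(42, 8), generate(42, 8));
    println!("{:?}", generate(42, 3));
}
```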
|
currently the pr is failing |
|
note that your ability to build & test this might also be affected by the same problem #132 is trying to address right now if you don't have a locally cached version of core2 |
|
I ran `cargo fmt` and pushed and am trying the test right now.
On Wed, Apr 15, 2026 at 1:34 AM Kent Ross wrote:
> mumbleskates left a comment (djkoloski/rust_serialization_benchmark#131)
>
> note that your ability to build & test this might also be affected by the same problem #132 is trying to address right now if you don't have a locally cached version of core2
--
David Roundy
|
Yeah, I see that the logs are not completely random, so presumably there is some compression to be had there. But the details of how easily it compresses are obviously a function of how the random data is constructed... It'll be fun to see. And also depressing to see how slow |
|
you should be able to see both a) how compressible compactly's encoded data is altogether and b) how much compression it captures in its own processes vs how much is captured after the fact by zlib/zstd. it may make the most sense for its usecase to compare compactly's encoding time to encoding+compression time for other formats |
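The comparison suggested above (one format's encoding time versus another format's encoding plus compression time) could be sketched roughly as follows. Everything here is an assumption made for illustration: `timed`, `Report`, and the two `measure_*` helpers are hypothetical names, the closures stand in for real encoders, and `rle_compress` is a toy run-length encoder standing in for zlib/zstd so the example needs no external crates.

```rust
use std::time::{Duration, Instant};

/// Run `f` once and return its output with the elapsed wall time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

/// Toy stand-in for a general-purpose compressor (NOT zlib or zstd):
/// byte-wise run-length encoding as (run length, byte) pairs.
fn rle_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = input.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut run = 1u8;
        while run < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

#[derive(Debug)]
struct Report {
    final_len: usize,
    total: Duration,
}

/// For a format that compresses as part of encoding.
fn measure_encode_only(encode: impl FnOnce() -> Vec<u8>) -> Report {
    let (bytes, total) = timed(encode);
    Report { final_len: bytes.len(), total }
}

/// For a format whose output is compressed afterwards.
fn measure_encode_then_compress(
    encode: impl FnOnce() -> Vec<u8>,
    compress: impl FnOnce(&[u8]) -> Vec<u8>,
) -> Report {
    let (bytes, total) = timed(|| {
        let encoded = encode();
        compress(&encoded[..])
    });
    Report { final_len: bytes.len(), total }
}

fn main() {
    // Hypothetical encoders: one already compact, one redundant.
    let compact = measure_encode_only(|| vec![44u8, 1, 7]);
    let plain = measure_encode_then_compress(|| vec![7u8; 300], rle_compress);
    println!("compact: {} bytes in {:?}", compact.final_len, compact.total);
    println!("plain + compress: {} bytes in {:?}", plain.final_len, plain.total);
}
```

A real comparison would repeat each measurement many times under a benchmarking harness rather than timing a single call.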
ok this looks fine to me! 👍 i think some of the fields now annotated as "low cardinality" maybe aren't actually low cardinality in terms of their intent, but i'm also not an expert on what that actually does nor do i want to put down overly precious rules about the spirit of the benchmark or whatever. This library already has different aims than a lot of the others we benchmark so we'll see what it looks like.
Yes that sounds like the right way to look at it. |
I'm curious which fields you're thinking maybe aren't low cardinality? Roughly speaking low cardinality in compactly just means low enough cardinality that it's worth holding copies of all values in memory in case we get repeats. |
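To make the "hold copies of all values in memory in case we get repeats" idea concrete, here is a minimal sketch of that general technique. It is not compactly's implementation or API; `Token`, `RepeatTable`, and `encode_all` are hypothetical names, and a real encoder would emit bits rather than an enum.

```rust
use std::collections::HashMap;

/// One encoded item: a value seen for the first time, or a reference
/// to an earlier occurrence. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
enum Token {
    Literal(String),
    Repeat(usize), // index of the value among those seen so far
}

/// Keeps every distinct value in memory so a repeat can be replaced
/// by a small index. Memory grows with the number of distinct values,
/// which is why this only pays off when cardinality is low.
#[derive(Default)]
struct RepeatTable {
    index_of: HashMap<String, usize>,
}

impl RepeatTable {
    fn encode(&mut self, value: &str) -> Token {
        if let Some(&i) = self.index_of.get(value) {
            return Token::Repeat(i);
        }
        let next = self.index_of.len();
        self.index_of.insert(value.to_owned(), next);
        Token::Literal(value.to_owned())
    }
}

fn encode_all(values: &[&str]) -> Vec<Token> {
    let mut table = RepeatTable::default();
    values.iter().map(|v| table.encode(v)).collect()
}

fn main() {
    // Enum-like strings repeat often, so most tokens become indices.
    let tokens = encode_all(&["GET", "POST", "GET", "GET"]);
    println!("{:?}", tokens);
}
```

With a high-cardinality field such as a user id, almost every token in this sketch would be a `Literal`, so the table would cost memory without saving many bytes.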
|
if that's what it does then in this case it probably doesn't matter. i was just thinking that things like "user id" are often relatively high cardinality compared to enum-like strings. |
Ah that makes sense. |
|
well hey: it never encoded further-compressible data and handily won every size benchmark, so that looks like success to me :) |
This adds support for the `compactly` crate, but I haven't been able to compile it due to #130, so I am doubtful that it is correct.
Technically this benchmarks `compactly::Ans`, which is supposed to be faster at the cost of sometimes requiring one more byte of storage. If you wanted to support two versions of `compactly` we could also test the arithmetic coding version, but that seems excessive.