[c++] avrogencpp: emit deterministic include guard #3778
Open
travisdowns wants to merge 1 commit into
CodeGen::guard() in avrogencpp.cc was suffixing the generated header's include guard with the output of std::mt19937 seeded from ::time(nullptr). That produced a different guard on every avrogen invocation, e.g.:

    #ifndef FOO_AVROGEN_H_3350718792_H
    #ifndef FOO_AVROGEN_H_2362587291_H

Two consequences:

1. Generated headers were non-deterministic. Repeated runs on the same schema produced different bytes, which is surprising for a codegen and makes side-by-side diff / git review difficult.
2. Build systems that key their cache on input-content digests (e.g. Bazel's remote cache, the Nix store) saw every consumer of the generated header miss the cache on every build, even when the schema was byte-identical. In a hermetic two-output-base Bazel build of a downstream project, this surfaced as a chain of cascade rebuilds that started at manifest_file.avrogen.h and propagated through every .cc that included it.

headerFile_ is already guaranteed-unique per output (it's the path of the file we're about to write). makeCanonical(h, true) turns it into a valid C identifier, which is already a fine guard name on its own; the random suffix doesn't add uniqueness, only entropy.
This was referenced May 14, 2026
What
CodeGen::guard() in lang/c++/impl/avrogencpp.cc suffixes the generated header's #ifndef guard with the output of an std::mt19937 seeded from ::time(nullptr). So two invocations on the same schema produce headers whose guards differ, for example:

    #ifndef FOO_AVROGEN_H_3350718792_H
    #ifndef FOO_AVROGEN_H_2362587291_H

This PR drops the random suffix.
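
As a rough illustration of the before/after behaviour described here, the following is a minimal sketch, not the actual avrogencpp source. The helper `canonicalise` is a hypothetical stand-in for `makeCanonical(h, true)` (assumed to upper-case alphanumerics and replace every other character with `_`), and `randomGuard` / `deterministicGuard` are illustrative names. The seed is passed in explicitly so the effect of a time-based seed can be reproduced.

```cpp
#include <cctype>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical stand-in for makeCanonical(h, true): the real function may
// differ. Upper-cases alphanumerics and maps everything else to '_'.
std::string canonicalise(const std::string& path) {
    std::string out;
    out.reserve(path.size());
    for (unsigned char c : path) {
        out.push_back(std::isalnum(c) ? static_cast<char>(std::toupper(c)) : '_');
    }
    return out;
}

// Sketch of the old scheme: the guard carries the first output of an
// std::mt19937. In the PR's description the seed came from ::time(nullptr),
// so it changed between invocations; here it is a parameter.
std::string randomGuard(const std::string& headerFile, std::uint32_t seed) {
    std::mt19937 rng(seed);
    return canonicalise(headerFile) + "_" + std::to_string(rng()) + "_H";
}

// Sketch of the new scheme: the guard depends only on the output path.
std::string deterministicGuard(const std::string& headerFile) {
    return canonicalise(headerFile) + "_H";
}
```

Under these assumptions, `deterministicGuard("foo.avrogen.h")` yields `FOO_AVROGEN_H_H` on every call, whereas `randomGuard` returns a different string whenever the seed changes.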
Why
Generated output is non-deterministic. Re-running avrogencpp -i schema.avsc -o foo.h on the same input produces a different file every time, which is surprising for a codegen and makes side-by-side diff / git review difficult.

It breaks content-addressed build systems. Bazel's remote cache and the Nix store both key on input-content digests. With a randomised include guard, every consumer of the generated header sees a different input digest on every build, so on byte-identical schemas the entire downstream .cc → .o → .a → binary chain has to recompile and re-link, even though the schema hasn't changed. This was the trigger for filing the PR: a hermetic two-output-base Bazel build comparison flagged manifest_file.avrogen.h as a root non-hermetic action and traced hundreds of downstream cascade rebuilds back to it.

Why this is safe
headerFile_ is already guaranteed-unique per output (it's the path of the file we're about to write). makeCanonical(h, true) turns that path into a valid C identifier, which on its own is a fine guard name. The RNG suffix only added entropy, not uniqueness: there's no scenario where two avrogencpp runs producing headers at the same output path are supposed to coexist with conflicting guards.

After the change, the guard is <canonicalised-path>_H, deterministic across runs.

Test
Build avrogencpp, run it twice against the same schema (echo $RANDOM > /tmp/x.avsc is irrelevant: same schema, same path), and check that the outputs are byte-identical. Before the change they differ; after, they match.
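
The byte-identical check could be scripted in several ways. The following is a minimal C++ sketch, assuming the two runs' outputs have been saved under separate file names; `readFile` and `sameBytes` are illustrative helpers, not part of avrogencpp or this PR.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Reads a whole file into `out` in binary mode; returns false if the file
// cannot be opened.
bool readFile(const std::string& path, std::string& out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// Returns true only if both files are readable and contain exactly the
// same bytes.
bool sameBytes(const std::string& pathA, const std::string& pathB) {
    std::string a;
    std::string b;
    return readFile(pathA, a) && readFile(pathB, b) && a == b;
}
```

Calling `sameBytes` on the headers from two runs would be expected to return false before the change and true after it, per the behaviour described above.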