bazel/avro: make avrogen header include guard deterministic#30487

Closed
travisdowns wants to merge 1 commit into redpanda-data:dev from travisdowns:td-core-16317-avrogen-stable-guard

@travisdowns
Member

@travisdowns travisdowns commented May 14, 2026

Summary

  • Patches the upstream Apache Avro avrogencpp tool (via bazel/thirdparty/avrogen-stable-include-guard.patch) so the generated header's include guard no longer carries a time-seeded random suffix.
  • Drops the boost::mt19937 random number from lang/c++/impl/avrogencpp.cc:796 — the canonicalized headerFile_ is already path-unique.

Why

The current avrogen produces a guard like BAZEL_OUT_..._MANIFEST_FILE_AVROGEN_H_<random>_H where <random> comes from a boost::mt19937 seeded with ::time(nullptr). Two invocations on the same schema produce headers with different bytes, so every consumer of the generated header (manifest_list_avro.cc) sees a different input digest on every build. That blows the bazel remote cache for the entire downstream chain — iceberg lib → redpanda binary → //tools:redpanda_package_for_pgo — even when the schema is byte-identical.
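
To make the mechanism concrete, here is a minimal sketch of the two guard schemes. It is not the upstream avrogencpp code: `canonicalize`, `random_guard`, and `stable_guard` are hypothetical names, `std::mt19937` stands in for `boost::mt19937`, and the seed is passed in explicitly where upstream takes it from `::time(nullptr)`. The canonicalization rule (uppercase alphanumerics, everything else becomes `_`) is an assumption inferred from the guard strings shown in this PR.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical stand-in for avrogencpp's path canonicalization:
// uppercase alphanumerics, map every other character to '_'.
std::string canonicalize(const std::string& path) {
    assert(!path.empty());
    std::string out;
    out.reserve(path.size());
    for (unsigned char c : path) {
        out.push_back(std::isalnum(c) ? static_cast<char>(std::toupper(c)) : '_');
    }
    return out;
}

// Sketch of the old scheme: the suffix is the first output of a
// Mersenne Twister. With a time-based seed, two runs on the same
// schema yield different guards, hence different header bytes.
std::string random_guard(const std::string& header_path, std::uint32_t seed) {
    std::mt19937 rng(seed);  // assumption: std:: swapped in for boost::mt19937
    return canonicalize(header_path) + "_" + std::to_string(rng()) + "_H";
}

// Sketch of the patched scheme: the guard depends only on the output path.
std::string stable_guard(const std::string& header_path) {
    return canonicalize(header_path) + "_H";
}
```

Calling `stable_guard("bazel-out/k8-fastbuild/bin/src/v/iceberg/manifest_file.avrogen.h")` under these assumptions returns the same string on every run, whereas `random_guard` changes whenever the seed does.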

Identified via the beptool walkback analysis of two-output-base builds of //tools:redpanda_package_for_pgo: it's one of only two persistent root causes (the other being make_tool, CORE-16315) that infect that target across overnight runs. Fixing this is half of the work to make the PGO instrument tar reproducible, which would let the ~1956 PGO-optimize compile actions in the overnight builds (84428–84432) actually hit the remote cache instead of re-executing locally on every run.

Tracks CORE-16317.

Validation

bazel --output_base=/tmp/run1 build //src/v/iceberg:manifest_file_genrule
bazel --output_base=/tmp/run2 build //src/v/iceberg:manifest_file_genrule
sha256sum /tmp/run{1,2}/execroot/_main/bazel-out/k8-fastbuild/bin/src/v/iceberg/manifest_file.avrogen.h

Before this patch the two hashes differed. After:

df2e1bf5be81d8f068f1e1c982c302a18a284b77bd981563865618c7001ef22e  run1/.../manifest_file.avrogen.h
df2e1bf5be81d8f068f1e1c982c302a18a284b77bd981563865618c7001ef22e  run2/.../manifest_file.avrogen.h
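
For anyone who wants the same check without `sha256sum`, a byte-for-byte comparison is enough to tell whether two generated headers are identical. The sketch below is illustrative only: `files_identical` is a hypothetical helper, not part of this PR or of Bazel, and it returns false if either file cannot be opened.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns true only if both files open and contain exactly the same bytes.
bool files_identical(const std::string& path_a, const std::string& path_b) {
    assert(!path_a.empty() && !path_b.empty());
    std::ifstream a(path_a, std::ios::binary);
    std::ifstream b(path_b, std::ios::binary);
    if (!a || !b) {
        return false;  // treat an unreadable file as a mismatch
    }
    char ca = 0;
    char cb = 0;
    while (true) {
        const bool got_a = static_cast<bool>(a.get(ca));
        const bool got_b = static_cast<bool>(b.get(cb));
        if (got_a != got_b) {
            return false;  // one file is longer than the other
        }
        if (!got_a) {
            return true;   // both ended at the same point
        }
        if (ca != cb) {
            return false;  // first differing byte
        }
    }
}
```

Pointing it at the two `manifest_file.avrogen.h` paths from the separate output bases should report a mismatch before the patch and a match after it.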

New include guard: BAZEL_OUT_K8_FASTBUILD_BIN_SRC_V_ICEBERG_MANIFEST_FILE_AVROGEN_H_H (deterministic; the trailing _H_H is cosmetic and could be simplified to a single _H if reviewers prefer).

Upstream

I'll open a corresponding PR against apache/avro so other users of avrogen with hermetic build systems (Bazel, Nix, etc.) get the fix.

Test plan

  • Two-output-base builds of //src/v/iceberg:manifest_file_genrule produce byte-identical headers
  • CI green (bazel build + test)
  • Confirm //tools:redpanda_package_for_pgo tar diverges only on the remaining make_tool root after this lands

The generated header's include guard was suffixed with the output of a
`boost::mt19937` seeded from `::time(nullptr)` (`avrogencpp.cc:796`),
producing a different guard on every avrogen invocation:

  #ifndef BAZEL_OUT_..._MANIFEST_FILE_AVROGEN_H_3350718792_H
  #ifndef BAZEL_OUT_..._MANIFEST_FILE_AVROGEN_H_2362587291_H

That made every consumer of the generated header (e.g. manifest_list_avro.cc)
see a different input digest on each build, so bazel's remote cache was
broken for the entire downstream dependency chain even on byte-identical
schemas. `headerFile_` is already a unique-per-output path; dropping the
random suffix leaves a stable, still-unique guard.

Identified via two-output-base hermetic builds of
`//tools:redpanda_package_for_pgo` as one of the two roots that infect that
target.

Tracks CORE-16317.
@travisdowns travisdowns requested review from andrwng and Copilot and removed request for Copilot May 14, 2026 22:54
@travisdowns travisdowns requested review from dotnwat and removed request for andrwng and dotnwat May 14, 2026 23:00
@travisdowns travisdowns marked this pull request as draft May 14, 2026 23:01
@travisdowns
Member Author

Closing in favor of bumping the pinned redpanda-data/avro commit once redpanda-data/avro#399 merges. That removes the need for a downstream bazel patch entirely. Upstream apache/avro change tracked at apache/avro#3778.

@vbotbuildovich
Collaborator

Retry command for Build#84489

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/availability_test.py::AvailabilityTests.test_recovery_after_catastrophic_failure

@vbotbuildovich
Collaborator

CI test results

test results on build#84489
| test_status | test_class | test_method | test_arguments | test_kind | job_url | passed | reason | test_history |
|---|---|---|---|---|---|---|---|---|
| FAIL | AvailabilityTests | test_recovery_after_catastrophic_failure | null | integration | https://buildkite.com/redpanda/redpanda/builds/84489#019e28ba-e92a-4ac7-bb78-9f25ad653c23 | 0/1 | | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AvailabilityTests&test_method=test_recovery_after_catastrophic_failure |
| FLAKY(PASS) | InternalTopicProtectionLargeClusterTest | test_consumer_offset_topic | null | integration | https://buildkite.com/redpanda/redpanda/builds/84489#019e28ba-21cd-4fc5-a3bc-5fcec21124a8 | 19/21 | Test PASSES after retries. No significant increase in flaky rate (baseline=0.0038, p0=0.0739, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3917, trust_threshold=0.5000) | https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=InternalTopicProtectionLargeClusterTest&test_method=test_consumer_offset_topic |

2 participants