tools: run clang-format/clang-tidy in a separate bazel output_base by travisdowns · Pull Request #30213 · redpanda-data/redpanda

travisdowns · 2026-04-17T14:31:01Z

Invoking tools/clang-format or tools/clang-tidy currently uses the main
bazel server's output_base, which means the tool invocation acquires the
same server lock as builds. If a build is in progress, the tool just sits
there waiting — potentially for many minutes. Fast, interactive operations
(formatting one file, running tidy on a diff) turn into long blocks, and
it's especially confusing when driven by Claude because the command looks
indistinguishable from a hang (which it already sometimes is for other
reasons).

This PR introduces an opt-in BAZEL_TRAMPOLINE_PARALLEL env var. When set,
bazel-tool-trampoline.sh passes --output_base=<default>_rptool so the
tool runs against a dedicated bazel server that can operate concurrently
with the main one. clang-tool.sh (the shim behind tools/clang-format
and tools/clang-tidy) defaults it to 1, so those tools opt in
automatically; callers can still set BAZEL_TRAMPOLINE_PARALLEL=0 to
restore the old behavior.

Computing the default output_base is itself awkward, because
bazel info output_base also takes the main lock. The script caches the
value at tools/bazel-tools/.cache/output_base and refreshes it
opportunistically with bazel --noblock_for_lock info output_base
(exit 9 = lock held, fall back to cache). Only a cold cache forces a
blocking query.

Backports Required

Release Notes

none

travisdowns · 2026-04-17T14:34:45Z

The other implementation i could think of is executing the clang-format binary directly, instead of using bazel run, which would also probably require some similar caching nonsense (to remember the path of the tool), and maybe a few more footguns.

This separate output base takes ~802 MB on my machine.

It actually seems to speed up tool execution from ~200ms to ~100ms, probably because bazel run happens against a server with much less loaded state ("or something").

Copilot

Pull request overview

Adds an opt-in mechanism for Bazel tool trampolines (and enables it by default for clang tools) to run using a dedicated Bazel output_base, avoiding lock contention with the primary Bazel server and improving interactive latency.

Changes:

Add BAZEL_TRAMPOLINE_PARALLEL support to run bazel run with a separate --output_base and cache the default output_base.
Default tools/clang-format / tools/clang-tidy shim behavior to enable parallel mode.
Ignore the new .cache/ directory used for output_base caching.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File	Description
tools/clang-tool.sh	Defaults clang tool invocations to opt into parallel Bazel output base behavior.
tools/bazel-tools/support/bazel-tool-trampoline.sh	Implements parallel `output_base` selection + caching and threads `--output_base` into `bazel run`.
tools/bazel-tools/.gitignore	Ignores the `.cache/` directory created by the trampoline.

Copilot · 2026-04-17T14:34:54Z

+  tool_output_base="${default_output_base}_rptool"
+  output_base_flag=(--output_base="$tool_output_base")
+
+  debug "tool output_base   : $tool_output_base"


When BAZEL_TRAMPOLINE_PARALLEL is enabled, bazel run executes against a different --output_base, but the clang-tidy plugin path (LOADS) is still derived from bazel info bazel-bin using the default output base (earlier in this script). That can cause clang-tidy to fail to find plugins.so (or load a plugin built in a different output base) on a clean checkout or when the main output base hasn't built it yet. Consider computing bazel-bin using the same --output_base as the tool invocation (or restructuring so output_base_flag is decided before resolving BAZEL_BIN/LOADS).

Good catch — this was the reason I originally left the whole thing opt-in. Rather than try to compute plugin paths against the _rptool output_base (which would force clang-tidy to rebuild them), I split the shim: clang-format now goes through a new tools/clang-tool-parallel.sh that sets BAZEL_TRAMPOLINE_PARALLEL=1, while clang-tidy still points at clang-tool.sh and therefore keeps using the default output_base where plugins.so already lives.

Copilot · 2026-04-17T14:34:55Z

+#   - Cache exists: try `bazel --block_for_lock=false info output_base` to
+#     opportunistically refresh the cache; on lock-held (exit 9) just use the
+#     cached value.


The comments refer to --block_for_lock=false, but the actual command uses --noblock_for_lock. To avoid confusing future maintainers, align the documentation with the flag that is actually used (and the exit-code semantics described).

Aligned the comments to reference --noblock_for_lock (the form actually used in the code) in the amended commit.

Copilot · 2026-04-17T14:34:55Z

+    fresh=$(bazel --noblock_for_lock info output_base 2>/dev/null) || rc=$?
+    if ((rc == 0)); then
+      debug "output_base from: live bazel query (cache refreshed)"
+      default_output_base=$fresh
+      echo "$default_output_base" >"$cache_file"
+    elif ((rc == 9)); then
+      default_output_base="$(cat "$cache_file")"
+      debug "output_base from: cache ($cache_file, lock held)"
+    else
+      echo "ERROR: bazel-tool-trampoline.sh: bazel info output_base failed (exit $rc)" >&2
+      exit "$rc"


The non-blocking bazel ... info output_base call discards stderr (2>/dev/null). For failures other than lock-held (rc=9), this makes the surfaced error much harder to diagnose because only a generic message is printed. Consider capturing stderr and printing it when rc != 0 && rc != 9 (while still keeping the lock-held path quiet).

yeah, if it becomes a problem we can fix it

Copilot · 2026-04-17T14:34:55Z

+  if [[ ! -f $cache_file ]]; then
+    debug "output_base from: blocking bazel query (no cache yet)"
+    default_output_base="$(bazel info output_base)"
+    echo "$default_output_base" >"$cache_file"
+  else
+    # Bazel exits 9 (LOCK_HELD_NOBLOCK_FOR_LOCK) when --block_for_lock=false
+    # cannot acquire the server lock. Any other non-zero exit is a real
+    # error and should surface rather than silently falling back to cache.
+    rc=0
+    fresh=$(bazel --noblock_for_lock info output_base 2>/dev/null) || rc=$?
+    if ((rc == 0)); then
+      debug "output_base from: live bazel query (cache refreshed)"
+      default_output_base=$fresh
+      echo "$default_output_base" >"$cache_file"
+    elif ((rc == 9)); then
+      default_output_base="$(cat "$cache_file")"
+      debug "output_base from: cache ($cache_file, lock held)"


Cache updates write directly to output_base (echo ... >"$cache_file"). If the process is interrupted mid-write, the cache can become truncated/empty and later produce an invalid --output_base like "_rptool". Consider writing to a temp file and atomically mv-ing into place, and/or validating the cached value is non-empty before using it.

you have to be pretty unlucky to get a truncated write of a single line :)

vbotbuildovich · 2026-04-17T15:16:12Z

Retry command for Build#83315

please wait until all jobs are finished before running the slash command

/ci-repeat 1
skip-redpanda-build
skip-units
skip-rebase
tests/rptest/tests/cluster_config_test.py::ClusterConfigTest.test_valid_settings

vbotbuildovich · 2026-04-17T15:38:50Z

CI test results

test results on build#83315

test_status	test_class	test_method	test_arguments	test_kind	job_url	passed	reason	test_history
FAIL	ClusterConfigTest	test_valid_settings	null	integration	https://buildkite.com/redpanda/redpanda/builds/83315#019d9be1-f3d5-4f78-9a6b-408f99ea3d5f	0/1		https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ClusterConfigTest&test_method=test_valid_settings

StephanDollberg · 2026-04-17T15:41:14Z

+# their output base. disk_cache will in general obviate the need to actually
+# rebuild anything.
+#
+# Determining the default output_base is tricky: `bazel info output_base` itself


Why do we need the default output_base at all? Isn't that equivalent to running with no args?

We build the 2nd base output based on the first one, by appending _rptool, so it goes in the same directory, gets wiped when you wipe that, on the same mount, etc.

Instead of just dumping it somehwere in /tmp (which maybe also isn't a terrible idea, but it will accumulate).

Right I see, guess if it works then we might as well just go with it now.

Previously, invoking `tools/clang-format` or `tools/clang-tidy` used the main bazel server's output_base, which meant the tool invocation would block on the main server's lock whenever a build was in progress. This turned simple, fast operations (formatting a single file, running tidy on a diff) into multi-minute waits — very annoying interactively, and especially confusing when driven by Claude since the command just looks like it's hanging (which it already sometimes does for other reasons). Add a BAZEL_TRAMPOLINE_PARALLEL env var that, when set, causes bazel-tool-trampoline.sh to pass `--output_base=<default>_rptool` so the tool runs under a dedicated bazel server. Default it to 1 in clang-tool.sh so clang-format and clang-tidy opt in automatically. Determining the default output_base is itself fiddly because `bazel info output_base` also takes the main lock. We cache the value in tools/bazel-tools/.cache/output_base, refreshing it opportunistically via `bazel --noblock_for_lock info output_base` (exit 9 = lock held) and only blocking when the cache doesn't yet exist.

travisdowns requested review from StephanDollberg and Copilot April 17, 2026 14:31

Copilot started reviewing on behalf of travisdowns April 17, 2026 14:31 View session

travisdowns requested a review from rockwotj April 17, 2026 14:31

Copilot AI reviewed Apr 17, 2026

View reviewed changes

StephanDollberg reviewed Apr 17, 2026

View reviewed changes

travisdowns force-pushed the td-noblock-tools branch from d10a96d to 41393c6 Compare April 17, 2026 15:50

StephanDollberg approved these changes Apr 17, 2026

View reviewed changes

travisdowns merged commit 3a600a1 into redpanda-data:dev Apr 20, 2026
19 checks passed

Conversation

travisdowns commented Apr 17, 2026

Backports Required

Release Notes

Uh oh!

travisdowns commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

travisdowns Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

travisdowns Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

travisdowns Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

travisdowns Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

vbotbuildovich commented Apr 17, 2026

Retry command for Build#83315

Uh oh!

vbotbuildovich commented Apr 17, 2026

CI test results

Uh oh!

StephanDollberg Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

travisdowns Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

StephanDollberg Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

travisdowns commented Apr 17, 2026 •

edited

Loading