tools: run clang-format/clang-tidy in a separate bazel output_base#30213
tools: run clang-format/clang-tidy in a separate bazel output_base#30213travisdowns merged 1 commit intoredpanda-data:devfrom
Conversation
|
The other implementation i could think of is executing the clang-format binary directly, instead of using This separate output base takes ~802 MB on my machine. It actually seems to speed up tool execution from ~200ms to ~100ms, probably because |
There was a problem hiding this comment.
Pull request overview
Adds an opt-in mechanism for Bazel tool trampolines (and enables it by default for clang tools) to run using a dedicated Bazel output_base, avoiding lock contention with the primary Bazel server and improving interactive latency.
Changes:
- Add
BAZEL_TRAMPOLINE_PARALLELsupport to runbazel runwith a separate--output_baseand cache the defaultoutput_base. - Default
tools/clang-format/tools/clang-tidyshim behavior to enable parallel mode. - Ignore the new
.cache/directory used foroutput_basecaching.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/clang-tool.sh | Defaults clang tool invocations to opt into parallel Bazel output base behavior. |
| tools/bazel-tools/support/bazel-tool-trampoline.sh | Implements parallel output_base selection + caching and threads --output_base into bazel run. |
| tools/bazel-tools/.gitignore | Ignores the .cache/ directory created by the trampoline. |
| tool_output_base="${default_output_base}_rptool" | ||
| output_base_flag=(--output_base="$tool_output_base") | ||
|
|
||
| debug "tool output_base : $tool_output_base" |
There was a problem hiding this comment.
When BAZEL_TRAMPOLINE_PARALLEL is enabled, bazel run executes against a different --output_base, but the clang-tidy plugin path (LOADS) is still derived from bazel info bazel-bin using the default output base (earlier in this script). That can cause clang-tidy to fail to find plugins.so (or load a plugin built in a different output base) on a clean checkout or when the main output base hasn't built it yet. Consider computing bazel-bin using the same --output_base as the tool invocation (or restructuring so output_base_flag is decided before resolving BAZEL_BIN/LOADS).
There was a problem hiding this comment.
Good catch — this was the reason I originally left the whole thing opt-in. Rather than try to compute plugin paths against the _rptool output_base (which would force clang-tidy to rebuild them), I split the shim: clang-format now goes through a new tools/clang-tool-parallel.sh that sets BAZEL_TRAMPOLINE_PARALLEL=1, while clang-tidy still points at clang-tool.sh and therefore keeps using the default output_base where plugins.so already lives.
| # - Cache exists: try `bazel --block_for_lock=false info output_base` to | ||
| # opportunistically refresh the cache; on lock-held (exit 9) just use the | ||
| # cached value. |
There was a problem hiding this comment.
The comments refer to --block_for_lock=false, but the actual command uses --noblock_for_lock. To avoid confusing future maintainers, align the documentation with the flag that is actually used (and the exit-code semantics described).
There was a problem hiding this comment.
Aligned the comments to reference --noblock_for_lock (the form actually used in the code) in the amended commit.
| fresh=$(bazel --noblock_for_lock info output_base 2>/dev/null) || rc=$? | ||
| if ((rc == 0)); then | ||
| debug "output_base from: live bazel query (cache refreshed)" | ||
| default_output_base=$fresh | ||
| echo "$default_output_base" >"$cache_file" | ||
| elif ((rc == 9)); then | ||
| default_output_base="$(cat "$cache_file")" | ||
| debug "output_base from: cache ($cache_file, lock held)" | ||
| else | ||
| echo "ERROR: bazel-tool-trampoline.sh: bazel info output_base failed (exit $rc)" >&2 | ||
| exit "$rc" |
There was a problem hiding this comment.
The non-blocking bazel ... info output_base call discards stderr (2>/dev/null). For failures other than lock-held (rc=9), this makes the surfaced error much harder to diagnose because only a generic message is printed. Consider capturing stderr and printing it when rc != 0 && rc != 9 (while still keeping the lock-held path quiet).
There was a problem hiding this comment.
yeah, if it becomes a problem we can fix it
| if [[ ! -f $cache_file ]]; then | ||
| debug "output_base from: blocking bazel query (no cache yet)" | ||
| default_output_base="$(bazel info output_base)" | ||
| echo "$default_output_base" >"$cache_file" | ||
| else | ||
| # Bazel exits 9 (LOCK_HELD_NOBLOCK_FOR_LOCK) when --block_for_lock=false | ||
| # cannot acquire the server lock. Any other non-zero exit is a real | ||
| # error and should surface rather than silently falling back to cache. | ||
| rc=0 | ||
| fresh=$(bazel --noblock_for_lock info output_base 2>/dev/null) || rc=$? | ||
| if ((rc == 0)); then | ||
| debug "output_base from: live bazel query (cache refreshed)" | ||
| default_output_base=$fresh | ||
| echo "$default_output_base" >"$cache_file" | ||
| elif ((rc == 9)); then | ||
| default_output_base="$(cat "$cache_file")" | ||
| debug "output_base from: cache ($cache_file, lock held)" |
There was a problem hiding this comment.
Cache updates write directly to output_base (echo ... >"$cache_file"). If the process is interrupted mid-write, the cache can become truncated/empty and later produce an invalid --output_base like "_rptool". Consider writing to a temp file and atomically mv-ing into place, and/or validating the cached value is non-empty before using it.
There was a problem hiding this comment.
you have to be pretty unlucky to get a truncated write of a single line :)
Retry command for Build#83315please wait until all jobs are finished before running the slash command |
CI test resultstest results on build#83315
|
| # their output base. disk_cache will in general obviate the need to actually | ||
| # rebuild anything. | ||
| # | ||
| # Determining the default output_base is tricky: `bazel info output_base` itself |
There was a problem hiding this comment.
Why do we need the default output_base at all? Isn't that equivalent to running with no args?
There was a problem hiding this comment.
We build the 2nd base output based on the first one, by appending _rptool, so it goes in the same directory, gets wiped when you wipe that, on the same mount, etc.
Instead of just dumping it somehwere in /tmp (which maybe also isn't a terrible idea, but it will accumulate).
There was a problem hiding this comment.
Right I see, guess if it works then we might as well just go with it now.
Previously, invoking `tools/clang-format` or `tools/clang-tidy` used the main bazel server's output_base, which meant the tool invocation would block on the main server's lock whenever a build was in progress. This turned simple, fast operations (formatting a single file, running tidy on a diff) into multi-minute waits — very annoying interactively, and especially confusing when driven by Claude since the command just looks like it's hanging (which it already sometimes does for other reasons). Add a BAZEL_TRAMPOLINE_PARALLEL env var that, when set, causes bazel-tool-trampoline.sh to pass `--output_base=<default>_rptool` so the tool runs under a dedicated bazel server. Default it to 1 in clang-tool.sh so clang-format and clang-tidy opt in automatically. Determining the default output_base is itself fiddly because `bazel info output_base` also takes the main lock. We cache the value in tools/bazel-tools/.cache/output_base, refreshing it opportunistically via `bazel --noblock_for_lock info output_base` (exit 9 = lock held) and only blocking when the cache doesn't yet exist.
d10a96d to
41393c6
Compare
Invoking
tools/clang-formatortools/clang-tidycurrently uses the mainbazel server's output_base, which means the tool invocation acquires the
same server lock as builds. If a build is in progress, the tool just sits
there waiting — potentially for many minutes. Fast, interactive operations
(formatting one file, running tidy on a diff) turn into long blocks, and
it's especially confusing when driven by Claude because the command looks
indistinguishable from a hang (which it already sometimes is for other
reasons).
This PR introduces an opt-in
BAZEL_TRAMPOLINE_PARALLELenv var. When set,bazel-tool-trampoline.shpasses--output_base=<default>_rptoolso thetool runs against a dedicated bazel server that can operate concurrently
with the main one.
clang-tool.sh(the shim behindtools/clang-formatand
tools/clang-tidy) defaults it to1, so those tools opt inautomatically; callers can still set
BAZEL_TRAMPOLINE_PARALLEL=0torestore the old behavior.
Computing the default output_base is itself awkward, because
bazel info output_basealso takes the main lock. The script caches thevalue at
tools/bazel-tools/.cache/output_baseand refreshes itopportunistically with
bazel --noblock_for_lock info output_base(exit 9 = lock held, fall back to cache). Only a cold cache forces a
blocking query.
Backports Required
Release Notes