debug_bundle: support OAUTHBEARER auth in broker-side admin API #30225
This should be backported to 26.1.x and 25.3.x.
CI test results
test results on build #83412
test results on build #83430
@david-yu please update your PR description to follow the template: https://github.com/redpanda-data/redpanda/blob/dev/.github/pull_request_template.md. If you want a backport, you can check those boxes and it'll happen automatically.
dotnwat
left a comment
The debug bundle unit tests failed
2026-04-20 18:53:39 UTC | //src/v/debug_bundle/tests:debug_bundle_service_test FAILED in 109.9s
-- | --
2026-04-20 18:53:39 UTC | /root/.cache/bazel/_bazel_root/661a3f61959a8d0bfbf25f4f3b68e0c0/execroot/_main/bazel-out/aarch64-dbg/testlogs/src/v/debug_bundle/tests/debug_bundle_service_test/test.log
The logs for the test failure are available as artifacts on the run if you click through to the buildkite job. Happy to help navigate that with you if needed!
Failures resolved, waiting for review. No rush though.
From talking to @mattschumpert, there are two separate endpoints to authenticate against: the brokers and the admin API. We should try to do service discovery on both. Will need to dig into whether this is needed.
Can we clean up the commit history a bit? There are three issues with it:
- In general we want to avoid having commits in a PR that fix a bug introduced by an earlier commit in the same PR. For example, the commit "debug_bundle: fix test_bearer_creds_args and check_clean_up" appears to fix issues introduced in the first commit of the PR.
- That same commit also fixes an apparent issue with the "check_clean_up" test that looks unrelated to the OAUTHBEARER changes. It would be nice to factor that fix out into a separate commit so that we can discuss it separately.
- We'll want to remove the merge commits from the commit history. This can generally be done right before merging with a git rebase.
On testing, since #30169 merged, it looks like we can add a ducktape test for the feature now so that we get end-to-end coverage for the feature. The unit tests in this PR are one part of the testing strategy, but they don't do a full e2e test of the feature over the network. Is it possible to do that or are there still pieces that are missing before we can have an e2e test? Another option is manual testing too if we don't expect to derive value out of the e2e test.
Extends the debug bundle SASL credential path to handle OAUTHBEARER in
addition to the existing SCRAM variants.
The admin API already accepts OIDC/bearer tokens for inbound request
authentication. However, `rpk debug remote-bundle start` involves a
second layer of auth: the broker spawns an `rpk debug bundle`
subprocess, which must connect back to the local Kafka, schema
registry, and admin APIs to collect the bundle. The caller supplies
those credentials inside the POST body under an `authentication`
field; the broker forwards them to the subprocess via -X flags. Until
now, that in-body `authentication` variant only modeled SCRAM
(`{username, password, mechanism}`) — there was no way to instruct
the subprocess to use OAUTHBEARER when calling Kafka.
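The two layers described above can be sketched from the caller's side. This is a minimal illustration, not the actual client code: the URL is a placeholder, the helper name `build_start_request` is hypothetical, and any other fields the real request body may require are omitted. Only the `authentication` field and its `{mechanism, token}` shape come from the description above.

```python
import json
import urllib.request

# Placeholder values for illustration only; the real endpoint path and
# any additional body fields are not shown in this commit message.
ADMIN_URL = "http://localhost:9644/placeholder-debug-bundle-endpoint"
TOKEN = "<JWT>"


def build_start_request(url, inbound_token, subprocess_token):
    """Build (but do not send) a POST that carries both auth layers."""
    body = {
        # Layer 2: credentials the broker forwards to the rpk subprocess.
        "authentication": {
            "mechanism": "OAUTHBEARER",
            "token": subprocess_token,
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Layer 1: authenticates the inbound admin API request itself.
            "Authorization": f"Bearer {inbound_token}",
        },
    )


req = build_start_request(ADMIN_URL, TOKEN, TOKEN)
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
print(req.data.decode("utf-8"))
```

The same token is passed twice here only for brevity; the point is that the header and the body field are consumed by different components.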
- types.h: add bearer_creds{token, mechanism} and expand
debug_bundle_authn_options to std::variant<scram_creds, bearer_creds>
- json.h: add from_json<bearer_creds>; update debug_bundle_authn_options
dispatch to select the variant by the presence of "token" (OAUTHBEARER)
vs "username" (SCRAM), so a missing "token" field on an OAUTHBEARER
payload is rejected with invalid_parameters (400)
- debug_bundle_service.cc: add bearer_creds arm to the ss::visit that
builds rpk subprocess args; emits -Xpass=token:<TOKEN> and
-Xsasl.mechanism=OAUTHBEARER, which rpk already accepts
- debug_bundle.json: update the authentication schema to document both
the SCRAM and OAUTHBEARER variants
- tests: extend json_test.cc typed suite with bearer_creds and update
debug_bundle_authn_options cases; add standalone tests for parameters
with OAUTHBEARER auth and for rejection of a missing token field; add
test_bearer_creds_args to the service test to verify correct
subprocess argument emission
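As a rough model of the dispatch and argument emission listed above, the following Python sketch mirrors the described behaviour. It is not the C++ implementation: the class and function names are illustrative, `InvalidParameters` stands in for the broker's `invalid_parameters` (400) error, and the SCRAM flag spellings (`-Xuser`, `-Xpass`) are an assumption, since only the OAUTHBEARER flags are stated in this commit message.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class ScramCreds:
    username: str
    password: str
    mechanism: str


@dataclass(frozen=True)
class BearerCreds:
    token: str
    mechanism: str


class InvalidParameters(ValueError):
    """Stand-in for the broker's invalid_parameters (400) error."""


def _required_str(obj: dict, key: str) -> str:
    value = obj.get(key)
    if not isinstance(value, str):
        raise InvalidParameters(f"missing or non-string field: {key!r}")
    return value


def parse_authn(obj: dict) -> Union[ScramCreds, BearerCreds]:
    """Select the variant by the presence of "token" vs "username"."""
    if "token" in obj:
        return BearerCreds(
            token=_required_str(obj, "token"),
            mechanism=_required_str(obj, "mechanism"),
        )
    if "username" in obj:
        return ScramCreds(
            username=_required_str(obj, "username"),
            password=_required_str(obj, "password"),
            mechanism=_required_str(obj, "mechanism"),
        )
    # e.g. {"mechanism": "OAUTHBEARER"} with no token ends up here.
    raise InvalidParameters("expected either 'token' or 'username'")


def rpk_args(creds: Union[ScramCreds, BearerCreds]) -> List[str]:
    """Build the -X flags passed to the rpk subprocess."""
    if isinstance(creds, BearerCreds):
        return [
            f"-Xpass=token:{creds.token}",
            f"-Xsasl.mechanism={creds.mechanism}",
        ]
    # SCRAM arm: flag names assumed for illustration.
    return [
        f"-Xuser={creds.username}",
        f"-Xpass={creds.password}",
        f"-Xsasl.mechanism={creds.mechanism}",
    ]


print(rpk_args(parse_authn({"mechanism": "OAUTHBEARER", "token": "abc"})))
```

Keying the dispatch on field presence rather than on the `mechanism` string means a payload that names OAUTHBEARER but omits the token cannot be silently parsed as SCRAM.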
Test fixes surfaced by the new test ordering:
- test_bearer_creds_args: the previous implementation polled
rpk_debug_bundle_status in a loop while expecting the process to
still be running. When stdout became non-empty (~1s after start),
the loop broke and the test body finished while the rpk-shim was
still sleeping, causing the fixture to tear down a live process.
Rewritten to use run_bundle so the process completes before checking
the output.
- check_clean_up: the test waited for the .out file with a 10s budget
starting from when the zip appeared (~T+0). The .out file is written
only after the process exits (T+5s) and set_metadata finishes its
SHA256 calculation; on slow sandbox I/O this SHA256 step was observed
to take >10s. Replace wait_for_file_to_be_created(.out, 10s) with
wait_for_kvstore_to_populate(30s): the kvstore is updated only after
the .out file is written, making it a reliable completion signal
with headroom for SHA256 latency.
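The reasoning behind the check_clean_up change can be shown with a scaled-down sketch: wait on the signal that is produced last, and give it a budget that covers the slow step. Everything here is hypothetical (the `wait_until` helper, the `FakeJob` class, and the shortened timings); it is not the C++ test fixture.

```python
import time


def wait_until(predicate, timeout_s, interval_s=0.01):
    """Poll predicate until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakeJob:
    """Simulated job whose final state is recorded after a delay."""

    def __init__(self, finish_after_s):
        self._done_at = time.monotonic() + finish_after_s

    def kvstore_populated(self):
        # In the real service this is updated only after the .out file
        # has been written, so it doubles as a completion signal.
        return time.monotonic() >= self._done_at


# A budget shorter than the slow step times out ...
print(wait_until(FakeJob(0.2).kvstore_populated, timeout_s=0.05))
# ... while a budget with headroom succeeds.
print(wait_until(FakeJob(0.2).kvstore_populated, timeout_s=2.0))
```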
Also fixes an ill-formed duplicate default argument on
wait_for_kvstore_to_populate: the forward declaration already specifies
the default, and repeating it in the definition is ill-formed in C++.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed: b982155 to c45b786
|
Yes, a ducktape e2e test is feasible. The missing pieces are: (1) tagging common-go#165 as a new |
dotnwat
left a comment
lgtm. in the future please split out changes into separate commits (e.g. "check_clean_up" fix should be in a separate commit).
Add DebugBundleOAuthBearerAuthn, an end-to-end ducktape test that exercises the full OAUTHBEARER forwarding path introduced in #30225 and #30277.
The test spins up a Keycloak OIDC provider alongside a single Redpanda broker configured with SASL OAUTHBEARER. It issues a client credentials token from Keycloak, then POSTs a debug bundle start request with authentication: {mechanism: OAUTHBEARER, token: <JWT>}. The broker forwards the token to the rpk subprocess via -Xsasl.mechanism and -Xpass=token:..., and the subprocess authenticates to Kafka using the JWT. The test verifies the bundle completes successfully and the expected topic appears in kafka.json.
Supporting changes:
- redpanda_types.py: add OAuthBearerCredentials dataclass, which serializes to {mechanism, token} (handled by the existing DebugBundleEncoder dataclass branch without any special-casing)
- admin.py: widen DebugBundleStartConfigParams.authentication to accept OAuthBearerCredentials alongside the existing SaslCredentials
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
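To illustrate why a plain dataclass needs no special-casing in the encoder, here is a minimal sketch. The `OAuthBearerCredentials` name comes from the commit message, but its field defaults are assumed, and `DataclassEncoder` is a hypothetical stand-in for the dataclass branch of DebugBundleEncoder rather than the real class.

```python
import dataclasses
import json


@dataclasses.dataclass
class OAuthBearerCredentials:
    token: str
    mechanism: str = "OAUTHBEARER"  # default assumed for illustration


class DataclassEncoder(json.JSONEncoder):
    """Hypothetical stand-in for an encoder with a generic dataclass branch."""

    def default(self, o):
        # Any dataclass instance is converted field-by-field to a dict.
        if dataclasses.is_dataclass(o) and not isinstance(o, type):
            return dataclasses.asdict(o)
        return super().default(o)


payload = json.dumps(
    {"authentication": OAuthBearerCredentials(token="<JWT>")},
    cls=DataclassEncoder,
    sort_keys=True,
)
print(payload)
```

Because the generic branch handles any dataclass, adding a new credential type only requires defining the dataclass with the right field names.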
Will do, thank you!
Remove the early-exit guard that rejected OAUTHBEARER profiles in 'rpk debug remote-bundle start'. The broker-side admin API now accepts a bearer_creds payload (#30225), and rpadmin v0.2.6 exposes WithOAuthBearerAuthentication so rpk can forward the token.
toRpadminOptions now dispatches on mechanism: OAUTHBEARER profiles call WithOAuthBearerAuthentication(token), all other SASL profiles fall through to the existing WithSCRAMAuthentication path.
Add DebugBundleOAuthBearerAuthn, an end-to-end ducktape test that exercises the full OAUTHBEARER forwarding path. The test spins up a Keycloak OIDC provider alongside a single Redpanda broker configured with SASL OAUTHBEARER. It issues a client credentials token from Keycloak, then POSTs a debug bundle start request with authentication: {mechanism: OAUTHBEARER, token: <JWT>}. The broker forwards the token to the rpk subprocess via -Xsasl.mechanism and -Xpass=token:..., and the subprocess authenticates to Kafka using the JWT. The test verifies the bundle completes successfully and the expected topic appears in kafka.json.
Supporting changes:
- redpanda_types.py: add OAuthBearerCredentials dataclass, which serializes to {mechanism, token}
- admin.py: widen DebugBundleStartConfigParams.authentication to accept OAuthBearerCredentials alongside the existing SaslCredentials
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Closes #30222
Why is this needed?
Redpanda's admin API already supports OIDC/bearer tokens for inbound request authentication (docs), so it's reasonable to ask: why does `rpk debug remote-bundle start` need any broker-side change to support OAUTHBEARER?
The answer is that this endpoint involves two distinct layers of authentication, and only one of them was already covered:
[...] `rpk debug bundle` subprocess on the node, and that subprocess has to turn around and call the local Kafka API, schema registry, and admin API to collect the bundle. The caller supplies those credentials inside the POST body under an `authentication` field; the broker forwards them to the subprocess via `-X` flags.
Before this PR, the in-body `authentication` variant only modeled SCRAM (`{username, password, mechanism}`). Even if the caller authenticated their admin API request with a bearer token, there was no way to tell the subprocess "use OAUTHBEARER when you call Kafka": the broker's JSON parser would reject or mis-parse a `{token, mechanism}` payload. This PR adds that second variant so the broker can emit `-Xpass=token:<TOKEN> -Xsasl.mechanism=OAUTHBEARER` to the subprocess.
TL;DR: admin API inbound auth ≠ debug-bundle subprocess auth. The existing OIDC support covers the first; this PR adds the second.
Context
This PR implements the broker-side half of end-to-end OAUTHBEARER support for `rpk debug remote-bundle start`. The full chain is:
- [...] `rpk debug remote-bundle start` with a "not yet supported" error until the broker side is ready
- [...] `rpadmin.WithOAuthBearerAuthentication(token)` and the `{mechanism, token}` JSON payload
- [...] `{mechanism, token}` payload and passes `-Xpass=token:<TOKEN> -Xsasl.mechanism=OAUTHBEARER` to the rpk subprocess
- [...] `out.Die` in `start.go` and call `rpadmin.WithOAuthBearerAuthentication(token)` when the profile mechanism is OAUTHBEARER
Summary of changes
- `types.h`: Add `bearer_creds{token, mechanism}`; expand `debug_bundle_authn_options` to `std::variant<scram_creds, bearer_creds>`
- `json.h`: Add `from_json<bearer_creds>`; update `debug_bundle_authn_options` dispatch to select the variant by presence of `"token"` (OAUTHBEARER) vs `"username"` (SCRAM); `{mechanism: OAUTHBEARER}` with no `token` field is rejected with `invalid_parameters` (400)
- `debug_bundle_service.cc`: Add `bearer_creds` arm to the `ss::visit` that builds rpk subprocess args, emitting `-Xpass=token:<TOKEN>` and `-Xsasl.mechanism=OAUTHBEARER`
- `debug_bundle.json`: Update authentication schema to document both the SCRAM and OAUTHBEARER variants
- `json_test.cc`: `bearer_creds` added to typed test suite (BasicType, TypeIsInvalid, ValidateControlCharacters); `ParametersWithBearerAuth` standalone test; `BearerAuthMissingTokenIsRejected` standalone test verifying 400 behaviour
- `debug_bundle_service_test.cc`: `test_bearer_creds_args` verifies the subprocess receives `-Xpass=token:<TOKEN> -Xsasl.mechanism=OAUTHBEARER`; `check_clean_up` made robust against slow SHA256 I/O
Test fixes (from CI run analysis)
Two pre-existing test issues were exposed by the new test ordering:
- `test_bearer_creds_args`: the original implementation polled `rpk_debug_bundle_status` in a loop while expecting `status == running`, breaking out when stdout became non-empty (~1s). At that point the rpk-shim was still sleeping (5s total), so fixture teardown terminated a live process, producing a spurious "Failed to terminate" warning. Rewritten to use `run_bundle` so the process completes before the stdout is inspected.
- `check_clean_up`: the test waited for the `.out` file with a 10s budget starting from when the zip file appeared (~T+0). The `.out` file is only written after the process exits (T+5s) and `set_metadata` finishes its `calculate_sha256_sum` call; on slow Bazel sandbox I/O, SHA256 was observed taking >10s, blowing the deadline. Fixed by replacing `wait_for_file_to_be_created(.out, 10s)` with `wait_for_kvstore_to_populate(30s)`: the kvstore entry is written only after the `.out` file, making it a reliable completion signal with headroom for SHA256 latency.
Test plan
- `bazel test //src/v/debug_bundle/tests:json_test`
- `bazel test //src/v/debug_bundle/tests:debug_bundle_service_test`
- `bazel run //tools:clang_format`
Backports Required
Release Notes
Features
- [...] `rpk debug remote-bundle start`, enabling remote debug bundle collection against clusters configured with OAUTHBEARER authentication.
Generated with Claude Code