base: finer grained log filtering in vlog#30202
base: finer grained log filtering in vlog#30202WillemKauf wants to merge 6 commits intoredpanda-data:devfrom
base: finer grained log filtering in vlog#30202Conversation
base: finer grained log filteringbase: finer grained log filtering in vlog
There was a problem hiding this comment.
Pull request overview
Adds runtime, per-callsite filtering for the vlog macro family, plus an internal admin RPC service to apply/reset/list log filter rules on a live node.
Changes:
- Introduces a
vlog::detail::callsiteregistry and rule engine (apply_rules/reset_rules/for_each_callsite) to enable/disable individualvlogcallsites at runtime. - Updates
vlog/vlogl/vloglrmacros to gate formatting/argument evaluation behind a per-callsiteatomic<bool>. - Adds
LogFilterService(proto + admin service impl) with RPCs to set/reset rules and list registered callsites; includes a new unit test target for the filtering logic.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/v/redpanda/application_admin.cc | Registers the new internal log filter admin service. |
| src/v/redpanda/admin/services/internal/log_filter.h | Declares the LogFilterService implementation class. |
| src/v/redpanda/admin/services/internal/log_filter.cc | Implements RPC handlers translating wire rules to vlog::apply_rules and listing callsites. |
| src/v/redpanda/admin/services/internal/BUILD | Adds a new library target for the log filter admin service. |
| src/v/redpanda/BUILD | Links the new internal log filter service into the redpanda target. |
| src/v/base/vlog_filter.h | Defines rule schema and the public filter API (apply_rules/reset_rules/for_each_callsite). |
| src/v/base/vlog_callsite.h | Introduces the per-callsite data structure and enabled flag. |
| src/v/base/vlog_callsite.cc | Implements rule matching/evaluation, rules snapshot publication, and the lock-free callsite registry. |
| src/v/base/vlog.h | Updates vlog* macros to consult the per-callsite enabled flag before formatting/log dispatch. |
| src/v/base/tests/vlog_filter_test.cc | Adds unit tests for rule matching, ordering, registration timing, and defaults. |
| src/v/base/tests/BUILD | Adds a new gtest target for vlog filtering. |
| src/v/base/BUILD | Adds the new callsite source and headers to the base library. |
| proto/redpanda/core/admin/internal/v1/log_filter.proto | Adds the RPC/service and message schema for log filtering and callsite listing. |
| proto/redpanda/core/admin/internal/v1/BUILD | Adds build targets for the new proto. |
| // Enabled value written to every site the rule matches. | ||
| bool enabled = 4; |
There was a problem hiding this comment.
LogFilterRule.enabled is a non-optional proto3 bool, so if a client omits it the default will be false (disabling matching callsites). That conflicts with the in-process default (vlog::rule::enabled = true) and makes it easy for partially-specified rules (e.g. {file:"..."}) to unintentionally disable logs. Consider making this field optional bool enabled = 4 (so presence is tracked) and defaulting to true server-side when it is not set, or otherwise redesigning the API so the default behavior is unambiguous.
| // Enabled value written to every site the rule matches. | |
| bool enabled = 4; | |
| // Enabled value written to every site the rule matches. If omitted, | |
| // the server should default this to true. | |
| optional bool enabled = 4; |
|
|
||
| // A registered vlog callsite and its current enabled state. | ||
| message LogCallsiteInfo { | ||
| // Source-file basename (path stripped at compile time). |
There was a problem hiding this comment.
The LogCallsiteInfo.file field comment says the value is a basename with the path stripped, but cs.file() comes from __FILE__ (and the PR description suggests matching on paths like src/v/storage/*). Either update the comment to describe the actual value (e.g. __FILE__ path as compiled), or change the implementation to strip to basename before returning it to clients so the API matches its documentation.
| // Source-file basename (path stripped at compile time). | |
| // Callsite source file path as compiled into __FILE__. |
71bd540 to
6a4f868
Compare
CI test resultstest results on build#83283
|
|
/ci-repeat 1 |
1f533f2 to
55379c0
Compare
Each vlog/vlogl/vloglr invocation now emits a static vlog::detail::callsite
that self-registers in a lock-free intrusive list at first hit and gates
the log emission on a relaxed atomic<bool>. Runtime filter rules
(file-glob, file+line, format-substring) flip those flags through
vlog::apply_rules, which snapshots the current rule set as a shared_ptr
swapped atomically via std::atomic_load/store free functions. A cold
callsite evaluates its initial enabled state against the currently-
published rules so filters set before a line first fires take effect on
its very first invocation.
The enabled-line cost is an extra relaxed load plus a well-predicted
branch; the disabled-line cost drops from full format-argument evaluation
to just that branch.
`base`: promote vlog callsite gate from bool to 4-value state enum
Callsite state becomes {uninit, default_, force_on, force_off}.
Filter rules carry a state field in place of bool enabled.
evaluate() returns the resolved state; apply_rules / reset_rules
write state values via set_state. The vlog.h macro layer and the
existing tests still compile against bool enabled — both are
updated in subsequent commits.
base/tests: rewrite vlog_filter_test to state enum schema
Mechanical translation of bool enabled() and rule{.enabled=...}
to resolved_state() and rule{.state=...}. Semantics preserved:
the prior 'enabled = false' is 'state = force_off', the prior
'enabled = true' used as a later-rule override is 'state = default_'.
Test coverage for force_on and for state=default_ as a carve-out
follows in a separate commit.
base/tests: cover force_on, force_off, default_ rule states
Drives vlog(...) against a captured seastar logger with the level
set to warn, verifying force_on emits a trace line and force_off
suppresses it regardless. A final test exercises apply_rules
ordering across all three states against a static callsite whose
resolved state is observed directly.
vlog / vlogl / vloglr macros switch on the callsite's resolved state: default_ keeps today's path, force_on routes through the seastar force_tag overloads to bypass the level gate, force_off short-circuits before argument evaluation. The uninit case is unreachable after resolved_state()'s slow_init — listed only to satisfy exhaustive-switch warnings. Wrapper logger types that are used with vlog (raft::ctx_log, prefix_logger, kafka group::ctx_log, connection_context::ctx_log, basic_retry_chain_logger) each receive matching force_tag overloads so that the force_on path compiles throughout the codebase. The one callsite that used an ad-hoc lambda as the method argument is converted to vloglr, which carries force semantics natively. The vlogl and vloglr macro parameters are renamed to _vlog_lgr_ to avoid token collision with the "logger" in ::seastar::logger::force during macro expansion. utils/truncating_logger: add force_tag overloads Follow-up to the vlog 3-way dispatch change. truncating_logger was not in the set of wrapper loggers updated in the preceding commit, but kafka::kwire (a truncating_logger) is passed to vlog macros so its force_on branch requires a force_tag overload on each convenience method. Mirrors the pattern used for the other wrapper loggers. The force path pre-formats into a fmt::memory_buffer with the same truncation logic as the normal path, then emits via _logger.log(lvl, force, ...) to bypass the level gate. SEASTAR_LOGGER_COMPILE_TIME_FMT is handled via #ifdef since the two seastar code paths differ in how runtime strings are passed to format_info.
proto/admin: log filter gains LogFilterState enum Replaces the bool enabled field on LogFilterRule and LogCallsiteInfo with a LogFilterState enum carrying INHERITED/FORCE_ON/FORCE_OFF. The UNSPECIFIED = 0 proto3 default is rejected by SetLogFilter and never returned by ListLogCallsites. LOG_FILTER_STATE_DEFAULT was renamed to LOG_FILTER_STATE_INHERITED because the protobuf C++ generator strips the enum name prefix, producing a symbol named 'default' which is a reserved keyword. Branch-local schema break; no clients exist yet.
admin/log_filter: translate LogFilterState enum both ways SetLogFilter rejects LOG_FILTER_STATE_UNSPECIFIED with INVALID_ARGUMENT and refuses any unknown enum value. ListLogCallsites reports the resolved callsite state via to_wire_state. The proto enum's INHERITED maps to the C++ callsite_base::state::default_ (keyword-collision workaround from the preceding proto commit).
| seastar::logger::set_ostream(_captured); | ||
| seastar::logger::set_ostream_enabled(true); | ||
| } | ||
| ~logger_capture() { seastar::logger::set_ostream(std::cerr); } |
There was a problem hiding this comment.
This seems like it would be wrong if the output stream wasn't cerr (e.g., because the user used soem commad line args which changed it).
Perhaps the set_ostream should return the old stream so it can be restored? Or are there lifetime issues?
Alternately maybe we need a reset_ostream which sets the stream back to the default.
There was a problem hiding this comment.
Yeah agreed. Unfortunately that will take some API changes to seastar... the tests are nice to have though, so wondering which direction you think we should go here (straight to upstream seastar changes, or changes to our fork)
There was a problem hiding this comment.
Otherwise, seastar::logger::set_ostream_enabled(false) might be okay, since the logging would be limited to this binary.
Adds per-callsite toggling to our existing seastar
vlogmacros. Requires changes in Seastar made in redpanda-data/seastar#272.This feature allows users to force log lines on or off at a directory, file, or line granularity. When a log line is forced on, it will be logged regardless of the globally configured seastar log level (e.g. even if a log level is
TRACEand the currently configured level isINFO, the log line will still appear to the user).Similarily, when a log line is forced off, it will not be shown, regardless of the globally configured seastar log level.
Some examples of the available API through the admin endpoint:
(Rules apply in order; last match wins.)
Backports Required
Release Notes