[TEST](Counter) counter test by jacktengg · Pull Request #62475 · apache/doris

jacktengg · 2026-04-14T03:37:38Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

Thearas · 2026-04-14T03:37:45Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

jacktengg · 2026-04-14T03:37:53Z

run buildall

jacktengg · 2026-04-14T09:52:38Z

run buildall

jacktengg · 2026-04-14T16:04:32Z

run buildall

…right

…U timer

Problem Summary: Scanner::_cpu_watch (a ThreadCpuStopWatch using CLOCK_THREAD_CPUTIME_ID) was started via resume() on a scanner worker thread but read via pause() on the pipeline task thread. Since CLOCK_THREAD_CPUTIME_ID is a per-thread CPU clock, reading it on a different thread produces garbage/negative values, triggering the DCHECK:

Check failed: _value.load() > -1L (-39943795 vs. -1) delta: -252570258

In _scanner_scan(), update_scanner_profile() (which calls pause()) was called only on the EOS path. The non-EOS path left _cpu_watch running, and ScannerScheduler::submit() later called pause() from the pipeline task thread.

Fix:
1. Always call update_scanner_profile() before push_back_scan_task() in _scanner_scan(), ensuring pause() runs on the scanner worker thread for both EOS and non-EOS paths.
2. Reinitialize _cpu_watch after reading in _update_scan_cpu_timer() so that any subsequent cross-thread pause() call in submit() safely reads 0.

Release note: None

- Test: Manual test - verified the logic by code analysis
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
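The per-thread clock constraint described above can be illustrated with a minimal sketch. `ThreadCpuWatchSketch` and `same_thread_cpu_ns()` are hypothetical names for illustration only and are not the Doris classes; the sketch assumes a POSIX system that provides `clock_gettime` with `CLOCK_THREAD_CPUTIME_ID`. It pairs resume() and pause() on the same worker thread, which is the invariant the fix restores.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>
#include <time.h>

// Hypothetical stand-in for a thread-CPU stopwatch; not the Doris class.
class ThreadCpuWatchSketch {
public:
    void resume() {
        _start_ns = now_ns();
        _running = true;
    }

    // Returns CPU time consumed since resume(). The result is only meaningful
    // when called on the same thread that called resume(), because the clock
    // below counts CPU time of the calling thread.
    int64_t pause() {
        if (!_running) return 0;
        _running = false;
        return now_ns() - _start_ns;
    }

private:
    static int64_t now_ns() {
        timespec ts{};
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
        return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
    }

    int64_t _start_ns = 0;
    bool _running = false;
};

// Runs resume() and pause() on one worker thread and returns the delta.
int64_t same_thread_cpu_ns() {
    int64_t delta = -1;
    std::thread worker([&delta] {
        ThreadCpuWatchSketch watch;
        watch.resume();
        volatile uint64_t sink = 0;
        for (uint64_t i = 0; i < 1000000; ++i) sink = sink + i;  // burn some CPU
        delta = watch.pause();  // same thread as resume()
    });
    worker.join();
    return delta;
}
```

Calling pause() from a different thread would subtract a start value taken from one thread's CPU clock from a reading of another thread's CPU clock, so the sign and magnitude of the result would be arbitrary.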

Problem Summary: MonotonicStopWatch::elapsed_time() can return a small negative value (e.g. -203 ns) due to rare CLOCK_MONOTONIC rollbacks. When stop() accumulates this into _total_time and _fresh_profile_counter() sets it on a RuntimeProfile::Counter, the DCHECK asserting value > -1 fires and crashes the process.

Stack trace:
RuntimeProfile::Counter::set() at runtime_profile.h:222
PipelineTask::close() at pipeline_task.cpp:925
close_task() at task_scheduler.cpp:86

The fix clamps the running-case delta in elapsed_time() and elapsed_time_seconds() to max(0, delta). Since stop() calls elapsed_time() while _running is still true, _total_time can never accumulate a negative value, preventing the crash for all downstream callers.

Release note: None

- Test: No need to test (clock rollback is non-deterministic hardware behavior; the fix is a trivial arithmetic clamp)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
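The clamp described above can be sketched as follows. This is an illustration under assumptions, not the actual MonotonicStopWatch code: `clamped_elapsed_ns` and `StopWatchSketch` are hypothetical names, and timestamps are passed in explicitly so that a rollback can be simulated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: elapsed nanoseconds between two clock readings,
// clamped at 0 so that a clock rollback never yields a negative delta.
inline int64_t clamped_elapsed_ns(int64_t start_ns, int64_t now_ns) {
    return std::max<int64_t>(0, now_ns - start_ns);
}

// Hypothetical stopwatch that only ever accumulates clamped deltas.
struct StopWatchSketch {
    int64_t total_ns = 0;
    int64_t start_ns = 0;
    bool running = false;

    void start(int64_t now_ns) {
        if (!running) {
            start_ns = now_ns;
            running = true;
        }
    }

    void stop(int64_t now_ns) {
        if (running) {
            // The delta is clamped before it is added, so total_ns stays >= 0.
            total_ns += clamped_elapsed_ns(start_ns, now_ns);
            running = false;
        }
    }
};
```

Because the clamp is applied before accumulation, a reading that goes backwards by 203 ns contributes 0 rather than -203 to the total.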

### What problem does this PR solve?

Problem Summary: ParquetReader::_total_groups is declared as `size_t _total_groups;` without a default member initializer. When init_reader() fails before reaching the assignment `_total_groups = _t_metadata->row_groups.size()` (e.g., because _open_file() fails), the field remains uninitialized. ASAN fills freshly allocated memory with 0xBE, so _total_groups becomes 0xBEBEBEBEBEBEBEBE. When _collect_profile() later reads _total_groups via COUNTER_UPDATE, this garbage value is cast to int64_t (-4702111234474983746) and triggers:

Check failed: _value.load() > -1L (-4702111234474983746 vs. -1)

### Release note

None

### Check List (For Author)

- Test: No need to test - trivial default member initializer addition
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
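A minimal sketch of the pattern, assuming a simplified reader: `ReaderSketch`, `init()`, and `asan_fill_as_int64()` are hypothetical names and not the ParquetReader API. It shows a default member initializer keeping the field well-defined when initialization fails early, and reproduces the integer that the 0xBE fill pattern becomes when interpreted as int64_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reader; only the initialization pattern is of interest.
struct ReaderSketch {
    std::size_t total_groups = 0;  // default member initializer

    // Returns false on an early failure, before total_groups is assigned.
    bool init(bool open_ok, std::size_t groups_in_file) {
        if (!open_ok) return false;  // the assignment below never runs
        total_groups = groups_in_file;
        return true;
    }
};

// The 0xBE byte pattern mentioned in the commit message, viewed as a signed
// 64-bit integer (two's complement conversion).
inline int64_t asan_fill_as_int64() {
    return static_cast<int64_t>(0xBEBEBEBEBEBEBEBEULL);
}
```

With the initializer in place, a profile collection step that runs after a failed init reads 0 instead of an indeterminate value.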

…ation

### What problem does this PR solve?

Problem Summary: MemoryReclamation::revoke_tasks_memory() updates a freed_memory_counter with current_memory_bytes() from task memory trackers. Due to concurrent batched memory tracking, current_memory_bytes() can return small negative values (e.g., -96 bytes). This negative delta triggers the DCHECK in Counter::update():

Check failed: _value.load() > -1L (-96 vs. -1).

The fix clamps current_memory_bytes() to std::max(int64_t(0), ...) at both COUNTER_UPDATE sites in revoke_tasks_memory(), since freed_memory is a logically non-negative quantity and slightly negative tracker consumption indicates no reclaimable memory.

### Release note

None

### Check List (For Author)

- Test: No need to test - memory tracker going slightly negative is a known transient condition from concurrent batched tracking; the fix is a trivial clamp on a profiling counter
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
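The effect of the clamp on an accumulated counter can be sketched as below. `reclaimable_bytes` and `total_freed_bytes` are hypothetical helpers for illustration; the real code applies the clamp at the COUNTER_UPDATE call sites rather than through functions like these.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: a tracker reading that is transiently negative is
// treated as "nothing to reclaim".
inline int64_t reclaimable_bytes(int64_t tracked_bytes) {
    return std::max(int64_t(0), tracked_bytes);
}

// Sums the clamped readings, so the running total can never decrease.
inline int64_t total_freed_bytes(const std::vector<int64_t>& per_task_tracked_bytes) {
    int64_t freed = 0;
    for (int64_t bytes : per_task_tracked_bytes) {
        freed += reclaimable_bytes(bytes);
    }
    return freed;
}
```

A task reporting -96 bytes then contributes 0 to the freed-memory total instead of pushing a non-negative counter below zero.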

### What problem does this PR solve?

Problem Summary: OrcReader has three members without default initializers:

- _decimal_scale_params_index (size_t): used as an index into _decimal_scale_params vector during decimal column decoding. Although currently reset to 0 at the start of each batch read, leaving it uninitialized is a latent bug if code paths change.
- _orc_once_max_read_bytes (int64_t): used in _create_file_reader() to configure ORCFileInputStream read buffer size.
- _orc_max_merge_distance_bytes (int64_t): used in _create_file_reader() to configure ORCFileInputStream merge distance.

All three follow the same pattern as the ParquetReader _total_groups bug: members that are assigned during init_reader() but could be read while still uninitialized if an error occurs before the assignment.

### Release note

None

### Check List (For Author)

- Test: No need to test - trivial default member initializer additions
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jacktengg · 2026-04-15T07:35:52Z

run buildall

jacktengg · 2026-04-15T15:25:55Z

/review

github-actions

Findings:

be/src/common/config.cpp: changing enable_task_executor_in_external_table default to false is an unrelated behavior change that disables the task-executor path for all external table scans.
be/src/common/daemon.cpp: the new counter-check suppression is thread-local and only applied on memory_maintenance_thread(). /profile refreshes the same process-memory counters on HTTP threads, so debug BEs can still hit the new DCHECK there.

Critical checkpoint conclusions:

Goal / correctness: Partially achieved. The scanner stopwatch and uninitialized-field fixes make sense, but the process-profile path is still incomplete and the config default change introduces a regression.
Minimal / focused: No. The config default flip is unrelated to the stated bugfixes.
Concurrency: The scanner CPU timer change keeps pause() on the worker thread and I did not find a new lock-order issue in the touched scan path.
Lifecycle / static init: No cross-TU static-init issue found; enable_profile_counter_check being thread_local is the problem because the workaround only affects one thread.
Config changes: Yes. One config default changes user-visible behavior without justification or coverage.
Parallel code paths: Incomplete. /profile -> ProcessProfile::refresh_profile() still reaches the new counter checks outside the daemon thread.
Special conditions / invariants: The new generic non-negative assertion is not valid for every sampled process-memory counter path.
Tests: No regression coverage was added for the remaining profile-refresh path or for the external-scan scheduler behavior change.
Test results: The PR only claims manual analysis / no-need-to-test, so there is no automated proof that all crash paths are closed.
Observability: Sufficient for the touched code.
Transaction / persistence: Not applicable.
Data write / atomicity: Not applicable.
FE/BE variable passing: Not applicable.
Performance: No blocking issue in the scanner fix itself, but the config flip has broader runtime impact.
Other issues: None beyond the blockers above.

github-actions · 2026-04-15T15:46:44Z

 DEFINE_Bool(enable_task_executor_in_internal_table, "true");
 // Enable task executor in external table scan.
-DEFINE_Bool(enable_task_executor_in_external_table, "true");
+DEFINE_Bool(enable_task_executor_in_external_table, "false");


This flips the default external-table scan scheduler from TaskExecutorSimplifiedScanScheduler to ThreadPoolSimplifiedScanScheduler for every workload group (be/src/runtime/workload_group/workload_group.cpp, lines 580-588), but none of the scanner/counter fixes in this PR depend on that behavior anymore. Merging this silently disables the task-executor path for all external scans without any justification or coverage in the PR.

Suggested change

-DEFINE_Bool(enable_task_executor_in_external_table, "false");
+DEFINE_Bool(enable_task_executor_in_external_table, "true");

github-actions · 2026-04-15T15:46:44Z

 }

 void Daemon::memory_maintenance_thread() {
+    doris::enable_profile_counter_check = 0;


Disabling the check only on memory_maintenance_thread() does not cover the other path that refreshes the same process-memory counters: /profile calls ProcessProfile::refresh_profile() from an HTTP worker thread (be/src/service/http/default_path_handlers.cpp, line 192), which reaches MemoryProfile::refresh_memory_overview_profile(). That code sets UntrackedMemory = VmRSS - all_tracked_mem_sum, and there is no invariant that sampled RSS is always greater than or equal to tracked bytes. In debug builds the new HighWaterMarkCounter::set() DCHECK will still fire on that path, so this line only hides the crash on one thread instead of fixing the generic issue.
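The thread-local scoping issue raised in this comment can be shown with a small sketch. `counter_check_enabled_sketch` is a hypothetical variable modeled on the flag discussed above, not the Doris symbol; the point is only that assigning a `thread_local` variable on one thread leaves every other thread's copy at its initial value.

```cpp
#include <cassert>
#include <thread>

// Hypothetical flag modeled on the thread-local switch described in the review.
thread_local int counter_check_enabled_sketch = 1;

// Clears the flag on the calling thread, then reports what a new thread sees.
int flag_seen_by_other_thread() {
    counter_check_enabled_sketch = 0;  // affects only this thread's copy
    int seen = -1;
    std::thread other([&seen] {
        // This thread gets its own freshly initialized copy of the flag.
        seen = counter_check_enabled_sketch;
    });
    other.join();
    return seen;
}
```

The second thread still observes the initial value 1, which mirrors how an HTTP worker thread would still run the check after it was disabled only on the maintenance thread.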

jacktengg · 2026-04-16T02:25:59Z

run buildall

hello-stephen · 2026-04-16T06:29:12Z

BE UT Coverage Report

Increment line coverage 73.91% (68/92) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.11% (20180/37996)
Line Coverage	36.68% (190122/518301)
Region Coverage	32.94% (147660/448251)
Branch Coverage	34.06% (64603/189688)

hello-stephen · 2026-04-16T06:42:30Z

BE Regression && UT Coverage Report

Increment line coverage 85.12% (103/121) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.66% (27405/37207)
Line Coverage	57.30% (296099/516736)
Region Coverage	54.57% (246945/452502)
Branch Coverage	56.19% (106916/190284)

jacktengg force-pushed the wt-counter-test branch from 9007a2e to 67273e9 on April 14, 2026 09:52

yiguolei and others added 10 commits April 15, 2026 11:38

[enhancement](counter) add dcheck to profile to test all counters is …

13e49da

…right

fix

4003df9

fix

a2de111

improve

7c7ebd9

fix

8add74a

jacktengg force-pushed the wt-counter-test branch from d479d3a to 8add74a on April 15, 2026 07:35

jacktengg added 2 commits April 15, 2026 17:13

fix

07472c7

fix be UT

b5fde6f

github-actions bot requested changes Apr 15, 2026

jacktengg wants to merge 12 commits into apache:master from jacktengg:wt-counter-test

4 participants
