
metrics: add task schedule latency metric#7986

Open
robholt wants to merge 13 commits into tokio-rs:master from robholt:robholt/schedule-latency-pr

Conversation


robholt commented Mar 24, 2026

This new metric tracks the amount of time between when a task is scheduled and when it is polled, also known as queue delay. This duration is recorded in a histogram, just like the poll time metric.

Motivation

Measuring the amount of time a unit of work spends waiting in a queue, also known as "queue delay" or "sojourn time", is an essential component of the CoDel active queue management algorithm (and of any algorithm based on it). CoDel is particularly popular because it requires no configuration or workload-specific tuning, maintains low latency and high throughput, and relies on a single parameter (queue delay), which makes it relatively simple to implement.

For example, an HTTP server built on tokio could use this metric to power an overload protection layer that uses the CoDel algorithm to reject requests, preventing a latency spiral while still serving close to the maximum possible request volume, with no workload-specific tuning.
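
To make the connection concrete, here is a heavily simplified sketch of the CoDel control rule driven by a queue-delay measurement. The `Codel` type and its fields are hypothetical illustrations, not part of this PR or of tokio, and real CoDel additionally shortens the drop interval as consecutive drops accumulate:

```rust
use std::time::{Duration, Instant};

/// Hypothetical, heavily simplified CoDel-style shedding decision.
struct Codel {
    /// Acceptable queue delay (CoDel's TARGET, commonly ~5ms).
    target: Duration,
    /// How long delay must stay above target before shedding (CoDel's INTERVAL).
    interval: Duration,
    /// When the observed delay first exceeded `target`, if it still does.
    above_since: Option<Instant>,
}

impl Codel {
    fn new() -> Self {
        Codel {
            target: Duration::from_millis(5),
            interval: Duration::from_millis(100),
            above_since: None,
        }
    }

    /// Feed one observed queue delay; returns true if this request should
    /// be rejected to keep latency bounded.
    fn should_shed(&mut self, queue_delay: Duration) -> bool {
        if queue_delay < self.target {
            // Delay recovered below target: reset the tracking window.
            self.above_since = None;
            return false;
        }
        match self.above_since {
            None => {
                // Delay just crossed target: start the window, don't shed yet.
                self.above_since = Some(Instant::now());
                false
            }
            // Shed once delay has stayed above target for a full interval.
            Some(since) => since.elapsed() >= self.interval,
        }
    }
}
```

The only input the controller needs is the queue delay this metric would expose, which is exactly why CoDel needs no workload-specific tuning.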

There is some precedent for a task scheduler exposing this metric: the Go runtime exposes it as `/sched/latencies:seconds`, and the Linux kernel's schedstat tracks process scheduling latency per-CPU and per-process.

Solution

Measuring this requires keeping track of the time at which each task enters the queue, in order to calculate the elapsed duration when the task is finally polled. I stored this time in the task header because it's hot data that's updated each time the task is scheduled. This does add some overhead, so I put the metric behind `tokio_unstable`, just like the poll time histogram metric.
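
Reduced to plain Rust, the pattern is just a timestamp written on every wake and read on the next poll. This is an illustrative sketch with hypothetical names, not the PR's actual code, which stores the timestamp in the task header and records the delta into the runtime's histogram:

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the per-task state described above.
struct TaskTiming {
    /// Hot data: overwritten every time the task is scheduled.
    scheduled_at: Option<Instant>,
}

impl TaskTiming {
    /// Called when the task is pushed onto a run queue.
    fn on_schedule(&mut self) {
        self.scheduled_at = Some(Instant::now());
    }

    /// Called when a worker starts polling the task; yields the queue delay.
    fn on_poll_start(&mut self) -> Option<Duration> {
        // `take` so a poll not preceded by a schedule records nothing.
        self.scheduled_at.take().map(|at| at.elapsed())
    }
}
```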

github-actions bot added the R-loom-current-thread (Run loom current-thread tests on this PR) and R-loom-multi-thread (Run loom multi-thread tests on this PR) labels Mar 24, 2026
Comment thread tokio/src/runtime/builder.rs Outdated
Comment thread tokio/src/runtime/metrics/runtime.rs Outdated
Comment thread tokio/src/runtime/metrics/batch.rs Outdated

```rust
// Check for a task in the LIFO slot
let task = match core.lifo_slot.take() {
    Some(task) => task,
```
Member

Should `core.stats.start_poll(task.get_scheduled_at());` be called here too?

Author

I considered this, but it would also change the behavior of the poll time metric. I do think that would be more correct, but wanted to avoid changing too many things in one PR. I'm happy to make the change though if you think that's fine!

Comment thread tokio/tests/rt_unstable_metrics.rs
Comment thread tokio/src/runtime/metrics/batch.rs Outdated
Comment thread tokio/src/runtime/metrics/runtime.rs Outdated
Comment thread tokio/src/runtime/scheduler/multi_thread/worker.rs Outdated
Comment thread tokio/src/runtime/builder.rs Outdated
Comment thread tokio/src/runtime/metrics/worker.rs
Comment thread tokio/src/runtime/task/core.rs Outdated
Comment on lines 197 to 202
```diff
 /// The last time this task was scheduled. Used to measure schedule latency.
-#[cfg(tokio_unstable)]
+///
+/// Only enabled when the target supports 64-bit atomics because the metric
+/// that uses this field also requires 64-bit atomics.
+#[cfg(all(tokio_unstable, target_has_atomic = "64"))]
 pub(super) scheduled_at: UnsafeCell<Option<Instant>>,
```
Member

How big is the task header?

Author

This field adds 16 bytes to the header, bringing it up to 48 bytes in total (56 if tracing is enabled too).

This is causing the `header_lte_cache_line` test (at the bottom of this file) to fail on 32-bit targets. Restricting the field to targets that support 64-bit atomics did not work because some 32-bit targets do support 64-bit atomics.

Maybe sticking this behind a feature (like tracing) would be best?

Member

I'm okay with changing that test to be <= 64 to accommodate 32-bit targets; they are not that important anyway.

Regardless, the `Instant` struct is pretty large, and space is at a premium in the task header. I don't think we want to dedicate 16 bytes just so we can have nanosecond precision even if it takes a billion years for a task to get scheduled. Could we store a `u32` of milliseconds since the runtime was created, or something like that? I'm pretty sure we already store a sort of "epoch" instant in the `tokio::runtime::time` part of the driver.

Author

That's a good idea; I updated the PR to use this "epoch" approach. I used a `u64`, though, because the struct is already 8-byte aligned on all targets due to the 64-bit `owner_id` field, so a `u32` would just waste 4 bytes. Using a `u64` also means we can keep nanosecond precision, since `u64::MAX` nanoseconds is roughly 584 years.

That brings the size of the struct up to 40 bytes now.

And you're right, the time driver does track its own startup time. I opted not to use it, however, because the time driver is not enabled by default, so relying on it would force users to enable the time driver in order to get schedule latency tracking.
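
For illustration, the "epoch" encoding described above can be sketched as follows; `Epoch` and its methods are hypothetical names, not the PR's API:

```rust
use std::time::Instant;

/// Hypothetical helper mirroring the approach described above: timestamps
/// are stored as `u64` nanoseconds relative to the scheduler's startup time.
struct Epoch {
    started_at: Instant,
}

impl Epoch {
    fn new() -> Self {
        Epoch { started_at: Instant::now() }
    }

    /// Encode "now" as nanoseconds since startup. Because `u64::MAX`
    /// nanoseconds is roughly 584 years, truncation is not a concern.
    fn now_nanos(&self) -> u64 {
        self.started_at.elapsed().as_nanos() as u64
    }

    /// Queue delay, in nanoseconds, for a task scheduled at `scheduled_nanos`.
    fn delay_nanos(&self, scheduled_nanos: u64) -> u64 {
        self.now_nanos().saturating_sub(scheduled_nanos)
    }
}
```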

Comment thread tokio/src/runtime/scheduler/current_thread/mod.rs Outdated
mattiapitossi added the A-tokio (Area: The main tokio crate) label Mar 26, 2026
mattiapitossi added the S-waiting-on-author (Status: awaiting some action, such as code changes, from the PR or issue author) label Mar 28, 2026
Comment thread tokio/src/runtime/task/mod.rs Outdated
Comment thread tokio/src/runtime/builder.rs Outdated
Comment thread tokio/src/runtime/builder.rs Outdated
Comment thread tokio/src/runtime/metrics/runtime.rs Outdated
Comment thread tokio/src/runtime/metrics/runtime.rs Outdated
```diff
@@ -247,6 +249,16 @@ impl<S> Notified<S> {
     pub(crate) fn task_meta<'meta>(&self) -> crate::runtime::TaskMeta<'meta> {
         self.0.task_meta()
```
Contributor

Can we please expose this field in the `TaskMeta`? That would really help observability tools, because it means that during `on_poll_start` we can unconditionally know what the scheduling delay was.

Author

I think it would be best to do that in a follow-up PR. I agree it would be very useful, but it'll require some refactoring of the way `TaskMeta` is constructed.

Comment on lines +104 to +108
```rust
/// Startup time of this scheduler.
///
/// This instant is used as the basis of task `scheduled_at` measurements.
#[cfg(all(tokio_unstable, target_pointer_width = "64"))]
started_at: Instant,
```
Member

This PR is using these cfgs all over the place. Can we please reduce them a bit? You can define a module that contains a `ScheduleLatencyInstant` which is either empty or contains an `Instant` depending on the configuration, and then the code actually using it can stop having cfgs. Similarly for the `u64` in the task header.

Author

Makes sense; the cfgs did get a bit out of control after the 64-bit restriction.

I've updated the PR to remove almost all of the cfgs and centralize the logic into a new `schedule_latency` module. The size of the task `Header` remains the same as in the previous version on both 64-bit and 32-bit targets.
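
As a sketch of the centralizing pattern suggested above (type and method names here are illustrative; only the `schedule_latency` module name comes from the PR), the cfg lives in one place and callers compile the same on every target:

```rust
#[cfg(all(tokio_unstable, target_pointer_width = "64"))]
mod schedule_latency {
    use std::time::Instant;

    /// Wraps a real `Instant` when the metric can be supported.
    #[derive(Copy, Clone)]
    pub(crate) struct ScheduleLatencyInstant(Instant);

    impl ScheduleLatencyInstant {
        pub(crate) fn now() -> Self {
            ScheduleLatencyInstant(Instant::now())
        }

        pub(crate) fn elapsed_nanos(&self) -> u64 {
            self.0.elapsed().as_nanos() as u64
        }
    }
}

#[cfg(not(all(tokio_unstable, target_pointer_width = "64")))]
mod schedule_latency {
    /// Zero-sized stand-in: callers compile unchanged, every operation is a no-op.
    #[derive(Copy, Clone)]
    pub(crate) struct ScheduleLatencyInstant;

    impl ScheduleLatencyInstant {
        pub(crate) fn now() -> Self {
            ScheduleLatencyInstant
        }

        pub(crate) fn elapsed_nanos(&self) -> u64 {
            0
        }
    }
}
```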

Comment thread tokio/src/runtime/scheduler/multi_thread/worker.rs Outdated
Comment thread tokio/src/runtime/task/mod.rs Outdated
Comment thread tokio/src/runtime/task/mod.rs Outdated
robholt requested a review from Darksonn April 29, 2026