metrics: add task schedule latency metric #7986
robholt wants to merge 13 commits into tokio-rs:master
Conversation
This new metric tracks the amount of time between when a task is scheduled and when it is polled, also known as queue delay. This duration is recorded in a histogram, just like the poll time metric. This metric is useful for implementing queue management algorithms in systems using tokio. For example, it could be used to implement a generic HTTP load shedder using the CoDel algorithm.
```rust
// Check for a task in the LIFO slot
let task = match core.lifo_slot.take() {
    Some(task) => task,
```
Should `core.stats.start_poll(task.get_scheduled_at());` be called here too?
I considered this, but it would also change the behavior of the poll time metric. I do think that would be more correct, but wanted to avoid changing too many things in one PR. I'm happy to make the change though if you think that's fine!
```rust
/// The last time this task was scheduled. Used to measure schedule latency.
///
/// Only enabled when the target supports 64-bit atomics because the metric
/// that uses this field also requires 64-bit atomics.
#[cfg(all(tokio_unstable, target_has_atomic = "64"))]
pub(super) scheduled_at: UnsafeCell<Option<Instant>>,
```
This field adds 16 bytes to the header, bringing it up to 48 bytes in total (56 if tracing is enabled too).
This is causing the header_lte_cache_line test (at the bottom of this file) to fail on 32-bit targets. Restricting the field to targets that support 64-bit atomics did not work because some 32-bit targets do support 64-bit atomics.
Maybe sticking this behind a feature (like `tracing`) would be best?
I'm okay with changing that test to be <= 64 to accommodate 32-bit targets, they are not that important anyway.
Regardless, the Instant struct is pretty large and space is at a premium in the task header. I don't think we want to dedicate 16 bytes just so we can have nanosecond precision even if it takes a billion years for it to get scheduled. Could we store a u32 of milliseconds since the runtime was created, or something like that? I'm pretty sure we already store a sort of "epoch" instant in the tokio::runtime::time part of the driver.
That's a good idea, I updated the PR to use this "epoch" approach. I used a u64 though because the struct is already 8-byte aligned on all targets because of the 64-bit owner_id field, so a u32 just wastes 4 bytes. Using a u64 also means we can keep the nanosecond precision, since u64::MAX in nanoseconds is equal to ~584 years.
That brings the size of the struct up to 40 bytes now.
And you're right, the time driver does track its own startup time. I opted not to use it, however, because the time driver is not enabled by default, so relying on it would mean users must enable the time driver in order to use this schedule latency tracking.
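The epoch approach discussed above can be sketched roughly as follows. This is an illustrative standalone version, not tokio's actual code; the `SchedulerEpoch` name and method names are assumptions:

```rust
use std::time::Instant;

/// Sketch of the "epoch" approach: instead of a 16-byte `Instant` per
/// task, store schedule times as a u64 nanosecond offset from a
/// per-scheduler start instant. u64::MAX nanoseconds is roughly 584
/// years, so overflow is not a practical concern.
struct SchedulerEpoch {
    started_at: Instant,
}

impl SchedulerEpoch {
    fn new() -> Self {
        Self {
            started_at: Instant::now(),
        }
    }

    /// Compress "now" into a u64 nanosecond offset from the epoch.
    /// This is what would be stamped into the task header on schedule.
    fn now_nanos(&self) -> u64 {
        self.started_at.elapsed().as_nanos() as u64
    }

    /// Schedule latency: nanoseconds elapsed since the task was stamped.
    fn latency_nanos(&self, scheduled_at: u64) -> u64 {
        self.now_nanos().saturating_sub(scheduled_at)
    }
}
```

Since only an 8-byte offset is stored per task, the header stays 8-byte aligned without the padding a `u32` would waste.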
```rust
impl<S> Notified<S> {
    pub(crate) fn task_meta<'meta>(&self) -> crate::runtime::TaskMeta<'meta> {
        self.0.task_meta()
```
Can we please expose this field in the task meta? That would really help observability tools, because it means that during `on_poll_start` we can unconditionally know what the scheduling delay was.
I think it would be best to do that in a follow-up PR. I agree it would be very useful, but it'll require some refactoring of the way TaskMeta is constructed.
```rust
/// Startup time of this scheduler.
///
/// This instant is used as the basis of task `scheduled_at` measurements.
#[cfg(all(tokio_unstable, target_pointer_width = "64"))]
started_at: Instant,
```
This PR is using these cfgs all over the place. Can we please reduce them a bit? You can define a module that contains a ScheduleLatencyInstant which is either empty or contains an Instant depending on the configuration, and then code actually using it can stop having cfgs. Similarly for the u64 in the task header.
Makes sense, the cfgs did get a bit out of control after the 64-bit restriction.
I've updated the PR to remove almost all of the cfgs and centralized the logic into a new schedule_latency module. The size of the task Header remains the same as the previous version on both 64-bit and 32-bit targets.
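The pattern suggested above can be sketched like this. Module and type names follow the suggestion (`schedule_latency`, `ScheduleLatencyInstant`), but the bodies are illustrative, not the PR's actual code:

```rust
// Centralizing the cfg logic: the module exports a type that is either
// a real timestamp or a zero-sized no-op depending on the
// configuration, so call sites need no cfgs at all.

#[cfg(all(tokio_unstable, target_has_atomic = "64"))]
mod schedule_latency {
    use std::time::Instant;

    /// Real timestamp: wraps an `Instant`.
    #[derive(Clone, Copy)]
    pub struct ScheduleLatencyInstant(Instant);

    impl ScheduleLatencyInstant {
        pub fn now() -> Self {
            Self(Instant::now())
        }
        pub fn elapsed_nanos(&self) -> u64 {
            self.0.elapsed().as_nanos() as u64
        }
    }
}

#[cfg(not(all(tokio_unstable, target_has_atomic = "64")))]
mod schedule_latency {
    /// Zero-sized stand-in; the methods compile away to nothing.
    #[derive(Clone, Copy)]
    pub struct ScheduleLatencyInstant;

    impl ScheduleLatencyInstant {
        pub fn now() -> Self {
            Self
        }
        pub fn elapsed_nanos(&self) -> u64 {
            0
        }
    }
}
```

Callers write `ScheduleLatencyInstant::now()` unconditionally; on unsupported configurations the type is zero-sized, so it adds no space to the task header.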
Motivation
Measuring the amount of time a unit of work spends waiting in a queue, also known as "queue delay" or "sojourn time", is an essential component of the CoDel active queue management algorithm (and any algorithm based on it). CoDel is particularly popular because it requires no configuration or tuning to specific workloads, it maintains low latency and high throughput, and relies on a single parameter (queue delay) so it is relatively simple to implement.
For example, an HTTP server built on tokio could use this metric to power an overload protection layer that uses the CoDel algorithm to reject requests and prevent a latency spiral while still serving close to the maximum possible request volume, without the need for specific workload tuning.
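A minimal CoDel-style shedding decision driven by queue delay could look like the sketch below. The constants and structure are illustrative (not from tokio or this PR); a real implementation would also shrink the interval while shedding persists:

```rust
use std::time::{Duration, Instant};

/// Sketch of the core CoDel test: shed only when queue delay has
/// stayed above TARGET for a full INTERVAL, not on a single spike.
struct CoDel {
    target: Duration,               // acceptable queue delay (TARGET)
    interval: Duration,             // how long delay must stay high (INTERVAL)
    first_above: Option<Instant>,   // when delay first exceeded target
}

impl CoDel {
    fn new() -> Self {
        Self {
            target: Duration::from_millis(5),
            interval: Duration::from_millis(100),
            first_above: None,
        }
    }

    /// Given the observed queue delay for a unit of work, decide
    /// whether to shed it.
    fn should_shed(&mut self, queue_delay: Duration, now: Instant) -> bool {
        if queue_delay < self.target {
            // Delay dropped below target: reset the tracking window.
            self.first_above = None;
            return false;
        }
        match self.first_above {
            None => {
                self.first_above = Some(now);
                false
            }
            // Shed once the delay has been above target for a full interval.
            Some(t) => now.duration_since(t) >= self.interval,
        }
    }
}
```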
There is some precedent for a task scheduler to expose this metric. The Go runtime exposes it as `/sched/latencies:seconds`, and the Linux kernel's `schedstat` tracks process scheduling latency per-CPU and per-process.

Solution
Measuring this requires keeping track of the time each task enters the queue in order to calculate the elapsed duration when the task is finally polled. I stored this time in the task header because it's hot data that's updated each time the task is scheduled. This does add some overhead, so I put the metric behind `tokio_unstable`, just like the poll time histogram metric.
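The stamp-on-schedule, record-on-poll pattern described here can be sketched in standalone form. All names (`Histogram`, `Task`, `schedule`, `start_poll`) are illustrative assumptions, not tokio's internals:

```rust
use std::time::{Duration, Instant};

/// Toy linear-bucket histogram standing in for tokio's metric histogram.
struct Histogram {
    buckets: Vec<u64>,       // counts per bucket
    bucket_width: Duration,  // width of each linear bucket
}

impl Histogram {
    fn new(num_buckets: usize, bucket_width: Duration) -> Self {
        Self {
            buckets: vec![0; num_buckets],
            bucket_width,
        }
    }

    fn record(&mut self, d: Duration) {
        let idx = (d.as_nanos() / self.bucket_width.as_nanos()) as usize;
        let idx = idx.min(self.buckets.len() - 1); // clamp overflow to last bucket
        self.buckets[idx] += 1;
    }
}

struct Task {
    scheduled_at: Option<Instant>, // stamped each time the task is woken
}

impl Task {
    /// Called when the task is scheduled (enters the run queue).
    fn schedule(&mut self) {
        self.scheduled_at = Some(Instant::now());
    }

    /// Called when the worker begins polling the task: the elapsed time
    /// since the stamp is the queue delay, recorded into the histogram.
    fn start_poll(&mut self, hist: &mut Histogram) {
        if let Some(t) = self.scheduled_at.take() {
            hist.record(t.elapsed());
        }
    }
}
```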