
select_macro: Use Backoff snooze #1251

Open
pawurb wants to merge 1 commit into crossbeam-rs:master from pawurb:select-backoff

Conversation


@pawurb pawurb commented Apr 29, 2026

Hi, I'm using crossbeam channels with the select! macro in https://github.com/pawurb/hotpath-rs. I've noticed significant overhead from Thread::unpark calls when sending messages every 1μs, visible in samply traces:

[Screenshot: samply trace, 2026-04-29]

I was able to work around it by batching (see the sketch below). But I noticed that while e.g. the recv method uses a backoff spin, the backoff does not apply when the same channel type is used inside a select! macro.
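
For context, the batching workaround looks roughly like this. This is an illustrative sketch, not the actual hotpath-rs code; `BatchingSender` and `FLUSH_EVERY` are made-up names:

```rust
use crossbeam_channel::Sender;

/// Illustrative sketch of the batching workaround (not the actual
/// hotpath-rs code): buffer messages locally and send them as a Vec,
/// so the receiver is unparked once per batch instead of per message.
struct BatchingSender {
    tx: Sender<Vec<u64>>,
    buf: Vec<u64>,
}

impl BatchingSender {
    // Hypothetical batch size; tune for your message rate.
    const FLUSH_EVERY: usize = 64;

    fn new(tx: Sender<Vec<u64>>) -> Self {
        Self { tx, buf: Vec::with_capacity(Self::FLUSH_EVERY) }
    }

    fn send(&mut self, msg: u64) {
        self.buf.push(msg);
        if self.buf.len() >= Self::FLUSH_EVERY {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if !self.buf.is_empty() {
            // One channel send (and at most one Thread::unpark) per batch.
            let _ = self.tx.send(std::mem::take(&mut self.buf));
        }
    }
}
```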

I used the following example to reproduce and benchmark it with hyperfine:

```rust
use std::sync::LazyLock;
use std::thread;
use std::time::{Duration, Instant};

use crossbeam_channel::{bounded, select, unbounded};

const DEFAULT_MESSAGES: u64 = 1_000_000;
const DEFAULT_SEND_INTERVAL_NS: u64 = 1000;

// Number of messages to send, overridable via MESSAGES_NUM.
static MESSAGES: LazyLock<u64> = LazyLock::new(|| {
    std::env::var("MESSAGES_NUM")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(DEFAULT_MESSAGES)
});

// Delay between sends, overridable via SEND_INTERVAL_NS.
static SEND_INTERVAL: LazyLock<Duration> = LazyLock::new(|| {
    Duration::from_nanos(
        std::env::var("SEND_INTERVAL_NS")
            .ok()
            .and_then(|s| s.parse().ok())
            .unwrap_or(DEFAULT_SEND_INTERVAL_NS),
    )
});

// Busy-wait for a precise interval (sleep is too coarse at this scale).
fn spin_for(d: Duration) {
    let start = Instant::now();
    while start.elapsed() < d {}
}

fn main() {
    let (work_tx, work_rx) = unbounded::<u64>();
    let (ctrl_tx, ctrl_rx) = bounded::<()>(1);

    // Consumer: count messages received through select! until either
    // channel signals shutdown.
    let consumer = thread::spawn(move || {
        let mut count: u64 = 0;
        loop {
            select! {
                recv(work_rx) -> msg => match msg {
                    Ok(_) => count += 1,
                    Err(_) => break,
                },
                recv(ctrl_rx) -> _ => break,
            }
        }
        count
    });

    // Producer: send one message per interval, then shut down.
    let start = Instant::now();
    for i in 0..*MESSAGES {
        work_tx.send(i).unwrap();
        spin_for(*SEND_INTERVAL);
    }
    drop(work_tx);
    let _ = ctrl_tx.send(());

    let received = consumer.join().unwrap();
    let elapsed = start.elapsed();

    println!("sent     : {}", *MESSAGES);
    println!("received : {}", received);
    println!("elapsed  : {:?}", elapsed);
    println!(
        "per-msg  : {:.2} ns",
        elapsed.as_nanos() as f64 / *MESSAGES as f64
    );
}
```

Results are similar on Linux and macOS:

| interval | base (ms)    | PR (ms)      | rel Δ  |
|---------:|-------------:|-------------:|-------:|
| 100 ns   | 21.8 ± 0.9   | 17.5 ± 3.3   | -19.7% |
| 250 ns   | 39.5 ± 1.3   | 32.6 ± 3.4   | -17.5% |
| 500 ns   | 74.3 ± 3.8   | 57.4 ± 5.1   | -22.7% |
| 750 ns   | 107.4 ± 2.3  | 84.4 ± 2.9   | -21.4% |
| 1000 ns  | 139.8 ± 4.0  | 106.5 ± 2.2  | -23.8% |
| 1500 ns  | 207.1 ± 1.3  | 157.0 ± 0.7  | -24.2% |
| 2000 ns  | 270.8 ± 1.7  | 225.3 ± 2.0  | -16.8% |
| 2500 ns  | 301.9 ± 3.1  | 293.1 ± 0.7  | -2.9%  |
| 5 µs     | 120.7 ± 9.0  | 115.6 ± 2.6  | -4.2%  |
| 10 µs    | 220.9 ± 3.5  | 229.0 ± 5.9  | +3.7%  |
| 25 µs    | 520.8 ± 4.7  | 524.4 ± 5.0  | +0.7%  |
| 50 µs    | 1022.2 ± 3.2 | 1024.2 ± 4.1 | +0.2%  |
| 100 µs   | 2025.6 ± 4.7 | 2024.9 ± 1.3 | -0.0%  |

There's a measurable ~20% improvement for send intervals up to ~2μs (I suspect that's roughly the max backoff spin duration), and no regression beyond that. Let me know if you would consider this change. I've seen other issues in this repo complaining about too much spinning, but maybe it makes sense for consistency with the receiver implementation.
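
For reference, the pattern here is the snooze-before-park loop from crossbeam_utils::Backoff. A minimal sketch of the idea, not the actual select! internals (`wait_until` and `is_ready` are stand-ins):

```rust
use crossbeam_utils::Backoff;

// Sketch of the snooze-before-park pattern: spin with exponential
// backoff first, and only fall back to parking the thread once the
// backoff reports that blocking is the better strategy.
fn wait_until(is_ready: impl Fn() -> bool) {
    let backoff = Backoff::new();
    while !is_ready() {
        if backoff.is_completed() {
            // Spinning is exhausted; block until a sender unparks us.
            std::thread::park();
        } else {
            // snooze() spins briefly at first, then yields the thread
            // as the backoff counter grows.
            backoff.snooze();
        }
    }
}
```

This avoids the park/unpark round trip entirely when a message arrives within the spin window, which would explain why the improvement tapers off once the send interval exceeds the maximum backoff duration.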

I can prepare other benchmarks, but I'm not sure which scenarios to test.
