When the block executor builds its parallel execution pool and the process has at least as many physical cores available as worker threads, pin each worker 1:1 to a distinct physical core. This prevents two workers from landing on HT siblings of the same core, which degrades throughput on CPU-bound Block-STM workloads.

## Approach

- Detect the physical cores the process may run on by grouping the CPUs from `sched_getaffinity` (via `core_affinity::get_core_ids()`) by their `/sys/.../topology/thread_siblings_list`, keeping one representative per group. Linux only; other platforms return `None`.
- Pin via a `rayon` `start_handler` that calls `core_affinity::set_for_current` using the worker's thread index.
- Skip pinning (fall back to the OS scheduler) when topology cannot be detected (non-Linux, sysfs unreadable, any read fails), the process has fewer physical cores than `num_threads`, or `num_threads == 0`.
- Per-thread pin failures are logged at `warn!` and the thread continues unpinned; the pool still comes up.

## Robustness

- `core_affinity::get_core_ids` respects the process's affinity mask, so this composes correctly with cgroups, containers, and `taskset`.
- Unit tests cover the sibling-grouping logic (HT on/off, restricted affinity set, read-failure path).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
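
For reviewers who want the grouping and skip rules in one place, here is a minimal std-only sketch of the logic described above. It is not the code in this PR: the function names `physical_core_representatives` and `pinning_plan` are hypothetical, the sibling list is supplied by an injected reader closure instead of being read from sysfs, and choosing the lowest CPU id as each group's representative (and the first `num_threads` representatives as pin targets) is an assumption made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: given the CPU ids the process may run on and a reader
/// returning the contents of a CPU's `thread_siblings_list`, return one
/// representative CPU per physical core. Returns `None` if any read fails.
fn physical_core_representatives<F>(allowed_cpus: &[usize], read_siblings: F) -> Option<Vec<usize>>
where
    F: Fn(usize) -> Option<String>,
{
    // Key: trimmed sibling list (e.g. "0,4"); value: lowest allowed CPU in that group.
    // Assumption: the lowest id is used as the representative.
    let mut groups: BTreeMap<String, usize> = BTreeMap::new();
    for &cpu in allowed_cpus {
        // Any failed read aborts detection, matching the "any read fails" skip rule.
        let key = read_siblings(cpu)?.trim().to_string();
        groups
            .entry(key)
            .and_modify(|rep| *rep = (*rep).min(cpu))
            .or_insert(cpu);
    }
    let mut reps: Vec<usize> = groups.into_values().collect();
    reps.sort_unstable();
    Some(reps)
}

/// Hypothetical helper: returns the CPU to pin each worker to, indexed by
/// worker thread index, or `None` to fall back to the OS scheduler.
fn pinning_plan(reps: Option<Vec<usize>>, num_threads: usize) -> Option<Vec<usize>> {
    let reps = reps?;
    // Skip pinning when there are no workers or too few physical cores.
    if num_threads == 0 || reps.len() < num_threads {
        return None;
    }
    Some(reps.into_iter().take(num_threads).collect())
}

fn main() {
    // Simulated 4-core, 8-thread machine where CPU n and CPU n+4 are HT siblings.
    let read = |cpu: usize| Some(format!("{},{}\n", cpu % 4, cpu % 4 + 4));
    let allowed: Vec<usize> = (0..8).collect();
    let reps = physical_core_representatives(&allowed, read);
    println!("representatives: {:?}", reps);
    println!("plan for 4 workers: {:?}", pinning_plan(reps, 4));
}
```

Injecting the reader keeps the grouping logic testable without a real `/sys` tree, which is one way the HT on/off, restricted-affinity, and read-failure cases listed under Robustness could be exercised.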