Predictable raft::resources #3052

Open
achirkin wants to merge 26 commits into NVIDIA:main from achirkin:enh-predictable-resources

@achirkin

@achirkin achirkin commented Jun 9, 2026

Contributor

Make the raft::resources resource initialization semantics more predictable.

  1. All resources are still initialized lazily on demand (no change), but they now behave as if they were initialized during construction of the raft::resources handle: all copies of a raft::resources handle point to the same resources (not a breaking change; fixes the re-initialization issue).
  2. set_resource changes to non-const semantics (breaking change).
    Before: all set_xxx resource-updating calls operated on a const handle.
    Now: all set_xxx resource-updating calls require a non-const handle.
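
The two points above can be illustrated with a toy model. This is a minimal single-threaded sketch, not RAFT code: `toy_resources`, `cell`, `get`, and `set` are hypothetical names, the resource is a plain string, and no atomics are used. It assumes each handle holds a shared pointer to a cell, so lazy initialization through one copy is visible through the others, while a setter rebinds only the handle it is called on.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical slot shared by all copies of a handle (not raft::resource_cell).
struct cell {
  std::shared_ptr<std::string> value;  // empty until first use
};

class toy_resources {
 public:
  toy_resources() : cell_(std::make_shared<cell>()) {}

  // Lazy getter: callable on a const handle; fills the shared cell on first use,
  // so every copy of this handle observes the same object afterwards.
  const std::string& get() const
  {
    if (!cell_->value) { cell_->value = std::make_shared<std::string>("default"); }
    return *cell_->value;
  }

  // Setter: non-const, so it cannot be called through a const handle.
  // It rebinds this handle to a fresh cell and leaves other copies untouched.
  void set(std::string v)
  {
    auto fresh   = std::make_shared<cell>();
    fresh->value = std::make_shared<std::string>(std::move(v));
    cell_        = std::move(fresh);
  }

 private:
  std::shared_ptr<cell> cell_;
};
```

In this sketch, copying a handle before the first `get()` still yields the same object through both copies, which is the "as if initialized at construction" behavior; calling `set` through a `const toy_resources&` would fail to compile.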

Thread-safety

First and foremost, thread-safety of using a specific resource depends on that resource. Here we discuss the thread-safety of using the raft::resources handle itself (the get_resource and set_resource functions).

Accessing the same resource by const ref

Updates (set_resource) are not possible through a const reference; to change a resource, copy the handle and modify the copy. All concurrent get_resource calls are atomic and safe, even if they initialize factories and resources under the hood. The worst that can happen is that the same resource is initialized concurrently in two threads, but only one instance is stored in the handle (the other is discarded).
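
The "initialize twice, store once" outcome can be sketched with a compare-and-swap on a shared pointer. This is an illustration under assumptions, not the PR's code: `toy_cell`, `publish_if_empty`, and `get_or_init` are hypothetical names, and the sketch uses the pre-C++20 free functions `std::atomic_load` and `std::atomic_compare_exchange_strong` for `std::shared_ptr` (deprecated in C++20 in favor of `std::atomic<std::shared_ptr<T>>`, which the PR uses) so that it also compiles on toolchains without the C++20 specialization.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

struct toy_resource {
  int id;
};

// Hypothetical cell. `slot_` is only touched through the atomic free functions.
class toy_cell {
 public:
  // Try to publish `candidate`; return whichever resource ends up stored.
  std::shared_ptr<toy_resource> publish_if_empty(std::shared_ptr<toy_resource> candidate)
  {
    std::shared_ptr<toy_resource> expected;  // empty: we expect an uninitialized slot
    if (std::atomic_compare_exchange_strong(&slot_, &expected, candidate)) {
      return candidate;  // this caller won the race
    }
    return expected;  // another caller won; `candidate` is simply dropped
  }

  // What a lazy getter could do: load, and initialize only when the slot is empty.
  std::shared_ptr<toy_resource> get_or_init(int id)
  {
    auto current = std::atomic_load(&slot_);
    if (current) { return current; }
    return publish_if_empty(std::make_shared<toy_resource>(toy_resource{id}));
  }

 private:
  std::shared_ptr<toy_resource> slot_;
};
```

Two threads that both see an empty slot each build a candidate, but only the first successful exchange is stored; the second caller receives the stored instance and its own candidate is released.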

Accessing the same resource by non-const ref

Using the same object by non-const ref from multiple threads is always unsafe.

Accessing copies of the same resource

The resources and factories are updated atomically. Modifying a resource through one handle doesn't propagate to its copies. Accessing a resource while another thread accesses or modifies the same resource via another handle is thus safe.

Implementation details

The PR adds one more layer of indirection (one extra shared_ptr) to each resource, which may add some runtime overhead. This is unavoidable if we want lazily initialized resources to back-propagate across handles.
On the other hand, the PR removes the handle mutex in favor of C++20 std::atomic<shared_ptr>, which may slightly reduce the runtime overhead.

Breaking changes

  • All resource setters change their function signatures; this is a big breaking change. However, all known use sites already call the resource setters on non-const raft::resources handles.
  • std::atomic<shared_ptr> requires enforcing C++20 for all dependent libraries. An alternative would be to put a mutex in each raft::resource_cell.
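
The mutex-per-cell alternative mentioned in the last bullet might look roughly like the following. This is a sketch under assumptions: `locked_cell`, `get_or_init`, and `set` are hypothetical names and do not reflect the actual raft::resource_cell layout.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical cell guarded by its own mutex (works without C++20).
template <typename T>
class locked_cell {
 public:
  // Returns the stored value, creating it with `make` on first use.
  template <typename Make>
  std::shared_ptr<T> get_or_init(Make&& make)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    if (!value_) { value_ = make(); }
    return value_;
  }

  // Replaces the stored value under the same lock.
  void set(std::shared_ptr<T> v)
  {
    std::lock_guard<std::mutex> lock(mtx_);
    value_ = std::move(v);
  }

 private:
  std::mutex mtx_;
  std::shared_ptr<T> value_;
};
```

One trade-off visible in this sketch: the initializer runs exactly once because it executes under the lock, but concurrent callers block for the duration of the initialization instead of racing and discarding a duplicate.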

@achirkin achirkin self-assigned this Jun 9, 2026
@achirkin achirkin requested review from a team as code owners June 9, 2026 11:39
@achirkin achirkin added the improvement (Improvement / enhancement to an existing function) and breaking (Breaking change) labels Jun 9, 2026
@achirkin achirkin moved this to In Progress in Unstructured Data Processing Jun 9, 2026
@achirkin achirkin mentioned this pull request Jun 11, 2026
@achirkin

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR changes RAFT resource storage to shared atomic cells, updates resource helper initialization to use default factories, revises multi-GPU and NCCL resource setup, changes memory snapshot handling, and raises C++/CUDA language requirements to C++20.

Changes

RAFT resource storage and core semantics

| Layer | File(s) | Summary |
| --- | --- | --- |
| Core storage and semantics | cpp/include/raft/core/resources.hpp, cpp/include/raft/core/resource/resource_types.hpp, cpp/include/raft/core/device_resources.hpp, cpp/include/raft/core/handle.hpp, cpp/CMakeLists.txt, cpp/internal/CMakeLists.txt, cpp/.clang-format, cpp/tests/core/handle.cpp | raft::resources moves to shared resource_cell storage with atomic lazy initialization, related copy/move behavior is defaulted, C++20 is required, formatting pragmas expand, and tests cover the updated semantics. |
| Default factory helpers | cpp/include/raft/core/resource/*.hpp | Resource helper APIs switch missing-factory paths to ensure_default_factory, and several setters now require mutable resources&. |
| Multi-GPU and NCCL setup | cpp/include/raft/core/resource/multi_gpu.hpp, cpp/include/raft/core/resource/nccl_comm.hpp | root_rank becomes value-based, multi-GPU factory setup uses default factories, and NCCL communicators are initialized from the derived device ID list. |
| Memory snapshots and invalidation | cpp/include/raft/core/memory_stats_resources.hpp, cpp/include/raft/core/memory_tracking_resources.hpp | Snapshot storage switches to shared cells, and THRUST_POLICY invalidation now replaces the cell entry. |

Estimated code review effort: 4 (Complex) | ~60 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title is concise and directly reflects the main change: making raft::resources behavior more predictable. |
| Description check | ✅ Passed | The description clearly matches the changeset and explains the resource semantics and setter API updates. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/include/raft/core/resource/device_memory_resource.hpp (1)

255-323: 📐 Maintainability & Code Quality | 🟠 Major | 🏗️ Heavy lift

HIGH: Public workspace setter signatures now hard-break const-callers.

Changing set_workspace_resource, set_workspace_to_pool_resource, set_workspace_to_global_resource, and set_large_workspace_resource from resources const& to resources& is a source-breaking change in a public header, and I don't see deprecated bridge overloads or migration shims here. Downstream code that forwards a const handle into these helpers will stop compiling immediately.

As per path instructions, "public C++ APIs should be stateless and lightweight wrappers" and "breaking C++ API changes require deprecation over releases when possible".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/device_memory_resource.hpp` around lines 255 -
323, The public workspace setter APIs now require a mutable resources handle,
which breaks existing const-callers. Restore compatibility in the workspace
helper functions set_workspace_resource, set_workspace_to_pool_resource,
set_workspace_to_global_resource, and set_large_workspace_resource by adding
const-friendly overloads or a non-breaking bridge that forwards from resources
const& without changing existing call sites, while keeping the new behavior
internally where mutation is needed.

Source: Path instructions

cpp/include/raft/core/resource/sub_comms.hpp (1)

54-67: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

HIGH: set_subcomm mutates shared state through a const handle.

Line 59 keeps resources const&, but the function inserts into a shared unordered_map with no synchronization. Under the new shared-cell copy semantics that mutation propagates across copies and can race with get_subcomm or another set_subcomm, which is at odds with the documented "different instances of raft::resources are safe" expectation.

As per path instructions, "different instances of raft::resources are safe", and the PR objective says copies "can be accessed and modified independently without propagating changes."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/sub_comms.hpp` around lines 54 - 67,
`set_subcomm` is mutating shared `resources` state through a const handle, which
can propagate across copies and race with `get_subcomm` or another
`set_subcomm`. Update the `set_subcomm` path in `sub_comms.hpp` so
sub-communicator updates are isolated per `resources` instance rather than
writing directly through shared state; use the existing
`resources`/`sub_comms_resource` APIs to make the mutation copy-local and add
synchronization or ownership separation as needed. Keep the fix centered around
`set_subcomm`, `get_subcomm`, and `sub_comms_resource_factory` so the “different
instances are safe” and independent-copy semantics are preserved.

Source: Path instructions

🧹 Nitpick comments (2)
cpp/tests/core/handle.cpp (1)

354-360: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Strengthen the propagation assertion (optional).

stream_a == stream_b is trivially satisfied because both resolve to the default cuda_stream_per_thread, so it doesn't actually prove the concrete resource is shared via the cell. Consider explicitly setting a non-default stream on a before the lazy-init path, or rely solely on has_resource_factory(b) as the propagation signal. Not a defect — the has_resource_factory check on Line 358 already validates the core behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/core/handle.cpp` around lines 354 - 360, The propagation test in
the resource lazy-init path is too weak because `resource::get_cuda_stream(a)`
and `resource::get_cuda_stream(b)` can both return the default
`cuda_stream_per_thread`, so the equality check in the `handle.cpp` test does
not prove shared state. Update the test around `resource::get_cuda_stream` and
`has_resource_factory` by either setting a non-default CUDA stream on `a` before
lazy initialization and then asserting `b` sees the same concrete stream, or
remove the trivial `ASSERT_EQ(stream_a, stream_b)` and rely on the
`has_resource_factory(resource::resource_type::CUDA_STREAM_VIEW)` assertion as
the propagation check.
cpp/include/raft/core/resource/cublaslt_handle.hpp (1)

49-50: 🚀 Performance & Scalability | 🔵 Trivial

Avoid allocating the default factory on every get_cublaslt_handle() call.
ensure_default_factory() still receives a freshly constructed cublaslt_resource_factory each time, so repeated lookups pay an unnecessary heap allocation before the CAS. Have you considered a lazy-builder overload or an early has_resource_factory() check?
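
The lazy-builder idea raised in this comment could be sketched as follows. The names `toy_handle`, `toy_factory`, and `ensure_default_factory_with` are hypothetical, and the sketch is single-threaded for brevity, whereas the review describes the real path as using a CAS.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct toy_factory {};

// Hypothetical single-slot handle, used only to contrast the two call styles.
class toy_handle {
 public:
  bool has_factory() const { return static_cast<bool>(factory_); }

  // Eager form: the caller has already allocated `f`, even if it ends up unused.
  void ensure_default_factory(std::shared_ptr<toy_factory> f)
  {
    if (!factory_) { factory_ = std::move(f); }
  }

  // Lazy-builder form: `build` runs only when no factory is registered yet.
  template <typename Builder>
  void ensure_default_factory_with(Builder&& build)
  {
    if (!factory_) { factory_ = std::forward<Builder>(build)(); }
  }

 private:
  std::shared_ptr<toy_factory> factory_;
};
```

With the builder form, a steady-state call costs one emptiness check and no allocation; with the eager form, the `std::make_shared` at the call site runs on every call because function arguments are evaluated before the callee can decide whether it needs them.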

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/cublaslt_handle.hpp` around lines 49 - 50, The
get_cublaslt_handle() path is still creating a new cublaslt_resource_factory on
every call before ensuring the default factory. Update the cublaslt_handle.hpp
logic around get_cublaslt_handle() and ensure_default_factory() so the default
factory is only constructed lazily when no factory is already present, using an
early has_resource_factory() check or a lazy-builder overload. Keep the resource
lookup behavior the same, but avoid the repeated heap allocation before the CAS.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/include/raft/core/resource/comms.hpp`:
- Line 61: The public resource setter API was changed from const to non-const,
which is a breaking change for existing callers using const resources handles.
Update the relevant public-facing documentation and migration/upgrade notes to
explicitly call out the `set_comms` signature change in `resources` and the
similar setter changes in this PR, and include guidance for downstream users to
update their call sites accordingly. Use the `set_comms` function and any
related setters as the symbols to reference in the release notes so the API
change is easy to find.

In `@cpp/include/raft/core/resource/cublas_handle.hpp`:
- Around line 62-64: The cublas handle access path in cublas_handle.hpp is
mutating a shared CUBLAS_HANDLE cell by rebinding its stream on every call,
which breaks safety when copied resources diverge via CUDA_STREAM_VIEW. Update
the cublas handle setup/access logic around cublas_resource_factory and
get_cublas_handle so each raft::resources instance keeps stream ordering
isolated, rather than reusing and retargeting the same handle across copies.
Ensure the handle’s stream is bound per-resource or per-stream-view in a way
that does not affect sibling copies and preserves correct ordering with
get_cuda_stream().

In `@cpp/include/raft/core/resource/cusolver_dn_handle.hpp`:
- Around line 68-69: get_cusolver_dn_handle currently returns a cached
cusolverDnHandle_t without reattaching it to the current stream, so after
set_cuda_stream() the handle may still target an old stream. Update
get_cusolver_dn_handle to call cusolverDnSetStream(...) on the retrieved handle
using get_cuda_stream(res) before returning it, and apply the same rebinding
pattern to get_cusparse_handle and get_cusolver_sp_handle so all stream-bound
solver handles stay aligned with the active raft::resources stream.

In `@cpp/include/raft/core/resource/multi_gpu.hpp`:
- Around line 55-58: The public root-rank API change in get_root_rank and
set_root_rank needs a temporary deprecated compatibility bridge to avoid
breaking downstream callers. Add a deprecated shim in the multi_gpu resource API
that preserves the previous mutable-reference/caller behavior while forwarding
to the new implementation, and make sure set_root_rank still accepts existing
const resources usage through a migration wrapper. Include Doxygen on any new
public API in resources/multi_gpu.hpp and mark the old entry points deprecated
with a clear migration path before removing them.

In `@cpp/include/raft/core/resource/nccl_comm.hpp`:
- Around line 78-80: `get_resource` in `nccl_comm.hpp` is using
`has_resource_factory(NCCL_COMM)` as an initialization gate, which races under
the shared-cell model and can return a partially initialized communicator vector
or run `_init_nccl_comms(res)` twice. Make `get_resource` (and the NCCL_COMM
path in `ensure_default_factory`/`nccl_comm_resource_factory`) rely on a single
atomic initialization mechanism so that only one caller publishes the factory
and performs `_init_nccl_comms(res)`, while concurrent callers on the same const
handle safely observe the fully initialized state.

In `@cpp/include/raft/core/resources.hpp`:
- Line 81: The public resource update API in resources.hpp has a breaking
constness change, so add a compatibility/deprecation path instead of only
exposing the new mutable form. Keep add_resource_factory and the related set_*
wrappers locatable as the affected symbols, and provide transitional overloads
or forwarding methods that preserve the old public surface where possible while
steering callers to the new API. For the public header, add Doxygen for any new
or transitional APIs and mark the breaking replacements with deprecation
warnings so downstream users get a clear migration path.
- Around line 100-104: `ensure_default_factory` currently indexes `cells_` with
`rtype` without validating the resource type, so a factory returning `LAST_KEY`
can access past the vector. Update `ensure_default_factory` in `resources.hpp`
to mirror the `add_resource_factory` guard for
`resource::resource_type::LAST_KEY` before touching `cells_`, and only perform
the `cells_[rtype]->factory.compare_exchange_strong(...)` path for valid
resource types.
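
A guard of the kind described in the last item might look like the sketch below. It is a toy under assumptions: `toy_type`, `toy_factory`, and `toy_handle` are hypothetical stand-ins, and throwing `std::invalid_argument` is a choice made for this sketch rather than the behavior of the actual `add_resource_factory` guard.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical resource keys; LAST_KEY is a sentinel, not a valid slot.
enum class toy_type : std::size_t { STREAM = 0, ALLOCATOR = 1, LAST_KEY = 2 };

struct toy_factory {
  toy_type type;
};

class toy_handle {
 public:
  // One (initially empty) cell per valid key.
  toy_handle() : cells_(static_cast<std::size_t>(toy_type::LAST_KEY)) {}

  void ensure_default_factory(std::shared_ptr<toy_factory> f)
  {
    const auto idx = static_cast<std::size_t>(f->type);
    // Validate before indexing: LAST_KEY would be one past the end of cells_.
    if (idx >= cells_.size()) { throw std::invalid_argument("not a valid resource type"); }
    if (!cells_[idx]) { cells_[idx] = std::move(f); }
  }

  bool has_factory(toy_type t) const
  {
    const auto idx = static_cast<std::size_t>(t);
    return idx < cells_.size() && static_cast<bool>(cells_[idx]);
  }

 private:
  std::vector<std::shared_ptr<toy_factory>> cells_;
};
```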

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 38c5e204-a599-42f7-a81d-cdc0784e4e52

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7f3ec and 473bdc7.

📒 Files selected for processing (29)
  • cpp/CMakeLists.txt
  • cpp/include/raft/core/device_resources.hpp
  • cpp/include/raft/core/handle.hpp
  • cpp/include/raft/core/memory_stats_resources.hpp
  • cpp/include/raft/core/memory_tracking_resources.hpp
  • cpp/include/raft/core/resource/comms.hpp
  • cpp/include/raft/core/resource/cublas_handle.hpp
  • cpp/include/raft/core/resource/cublaslt_handle.hpp
  • cpp/include/raft/core/resource/cuda_stream.hpp
  • cpp/include/raft/core/resource/cuda_stream_pool.hpp
  • cpp/include/raft/core/resource/cusolver_dn_handle.hpp
  • cpp/include/raft/core/resource/cusolver_sp_handle.hpp
  • cpp/include/raft/core/resource/cusparse_handle.hpp
  • cpp/include/raft/core/resource/custom_resource.hpp
  • cpp/include/raft/core/resource/detail/stream_sync_event.hpp
  • cpp/include/raft/core/resource/device_id.hpp
  • cpp/include/raft/core/resource/device_memory_resource.hpp
  • cpp/include/raft/core/resource/device_properties.hpp
  • cpp/include/raft/core/resource/managed_memory_resource.hpp
  • cpp/include/raft/core/resource/multi_gpu.hpp
  • cpp/include/raft/core/resource/nccl_comm.hpp
  • cpp/include/raft/core/resource/pinned_memory_resource.hpp
  • cpp/include/raft/core/resource/resource_types.hpp
  • cpp/include/raft/core/resource/stream_view.hpp
  • cpp/include/raft/core/resource/sub_comms.hpp
  • cpp/include/raft/core/resource/thrust_policy.hpp
  • cpp/include/raft/core/resources.hpp
  • cpp/internal/CMakeLists.txt
  • cpp/tests/core/handle.cpp

Comment thread cpp/include/raft/core/resource/comms.hpp
Comment thread cpp/include/raft/core/resource/cublas_handle.hpp Outdated
Comment thread cpp/include/raft/core/resource/cusolver_dn_handle.hpp Outdated
Comment thread cpp/include/raft/core/resource/multi_gpu.hpp
Comment thread cpp/include/raft/core/resource/nccl_comm.hpp Outdated
Comment thread cpp/include/raft/core/resources.hpp
Comment thread cpp/include/raft/core/resources.hpp
@cjnolet cjnolet requested review from a team as code owners July 1, 2026 15:01
@gforsyth gforsyth requested review from a team as code owners July 1, 2026 18:25
Comment thread ci/build_cpp.sh Outdated
@gforsyth gforsyth force-pushed the enh-predictable-resources branch from 18b01f9 to 34c4bfa Compare July 1, 2026 21:17

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/include/raft/core/resource/sub_comms.hpp (1)

14-23: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Unsynchronized shared sub-comm state

raft::resources copies now share this cell, but communicators_ is still read and mutated without any locking. That makes set_subcomm() / get_subcomm() on copied handles race on the same std::unordered_map, which can corrupt the container or crash.

Have you considered guarding this state or switching set_subcomm() to the same replace-the-cell pattern used by set_comms()?
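
The replace-the-cell option suggested here can be sketched as a copy-on-write map. Everything in this block is an assumption made for illustration: `toy_handle`, `comm_map`, and `has_subcomm` are hypothetical, string keys and `int` values stand in for the real key and communicator types, and the rebinding itself is not atomic because the setter is meant to be called on a non-const handle owned by one thread.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// int stands in for a communicator object in this toy model.
using comm_map = std::unordered_map<std::string, int>;

class toy_handle {
 public:
  toy_handle() : comms_(std::make_shared<const comm_map>()) {}

  // Copy-on-write: build an updated map and rebind this handle to it.
  // The previously shared map is never mutated, so other copies are unaffected.
  void set_subcomm(const std::string& key, int comm)
  {
    auto updated    = std::make_shared<comm_map>(*comms_);
    (*updated)[key] = comm;
    comms_          = std::move(updated);
  }

  bool has_subcomm(const std::string& key) const { return comms_->count(key) != 0; }

 private:
  std::shared_ptr<const comm_map> comms_;  // immutable once published
};
```

Because a published map is never modified in place, readers holding another copy of the handle cannot observe a half-updated container. The cost is one map copy per `set_subcomm`, which seems acceptable if sub-communicators are set rarely.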

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/sub_comms.hpp` around lines 14 - 23,
`sub_comms_resource` currently exposes a shared `communicators_` map through
`get_resource()` with no synchronization, so copied `raft::resources` handles
can race when `set_subcomm()` and `get_subcomm()` touch the same state. Fix this
by making the sub-comm state safe for concurrent access in the
`sub_comms_resource`/`set_subcomm()`/`get_subcomm()` path, either by guarding
`communicators_` with the same locking approach used elsewhere or by switching
`set_subcomm()` to a replace-the-cell pattern instead of mutating the shared
`std::unordered_map` in place.

Source: Path instructions

♻️ Duplicate comments (5)
cpp/include/raft/core/resource/cublaslt_handle.hpp (1)

47-51: 🚀 Performance & Scalability | 🟠 Major | 🏗️ Heavy lift

Same eager-allocation pattern as other resource getters.

get_cublaslt_handle now heap-allocates a shared_ptr<cublaslt_resource_factory> on every call rather than only when CUBLASLT_HANDLE is unregistered. See the primary write-up on cpp/include/raft/core/resource/cuda_stream.hpp (lines 60-64).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/cublaslt_handle.hpp` around lines 47 - 51, The
get_cublaslt_handle helper eagerly creates a new cublaslt_resource_factory on
every call, matching the same allocation issue seen in the other resource
getter. Update get_cublaslt_handle to only construct and register the factory
when CUBLASLT_HANDLE is not already present, using the existing resources APIs
and the cublaslt_resource_factory symbol to preserve lazy initialization.
cpp/include/raft/core/resource/pinned_memory_resource.hpp (1)

56-63: 🚀 Performance & Scalability | 🟠 Major | 🏗️ Heavy lift

Factory heap-allocated on every call, even when resource already exists.

res.ensure_default_factory(std::make_shared<pinned_memory_resource_factory>()) builds and heap-allocates a shared_ptr<pinned_memory_resource_factory> unconditionally on every invocation of get_pinned_memory_resource_ref, regardless of whether PINNED_MEMORY_RESOURCE is already registered — the argument is evaluated eagerly before ensure_default_factory can decide whether it's needed. The prior has_resource_factory/add_resource_factory pattern only allocated the factory on the (rare) first call. See the related discussion on cpp/include/raft/core/resource/cuda_stream.hpp (lines 60-64) for the primary write-up; same pattern repeats here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/pinned_memory_resource.hpp` around lines 56 -
63, The get_pinned_memory_resource_ref helper eagerly allocates a
pinned_memory_resource_factory on every call because ensure_default_factory is
passed a freshly constructed shared_ptr unconditionally. Update
get_pinned_memory_resource_ref to use the same lazy registration pattern as the
cuda_stream resource helper: check whether the PINNED_MEMORY_RESOURCE factory
already exists before creating and registering pinned_memory_resource_factory,
so the heap allocation only happens on the first call.
cpp/include/raft/core/resource/cusolver_dn_handle.hpp (1)

66-70: 🚀 Performance & Scalability | 🟠 Major | 🏗️ Heavy lift

Same eager-allocation pattern as other resource getters.

Same concern as flagged in cpp/include/raft/core/resource/cuda_stream.hpp (lines 60-64): get_cusolver_dn_handle now heap-allocates a factory shared_ptr (plus a nested one via get_cuda_stream(res)) on every call, not just the first.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/cusolver_dn_handle.hpp` around lines 66 - 70,
The get_cusolver_dn_handle helper is eagerly allocating a new
cusolver_dn_resource_factory and a new CUDA stream-backed factory on every call
instead of only initializing once. Update get_cusolver_dn_handle to follow the
same lazy-caching pattern used by the other resource getters, so the factory
setup happens only when the resource is missing and reused afterward; use the
existing resources / ensure_default_factory / get_resource flow to avoid
repeated heap allocations.
cpp/include/raft/core/resource/cusparse_handle.hpp (1)

58-62: 🚀 Performance & Scalability | 🟠 Major | 🏗️ Heavy lift

Same eager-allocation pattern as other resource getters.

get_cusparse_handle heap-allocates a factory shared_ptr (plus a nested one via get_cuda_stream(res)) on every call now, not just on first registration. See the primary write-up on cpp/include/raft/core/resource/cuda_stream.hpp (lines 60-64).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/cusparse_handle.hpp` around lines 58 - 62, The
get_cusparse_handle accessor is eagerly allocating a new
cusparse_resource_factory on every call instead of only when the resource is
first registered. Update get_cusparse_handle to follow the same lazy
registration pattern as the other resource getters, reusing the existing factory
once present and only constructing the cusparse_resource_factory (and calling
get_cuda_stream(res)) when the CUSPARSE_HANDLE resource is missing.
cpp/include/raft/core/resource/thrust_policy.hpp (1)

54-59: 🚀 Performance & Scalability | 🟠 Major | 🏗️ Heavy lift

Same eager-allocation-on-every-call pattern, plus a second nested allocation.

get_thrust_policy allocates a shared_ptr<thrust_policy_resource_factory> on every call regardless of whether THRUST_POLICY is already registered, and the constructor argument get_cuda_stream(res) triggers its own independent ensure_default_factory allocation for CUDA_STREAM_VIEW internally — so a single get_thrust_policy call now does up to 2 heap allocations even in steady state, versus none previously. See the primary comment on cpp/include/raft/core/resource/cuda_stream.hpp (lines 60-64).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/include/raft/core/resource/thrust_policy.hpp` around lines 54 - 59,
get_thrust_policy currently does eager work on every call: it always creates a
thrust_policy_resource_factory and also forces get_cuda_stream(res), which may
allocate again even when THRUST_POLICY already exists. Update get_thrust_policy
to first check whether the THRUST_POLICY resource is already registered and only
call ensure_default_factory when it is missing, reusing the existing resource
afterward. Keep the fix localized around get_thrust_policy and its use of
thrust_policy_resource_factory/get_cuda_stream so steady-state calls avoid
unnecessary heap allocations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/include/raft/core/resource/cuda_stream.hpp`:
- Around line 60-64: The hot path in get_cuda_stream() eagerly constructs a new
cuda_stream_resource_factory on every call because ensure_default_factory is
passed a freshly created shared_ptr by value. Update get_cuda_stream() to avoid
creating the factory unless it is actually needed, and apply the same lazy
pattern to the other resource getter helpers that delegate through
get_cuda_stream(res) so they do not incur repeated heap allocations after
initialization.

---

Outside diff comments:
In `@cpp/include/raft/core/resource/sub_comms.hpp`:
- Around line 14-23: `sub_comms_resource` currently exposes a shared
`communicators_` map through `get_resource()` with no synchronization, so copied
`raft::resources` handles can race when `set_subcomm()` and `get_subcomm()`
touch the same state. Fix this by making the sub-comm state safe for concurrent
access in the `sub_comms_resource`/`set_subcomm()`/`get_subcomm()` path, either
by guarding `communicators_` with the same locking approach used elsewhere or by
switching `set_subcomm()` to a replace-the-cell pattern instead of mutating the
shared `std::unordered_map` in place.
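
For illustration, the replace-the-cell option could look roughly like the sketch below. It is an assumption-laden toy, not the `sub_comms_resource` implementation: `sub_comms_cell`, `comm` (an `int` standing in for a communicator) and `snapshot()` are hypothetical names. Writers publish a modified copy of the map instead of mutating the shared one, so a reader holding an older snapshot never observes an in-place change:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

namespace sketch_cell {

using comm     = int;  // hypothetical stand-in for a communicator object
using comm_map = std::map<std::string, comm>;

class sub_comms_cell {
 public:
  sub_comms_cell() : current_(std::make_shared<const comm_map>()) {}

  // Writers copy the current map, modify the copy, then publish it.
  // The previously published map is never mutated.
  void set_subcomm(const std::string& key, comm value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    auto next    = std::make_shared<comm_map>(*current_);
    (*next)[key] = value;
    current_     = std::move(next);
  }

  // Readers grab the current immutable map under the lock.
  std::shared_ptr<const comm_map> snapshot() const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return current_;
  }

  // Lookup works on a snapshot, so no lock is held while searching.
  bool get_subcomm(const std::string& key, comm& out) const
  {
    auto snap = snapshot();
    auto it   = snap->find(key);
    if (it == snap->end()) { return false; }
    out = it->second;
    return true;
  }

 private:
  mutable std::mutex mutex_;
  std::shared_ptr<const comm_map> current_;
};

// A snapshot taken before set_subcomm() stays empty; new lookups see the key.
inline bool old_snapshot_unchanged_after_set()
{
  sub_comms_cell cell;
  auto before = cell.snapshot();
  cell.set_subcomm("row", 7);
  comm found = 0;
  return before->empty() && cell.get_subcomm("row", found) && found == 7;
}

}  // namespace sketch_cell
```

The trade-off in this sketch is a full map copy per update, which seems acceptable only if sub-communicators are set rarely and read often; guarding the map with a lock on every access is the alternative named above.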

---

Duplicate comments:
In `@cpp/include/raft/core/resource/cublaslt_handle.hpp`:
- Around line 47-51: The get_cublaslt_handle helper eagerly creates a new
cublaslt_resource_factory on every call, matching the same allocation issue seen
in the other resource getter. Update get_cublaslt_handle to only construct and
register the factory when CUBLASLT_HANDLE is not already present, using the
existing resources APIs and the cublaslt_resource_factory symbol to preserve
lazy initialization.

In `@cpp/include/raft/core/resource/cusolver_dn_handle.hpp`:
- Around line 66-70: The get_cusolver_dn_handle helper is eagerly allocating a
new cusolver_dn_resource_factory and a new CUDA stream-backed factory on every
call instead of only initializing once. Update get_cusolver_dn_handle to follow
the same lazy-caching pattern used by the other resource getters, so the factory
setup happens only when the resource is missing and reused afterward; use the
existing resources / ensure_default_factory / get_resource flow to avoid
repeated heap allocations.

In `@cpp/include/raft/core/resource/cusparse_handle.hpp`:
- Around line 58-62: The get_cusparse_handle accessor is eagerly allocating a
new cusparse_resource_factory on every call instead of only when the resource is
first registered. Update get_cusparse_handle to follow the same lazy
registration pattern as the other resource getters, reusing the existing factory
once present and only constructing the cusparse_resource_factory (and calling
get_cuda_stream(res)) when the CUSPARSE_HANDLE resource is missing.

In `@cpp/include/raft/core/resource/pinned_memory_resource.hpp`:
- Around line 56-63: The get_pinned_memory_resource_ref helper eagerly allocates
a pinned_memory_resource_factory on every call because ensure_default_factory is
passed a freshly constructed shared_ptr unconditionally. Update
get_pinned_memory_resource_ref to use the same lazy registration pattern as the
cuda_stream resource helper: check whether the PINNED_MEMORY_RESOURCE factory
already exists before creating and registering pinned_memory_resource_factory,
so the heap allocation only happens on the first call.

In `@cpp/include/raft/core/resource/thrust_policy.hpp`:
- Around line 54-59: get_thrust_policy currently does eager work on every call:
it always creates a thrust_policy_resource_factory and also forces
get_cuda_stream(res), which may allocate again even when THRUST_POLICY already
exists. Update get_thrust_policy to first check whether the THRUST_POLICY
resource is already registered and only call ensure_default_factory when it is
missing, reusing the existing resource afterward. Keep the fix localized around
get_thrust_policy and its use of thrust_policy_resource_factory/get_cuda_stream
so steady-state calls avoid unnecessary heap allocations.
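
The check-first shape for a getter that depends on another resource can be sketched as follows. This is a toy model with hypothetical names throughout (`sketch_dep::resources`, `lookup`, `store_if_absent`, integer "resources"), not the raft API. The dependency (`get_cuda_stream`) is resolved only on the slow path, when the dependent resource is still missing:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

namespace sketch_dep {

enum class resource_type { CUDA_STREAM, THRUST_POLICY };

// Toy handle (NOT raft::resources); resources are plain ints for brevity.
struct resources {
  mutable std::mutex mutex;
  mutable std::map<resource_type, std::shared_ptr<int>> cells;
  mutable int stream_lookups = 0;  // counts calls to get_cuda_stream()
};

// Returns the stored resource, or nullptr if it is not registered yet.
inline std::shared_ptr<int> lookup(const resources& res, resource_type t)
{
  std::lock_guard<std::mutex> lock(res.mutex);
  auto it = res.cells.find(t);
  if (it == res.cells.end()) { return nullptr; }
  return it->second;
}

// Stores `v` unless another caller won the race; returns the stored value.
inline std::shared_ptr<int> store_if_absent(const resources& res,
                                            resource_type t,
                                            std::shared_ptr<int> v)
{
  std::lock_guard<std::mutex> lock(res.mutex);
  return res.cells.emplace(t, std::move(v)).first->second;
}

inline int get_cuda_stream(const resources& res)
{
  {
    std::lock_guard<std::mutex> lock(res.mutex);
    ++res.stream_lookups;
  }
  if (auto hit = lookup(res, resource_type::CUDA_STREAM)) { return *hit; }
  return *store_if_absent(res, resource_type::CUDA_STREAM, std::make_shared<int>(42));
}

inline int get_thrust_policy(const resources& res)
{
  // Fast path: the policy already exists, so the stream is not touched at all.
  if (auto hit = lookup(res, resource_type::THRUST_POLICY)) { return *hit; }
  // Slow path (first call): resolve the dependency, then register the policy.
  int stream = get_cuda_stream(res);
  return *store_if_absent(res, resource_type::THRUST_POLICY,
                          std::make_shared<int>(stream + 1));
}

// Number of stream lookups triggered by `calls` policy requests.
inline int stream_lookups_after(int calls)
{
  resources res;
  for (int i = 0; i < calls; ++i) { get_thrust_policy(res); }
  return res.stream_lookups;
}

}  // namespace sketch_dep
```

Here `stream_lookups_after(50)` returns 1. If two threads miss at the same time, both build a value and one is discarded by `store_if_absent`, which mirrors the behavior described in the PR summary.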

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b3c8bea1-cce3-4baa-9e8a-37fe515192df

📥 Commits

Reviewing files that changed from the base of the PR and between 2141422 and 3733629.

📒 Files selected for processing (30)
  • cpp/.clang-format
  • cpp/CMakeLists.txt
  • cpp/include/raft/core/device_resources.hpp
  • cpp/include/raft/core/handle.hpp
  • cpp/include/raft/core/memory_stats_resources.hpp
  • cpp/include/raft/core/memory_tracking_resources.hpp
  • cpp/include/raft/core/resource/comms.hpp
  • cpp/include/raft/core/resource/cublas_handle.hpp
  • cpp/include/raft/core/resource/cublaslt_handle.hpp
  • cpp/include/raft/core/resource/cuda_stream.hpp
  • cpp/include/raft/core/resource/cuda_stream_pool.hpp
  • cpp/include/raft/core/resource/cusolver_dn_handle.hpp
  • cpp/include/raft/core/resource/cusolver_sp_handle.hpp
  • cpp/include/raft/core/resource/cusparse_handle.hpp
  • cpp/include/raft/core/resource/custom_resource.hpp
  • cpp/include/raft/core/resource/detail/stream_sync_event.hpp
  • cpp/include/raft/core/resource/device_id.hpp
  • cpp/include/raft/core/resource/device_memory_resource.hpp
  • cpp/include/raft/core/resource/device_properties.hpp
  • cpp/include/raft/core/resource/managed_memory_resource.hpp
  • cpp/include/raft/core/resource/multi_gpu.hpp
  • cpp/include/raft/core/resource/nccl_comm.hpp
  • cpp/include/raft/core/resource/pinned_memory_resource.hpp
  • cpp/include/raft/core/resource/resource_types.hpp
  • cpp/include/raft/core/resource/stream_view.hpp
  • cpp/include/raft/core/resource/sub_comms.hpp
  • cpp/include/raft/core/resource/thrust_policy.hpp
  • cpp/include/raft/core/resources.hpp
  • cpp/internal/CMakeLists.txt
  • cpp/tests/core/handle.cpp

Comment thread cpp/include/raft/core/resource/cuda_stream.hpp
achirkin added 5 commits July 3, 2026 11:56
…to avoid overheads of creating and discarding the factories
…to avoid overheads of creating and discarding the factories
…to avoid overheads of creating and discarding the factories
…to avoid overheads of creating and discarding the factories

achirkin commented Jul 3, 2026

Contributor Author

Note for reviewers: in the last few commits I've addressed the coderabbitai comments, mostly minimizing the diff. I don't expect these changes to affect any downstream projects (a full local build with dependencies is in progress; I'll report any failures I see).

Labels

  • breaking: Breaking change
  • improvement: Improvement / enhancement to an existing function

Projects

Status: In Progress

5 participants