NV embedding layer loading by z52527 · Pull Request #197 · triton-inference-server/pytorch_backend

z52527 · 2026-06-10T03:36:15Z

Problem

Some PT2/AOTI models need a one-time native initialization at model load — before the package is loaded — that cannot live inside model.pt2. A concrete example: a model whose custom ops resolve embedding tables from a process-global registry at execute time, where the weights are stored in sidecar files next to the package (not inside model.pt2). Loading the .pt2 does not load them, so the first inference request fails:

... no binding registered for layer_id=1
-> Inductor model instance execution failure ... model_container_runner.cpp:150

The Triton PyTorch backend currently has no entry point to run such load-time initialization, so today it can only be done with fragile workarounds (e.g. an LD_PRELOAD shim that hijacks an unrelated CUDA call to piggyback on its timing).

What this PR does

Adds a generic, optional model-init hook to the PT2 path. When a model's config.pbtxt sets the MODEL_INIT_LIBRARY parameter, the backend dlopen()s that shared library in LoadModel() — right before the AOTIModelPackageLoader is created (CUDA is already initialized there) — and calls a fixed C entry point. Design choices:

- Backend stays generic: it does not link against the library and knows nothing about what it does. It only dlopens the .so, calls triton_pytorch_model_init(model_dir, device_index), and releases it (triton_pytorch_model_release + dlclose) on unload. All model-specific logic lives in the user-provided plug-in, built and shipped downstream.
- No new dependency: the only added link is ${CMAKE_DL_LIBS} (CMake's built-in libdl variable, for dlopen/dlsym/dlclose). The backend links no vendor libraries, so the stock nvcr.io/nvidia/pytorch:26.05-py3 build is unaffected.
- No-op by default: a model without the parameter is completely unaffected; a dlopen/dlsym failure fails model load (fail-closed).
- Scoped to the model lifetime: the handle is held in ModelState and freed in the destructor.
- Fixes a pre-existing parameters bug: ParseParameters() was guarded by if (!ModelConfig().Find("parameters", ...)), so PT2 parameters were parsed only when the section was absent; any model that set a parameters block had them all (INFERENCE_MODE, …) silently ignored. Dropping the ! makes them take effect.
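The load and unload sequence described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `InitHook`, `LoadInitHook`, and `UnloadInitHook` are hypothetical names, and the real backend keeps the equivalent state in ModelState. Only the two exported symbol names and the dlopen/dlsym/dlclose calls come from the description.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Signatures of the two entry points named in the PR description.
using ModelInitFn = void* (*)(const char* model_dir, int device_index);
using ModelReleaseFn = void (*)(void* state);

// Hypothetical holder for the hook (the PR stores this in ModelState).
struct InitHook {
  void* lib = nullptr;               // handle returned by dlopen()
  void* state = nullptr;             // handle returned by the init entry point
  ModelReleaseFn release = nullptr;  // optional symbol, may stay null
};

// Returns an empty string on success, otherwise an error message.
// An empty lib_path stands for "parameter not set": a complete no-op.
std::string LoadInitHook(
    const std::string& lib_path, const std::string& model_dir,
    int device_index, InitHook* hook)
{
  assert(hook != nullptr);
  if (lib_path.empty()) {
    return "";
  }
  hook->lib = dlopen(lib_path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (hook->lib == nullptr) {
    const char* err = dlerror();
    return std::string("dlopen failed: ") + (err ? err : "unknown error");
  }
  auto init = reinterpret_cast<ModelInitFn>(
      dlsym(hook->lib, "triton_pytorch_model_init"));
  if (init == nullptr) {
    // Fail closed: a missing init symbol fails the model load.
    dlclose(hook->lib);
    hook->lib = nullptr;
    return "missing symbol triton_pytorch_model_init";
  }
  // The release symbol is optional, so a null result is not an error.
  hook->release = reinterpret_cast<ModelReleaseFn>(
      dlsym(hook->lib, "triton_pytorch_model_release"));
  // Whether a null return from init signals failure is not specified here.
  hook->state = init(model_dir.c_str(), device_index);
  return "";
}

// Mirrors the unload path: release the state, then close the library.
void UnloadInitHook(InitHook* hook)
{
  assert(hook != nullptr);
  if (hook->lib == nullptr) {
    return;
  }
  if (hook->release != nullptr) {
    hook->release(hook->state);
  }
  dlclose(hook->lib);
  *hook = InitHook{};
}
```

In this sketch a caller would run `LoadInitHook` right before creating the package loader and treat any non-empty return value as a model-load failure, which matches the fail-closed behavior listed above.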

The library must export:

// Called once at model load, before the package is loaded.
void* triton_pytorch_model_init(const char* model_dir, int device_index);
// Optional; called once on unload with the handle returned above.
void  triton_pytorch_model_release(void* state);
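For readers writing a plug-in against this ABI, a minimal skeleton might look like the sketch below. `PluginState` and `g_live_states` are hypothetical and exist only for illustration; a real plug-in would do its own model-specific work (for example, reading sidecar weight files under model_dir) inside the init function.

```cpp
#include <cassert>
#include <string>

namespace {
// Hypothetical per-model state owned by the plug-in.
struct PluginState {
  std::string model_dir;
  int device_index;
};
int g_live_states = 0;  // illustrative counter, not part of the ABI
}  // namespace

extern "C" {

// Called once at model load, before the package is loaded.
void* triton_pytorch_model_init(const char* model_dir, int device_index)
{
  assert(model_dir != nullptr);
  // A real plug-in would load and register its resources here.
  auto* state = new PluginState{model_dir, device_index};
  ++g_live_states;
  return state;
}

// Optional; called once on unload with the handle returned by init.
void triton_pytorch_model_release(void* state)
{
  if (state == nullptr) {
    return;
  }
  delete static_cast<PluginState*>(state);
  --g_live_states;
}

}  // extern "C"
```

The `extern "C"` block keeps the symbol names unmangled so that a dlsym lookup by the plain names above can find them. Such a file would typically be built as a shared object, for example with `g++ -shared -fPIC`.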

Verified end-to-end on the official nvcr.io/nvidia/tritonserver:26.05-py3 image with an AOTI model (parameter parsed → hook loaded → model READY → inference correct); also compiles against main.

Some PT2/AOTI models need one-time native initialization at model load -- before the package is loaded -- that cannot live inside model.pt2. For example, a model whose custom ops resolve weights from a process-global registry at execute time, where those weights are stored outside the package and must be read in and registered first.

Add an optional, per-model hook. When a model's config.pbtxt sets the MODEL_INIT_LIBRARY parameter, the backend dlopen()s that shared library at model load and calls its triton_pytorch_model_init(model_dir, device_index) entry point, holds the returned handle in ModelState, and releases it (triton_pytorch_model_fini + dlclose) on unload. The backend does not link against the library and knows nothing about what it does -- it only loads it, calls a fixed C entry point, and releases it on unload. Unset by default (a complete no-op). dlopen/dlsym failure fails model load.

This also fixes a pre-existing bug in ParseParameters(): the block was guarded by `if (!ModelConfig().Find("parameters", ...))`, i.e. parameters were parsed only when the block was ABSENT. Any PT2 model that actually provides a `parameters` section had every parameter (INFERENCE_MODE, CACHE_CLEANING_ENABLED, ... and MODEL_INIT_LIBRARY) silently ignored. Dropping the `!` makes parameters take effect when present.

- src/pt2/model_state.{cc,hh}: fix the inverted Find check; parse MODEL_INIT_LIBRARY (string_value via MemberAsString); run the hook in LoadModel; free in the destructor
- CMakeLists.txt: link ${CMAKE_DL_LIBS} for dlopen/dlsym/dlclose
- README.md: document the parameter, the C ABI, and the trust note

Verified end-to-end on nvcr.io/nvidia/tritonserver:26.05-py3 with an HSTU GR-ranking AOTI model: parameters parsed, hook dlopen'd, 2 NVE layers loaded into the process-global registry, model READY, inference returns correct logits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Runchu Zhao <zhaorunchu@gmail.com>
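The effect of the inverted guard in ParseParameters() can be reproduced with a small stand-alone sketch. Everything here is a simplified stand-in: `Config`, `Params`, and the `inverted` flag are hypothetical, and the real backend reads a JSON document through ModelConfig().Find(...) rather than a flat map.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the parsed parameters: a flat key/value map (hypothetical).
using Params = std::map<std::string, std::string>;

// Stand-in for the model config: whether a parameters section exists and,
// if so, what it contains.
struct Config {
  bool has_parameters = false;
  Params parameters;
};

// Returns the parameters that would take effect. `inverted` reproduces the
// old guard, which entered the parsing block only when the section was absent.
Params ParseParameters(const Config& config, bool inverted)
{
  // A config without a parameters section cannot carry any entries.
  assert(config.has_parameters || config.parameters.empty());
  Params parsed;
  const bool found = config.has_parameters;
  if (inverted ? !found : found) {
    parsed = config.parameters;
  }
  return parsed;
}
```

With a config that sets MODEL_INIT_LIBRARY, the inverted variant yields an empty map while the fixed variant returns the entry, which is the "silently ignored" behavior the commit message describes.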

whoisj

Overall, this is good. Thank you for the contribution. I've left a couple of comments but am willing to accept this change as-is.

mc-nv · 2026-06-16T16:19:10Z

    triton-core-backendapi # from repo-core
    triton-core-serverstub # from repo-core
    triton-backend-utils   # from repo-backend
+    ${CMAKE_DL_LIBS}       # dlopen/dlsym/dlclose for the MODEL_INIT_LIBRARY hook


The CMAKE_ prefix is reserved; I'm afraid its usage may create some confusion. See the examples below.
http://cmake.org/cmake/help/latest/manual/cmake-variables.7.html

Also, the change description explains the following, which means those libraries are vital to preserving the functionality, doesn't it?

An HSTU AOTI package (model.pt2) calls nve_ops::embedding_lookup(keys, layer_id),
which looks the embedding table up by layer_id in a process-global
NVELayerRegistry. The embedding weights do not live inside model.pt2 —
they are separate files next to it (<model_dir>/metadata.json +
<model_dir>/weights/*.nve). So loading the .pt2 does not load them, and the
first inference request fails

Could you please share the origin of the missing libraries?

@z52527 please follow up with @mc-nv's requests. This is blocking merge.

Sorry for the confusion. The PR description was stale; it still described the earlier "link NVE into the backend" version. I've updated it.
In the current revision the backend links no NVE or vendor libraries. There are no missing libraries to source. The only CMake change is target_link_libraries(... ${CMAKE_DL_LIBS}). CMAKE_DL_LIBS is a built-in CMake variable for the platform's dynamic-loading library (libdl), needed for the dlopen/dlsym/dlclose the hook uses.

whoisj · 2026-06-16T18:33:09Z

CI Pipeline ID: 55028851

whoisj · 2026-06-16T23:40:10Z

I've pushed a change to make the pre-commit hook happy. @z52527, please pull your branch to receive my contribution before working on it to avoid conflicts; or duplicate my work in your own commit. Thanks.

mc-nv reviewed Jun 10, 2026

Comment thread CMakeLists.txt

Comment thread CMakeLists.txt Outdated

whoisj requested changes Jun 10, 2026

Comment thread README.md

z52527 force-pushed the nve-layer-loading branch from 52831c9 to 13b1e93 on June 11, 2026 10:16

whoisj previously approved these changes Jun 11, 2026

Comment thread src/pt2/model_state.cc

Comment thread src/pt2/model_state.cc

mc-nv reviewed Jun 16, 2026

make pre-commit happy

0b2f0b3

whoisj dismissed their stale review via 0b2f0b3 June 16, 2026 23:38

z52527 wants to merge 2 commits into triton-inference-server:main from z52527:nve-layer-loading

mc-nv Jun 16, 2026

whoisj Jun 16, 2026

z52527 Jun 17, 2026

3 participants
