NV embedding layer loading #197
Some PT2/AOTI models need one-time native initialization at model load -- before
the package is loaded -- that cannot live inside model.pt2. For example, a model
whose custom ops resolve weights from a process-global registry at execute time,
where those weights are stored outside the package and must be read in and
registered first.
Add an optional, per-model hook. When a model's config.pbtxt sets the
MODEL_INIT_LIBRARY parameter, the backend dlopen()s that shared library at model
load and calls its triton_pytorch_model_init(model_dir, device_index) entry
point, holds the returned handle in ModelState, and releases it
(triton_pytorch_model_fini + dlclose) on unload. The backend does not link
against the library and knows nothing about what it does -- it only loads it,
calls a fixed C entry point, and releases it on unload.
Unset by default (a complete no-op). dlopen/dlsym failure fails model load.
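
For illustration, the plug-in side of the hook described above might look like the following minimal sketch. The symbol names triton_pytorch_model_init and triton_pytorch_model_fini come from the text above; the argument and return types, and the PluginState struct, are assumptions made for this example and are not taken from the backend's README.

```cpp
#include <string>

namespace {
// Hypothetical per-model state. A real plug-in would read its sidecar files
// and register them with whatever the model's custom ops expect.
struct PluginState {
  std::string model_dir;
  int device_index;
};
}  // namespace

extern "C" {

// Assumed signature: returns an opaque handle that the backend keeps until unload.
void* triton_pytorch_model_init(const char* model_dir, int device_index) {
  if (model_dir == nullptr) {
    return nullptr;
  }
  return new PluginState{model_dir, device_index};
}

// Assumed signature: releases whatever the init entry point allocated.
void triton_pytorch_model_fini(void* handle) {
  delete static_cast<PluginState*>(handle);
}

}  // extern "C"
```

The extern "C" linkage keeps the symbol names unmangled, which is what lets a dlsym() lookup by plain name succeed.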
This also fixes a pre-existing bug in ParseParameters(): the block was guarded
by `if (!ModelConfig().Find("parameters", ...))`, i.e. parameters were parsed
only when the block was ABSENT. Any PT2 model that actually provides a
`parameters` section had every parameter (INFERENCE_MODE, CACHE_CLEANING_ENABLED,
... and MODEL_INIT_LIBRARY) silently ignored. Dropping the `!` makes parameters
take effect when present.
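
The effect of the inverted guard can be reproduced with a toy stand-in for the config lookup. This is only a sketch: Config, Section, Find, InitLibraryBuggy and InitLibraryFixed are hypothetical names for this example and are not the backend's actual types or functions.

```cpp
#include <map>
#include <string>

// Toy stand-in for the model config: section name -> (key -> string value).
using Section = std::map<std::string, std::string>;
using Config = std::map<std::string, Section>;

// Mimics a Find()-style lookup: returns true and fills *out when the section exists.
bool Find(const Config& config, const std::string& name, Section* out) {
  auto it = config.find(name);
  if (it == config.end()) {
    return false;
  }
  *out = it->second;
  return true;
}

// Inverted guard: the body runs only when "parameters" is absent, so `params`
// is always empty inside it and the value is never picked up.
std::string InitLibraryBuggy(const Config& config) {
  Section params;
  std::string value;
  if (!Find(config, "parameters", &params)) {
    auto it = params.find("MODEL_INIT_LIBRARY");
    if (it != params.end()) value = it->second;
  }
  return value;
}

// Fixed guard: parse the section when it is present.
std::string InitLibraryFixed(const Config& config) {
  Section params;
  std::string value;
  if (Find(config, "parameters", &params)) {
    auto it = params.find("MODEL_INIT_LIBRARY");
    if (it != params.end()) value = it->second;
  }
  return value;
}
```

With a config that contains a parameters section, the buggy variant returns an empty string while the fixed variant returns the configured value.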
- src/pt2/model_state.{cc,hh}: fix the inverted Find check; parse
MODEL_INIT_LIBRARY (string_value via MemberAsString); run the hook in
LoadModel; free in the destructor
- CMakeLists.txt: link ${CMAKE_DL_LIBS} for dlopen/dlsym/dlclose
- README.md: document the parameter, the C ABI, and the trust note
Verified end-to-end on nvcr.io/nvidia/tritonserver:26.05-py3 with an HSTU
GR-ranking AOTI model: parameters parsed, hook dlopen'd, 2 NVE layers loaded into
the process-global registry, model READY, inference returns correct logits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Runchu Zhao <zhaorunchu@gmail.com>
52831c9 to 13b1e93
whoisj
left a comment
Overall, this is good. Thank you for the contribution. I've left a couple of comments but am willing to accept this change as-is.
triton-core-backendapi # from repo-core
triton-core-serverstub # from repo-core
triton-backend-utils # from repo-backend
${CMAKE_DL_LIBS} # dlopen/dlsym/dlclose for the MODEL_INIT_LIBRARY hook
The CMAKE_ prefix is reserved, and I'm afraid its usage may create some confusion. See the examples below.
http://cmake.org/cmake/help/latest/manual/cmake-variables.7.html
Also, the change description explains the following, which means those libraries are vital to preserve the functionality, doesn't it?
An HSTU AOTI package (model.pt2) calls nve_ops::embedding_lookup(keys, layer_id),
which looks the embedding table up by layer_id in a process-global
NVELayerRegistry. The embedding weights do not live inside model.pt2 —
they are separate files next to it (<model_dir>/metadata.json +
<model_dir>/weights/*.nve). So loading the .pt2 does not load them, and the
first inference request fails
Could you please share the origin of the missing libraries?
Sorry for the confusion. PR description was stale; it still described the earlier "link NVE into the backend" version. I've updated it.
In the current revision the backend links no NVE or vendor libraries. There are no missing libraries to source. The only CMake change is target_link_libraries(... ${CMAKE_DL_LIBS}). CMAKE_DL_LIBS is a built-in CMake variable for the platform's dynamic-loading library (libdl), needed for the dlopen/dlsym/dlclose the hook uses.
CI Pipeline ID: 55028851
I've pushed a change to make the pre-commit hook happy. @z52527, please pull your branch to receive my contribution before working on it to avoid conflicts; or duplicate my work in your own commit. Thanks.
Problem
Some PT2/AOTI models need a one-time native initialization at model load, before the package is loaded, that cannot live inside model.pt2. A concrete example: a model whose custom ops resolve embedding tables from a process-global registry at execute time, where the weights are stored in sidecar files next to the package (not inside model.pt2). Loading the .pt2 does not load them, so the first inference request fails.
The Triton PyTorch backend currently has no entry point to run such load-time initialization, so today it can only be done with fragile workarounds (e.g. an LD_PRELOAD shim that hijacks an unrelated CUDA call to piggyback on its timing).
What this PR does
Adds a generic, optional model-init hook to the PT2 path. When a model's config.pbtxt sets the MODEL_INIT_LIBRARY parameter, the backend dlopen()s that shared library in LoadModel(), right before the AOTIModelPackageLoader is created (CUDA is already initialized there), and calls a fixed C entry point. Design choices:
- The backend only dlopens the .so, calls triton_pytorch_model_init(model_dir, device_index), and releases it (triton_pytorch_model_release + dlclose) on unload. All model-specific logic lives in the user-provided plug-in, built and shipped downstream.
- The only build change is linking ${CMAKE_DL_LIBS} (CMake's built-in libdl variable, for dlopen/dlsym/dlclose). The backend links no vendor libraries, so the stock nvcr.io/nvidia/pytorch:26.05-py3 build is unaffected.
- dlopen/dlsym failure fails model load (fail-closed).
- The returned handle is held in ModelState and freed in the destructor.
- ParseParameters() was guarded by if (!ModelConfig().Find("parameters", ...)), so PT2 parameters were parsed only when the section was absent; any model that set a parameters block had them all (INFERENCE_MODE, …) silently ignored. Dropping the ! makes them take effect.
The library must export the triton_pytorch_model_init and triton_pytorch_model_release entry points.
Verified end-to-end on the official nvcr.io/nvidia/tritonserver:26.05-py3 image with an AOTI model (parameter parsed → hook loaded → model READY → inference correct); also compiles against main.
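
As a rough illustration of the load/unload lifecycle described above, the backend side could be sketched as a small RAII holder around dlopen/dlsym/dlclose. This is not the PR's actual code: ModelInitHook is a hypothetical class, the entry-point types are assumptions (the description names the symbols but this sketch guesses their signatures), and the RTLD_NOW | RTLD_LOCAL flags are one possible choice.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Assumed C ABI: the symbol names come from the description, the types are guesses.
using InitFn = void* (*)(const char* model_dir, int device_index);
using ReleaseFn = void (*)(void* handle);

// Hypothetical RAII holder: owns the dlopen()ed library and the plug-in handle.
class ModelInitHook {
 public:
  ModelInitHook() = default;
  ModelInitHook(const ModelInitHook&) = delete;
  ModelInitHook& operator=(const ModelInitHook&) = delete;
  ~ModelInitHook() { Unload(); }

  // Returns true on success. On failure, fills *error and leaves nothing
  // loaded, so the caller can fail the model load (fail-closed).
  bool Load(const std::string& lib_path, const std::string& model_dir,
            int device_index, std::string* error) {
    assert(error != nullptr);
    Unload();
    lib_ = dlopen(lib_path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (lib_ == nullptr) {
      const char* msg = dlerror();
      *error = (msg != nullptr) ? msg : "dlopen failed";
      return false;
    }
    auto init =
        reinterpret_cast<InitFn>(dlsym(lib_, "triton_pytorch_model_init"));
    release_ = reinterpret_cast<ReleaseFn>(
        dlsym(lib_, "triton_pytorch_model_release"));
    if (init == nullptr || release_ == nullptr) {
      *error = "missing init/release entry point in " + lib_path;
      Unload();
      return false;
    }
    handle_ = init(model_dir.c_str(), device_index);
    initialized_ = true;
    return true;
  }

  bool loaded() const { return lib_ != nullptr; }

 private:
  // Calls the release entry point only if init actually ran, then dlclose()s.
  void Unload() {
    if (lib_ == nullptr) return;
    if (initialized_) release_(handle_);
    dlclose(lib_);
    lib_ = nullptr;
    handle_ = nullptr;
    release_ = nullptr;
    initialized_ = false;
  }

  void* lib_ = nullptr;
  void* handle_ = nullptr;
  ReleaseFn release_ = nullptr;
  bool initialized_ = false;
};
```

Tying the release call and dlclose to the destructor mirrors the description's choice of holding the handle in ModelState and freeing it on unload. On older glibc versions this needs to be linked against libdl, which is what ${CMAKE_DL_LIBS} provides.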