Mchornyi/tri 971 cmake client dependency management #901

Draft
mc-nv wants to merge 18 commits into main from mchornyi/tri-971-cmake-client-dependency-management

mc-nv commented Jun 9, 2026

Contributor

No description provided.

mc-nv and others added 18 commits May 14, 2026 15:47
- conanfile.py: declares C++ deps (rapidjson, gtest, libcurl, grpc,
  protobuf, re2); injects TRITON_COMMON_SOURCE_DIR and
  TRITON_SKIP_THIRD_PARTY_FETCH into CMakeToolchain so the build uses
  the monorepo common/ sibling instead of fetching from GitHub.

- cmake/CMakePresets.json: four presets covering cpu/cuda x ubuntu/manylinux
  for x86_64 and aarch64; all ubuntu presets enable TRITON_ENABLE_JAVA_HTTP.

- CMakeLists.txt: Conan mode adds common/, src/c++, src/python, src/java
  via add_subdirectory; legacy ExternalProject path is unchanged.

- src/c++/CMakeLists.txt: Conan mode skips FetchContent; adds
  GTest::gtest alias for test compatibility.

- src/python/CMakeLists.txt: Conan mode skips FetchContent of repo-common;
  proto-py-library comes from the parent add_subdirectory(common/).

- Conan profiles: linux-gcc13-{release,debug}-x86_64 (Ubuntu /usr/bin/gcc-13)
  and linux-gcc13-release-manylinux-x86_64 (AlmaLinux SCL gcc-toolset-13).

- Dockerfile.sdk: multi-stage Conan build producing an image at parity
  with nvcr.io/nvidia/tritonserver:*-py3-sdk; uses Python 3.12 venv,
  BuildKit cache mounts, BuildKit secret for Conan token, Java API
  bindings, QA suites, Model Analyzer, DCGM, entrypoint banner.

- Dockerfile.sdk.manylinux: builds a manylinux_2_28_x86_64 Python wheel
  on quay.io/pypa/manylinux_2_28_x86_64 (AlmaLinux 8) with auditwheel repair.

CMake rejects unknown top-level keys in CMakePresets.json with
"Invalid extra field"; $comment is not part of the spec.

Examples require OpenCV headers at configure time when
TRITON_ENABLE_EXAMPLES=ON; runtime stage already had libopencv-dev.

cmake --build --preset resolves CMakePresets.json from CWD, not the
source dir. WORKDIR is /workspace so --build --preset was looking for
/workspace/CMakePresets.json. Fixed by cd-ing into client/ first.
Same fix applied to Dockerfile.sdk.manylinux.

Examples compile $<TARGET_OBJECTS:json-utils-library> directly into
executables; OBJECT library PUBLIC include dirs are not propagated when
used as sources rather than via target_link_libraries. Adding
find_package + include_directories at the src/c++ level (before library/
and examples/ subdirs) matches the existing pattern for protobuf.

ninja runs the link command from the preset build root, not from the
target's binary subdir. The relative ldscript path works in ExternalProject
mode (which sets CWD per-target) but fails in add_subdirectory/preset mode.
Use ${CMAKE_CURRENT_BINARY_DIR} for both grpcclient and httpclient.

In Conan mode two compat shims were missing after add_subdirectory(common):
- repo-common_BINARY_DIR: used by src/python to locate generated
  model_config_pb2.py (was only found via _deps/repo-common-build in
  FetchContent mode; now points to triton-common-build).
- include_directories(TRITON_COMMON_SOURCE_DIR/include): needed by
  C++ library and tests for triton/common/triton_json.h.

build_wheel.py hardcodes the FetchContent path ../_deps/repo-common-build/
relative to the library binary dir. In Conan/add_subdirectory mode the
actual path is triton-common-build/ under the build root. Create a
_deps/repo-common-build -> repo-common_BINARY_DIR symlink at configure
time so the script finds proto .py files without modification.

The option defaults to ON and generates cmake --install rules that copy
from ../../third-party/grpc/include etc., which don't exist when deps
come from Conan packages. Explicitly set it OFF via the toolchain.

tritondevelopertoolsserver.java may not exist in all javacpp-presets
tag versions. Use -f so the script does not fail when the file is absent.

…_VERSION

- Layer 3: check for /opt/tritonserver/lib/libtritonserver.so before
  running cppbuild.sh — client-only builds skip gracefully with INFO msg.
- Layer 2: write TRITON_VERSION file so the runtime COPY succeeds even
  when client/ has no pre-existing TRITON_VERSION (falls back to 0.0.0).
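
The Layer 2 fallback can be pictured with a minimal Python sketch. The actual change lives in the Dockerfile, so the helper name `read_triton_version` and the sample version `2.99.0` are hypothetical; only the file name TRITON_VERSION and the 0.0.0 fallback come from the commit message.

```python
import tempfile
from pathlib import Path


def read_triton_version(path: Path, fallback: str = "0.0.0") -> str:
    """Return the stripped contents of a version file, or the fallback.

    The fallback is used when the file is missing or contains only whitespace.
    """
    try:
        text = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        return fallback
    return text or fallback


# Demonstration in a throwaway directory (hypothetical version string).
with tempfile.TemporaryDirectory() as tmp:
    version_file = Path(tmp) / "TRITON_VERSION"
    print(read_triton_version(version_file))  # file absent, fallback is used
    version_file.write_text("2.99.0\n", encoding="utf-8")
    print(read_triton_version(version_file))  # file present, contents are used
```

Writing the file unconditionally, even with the fallback value, is what lets a later COPY of TRITON_VERSION succeed regardless of whether client/ shipped one.
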
The Conan-cached protobuf/3.21.12 binary (built on Ubuntu) ships a
protoc that requires GLIBC_2.33/2.34 and GLIBCXX_3.4.29/3.4.32.
AlmaLinux 8 (manylinux_2_28 base) only provides glibc 2.28, so protoc
crashes when gRPC tries to run it during its build step.
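
The mismatch described above can be sketched as a version comparison. This is an illustration only, not part of the PR: the helper names are hypothetical, and the list of required tags is limited to the GLIBC ones quoted in the commit message plus GLIBC_2.28 for contrast.

```python
import re


def parse_glibc_tag(tag: str) -> tuple[int, ...]:
    """Turn a symbol-version tag such as 'GLIBC_2.33' into (2, 33)."""
    match = re.fullmatch(r"GLIBC_(\d+(?:\.\d+)*)", tag)
    if match is None:
        raise ValueError(f"not a GLIBC version tag: {tag!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def unsatisfied_tags(required: list[str], available: tuple[int, ...]) -> list[str]:
    """Return the required tags that are newer than the available glibc."""
    return [tag for tag in required if parse_glibc_tag(tag) > available]


# Tags quoted in the commit message, checked against AlmaLinux 8's glibc 2.28.
required_by_protoc = ["GLIBC_2.28", "GLIBC_2.33", "GLIBC_2.34"]
print(unsatisfied_tags(required_by_protoc, (2, 28)))
# → ['GLIBC_2.33', 'GLIBC_2.34']
```

Any non-empty result means the dynamic loader will refuse to start the binary on that host, which matches the protoc crash seen during the gRPC build step.
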

Add --build=protobuf to force protoc to compile natively on the
AlmaLinux 8 build machine, where it will link against glibc 2.28.

…l for pure-Python wheel

Three root causes diagnosed and fixed:

1. Conan --build=protobuf was silently ignored — Conan 2 uses fnmatch so the
   bare name "protobuf" does not match "protobuf/3.21.12"; changed to
   '--build=protobuf*', added '--build=grpc*' and '--build=gtest*' so all
   three packages that were downloaded as Ubuntu-compiled binaries (glibc 2.38+)
   are now rebuilt from source on AlmaLinux 8 (glibc 2.28), eliminating
   GLIBC_2.33 / __isoc23_strtol linker errors.

2. Conan profile cppstd=17 caused protobuf package ID to differ from the
   Artifactory record (gnu17), so the --build flag hit a compatible-package
   fallback and was bypassed. Changed to cppstd=gnu17 to align IDs.

3. auditwheel repair was called on a pure-Python wheel (py3-none-any);
   it requires ELF binaries inside the wheel. tritonclient has no compiled
   C extensions — stubs are generated Python. Replace repair with a direct
   cp of the any-platform wheel.
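
Root cause 1 is easy to reproduce with Python's own `fnmatch` module. The sketch below only demonstrates glob semantics on the reference string from the commit message; it assumes Conan applies the pattern to the full `name/version` reference, as the commit describes, and the helper name is hypothetical.

```python
from fnmatch import fnmatchcase

# Recipe reference quoted in the commit message.
REFERENCE = "protobuf/3.21.12"


def build_pattern_matches(pattern: str, reference: str = REFERENCE) -> bool:
    """Return True when the glob pattern matches the whole reference string."""
    return fnmatchcase(reference, pattern)


# The bare name must match the entire string, so it fails;
# both wildcard forms match.
for pattern in ("protobuf", "protobuf*", "protobuf/*"):
    print(f"{pattern!r:14} -> {build_pattern_matches(pattern)}")
```

Note that in `fnmatch` a pattern like `protobuf*` also matches any other reference that merely starts with "protobuf", whereas `protobuf/*` is limited to that exact package name.
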
… guard GTest re-find

Conan 2 creates imported targets but does not set the legacy CMake variables
(RAPIDJSON_INCLUDE_DIRS, Protobuf_INCLUDE_DIRS) that common/CMakeLists.txt
uses in target_include_directories / include_directories.  Pre-find both
packages in the Conan mode block of the root CMakeLists.txt and populate
the variables from the imported target's INTERFACE_INCLUDE_DIRECTORIES so
the common subdirectory compiles correctly.

Guard find_package(GTest) in src/c++/CMakeLists.txt with
if(NOT TARGET GTest::gtest): common unconditionally finds GTest, so a second
find_package call triggers Conan's GTest-Target-release.cmake to call
set_property on already-defined ALIAS targets, which CMake rejects.

Also bump Dockerfile.sdk base image to 26.04-py3-min and fix the wheel glob
to match the py3-none-any wheel produced by the pure-Python tritonclient build.
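
A quick way to tell which kind of wheel a build produced is to read the tags out of its file name. The sketch below assumes the standard wheel naming scheme (name, version, optional build tag, then python, ABI, and platform tags); the helper names and the example file names, including the 0.0.0 version, are hypothetical.

```python
def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) parsed from a wheel file name."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    # name-version-python-abi-platform, with an optional build tag after version.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name layout: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def is_pure_python(filename: str) -> bool:
    """A wheel tagged '<python>-none-any' carries no platform-specific binaries."""
    _, abi_tag, platform_tag = wheel_tags(filename)
    return abi_tag == "none" and platform_tag == "any"


print(is_pure_python("tritonclient-0.0.0-py3-none-any.whl"))
print(is_pure_python("example-0.0.0-cp312-cp312-manylinux_2_28_x86_64.whl"))
```

A check like this could decide whether to call auditwheel repair or simply copy the wheel, and it makes clear why a glob written for a platform-tagged file name stops matching once the build yields `py3-none-any`.
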
Five fixes uncovered by running qa/L0_sdk against the Conan-based SDK
image:

* Source the authoritative TRITON_VERSION from server/TRITON_VERSION
  in both Dockerfile.sdk and Dockerfile.sdk.manylinux. The tarball
  (v<ver>.clients.tar.gz), wheel (tritonclient-<ver>), and
  /workspace/TRITON_VERSION now ship the real release version instead
  of "0.0.0".

* Run `conan install --deployer=full_deploy` and copy every Conan
  package include/ and lib*/ tree into /workspace/install. This brings
  grpc/grpc++/grpcpp/absl/google/curl/openssl headers plus the static
  archives that qa/L0_sdk compiles and links against. Without this,
  the legacy ExternalProject layout could not be reproduced.

* Symlink perf_analyzer into /usr/local/bin/. The pip-installed
  perf_analyzer entry point lives at /opt/venv-tritonclient/bin/; the
  L0_sdk wheel-install check requires the exact path /usr/local/bin/
  perf_analyzer.

* Pin setuptools<80 in the runtime venv. setuptools 80+ no longer
  ships bundled pkg_resources, which tritonclient.utils.cuda_shared_
  memory still imports.

* Mirror Conan's libssl.a / libcrypto.a into
  /usr/lib/x86_64-linux-gnu/ and remove the unversioned libssl.so /
  libcrypto.so symlinks. Conan-built libcurl.a uses OpenSSL 3.2
  symbols (SSL_get0_group_name) that are absent from Ubuntu's
  OpenSSL 3.0 .a; gcc -lssl -lcrypto now resolves to our archives.
  The .so.3 SONAME is preserved for Python's ssl module.

L0_sdk now reports "*** Test Passed ***" inside
triton-client-sdk:tri-971-local (grpc_test, grpc_test_static,
http_test, http_test_static all green).

Follow-ups from code review of e407db1:

* Multi-arch guard for the OpenSSL .a override in the final stage.
  Resolve the multiarch directory from `uname -m` instead of
  hardcoding `/usr/lib/x86_64-linux-gnu/`, so a future arm64 build
  fails fast with a clear message instead of a confusing `cp` error.

* Replace `for X in $(find ...)` with `while IFS= read -r X` +
  process substitution in the Conan dep-copy step.  Word-split-safe
  if a future Conan package name ever contains whitespace.

* Drop `2>/dev/null` from the `cp -rn` calls so genuine errors
  (no-space, permission, etc.) surface in build logs.  The
  `|| true` already covers the expected `cp -n` collision exits.

* Tighten the `setuptools<80` comment to state the actual reason
  (80+ removed bundled `pkg_resources`).

* Document `--build-arg TRITON_SERVER_REPO_SUBDIR=server` in the
  Dockerfile.sdk.manylinux header so the example invocation matches
  Dockerfile.sdk.

* Bump copyright year on both Dockerfiles to 2025-2026.

Rebuilt and re-ran qa/L0_sdk inside the image — all four sub-tests
remain green.
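
The multi-arch guard from the first follow-up can be sketched in Python. The Dockerfile does this in shell with `uname -m`; the lookup table, the helper names, and the choice of which architectures to list are assumptions made for illustration, not the PR's code.

```python
import platform

# Machine name (as reported by `uname -m`) to Debian/Ubuntu multiarch triplet.
# Which entries belong in this table is an assumption for the example.
MULTIARCH_TRIPLETS = {
    "x86_64": "x86_64-linux-gnu",
    "aarch64": "aarch64-linux-gnu",
}


def multiarch_libdir(machine: str) -> str:
    """Return /usr/lib/<triplet> for a known machine, or fail with a clear error."""
    try:
        triplet = MULTIARCH_TRIPLETS[machine]
    except KeyError:
        raise ValueError(
            f"unsupported architecture {machine!r}; "
            f"known: {sorted(MULTIARCH_TRIPLETS)}"
        ) from None
    return f"/usr/lib/{triplet}"


def current_multiarch_libdir() -> str:
    """Resolve the directory for the host; platform.machine() mirrors `uname -m`."""
    return multiarch_libdir(platform.machine())


print(multiarch_libdir("x86_64"))
# → /usr/lib/x86_64-linux-gnu
```

Failing on an unknown machine name up front gives a message that names the architecture, instead of a later `cp` error about a missing directory.
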
mc-nv requested a review from whoisj June 12, 2026 16:59