Mchornyi/tri 971 cmake client dependency management #901
Draft
mc-nv wants to merge 18 commits into
- conanfile.py: declares C++ deps (rapidjson, gtest, libcurl, grpc,
protobuf, re2); injects TRITON_COMMON_SOURCE_DIR and
TRITON_SKIP_THIRD_PARTY_FETCH into CMakeToolchain so the build uses
the monorepo common/ sibling instead of fetching from GitHub.
- cmake/CMakePresets.json: four presets covering cpu/cuda x ubuntu/manylinux
for x86_64 and aarch64; all ubuntu presets enable TRITON_ENABLE_JAVA_HTTP.
- CMakeLists.txt: Conan mode adds common/, src/c++, src/python, src/java
via add_subdirectory; legacy ExternalProject path is unchanged.
- src/c++/CMakeLists.txt: Conan mode skips FetchContent; adds
GTest::gtest alias for test compatibility.
- src/python/CMakeLists.txt: Conan mode skips FetchContent of repo-common;
proto-py-library comes from the parent add_subdirectory(common/).
- Conan profiles: linux-gcc13-{release,debug}-x86_64 (Ubuntu /usr/bin/gcc-13)
and linux-gcc13-release-manylinux-x86_64 (AlmaLinux SCL gcc-toolset-13).
- Dockerfile.sdk: multi-stage Conan build producing an image at parity
with nvcr.io/nvidia/tritonserver:*-py3-sdk; uses Python 3.12 venv,
BuildKit cache mounts, BuildKit secret for Conan token, Java API
bindings, QA suites, Model Analyzer, DCGM, entrypoint banner.
- Dockerfile.sdk.manylinux: builds a manylinux_2_28_x86_64 Python wheel
on quay.io/pypa/manylinux_2_28_x86_64 (AlmaLinux 8) with auditwheel repair.
CMake rejects unknown top-level keys in CMakePresets.json with "Invalid extra field"; $comment is not a recognized key in the preset schema version used here.
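A pre-flight check for this class of error can be sketched in Python. The allowed-key set is an assumption (a conservative subset of what recent schema versions accept, not a copy of the CMake schema), and `extra_top_level_keys` is a hypothetical helper, not part of the PR.

```python
import json

# Assumed, conservative subset of top-level keys accepted by CMakePresets.json.
# Newer schema versions add more (e.g. "$schema"), so check cmake-presets(7)
# for the "version" the file actually declares.
ALLOWED_TOP_LEVEL_KEYS = frozenset({
    "version",
    "cmakeMinimumRequired",
    "include",
    "vendor",
    "configurePresets",
    "buildPresets",
    "testPresets",
    "packagePresets",
    "workflowPresets",
})


def extra_top_level_keys(presets_text):
    """Return the sorted top-level keys that are not in the allowed set."""
    data = json.loads(presets_text)
    return sorted(key for key in data if key not in ALLOWED_TOP_LEVEL_KEYS)


if __name__ == "__main__":
    sample = '{"version": 6, "$comment": "notes", "configurePresets": []}'
    print(extra_top_level_keys(sample))
```

On the sample above it prints `['$comment']`, which is the key CMake reported as an invalid extra field.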
Examples require OpenCV headers at configure time when TRITON_ENABLE_EXAMPLES=ON; runtime stage already had libopencv-dev.
cmake --build --preset resolves CMakePresets.json from CWD, not the source dir. WORKDIR is /workspace so --build --preset was looking for /workspace/CMakePresets.json. Fixed by cd-ing into client/ first. Same fix applied to Dockerfile.sdk.manylinux.
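The fix amounts to running the preset build with the working directory set to the directory that holds CMakePresets.json. A minimal Python sketch of that rule follows; the helper names and the preset name `example-preset` are hypothetical, and the PR itself does this with a plain `cd` in the Dockerfile.

```python
import subprocess
import tempfile
from pathlib import Path


def preset_build_command(source_dir, preset):
    """Return (argv, cwd) for a preset build run from the presets directory.

    `cmake --build --preset` reads the preset files from the current working
    directory, so cwd must be the directory containing CMakePresets.json.
    """
    cwd = Path(source_dir).resolve()
    if not (cwd / "CMakePresets.json").is_file():
        raise FileNotFoundError(f"no CMakePresets.json in {cwd}")
    return ["cmake", "--build", "--preset", preset], cwd


def run_preset_build(source_dir, preset):
    """Equivalent of `cd <source_dir> && cmake --build --preset <preset>`."""
    argv, cwd = preset_build_command(source_dir, preset)
    subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "CMakePresets.json").write_text('{"version": 6}')
        print(preset_build_command(tmp, "example-preset"))
```

Checking for the presets file up front turns the confusing "no such preset" failure into an explicit missing-file error.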
Examples compile $<TARGET_OBJECTS:json-utils-library> directly into executables; an OBJECT library's PUBLIC include directories are not propagated when its objects are used as sources rather than linked via target_link_libraries. Adding find_package + include_directories at the src/c++ level (before the library/ and examples/ subdirs) matches the existing pattern for protobuf.
ninja runs the link command from the preset build root, not from the
target's binary subdirectory. The relative ldscript path works in
ExternalProject mode (which sets the CWD per target) but fails in
add_subdirectory/preset mode. Use ${CMAKE_CURRENT_BINARY_DIR} to make the
ldscript path absolute for both grpcclient and httpclient.
In Conan mode, two compat shims were missing after add_subdirectory(common):

- repo-common_BINARY_DIR: used by src/python to locate the generated model_config_pb2.py. In FetchContent mode it was only found via _deps/repo-common-build; it now points to triton-common-build.
- include_directories(${TRITON_COMMON_SOURCE_DIR}/include): needed by the C++ library and tests for triton/common/triton_json.h.
build_wheel.py hardcodes the FetchContent path ../_deps/repo-common-build/ relative to the library binary dir. In Conan/add_subdirectory mode the actual path is triton-common-build/ under the build root. Create a _deps/repo-common-build -> repo-common_BINARY_DIR symlink at configure time so the script finds proto .py files without modification.
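The PR creates this link at configure time in CMake. The Python sketch below only illustrates the intended behavior (idempotent, replaces a stale link, refuses to clobber a real directory); `ensure_compat_symlink` is a hypothetical helper, and the directory names are taken from the commit message.

```python
import os
import tempfile
from pathlib import Path


def ensure_compat_symlink(build_root, actual_dir):
    """Point <build_root>/_deps/repo-common-build at actual_dir.

    Idempotent, so re-running configure is safe; refuses to replace a real
    directory that already exists at the link path.
    """
    link = Path(build_root) / "_deps" / "repo-common-build"
    target = Path(actual_dir)
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        if Path(os.readlink(link)) == target:
            return link  # already points at the right place
        link.unlink()  # stale link from an earlier configure
    elif link.exists():
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(target, target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        actual = Path(tmp) / "triton-common-build"
        actual.mkdir()
        (actual / "model_config_pb2.py").write_text("")
        link = ensure_compat_symlink(tmp, actual)
        print((link / "model_config_pb2.py").exists())
```

In CMake itself, `file(CREATE_LINK <original> <linkname> SYMBOLIC)` is one way to express the same step; whether the PR uses exactly that command is not shown here.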
The option defaults to ON and generates cmake --install rules that copy from ../../third-party/grpc/include etc., which don't exist when deps come from Conan packages. Explicitly set it OFF via the toolchain.
tritondevelopertoolsserver.java may not exist in all javacpp-presets tag versions. Use -f so the script does not fail when the file is absent.
…_VERSION

- Layer 3: check for /opt/tritonserver/lib/libtritonserver.so before running cppbuild.sh; client-only builds skip gracefully with an INFO message.
- Layer 2: write a TRITON_VERSION file so the runtime COPY succeeds even when client/ has no pre-existing TRITON_VERSION (falls back to 0.0.0).
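The two layers above are shell steps in the PR; the sketch below restates their decision logic in Python for clarity. Function names are hypothetical, and only the library path and the 0.0.0 fallback come from the commit message.

```python
import tempfile
from pathlib import Path

FALLBACK_VERSION = "0.0.0"


def resolve_triton_version(client_dir):
    """Layer 2: read client_dir/TRITON_VERSION, else fall back to 0.0.0."""
    version_file = Path(client_dir) / "TRITON_VERSION"
    if version_file.is_file():
        version = version_file.read_text().strip()
        if version:
            return version
    return FALLBACK_VERSION


def should_run_cppbuild(server_lib="/opt/tritonserver/lib/libtritonserver.so"):
    """Layer 3: run cppbuild.sh only when libtritonserver.so is present."""
    if Path(server_lib).is_file():
        return True
    print(f"INFO: {server_lib} not found; skipping cppbuild.sh (client-only build)")
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_triton_version(tmp))  # no TRITON_VERSION file in tmp
    should_run_cppbuild()
```

Treating an empty file like a missing one keeps the runtime COPY from shipping a blank version string.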
The Conan-cached protobuf/3.21.12 binary (built on Ubuntu) ships a protoc that requires GLIBC_2.33/2.34 and GLIBCXX_3.4.29/3.4.32. AlmaLinux 8 (manylinux_2_28 base) only provides glibc 2.28, so protoc crashes when gRPC tries to run it during its build step. Add --build=protobuf to force protoc to compile natively on the AlmaLinux 8 build machine, where it will link against glibc 2.28.
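A quick way to spot this before a build is to scan a binary's versioned symbols (for example, text captured from `objdump -T protoc`) for tags newer than the target system provides. The sketch below assumes that kind of input; the helper name and the GLIBCXX limit used in the example are illustrative, not values taken from the PR.

```python
import re

# Matches symbol-version tags such as "GLIBC_2.34" or "GLIBCXX_3.4.29".
VERSION_TAG = re.compile(r"\b(GLIBCXX|GLIBC)_(\d+(?:\.\d+)+)\b")


def _version_tuple(text):
    return tuple(int(part) for part in text.split("."))


def too_new_versions(symbol_dump, max_glibc, max_glibcxx):
    """Return the version tags in symbol_dump that exceed the given limits."""
    limits = {
        "GLIBC": _version_tuple(max_glibc),
        "GLIBCXX": _version_tuple(max_glibcxx),
    }
    return sorted(
        {
            f"{lib}_{ver}"
            for lib, ver in VERSION_TAG.findall(symbol_dump)
            if _version_tuple(ver) > limits[lib]
        }
    )


if __name__ == "__main__":
    dump = "(GLIBC_2.34) sym_a\n(GLIBC_2.2.5) sym_b\n(GLIBCXX_3.4.29) sym_c\n"
    print(too_new_versions(dump, max_glibc="2.28", max_glibcxx="3.4.25"))
```

Comparing integer tuples rather than strings matters here: as strings, "2.2.5" would sort after "2.28".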
…l for pure-Python wheel

Three root causes diagnosed and fixed:

1. Conan --build=protobuf was silently ignored. Conan 2 matches --build patterns with fnmatch, so the bare name "protobuf" does not match "protobuf/3.21.12". Changed to '--build=protobuf*' and added '--build=grpc*' and '--build=gtest*', so all three packages that had been downloaded as Ubuntu-compiled binaries (glibc 2.38+) are now rebuilt from source on AlmaLinux 8 (glibc 2.28). This eliminates the GLIBC_2.33 / __isoc23_strtol linker errors.
2. The Conan profile's cppstd=17 made the protobuf package ID differ from the Artifactory record (gnu17), so the --build flag hit a compatible-package fallback and was bypassed. Changed to cppstd=gnu17 to align the IDs.
3. auditwheel repair was called on a pure-Python wheel (py3-none-any), but it requires ELF binaries inside the wheel. tritonclient has no compiled C extensions (the stubs are generated Python), so repair is replaced with a direct cp of the any-platform wheel.
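The pattern behavior in item 1 can be reproduced with Python's own `fnmatch`. This only illustrates fnmatch semantics against the reference string quoted above; Conan's full matching rules (user/channel, revisions) are not modeled.

```python
from fnmatch import fnmatch

# Reference string as quoted in the commit message.
REFERENCE = "protobuf/3.21.12"

for pattern in ("protobuf", "protobuf*", "protobuf/*"):
    # A bare name must match the whole reference, so it fails;
    # either wildcard form covers the "/3.21.12" suffix.
    print(pattern, "->", fnmatch(REFERENCE, pattern))
```

`protobuf*` is broader than `protobuf/*`: it would also match a hypothetical `protobuf-c/1.0` reference, while `protobuf/*` would not.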
… guard GTest re-find

Conan 2 creates imported targets but does not set the legacy CMake variables (RAPIDJSON_INCLUDE_DIRS, Protobuf_INCLUDE_DIRS) that common/CMakeLists.txt uses in target_include_directories / include_directories. Pre-find both packages in the Conan mode block of the root CMakeLists.txt and populate the variables from each imported target's INTERFACE_INCLUDE_DIRECTORIES so the common subdirectory compiles correctly.

Guard find_package(GTest) in src/c++/CMakeLists.txt with if(NOT TARGET GTest::gtest). common unconditionally finds GTest, so a second find_package call makes Conan's GTest-Target-release.cmake call set_property on already-defined ALIAS targets, which CMake rejects.

Also bump the Dockerfile.sdk base image to 26.04-py3-min and fix the wheel glob to match the py3-none-any wheel produced by the pure-Python tritonclient build.
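The wheel-glob fix and the earlier "copy instead of auditwheel repair" decision both hinge on the wheel's tags. A minimal sketch, assuming the standard PEP 427 filename layout; the helper names and the exact glob are hypothetical rather than the Dockerfile's.

```python
import glob
import os
import tempfile


def wheel_tags(filename):
    """Return the (python, abi, platform) tags of a wheel filename (PEP 427)."""
    stem = os.path.basename(filename)
    if not stem.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # name-version[-build]-python-abi-platform: the tags are the last three fields.
    python_tag, abi_tag, platform_tag = stem[: -len(".whl")].split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def needs_auditwheel(filename):
    """auditwheel repair only applies to platform wheels with ELF payloads."""
    return wheel_tags(filename)[2] != "any"


def find_pure_wheel(dist_dir, name="tritonclient"):
    """Locate exactly one <name>-*-py3-none-any.whl in dist_dir."""
    matches = glob.glob(os.path.join(dist_dir, f"{name}-*-py3-none-any.whl"))
    if len(matches) != 1:
        raise RuntimeError(f"expected one pure-Python wheel in {dist_dir}, found {matches}")
    return matches[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dist:
        wheel = os.path.join(dist, "tritonclient-0.0.0-py3-none-any.whl")
        open(wheel, "wb").close()
        found = find_pure_wheel(dist)
        print(found, "repair" if needs_auditwheel(found) else "copy as-is")
```

Requiring exactly one match makes a stale or unexpected platform wheel in the output directory fail loudly instead of being picked up silently.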
Five fixes uncovered by running qa/L0_sdk against the Conan-based SDK image:

* Source the authoritative TRITON_VERSION from server/TRITON_VERSION in both Dockerfile.sdk and Dockerfile.sdk.manylinux. The tarball (v<ver>.clients.tar.gz), wheel (tritonclient-<ver>), and /workspace/TRITON_VERSION now ship the real release version instead of "0.0.0".
* Run `conan install --deployer=full_deploy` and copy every Conan package's include/ and lib*/ tree into /workspace/install. This brings in the grpc/grpc++/grpcpp/absl/google/curl/openssl headers plus the static archives that qa/L0_sdk compiles and links against. Without this, the legacy ExternalProject layout could not be reproduced.
* Symlink perf_analyzer into /usr/local/bin/. The pip-installed perf_analyzer entry point lives in /opt/venv-tritonclient/bin/, but the L0_sdk wheel-install check requires the exact path /usr/local/bin/perf_analyzer.
* Pin setuptools<80 in the runtime venv. setuptools 80+ no longer ships the bundled pkg_resources, which tritonclient.utils.cuda_shared_memory still imports.
* Mirror Conan's libssl.a / libcrypto.a into /usr/lib/x86_64-linux-gnu/ and remove the unversioned libssl.so / libcrypto.so symlinks. The Conan-built libcurl.a uses OpenSSL 3.2 symbols (SSL_get0_group_name) that are absent from Ubuntu's OpenSSL 3.0 .a; `gcc -lssl -lcrypto` now resolves to our archives. The .so.3 SONAME is preserved for Python's ssl module.

L0_sdk now reports "*** Test Passed ***" inside triton-client-sdk:tri-971-local (grpc_test, grpc_test_static, http_test, http_test_static all green).
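The dep-copy step is a merge of many include/ and lib*/ trees where the first package to provide a file wins, as with `cp -rn`. The Python sketch below shows that merge; it assumes the package directories have already been located (the PR finds them with `find` over the full_deploy output, whose layout is not reproduced here), and both function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def merge_no_clobber(src, dst):
    """Copy the tree at src into dst without overwriting (like `cp -rn`).

    Returns the destination paths that already existed and were skipped.
    Symlinks (e.g. libfoo.so -> libfoo.so.1) are copied as symlinks.
    """
    src, dst = Path(src), Path(dst)
    skipped = []
    for path in sorted(src.rglob("*")):
        target = dst / path.relative_to(src)
        if path.is_dir() and not path.is_symlink():
            target.mkdir(parents=True, exist_ok=True)
        elif target.exists() or target.is_symlink():
            skipped.append(target)  # the first package to provide a file wins
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target, follow_symlinks=False)
    return skipped


def collect_package_trees(package_dirs, install_root):
    """Merge each package's include/ and lib*/ directories into install_root."""
    skipped = []
    for pkg in map(Path, package_dirs):
        for tree in sorted(pkg.glob("include")) + sorted(pkg.glob("lib*")):
            if tree.is_dir():
                skipped += merge_no_clobber(tree, Path(install_root) / tree.name)
    return skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "pkg"
        (pkg / "include").mkdir(parents=True)
        (pkg / "include" / "demo.h").write_text("// demo")
        print(collect_package_trees([pkg], Path(tmp) / "install"))
```

Returning the skipped paths, rather than discarding them, keeps header collisions between packages visible in the build log.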
Follow-ups from code review of e407db1:

* Multi-arch guard for the OpenSSL .a override in the final stage. Resolve the multiarch directory from `uname -m` instead of hardcoding `/usr/lib/x86_64-linux-gnu/`, so a future arm64 build fails fast with a clear message instead of a confusing `cp` error.
* Replace `for X in $(find ...)` with `while IFS= read -r X` + process substitution in the Conan dep-copy step. This stays word-split-safe if a future Conan package name ever contains whitespace.
* Drop `2>/dev/null` from the `cp -rn` calls so genuine errors (no space left, permissions, etc.) surface in build logs. The `|| true` already covers the expected `cp -n` collision exits.
* Tighten the `setuptools<80` comment to state the actual reason (80+ removed the bundled `pkg_resources`).
* Document `--build-arg TRITON_SERVER_REPO_SUBDIR=server` in the Dockerfile.sdk.manylinux header so the example invocation matches Dockerfile.sdk.
* Bump the copyright year on both Dockerfiles to 2025-2026.

Rebuilt and re-ran qa/L0_sdk inside the image; all four sub-tests remain green.
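The multi-arch guard can be sketched as a "plan, then copy" step that fails with a descriptive error before any `cp` runs. This is a Python illustration under stated assumptions: the `<machine>-linux-gnu` triplet rule (which holds for x86_64 and aarch64 on Debian/Ubuntu) and the helper name are not taken from the PR, and removing the unversioned .so symlinks is not modeled.

```python
import platform
import tempfile
from pathlib import Path


def openssl_override_plan(conan_lib, machine=None, lib_root="/usr/lib"):
    """Plan copies of Conan's libssl.a/libcrypto.a into the multiarch libdir.

    Raises a descriptive error up front instead of letting a later `cp` fail.
    """
    machine = machine or platform.machine()  # same value as `uname -m`
    dest = Path(lib_root) / f"{machine}-linux-gnu"  # assumed triplet rule
    if not dest.is_dir():
        raise RuntimeError(f"OpenSSL override: no multiarch dir {dest} for {machine!r}")
    plan = []
    for name in ("libssl.a", "libcrypto.a"):
        src = Path(conan_lib) / name
        if not src.is_file():
            raise RuntimeError(f"OpenSSL override: {src} missing for {machine!r}")
        plan.append((src, dest / name))
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        try:
            openssl_override_plan(Path(tmp), lib_root=tmp)
        except RuntimeError as err:
            print(err)  # fails fast: no multiarch dir under the temp root
```

Returning a plan instead of copying immediately keeps the check separate from the side effect, so the same guard can be reused for a dry run.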