Is this a duplicate?
Type of Bug
Runtime Error
Component
cuda.pathfinder
Describe the bug
In a CI environment that worked with pynvml.py, migrating to cuda.bindings.nvml fails to find the libnvidia-ml.so.1 library.
rapidsai/ucxx#640 (comment)
How to Reproduce
I'm not sure what is different about the CI environment in ucxx yet.
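Until the difference is understood, a small report like the one below could be run in the failing CI job and in a working environment, and the two outputs compared. This is only a sketch: `loader_report` and `CANDIDATE_DIRS` are hypothetical names, and the directory list is an assumption, not a statement of where the driver library actually lives in the ucxx CI image.

```python
import ctypes.util
import glob
import os

# Assumed candidate directories; not an exhaustive or authoritative list.
CANDIDATE_DIRS = [
    "/usr/lib64",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib",
    "/usr/local/nvidia/lib64",
]


def loader_report(soname="libnvidia-ml.so.1", dirs=CANDIDATE_DIRS):
    """Collect facts that influence whether `soname` can be located."""
    on_disk = []
    for d in dirs:
        # Match the soname itself plus versioned files such as libnvidia-ml.so.1.x
        on_disk.extend(sorted(glob.glob(os.path.join(d, soname + "*"))))
    return {
        "LD_LIBRARY_PATH": os.environ.get("LD_LIBRARY_PATH", ""),
        # What the stdlib's own lookup (ldconfig and friends) reports, or None
        "find_library": ctypes.util.find_library("nvidia-ml"),
        "on_disk": on_disk,
    }


if __name__ == "__main__":
    for key, value in loader_report().items():
        print(f"{key}: {value!r}")
```

If `on_disk` lists the library but `find_library` is `None`, the file exists but is probably not registered in the loader cache, which would be one thing worth checking in the CI image.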
Expected behavior
cuda.bindings.nvml should find the underlying .so in all of the cases that pynvml.py did.
For reference, here is what pynvml.py does:
def _LoadNvmlLibrary():
    '''
    Load the library if it isn't loaded already
    '''
    global nvmlLib
    if (nvmlLib == None):
        # lock to ensure only one caller loads the library
        libLoadLock.acquire()
        try:
            # ensure the library still isn't loaded
            if (nvmlLib == None):
                try:
                    if (sys.platform[:3] == "win"):
                        # cdecl calling convention
                        try:
                            # Check for nvml.dll in System32 first for DCH drivers
                            nvmlLib = CDLL(os.path.join(os.getenv("WINDIR", "C:/Windows"), "System32/nvml.dll"))
                        except OSError as ose:
                            # If nvml.dll is not found in System32, it should be in ProgramFiles
                            # load nvml.dll from %ProgramFiles%/NVIDIA Corporation/NVSMI/nvml.dll
                            nvmlLib = CDLL(os.path.join(os.getenv("ProgramFiles", "C:/Program Files"), "NVIDIA Corporation/NVSMI/nvml.dll"))
                    else:
                        # assume linux
                        nvmlLib = CDLL("libnvidia-ml.so.1")
                except OSError as ose:
                    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
                if (nvmlLib == None):
                    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
        finally:
            # lock is always freed
            libLoadLock.release()
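On Linux, the code above passes the bare soname to `CDLL`, so resolution is left entirely to the system dynamic loader. A minimal sketch of a check that isolates just that step (the helper name `can_dlopen` is hypothetical, not part of pynvml or cuda.bindings):

```python
import ctypes


def can_dlopen(name="libnvidia-ml.so.1"):
    """Return True if the system dynamic loader can open `name`.

    This mirrors the Linux branch of pynvml.py, which calls CDLL with the
    bare soname and lets the loader search its usual locations.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    status = "succeeded" if can_dlopen() else "failed"
    print(f"dlopen of libnvidia-ml.so.1 {status}")
```

If this succeeds in the failing CI job, the library is reachable by the plain loader search and the difference is in how cuda.bindings.nvml locates it.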
Operating System
rockylinux8
nvidia-smi output
No response