Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how does one fix when torch can't find cuda, error: version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference?

Tags:

cuda

pytorch

I get this error with a pytorch import python -c "import torch":

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 13, in <module>
    import torch
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

how does one fix it?

related:

  • Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
  • https://github.com/pytorch/pytorch/issues/51080
like image 814
Charlie Parker Avatar asked Aug 30 '25 18:08

Charlie Parker


2 Answers

Like eval said, it is because pytorch1.13 automatically install nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11 and nvidia_cudnn_cu11. While I have my own CUDA toolKit already installed, I have the same problem.

In my case, I used pip uninstall nvidia_cublas_cu11 and solved the problem. I think the PyTorch team should solve this issue, since users often have their own CUDAtoolkit installed.

like image 96
lenin Avatar answered Sep 04 '25 04:09

lenin


The error is from dlopen libcublas.so from .../python3.9/site-packages/torch/lib/nvidia/cublas/lib/, which is the pip package "nvidia-cuda-runtime" install location.

libcublasLt.so.11 is dynamically linked to libcublas.so.11. The problem is that when you have a different cuda runtime installation (usually in /usr/local/cuda), dlopen probably gets the wrong one. You can run ldd .../python3.9/site-packages/torch/lib/nvidia/cublas/lib/libcublas.so to check the actual path of libcublasLt.so.11, which is supposed to be the one under .../python3.9/site-packages/torch/lib/nvidia/cublas/lib/

Workarounds:

  1. Set env LD_LIBRARY_PATH=.../python3.9/site-packages/torch/lib/nvidia/cublas/lib/:$LD_LIBRARY_PATH when launching python. So that dlopen can firstly look for .so files in that directory.

  2. Using older torch. It was since 1.13.0 torch pip install started using pip nvidia-* packages. Before that cuda libs are statically linked. That's why older torch pip install has no problem even if you have existing cuda install.

like image 45
eval Avatar answered Sep 04 '25 05:09

eval