How does one fix it when torch can't find CUDA? Error: version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Question

I get this error when importing PyTorch with python -c "import torch":

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/afs/cs.stanford.edu/u/brando9/ultimate-utils/ultimate-utils-proj-src/uutils/__init__.py", line 13, in <module>
    import torch
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/ctypes/__init__.py", line 382, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /dfs/scratch0/brando9/miniconda/envs/metalearning_gpu/lib/python3.9/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

How does one fix it?

Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory
https://github.com/pytorch/pytorch/issues/51080

lenin · Accepted Answer

As eval said, this happens because the PyTorch 1.13 pip install automatically installs nvidia_cublas_cu11, nvidia_cuda_nvrtc_cu11, nvidia_cuda_runtime_cu11 and nvidia_cudnn_cu11. I already had my own CUDA toolkit installed and ran into the same problem.

In my case, running pip uninstall nvidia_cublas_cu11 solved the problem. I think the PyTorch team should address this, since users often have their own CUDA toolkit installed.
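
Before uninstalling anything, it can help to see which of these pip-installed CUDA packages are actually present in the environment. Here is a minimal sketch using only the standard library; the helper names (normalize, bundled_cuda_packages, installed_names) are made up for illustration, and the "starts with nvidia-, ends with -cu11" filter is my assumption based on the package names listed above.

```python
import importlib.metadata
import re


def normalize(name):
    # pip treats '-', '_' and '.' as equivalent in project names,
    # so nvidia_cublas_cu11 and nvidia-cublas-cu11 are the same package.
    return re.sub(r"[-_.]+", "-", name).lower()


def bundled_cuda_packages(names):
    """Return the normalized names that look like pip-installed CUDA 11 packages."""
    normalized = (normalize(n) for n in names if n)
    return sorted(
        {n for n in normalized if n.startswith("nvidia-") and n.endswith("-cu11")}
    )


def installed_names():
    # Names of all distributions visible to the current interpreter.
    return [dist.metadata.get("Name") for dist in importlib.metadata.distributions()]


if __name__ == "__main__":
    for name in bundled_cuda_packages(installed_names()):
        print(name)
```

If nvidia-cublas-cu11 shows up in the output, that is the package the pip uninstall command above removes.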

eval · Answer

The error comes from dlopen loading libcublas.so.11 from .../python3.9/site-packages/nvidia/cublas/lib/ (written as torch/lib/../../nvidia/cublas/lib/ in the traceback), which is where the pip package nvidia-cublas-cu11 installs it.

libcublas.so.11 is dynamically linked against libcublasLt.so.11. The problem is that when you also have a different CUDA installation (usually in /usr/local/cuda), dlopen probably picks up that installation's libcublasLt.so.11 instead. You can run ldd .../python3.9/site-packages/nvidia/cublas/lib/libcublas.so.11 to check which libcublasLt.so.11 is actually resolved; it is supposed to be the one under .../python3.9/site-packages/nvidia/cublas/lib/.
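
If you would rather do the ldd check from Python, something like the following should work. It is only a sketch: parse_ldd and resolved_cublaslt are hypothetical helper names, and the parsing assumes the usual "name => path (address)" layout of ldd output on Linux.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path (None if not found)."""
    resolved = {}
    for line in output.splitlines():
        if "=>" not in line:
            continue  # the vdso and loader lines have no "name => path" form
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            resolved[name] = None
        elif target.startswith("("):
            continue  # only a load address, no path on disk
        else:
            # Drop the trailing load address, e.g. "(0x00007f...)".
            resolved[name] = target.split(" (")[0]
    return resolved


def resolved_cublaslt(libcublas_path):
    # Runs ldd on the given libcublas.so.11 and reports where its
    # libcublasLt.so.11 dependency resolves to.
    out = subprocess.run(
        ["ldd", libcublas_path], capture_output=True, text=True, check=True
    ).stdout
    return parse_ldd(out).get("libcublasLt.so.11")
```

Calling resolved_cublaslt with the full path to the pip-installed libcublas.so.11 returns the libcublasLt.so.11 that the loader would pick. If that path points into a system CUDA directory rather than the pip package directory, you are most likely hitting the mismatch described here.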

Workarounds:

Set LD_LIBRARY_PATH=.../python3.9/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH when launching Python, so that dlopen looks for .so files in that directory first.
Use an older torch. The torch pip install started using the pip nvidia-* packages in 1.13.0. Before that, the CUDA libraries were statically linked, which is why an older torch pip install has no problem even if you have an existing CUDA install.
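
For the LD_LIBRARY_PATH workaround, here is a rough sketch of doing the same thing from a Python launcher script. The function names are hypothetical, and lib_dir stands for the pip cublas lib directory in your own environment. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the variable has to be set for a new interpreter rather than changed inside the one that is already running.

```python
import os
import subprocess
import sys


def env_with_lib_dir_first(lib_dir, env=None):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Avoid a trailing separator: an empty entry means "current directory".
    new_env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return new_env


def import_torch_in_child(lib_dir):
    # Start a fresh interpreter so the loader sees the modified search path.
    result = subprocess.run(
        [sys.executable, "-c", "import torch; print(torch.__version__)"],
        env=env_with_lib_dir_first(lib_dir),
    )
    return result.returncode
```

A return code of 0 from import_torch_in_child means the import succeeded with that directory searched first.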

Tags:

cuda

pytorch

Charlie Parker
