2022-04-29 03:35:16.853021: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2022-04-29 03:35:16.853249: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2022-04-29 03:35:16.853461: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2022-04-29 03:35:16.853664: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2022-04-29 03:35:16.853869: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
2022-04-29 03:35:16.854067: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.1/lib64
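Each warning above names the library file TensorFlow tried to load and the LD_LIBRARY_PATH it searched. As a rough sketch (the function name and regular expression are my own, written only against the log format shown here), the two can be pulled out of a saved log and compared:

```python
import re

# Matches the dso_loader warning format shown in the log above.
WARNING_RE = re.compile(
    r"Could not load dynamic library '(?P<lib>[^']+)'"
    r".*?LD_LIBRARY_PATH: (?P<path>\S+)"
)


def parse_dso_warnings(log_text):
    """Return (library_file, search_path) pairs, one per matching warning line."""
    pairs = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            pairs.append((match.group("lib"), match.group("path")))
    return pairs
```

For the log above, every pair has a library name ending in .so.10.0 and a search path under cuda-10.1.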
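Cause of the problem
After some searching online, I found that the root cause was a mismatch between the CUDA version (10.1) and the TensorFlow version (1.14.0). CUDA had already moved on to 10.1 and later, but TensorFlow 1.14.0 only supports up to CUDA 10.0, which is why every library it failed to find was a 10.0 one. Why did this take so long to track down? Because CUDA is sneaky: nvcc -V reported version 10.0, while watch -n 1 nvidia-smi showed 10.1 in the top-right corner (nvidia-smi reports the highest CUDA version the driver supports, not the toolkit that is installed). I recommend checking with the following command instead: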
(Shugang3DGCN) usr@ubuntu16:~$ conda list
Look for the cudatoolkit entry in the output; its version number is the more reliable one.
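If you would rather not scan the whole listing by eye, the lookup can be scripted. This is a minimal sketch: the helper names are hypothetical, and it assumes the usual conda list layout of whitespace-separated columns with the package name first and the version second.

```python
import subprocess


def find_package_version(conda_list_output, package="cudatoolkit"):
    """Return the version column for `package`, or None if it is not listed."""
    for line in conda_list_output.splitlines():
        if line.startswith("#"):
            continue  # skip the header comments conda prints
        fields = line.split()
        if len(fields) >= 2 and fields[0] == package:
            return fields[1]
    return None


def installed_cudatoolkit_version():
    """Run `conda list` in the active environment and look up cudatoolkit."""
    result = subprocess.run(
        ["conda", "list"], capture_output=True, text=True, check=True
    )
    return find_package_version(result.stdout)
```

Calling installed_cudatoolkit_version() inside the activated environment should return the same version string that appears in the cudatoolkit row.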
Solution
Mine showed cudatoolkit 10.1. Since 10.0 is what is needed, the following command installs it directly over the existing version:
(Shugang3DGCN) usr@ubuntu16:~$ conda install cudatoolkit=10.0
Finally, run conda list again to check that the replacement succeeded. Once cudatoolkit shows 10.0, the code should run without these problems.
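As an extra check that does not require starting TensorFlow, you can ask the dynamic loader directly whether the six libraries from the warnings can now be opened. This is only a sketch with hypothetical helper names. It relies on ctypes.CDLL, whose search path may differ from the one TensorFlow uses, so treat a "missing" result as a hint rather than proof. The optional directory argument is for pointing it at a specific lib folder.

```python
import ctypes
import os

# The six library files named in the warnings above.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def can_load(library_name, directory=None):
    """Return True if the dynamic loader can open the library."""
    target = os.path.join(directory, library_name) if directory else library_name
    try:
        ctypes.CDLL(target)
    except OSError:
        return False
    return True


def missing_libraries(names=CUDA_10_0_LIBS, directory=None):
    """Return the subset of `names` that could not be loaded."""
    return [name for name in names if not can_load(name, directory)]
```

An empty list from missing_libraries() suggests the 10.0 libraries are visible to the loader. If the list is not empty, try passing the lib directory of the conda environment as directory before concluding that the install failed.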