Symptom
Running nvidia-smi fails with the following error:
jiang@jiang-ThinkStation-P520:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
It worked fine when I tested it a few days ago, then suddenly stopped working. I checked, and both CUDA and cuDNN are still installed:
jiang@jiang-ThinkStation-P520:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
jiang@jiang-ThinkStation-P520:~$ cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
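The grep output above has to be read off by eye. As a small convenience, the three defines can be combined into one version string. This is only a sketch: the function name cudnn_version is made up for illustration, and the header path is the one used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: read a cuDNN header on stdin and print
# MAJOR.MINOR.PATCHLEVEL, or nothing if the defines are not present.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { if (major != "") print major "." minor "." patch }
    '
}

# Same header path as in the command above; skipped if the file is absent.
header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version < "$header"
fi
```

With the header shown above, this should print 7.6.5.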
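Cause
A search on Baidu turned up one common explanation: the kernel was upgraded automatically and the new kernel no longer matches the installed NVIDIA driver. Some posts suggest pinning a specific kernel version. The DKMS log further down is consistent with this explanation: it reports that no nvidia module existed for the running kernel, 5.4.0-91-generic.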
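One way to check this explanation before changing anything is to compare the running kernel with the kernels DKMS has built the nvidia module for. The sketch below is not part of the original fix. The function name nvidia_module_state is made up, and the parsing assumes `dkms status` prints lines of the form "nvidia, 460.56, 5.4.0-91-generic, x86_64: installed" (or "nvidia/460.56, ..." on newer DKMS releases); other layouts would need a different pattern.

```shell
#!/bin/sh
# Hypothetical helper: read `dkms status` text on stdin and report whether an
# nvidia module is installed for the kernel release passed as $1.
nvidia_module_state() {
    awk -v k="$1" '
        /^nvidia/ {
            # Assumed layout: module, version, kernel, arch, state
            split($0, f, "[,/:] *")
            if (f[3] == k && f[5] ~ /^installed/) found = 1
        }
        END { print (found ? "installed" : "missing") }
    '
}

# Live check, only attempted when dkms is available on this machine.
if command -v dkms >/dev/null 2>&1; then
    echo "running kernel: $(uname -r)"
    dkms status | nvidia_module_state "$(uname -r)"
fi
```

If this prints "missing" right after a kernel upgrade, the module simply has not been built for the new kernel, which is the case the steps below address.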
Solution
The following steps got it working again.
(1) First, check which NVIDIA driver version is installed:
ls /usr/src | grep nvidia
jiang@jiang-ThinkStation-P520:~$ ls /usr/src | grep nvidia
nvidia-460.56
(2) Then run the following commands in a terminal, using the version number found in step (1):
sudo apt install dkms
sudo dkms install -m nvidia -v 460.56
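(3) Run nvidia-smi again. It now prints the usual GPU status table; the full output is at the end of the process log below.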
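Step (2) hard-codes 460.56, which only applies to this machine. A minimal sketch of deriving the version from the /usr/src listing instead is shown here. The function name nvidia_src_version is made up, and picking the highest version when several nvidia-<version> directories exist is an assumption of this sketch, not something the original steps cover. The dkms command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: read directory names on stdin (one per line) and print
# the version part of the highest nvidia-<version> entry, if any.
nvidia_src_version() {
    # Keep only names like nvidia-460.56, strip the prefix, sort by version.
    sed -n 's/^nvidia-\([0-9][0-9.]*\)$/\1/p' | sort -V | tail -n 1
}

if [ -d /usr/src ]; then
    version=$(ls /usr/src | nvidia_src_version)
    if [ -n "$version" ]; then
        # Printed, not run: copy the line into a terminal once it looks right.
        echo "sudo dkms install -m nvidia -v $version"
    else
        echo "no nvidia-<version> directory found under /usr/src"
    fi
fi
```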
Process log
jiang@jiang-ThinkStation-P520:~$ ls /usr/src | grep nvidia
nvidia-460.56
jiang@jiang-ThinkStation-P520:~$ sudo apt install dkms
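Reading package lists... Done
Building dependency tree
Reading state information... Done
dkms is already the newest version (2.3-3ubuntu9.7).
dkms set to manually installed.
The following packages were automatically installed and are no longer required: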
libatomic1:i386 libbsd0:i386 libdrm-amdgpu1:i386 libdrm-intel1:i386 libdrm-nouveau2:i386 libdrm-radeon1:i386 libdrm2:i386 libedit2:i386 libelf1:i386 libexpat1:i386
libffi6:i386 libfwup1 libgl1:i386 libgl1-mesa-dri:i386 libglapi-mesa:i386 libglvnd0:i386 libglx-mesa0:i386 libglx0:i386 libllvm10:i386 libllvm9 libnvidia-cfg1-440-server
libnvidia-cfg1-450-server libnvidia-common-440 libnvidia-common-450 libnvidia-common-460 libnvidia-compute-440-server libnvidia-compute-450-server
libnvidia-decode-440-server libnvidia-decode-450-server libnvidia-encode-440-server libnvidia-encode-450-server libnvidia-extra-440-server libnvidia-extra-450-server
libnvidia-fbc1-440-server libnvidia-fbc1-450-server libpciaccess0:i386 libsensors4:i386 libstdc++6:i386 libx11-6:i386 libx11-xcb1:i386 libxau6:i386 libxcb-dri2-0:i386
libxcb-dri3-0:i386 libxcb-glx0:i386 libxcb-present0:i386 libxcb-sync1:i386 libxcb1:i386 libxdamage1:i386 libxdmcp6:i386 libxext6:i386 libxfixes3:i386 libxnvctrl0
libxshmfence1:i386 libxxf86vm1:i386 linux-hwe-5.4-headers-5.4.0-47 linux-hwe-5.4-headers-5.4.0-48 nvidia-compute-utils-440-server nvidia-compute-utils-450-server
nvidia-prime nvidia-settings nvidia-utils-440-server nvidia-utils-450-server screen-resolution-extra xserver-xorg-video-nvidia-440-server
xserver-xorg-video-nvidia-450-server
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 73 not upgraded.
jiang@jiang-ThinkStation-P520:~$ sudo dkms install -m nvidia -v 460.56
Creating symlink /var/lib/dkms/nvidia/460.56/source ->
/usr/src/nvidia-460.56
DKMS: add completed.
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j8 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-91-generic IGNORE_CC_MISMATCH='' modules.......
.
..
Signing module:
- /var/lib/dkms/nvidia/460.56/5.4.0-91-generic/x86_64/module/nvidia-uvm.ko
- /var/lib/dkms/nvidia/460.56/5.4.0-91-generic/x86_64/module/nvidia-modeset.ko
- /var/lib/dkms/nvidia/460.56/5.4.0-91-generic/x86_64/module/nvidia-drm.ko
- /var/lib/dkms/nvidia/460.56/5.4.0-91-generic/x86_64/module/nvidia.ko
Secure Boot not enabled on this system.
cleaning build area...
DKMS: build completed.
nvidia.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-91-generic/updates/dkms/
nvidia-uvm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-91-generic/updates/dkms/
nvidia-modeset.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-91-generic/updates/dkms/
nvidia-drm.ko:
Running module version sanity check.
- Original module
- No original module exists within this kernel
- Installation
- Installing to /lib/modules/5.4.0-91-generic/updates/dkms/
depmod....
DKMS: install completed.
jiang@jiang-ThinkStation-P520:~$ nvidia-smi
Fri Dec 31 15:52:30 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.156.00   Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 2080    Off  | 00000000:65:00.0 Off |                  N/A |
| 22%   50C    P0    29W / 225W |      0MiB /  7974MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
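Besides nvidia-smi, another way to confirm the result is to check which nvidia kernel modules are loaded. The sketch below is an optional extra check, not part of the original procedure. It reads /proc/modules, where the module name is the first field of each line, and the function name loaded_nvidia_modules is made up. Loaded module names use underscores (for example nvidia_uvm), while the .ko files in the log above use hyphens.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/modules-style text on stdin and print the
# loaded modules whose names start with "nvidia", sorted, on one line.
loaded_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }' | sort |
        awk '{ names = names (names == "" ? "" : " ") $1 } END { print names }'
}

# Live check, skipped on systems without /proc/modules.
if [ -r /proc/modules ]; then
    loaded=$(loaded_nvidia_modules < /proc/modules)
    echo "loaded nvidia modules: ${loaded:-none}"
fi
```

Not all four modules have to be loaded at once; some may only be loaded when first needed.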
References:
1. https://www.jianshu.com/p/6b998ba2c6a6
2. https://blog.csdn.net/sinat_23619409/article/details/85220561