My advisor provided me with a desktop machine, so I set about configuring a PyTorch environment on it. After installing PyTorch according to the desktop's GPU driver version (472.12) and the matching CUDA and cuDNN versions, calling torch.cuda.is_available() showed that the GPU build of PyTorch had been installed successfully.
import torch
print(torch.__version__)
print(torch.cuda.is_available())
However, the installed PyTorch could not actually run computations on the GPU:
a = torch.Tensor(5,3)
print(a)
a.cuda()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Jupyter Notebook also gave us this warning:
D:\Anaconda3\lib\site-packages\torch\cuda\__init__.py:83: UserWarning:
Found GPU%d %s which is of cuda capability %d.%d.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is %d.%d.
warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
PyTorch no longer supports this GPU because it is too old. Our GPU is an old model (GeForce GT 730, 2 GB VRAM, compute capability 3.5) that current PyTorch releases no longer support.
The problem, in short: PyTorch installs successfully, but the GPU cannot be used, with the messages "PyTorch no longer supports this GPU because it is too old." and "CUDA error: no kernel image is available for execution on the device".
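For reference, the GPU model and its compute capability can be confirmed directly from Python (a quick sketch; the values in the comment are what this machine is expected to report):
import torch
# Expected on this machine: 'GeForce GT 730' and (3, 5)
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))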
1. Follow the hint in the error message
Following the hint, we set CUDA_LAUNCH_BLOCKING=1, which forces CUDA kernels to launch synchronously so that errors are reported at the failing call (the variable generally has to be set before the first CUDA operation in the process). The GPU still could not be used:
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
a = torch.Tensor(5,3)
print(a)
a.cuda()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
2. Downgrade PyTorch
Since Jupyter Notebook warned that "the current PyTorch no longer supports our GPU", we tried downgrading PyTorch.
However, after downgrading PyTorch from 1.10.1 to 1.9.1, 1.9.0, and 1.8.0, Python still raised the same error and the GPU still could not be used. Even after adding conda mirror channels, downloading PyTorch with conda install remained extremely slow (at least 4 hours, and the installation could be interrupted partway; several days were lost here...), so we switched to installing PyTorch from offline packages.
Offline packages for cudatoolkit, pytorch, torchvision, and torchaudio can be downloaded from the Tsinghua mirror site. They are installed with conda install --use-local, run from the directory containing the downloaded .tar.bz2 files:
conda install --use-local pytorch-1.7.1-py3.9_cuda110_cudnn8_0.tar.bz2
conda install --use-local cudatoolkit-11.0.221-h74a9793_0.tar.bz2
conda install --use-local torchvision-0.8.2-py39_cu110.tar.bz2
conda install -c anaconda torchaudio==0.7.2  # some packages (e.g. cudatoolkit-8.0, cudnn) are not in conda's default channels; adding -c anaconda to conda install solves this
conda install --use-local torchvision-0.8.2-py39_cu110.tar.bz2  # reinstall torchvision: its version had been changed to 0.2.2
Since some projects do not yet support the latest Python 3.9, we created a new environment based on Python 3.7 (convenient for later use), installed the corresponding CUDA and cuDNN versions, and kept downgrading PyTorch:
conda install --use-local pytorch-1.6.0-py3.7_cuda102_cudnn7_0.tar.bz2
conda install --use-local cudatoolkit-10.2.89-h74a9793_1.tar.bz2
conda install --use-local torchaudio-0.6.0-py37.tar.bz2
conda install --use-local torchvision-0.7.0-py37_cu102.tar.bz2
PyTorch 1.5.1 does not need torchaudio:
conda install --use-local pytorch-1.5.1-py3.7_cuda92_cudnn7_0.tar.bz2
conda install --use-local cudatoolkit-9.2-0.tar.bz2
conda install --use-local torchvision-0.6.1-py37_cu92.tar.bz2
At this point, Jupyter Notebook no longer warned that the GPU was too old to be supported, but Python still raised the same error and the GPU still could not be used for computation:
a = torch.Tensor(5,3)
print(a)
a.cuda()
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
3. Choose a PyTorch version that matches the GPU's compute capability
According to discussions in several blog posts, the error RuntimeError: CUDA error: no kernel image is available for execution on the device may be caused by the GPU's compute capability being lower than 3.5. So we looked into which CUDA architectures (compute capabilities) each PyTorch build supports:
PyTorch | Python | CUDA | cuDNN | Architectures |
---|---|---|---|---|
pytorch-1.0.0 | py3.7 | cuda10.0.130 | cudnn7.4.1_1 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.0.0 | py3.7 | cuda8.0.61 | cudnn7.1.2_1 | sm_20, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
pytorch-1.0.0 | py3.7 | cuda9.0.176 | cudnn7.4.1_1 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_70 |
pytorch-1.0.1 | py3.7 | cuda10.0.130 | cudnn7.4.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.0.1 | py3.7 | cuda10.0.130 | cudnn7.4.2_2 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.0.1 | py3.7 | cuda8.0.61 | cudnn7.1.2_0 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
pytorch-1.0.1 | py3.7 | cuda8.0.61 | cudnn7.1.2_2 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
pytorch-1.0.1 | py3.7 | cuda9.0.176 | cudnn7.4.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.0.1 | py3.7 | cuda9.0.176 | cudnn7.4.2_2 | sm_35, sm_50, sm_60, sm_70 |
pytorch-1.1.0 | py3.7 | cuda10.0.130 | cudnn7.5.1_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.1.0 | py3.7 | cuda9.0.176 | cudnn7.5.1_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.2.0 | py3.7 | cuda9.2.148 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.2.0 | py3.7 | cuda10.0.130 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.2.0 | py3.7 | cuda9.2.148 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.3.0 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.3.1 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.3.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.4.0 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.4.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.4.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.5.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.5.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.1 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.5.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.6.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.6.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.6.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.7.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.0 | py3.7 | cuda11.0.221 | cudnn8.0.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
pytorch-1.7.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.7.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.1 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.7.1 | py3.7 | cuda11.0.221 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
pytorch-1.7.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70 |
pytorch-1.8.0 | py3.7 | cuda10.1 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.0 | py3.7 | cuda10.2 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.0 | py3.7 | cuda11.1 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80, sm_86 |
pytorch-1.8.1 | py3.7 | cuda10.1 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.1 | py3.7 | cuda10.2 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
pytorch-1.8.1 | py3.7 | cuda11.1 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80, sm_86 |
Reference: pytorch 报错 RuntimeError: CUDA error: no kernel image is available for execution on the device
Our GPU (GeForce GT 730, 2 GB VRAM) has compute capability 3.5, which according to the table above should be supported by the vast majority of PyTorch builds, yet the GPU still could not be used for computation.
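The mismatch can also be checked from Python: torch.cuda.get_arch_list() (available in PyTorch 1.7 and later) lists the CUDA architectures the installed binary was actually compiled for, and the GPU only works if its own sm_XX appears in that list. A minimal sketch:
import torch
# Architectures this PyTorch binary was compiled for, e.g. ['sm_37', 'sm_50', 'sm_60', ...]
print(torch.cuda.get_arch_list())
# For the GT 730 (compute capability 3.5) we need 'sm_35' to appear in this list
print('sm_35' in torch.cuda.get_arch_list())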
We therefore had to look for a solution elsewhere.
4. Install PyTorch with pip install --pre
Searching further, we found that users abroad had run into exactly the same problem and solved it with pip install --pre:
pip uninstall torch
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html
Reference: Cuda error: no kernel image is available for execution on the device #31285
Here cu110 corresponds to CUDA 11.0; if your CUDA version is 11.x, replace it with the matching cu11x.
This approach might well work, but because the package server is far from mainland China, pip install is extremely slow; the command above downloaded at only about 2 KB/s (how we envy users abroad). So we had to give up on it.
Looking at pip's output for the command above, we could see that pip install --pre torch torchvision simply downloads specific versions of torch and torchvision one by one from a package index. So we tried downloading the corresponding PyTorch wheel from a domestic mirror and installing it locally:
pip install C:\Users\Lenovo\Downloads\torch-1.10.1-cp37-cp37m-win_amd64.whl
It turned out that the PyTorch installed this way was the CPU-only build. With nothing else to try here, we abandoned this approach as well.
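A quick way to tell whether an installed wheel is a CPU-only build (an illustrative check, not specific to any particular wheel):
import torch
print(torch.__version__)         # CPU-only wheels often carry a '+cpu' suffix
print(torch.version.cuda)        # None for CPU-only builds, e.g. '11.0' for CUDA builds
print(torch.cuda.is_available())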
5. Building PyTorch from source
After going through a great deal of material, we found an approach that does work: building PyTorch from source.
Even with a very old GPU, this makes it possible to use a relatively recent PyTorch version for GPU computation. The steps below are taken from the references listed at the end of this section.
We will have to:
- adjust TORCH_CUDA_ARCH_LIST to match the desired archs
- choose the docker image that matches your nvidia-drivers, e.g. the one listed in the script below uses the latest docker image as of this writing, and I had to update my nvidia driver to 465 and cuda to 11.3. If your drivers are older than that, use an earlier image. See comment notes for details.
- I map a folder on my filesystem, ~/github/00pytorch/pytorch/docker, to docker's /tmp/output, where I copy the wheels. So make sure to edit it to a real folder on your system.
It will take a few hours to build if you have a relatively strong machine, but otherwise it’s very painless to do.
First install the NVIDIA Container Toolkit (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html), then verify that Docker can see the GPU:
docker run --runtime=nvidia --rm nvidia/cuda:11.0-base nvidia-smi
docker pull nvcr.io/nvidia/pytorch:21.04-py3
docker run --gpus all --ipc=host --rm -it -v ~/github/00pytorch/pytorch/docker:/tmp/output nvcr.io/nvidia/pytorch:21.04-py3
conda create -n pytorch-dev python=3.8 -y
bash
conda init bash
conda activate pytorch-dev
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
conda install -c pytorch magma-cuda113 -y
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
unset PYTORCH_BUILD_VERSION
unset PYTORCH_VERSION
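# Note: the "6.1 8.6" values below are the original forum author's GPUs; for the GT 730
# (compute capability 3.5), TORCH_CUDA_ARCH_LIST would need to be set to "3.5" instead.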
TORCH_CUDA_ARCH_LIST="6.1 8.6" \
CUDA_HOME="/usr/local/cuda" \
CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
USE_SYSTEM_NCCL=1 \
NCCL_INCLUDE_DIR="/usr/include/" \
NCCL_LIB_DIR="/usr/lib/" \
python setup.py bdist_wheel 2>&1 | tee build.log
pip install dist/*whl
cp dist/*whl /tmp/output
cd ..
git clone https://github.com/pytorch/vision
cd vision
TORCH_CUDA_ARCH_LIST="6.1 8.6" \
python setup.py bdist_wheel
pip install dist/*whl
cp dist/*whl /tmp/output
cd ..
git clone --recursive https://github.com/pytorch/audio
cd audio
TORCH_CUDA_ARCH_LIST="6.1 8.6" \
BUILD_SOX=1 python setup.py bdist_wheel
pip install dist/*whl
cp dist/*whl /tmp/output
References:
- Cuda error: no kernel image is available for execution on the device #31285
- Building PyTorch from source on Windows to work with an old GPU
- How to install pytorch FROM SOURCE (with cuda enabled for a deprecated CUDA cc 3.5 of an old gpu) using anaconda prompt on Windows 10?
- How to Compile the Latest Pytorch from Source in Windows with CUDA Support
- Building Pytorch from source with cuda support on WSL2(Ubuntu 20.04, cuda11.4, Windows11)
- build from source 安装 PyTorch及很多坑
This would probably also take many hours, so we set it aside and tried other options first.
6. Keep downgrading PyTorch
Searching for installation reports for the same GPU model (GeForce GT 730), we found users who had installed PyTorch 1.0.0 and used the GPU successfully. But that version is quite old, and many newer features would be unavailable with it.
So we simply kept downgrading PyTorch, and when we finally reached version 1.2.0 the GPU worked!
import torch
print(torch.__version__)
print(torch.cuda.is_available())
a = torch.Tensor(5,3)
print(a)
print(a.cuda())
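As a final check (an illustrative sketch, not part of the original run), a small computation can be pushed through the GPU end to end:
import torch
device = torch.device('cuda')
x = torch.rand(5, 3, device=device)
y = torch.rand(3, 4, device=device)
z = x @ y                    # matrix multiplication on the GPU
print(z.device, z.shape)     # cuda:0 torch.Size([5, 4])
print(z.cpu())               # move the result back to the CPU to inspect it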