记录一下最近跑TinaFace代码在原来服务器跑没有问题,新服务器跑遇到的错误
首先,按照官网步骤安装相关包: 本人环境: 显卡驱动版本: NVIDIA-SMI 460.73.01 Driver Version: 460.106.00 CUDA Version: 11.2 CUDA版本:nvcc -V: Cuda compilation tools, release 11.1, V11.1.74
pytorch 1.8.1
torchvision 0.9.1
mmcv-full 1.4.6
mmdet 2.22.0
cudatoolkit 11.1.1
ps:如果没有安装上mmcv或者mmdet,不要怀疑,肯定是你的版本有问题。这个问题博主也遇到了。
检查上面版本,通常来讲,没有任何问题。 cudatoolkit严格按照cuda版本安装的,mmdet也是根据cuda版本和pytorch版本安装的。
不出意外的话出意外了,最后一步执行命令 pip install -v -e . 编辑vedadet报错:
Installing collected packages: vedadet
Running setup.py develop for vedadet
Running command /mnt/data/cbm/software/anaconda3/envs/lw/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"'; __file__='"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
running develop
running egg_info
writing vedadet.egg-info/PKG-INFO
writing dependency_links to vedadet.egg-info/dependency_links.txt
writing requirements to vedadet.egg-info/requires.txt
writing top-level names to vedadet.egg-info/top_level.txt
reading manifest file 'vedadet.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'vedadet.egg-info/SOURCES.txt'
running build_ext
building 'vedadet.ops.nms.nms_ext' extension
Emitting ninja build file /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o.d -DWITH_CUDA -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/TH -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/mnt/data/cbm/software/anaconda3/envs/lw/include/python3.8 -c -c /mnt/data1/lw/vedadet-main/vedadet/ops/nms/src/cuda/nms_kernel.cu -o /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=nms_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o.d -DWITH_CUDA -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/TH -I/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda/include -I/mnt/data/cbm/software/anaconda3/envs/lw/include/python3.8 -c -c /mnt/data1/lw/vedadet-main/vedadet/ops/nms/src/cuda/nms_kernel.cu -o /mnt/data1/lw/vedadet-main/build/temp.linux-x86_64-3.8/vedadet/ops/nms/src/cuda/nms_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=nms_ext -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
nvcc fatal : Unsupported gpu architecture 'compute_86'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1667, in _run_ninja_build
subprocess.run(
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/mnt/data1/lw/vedadet-main/setup.py", line 119, in <module>
setup(
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/develop.py", line 34, in run
self.install_for_development()
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/develop.py", line 114, in install_for_development
self.run_command('build_ext')
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
build_ext.build_extensions(self)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
objects = self.compiler.compile(sources,
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 529, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1354, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/mnt/data/cbm/software/anaconda3/envs/lw/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
ERROR: Command errored out with exit status 1: /mnt/data/cbm/software/anaconda3/envs/lw/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"'; __file__='"'"'/mnt/data1/lw/vedadet-main/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps Check the logs for full command output.
总结一下,就是报这个错误:nvcc fatal : Unsupported gpu architecture 'compute_86' 意思就是我的GPU算力架构太高了,不支持computer_86。博主GPU算力是3090显卡
网上很多博主说pytorch暂不支持computer_86,有很多建议说是将算力改为computer_75等。 这种自降身价的事,博主是不会做,万一改不回来了,岂不是亏大了。而且也有很多3090显卡的博主降算力后,推理代码时出现各种问题。
解决方案: 虽然 nvcc -V 显示的版本是11.1,但是cuda有个编译版本:一般在/usr/loacal/cuda/ 文件夹下: 这可以可以发现,本地安装了多个cuda版本。cuda文件是编译时采用的cuda版本文件的软件链接。即虽然是nvcc -V版本是11.1,但是编译版本是cuda10.2,导致出错。
临时解决方案:
将conda环境从cuda11.1版本降低到10.2,执行指令pip install -v -e . 能够编译完成,但是执行代码时候会报错:
UserWarning: GeForce RTX 3090 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37. If you want to use the GeForce RTX 30
pytorch 3090显卡对应的cuda版本过低,可以直接将conda环境中的cuda版本升级到11.1,临时解决
注意:虽然两次都是报错cuda问题,第一次需要重新加载编辑本地代码,因此执行了编辑时的cuda版本,但是代码运行时执行是nvcc -V的版本。又会报版本过低。
但是在使用mmdetection框架时,通常都会需要自己设计网络框架后,需要重新编辑本地代码,同样还是需要执行pip install -v -e . 问题不会被解决。
永久解决方案:
将编译时用到的cuda版本升级到11.1
相关参考博客如下: nvcc fatal : Unsupported gpu architecture ‘compute_86‘
安装CUDA时,nvcc --version和cat /usr/local/cuda/version.txt版本不一致
|