I had previously converted a model to a .trt engine file under TensorRT 7.2.1. When deploying on a new server using a TensorRT 8.0.1 container (again on a T4 GPU), the following error appeared:
This container was built for NVIDIA Driver Release 470.42 or later, but version 460.91.03 was detected and compatibility mode is UNAVAILABLE.
The official release notes for the TensorRT 8.0.1 container contain the following line:
Release 21.07 is based on NVIDIA CUDA 11.4.0, which requires NVIDIA Driver release 470 or later. However, if you are running on Data Center GPUs (formerly Tesla), for example, T4, you may use NVIDIA driver release 418.40 (or later R418), 440.33 (or later R440), 450.51 (or later R450), or 460.27 (or later R460).
Link:
https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-07.html#rel_21-07
So the problem is a driver-version mismatch.
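Before deploying, the mismatch can be caught with a quick check of the host driver against the container's minimum requirement. This is a generic sketch: the `version_ge` helper and the hard-coded driver value are my own additions; the `nvidia-smi` query is shown in a comment because it only runs on a machine with an NVIDIA driver installed.

```shell
# Returns success if dotted version $1 >= dotted version $2.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

# On a real server, read the installed driver version like this:
#   driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
driver="460.91.03"   # value observed on my server

if version_ge "$driver" "470.42"; then
    echo "driver satisfies the 21.07 (TensorRT 8.0.1) container requirement"
else
    echo "driver too old for the 21.07 (TensorRT 8.0.1) container"
fi
```

Running this with the driver shown prints the "too old" branch, matching the error message above.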
Solutions tried
1. Re-convert the model to a .trt engine inside the TensorRT 8.0.1 container. The conversion succeeded and inference ran without errors, but the inference results were wrong (always verify that the inference results are correct), so this approach does not work.
2. Reinstall the TensorRT 7.2.1 container on the new server and run inference inside it. The results were correct, so this approach solved the problem.
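The working deployment from step 2 can be sketched roughly as follows. The image tag (the NGC 20.11 release corresponds to TensorRT 7.2.1, per the release notes quoted below) and the host paths are assumptions to adapt to your setup; the script dry-runs by default so it is safe to execute anywhere.

```shell
# Use the NGC 20.11 container (TensorRT 7.2.1) instead of 21.07.
IMAGE="nvcr.io/nvidia/tensorrt:20.11-py3"   # assumed tag; adjust as needed
ENGINE_DIR="/path/to/engines"               # hypothetical host dir with the .trt engine

# Dry-run by default; set DRY_RUN=0 on a server with Docker and an NVIDIA GPU.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run docker pull "$IMAGE"
# Sanity-check the engine inside the container with trtexec before serving:
run docker run --gpus all --rm -v "$ENGINE_DIR:/engines" "$IMAGE" \
    trtexec --loadEngine=/engines/model.trt
```

With `DRY_RUN=0`, the `trtexec --loadEngine` run is a cheap way to confirm the engine loads and executes in the target container before wiring up the real inference service.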
Summary
The NVIDIA Driver requirement of a TensorRT release must be satisfied by the server's installed NVIDIA Driver; otherwise, even if model conversion succeeds, the inference results can be wrong. My server's NVIDIA Driver version is 460.91.03, and the official release notes for TensorRT 7.2.1 state:
Release 20.11 is based on NVIDIA CUDA 11.1.0, which requires NVIDIA Driver release 455 or later.
Since 460.91.03 satisfies this requirement, inference produces correct results.
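The two documented requirements can be put side by side in a small check. The release tags and thresholds come from the release notes quoted above; `version_ge` is a generic dotted-version comparison I added for illustration.

```shell
# Compare one installed driver against both container releases' requirements.
version_ge() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | tail -n1)" = "$1" ]
}

driver="460.91.03"
version_ge "$driver" "455"    && echo "20.11 (TensorRT 7.2.1): driver OK"
version_ge "$driver" "470.42" || echo "21.07 (TensorRT 8.0.1): driver too old"
```

Both lines print for driver 460.91.03, which is exactly the situation described here: the 7.2.1 container works, the 8.0.1 container does not.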
In hindsight, the root cause was not carefully checking the compatibility between the TensorRT release and the NVIDIA Driver version, which cost extra time during deployment.