I. Standard GPU passthrough steps
1: Enable hardware-assisted virtualization support in the BIOS
For Intel CPUs, enable the VT-x and VT-d options in the motherboard BIOS
VT-x is required for virtualization
VT-d is required for PCI passthrough
These two options are usually found in the BIOS under Advanced, in the CPU and System (or similar) entries, for example:
VT: Intel Virtualization Technology
VT-d: Intel VT for Directed I/O
For AMD CPUs, enable the SVM and IOMMU options in the motherboard BIOS
SVM is required for virtualization
IOMMU is required for PCI passthrough
2: Confirm that the kernel supports the IOMMU
cat /proc/cmdline | grep iommu
If there is no output, the kernel boot parameters need to be modified
For Intel CPUs
1. Edit the /etc/default/grub file and append the following to the end of GRUB_CMDLINE_LINUX:
intel_iommu=on
For example:
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet intel_iommu=on"
If GRUB_CMDLINE_LINUX is not present, use GRUB_CMDLINE_LINUX_DEFAULT instead
2. Update grub
grub2-mkconfig -o /boot/grub2/grub.cfg
If the system boots via UEFI, the grub.cfg under the EFI partition must be updated instead:
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
reboot    # reboot the machine
For AMD CPUs
The only difference from Intel CPUs is that the parameter to add is amd_iommu=on instead of intel_iommu=on; the other steps are the same
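After the reboot, you can verify that the IOMMU is actually active and see how devices are grouped. A minimal check, assuming the kernel parameter above has been applied (sketch, adjust to your environment):
dmesg | grep -e DMAR -e IOMMU        # confirm the kernel reports the IOMMU as enabled
# list each IOMMU group and the devices it contains; devices in one group must be passed through together
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done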
3: Unbind the NIC from its default driver (this step can be skipped if the device was not previously claimed by another driver)
echo "8086 10fb" > /sys/bus/pci/drivers/pci-stub/new_id
echo "0000:81:00.0" > /sys/bus/pci/devices/0000:81:00.0/driver/unbind
echo "0000:81:00.0" > /sys/bus/pci/drivers/pci-stub/bind
At this point, the NIC to be passed through is ready.
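The three commands above can also be wrapped in a small helper script so the PCI address and vendor/product ID only have to be set once. This is a sketch; the BDF and IDs below are examples and must be replaced with values from your environment:
#!/bin/bash
# Example: unbind a PCI device from its current driver and bind it to pci-stub (run as root)
BDF="0000:81:00.0"       # PCI address of the device, replace for your environment
VID_PID="8086 10fb"      # vendor ID and product ID, replace for your environment
echo "$VID_PID" > /sys/bus/pci/drivers/pci-stub/new_id
if [ -e "/sys/bus/pci/devices/$BDF/driver" ]; then
    echo "$BDF" > "/sys/bus/pci/devices/$BDF/driver/unbind"
fi
echo "$BDF" > /sys/bus/pci/drivers/pci-stub/bind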
4: Confirm the PCI device driver information
[root@compute01 ~]# lspci -nn | grep -i Eth
1a:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ [8086:37d0] (rev 09)
1a:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ [8086:37d0] (rev 09)
1a:00.2 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 1GbE [8086:37d1] (rev 09)
1a:00.3 Ethernet controller [0200]: Intel Corporation Ethernet Connection X722 for 1GbE [8086:37d1] (rev 09)
3b:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
3b:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
3c:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
3c:00.1 Ethernet controller [0200]: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ [8086:1572] (rev 02)
5e:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
5e:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
[root@compute01 ~]# lspci -v -s 5e:00.0 ## view detailed information for the PCI device
Taking [8086:10fb] as an example, the vendor ID is 8086 and the product ID is 10fb
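If you only need the numeric vendor/product IDs, lspci can print them directly with -n (replace the BDF with your device's address):
lspci -n -s 5e:00.0      # prints something like: 5e:00.0 0200: 8086:10fb (rev 01)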
Configure OpenStack as follows:
1: Configure nova-scheduler
Add PciPassthroughFilter to the filter_scheduler enabled_filters, and also add available_filters = nova.scheduler.filters.all_filters
[filter_scheduler]
host_subset_size = 10
max_io_ops_per_host = 10
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateCoreFilter,AggregateDiskFilter,DifferentHostFilter,SameHostFilter,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
2: Configure nova-api, adding a new [pci] block
[pci]
alias = {"vendor_id":"8086","product_id":"10fb","device_type":"type-PCI","name":"a1"}
Restart the api and scheduler containers
docker restart nova_api nova_scheduler
3: Configure the compute node that hosts the passthrough device
[pci]
passthrough_whitelist = { "vendor_id":"8086","product_id":"10fb" }
alias = { "vendor_id":"8086", "product_id":"10fb", "device_type":"type-PCI", "name":"a1" }
Restart the nova-compute service
Note: for NICs use "device_type":"type-PF"; for GPUs use "device_type":"type-PCI"
4: Create a flavor with the PCI alias
openstack flavor set ml.large --property "pci_passthrough:alias"="a1:1"
Create a virtual machine with this flavor; it will automatically be scheduled onto a node that has the passthrough device.
Generic command format: openstack flavor set FLAVOR-NAME --property pci_passthrough:alias=ALIAS:COUNT
Official documentation: https://docs.openstack.org/nova/pike/admin/pci-passthrough.html
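Booting the VM is a normal server create; the PciPassthroughFilter makes the scheduler pick a host with a free matching device. The image and network names below are placeholders for illustration:
openstack server create --flavor ml.large --image centos7 \
    --network private-net gpu-vm-01
# after creation, confirm where the instance landed
openstack server show gpu-vm-01 -c status -c OS-EXT-SRV-ATTR:host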
II. GPU passthrough when the same PCI slot carries other PCI devices
(common models: Quadro RTX 6000/8000)
The solution is as follows:
Pass through all devices in the same PCI slot to a single virtual machine
If only the GPU itself is passed through, the following error occurs:
2020-01-14 23:24:01.468 14281 ERROR nova.virt.libvirt.guest [req-fe905189-9d2e-48a3-a848-82149a686c60 74caf2133c6cabb260b88f1a0eba7e0ef524f70eb00cd1f99a6585b9d5545572 836f840d0035448e9b90a9d8da3fd769 - 397d0639d4e9451b9ff85a3e9d73da43 397d0639d4e9451b9ff85a3e9d73da43] Error launching a defined domain with XML: <domain type='kvm'>
2020-01-14 23:24:01.469 14281 ERROR nova.virt.libvirt.driver [req-fe905189-9d2e-48a3-a848-82149a686c60 74caf2133c6cabb260b88f1a0eba7e0ef524f70eb00cd1f99a6585b9d5545572 836f840d0035448e9b90a9d8da3fd769 - 397d0639d4e9451b9ff85a3e9d73da43 397d0639d4e9451b9ff85a3e9d73da43] [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Failed to start libvirt guest: libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-01-14T15:24:01.257459Z qemu-kvm: -device vfio-pci,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:06:00.0: group 46 is not viable
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [req-fe905189-9d2e-48a3-a848-82149a686c60 74caf2133c6cabb260b88f1a0eba7e0ef524f70eb00cd1f99a6585b9d5545572 836f840d0035448e9b90a9d8da3fd769 - 397d0639d4e9451b9ff85a3e9d73da43 397d0639d4e9451b9ff85a3e9d73da43] [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Instance failed to spawn: libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-01-14T15:24:01.257459Z qemu-kvm: -device vfio-pci,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:06:00.0: group 46 is not viable
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Traceback (most recent call last):
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2274, in _build_resources
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] yield resources
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2054, in _build_and_run_instance
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] block_device_info=block_device_info)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 3170, in spawn
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] destroy_disks_on_failure=True)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 5674, in _create_domain_and_network
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] destroy_disks_on_failure)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] return self._domain.createWithFlags(flags)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 186, in doit
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 144, in proxy_call
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] rv = execute(f, *args, **kwargs)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 125, in execute
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] six.reraise(c, e, tb)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 83, in tworker
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] rv = meth(*args, **kwargs)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1110, in createWithFlags
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] libvirtError: internal error: qemu unexpectedly closed the monitor: 2020-01-14T15:24:01.257459Z qemu-kvm: -device vfio-pci,host=06:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:06:00.0: group 46 is not viable
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6] Please ensure all devices within the iommu_group are bound to their vfio bus driver.
2020-01-14 23:24:01.641 14281 ERROR nova.compute.manager [instance: 2f315777-b4bd-4a81-b7cb-3ccbb28ddfc6]
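"group 46 is not viable" means that the GPU's IOMMU group contains other devices that were not handed to vfio together with it. You can first list the members of the group (a quick check, using 06:00.0 from the log above as the example):
# list all devices in the same IOMMU group; all of them must be passed through / bound to vfio-pci together
for d in /sys/bus/pci/devices/0000:06:00.0/iommu_group/devices/*; do
    lspci -nns "${d##*/}"
done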
For the host IOMMU configuration, refer to Section I above.
####### Check the current GPU device information
[root@ostack-228-26 ~]# lspci -nn | grep NVID
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1e04] (rev a1)
06:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f7] (rev a1)
06:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ad6] (rev a1)
06:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1ad7] (rev a1)
#################################
As you can see, this server has one VGA device, and this PCI device actually exposes four functions:
VGA, Audio, USB, Serial bus
#################################
Confirm the drivers
Because no NVIDIA driver is installed on the physical server's operating system, we see the following output. The USB function uses the xhci_hcd driver, which ships with the system.
lspci -vv -s 06:00.0 | grep driver
lspci -vv -s 06:00.1 | grep driver
lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: xhci_hcd
lspci -vv -s 06:00.3 | grep driver
If the NVIDIA driver were installed, the output might look like this:
lspci -vv -s 06:00.0 | grep driver
Kernel driver in use: nvidia
lspci -vv -s 06:00.1 | grep driver
Kernel driver in use: snd_hda_intel
lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: xhci_hcd
lspci -vv -s 06:00.3 | grep driver
#####################################
Configure the vfio driver as follows:
Configure the modules to load at boot
To load the vfio-pci module, edit /etc/modules-load.d/openstack-gpu.conf and add the following:
vfio_pci
pci_stub
vfio
vfio_iommu_type1
kvm
kvm_intel
###############################
Configure the devices that vfio should claim, i.e. the devices we identified above
Edit /etc/modprobe.d/vfio.conf and add the following:
options vfio-pci ids=10de:1e04,10de:10f7,10de:1ad6,10de:1ad7
##########################################
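If nouveau or the NVIDIA driver is also installed on the compute node, the GPU may be grabbed by that driver before vfio-pci gets it. A common workaround is a softdep rule in the same modprobe configuration so that vfio-pci loads first; this is a sketch, adjust the driver names to whatever is actually installed:
# append to /etc/modprobe.d/vfio.conf
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci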
Restart the system
reboot
#######################################
Check the boot messages to confirm that the vfio module has loaded
dmesg | grep -i vfio
[ 6.755346] VFIO - User Level meta-driver version: 0.3
[ 6.803197] vfio_pci: add [10de:1b06[ffff:ffff]] class 0x000000/00000000
[ 6.803306] vfio_pci: add [10de:10ef[ffff:ffff]] class 0x000000/00000000
After the reboot, check the driver in use for each device; they should show vfio-pci (in this example the USB controller remains on xhci_hcd)
lspci -vv -s 06:00.0 | grep driver
Kernel driver in use: vfio-pci
lspci -vv -s 06:00.1 | grep driver
Kernel driver in use: vfio-pci
lspci -vv -s 06:00.2 | grep driver
Kernel driver in use: xhci_hcd
lspci -vv -s 06:00.3 | grep driver
Kernel driver in use: vfio-pci
################################
#############################################
Hide the hypervisor ID from the virtual machine
The NVIDIA driver checks whether it is running inside a virtual machine and fails if it is, so the hypervisor ID must be hidden from the GPU driver. The OpenStack Pike release introduced the img_hide_hypervisor_id=true property for Glance images, so the hypervisor ID can be hidden by running the following command against the image:
openstack image set [IMG-UUID] --property img_hide_hypervisor_id=true
#############################################
Boot the instance.
#############################################
Instances built from this image will have the hypervisor ID hidden.
Whether it is hidden can be checked with the following command:
cpuid | grep hypervisor_id
hypervisor_id = "KVMKVMKVM "
hypervisor_id = "KVMKVMKVM "
The output above means the hypervisor ID is NOT hidden; the output below means it has been hidden:
cpuid | grep hypervisor_id
hypervisor_id = " @ @ "
hypervisor_id = " @ @ "
#############################################
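Besides the image property, newer Nova releases (Stein and later) also support hiding the hypervisor ID via a flavor extra spec; on Pike the image property above is the way to go. For reference:
openstack flavor set m1.large --property hw:hide_hypervisor_id=true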
Modify the nova-api configuration on the controller node, adding the other three PCI devices alongside the GPU, as follows:
[pci]
alias = {"name":"a1","product_id":"1e04","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a2","product_id":"10f7","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a3","product_id":"1ad6","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a4","product_id":"1ad7","vendor_id":"10de","device_type":"type-PCI"}
Modify the nova-scheduler configuration: add PciPassthroughFilter and also add available_filters = nova.scheduler.filters.all_filters, as follows:
[filter_scheduler]
host_subset_size = 10
max_io_ops_per_host = 10
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateCoreFilter,AggregateDiskFilter,DifferentHostFilter,SameHostFilter,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
Restart the services
systemctl restart openstack-nova-api openstack-nova-scheduler
Configure the compute node by editing nova.conf, as follows:
[pci]
alias = {"name":"a1","product_id":"1e04","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a2","product_id":"10f7","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a3","product_id":"1ad6","vendor_id":"10de","device_type":"type-PCI"}
alias = {"name":"a4","product_id":"1ad7","vendor_id":"10de","device_type":"type-PCI"}
passthrough_whitelist = [{ "vendor_id": "10de", "product_id": "1e04" }, { "vendor_id": "10de", "product_id": "10f7" }, { "vendor_id": "10de", "product_id": "1ad6" }, { "vendor_id": "10de", "product_id": "1ad7" }]
Restart the compute node service
docker restart nova_compute
Create a flavor that carries the GPU passthrough information
openstack flavor create --ram 2048 --disk 20 --vcpus 2 m1.large
openstack flavor set m1.large --property pci_passthrough:alias='a1:1,a2:1,a3:1,a4:1'
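When a VM is booted with this flavor, the GPU together with its Audio, USB and Serial bus functions is assigned to the same instance, which satisfies the IOMMU group requirement. The image and network names below are placeholders:
openstack server create --flavor m1.large --image centos7 \
    --network private-net gpu-vm-rtx6000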
#############################################
III. Passing through a T4 GPU
**For the host kernel configuration, refer to Section I above:**
1. The T4 supports vGPU by default, so the device is exposed as a PF. The configuration therefore differs: set device_type to type-PF, as follows:
Modify the controller node's nova-api configuration as follows:
[pci]
alias = {"vendor_id":"8086","product_id":"10fb","device_type":"type-PF","name":"a1"}
Modify the nova-scheduler configuration: add PciPassthroughFilter, add available_filters = nova.scheduler.filters.all_filters, and add the [pci] section, as follows:
[filter_scheduler]
host_subset_size = 10
max_io_ops_per_host = 10
enabled_filters = RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter,AggregateCoreFilter,AggregateDiskFilter,DifferentHostFilter,SameHostFilter,PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
[pci]
alias = {"vendor_id":"8086","product_id":"10fb","device_type":"type-PF","name":"a1"}
Restart the services, as follows:
docker restart nova_api nova_scheduler
Modify nova.conf on the compute node as follows:
[pci]
passthrough_whitelist = { "vendor_id":"8086","product_id":"10fb" }
alias = { "vendor_id":"8086", "product_id":"10fb", "device_type":"type-PF", "name":"a1" }
Restart the nova-compute service
docker restart nova_compute
########## Check pci_devices in the database: ##################
MariaDB [nova]> select * from pci_devices\G
*************************** 1. row ***************************
created_at: 2020-12-11 08:05:35
updated_at: 2020-12-11 08:07:10
deleted_at: NULL
deleted: 0
id: 18
compute_node_id: 3
address: 0000:d8:00.0
product_id: 1eb8
vendor_id: 10de
dev_type: type-PF
dev_id: pci_0000_d8_00_0
label: label_10de_1eb8
status: available
extra_info: {}
instance_uuid: NULL
request_id: NULL
numa_node: 1
parent_addr: NULL
uuid: 68043d3f-153b-4be6-b341-9f02f8fe7ffd
1 row in set (0.000 sec)
#########################################
Confirm that the GPU driver is vfio-pci
######################################
Disable the nouveau driver that ships with the system by modifying /etc/modprobe.d/blacklist.conf:
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
######################################
[root@sxjn-icontron01 ~]# lspci -vv -s d8:00.0 | grep driver
Kernel driver in use: pci-stub
#### The driver currently in use in this environment is not vfio-pci, so it needs to be changed to vfio-pci ####
########## The change is made as follows ###########
To load the vfio-pci module, edit /etc/modules-load.d/openstack-gpu.conf and add the following:
vfio_pci
pci_stub
vfio
vfio_iommu_type1
kvm
kvm_intel
###############################
Edit /etc/modprobe.d/vfio.conf and add the following:
options vfio-pci ids=10de:1eb8
##########################################
Restart the system
reboot
#######################################
After the reboot, check the driver in use and confirm it has changed to vfio-pci:
[root@sxjn-icontron01 ~]# lspci -vv -s d8:00.0 | grep driver
Kernel driver in use: vfio-pci
Set the metadata on the flavor, as follows:
openstack flavor set ml.large --property "pci_passthrough:alias"="a1:1"
Create a virtual machine using this flavor
IV. Using the GPU inside Docker
First pass the GPU through to a virtual machine, following the steps described above.
Inside the virtual machine, the steps are as follows:
1. System configuration:
Install the basic packages
yum install dkms gcc kernel-devel kernel-headers ### must match the running kernel version, otherwise the driver module will fail to build/load later
######################################
Disable the nouveau driver that ships with the system by modifying /etc/modprobe.d/blacklist.conf:
echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist.conf
######################################
######################################
Back up the original initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Rebuild the initramfs image
dracut /boot/initramfs-$(uname -r).img $(uname -r)
Restart the system
reboot
# check whether nouveau is loaded; empty output means it has been disabled successfully
lsmod | grep nouveau
#####################################
2. Install the GPU driver, as follows:
First download the driver matching your kernel version via this link:
https://www.nvidia.cn/Download/index.aspx?lang=cn
Then run the installer (the driver version and kernel-source-path must match your environment):
sh NVIDIA-Linux-x86_64-450.80.02.run --kernel-source-path=/usr/src/kernels/3.10.0-514.el7.x86_64 -k $(uname -r) --dkms -s --no-x-check --no-nouveau-check --no-opengl-files
After the driver is installed, run the following command to confirm it works:
[root@gpu-3 nvdia-docker]# nvidia-smi
Tue Dec 15 21:51:35 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:06.0 Off | 0 |
| N/A 73C P0 29W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
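Note that the Persistence-M column above shows Off. Optionally, persistence mode can be enabled so the driver stays initialized between jobs (requires root):
nvidia-smi -pm 1      # enable persistence mode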
3. Install cuda-drivers and cuda; this environment uses cuda-11.1 with NVIDIA driver 455.45.01
yum install cuda cuda-drivers nvidia-driver-latest-dkms
###########################################################
Download the offline install packages locally first; reference link:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=rpmlocal
Any missing packages can be downloaded from:
https://developer.download.nvidia.com/compute/cuda/repos/rhel7/x86_64/
###########################################################
4. Install Docker
yum install docker-ce nvidia-docker2 ### the nvidia-docker2 version must not be too old
Edit daemon.json and confirm the configuration is as follows:
[root@gpu-2 nvidia]# vim /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
[root@gpu-1 cuda]# systemctl daemon-reload
[root@gpu-1 cuda]# systemctl restart docker
5. Pull an image that includes CUDA and run a test:
docker pull nvidia/cuda:11.0-base ## pull the Docker image
[root@gpu-1 cuda]# nvidia-docker run -it image_id /bin/bash
root@f244e0a31a90:/# nvidia-smi
Wed Dec 16 03:11:54 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.45.01 Driver Version: 455.45.01 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:06.0 Off | 0 |
| N/A 56C P0 19W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
root@f244e0a31a90:/#
root@f244e0a31a90:/# exit
V. Passing a USB device through to a virtual machine
**1: Use lsusb to view USB device information, as follows:**
##################################
Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 004: ID 0930:6544 Toshiba Corp. Kingston DataTraveler 2.0 Stick (2GB)
2: Edit the usb.xml file, method one:
<hostdev mode='subsystem' type='usb'>
<source>
<vendor id='0930'/> #### vendor ID
<product id='6544'/> ####product ID
</source>
</hostdev>
Edit the usb.xml file, method two:
<hostdev mode='subsystem' type='usb'>
<source>
<address bus='002' device='004'/> #### the USB bus/device address
</source>
</hostdev>
3: Attach the device to the virtual machine
virsh attach-device instance-00000001 usb.xml
After logging into the virtual machine you can see the corresponding USB device; in this example the guest recognizes the USB stick as sdb.
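By default virsh attach-device only affects the running domain. If the device should survive a guest restart, add --config as well, and the attachment can be verified from the domain XML (sketch):
virsh attach-device instance-00000001 usb.xml --live --config
# confirm the hostdev entry is present
virsh dumpxml instance-00000001 | grep -A 5 hostdev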
4: Detach the device
virsh detach-device instance-00000001 usb.xml