[Driver download page](https://www.nvidia.cn/geforce/drivers/). To install the driver, make the downloaded file executable with chmod +x *.run, then run it with sh *.run (replace *.run with the actual driver file name). To uninstall it (CentOS):
/usr/bin/nvidia-uninstall
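The manual install steps can be wrapped in a small helper. This is a minimal sketch, not part of the original procedure: `install_driver_run` is a hypothetical name, and it assumes the directory holds exactly one `.run` driver file and that you run it as root, which the NVIDIA installer requires.

```sh
#!/bin/sh
# Hypothetical helper: chmod +x the single *.run file in a directory and run it.
# Assumes exactly one driver .run file is present (default: current directory);
# the real NVIDIA installer must be run as root.
install_driver_run() {
  local dir="${1:-.}"
  local count=0
  local runfile=""
  local f
  for f in "$dir"/*.run; do
    # If nothing matches, the glob stays literal; skip paths that do not exist.
    [ -e "$f" ] || continue
    count=$((count + 1))
    runfile="$f"
  done
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one .run file in $dir, found $count" >&2
    return 1
  fi
  chmod +x "$runfile"
  sh "$runfile"
}
```

For example, run `install_driver_run /root/drivers` (a hypothetical download directory) in a root shell. Refusing to guess when several `.run` files are present avoids installing the wrong driver version by accident; removal still goes through /usr/bin/nvidia-uninstall.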
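With the driver installed on the host, how does a container use the GPU? 1. Install nvidia-container-runtime, starting by adding its yum repository: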
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
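The `distribution` variable comes from sourcing `/etc/os-release` and concatenating its `ID` and `VERSION_ID` fields. On CentOS 7 these are `centos` and `7`, so the curl command fetches the `centos7` repo file. The sketch below reproduces that logic with two hypothetical functions; the optional file argument is an assumption added so the logic can be checked against a sample file.

```sh
#!/bin/sh
# Hypothetical helpers mirroring the distribution=... one-liner.
# nvidia_distribution reads an os-release file (default /etc/os-release)
# in a subshell so ID and VERSION_ID do not leak into the calling shell.
nvidia_distribution() {
  local release_file="${1:-/etc/os-release}"
  ( . "$release_file"; echo "${ID}${VERSION_ID}" )
}

# Repo file URL that the curl command downloads for a given distribution string.
nvidia_repo_url() {
  echo "https://nvidia.github.io/nvidia-container-runtime/$1/nvidia-container-runtime.repo"
}

# Equivalent of the original commands for the current host:
# curl -s -L "$(nvidia_repo_url "$(nvidia_distribution)")" | sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo
```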
Install:
yum install nvidia-container-runtime
Test (the --gpus flag requires Docker 19.03 or later):
docker run -it --rm --gpus all centos nvidia-smi
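To turn this test into a pass/fail check, you can run `nvidia-smi -L`, which prints one `GPU <index>: ...` line per device, and count those lines. A minimal sketch, assuming that output format; `count_gpus` and `check_docker_gpu` are hypothetical names, and `-it` is dropped so the check also works from a script with no terminal attached.

```sh
#!/bin/sh
# Hypothetical helper: count the "GPU <index>: ..." lines printed by `nvidia-smi -L`.
# grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps the status 0.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Hypothetical check: succeed only if a container started with --gpus all sees a GPU.
# -it is omitted so the check also works from scripts with no terminal attached.
check_docker_gpu() {
  local n
  n=$(docker run --rm --gpus all centos nvidia-smi -L | count_gpus)
  if [ "${n:-0}" -ge 1 ]; then
    echo "container sees $n GPU(s)"
  else
    echo "no GPU visible in the container" >&2
    return 1
  fi
}
```

If docker itself fails (for example, the runtime is missing), nothing is printed on stdout, the count is 0, and the check fails rather than passing silently.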
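Using the GPU from a Kubernetes YAML manifest (requesting nvidia.com/gpu only works when the NVIDIA device plugin is running in the cluster):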
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-master
  namespace: gpu
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-master
  template:
    metadata:
      labels:
        app: gpu-master
    spec:
      hostname: gpu-master
      containers:
      - name: gpu-master
        image: 192.168.168.10:5000/library/pytorch-gpu:v3
        env:
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: compute,utility
        - name: NVIDIA_VISIBLE_DEVICES
          value: all
        securityContext:
          privileged: true
          runAsUser: 0
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"
        volumeMounts:
        - name: code-host-path
          mountPath: /persistent
      volumes:
      - name: code-host-path
        hostPath:
          path: /root/gpu/gpucod
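A sketch of deploying the manifest and waiting for the rollout, assuming it is saved as `gpu-master.yaml` (a hypothetical file name) and that the `gpu` namespace already exists. The `KUBECTL` variable is an assumption added so the commands can be dry-run with a stub; by default it calls `kubectl`.

```sh
#!/bin/sh
# Hypothetical helper: apply the Deployment and wait for it to become available.
# KUBECTL defaults to kubectl; KUBECTL=echo prints the commands instead of running them.
deploy_gpu_master() {
  local manifest="${1:-gpu-master.yaml}"
  local k="${KUBECTL:-kubectl}"
  $k apply -f "$manifest" || return 1
  # rollout status blocks until the rollout finishes or the timeout expires.
  $k rollout status -n gpu deployment/gpu-master --timeout=120s
}
```

If the rollout times out with the pod stuck in Pending, `kubectl describe pod -n gpu` usually shows whether the scheduler found no node with a free `nvidia.com/gpu`.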
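After deploying, exec into the pod and check that the GPU is visible, for example with nvidia-smi. For troubleshooting, NVIDIA's official release notes and compatibility information describe which driver, CUDA, and container runtime versions work together.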
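The in-pod check can be run without opening an interactive shell. A minimal sketch: `check_pod_gpu` and `check_pod_torch` are hypothetical names, `check_pod_torch` assumes the pytorch-gpu image provides `python` with torch installed, and `KUBECTL` is an assumed override for dry runs.

```sh
#!/bin/sh
# Hypothetical helpers for checking the GPU from inside the gpu-master pod.
# kubectl exec accepts deploy/<name> and runs the command in one of its pods.
check_pod_gpu() {
  local k="${KUBECTL:-kubectl}"
  $k exec -n gpu deploy/gpu-master -- nvidia-smi
}

# Assumes the image has `python` on PATH with torch installed.
check_pod_torch() {
  local k="${KUBECTL:-kubectl}"
  $k exec -n gpu deploy/gpu-master -- python -c 'import torch; print(torch.cuda.is_available())'
}
```

`check_pod_torch` should print `True` when PyTorch inside the container can use the GPU; `False` points to a driver, runtime, or CUDA version mismatch rather than a scheduling problem.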