Prepare the NVIDIA driver and CUDA
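- Download the NVIDIA driver and CUDA.

  NVIDIA driver: https://www.nvidia.cn/Download/index.aspx?lang=cn

  CUDA: https://developer.nvidia.com/cuda-toolkit-archive

- Create the nvidia directory and copy the installers into it.

      sudo mkdir /work
      sudo chown -R casia:casia /work/
      cd /work/
      sudo apt-get update
      sudo apt-get install -y gcc make python3-pip
      mkdir nvidia
      cd nvidia/

  Copy the downloaded NVIDIA driver and CUDA installers into this directory.

- Install the NVIDIA driver and CUDA.

      sudo sh NVIDIA-Linux-x86_64-450.102.04.run

  Press Enter three times.

      sudo sh cuda_11.0.2_450.51.05_linux.run

  Type accept and press Enter, then select Install and press Enter.

- Verify the installation.

      nvidia-smi

Install nvidia-docker
Before using docker containers that provide a CUDA environment, the nvidia-docker component must be installed.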
Install docker

    sudo apt-get update
    sudo apt-get install -y apt-transport-https ca-certificates curl gnupg-agent software-properties-common
    curl -fsSL https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://mirrors.ustc.edu.cn/docker-ce/linux/ubuntu/ $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install -y docker-ce docker-ce-cli containerd.io
    sudo gpasswd -a ${USER} docker
    sudo service docker restart

Add the nvidia-docker repository

    sudo apt-get install curl
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/ubuntu18.04/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    sudo apt-get update

Install nvidia-docker2
After installing nvidia-docker2, restart docker so that nvidia-docker2 takes effect.

    sudo apt-get install -y nvidia-docker2 vim
    sudo systemctl restart docker

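The repository URL above hard-codes `ubuntu18.04`. On other releases the distribution string can be derived from `/etc/os-release` instead. The following is a minimal sketch of that idea; `distro_id` is a hypothetical helper name, and whether a repository list exists for a given distribution string is an assumption you should verify.

```shell
#!/bin/sh
# distro_id is a hypothetical helper name, not part of nvidia-docker.
# It prints ID followed by VERSION_ID from an os-release style file,
# for example "ubuntu18.04" on Ubuntu 18.04.
distro_id() {
    # Source the file in a subshell so its variables do not leak
    # into the calling shell.
    ( . "${1:-/etc/os-release}" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}
```

It could then replace the fixed path segment, for example `curl -s -L "https://nvidia.github.io/nvidia-docker/$(distro_id)/nvidia-docker.list"`.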
Configure nvidia-docker
Edit /etc/docker/daemon.json so that it contains the following configuration:

    sudo vim /etc/docker/daemon.json

    {
        "default-runtime": "nvidia",
        "runtimes": {
            "nvidia": {
                "path": "nvidia-container-runtime",
                "runtimeArgs": []
            }
        }
    }

Then reload the systemd configuration and restart docker:

    sudo systemctl daemon-reload
    sudo systemctl restart docker

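Editing the JSON by hand is easy to get wrong (curly quotes, a missing brace). As a rough sketch, the same configuration can be written from a script to a scratch file and checked before it is copied into place. `write_daemon_json` is a hypothetical helper name; note that it overwrites the target file rather than merging with existing settings.

```shell
#!/bin/sh
# write_daemon_json is a hypothetical helper name, not part of docker.
# It writes the nvidia runtime configuration to the path given as $1,
# replacing whatever the file contained before.
write_daemon_json() {
    cat > "$1" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF
}
```

One possible use: run `write_daemon_json /tmp/daemon.json`, validate it with `python3 -m json.tool /tmp/daemon.json`, then `sudo cp /tmp/daemon.json /etc/docker/daemon.json`. After docker restarts, `docker info --format '{{.DefaultRuntime}}'` should report `nvidia`.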
Disable swap

    sudo swapoff -a
    sudo vim /etc/fstab

In /etc/fstab, comment out the /swapfile line so that swap stays disabled after a reboot.
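The fstab edit can also be done without an editor. This is a minimal sketch, assuming the swap entry starts with `/swapfile` at the beginning of the line as described above; `comment_out_swapfile` is a hypothetical helper name.

```shell
#!/bin/sh
# comment_out_swapfile is a hypothetical helper name.
# It prefixes every line that starts with /swapfile with '#' in the
# fstab-style file given as $1 and keeps a backup with a .bak suffix.
# Lines that are already commented out are left unchanged.
comment_out_swapfile() {
    sed -i.bak 's|^/swapfile|#&|' "$1"
}
```

Trying it on a copy of the file first is safer. To apply it to the real file, the `sed` command has to run as root, for example `sudo sed -i.bak 's|^/swapfile|#&|' /etc/fstab`.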
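Install k8s
Add the Aliyun Kubernetes apt repository. Note that sudo applies only to the command it prefixes, so each side of a pipe needs its own sudo, and writing the sources file goes through sudo tee rather than a shell redirect.

    sudo apt-get update && sudo apt-get install -y apt-transport-https
    curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
    echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
    sudo apt-get update

List the available kubelet versions, then install matching versions of kubelet, kubeadm, and kubectl:

    apt-cache madison kubelet
    sudo apt-get install -y kubelet=1.20.5-00 kubeadm=1.20.5-00 kubectl=1.20.5-00
    sudo systemctl enable kubelet
    sudo systemctl start kubelet
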
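The three packages must be pinned to the same version. As a small sketch, the version string can be kept in one place so the three arguments cannot drift apart; `k8s_pkgs` is a hypothetical helper name.

```shell
#!/bin/sh
# k8s_pkgs is a hypothetical helper name.
# Given one apt version string, it prints the pinned package arguments
# for kubelet, kubeadm and kubectl on a single line.
k8s_pkgs() {
    printf 'kubelet=%s kubeadm=%s kubectl=%s\n' "$1" "$1" "$1"
}

# Possible usage (requires root and the repository configured above):
#   sudo apt-get install -y $(k8s_pkgs 1.20.5-00)
# Optionally keep apt from upgrading the packages later:
#   sudo apt-mark hold kubelet kubeadm kubectl
```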
Edit the docker daemon.json

    sudo vim /etc/docker/daemon.json

Restart docker

    sudo systemctl restart docker

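Add nodes
On the master, get a token. List the existing tokens, or create a new one and print the complete join command:

    kubeadm token list
    kubeadm token create --print-join-command

On each worker, run the join command printed by the master. The address, token, and hash below are example values from one cluster:

    sudo kubeadm join 192.168.1.2:6443 --token 6yex72.30fxcz9l7ps0zuap --discovery-token-ca-cert-hash sha256:7b97fc66dad88395c35f064e3c8dcb172476b494043381fe5b35acd697f5ad1
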
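Join parameters are often copied by hand between machines, and a dropped character makes the join fail. The following is a rough sketch of a format check to run before `kubeadm join`; `check_join_args` is a hypothetical helper name, and it checks only the shape of the values, not whether the cluster accepts them.

```shell
#!/bin/sh
# check_join_args is a hypothetical helper name, not part of kubeadm.
# $1 is the bootstrap token, $2 is the discovery CA cert hash.
# Returns 0 if both have the expected shape, 1 otherwise.
check_join_args() {
    token=$1
    hash=$2
    # Bootstrap tokens have the form [a-z0-9]{6}.[a-z0-9]{16}.
    printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$' || {
        echo "bad token format" >&2
        return 1
    }
    # The CA hash is "sha256:" followed by 64 lowercase hex digits.
    printf '%s\n' "$hash" | grep -Eq '^sha256:[0-9a-f]{64}$' || {
        echo "bad CA cert hash format" >&2
        return 1
    }
    return 0
}
```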
Install NFS
sudo apt install -y nfs-common