Planning
1. Server configuration
OS | Specs | Role
---|---|---
CentOS 7.9 (172.27.0.13) | 2C/4G | k8s-master
CentOS 7.9 (172.27.0.9) | 2C/4G | k8s-work1
CentOS 7.9 (172.27.0.10) | 2C/4G | k8s-work2
Note: this is a low-spec lab environment used to demonstrate the k8s cluster installation; in production our servers start at 8C/16G.
2. Version selection
- CentOS: 7.9+
- k8s components: 1.23.6 (the latest at the time of writing)
I. Basic server configuration
1. Set hostnames
Run on all nodes:
[root@server ~]
[root@server ~]
[root@server ~]
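The commands themselves were not preserved above; setting the hostname typically looks like this (one command per node, names taken from the planning table):

```bash
# on 172.27.0.13
hostnamectl set-hostname k8s-master
# on 172.27.0.9
hostnamectl set-hostname k8s-work1
# on 172.27.0.10
hostnamectl set-hostname k8s-work2
```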
2. Disable the firewall
Run on all nodes:
[root@k8s-master ~]
[root@k8s-master ~]
[root@k8s-master ~]
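The exact commands are missing here; on CentOS 7 this step usually amounts to stopping and disabling firewalld, and many guides also disable SELinux at the same time (that extra step is my assumption, not necessarily what was run):

```bash
systemctl stop firewalld
systemctl disable firewalld

# assumed extra step: switch SELinux off (permissive now, disabled after reboot)
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
```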
3. Local name resolution for all nodes
Run on all nodes:
[root@k8s-master ~]
172.27.0.13 k8s-master
172.27.0.9 k8s-work1
172.27.0.10 k8s-work2
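A simple way to append these entries, assuming /etc/hosts does not already contain them:

```bash
cat >> /etc/hosts << EOF
172.27.0.13 k8s-master
172.27.0.9 k8s-work1
172.27.0.10 k8s-work2
EOF
```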
4. Passwordless SSH between nodes (optional)
Run on all nodes:
[root@k8s-master ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:qmMRk/pyFrxMRCqzeko/fPbVBzPYz1Em4u5cNR7dvzs root@k8s-master
The key's randomart image is:
+---[RSA 2048]----+
| |
| . |
| o . . . o |
|o . = + . + o|
| + + o S. * . +o|
|. . = . o * + +|
|...+ +. . o = ..|
|o +oO+ . o o E.|
|.o *=... o oo|
+----[SHA256]-----+
Run on all nodes (exchange public keys with every other node):
[root@k8s-master ~]# ssh-copy-id root@172.27.0.9
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host '172.27.0.9 (172.27.0.9)' can't be established.
ECDSA key fingerprint is SHA256:IzYTCZWXEv8rTdYYx+RdTyi+EJF2Jqggz0pT5v/oZwk.
ECDSA key fingerprint is MD5:d0:89:66:b8:73:d0:eb:3b:19:cb:b2:3c:82:d0:a5:ff.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@172.27.0.9's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@172.27.0.9'"
and check to make sure that only the key(s) you wanted were added.
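ssh-copy-id has to be repeated for every other node; a small loop (IP list taken from the planning table) saves some typing:

```bash
for ip in 172.27.0.13 172.27.0.9 172.27.0.10; do
  ssh-copy-id root@$ip
done
```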
5. Load the br_netfilter module
Make sure the br_netfilter module is loaded. Run on all nodes:
[root@k8s-master ~]# modprobe br_netfilter
[root@k8s-master ~]# lsmod | grep br_netfilter
br_netfilter 22256 0
bridge 151336 1 br_netfilter
cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
6. Allow iptables to see bridged traffic
Run on all nodes:
[root@k8s-master ~]# cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
[root@k8s-master ~]# cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@k8s-master ~]# sysctl --system
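As a quick sanity check (not in the original article), the two kernel parameters can be read back after running sysctl --system:

```bash
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables
# both should report "= 1"
```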
7. Disable swap
Run on all nodes:
[root@k8s-master ~]
[root@k8s-master ~]
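The commands behind these two prompts were lost; disabling swap usually looks like this (turn it off now and keep it off across reboots):

```bash
swapoff -a
# comment out the swap line so it stays off after a reboot
sed -ri 's/.*swap.*/#&/' /etc/fstab
```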
8. Time synchronization
Run on all nodes:
[root@k8s-master ~]
26 Apr 19:58:05 ntpdate[13947]: the NTP socket is in use, exiting
[root@k8s-master ~]
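The message above only means an NTP daemon already owns the socket. The original commands were not captured; a common way to handle time sync on CentOS 7 (the NTP server address is my assumption) is:

```bash
# if chronyd is already running, just make sure it is enabled
systemctl enable --now chronyd

# or, for a one-shot sync with ntpdate, stop the daemon holding the socket first
systemctl stop ntpd
ntpdate ntp.aliyun.com
```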
9. Install Docker
Run on all nodes
The process is omitted here; message me if you want the quick Docker install script.
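Since the Docker install is omitted, here is a rough sketch of a typical CentOS 7 install (the mirror URL and settings are assumptions, not the author's script); setting Docker's cgroup driver to systemd up front also avoids the mismatch described in FAQ 1:

```bash
yum install -y yum-utils
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce

# make Docker's cgroup driver match the kubelet's (systemd)
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF

systemctl enable --now docker
```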
10. Install kubeadm and kubelet
Run on all nodes
- Add the k8s package repo
Repo reference: https://developer.aliyun.com/mirror/kubernetes?spm=a2c6h.13651102.0.0.1cd01b116JYQIn
[root@k8s-master ~]# cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
- Build the k8s yum cache:
[root@k8s-master ~]# yum makecache
- Install the k8s tools; first list the available versions:
[root@k8s-master ~]# yum list kubelet --showduplicates
...
...
kubelet.x86_64 1.23.0-0 kubernetes
kubelet.x86_64 1.23.1-0 kubernetes
kubelet.x86_64 1.23.2-0 kubernetes
kubelet.x86_64 1.23.3-0 kubernetes
kubelet.x86_64 1.23.4-0 kubernetes
kubelet.x86_64 1.23.5-0 kubernetes
kubelet.x86_64 1.23.6-0 kubernetes
[root@k8s-master ~]
[root@k8s-master ~]
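The last two prompts above lost their commands; installing the pinned 1.23.6 packages and enabling the kubelet would look roughly like this:

```bash
yum install -y kubelet-1.23.6 kubeadm-1.23.6 kubectl-1.23.6
systemctl enable kubelet
```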
II. Master node
1. Initialize k8s
Run on the master node:
[root@k8s-master ~]# kubeadm init \
--apiserver-advertise-address=172.27.0.13 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.23.6 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16 \
--ignore-preflight-errors=all
Flag notes:
- --apiserver-advertise-address: the address the API server advertises, i.e. the master node's IP (172.27.0.13)
- --image-repository: pull the control-plane images from the Aliyun registry instead of k8s.gcr.io
- --kubernetes-version: the exact k8s version to deploy (v1.23.6)
- --service-cidr: the virtual IP range assigned to Services
- --pod-network-cidr: the IP range assigned to Pods; it must match what the network plugin expects (10.244.0.0/16 here)
Output after initialization:
...
...
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 172.27.0.13:6443 --token hgtxra.fccj35x2szia3r3c \
--discovery-token-ca-cert-hash sha256:578cf5ca4cf588e3d84005d06f6503bf5d9ee25f63b0cfab4f78677a24b92bdd
2. Create the config files as instructed by the output
Run on the master node:
[root@k8s-master ~]# mkdir -p $HOME/.kube
[root@k8s-master ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@k8s-master ~]# chown $(id -u):$(id -g) $HOME/.kube/config
3. Check the k8s system pods
Run on the master node:
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6d8c4cb4d-85dx9 0/1 Pending 0 53m
coredns-6d8c4cb4d-f7wld 0/1 Pending 0 53m
etcd-k8s-master 1/1 Running 1 53m
kube-apiserver-k8s-master 1/1 Running 1 53m
kube-controller-manager-k8s-master 1/1 Running 1 53m
kube-proxy-5mpdp 1/1 Running 0 13m
kube-proxy-9lp29 1/1 Running 0 12m
kube-proxy-9ttf6 1/1 Running 0 53m
kube-scheduler-k8s-master 1/1 Running 1 53m
4. Check the k8s nodes
Run on the master node:
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master NotReady control-plane,master 8m9s v1.23.6
Only the k8s-master node is present so far, and its status is NotReady, because we have not deployed a network plugin yet (kubectl apply -f [podnetwork].yaml), so the next step is to deploy the container network (CNI).
5. Deploy the container network (CNI)
Run on the master node. Add-on list: https://kubernetes.io/docs/concepts/cluster-administration/addons/ (this URL is printed when the k8s-master initialization succeeds).
- Choose a mainstream container network plugin to deploy (Calico)
- Download the yaml manifest:
wget https://docs.projectcalico.org/manifests/calico.yaml
- Apply it as the init output instructs:
[root@k8s-master ~]# kubectl apply -f calico.yaml
...
...
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
poddisruptionbudget.policy/calico-kube-controllers created
- Check which images the yaml needs to pull:
[root@k8s-master ~]# grep image: calico.yaml
image: docker.io/calico/cni:v3.22.2
image: docker.io/calico/cni:v3.22.2
image: docker.io/calico/pod2daemon-flexvol:v3.22.2
image: docker.io/calico/node:v3.22.2
image: docker.io/calico/kube-controllers:v3.22.2
- Check that all pods are Running:
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-7c845d499-rh7tb 1/1 Running 0 5m27s
calico-node-fpdjb 1/1 Running 0 5m28s
calico-node-jsdf4 1/1 Running 0 5m28s
calico-node-kmpnr 1/1 Running 0 5m28s
coredns-6d8c4cb4d-85dx9 1/1 Running 0 98m
coredns-6d8c4cb4d-f7wld 1/1 Running 0 98m
etcd-k8s-master 1/1 Running 1 99m
kube-apiserver-k8s-master 1/1 Running 1 99m
kube-controller-manager-k8s-master 1/1 Running 1 99m
kube-proxy-5mpdp 1/1 Running 0 58m
kube-proxy-9lp29 1/1 Running 0 58m
kube-proxy-9ttf6 1/1 Running 0 98m
kube-scheduler-k8s-master 1/1 Running 1 99m
III. Worker nodes
1. Join the worker nodes to the k8s cluster
Run on all worker nodes:
[root@k8s-work1 ~]# kubeadm join 172.27.0.13:6443 --token hgtxra.fccj35x2szia3r3c --discovery-token-ca-cert-hash sha256:578cf5ca4cf588e3d84005d06f6503bf5d9ee25f63b0cfab4f78677a24b92bdd
[root@k8s-work2 ~]# kubeadm join 172.27.0.13:6443 --token hgtxra.fccj35x2szia3r3c --discovery-token-ca-cert-hash sha256:578cf5ca4cf588e3d84005d06f6503bf5d9ee25f63b0cfab4f78677a24b92bdd
2. Check the cluster nodes
Run on the master node:
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 99m v1.23.6
k8s-work1 Ready <none> 59m v1.23.6
k8s-work2 Ready <none> 58m v1.23.6
All nodes are now in the Ready state.
IV. Verification
Deploy an nginx service on the k8s cluster and access it from a browser to verify the cluster works.
1. Create the pod
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pod,svc
2. Access nginx
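The access step itself is not shown in the original; one way to verify it (the NodePort placeholder below stands for whatever port kubectl get svc reports) is:

```bash
kubectl get svc nginx                 # note the NodePort, e.g. 80:3xxxx/TCP
curl http://172.27.0.13:<NodePort>    # or open http://<any-node-ip>:<NodePort> in a browser
```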
At this point the kubeadm-based k8s cluster deployment is complete.
FAQ
1. k8s initialization (kubeadm init) error
...
...
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Check the kubelet logs:
Apr 26 20:33:30 test3 kubelet: I0426 20:33:30.588349 21936 docker_service.go:264] "Docker Info" dockerInfo=&{ID:2NSH:KJPQ:XOKI:5XHN:ULL3:L4LG:SXA4:PR6J:DITW:HHCF:2RKL:U2NJ Containers:0 ContainersRunning:0 ContainersPaused:0 ContainersStopped:0 Images:7 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:false CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:false IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:24 OomKillDisable:true NGoroutines:45 SystemTime:2022-04-26T20:33:30.583063427+08:00 LoggingDriver:json-file CgroupDriver:cgroupfs CgroupVersion: NEventsListener:0 KernelVersion:3.10.0-1160.59.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSVersion: OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000263340 NCPU:2 MemTotal:3873665024 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:k8s-master Labels:[] ExperimentalBuild:false ServerVersion:18.06.3-ce ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:docker-runc Args:[] Shim:<nil>}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:468a545b9edcd5932818eb9de8e72413e616e86e Expected:468a545b9edcd5932818eb9de8e72413e616e86e} RuncCommit:{ID:a592beb5bc4c4092b1b1bac971afed27687340c5 Expected:a592beb5bc4c4092b1b1bac971afed27687340c5} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default] ProductLicense: DefaultAddressPools:[] Warnings:[]}
Apr 26 20:33:30 test3 kubelet: E0426 20:33:30.588383 21936 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
The last line of the error explains it: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs". In other words, the kubelet and Docker use different cgroup drivers: the kubelet uses systemd while Docker uses cgroupfs.
Take a quick look at Docker's driver:
[root@k8s-master opt]# docker info | grep "Cgroup Driver"
Cgroup Driver: cgroupfs
Solution: reset the initialization and delete the leftover files, then change Docker's cgroup driver to systemd.
[root@k8s-master ~]
[root@k8s-master ~]
[root@k8s-master opt]
{
"registry-mirrors": ["https://q1rw9tzz.mirror.aliyuncs.com"],
"exec-opts": ["native.cgroupdriver=systemd"]
}
[root@k8s-master opt]
[root@k8s-master opt]
[root@k8s-master ~]
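The commands behind the six prompts above were not captured; the recovery sequence was most likely along these lines (the daemon.json content is the one shown above, everything else is an assumption):

```bash
# undo the failed init and remove leftovers
kubeadm reset -f
rm -rf $HOME/.kube

# write /etc/docker/daemon.json with the content shown above, then restart Docker
systemctl daemon-reload
systemctl restart docker

# finally re-run the kubeadm init command from section II.1
```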
2. Errors when joining worker nodes to the cluster
Error 1:
accepts at most 1 arg(s), received 3
To see the stack trace of this error execute with --v=5 or higher
Cause: the command was malformed. I pasted the multi-line kubeadm join command straight from the k8s-master init output, and the broken formatting caused the error; it is better to copy it into a text file first, reformat it into a single line, and then paste and run it.
Error 2:
[root@k8s-work1 ~]# kubeadm join 172.27.0.13:6443 --token hgtxra.fccj35x2szia3r3c --discovery-token-ca-cert-hash sha256:578cf5ca4cf588e3d84005d06f6503bf5d9ee25f63b0cfab4f78677a24b92bdd
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Solution
Check which process is occupying the port, as the error message suggests:
[root@k8s-work1 ~]# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:10248 0.0.0.0:* LISTEN 17616/kubelet
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1152/sshd
tcp 0 0 127.0.0.1:36281 0.0.0.0:* LISTEN 17616/kubelet
tcp6 0 0 :::10250 :::* LISTEN 17616/kubelet
tcp6 0 0 :::10255 :::* LISTEN 17616/kubelet
Cause: port 10250 is already occupied (by a kubelet that is still running); kill that process and run the join again.
[root@k8s-work1 ~]
[root@k8s-work1 ~]# netstat -lntp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1152/sshd
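A sketch of what the two prompts above most likely did (the PID comes from the netstat output; running kubeadm reset as well is my own suggestion, in case a half-finished join left state behind):

```bash
kill -9 17616      # the kubelet PID reported by netstat
kubeadm reset -f   # optional: clear any partial join state
# then re-run the kubeadm join command printed by the master's init output
```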