Deploying an OpenShift 3.11 Cluster on CentOS 7.9
Official OCP documentation: https://docs.openshift.com/container-platform/3.11/welcome/index.html
Versions:
- OpenShift 3.11.0
- CentOS 7.9
- Ansible 2.6.5
Environment
| Hostname | IP | OS | Spec |
|---|---|---|---|
| master.ocp.cn | 192.168.108.110 | CentOS 7.9 | 4C2G |
| node1.ocp.cn | 192.168.108.111 | CentOS 7.9 | 4C2G |
| node2.ocp.cn | 192.168.108.112 | CentOS 7.9 | 4C2G |
| infra.ocp.cn | 192.168.108.113 | CentOS 7.9 | 4C2G |
Basic Configuration
Enable SELinux
Run the following on every node.
grep "^\s*[^#\t].*$" /etc/selinux/config
SELINUX=enforcing
SELINUXTYPE=targeted
If it is not enabled, enable it like this:
sed -i 's/SELINUX=.*/SELINUX=enforcing/' /etc/selinux/config
sed -i 's/SELINUXTYPE=.*/SELINUXTYPE=targeted/' /etc/selinux/config
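A quick way to confirm the mode actually in effect (a minimal check; switching from disabled to enforcing normally only takes full effect after a reboot, so the filesystem can be relabeled):
getenforce   # should print Enforcing
sestatus     # shows both the current mode and the mode configured in /etc/selinux/config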
Set hosts and hostnames
Run the following on every node; each node runs only the hostnamectl command that matches its own name.
cat >> /etc/hosts <<EOF
192.168.108.110 master.ocp.cn
192.168.108.111 node1.ocp.cn
192.168.108.112 node2.ocp.cn
192.168.108.113 infra.ocp.cn
EOF
hostnamectl set-hostname master.ocp.cn
hostnamectl set-hostname node1.ocp.cn
hostnamectl set-hostname node2.ocp.cn
hostnamectl set-hostname infra.ocp.cn
Install Docker (optional)
You can install Docker on every node yourself beforehand, or let Ansible install it during the prerequisite check (prerequisites.yml).
yum install -y docker-1.13.1
Note: if you want dedicated storage for Docker, there are several options.
- Add a spare disk as the backing store. Internally this does the following:
  - turns the spare block device (a partition also works) into a physical volume (PV)
  - builds a volume group (VG) from those PVs
  - creates two logical volumes (LVs), data and metadata, from the VG
  - maps data and metadata into a thin pool
cat > /etc/sysconfig/docker-storage-setup<<EOF
DEVS=/dev/vdc # the disk added for Docker storage
VG=docker-vg
EOF
docker-storage-setup
- Use an existing volume group
cat > /etc/sysconfig/docker-storage-setup<<EOF
VG=docker-vg # an existing VG with free space; `vgs` shows the free space in the VFree column
EOF
docker-storage-setup
- Use the free space left in the volume group that the OS itself is on
docker-storage-setup
- Use the root filesystem directly; nothing needs to be done. `df -h /var/lib/docker` shows which filesystem Docker is actually using.
Finally, start Docker:
systemctl enable docker
systemctl start docker
systemctl is-active docker
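Whichever storage option you chose, a quick sanity check (a sketch; docker-vg is the VG name from the examples above) shows what Docker actually ended up using:
docker info | grep -A 3 'Storage Driver'   # devicemapper with a thin pool, or overlay2
lvs docker-vg                              # only meaningful for the LVM-based options
df -h /var/lib/docker                      # shows the filesystem when Docker simply uses the root volume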
Install OCP
Configure passwordless ssh
Run this on the master only.
ssh-keygen
for host in \
master.ocp.cn node1.ocp.cn node2.ocp.cn infra.ocp.cn;\
do \
ssh-copy-id $host; \
done
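A quick check that passwordless login really works before moving on (a minimal sketch):
for host in master.ocp.cn node1.ocp.cn node2.ocp.cn infra.ocp.cn; do
  ssh -o BatchMode=yes $host hostname   # must print each hostname without prompting for a password
done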
Pull the 3.11 OCP playbooks
Run on the master.
yum install -y git
cd ~
git clone -b release-3.11 https://github.com/openshift/openshift-ansible.git
Configure a git proxy (optional, if the clone keeps stalling); the last two commands remove it again:
git config --global http.proxy http://192.168.108.1:1080
git config --global https.proxy https://192.168.108.1:1080
git config --global --unset http.proxy
git config --global --unset https.proxy
Install Ansible
Install on the master. The Ansible version is 2.6.5; there are two ways to install it.
- Install from RPM packages
yum install -y wget
wget http://mirror.centos.org/centos/7/extras/x86_64/Packages/centos-release-ansible26-1-3.el7.centos.noarch.rpm
yum install -y centos-release-configmanagement
rpm -ivh centos-release-ansible26-1-3.el7.centos.noarch.rpm
wget https://releases.ansible.com/ansible/rpm/release/epel-7-x86_64/ansible-2.6.5-1.el7.ans.noarch.rpm
yum install -y python-jinja2 python-paramiko python-setuptools python-six python2-cryptography sshpass PyYAML
rpm -ivh ansible-2.6.5-1.el7.ans.noarch.rpm
- Install from source, which requires Python 3
Build Python 3 from source (for the RPM route instead, see https://centos.pkgs.org/7/epel-x86_64/python36-rpm-4.11.3-9.el7.x86_64.rpm.html):
yum install -y wget
yum -y groupinstall "Development tools"
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
yum install libffi-devel -y
wget https://www.python.org/ftp/python/3.7.0/Python-3.7.0.tar.xz
tar -xvJf Python-3.7.0.tar.xz
mkdir /usr/local/python3
cd Python-3.7.0
./configure --prefix=/usr/local/python3
make && make install
ln -s /usr/local/python3/bin/python3 /usr/local/bin/python3
ln -s /usr/local/python3/bin/pip3 /usr/local/bin/pip3
python3 -V
pip3 -V
- Install Ansible itself. After installation the binaries end up in /usr/local/python3/bin (see the symlink sketch below).
pip3 install --upgrade setuptools
wget --no-check-certificate https://releases.ansible.com/ansible/ansible-2.6.5.tar.gz
tar fxz ansible-2.6.5.tar.gz && cd ansible-2.6.5
python3 setup.py install
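The source install leaves the binaries under /usr/local/python3/bin, so it helps to put them on the PATH as well; a sketch assuming that prefix:
ln -s /usr/local/python3/bin/ansible /usr/local/bin/ansible
ln -s /usr/local/python3/bin/ansible-playbook /usr/local/bin/ansible-playbook
ansible --version   # should report 2.6.5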
Configure Ansible
cp /etc/ansible/hosts /etc/ansible/hosts.bak
cat > /etc/ansible/hosts<<EOF
[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
ansible_ssh_user=root
openshift_deployment_type=origin
#openshift_release="3.11"
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
openshift_public_hostname=openshift.ocp.cn
openshift_master_default_subdomain=ocp.cn
openshift_ca_cert_expire_days=3650
openshift_node_cert_expire_days=3650
openshift_master_cert_expire_days=3650
etcd_ca_default_days=3650
#openshift_hosted_manage_registry=false
openshift_disable_check=memory_availability,disk_availability,docker_image_availability,docker_storage
#openshift_enable_service_catalog=false
#template_service_broker_install=false
#ansible_service_broker_install=false
#osn_storage_plugin_deps=[]
#openshift_enable_service_catalog=false
#openshift_cluster_monitoring_operator_install=false
[masters]
master.ocp.cn
[etcd]
master.ocp.cn
[nodes]
master.ocp.cn openshift_node_group_name='node-config-master'
node1.ocp.cn openshift_node_group_name='node-config-compute'
node2.ocp.cn openshift_node_group_name='node-config-compute'
infra.ocp.cn openshift_node_group_name='node-config-infra'
EOF
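Before running any playbook it is worth confirming that Ansible can reach every host in the inventory (a minimal check; every host should answer with pong):
ansible all -m ping
ansible OSEv3 --list-hosts   # the group should expand to all four nodes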
Run the playbooks
ansible-playbook ~/openshift-ansible/playbooks/prerequisites.yml
If every host shows failed=0 in the play recap, the run succeeded; then run the deployment:
ansible-playbook ~/openshift-ansible/playbooks/deploy_cluster.yml
To tear the cluster down and start over, use the uninstall playbook:
ansible-playbook ~/openshift-ansible/playbooks/adhoc/uninstall.yml
Access
Create a user on the master (htpasswd comes from the httpd-tools package):
htpasswd -c -b /etc/origin/master/htpasswd <user> <password>
- On Windows, add these entries to the hosts file under C:\Windows\System32\drivers\etc:
192.168.108.110 master.ocp.cn
192.168.108.110 openshift.ocp.cn
Then open https://openshift.ocp.cn:8443 in a browser.
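The new user can also be checked from the command line; a sketch, where <user> and <password> are the htpasswd credentials created above (run the cluster-admin grant from the master, which holds admin credentials after the install):
oc login https://openshift.ocp.cn:8443 -u <user> -p <password> --insecure-skip-tls-verify=true
oc whoami                                                     # should print the new user
oc adm policy add-cluster-role-to-user cluster-admin <user>   # optional: make this user a cluster admin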
Installation Notes
- The operating system locale must not be Chinese.
- A router is deployed automatically on the infra node; do not put the lb on the infra node, otherwise port 80 conflicts.
- If the web console port is changed to 443, the lb cannot be co-located with it either, again because of a port conflict.
- For overlay2, the Docker storage filesystem must be XFS.
- SELinux must be enabled.
- The nodes must have Internet access.
- If the lb and the master share a node, port 8443 is already in use; do not put the lb on the master node.
- If etcd runs on a master node it is started as a static pod; on a regular node it runs as a system service. In my install one etcd member was on the master and another on a node, and etcd failed to start. Put all etcd members either on master nodes or on regular nodes, not a mix.
- I installed the monitoring stack with NFS persistent storage directly; this requires java-1.8.0-openjdk-headless and python-passlib to be installed beforehand. The official docs do not mention this, and the install fails without them.
- Docker is started with the option --selinux-enabled=false, but SELinux must still be enabled at the OS level or the install fails (see the quick check below).
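A quick way to check the last point on any node (a sketch; on CentOS the Docker options live in /etc/sysconfig/docker):
getenforce                            # must print Enforcing
grep ^OPTIONS /etc/sysconfig/docker   # should contain --selinux-enabled=false as noted above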
Basic Usage
As an example, deploy nginx in the default namespace.
- Create the YAML file
oc create deployment web --image=nginx:1.14 --dry-run -o yaml > nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: web
  name: web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: web
    spec:
      containers:
      - image: nginx:1.14
        name: nginx
        resources: {}
status: {}
- Deploy it. The container will not start; `docker logs` on the corresponding node shows a pile of Permission denied errors. This is because OpenShift does not run containers as root by default, while this nginx image needs root.
oc apply -f nginx.yaml
- Grant the anyuid SCC (Security Context Constraints) to the default service account and redeploy; the pod then starts.
oc adm policy add-scc-to-user anyuid -z default
scc "anyuid" added to: ["system:serviceaccount:default:default"]
Reference: https://stackoverflow.com/questions/42363105/permission-denied-mkdir-in-container-on-openshift
OpenShift will by default run containers as a non root user. As a result, your application can fail if it requires it runs as root. Whether you can configure your container to run as root will depend on permissions you have in the cluster.
It is better to design your container and application so that it doesn’t have to run as root.
A few suggestions.
- Create a special UNIX user to run the application as and set that user (using its uid) in the USER statement of the Dockerfile. Make the group for the user be the root group.
- Fix up permissions on the /src directory and everything under it so it is owned by the special user. Ensure that everything is group root. Ensure that anything that needs to be writable is writable by group root.
- Ensure you set HOME to /src in the Dockerfile.
With that done, when OpenShift runs your container as an assigned uid whose group is root, everything being group writable means the application can still update files under /src. The HOME variable being set ensures that anything written to the home directory by the code goes into the writable /src area.
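A minimal Dockerfile sketch of that advice, written as a heredoc so it can be pasted on any node; the user name appuser, uid 1001, and the /src path are illustrative assumptions, not something this cluster requires:
cat > Dockerfile <<'EOF'
FROM centos:7
# illustrative: a dedicated non-root user whose primary group is root (gid 0)
RUN useradd -u 1001 -g 0 -d /src appuser \
 && mkdir -p /src \
 && chown -R 1001:0 /src \
 && chmod -R g=u /src
# HOME points at the group-writable /src so the code can write to its home directory
ENV HOME=/src
USER 1001
EOF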
Common oc Commands
oc get nodes
oc describe node node-name
oc get pods -n namespace-name
oc get all
oc status
oc get pods -o wide -n namespace-name
oc describe pod pod-name -n namespace-name
oc get limitrange -n namespace-name
oc describe limitrange limitrange.config -n namespace-name
oc edit limitrange limitrange.config -n namespace-name
oc project project-name
oc adm policy add-scc-to-user anyuid -z default
oc adm policy remove-scc-from-user anyuid -z default
oc get pod
oc get service
oc get endpoints
oc delete pod pod-name -n namespace-name
oc rollout history DeploymentConfig/dc-name
oc rollout history DeploymentConfig/dc-name --revision=5
oc scale dc pod-name --replicas=3 -n namespace
oc autoscale dc dc-name --min=2 --max=10 --cpu-percent=80
oc get scc
oc describe scc anyuid
oc describe clusterrole.rbac
oc describe clusterrolebinding.rbac
oc adm policy add-cluster-role-to-user cluster-admin username
oc get routes --all-namespaces
oc logs -f pod-name
docker ps -a|grep pod-name
docker exec -it containerID /bin/sh
oc new-project my-project
oc status
oc api-resources
oc adm must-gather
oc adm top pods
oc adm top node
oc adm top images
oc adm cordon node1
oc adm manage-node <node1> --schedulable=false
oc adm manage-node node1 --schedulable
oc adm drain node1
oc delete node
oc get csr
oc adm certificate approve csr-name
oc adm certificate deny csr_name
oc get csr -o name | xargs oc adm certificate approve
echo 'openshift_master_bootstrap_auto_approve=true' >> /etc/ansible/hosts
oc get project projectname
oc describe project projectname
oc get pod pod-name -o yaml
oc get nodes --show-labels
oc label nodes node-name label-key=label-value
oc label nodes node-name key=new-value --overwrite
oc label nodes node-name key-
oc adm manage-node node-name --list-pods
oc adm manage-node node-name --schedulable=false
oc login --token=iz56jscHZp9mSN3kHzjayaEnNo0DMI_nRlaiJyFmN74 --server=https://console.qa.c.sm.net:8443
TOKEN=$(oc get secret $(oc get serviceaccount default -o jsonpath='{.secrets[0].name}') -o jsonpath='{.data.token}' | base64 --decode )
APISERVER=$(oc config view --minify -o jsonpath='{.clusters[0].cluster.server}')
curl $APISERVER/api --header "Authorization: Bearer $TOKEN" --insecure
oc rsh pod-name
oc whoami --show-server
oc whoami
oc get dc -n namespace
oc get deploy -n namespace
oc edit deploy/deployname -o yaml -n namespace
oc get cronjob
oc edit cronjob/cronjob-name -n namespace-name
oc describe cm/configmap-name -n namespace-name
oc get configmap -n namespace-name
oc get cm -n namespace-name
cat /etc/origin/master/master-config.yaml|grep cidr
oc edit vm appmngr54321-poc-msltoibh -n appmngr54321-poc -o yaml
oc serviceaccounts get-token sa-name
oc login url --token=token
oc scale deployment deployment-name --replicas 5
oc config view
oc api-versions
oc api-resources
oc get hpa --all-namespaces
oc describe hpa/hpaname -n namespace
oc create serviceaccount caller
oc adm policy add-cluster-role-to-user cluster-admin -z caller
oc serviceaccounts get-token caller
echo $KUBECONFIG
oc get cm --all-namespaces -l app=deviation
oc describe PodMetrics podname
oc api-resources -o wide
oc delete pod --selector logging-infra=fluentd
oc get pods -n logger -w
oc explain pv.spec
oc get MutatingWebhookConfiguration
oc get ValidatingWebhookConfiguration
oc annotate validatingwebhookconfigurations <validating_webhook_name> service.beta.openshift.io/inject-cabundle=true
oc annotate mutatingwebhookconfigurations <mutating_webhook_name> service.beta.openshift.io/inject-cabundle=true
oc get pv --selector='path=testforocp'
oc get cm --all-namespaces -o=jsonpath='{.items[?(@.metadata.annotations.cpuUsage=="0")].metadata.name}'
ns=$(oc get cm --selector='app=dev' --all-namespaces|awk '{print $1}'|grep -v NAMESPACE)
for i in $ns;do oc get cm dev -oyaml -n $i >> /tmp/test.txt; done;
Troubleshooting
- After all nodes are rebooted, only the master is Ready; the others are NotReady:
NAME STATUS ROLES AGE
infra.ocp.cn NotReady infra 6d
master.ocp.cn Ready master 6d
node1.ocp.cn NotReady compute 6d
node2.ocp.cn NotReady compute 6d
At this point `docker ps` shows no containers running. Manually starting them all with `docker ps -aq | xargs docker restart` reveals that the containers depend on each other, so the command has to be repeated; even then a few containers keep exiting after a moment, and on the master the nodes still show NotReady.
The likely cause is that the origin-node service on those nodes is not running. Start it with:
systemctl start origin-node
systemctl status origin-node -l
This service pulls up all the containers; the ones that would not start are recreated, and shortly afterwards the nodes show Ready on the master. So after every reboot this service has to be started on every node except the master (see the loop below).
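To avoid doing this by hand after every reboot, the service can be enabled on each non-master node; a sketch run from the master over the passwordless ssh set up earlier:
for host in node1.ocp.cn node2.ocp.cn infra.ocp.cn; do
  ssh $host "systemctl enable origin-node && systemctl start origin-node"
done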
So what is this service? In Kubernetes this is the kubelet's job, but there is no kubelet service here. Looking at the unit file, it represents an OpenShift Node:
cat /etc/systemd/system/origin-node.service
[Unit]
Description=OpenShift Node
After=docker.service
After=chronyd.service
After=ntpd.service
Wants=docker.service
Documentation=https://github.com/openshift/origin
Wants=dnsmasq.service
After=dnsmasq.service
[Service]
Type=notify
EnvironmentFile=/etc/sysconfig/origin-node
ExecStart=/usr/local/bin/openshift-node
LimitNOFILE=65536
LimitCORE=infinity
WorkingDirectory=/var/lib/origin/
SyslogIdentifier=origin-node
Restart=always
RestartSec=5s
TimeoutStartSec=300
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
References
- https://www.jianshu.com/p/a83cece9f305
- https://docs.openshift.com/container-platform/3.10/install/running_install.html
- https://blog.aishangwei.net/?p=1179#2-2
- https://blog.csdn.net/qq_16240085/article/details/86004707
- https://docs.openshift.com/container-platform/3.11/welcome/index.html
- https://stackoverflow.com/questions/42363105/permission-denied-mkdir-in-container-on-openshift