Quickly Deploying a Highly Available Kubernetes Cluster on CentOS 7 with kubeadm

Environment Requirements


Master nodes

  • Physical or virtual machines; at least 1, and at least 2 for a highly available cluster (the etcd cluster must have an odd number of members, so 3 is typical)
  • Recommended sizing: 2 cores/2 GB for a lab environment, 2 cores/4 GB for testing, 8 cores/16 GB for production
  • Disable all swap partitions, or do not create a swap partition at all

Node (worker) nodes

  • Physical or virtual machines; 1 or more
  • Recommended sizing: 2 cores/2 GB for a lab environment, 4 cores/8 GB for testing, 16 cores/64 GB for production
  • Disable all swap partitions, or do not create a swap partition at all

Operating system version

CentOS 7.5 or later

Demo environment


Hostname      Spec            OS           IP address     Role
k8s-master1   2 cores, 2 GB   CentOS 7.5   10.211.55.4    Master, Node
k8s-master2   2 cores, 2 GB   CentOS 7.5   10.211.55.5    Master, Node
k8s-master3   2 cores, 2 GB   CentOS 7.5   10.211.55.6    Master, Node

VIP: 10.211.55.10

System Initialization


Disable the firewall

systemctl stop firewalld
systemctl disable firewalld

Disable SELinux

setenforce 0
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config

Disable swap

swapoff -a
echo 'swapoff -a ' >> /etc/rc.d/rc.local
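
A common alternative for making this persistent is sketched below (hedged; it assumes swap is declared in /etc/fstab, and note that on CentOS 7 /etc/rc.d/rc.local only runs at boot if it is executable):

sed -ri 's/^([^#].*\sswap\s.*)$/#\1/' /etc/fstab   # comment out any active swap entry in /etc/fstab
grep swap /etc/fstab                               # verify the swap line is now commented out
chmod +x /etc/rc.d/rc.local                        # only needed if you keep the rc.local approach above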

Set the hostname (replace ${HOSTNAME} with the desired hostname of each node, e.g. k8s-master1)

hostnamectl set-hostname ${HOSTNAME}

Add local hosts entries for all nodes

cat >> /etc/hosts << EOF
10.211.55.4 k8s-master1
10.211.55.5 k8s-master2
10.211.55.6 k8s-master3
EOF

Install basic packages

yum install vim net-tools lrzsz unzip dos2unix telnet sysstat iotop pciutils lsof tcpdump psmisc bc wget socat -y

Enable kernel networking parameters

cat >  /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.neigh.default.gc_thresh1 = 80000
net.ipv4.neigh.default.gc_thresh2 = 90000
net.ipv4.neigh.default.gc_thresh3 = 100000
EOF
modprobe br_netfilter
sysctl --system
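
Optionally, to make sure br_netfilter is loaded again after a reboot so the bridge sysctls above keep applying, a small sketch using systemd's modules-load mechanism:

cat > /etc/modules-load.d/k8s.conf << EOF
br_netfilter
EOF
lsmod | grep br_netfilter   # confirm the module is currently loaded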

Configure passwordless SSH from Master1 to all nodes (including itself)

On k8s-master1, run the following command to generate a key pair (just press Enter at every prompt)

ssh-keygen -t rsa

Then copy the public key to all nodes (including itself)

ssh-copy-id -i ~/.ssh/id_rsa.pub k8s-master1
ssh-copy-id -i ~/.ssh/id_rsa.pub k8s-master2
ssh-copy-id -i ~/.ssh/id_rsa.pub k8s-master3

On k8s-master1, verify that every node (including itself) can be reached over SSH without a password

ssh k8s-master1
ssh k8s-master2
ssh k8s-master3

Time synchronization between nodes

Server side

Note
If the environment has internet access, you do not need to run your own NTP server; follow the Client side section and point all nodes at a public NTP server (for example ntp.aliyun.com), as sketched below.
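
A minimal sketch for that internet-connected case (assuming chrony and the public server ntp.aliyun.com; run on every node):

yum install chrony -y
sed -i "s%^server%#server%g" /etc/chrony.conf
echo "server ntp.aliyun.com iburst" >> /etc/chrony.conf
systemctl restart chronyd && systemctl enable chronyd
chronyc sources   # the public server should appear and eventually be marked with '^*'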

# Set the timezone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai

# Install chrony and back up the configuration file
yum install chrony ntpdate -y
cp -a /etc/chrony.conf /etc/chrony.conf.bak

# Edit the server-side configuration as follows; change the marked values to suit your environment
cat > /etc/chrony.conf << EOF
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
allow 10.211.55.0/24 # set this to the IP subnet of the clients in your environment
smoothtime 400 0.01

bindcmdaddress 127.0.0.1
bindcmdaddress ::1

local stratum 8
manual
keyfile /etc/chrony.keys
#initstepslew 10 client1 client3 client6
noclientlog
logchange 0.5
logdir /var/log/chrony
EOF

# Start the service and enable it at boot
systemctl restart chronyd.service
systemctl enable chronyd.service
systemctl status chronyd.service

Client side

# Set the timezone to Asia/Shanghai
timedatectl set-timezone Asia/Shanghai

# Install chrony and back up the configuration file
yum install chrony ntpdate -y
cp -a /etc/chrony.conf /etc/chrony.conf.bak

# Edit the client-side configuration
sed -i "s%^server%#server%g" /etc/chrony.conf
echo "server 10.211.55.4 iburst" >> /etc/chrony.conf # set the NTP server; replace the IP with your actual server-side IP

# Synchronize the time once manually; replace the IP with your actual server-side IP
ntpdate 10.211.55.4

# Start the service and enable it at boot
systemctl restart chronyd.service
systemctl enable chronyd.service
systemctl status chronyd.service

chronyc sources # check the status of the configured NTP sources
chronyc tracking # show detailed NTP tracking information

Install Docker on all nodes (Masters and Nodes)


Remove old versions of Docker

yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine

Configure the Docker CE repository

# Install the required packages: yum-utils provides the yum-config-manager tool; device-mapper-persistent-data and lvm2 are needed by the devicemapper storage driver
yum install -y yum-utils \
device-mapper-persistent-data \
lvm2

# Add the stable repository
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

Note
If you cannot reach sites outside China, configure Alibaba Cloud's domestic Docker repository mirror instead

wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

Install Docker CE

# Install the latest version of Docker CE
yum install docker-ce docker-ce-cli containerd.io -y

Note
To install a specific Docker version instead, follow these steps

# List the Docker versions available in the repository, sorted in descending order
yum list docker-ce --showduplicates | sort -r

# Choose the version to install (for example 18.09.9) and install it explicitly
# yum install docker-ce-<VERSION> docker-ce-cli-<VERSION> containerd.io -y
yum install docker-ce-18.09.9 docker-ce-cli-18.09.9 containerd.io -y

Start Docker and enable it at boot

systemctl start docker
systemctl enable docker
systemctl status docker

Configure the Alibaba Cloud registry mirror (optional)

mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://lerc8rqe.mirror.aliyuncs.com"]
}
EOF

systemctl daemon-reload
systemctl restart docker
docker info

Install cri-dockerd on all nodes (Masters and Nodes)


Note
cri-dockerd is only required when deploying Kubernetes v1.24 or later

Kubernetes v1.24 removed the dockershim, and Docker Engine does not natively support the CRI standard, so the two can no longer be integrated directly. Mirantis and Docker therefore jointly created the cri-dockerd project, which bridges Docker Engine to the CRI specification so that Docker can continue to serve as the Kubernetes container runtime.

Download the cri-dockerd binary package from the Releases page (version 0.3.10 in this article) into /root and run the following commands to extract and install it

cd /root/
tar zxf cri-dockerd-0.3.10.amd64.tgz
mv cri-dockerd/cri-dockerd /usr/bin/
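
A quick sanity check that the binary is on the PATH and executable (it simply prints the installed version):

cri-dockerd --version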

Create the service unit files cri-docker.service and cri-docker.socket

cat > /usr/lib/systemd/system/cri-docker.service << EOF
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd://
ExecReload=/bin/kill -s HUP \$MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF

cat > /usr/lib/systemd/system/cri-docker.socket << EOF
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=root

[Install]
WantedBy=sockets.target
EOF

Start cri-docker and enable it at boot

systemctl daemon-reload
systemctl enable --now cri-docker.socket
systemctl enable cri-docker.service
systemctl start cri-docker.socket
systemctl start cri-docker.service
systemctl status cri-docker.socket
systemctl status cri-docker.service

Install kubeadm, kubelet and kubectl on all nodes (Masters and Nodes)


Component   Description
kubeadm     Automates Kubernetes deployment, lowering the difficulty and improving efficiency.
kubelet     Communicates with the rest of the cluster and manages the lifecycle of Pods and containers on its node.
kubectl     Command-line tool for managing the Kubernetes cluster.

Configure the Kubernetes repository

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOF

Note
If you cannot reach sites outside China, configure a domestic Kubernetes repository mirror instead

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Disable SELinux

SELinux is disabled mainly so that containers can access the host filesystem; until kubelet's SELinux support improves it has to be turned off. Since this was already done during preparation, this step can be skipped.

Enable the net.bridge.bridge-nf-call-iptables kernel parameter

Set net.bridge.bridge-nf-call-iptables to 1; if it is not enabled, traffic may bypass iptables and be routed incorrectly. Since this was already done during preparation, this step can be skipped.

Install kubeadm, kubelet and kubectl

# Install the latest versions of kubeadm, kubelet and kubectl
yum install kubeadm kubelet kubectl --disableexcludes=kubernetes -y
systemctl enable --now kubelet
# kubelet will now restart every few seconds, crash-looping while it waits for kubeadm to tell it what to do

Note
To install specific versions of kubeadm, kubelet and kubectl instead, follow these steps

# List the kubeadm, kubelet and kubectl versions available in the repository, sorted in descending order
yum list kubeadm --showduplicates | sort -r

# Choose the version to install (for example 1.23.17) and install it explicitly
# yum install kubeadm-<VERSION> kubelet-<VERSION> kubectl-<VERSION> --disableexcludes=kubernetes -y
yum install kubeadm-1.23.17 kubelet-1.23.17 kubectl-1.23.17 --disableexcludes=kubernetes -y
systemctl enable --now kubelet
# kubelet will now restart every few seconds, crash-looping while it waits for kubeadm to tell it what to do

Configure HAProxy and Keepalived (skip this step for a single-Master deployment)


HAProxy

Note
The configuration file is identical on every HAProxy node

# Install HAProxy and back up the configuration file
yum install -y haproxy
cp -a /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak

# Edit the HAProxy configuration as follows; adjust the marked values to your environment
cat > /etc/haproxy/haproxy.cfg << EOF
global
chroot /var/lib/haproxy
daemon
group haproxy
user haproxy
log 127.0.0.1 local0 info
pidfile /var/lib/haproxy.pid
maxconn 20000
spread-checks 3
nbproc 8

defaults
log global
mode tcp
retries 3
option redispatch

listen stats
bind 0.0.0.0:9000
mode http
stats enable
stats uri /
stats refresh 15s
stats realm Haproxy\ Stats
stats auth k8s:k8s
timeout server 15s
timeout client 15s
timeout connect 15s
bind-process 1

listen k8s-apiserver
bind 0.0.0.0:8443 # IP and port to bind; prefer a port other than 6443 (8443 here), because if HAProxy runs on the same server as kube-apiserver, using 6443 causes a port conflict; if they run on different machines, 6443 can be used
mode tcp
balance roundrobin
timeout server 15s
timeout client 15s
timeout connect 15s
server k8s-master1-kube-apiserver 10.211.55.4:6443 check port 6443 inter 5000 fall 5 # forward to the kube-apiserver on k8s-master1; the kube-apiserver port defaults to 6443
server k8s-master2-kube-apiserver 10.211.55.5:6443 check port 6443 inter 5000 fall 5 # forward to the kube-apiserver on k8s-master2; the kube-apiserver port defaults to 6443
server k8s-master3-kube-apiserver 10.211.55.6:6443 check port 6443 inter 5000 fall 5 # forward to the kube-apiserver on k8s-master3; the kube-apiserver port defaults to 6443
EOF

# Edit /etc/sysconfig/rsyslog
vim /etc/sysconfig/rsyslog
# change SYSLOGD_OPTIONS="" to SYSLOGD_OPTIONS="-c 2 -r -m 0"
SYSLOGD_OPTIONS="-c 2 -r -m 0"

# Append the following configuration to /etc/rsyslog.conf
cat >> /etc/rsyslog.conf << EOF
\$ModLoad imudp
\$UDPServerRun 514
local0.* /var/log/haproxy/haproxy.log
EOF

# Create the log directory and make it writable
mkdir -p /var/log/haproxy && chmod a+w /var/log/haproxy

# Restart rsyslog
systemctl restart rsyslog
netstat -nuple | grep 514

# Validate the configuration, then start HAProxy and enable it at boot
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl restart haproxy
systemctl enable haproxy
systemctl status haproxy
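
A quick sanity check (the k8s:k8s credentials come from the "stats auth" line in the configuration above):

netstat -lntp | grep -E '8443|9000'     # HAProxy should be listening on the frontend and stats ports
curl -u k8s:k8s http://127.0.0.1:9000/  # the stats page should return an HTML report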

Keepalived

# Install Keepalived and back up the configuration file
yum install -y keepalived
cp -a /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak

# Edit the Keepalived configuration as follows; adjust the marked values to your environment
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived

global_defs {
router_id k8s-master1 # identifier; use the machine's hostname
}

vrrp_script check_haproxy {
script "/etc/keepalived/check_haproxy.sh"
}

vrrp_instance VI_1 {
state MASTER # role: MASTER on the first master node, BACKUP on all the others
interface eth0 # interface the VIP is bound to
virtual_router_id 51 # MASTER and BACKUP must be in the same virtual router, so the ID must match
priority 150 # priority; the node with the highest value becomes MASTER
advert_int 1 # heartbeat interval
authentication {
auth_type PASS # authentication type
auth_pass k8s # password
}
virtual_ipaddress {
10.211.55.10 # virtual IP
}
}
track_script {
check_haproxy
}
}
EOF

# Create the HAProxy health-check script
cat > /etc/keepalived/check_haproxy.sh << EOF
#!/bin/bash
count=\$(ps -ef| grep haproxy | egrep -cv "grep|\$\$")

if [ "\$count" -eq 0 ];then
exit 1
else
exit 0
fi
EOF

# Make the health-check script executable
chmod +x /etc/keepalived/check_haproxy.sh

# Start Keepalived and enable it at boot
systemctl restart keepalived.service
systemctl enable keepalived.service
systemctl status keepalived.service
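
To confirm which node currently holds the VIP (a small sketch; interface and VIP as configured above):

ip addr show eth0 | grep 10.211.55.10   # prints the VIP on the current MASTER, nothing on BACKUP nodes
ping -c 3 10.211.55.10                  # the VIP should answer from any node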

Initialize the first Master


v1.16.X and later

Initialize the first Master

Note
If initialization fails, first check the Troubleshooting section for a matching case, and run kubeadm reset to clean up the environment before re-initializing

kubeadm init \
--apiserver-advertise-address=10.211.55.4 \
--kubernetes-version v1.16.4 \
--service-cidr=10.1.0.0/16 \
--pod-network-cidr=10.244.0.0/16 \
--control-plane-endpoint 10.211.55.10:8443 \
--upload-certs

Note
If you cannot reach sites outside China, use the --image-repository flag to specify Alibaba Cloud's mirror registry, e.g. --image-repository registry.aliyuncs.com/google_containers

Note
When deploying Kubernetes v1.24 or later, use the --cri-socket flag to specify the CRI socket path, e.g. --cri-socket unix:///var/run/cri-dockerd.sock (a combined example follows the parameter table below)

Flag                            Description
--apiserver-advertise-address   IP address the kube-apiserver listens on, i.e. this master's own IP
--image-repository              Registry to pull the control-plane images from
--kubernetes-version            Kubernetes version to install
--service-cidr                  Service network range; must not overlap with the host network
--pod-network-cidr              Pod network range, 10.244.0.0/16 by default; must match the range specified in the network plugin's manifest
--control-plane-endpoint        For a single-Master deployment, the Master IP and port 6443; for multi-Master, the Keepalived VIP and its port
--upload-certs                  Upload the control-plane certificates
--cri-socket                    Path of the CRI socket to use
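
As a hedged illustration of combining these flags for a v1.24+ cluster without access to foreign registries (the version here is only an example; adjust the version, IPs and CIDRs to your environment):

# Illustrative only: kubeadm init using the Aliyun mirror registry and cri-dockerd
kubeadm init \
  --apiserver-advertise-address=10.211.55.4 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.28.2 \
  --service-cidr=10.1.0.0/16 \
  --pod-network-cidr=10.244.0.0/16 \
  --control-plane-endpoint 10.211.55.10:8443 \
  --upload-certs \
  --cri-socket unix:///var/run/cri-dockerd.sock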

After a successful initialization you will see output like the following; copy and save it, because it is needed later when joining the remaining master nodes and worker nodes

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

kubeadm join 10.211.55.10:8443 --token y55nex.s699ytnwr28o1vu1 \
--discovery-token-ca-cert-hash sha256:dd0932f4d17a864cf4ea3a7eff44c77695b47f39de03eaaf1dfd27762d7dd48b \
--control-plane --certificate-key b2ef4609c5af0d51334d49b20a3713e073ef818593d830b5b17ac58b294cc5c0

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use kubeadm init phase upload-certs to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.211.55.10:8443 --token y55nex.s699ytnwr28o1vu1 \
--discovery-token-ca-cert-hash sha256:dd0932f4d17a864cf4ea3a7eff44c77695b47f39de03eaaf1dfd27762d7dd48b

Configure the kubectl config file

kubectl looks for its config file in the .kube directory under the user's home directory. As instructed by the output above, run the following commands to copy the admin.conf generated during the [kubeconfig] phase of initialization to .kube/config. This file records the apiserver address, so kubectl commands will then be able to reach the API server directly.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

At this point kubectl get node shows the first master node still in the NotReady state, because no network plugin has been installed yet.
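
As an illustration, the output at this stage typically looks like the following (role and version columns vary with the installed release):

kubectl get nodes
NAME          STATUS     ROLES    AGE   VERSION
k8s-master1   NotReady   master   2m    v1.16.4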

Before v1.16.X

Export and edit the initialization configuration file

Note
kubeadm before Kubernetes v1.16.X does not support the --control-plane-endpoint flag, so the first master cannot be initialized directly with kubeadm init as shown above. Instead, export the default configuration, adjust it for your environment, and then initialize from it.

Note
This method works for all versions, including those after v1.16.X, and in some special scenarios it is the only option, for example when using a non-Docker CRI or customizing the Kubernetes DNS domain suffix.

On the first master node, export the default configuration manifest

kubeadm config print init-defaults > kubeadm-init.yml

Edit kubeadm-init.yml; the marked fields need to be modified or added

apiVersion: kubeadm.k8s.io/v1beta1
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s # token validity period; if it expires before nodes are joined, a new token must be generated
usages:
- signing
- authentication
kind: InitConfiguration
localAPIEndpoint:
advertiseAddress: 10.211.55.4 # this node's real IP
bindPort: 6443
nodeRegistration:
criSocket: /var/run/dockershim.sock # path of the CRI socket to use
name: k8s-master1 # this node's hostname
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
---
apiServer:
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta1
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: "10.211.55.10:8443" # add this field manually if it is missing; for a single master use "local real IP:6443", for multiple masters use "VIP:port", where the port must match the bind port in the HAProxy configuration
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io # if sites outside China are unreachable, change this to the Alibaba Cloud mirror registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.14.6 # Kubernetes version to install
networking:
dnsDomain: cluster.local # the default DNS domain suffix is cluster.local; to use a custom domain such as koenli.net, change it here
podSubnet: "10.244.0.0/16" # Pod network range; add this field manually if it is missing and a custom Pod subnet is needed
serviceSubnet: 10.1.0.0/16 # Service network range
scheduler: {}

Pre-pull the images

kubeadm config images pull --config kubeadm-init.yml

Initialize the first Master

Note
If initialization fails, first check the Troubleshooting section for a matching case, and run kubeadm reset to clean up the environment before re-initializing

Run the following command to initialize the first master node

kubeadm init --config kubeadm-init.yml

After a successful initialization you will see output like the following; copy and save it, because it is needed later when joining the remaining master nodes and worker nodes

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:

kubeadm join 10.211.55.10:8443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:a0816d4859bcb4e84e25c5c87c5495e9b9e38486d3bfb3d71bc28e1ff8451e13 \
--experimental-control-plane

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.211.55.10:8443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:a0816d4859bcb4e84e25c5c87c5495e9b9e38486d3bfb3d71bc28e1ff8451e13

Configure the kubectl config file

kubectl looks for its config file in the .kube directory under the user's home directory. As instructed by the output above, run the following commands to copy the admin.conf generated during the [kubeconfig] phase of initialization to .kube/config. This file records the apiserver address, so kubectl commands will then be able to reach the API server directly.

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

At this point kubectl get node shows the first master node still in the NotReady state, because no network plugin has been installed yet.

Install a network plugin


Kubernetes supports multiple network add-ons; see the official documentation for details.

This article covers the installation of Calico (recommended) and Flannel.

Calico

On any master node, run the following commands to download the operator manifest and create the operator from it

cd /root/
wget https://raw.githubusercontent.com/projectcalico/calico/v3.25.2/manifests/tigera-operator.yaml
kubectl create -f tigera-operator.yaml

Note
Because the CRD manifests are large, kubectl apply may exceed the request size limit, so kubectl create is recommended

Watch the tigera-operator pods and wait until they are all Running before continuing with the next steps

watch kubectl  get pod -n tigera-operator

On any master node, run the following commands to download the Calico custom resources manifest

cd /root/
wget https://raw.githubusercontent.com/projectcalico/calico/v3.25.2/manifests/custom-resources.yaml

Edit custom-resources.yaml and adjust the cidr setting to your environment

Note
The cidr value must match the range passed to the --pod-network-cidr flag of kubeadm init

...
ipPools:
- blockSize: 26
cidr: 10.244.0.0/16
encapsulation: VXLANCrossSubnet
natOutgoing: Enabled
...

After editing, apply the manifest

kubectl apply -f /root/custom-resources.yaml

Watch the Calico pods; once they are all Running, the CNI deployment is complete

watch kubectl get pods -n calico-system
watch kubectl get pods -n calico-apiserver

Flannel

On any master node, run the following commands to download the Flannel manifest

cd /root/
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

Adjust kube-flannel.yml to your environment as needed, for example the Network, Backend and hostNetwork settings, then install it

Note
The Network value must match the range passed to the --pod-network-cidr flag of kubeadm init

cd /root/
kubectl apply -f kube-flannel.yml

Watch the Flannel pods; once they are all Running, the CNI deployment is complete

watch kubectl get pod -n kube-flannel

Join the remaining Masters


v1.16.X and later

On each remaining master node, run the following commands to join it to the control plane and configure the .kube/config file

Note
The join command below comes from the output printed when the first master was initialized successfully

kubeadm join 10.211.55.10:8443 --token y55nex.s699ytnwr28o1vu1 \
--discovery-token-ca-cert-hash sha256:dd0932f4d17a864cf4ea3a7eff44c77695b47f39de03eaaf1dfd27762d7dd48b \
--control-plane --certificate-key b2ef4609c5af0d51334d49b20a3713e073ef818593d830b5b17ac58b294cc5c0

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Note
When deploying Kubernetes v1.24 or later, add the --cri-socket flag to specify the CRI socket path, e.g. --cri-socket unix:///var/run/cri-dockerd.sock

Before v1.16.X

On the first master node, create a script with the following content (call it add_master.sh)

CONTROL_PLANE_IPS="k8s-master2 k8s-master3"
for host in ${CONTROL_PLANE_IPS}
do
ssh root@${host} "mkdir -p /etc/kubernetes/pki/etcd"
scp -r /etc/kubernetes/pki/ca.* root@${host}:/etc/kubernetes/pki/
scp -r /etc/kubernetes/pki/sa.* root@${host}:/etc/kubernetes/pki/
scp -r /etc/kubernetes/pki/front-proxy-ca.* root@${host}:/etc/kubernetes/pki/
scp -r /etc/kubernetes/pki/etcd/ca.* root@${host}:/etc/kubernetes/pki/etcd/
scp -r /etc/kubernetes/admin.conf root@${host}:/etc/kubernetes/
done

Note
The CONTROL_PLANE_IPS variable lists the hostnames or IPs of the remaining masters, separated by spaces (if hostnames are used, make sure they resolve)

After setting CONTROL_PLANE_IPS, run the script

sh add_master.sh

On each remaining master node, run the following commands to join it to the control plane and configure the .kube/config file

Note
The join command below comes from the output printed when the first master was initialized successfully

kubeadm join 10.211.55.10:8443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:a0816d4859bcb4e84e25c5c87c5495e9b9e38486d3bfb3d71bc28e1ff8451e13 \
--experimental-control-plane


mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Note
When deploying Kubernetes v1.24 or later, add the --cri-socket flag to specify the CRI socket path, e.g. --cri-socket unix:///var/run/cri-dockerd.sock

Allow workload pods to be scheduled on Masters (optional)


By default, for safety reasons, workload pods are not scheduled onto master nodes. To allow this, run the following command on any one master node to remove the taint

# Older versions
kubectl taint nodes --all node-role.kubernetes.io/master-
# Newer versions
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

Join the Nodes


Run the following command on each worker node in turn

kubeadm join 10.211.55.10:8443 --token y55nex.s699ytnwr28o1vu1 \
--discovery-token-ca-cert-hash sha256:dd0932f4d17a864cf4ea3a7eff44c77695b47f39de03eaaf1dfd27762d7dd48b

Note
When deploying Kubernetes v1.24 or later, add the --cri-socket flag to specify the CRI socket path, e.g. --cri-socket unix:///var/run/cri-dockerd.sock

Testing and verification


On any master node, run the following commands to create an nginx deployment and expose it, then check that it can be reached from outside the cluster

# Create an nginx deployment
kubectl create deployment web --image=nginx

# Expose it as a NodePort Service
kubectl expose deployment web --port=80 --type=NodePort

# Check the assigned NodePort
kubectl get service
NAME   TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
web    NodePort   10.1.153.72   <none>        80:30612/TCP   4s

Open http://<Node_IP>:30612 in a browser; if the nginx welcome page is returned, the environment is working correctly.
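
The same check can be done from the command line; the port 30612 is the NodePort assigned above and will differ in your environment:

curl -I http://10.211.55.4:30612
# An "HTTP/1.1 200 OK" response with "Server: nginx" means the Service is reachable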

Note
Remember to clean up the test resources after a successful verification

kubectl delete service web
kubectl delete deployment web

Deploy the Kubernetes Dashboard


On any master node, download the Kubernetes Dashboard manifest into /root

Note
This article installs v2.5.1 as an example; when installing, get the manifest download URL for the desired version from the GitHub project

cd /root/
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.5.1/aio/deploy/recommended.yaml

Edit recommended.yaml, find the kubernetes-dashboard Service, set its type to NodePort and its nodePort to 30001 (any free NodePort will do)

......

kind: Service
apiVersion: v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kubernetes-dashboard
spec:
type: NodePort # change the Service type to NodePort
ports:
- port: 443
targetPort: 8443
nodePort: 30001 # custom NodePort
selector:
k8s-app: kubernetes-dashboard
......

After editing, run the following command to deploy the Kubernetes Dashboard

kubectl apply -f /root/recommended.yaml

Watch the Kubernetes Dashboard pods and wait until they are all Running before continuing

watch kubectl get pods -n kubernetes-dashboard

Create a service account and bind it to the built-in cluster-admin cluster role

kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin

Retrieve the login token

kubectl describe secrets -n kubernetes-dashboard $(kubectl -n kubernetes-dashboard get secret | awk '/dashboard-admin/{print $1}')
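
Note: on Kubernetes v1.24 and later, a token Secret is no longer created automatically for a ServiceAccount, so the command above may return nothing. In that case a short-lived login token can be requested directly, for example:

kubectl -n kubernetes-dashboard create token dashboard-admin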

Log in to the Kubernetes Dashboard (https://<NODE_IP>:30001) with the token printed above.

Note
The protocol must be HTTPS

Note
If Chrome does not offer a "Proceed to x.x.x.x (unsafe)" option when opening the Kubernetes Dashboard, the following steps can be used to work around it
1. On the master node where the Dashboard was just deployed, delete the default secret and create a new one from the self-signed certificates generated by kubeadm (stored under /etc/kubernetes/pki/ by default)

kubectl delete secret kubernetes-dashboard-certs -n kubernetes-dashboard
kubectl create secret generic kubernetes-dashboard-certs --from-file=/etc/kubernetes/pki/apiserver.crt --from-file=/etc/kubernetes/pki/apiserver.key -n kubernetes-dashboard

2. Edit /root/recommended.yaml and specify the certificate and key files under args (search for auto-generate-certificates to jump to the right place)

args:
# PLATFORM-SPECIFIC ARGS HERE
- --auto-generate-certificates
- --tls-key-file=apiserver.key
- --tls-cert-file=apiserver.crt

3. Re-apply recommended.yaml

kubectl apply -f /root/recommended.yaml

4. Confirm that all Kubernetes Dashboard pods are Running

watch kubectl get pods -n kubernetes-dashboard

5. Open https://<NODE_IP>:30001 again

At this point a complete Kubernetes cluster has been deployed. If you also want to configure kubectl command completion, install Helm, install ingress-nginx, or connect Kubernetes to external storage such as Ceph for persistent volumes, continue with the following sections.

Configure kubectl command completion


Run the following on all master nodes

# Install bash-completion
yum install bash-completion -y
source /usr/share/bash-completion/bash_completion

# Append "source <(kubectl completion bash)" to the end of /etc/profile
sed -i '$a\source <(kubectl completion bash)' /etc/profile

# Reload the profile so the change takes effect
source /etc/profile

Install Helm


Note
Helm has two major versions, Helm 2 and Helm 3, which are mutually incompatible; install one of them based on the Helm/Kubernetes compatibility matrix and your actual requirements

Helm 2

Download the required Helm release package (2.17.0 in this example), upload it to /root/helm on all master nodes (create the directory first if it does not exist), and run the following commands to install the Helm client

cd /root/helm/
tar zxf helm-v2.17.0-linux-amd64.tar.gz
cd linux-amd64/
cp -a helm /usr/local/bin/

Create the Tiller RBAC manifest and apply it

cat > /root/helm/tiller-rbac.yaml << EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: tiller
namespace: kube-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tiller-cluster-rule
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-admin
subjects:
- kind: ServiceAccount
name: tiller
namespace: kube-system
EOF

kubectl apply -f /root/helm/tiller-rbac.yaml

On one of the masters, run the following command to initialize the Helm server side (Tiller)

helm init --service-account tiller --skip-refresh

Check the Tiller pod with

kubectl get pods -n kube-system  |grep tiller

Once it is Running, run helm version; output like the following means both the Helm client and server are installed.

Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}

Note
If the STATUS is ImagePullBackOff, the image pull failed; you can try kubectl edit pods tiller-deploy-xxxxxxxx -n kube-system to edit Tiller and replace the image with sapcc/tiller:[tag], which is a mirror of https://gcr.io/kubernetes-helm/tiller/

...
# image: gcr.io/kubernetes-helm/tiller:v2.17.0
image: sapcc/tiller:v2.17.0
...

After saving and exiting, check the Tiller pod again; once it is Running, run helm version to confirm that Helm is fully installed.

kubectl get pods -n kube-system  |grep tiller

Note
If helm version fails with an error similar to the following

Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
E1213 15:58:40.605638 10274 portforward.go:400] an error occurred forwarding 34583 -> 44134: error forwarding port 44134 to pod 1e92153b279110f9464193c4ea7d6314ac69e70ce60e7319df9443e379b52ed4, uid : unable to do port forwarding: socat not found

Solution
Install socat on all worker nodes

yum install socat -y

Helm 3

Download the required Helm release package (3.14.2 in this example), upload it to /root/helm on all master nodes (create the directory first if it does not exist), then run the following commands to install Helm and verify it

# Install
cd /root/helm/
tar zxf helm-v3.14.2-linux-amd64.tar.gz
cd linux-amd64/
cp -a helm /usr/local/bin/

# Verify
helm version

Install ingress-nginx


Go to the GitHub project, switch to the desired version, locate the deploy/static/provider/baremetal/deploy.yaml or deploy/static/mandatory.yaml manifest, download it and upload it to /root on any master node, and rename it to ingress.yaml (or simply copy the file contents into a new file on the machine and save it as ingress.yaml)

Note
This article uses 1.10.0 as an example

Edit ingress.yaml, enable hostNetwork under Deployment.spec.template.spec, and append the Service configuration shown below

Note
The official manifest references three images (two of them identical) hosted on registries outside China that cannot be reached domestically. It is recommended to replace them with the domestic alternatives below

Official image                                       Domestic alternatives
registry.k8s.io/ingress-nginx/controller             1. registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller
                                                     2. dyrnq/ingress-nginx-controller
registry.k8s.io/ingress-nginx/kube-webhook-certgen   1. registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen
                                                     2. dyrnq/kube-webhook-certgen

...
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.10.0
name: ingress-nginx-controller
namespace: ingress-nginx
spec:
minReadySeconds: 0
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
strategy:
rollingUpdate:
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
labels:
app.kubernetes.io/component: controller
app.kubernetes.io/instance: ingress-nginx
app.kubernetes.io/name: ingress-nginx
app.kubernetes.io/part-of: ingress-nginx
app.kubernetes.io/version: 1.10.0
spec:
hostNetwork: true # add this line to enable hostNetwork
containers:
- args:
- /nginx-ingress-controller
- --election-id=ingress-nginx-leader
- --controller-class=k8s.io/ingress-nginx
...

# Append the following configuration at the end of the file
---
apiVersion: v1
kind: Service
metadata:
name: ingress-nginx
namespace: ingress-nginx
spec:
type: ClusterIP
ports:
- name: http
port: 80
targetPort: 80
protocol: TCP
- name: https
port: 443
targetPort: 443
protocol: TCP
selector:
app.kubernetes.io/name: ingress-nginx

Run the following commands to install ingress-nginx

cd /root/
kubectl apply -f ingress.yaml

Watch the ingress-nginx pods; once they are all Completed or Running, the ingress-nginx deployment is complete

watch kubectl get pods -n ingress-nginx

Verify ingress-nginx

# Create an nginx deployment and service
kubectl create deployment web --image=nginx
kubectl expose deployment web --port=80

# Create the ingress
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl create ingress web --class=nginx \
--rule="test.koenli.com/*=web:80"

# Find the IP of the node running the ingress-nginx-controller pod
kubectl get pod -o wide -n ingress-nginx | grep ingress-nginx-controller

# Test access through the custom domain; getting the nginx welcome page back means it works. Replace 10.211.55.8 with the IP of the node running the ingress-nginx-controller pod.
curl --resolve test.koenli.com:80:10.211.55.8 http://test.koenli.com

Note
Remember to clean up the test resources after a successful verification

kubectl delete ingress web
kubectl delete service web
kubectl delete deployment web

Configure rbd-provisioner


Note
rbd-provisioner is no longer maintained upstream; this integration method is no longer recommended and the official Ceph CSI should be used instead

Note
To use Ceph RBD for persistent storage, a Ceph cluster must first be built, and the matching version of the ceph-common client package must be installed on every Kubernetes node. Building the Ceph cluster and installing ceph-common are not covered here; see the official Ceph documentation.

Once the Ceph cluster and ceph-common are ready, create the /root/rbd-provisioner directory on one of the masters and run the following commands to create the yaml files rbd-provisioner needs; adjust the marked values to your environment

mkdir /root/rbd-provisioner
cd /root/rbd-provisioner

cat > clusterrole.yaml << EOF
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rbd-provisioner
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
- apiGroups: [""]
resources: ["services"]
resourceNames: ["kube-dns","coredns"]
verbs: ["list", "get"]
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF


cat > clusterrolebinding.yaml << EOF
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: rbd-provisioner
subjects:
- kind: ServiceAccount
name: rbd-provisioner
namespace: ceph
roleRef:
kind: ClusterRole
name: rbd-provisioner
apiGroup: rbac.authorization.k8s.io
EOF

cat > deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: rbd-provisioner
namespace: ceph
spec:
progressDeadlineSeconds: 600
revisionHistoryLimit: 10
replicas: 1
selector:
matchLabels:
app: rbd-provisioner
strategy:
type: Recreate
template:
metadata:
labels:
app: rbd-provisioner
spec:
containers:
- name: rbd-provisioner
imagePullPolicy: IfNotPresent
image: "quay.io/external_storage/rbd-provisioner:latest"
env:
- name: PROVISIONER_NAME
value: ceph.com/rbd
serviceAccount: rbd-provisioner
restartPolicy: Always
EOF

cat > role.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: rbd-provisioner
namespace: ceph
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get"]
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF


cat > rolebinding.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: rbd-provisioner
namespace: ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: rbd-provisioner
subjects:
- kind: ServiceAccount
name: rbd-provisioner
namespace: ceph
EOF

cat > serviceaccount.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: rbd-provisioner
namespace: ceph
EOF


cat > storageclass.yaml << EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
annotations:
storageclass.beta.kubernetes.io/is-default-class: "true"
name: rbd
provisioner: ceph.com/rbd
parameters:
monitors: 10.211.55.4:6789,10.211.55.5:6789,10.211.55.6:6789 # Ceph cluster monitor addresses
pool: k8s # pool to use; create it on the Ceph cluster first if it does not exist
adminId: admin
adminSecretNamespace: ceph
adminSecretName: ceph-secret
fsType: ext4
userId: admin
userSecretNamespace: ceph
userSecretName: ceph-secret
imageFormat: "2"
imageFeatures: layering
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF

cat > secrets.yaml << EOF
apiVersion: v1
kind: Secret
metadata:
name: ceph-secret
namespace: ceph
type: "kubernetes.io/rbd"
data:
# ceph auth add client.kube mon 'allow r' osd 'allow rwx pool=kube'
# ceph auth get-key client.admin | base64
key: QVFEcTN5VmRvK28xRHhBQUlKNW5zQ0xwcTd3N0Q5OTJENm9YeGc9PQ== # the Ceph cluster keyring, filled in here as its base64-encoded value
EOF

Apply them with the following commands

cd /root/rbd-provisioner
kubectl create namespace ceph
kubectl apply -f storageclass.yaml -f clusterrolebinding.yaml -f clusterrole.yaml -f deployment.yaml -f rolebinding.yaml -f role.yaml -f secrets.yaml -f serviceaccount.yaml
kubectl get pods -n ceph | grep rbd-provisioner
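
As an optional, hedged verification (the PVC name and size here are arbitrary): create a small PVC against the rbd StorageClass and check that it becomes Bound, which confirms dynamic provisioning works.

cat > /root/rbd-provisioner/test-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rbd
  resources:
    requests:
      storage: 1Gi
EOF
kubectl apply -f /root/rbd-provisioner/test-pvc.yaml
kubectl get pvc rbd-test-pvc                            # should reach the Bound state
kubectl delete -f /root/rbd-provisioner/test-pvc.yaml   # clean up afterwards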

Configure cephfs-provisioner


Note
cephfs-provisioner is no longer maintained upstream; this integration method is no longer recommended and the official Ceph CSI should be used instead

Note
To use CephFS for persistent storage, a Ceph cluster must first be built, and the matching version of the ceph-common client package must be installed on every Kubernetes node. Building the Ceph cluster and installing ceph-common are not covered here; see the official Ceph documentation.

Once the Ceph cluster and ceph-common are ready, create the /root/cephfs-provisioner directory on one of the masters and run the following commands to create the yaml files cephfs-provisioner needs; adjust the marked values to your environment

mkdir /root/cephfs-provisioner
cd /root/cephfs-provisioner

cat > clusterrole.yaml << EOF
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: cephfs-provisioner
namespace: ceph
rules:
- apiGroups: [""]
resources: ["persistentvolumes"]
verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: [""]
resources: ["persistentvolumeclaims"]
verbs: ["get", "list", "watch", "update"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["events"]
verbs: ["create", "update", "patch"]
- apiGroups: [""]
resources: ["services"]
resourceNames: ["kube-dns","coredns"]
verbs: ["list", "get"]
EOF

cat > clusterrolebinding.yaml << EOF
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: cephfs-provisioner
subjects:
- kind: ServiceAccount
name: cephfs-provisioner
namespace: ceph
roleRef:
kind: ClusterRole
name: cephfs-provisioner
apiGroup: rbac.authorization.k8s.io
EOF


cat > deployment.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: cephfs-provisioner
namespace: ceph
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: cephfs-provisioner
strategy:
type: Recreate
template:
metadata:
labels:
app: cephfs-provisioner
spec:
containers:
- name: cephfs-provisioner
image: "quay.io/external_storage/cephfs-provisioner:latest"
imagePullPolicy: IfNotPresent
env:
- name: PROVISIONER_NAME
value: ceph.com/cephfs
- name: PROVISIONER_SECRET_NAMESPACE
value: ceph
command:
- "/usr/local/bin/cephfs-provisioner"
args:
- "-id=cephfs-provisioner-1"
- "-disable-ceph-namespace-isolation=true"
- "-enable-quota=true"
serviceAccount: cephfs-provisioner
restartPolicy: Always
terminationGracePeriodSeconds: 30
EOF

cat > role.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: cephfs-provisioner
namespace: ceph
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["create", "get", "delete"]
- apiGroups: [""]
resources: ["endpoints"]
verbs: ["get", "list", "watch", "create", "update", "patch"]
EOF


cat > rolebinding.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: cephfs-provisioner
namespace: ceph
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: cephfs-provisioner
subjects:
- kind: ServiceAccount
name: cephfs-provisioner
EOF


cat > serviceaccount.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
name: cephfs-provisioner
namespace: ceph
EOF


cat > storageclass.yaml << EOF
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: cephfs
provisioner: ceph.com/cephfs
parameters:
monitors: 10.211.55.4:6789,10.211.55.5:6789,10.211.55.6:6789 # Ceph cluster monitor addresses
adminId: admin
adminSecretName: ceph-secret
adminSecretNamespace: "ceph"
reclaimPolicy: Delete
volumeBindingMode: Immediate
EOF

Apply them with the following commands

cd /root/cephfs-provisioner
kubectl apply -f storageclass.yaml -f clusterrolebinding.yaml -f clusterrole.yaml -f deployment.yaml -f rolebinding.yaml -f role.yaml -f serviceaccount.yaml
kubectl get pods -n ceph | grep cephfs-provisioner
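
As an optional, hedged verification (the PVC name and size here are arbitrary): create a small PVC against the cephfs StorageClass and check that it becomes Bound.

cat > /root/cephfs-provisioner/test-pvc.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: cephfs
  resources:
    requests:
      storage: 1Gi
EOF
kubectl apply -f /root/cephfs-provisioner/test-pvc.yaml
kubectl get pvc cephfs-test-pvc                            # should reach the Bound state
kubectl delete -f /root/cephfs-provisioner/test-pvc.yaml   # clean up afterwards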

Troubleshooting


Problem 1

Symptom

Initializing the first master node fails with the following error

[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

The detailed logs in /var/log/messages contain the following error

Feb 21 10:40:22 k8s-master1 kubelet: E0221 10:40:22.353542   11204 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""

Root cause

The error in /var/log/messages shows that kubelet cannot run because its cgroup driver defaults to systemd while Docker's cgroup driver defaults to cgroupfs, and the two do not match.

Solution

Edit /etc/docker/daemon.json and add the exec-opts setting to change Docker's cgroup driver to systemd

{
"exec-opts": ["native.cgroupdriver=systemd"]
}

Restart Docker

systemctl restart docker
docker info | grep "Cgroup Driver"

Run kubeadm reset to clean up the environment, then run the initialization again

Problem 2

Symptom

When deploying Kubernetes v1.28.2 with cri-dockerd so that Docker serves as the Kubernetes container runtime, initializing the first master node fails with the following error

[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
- 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/cri-dockerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

The detailed logs in /var/log/messages contain the following errors

Feb 29 15:56:10 k8s-master1 cri-dockerd: time="2024-02-29T15:56:10+08:00" level=info msg="Pulling the image without credentials. Image: registry.k8s.io/pause:3.9"
Feb 29 15:56:15 k8s-master1 kubelet: E0229 15:56:15.605750 5516 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed pulling image \"registry.k8s.io/pause:3.9\": Error response from daemon: Head \"https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.9\": dial tcp 173.194.174.82:443: i/o timeout"
Feb 29 15:56:15 k8s-master1 kubelet: E0229 15:56:15.605807 5516 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed pulling image \"registry.k8s.io/pause:3.9\": Error response from daemon: Head \"https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.9\": dial tcp 173.194.174.82:443: i/o timeout" pod="kube-system/kube-scheduler-k8s-master1"

Root cause

The errors in /var/log/messages show that kubelet failed to pull the registry.k8s.io/pause:3.9 image, even though the init command explicitly specified --image-repository registry.aliyuncs.com/google_containers and the --pod-infra-container-image flag in /var/lib/kubelet/kubeadm-flags.env also points at the Aliyun registry. Why does it still pull from registry.k8s.io? It turns out that when cri-dockerd is used, the --pod-infra-container-image flag must also be set in its service unit file; otherwise the pause image is still pulled from k8s.gcr.io/registry.k8s.io by default.

Solution

Edit /usr/lib/systemd/system/cri-docker.service and append --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 to the end of the ExecStart=/usr/bin/cri-dockerd line to specify the pause image address

Note
Use the pause image version reported in the error logs in /var/log/messages

...
[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
...

Restart cri-dockerd

systemctl daemon-reload
systemctl restart cri-docker.socket
systemctl restart cri-docker.service
systemctl status cri-docker.socket
systemctl status cri-docker.service

Run kubeadm reset --cri-socket unix:///var/run/cri-dockerd.sock to clean up the environment, then run the initialization again

References