Recommended Server Hardware Configuration

Master nodes
Physical or virtual machines both work. At least 1 node is required; a highly available cluster needs at least 2, and because the etcd cluster must have an odd number of members, 3 is the practical minimum.
Recommended specs: 2 cores / 2 GB for a lab environment, 2 cores / 4 GB for testing, 8 cores / 16 GB for production.
Disable all swap partitions, or do not create any swap partition in the first place.

Node (worker) nodes
Physical or virtual machines both work; 1 or more nodes.
Recommended specs: 2 cores / 2 GB for a lab environment, 4 cores / 8 GB for testing, 16 cores / 64 GB for production.
Disable all swap partitions, or do not create any swap partition in the first place.

Deployment Architecture Diagram

Lab Environment

Hostname | Spec | OS | IP Address | Role | Components
easyk8s1 | 2 cores / 2 GB | CentOS 7.5 | 10.211.55.7 | Master, Node | kube-apiserver, kube-controller-manager, kube-scheduler, etcd, keepalived, haproxy, docker, cri-dockerd, kubelet, kube-proxy
easyk8s2 | 2 cores / 2 GB | CentOS 7.5 | 10.211.55.8 | Master, Node | kube-apiserver, kube-controller-manager, kube-scheduler, etcd, keepalived, haproxy, docker, cri-dockerd, kubelet, kube-proxy
easyk8s3 | 2 cores / 2 GB | CentOS 7.5 | 10.211.55.9 | Master, Node | kube-apiserver, kube-controller-manager, kube-scheduler, etcd, keepalived, haproxy, docker, cri-dockerd, kubelet, kube-proxy
VIP: 10.211.55.10
System Initialization

Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

Disable SELinux
setenforce 0
sed -i "s/^SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config

Disable swap
swapoff -a
echo 'swapoff -a' >> /etc/rc.d/rc.local

Set the hostname
hostnamectl set-hostname ${HOSTNAME}

Add local hosts entries for all nodes
cat >> /etc/hosts << EOF
10.211.55.7 easyk8s1
10.211.55.8 easyk8s2
10.211.55.9 easyk8s3
EOF

Install base packages
yum install vim net-tools lrzsz unzip dos2unix telnet sysstat iotop pciutils lsof tcpdump psmisc bc wget socat -y

Enable kernel networking support
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.neigh.default.gc_thresh1 = 80000
net.ipv4.neigh.default.gc_thresh2 = 90000
net.ipv4.neigh.default.gc_thresh3 = 100000
EOF
modprobe br_netfilter
sysctl --system
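Optionally, you can verify that the module and sysctl values took effect before moving on; a small check:

# Confirm the bridge netfilter module is loaded and key sysctls are set
lsmod | grep br_netfilter
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables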
Configure ulimit
Note: a reboot is required for this to take effect.
cat >> /etc/security/limits.conf << EOF
* soft nproc 655360
* hard nproc 655360
* soft nofile 655360
* hard nofile 655360
* soft memlock unlimited
* hard memlock unlimited
EOF
reboot
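After the reboot you can optionally confirm the new limits from a fresh shell:

# Show the per-process open-file and process limits for the current session
ulimit -n
ulimit -u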
Install ipvsadm
yum install ipvsadm ipset sysstat conntrack libseccomp -y
cat > /etc/modules-load.d/ipvs.conf << EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
ip_tables
ip_set
xt_set
ipt_set
ipt_rpfilter
ipt_REJECT
ipip
EOF
systemctl restart systemd-modules-load.service
lsmod | grep -e ip_vs -e nf_conntrack
Configure passwordless SSH from master1 to all nodes (including itself). On master1, run the following command to generate a key pair (just press Enter through all prompts).
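The key-generation command itself is not shown in the original text; presumably it is a plain ssh-keygen run with the defaults, for example:

# Generate an RSA key pair at the default path (~/.ssh/id_rsa); press Enter at every prompt
ssh-keygen -t rsa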
Then copy the public key to all nodes (including master1 itself):
ssh-copy-id -i ~/.ssh/id_rsa.pub easyk8s1
ssh-copy-id -i ~/.ssh/id_rsa.pub easyk8s2
ssh-copy-id -i ~/.ssh/id_rsa.pub easyk8s3

On master1, verify that every node (including master1 itself) can be reached over ssh without a password:
ssh easyk8s1
ssh easyk8s2
ssh easyk8s3

Time synchronization between nodes

Server side
Note: if the environment has Internet access you do not need to run your own NTP server; simply follow the client steps below on every node and point them at a public NTP server (for example ntp.aliyun.com).
timedatectl set-timezone Asia/Shanghai
yum install chrony ntpdate -y
cp -a /etc/chrony.conf /etc/chrony.conf.bak
cat > /etc/chrony.conf << EOF
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
makestep 10 3
allow 10.211.55.0/24    # set to the subnet your clients actually live on
smoothtime 400 0.01
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
local stratum 8
manual
keyfile /etc/chrony.keys
#initstepslew 10 client1 client3 client6
noclientlog
logchange 0.5
logdir /var/log/chrony
EOF
systemctl restart chronyd.service
systemctl enable chronyd.service
systemctl status chronyd.service
Client side
timedatectl set-timezone Asia/Shanghai
yum install chrony ntpdate -y
cp -a /etc/chrony.conf /etc/chrony.conf.bak
sed -i "s%^server%#server%g" /etc/chrony.conf
echo "server 10.211.55.7 iburst" >> /etc/chrony.conf
ntpdate 10.211.55.7
systemctl restart chronyd.service
systemctl enable chronyd.service
systemctl status chronyd.service
chronyc sources
chronyc tracking
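If your nodes can reach the Internet and you skip the self-hosted server, the client configuration is the same except that it points at a public NTP server; a minimal sketch assuming ntp.aliyun.com:

# Point chrony at a public NTP server instead of the local master
sed -i "s%^server%#server%g" /etc/chrony.conf
echo "server ntp.aliyun.com iburst" >> /etc/chrony.conf
systemctl restart chronyd.service
chronyc sources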
Install the CFSSL tools
CFSSL is an open-source PKI/TLS toolkit from CloudFlare, written in Go. It consists of a command-line tool and an HTTP API service for signing, verifying, and bundling TLS certificates.
On one of the nodes (master1 is recommended), run the following commands to install it directly:
curl -s -L -o /usr/local/bin/cfssl https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssl_1.6.5_linux_amd64
curl -s -L -o /usr/local/bin/cfssljson https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssljson_1.6.5_linux_amd64
curl -s -L -o /usr/local/bin/cfssl-certinfo https://github.com/cloudflare/cfssl/releases/download/v1.6.5/cfssl-certinfo_1.6.5_linux_amd64
chmod +x /usr/local/bin/cfssl*
cfssl version

Note: if the environment has no Internet access, download the latest cfssl_x.x.x_linux_amd64, cfssljson_x.x.x_linux_amd64, and cfssl-certinfo_x.x.x_linux_amd64 binaries, upload them to the /root directory of one of the nodes (master1 is recommended), and install cfssl with:
mv /root/cfssl_1.6.5_linux_amd64 /usr/local/bin/cfssl
mv /root/cfssljson_1.6.5_linux_amd64 /usr/local/bin/cfssljson
mv /root/cfssl-certinfo_1.6.5_linux_amd64 /usr/local/bin/cfssl-certinfo
chmod +x /usr/local/bin/cfssl*
cfssl version

Deploy the etcd cluster
etcd is a Raft-based distributed key-value store developed by CoreOS, commonly used for service discovery, shared configuration, and coordination (leader election, distributed locks, and so on). Kubernetes stores all of its cluster data in etcd.

Generate self-signed certificates for etcd with cfssl
On the node where the cfssl tools are installed, run the following commands to create a CA for etcd and issue its self-signed certificates:
mkdir /root/etcd-cert && cd /root/etcd-cert
cat > ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "etcd": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "876000h"
      }
    }
  }
}
EOF
cat > etcd-ca-csr.json << EOF
{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF
cat > etcd-csr.json << EOF
{
  "CN": "etcd",
  "hosts": [
    "10.211.55.7",
    "10.211.55.8",
    "10.211.55.9"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "etcd",
      "OU": "Etcd Security"
    }
  ]
}
EOF
cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare ca
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=etcd etcd-csr.json | cfssljson -bare etcd
Deploy etcd 3.5
Download an etcd 3.5 binary package from https://github.com/etcd-io/etcd/releases (this article uses 3.5.13), upload it to the /root directory of one of the etcd nodes, then unpack it and create the etcd directories and configuration file:
cd /root/
tar zxf etcd-v3.5.13-linux-amd64.tar.gz -C /opt/
mv /opt/etcd-v3.5.13-linux-amd64/ /opt/etcd
mkdir /opt/etcd/bin
mkdir /opt/etcd/cfg
mkdir /opt/etcd/ssl
cp -a /opt/etcd/etcd* /opt/etcd/bin/
cp -a /root/etcd-cert/{ca,ca-key,etcd,etcd-key}.pem /opt/etcd/ssl/
cat > /opt/etcd/cfg/etcd.conf << EOF
#[Member]
# Name of this etcd node; must be unique within the cluster
ETCD_NAME="etcd-1"
# etcd data directory
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
# Address used for peer (member-to-member) communication
ETCD_LISTEN_PEER_URLS="https://10.211.55.7:2380"
# Addresses etcd listens on for client traffic
ETCD_LISTEN_CLIENT_URLS="https://10.211.55.7:2379,http://127.0.0.1:2379"

#[Clustering]
# Peer URL of this member, advertised to the rest of the cluster
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.211.55.7:2380"
# Client URLs of this member, advertised to the rest of the cluster
ETCD_ADVERTISE_CLIENT_URLS="https://10.211.55.7:2379"
# All members of the cluster
ETCD_INITIAL_CLUSTER="etcd-1=https://10.211.55.7:2380,etcd-2=https://10.211.55.8:2380,etcd-3=https://10.211.55.9:2380"
# Cluster bootstrap token; keep it unique per cluster
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
# "new" for members present during the initial static/DNS bootstrap; "existing" makes etcd try to join an existing cluster
ETCD_INITIAL_CLUSTER_STATE="new"
# flannel uses the etcd v2 API while Kubernetes uses v3; etcd 3.5 disables v2 by default, so enable it for flannel compatibility
ETCD_ENABLE_V2="true"
EOF
cat > /usr/lib/systemd/system/etcd.service << EOF
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
EnvironmentFile=/opt/etcd/cfg/etcd.conf
ExecStart=/opt/etcd/bin/etcd \\
  --cert-file=/opt/etcd/ssl/etcd.pem \\
  --key-file=/opt/etcd/ssl/etcd-key.pem \\
  --peer-cert-file=/opt/etcd/ssl/etcd.pem \\
  --peer-key-file=/opt/etcd/ssl/etcd-key.pem \\
  --trusted-ca-file=/opt/etcd/ssl/ca.pem \\
  --peer-trusted-ca-file=/opt/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
Copy the etcd directory and the service unit to the remaining etcd nodes (the example uses easyk8s2; repeat for easyk8s3):
scp -r /opt/etcd/ easyk8s2:/opt/
scp -r /usr/lib/systemd/system/etcd.service easyk8s2:/usr/lib/systemd/system/

On the remaining etcd nodes, edit /opt/etcd/cfg/etcd.conf and change ETCD_NAME, ETCD_LISTEN_PEER_URLS, ETCD_LISTEN_CLIENT_URLS, ETCD_INITIAL_ADVERTISE_PEER_URLS, and ETCD_ADVERTISE_CLIENT_URLS to the local node's values, for example on easyk8s2:
...
ETCD_NAME="etcd-2"
ETCD_LISTEN_PEER_URLS="https://10.211.55.8:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.211.55.8:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.211.55.8:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://10.211.55.8:2379"
...

On all etcd nodes, enable etcd at boot and start it.
Note: on the first node, systemctl start etcd appears to hang without returning to the prompt because etcd is waiting for the other members to come up; just proceed to start the remaining nodes.
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd

On any etcd node, run the following command to check cluster health; if the HEALTH column is true for every endpoint, the etcd cluster is up:
/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem \
  --endpoints="https://10.211.55.7:2379,https://10.211.55.8:2379,https://10.211.55.9:2379" \
  endpoint health --write-out=table
To inspect the status of each etcd member, run:
/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem \
  --endpoints="https://10.211.55.7:2379,https://10.211.55.8:2379,https://10.211.55.9:2379" \
  endpoint status --write-out=table
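It can also be useful to list the cluster members and their IDs, for example when a failed member later needs to be replaced; an optional check:

# List etcd members in table form
/opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem \
  --endpoints="https://10.211.55.7:2379" member list --write-out=table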
Uninstalling etcd
If the installation failed and you need to start over, run the following on every etcd node:
systemctl stop etcd
systemctl disable etcd
rm -rf /opt/etcd/
rm -rf /usr/lib/systemd/system/etcd.service
rm -rf /var/lib/etcd/

Configure HAProxy and Keepalived (skip this step for a single-Master deployment)

HAProxy
Note: the configuration file is identical on every HAProxy node.
yum install -y haproxy
cp -a /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
cat > /etc/haproxy/haproxy.cfg << EOF
global
    chroot /var/lib/haproxy
    daemon
    group haproxy
    user haproxy
    log 127.0.0.1 local0 info
    pidfile /var/lib/haproxy.pid
    maxconn 20000
    spread-checks 3
    nbproc 8

defaults
    log global
    mode tcp
    retries 3
    option redispatch

listen stats
    bind 0.0.0.0:9000
    mode http
    stats enable
    stats uri /
    stats refresh 15s
    stats realm Haproxy\ Stats
    stats auth k8s:k8s
    timeout server 15s
    timeout client 15s
    timeout connect 15s
    bind-process 1

listen k8s-apiserver
    # Bind address and port. Prefer a port other than 6443 (8443 here): if HAProxy runs on the same host as kube-apiserver, 6443 would conflict; on a separate host 6443 is fine.
    bind 0.0.0.0:8443
    mode tcp
    balance roundrobin
    timeout server 15s
    timeout client 15s
    timeout connect 15s
    server easyk8s1-kube-apiserver 10.211.55.7:6443 check port 6443 inter 5000 fall 5   # forward to kube-apiserver on easyk8s1 (default port 6443)
    server easyk8s2-kube-apiserver 10.211.55.8:6443 check port 6443 inter 5000 fall 5   # forward to kube-apiserver on easyk8s2 (default port 6443)
    server easyk8s3-kube-apiserver 10.211.55.9:6443 check port 6443 inter 5000 fall 5   # forward to kube-apiserver on easyk8s3 (default port 6443)
EOF

vim /etc/sysconfig/rsyslog
SYSLOGD_OPTIONS="-c 2 -r -m 0"

cat >> /etc/rsyslog.conf << EOF
\$ModLoad imudp
\$UDPServerRun 514
local0.* /var/log/haproxy/haproxy.log
EOF
mkdir -p /var/log/haproxy && chmod a+w /var/log/haproxy
systemctl restart rsyslog
netstat -nuple | grep 514
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl restart haproxy
systemctl enable haproxy
systemctl status haproxy
Keepalived
yum install -y keepalived
cp -a /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.bak
cat > /etc/keepalived/keepalived.conf << EOF
! Configuration File for keepalived

global_defs {
    router_id easyk8s1          # identifier; use the machine's hostname
}

vrrp_script check_haproxy {
    script "/etc/keepalived/check_haproxy.sh"
}

vrrp_instance VI_1 {
    state MASTER                # role: MASTER on the first master node, BACKUP on all others
    interface eth0              # interface the VIP is bound to
    virtual_router_id 51        # MASTER and BACKUP must share the same virtual router ID
    priority 150                # higher priority wins the MASTER role
    advert_int 1                # heartbeat interval
    authentication {
        auth_type PASS          # authentication type
        auth_pass k8s           # password
    }
    virtual_ipaddress {
        10.211.55.10            # virtual IP
    }
    track_script {
        check_haproxy
    }
}
EOF
cat > /etc/keepalived/check_haproxy.sh << EOF
#!/bin/bash
count=\$(ps -ef| grep haproxy | egrep -cv "grep|\$\$")
if [ "\$count" -eq 0 ];then
    exit 1
else
    exit 0
fi
EOF
chmod +x /etc/keepalived/check_haproxy.sh
systemctl restart keepalived.service
systemctl enable keepalived.service
systemctl status keepalived.service
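Once Keepalived is running on all masters, it is worth confirming that the VIP is bound on the MASTER node and that the HAProxy stats page answers; an optional check, assuming the eth0 interface and the k8s:k8s stats credentials from the configuration above:

# The VIP should appear on the MASTER node's interface
ip addr show eth0 | grep 10.211.55.10
# HAProxy stats page from the local host
curl -su k8s:k8s http://127.0.0.1:9000/ | head -n 5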
Generate self-signed certificates for the Kubernetes components with cfssl
On the node where the cfssl tools are installed, run the following commands to create the CAs and issue self-signed certificates for each component.
mkdir -p /root/k8s-cert && cd /root/k8s-cert
Generate the CA certificate and the API Server certificate
cat > ca-config.json << EOF
{
  "signing": {
    "default": {
      "expiry": "876000h"
    },
    "profiles": {
      "kubernetes": {
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ],
        "expiry": "876000h"
      }
    }
  }
}
EOF
cat > ca-csr.json << EOF
{
  "CN": "kubernetes",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "Kubernetes",
      "OU": "Kubernetes-manual"
    }
  ],
  "ca": {
    "expiry": "876000h"
  }
}
EOF
cfssl gencert -initca ca-csr.json | cfssljson -bare ca
cat > apiserver-csr.json << EOF
{
  "CN": "kube-apiserver",
  "hosts": [
    "10.0.0.1",
    "127.0.0.1",
    "kubernetes",
    "kubernetes.default",
    "kubernetes.default.svc",
    "kubernetes.default.svc.cluster",
    "kubernetes.default.svc.cluster.local",
    "10.211.55.7",
    "10.211.55.8",
    "10.211.55.9",
    "10.211.55.10"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "Kubernetes",
      "OU": "Kubernetes-manual"
    }
  ]
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes apiserver-csr.json | cfssljson -bare apiserver
Generate the API Server aggregation-layer (front-proxy) certificates
cat > front-proxy-ca-csr.json << EOF
{
  "CN": "front-proxy-ca",
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "ca": {
    "expiry": "876000h"
  }
}
EOF
cfssl gencert -initca front-proxy-ca-csr.json | cfssljson -bare front-proxy-ca
cat > front-proxy-client-csr.json << EOF
{
  "CN": "front-proxy-client",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  }
}
EOF
cfssl gencert -ca=front-proxy-ca.pem -ca-key=front-proxy-ca-key.pem -config=ca-config.json -profile=kubernetes front-proxy-client-csr.json | cfssljson -bare front-proxy-client
Generate the Controller Manager certificate
cat > kube-controller-manager-csr.json << EOF
{
  "CN": "system:kube-controller-manager",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "system:kube-controller-manager",
      "OU": "Kubernetes-manual"
    }
  ]
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager

Generate the kube-scheduler certificate
cat > kube-scheduler-csr.json << EOF
{
  "CN": "system:kube-scheduler",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "system:kube-scheduler",
      "OU": "Kubernetes-manual"
    }
  ]
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler

Generate the admin certificate
cat > admin-csr.json << EOF
{
  "CN": "admin",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "system:masters",
      "OU": "Kubernetes-manual"
    }
  ]
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes admin-csr.json | cfssljson -bare admin

Generate the kube-proxy certificate
cat > kube-proxy-csr.json << EOF
{
  "CN": "system:kube-proxy",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "GuangDong",
      "L": "ShenZhen",
      "O": "system:kube-proxy",
      "OU": "Kubernetes-manual"
    }
  ]
}
EOF
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes kube-proxy-csr.json | cfssljson -bare kube-proxy

Create the ServiceAccount key pair (secret)
openssl genrsa -out /root/k8s-cert/sa.key 2048
openssl rsa -in /root/k8s-cert/sa.key -pubout -out /root/k8s-cert/sa.pub
Deploy the Master components

Deploy kube-apiserver, kube-controller-manager, and kube-scheduler
Binary download page: https://github.com/kubernetes/kubernetes/releases
The CHANGELOG of each release links to the binary downloads for that version. Download the Server Binaries for your platform (they contain both the master and node components) and upload the archive to the /root directory of one Master node (this article uses 1.29.3):
mkdir /opt/kubernetes
mkdir /opt/kubernetes/bin
mkdir /opt/kubernetes/cfg
mkdir /opt/kubernetes/ssl
mkdir /opt/kubernetes/logs
cd /root/
tar zxf kubernetes-server-linux-amd64.tar.gz
cp -a /root/kubernetes/server/bin/kube-apiserver /opt/kubernetes/bin/
cp -a /root/kubernetes/server/bin/kube-controller-manager /opt/kubernetes/bin/
cp -a /root/kubernetes/server/bin/kube-scheduler /opt/kubernetes/bin/
cp -a /root/kubernetes/server/bin/kubectl /usr/local/bin/
cp -a /root/k8s-cert/* /opt/kubernetes/ssl/
cat > /opt/kubernetes/cfg/kube-apiserver.conf << EOF
KUBE_APISERVER_OPTS="--v=2 \\
--allow-privileged=true \\
--bind-address=10.211.55.7 \\
--secure-port=6443 \\
--advertise-address=10.211.55.7 \\
--service-cluster-ip-range=10.0.0.0/24 \\
--service-node-port-range=30000-32767 \\
--etcd-servers=https://10.211.55.7:2379,https://10.211.55.8:2379,https://10.211.55.9:2379 \\
--etcd-cafile=/opt/etcd/ssl/ca.pem \\
--etcd-certfile=/opt/etcd/ssl/etcd.pem \\
--etcd-keyfile=/opt/etcd/ssl/etcd-key.pem \\
--client-ca-file=/opt/kubernetes/ssl/ca.pem \\
--tls-cert-file=/opt/kubernetes/ssl/apiserver.pem \\
--tls-private-key-file=/opt/kubernetes/ssl/apiserver-key.pem \\
--kubelet-client-certificate=/opt/kubernetes/ssl/apiserver.pem \\
--kubelet-client-key=/opt/kubernetes/ssl/apiserver-key.pem \\
--service-account-key-file=/opt/kubernetes/ssl/sa.pub \\
--service-account-signing-key-file=/opt/kubernetes/ssl/sa.key \\
--service-account-issuer=https://kubernetes.default.svc.cluster.local \\
--enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,NodeRestriction,ResourceQuota \\
--authorization-mode=RBAC,Node \\
--enable-bootstrap-token-auth=true \\
--requestheader-client-ca-file=/opt/kubernetes/ssl/front-proxy-ca.pem \\
--proxy-client-cert-file=/opt/kubernetes/ssl/front-proxy-client.pem \\
--proxy-client-key-file=/opt/kubernetes/ssl/front-proxy-client-key.pem \\
--requestheader-allowed-names=aggregator \\
--requestheader-group-headers=X-Remote-Group \\
--requestheader-extra-headers-prefix=X-Remote-Extra- \\
--requestheader-username-headers=X-Remote-User \\
--enable-aggregator-routing=true \\
--token-auth-file=/opt/kubernetes/cfg/token.csv \\
--audit-log-maxage=30 \\
--audit-log-maxbackup=3 \\
--audit-log-maxsize=100 \\
--audit-log-path=/opt/kubernetes/logs/k8s-audit.log \\
--default-not-ready-toleration-seconds=20 \\
--default-unreachable-toleration-seconds=20"
EOF
Parameter reference for kube-apiserver.conf:
--v=2: log verbosity level 2 (levels range from 0 to 8; higher values are more verbose)
--allow-privileged: allow privileged containers to run
--bind-address: listen address
--secure-port: listen port, 6443 by default
--advertise-address: the address advertised to other nodes for reaching this API Server
--service-cluster-ip-range: CIDR range for cluster Services, e.g. 10.0.0.0/24; it must not overlap with the IP ranges of the deployment hosts
--service-node-port-range: NodePort range, 30000-32767 by default
--etcd-servers: etcd server addresses
--etcd-cafile: CA certificate of the etcd servers
--etcd-certfile: client certificate for the etcd servers
--etcd-keyfile: client private key for the etcd servers
--client-ca-file: client CA certificate
--tls-cert-file: serving certificate of the API Server
--tls-private-key-file: serving private key of the API Server
--kubelet-client-certificate, --kubelet-client-key: client certificate and key used when talking to kubelets
--service-account-key-file: service account public key file
--service-account-signing-key-file: service account signing key file
--service-account-issuer: issuer of service account tokens
--enable-admission-plugins: admission plugins to enable
--authorization-mode: authorization modes
--enable-bootstrap-token-auth: enable bootstrap token authentication
--requestheader-client-ca-file: client CA certificate used for request-header authentication
--proxy-client-cert-file, --proxy-client-key-file: certificate and key of the aggregation proxy client
--requestheader-allowed-names: common names allowed in request headers
--requestheader-group-headers: group headers in the request
--requestheader-extra-headers-prefix: prefix for extra request headers
--requestheader-username-headers: username headers in the request
--enable-aggregator-routing: enable aggregator routing
--token-auth-file: token file used for authentication
--audit-log-maxage, --audit-log-maxbackup, --audit-log-maxsize, --audit-log-path: audit log rotation and log path settings
cat > /opt/kubernetes/cfg/kube-controller-manager.conf << EOF
KUBE_CONTROLLER_MANAGER_OPTS="--v=2 \\
--bind-address=127.0.0.1 \\
--root-ca-file=/opt/kubernetes/ssl/ca.pem \\
--cluster-signing-cert-file=/opt/kubernetes/ssl/ca.pem \\
--cluster-signing-key-file=/opt/kubernetes/ssl/ca-key.pem \\
--service-account-private-key-file=/opt/kubernetes/ssl/sa.key \\
--kubeconfig=/opt/kubernetes/cfg/kube-controller-manager.kubeconfig \\
--leader-elect=true \\
--use-service-account-credentials=true \\
--node-monitor-grace-period=20s \\
--node-monitor-period=2s \\
--controllers=*,bootstrapsigner,tokencleaner \\
--allocate-node-cidrs=true \\
--service-cluster-ip-range=10.0.0.0/24 \\
--cluster-cidr=10.244.0.0/16 \\
--requestheader-client-ca-file=/opt/kubernetes/ssl/front-proxy-ca.pem \\
--cluster-signing-duration=876000h0m0s \\
--node-startup-grace-period=20s \\
--node-eviction-rate=1"
EOF
Parameter reference for kube-controller-manager.conf:
--v=2: log verbosity level 2 (levels 0-8; higher values are more verbose)
--bind-address: listen address
--root-ca-file: path of the root CA certificate used to verify other components' certificates
--cluster-signing-cert-file: certificate used to sign cluster certificates
--cluster-signing-key-file: private key used to sign cluster certificates
--service-account-private-key-file: private key used to sign service account tokens
--kubeconfig: path of the kubeconfig file containing everything needed to talk to the Kubernetes API server
--leader-elect: enable leader election so that only one controller manager instance acts as leader at a time
--use-service-account-credentials: use service account credentials for authentication and authorization
--node-monitor-grace-period: how long a running Node may be unresponsive before being marked unhealthy
--node-monitor-period: how often the node controller syncs node status
--controllers: list of controllers to enable. * enables every controller that is enabled by default, foo enables the controller named foo, and -foo disables it. The full set is: attachdetach, bootstrapsigner, cloud-node-lifecycle, clusterrole-aggregation, cronjob, csrapproving, csrcleaner, csrsigning, daemonset, deployment, disruption, endpoint, endpointslice, endpointslicemirroring, ephemeral-volume, garbagecollector, horizontalpodautoscaling, job, namespace, nodeipam, nodelifecycle, persistentvolume-binder, persistentvolume-expander, podgc, pv-protection, pvc-protection, replicaset, replicationcontroller, resourcequota, root-ca-cert-publisher, route, service, serviceaccount, serviceaccount-token, statefulset, tokencleaner, ttl, ttl-after-finished. Disabled by default: bootstrapsigner and tokencleaner.
--allocate-node-cidrs: allow Pod subnets to be allocated and assigned per node for the CNI
--service-cluster-ip-range: Service CIDR range; must match --service-cluster-ip-range in kube-apiserver.conf
--cluster-cidr: cluster Pod CIDR range; must match the CIDR used by the CNI plugin
--requestheader-client-ca-file: CA certificate used to verify client CA certificates in request headers
--cluster-signing-duration: validity period of issued certificates
--node-startup-grace-period: how long a starting node may be unresponsive before being marked unhealthy
--node-eviction-rate: rate at which pods are evicted from unhealthy nodes
KUBE_CONFIG="/opt/kubernetes/cfg/kube-controller-manager.kubeconfig"
KUBE_APISERVER="https://10.211.55.10:8443"
cd /root/k8s-cert/
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-context system:kube-controller-manager@kubernetes \
  --cluster=kubernetes \
  --user=system:kube-controller-manager \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-credentials system:kube-controller-manager \
  --client-certificate=./kube-controller-manager.pem \
  --client-key=./kube-controller-manager-key.pem \
  --embed-certs=true \
  --kubeconfig=${KUBE_CONFIG}
kubectl config use-context system:kube-controller-manager@kubernetes \
  --kubeconfig=${KUBE_CONFIG}

cat > /opt/kubernetes/cfg/kube-scheduler.conf << EOF
KUBE_SCHEDULER_OPTS="--v=2 \\
--bind-address=127.0.0.1 \\
--leader-elect=true \\
--kubeconfig=/opt/kubernetes/cfg/kube-scheduler.kubeconfig"
EOF
Parameter reference for kube-scheduler.conf:
--v=2: log verbosity level 2 (levels 0-8; higher values are more verbose)
--bind-address: listen address
--leader-elect: enable automatic leader election
--kubeconfig: path of the kubeconfig file containing everything needed to talk to the Kubernetes API server
KUBE_CONFIG="/opt/kubernetes/cfg/kube-scheduler.kubeconfig"
KUBE_APISERVER="https://10.211.55.10:8443"
cd /root/k8s-cert/
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-credentials system:kube-scheduler \
  --client-certificate=./kube-scheduler.pem \
  --client-key=./kube-scheduler-key.pem \
  --embed-certs=true \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-context system:kube-scheduler@kubernetes \
  --cluster=kubernetes \
  --user=system:kube-scheduler \
  --kubeconfig=${KUBE_CONFIG}
kubectl config use-context system:kube-scheduler@kubernetes --kubeconfig=${KUBE_CONFIG}
cat > /usr/lib/systemd/system/kube-apiserver.service << EOF
[Unit]
Description=Kubernetes API Server
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=/opt/kubernetes/cfg/kube-apiserver.conf
ExecStart=/opt/kubernetes/bin/kube-apiserver \$KUBE_APISERVER_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
cat > /usr/lib/systemd/system/kube-controller-manager.service << EOF
[Unit]
Description=Kubernetes Controller Manager
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=/opt/kubernetes/cfg/kube-controller-manager.conf
ExecStart=/opt/kubernetes/bin/kube-controller-manager \$KUBE_CONTROLLER_MANAGER_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
cat > /usr/lib/systemd/system/kube-scheduler.service << EOF
[Unit]
Description=Kubernetes Scheduler
Documentation=https://github.com/kubernetes/kubernetes

[Service]
EnvironmentFile=/opt/kubernetes/cfg/kube-scheduler.conf
ExecStart=/opt/kubernetes/bin/kube-scheduler \$KUBE_SCHEDULER_OPTS
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
Generate a random 32-character string and use it to create the token.csv file:
token=`head -c 16 /dev/urandom | od -An -t x | tr -d ' '`
echo "$token,kubelet-bootstrap,10001,'system:node-bootstrapper'" > /opt/kubernetes/cfg/token.csv

Note: the token configured here for the apiserver (the 32-character random string) must be identical to the token in the bootstrap.kubeconfig file configured later on the Node nodes.
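To avoid a mismatch, you can print the token that was just written and reuse it verbatim as the TOKEN value when generating bootstrap.kubeconfig later; a small optional check:

# The token is the first comma-separated field of token.csv
cut -d ',' -f1 /opt/kubernetes/cfg/token.csv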
Copy the kubernetes working directory, the ssl directory of the etcd working directory (certificates and keys), the Master components' service unit files, and the kubectl binary to the corresponding directories on the remaining Master nodes (the example uses easyk8s2; repeat for easyk8s3):
scp -r /opt/kubernetes/ easyk8s2:/opt/
ssh easyk8s2 mkdir -p /opt/etcd
scp -r /opt/etcd/ssl/ easyk8s2:/opt/etcd/
scp -r /usr/lib/systemd/system/{kube-apiserver,kube-controller-manager,kube-scheduler}.service easyk8s2:/usr/lib/systemd/system/
scp -r /usr/local/bin/kubectl easyk8s2:/usr/local/bin/

On the remaining Master nodes, change the --bind-address and --advertise-address parameters in /opt/kubernetes/cfg/kube-apiserver.conf to the local node's IP:
...
--bind-address=10.211.55.8 \
--advertise-address=10.211.55.8 \
...

On all Master nodes, enable kube-apiserver, kube-controller-manager, and kube-scheduler at boot and start them:
systemctl daemon-reload
systemctl enable kube-apiserver
systemctl enable kube-controller-manager
systemctl enable kube-scheduler
systemctl start kube-apiserver
systemctl start kube-controller-manager
systemctl start kube-scheduler
systemctl status kube-apiserver
systemctl status kube-controller-manager
systemctl status kube-scheduler

On master1, generate the kubeconfig file that kubectl uses to access the cluster:
mkdir -p /root/.kube
KUBE_CONFIG="/root/.kube/config"
KUBE_APISERVER="https://10.211.55.10:8443"
cd /root/k8s-cert/
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-credentials kubernetes-admin \
  --client-certificate=./admin.pem \
  --client-key=./admin-key.pem \
  --embed-certs=true \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-context kubernetes-admin@kubernetes \
  --cluster=kubernetes \
  --user=kubernetes-admin \
  --kubeconfig=${KUBE_CONFIG}
kubectl config use-context kubernetes-admin@kubernetes --kubeconfig=${KUBE_CONFIG}

Copy the /root/.kube directory to the remaining master nodes so that kubectl works on all of them:
scp -r /root/.kube/ easyk8s2:/root/

You can now run kubectl get cs to check whether the server-side components report Healthy:
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE   ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   ok
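Since ComponentStatus is deprecated, the API server's aggregated health endpoint is a more future-proof check; an optional alternative:

# Query the apiserver readiness checks directly
kubectl get --raw='/readyz?verbose'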
On any master node, authorize the kubelet-bootstrap user to request certificates:
kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap

Deploy the Node components

Install Docker
Binary download page: https://download.docker.com/linux/static/stable/
Download the Docker binary package of the desired version for your platform, upload it to the /root directory on every Node node (this article uses 25.0.5 for x86), then run the following on every Node node to install Docker:
cd /root/
tar zxf docker-25.0.5.tgz
chmod 755 docker/*
cp -a docker/* /usr/bin/
cat > /usr/lib/systemd/system/docker.service << EOF
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service containerd.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP \$MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
StartLimitBurst=3
StartLimitInterval=60s
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF
mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://lerc8rqe.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl start docker
systemctl enable docker
systemctl status docker
Install cri-dockerd
Kubernetes v1.24 removed the dockershim, and Docker Engine does not natively implement the CRI, so the two can no longer be integrated directly. Mirantis and Docker therefore created the cri-dockerd project, which bridges Docker Engine to the CRI specification and lets Docker continue to serve as a Kubernetes container runtime.
Download the cri-dockerd binary package from its Releases page, upload it to the /root directory on every Node node (this article uses version 0.3.12), then unpack and install it:
cd /root/
tar zxf cri-dockerd-0.3.12.amd64.tgz
mv cri-dockerd/cri-dockerd /usr/bin/
Create the cri-docker.service and cri-docker.socket unit files:
cat > /usr/lib/systemd/system/cri-docker.service << EOF
[Unit]
Description=CRI Interface for Docker Application Container Engine
Documentation=https://docs.mirantis.com
After=network-online.target firewalld.service docker.service
Wants=network-online.target
Requires=cri-docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd://
ExecReload=/bin/kill -s HUP \$MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
# Both the old, and new location are accepted by systemd 229 and up, so using the old location
# to make them work for either version of systemd.
StartLimitBurst=3

# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
# this option work for either version of systemd.
StartLimitInterval=60s

# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity

# Comment TasksMax if your systemd version does not support it.
# Only systemd 226 and above support this option.
TasksMax=infinity
Delegate=yes
KillMode=process

[Install]
WantedBy=multi-user.target
EOF
cat > /usr/lib/systemd/system/cri-docker.socket << EOF
[Unit]
Description=CRI Docker Socket for the API
PartOf=cri-docker.service

[Socket]
ListenStream=%t/cri-dockerd.sock
SocketMode=0660
SocketUser=root
SocketGroup=root

[Install]
WantedBy=sockets.target
EOF
Start cri-docker and enable it at boot:
systemctl daemon-reload
systemctl enable --now cri-docker.socket
systemctl enable cri-docker.service
systemctl start cri-docker.socket
systemctl start cri-docker.service
systemctl status cri-docker.socket
systemctl status cri-docker.service

Deploy kubelet and kube-proxy
Binary download page: https://github.com/kubernetes/kubernetes/releases
The CHANGELOG of each release links to the binary downloads for that version. Download the Server Binaries for your platform (they contain both the master and node components), upload the archive to the /root directory on every Node node (this article uses 1.29.3), then run the following on every Node node to install kubelet and kube-proxy:
mkdir -p /opt/kubernetes
mkdir -p /opt/kubernetes/bin
mkdir -p /opt/kubernetes/cfg
mkdir -p /opt/kubernetes/ssl
mkdir -p /opt/kubernetes/logs
cd /root/
tar zxf kubernetes-server-linux-amd64.tar.gz
cp -a /root/kubernetes/server/bin/kubelet /opt/kubernetes/bin/
cp -a /root/kubernetes/server/bin/kube-proxy /opt/kubernetes/bin/
cat > /opt/kubernetes/cfg/kubelet.conf << EOF
KUBELET_OPTS="--v=2 \\
--hostname-override=easyk8s1 \\
--bootstrap-kubeconfig=/opt/kubernetes/cfg/bootstrap.kubeconfig \\
--kubeconfig=/opt/kubernetes/cfg/kubelet.kubeconfig \\
--config=/opt/kubernetes/cfg/kubelet-config.yml \\
--cert-dir=/opt/kubernetes/ssl \\
--pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 \\
--container-runtime-endpoint=unix:///var/run/cri-dockerd.sock"
EOF
Parameter reference for kubelet.conf:
--v=2: log verbosity level 2 (levels 0-8; higher values are more verbose)
--hostname-override: the name this node registers with Kubernetes (unique within the cluster); the local hostname is recommended
--bootstrap-kubeconfig: path and name of the bootstrap kubeconfig file used to bootstrap the kubelet
--kubeconfig: path and name of the kubelet's kubeconfig file
--config: path and name of the kubelet configuration file
--cert-dir: directory where certificates are stored
--pod-infra-container-image: pause image used for the Pod network container (registry.aliyuncs.com/google_containers/pause:3.9) [optional]. This flag is deprecated and will be removed in a future release; the pause image is now taken from the CRI (for example cri-dockerd)
--container-runtime-endpoint: CRI socket path, for example unix:///var/run/cri-dockerd.sock [optional]

Note: when the kubelet starts and the file referenced by --kubeconfig does not exist, it uses the bootstrap kubeconfig given by --bootstrap-kubeconfig to request a client certificate from the API server. Once that certificate request is approved and retrieved by the kubelet, a kubeconfig referencing the generated key and issued certificate is written to the path given by --kubeconfig, and the certificate and key files are placed in the directory given by --cert-dir.
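Once bootstrapping has completed on a node, you can optionally inspect the issued kubelet client certificate to confirm its subject and validity period; a sketch (the exact file name under --cert-dir may vary by kubelet version):

# List the certificates the kubelet stored, then inspect the current client certificate
ls /opt/kubernetes/ssl/
openssl x509 -in /opt/kubernetes/ssl/kubelet-client-current.pem -noout -subject -dates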
cat > /opt/kubernetes/cfg/kubelet-config.yml << EOF
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
port: 10250
readOnlyPort: 10255
cgroupDriver: systemd
clusterDNS:
- 10.0.0.2
clusterDomain: cluster.local
failSwapOn: false
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: /opt/kubernetes/ssl/ca.pem
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
evictionHard:
  imagefs.available: 15%
  memory.available: 100Mi
  nodefs.available: 10%
  nodefs.inodesFree: 5%
maxOpenFiles: 1000000
maxPods: 110
EOF
Parameter reference for kubelet-config.yml:
clusterDNS: cluster DNS server address, normally the second IP of the range given by --service-cluster-ip-range
clusterDomain: cluster domain suffix, cluster.local by default
cat > /opt/kubernetes/cfg/kube-proxy.conf << EOF
KUBE_PROXY_OPTS="--v=2 \\
--config=/opt/kubernetes/cfg/kube-proxy-config.yml"
EOF
Parameter reference for kube-proxy.conf:
--v=2: log verbosity level 2 (levels 0-8; higher values are more verbose)
--config: path and name of the kube-proxy configuration file
cat > /opt/kubernetes/cfg/kube-proxy-config.yml << EOF
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
metricsBindAddress: 0.0.0.0:10249
clientConnection:
  kubeconfig: /opt/kubernetes/cfg/kube-proxy.kubeconfig
hostnameOverride: easyk8s1
clusterCIDR: 10.0.0.0/24
mode: ipvs
ipvs:
  scheduler: "rr"
iptables:
  masqueradeAll: true
EOF
Parameter reference for kube-proxy-config.yml:
hostnameOverride: the local hostname (unique within the cluster)
clusterCIDR: the Service cluster IP range; must match --service-cluster-ip-range in kube-apiserver.conf
mode: kube-proxy proxy mode, either ipvs or iptables
cat > /usr/lib/systemd/system/kubelet.service << EOF
[Unit]
Description=Kubernetes Kubelet
After=docker.service
Wants=docker.service

[Service]
EnvironmentFile=/opt/kubernetes/cfg/kubelet.conf
ExecStart=/opt/kubernetes/bin/kubelet \$KUBELET_OPTS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
cat > /usr/lib/systemd/system/kube-proxy.service << EOF
[Unit]
Description=Kubernetes Proxy
After=network.target

[Service]
EnvironmentFile=/opt/kubernetes/cfg/kube-proxy.conf
ExecStart=/opt/kubernetes/bin/kube-proxy \$KUBE_PROXY_OPTS
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF
Generate the bootstrap.kubeconfig and kube-proxy.kubeconfig files and copy them to every Node node (the TOKEN value must be the string written to token.csv earlier; the scp examples use easyk8s1, repeat for the other nodes):
KUBE_CONFIG="/root/k8s-cert/bootstrap.kubeconfig"
KUBE_APISERVER="https://10.211.55.10:8443"
TOKEN="16654491086d9095bd387c665efe01dd"
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-credentials kubelet-bootstrap \
  --token=${TOKEN} \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-context kubelet-bootstrap@kubernetes \
  --cluster=kubernetes \
  --user=kubelet-bootstrap \
  --kubeconfig=${KUBE_CONFIG}
kubectl config use-context kubelet-bootstrap@kubernetes --kubeconfig=${KUBE_CONFIG}
scp -r /root/k8s-cert/bootstrap.kubeconfig easyk8s1:/opt/kubernetes/cfg/bootstrap.kubeconfig

KUBE_CONFIG="/root/k8s-cert/kube-proxy.kubeconfig"
KUBE_APISERVER="https://10.211.55.10:8443"
cd /root/k8s-cert/
kubectl config set-cluster kubernetes \
  --certificate-authority=/opt/kubernetes/ssl/ca.pem \
  --embed-certs=true \
  --server=${KUBE_APISERVER} \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-credentials kube-proxy \
  --client-certificate=./kube-proxy.pem \
  --client-key=./kube-proxy-key.pem \
  --embed-certs=true \
  --kubeconfig=${KUBE_CONFIG}
kubectl config set-context kube-proxy@kubernetes \
  --cluster=kubernetes \
  --user=kube-proxy \
  --kubeconfig=${KUBE_CONFIG}
kubectl config use-context kube-proxy@kubernetes --kubeconfig=${KUBE_CONFIG}
scp -r /root/k8s-cert/kube-proxy.kubeconfig easyk8s1:/opt/kubernetes/cfg/kube-proxy.kubeconfig

From the host where the cfssl tools were installed, copy the CA certificate and the kube-proxy certificate and key to the /opt/kubernetes/ssl directory on every Node node:
cd /root/k8s-cert/
scp -r ca.pem kube-proxy.pem kube-proxy-key.pem easyk8s1:/opt/kubernetes/ssl/
Enable kubelet and kube-proxy at boot and start them:
systemctl daemon-reload
systemctl enable kubelet
systemctl enable kube-proxy
systemctl start kubelet
systemctl start kube-proxy
systemctl status kubelet
systemctl status kube-proxy
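Since kube-proxy was configured with mode: ipvs above, you can optionally confirm that IPVS virtual servers are being programmed once the node is handling Services (ipvsadm was installed during system initialization):

# Show the IPVS rule table that kube-proxy maintains
ipvsadm -Ln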
Approve certificate issuance for the Nodes
Once kubelet and kube-proxy have started successfully, running kubectl get csr on any master node shows new certificate signing requests from the nodes (their CONDITION is Pending). Approve them with:
kubectl certificate approve node-csr--p9rVRfwl6f5UvR8iRQvpiHuN53qfwMNSRVSfSjTURk

After approval, kubectl get node on any master node still shows the Node nodes as NotReady; this is expected, because no network plugin has been installed yet.

Additional notes
1. If the hostname-override setting in the kubelet or kube-proxy configuration was not updated, the master cannot retrieve the Node's information correctly after approval. Besides fixing --hostname-override in kubelet.conf and hostnameOverride in kube-proxy-config.yml, you must also delete the kubelet.kubeconfig file (it is generated automatically on the client once the master approves the certificate) before a new approval can be requested; otherwise you will see errors like:
kubelet_node_status.go:94] Unable to register node "k8s-node2" with API server: nodes "k8s-node2" is forbidden: node "k8s-node1" is not allowed to modify node "k8s-node2"

2. The TLS bootstrapping flow (kubelet).
3. How to remove a Node and rejoin it to the cluster.
On the Master node:
kubectl drain 10.211.55.9 --delete-local-data
kubectl delete node 10.211.55.9
On the Node node:
rm -rf /opt/kubernetes/cfg/kubelet.kubeconfig
rm -rf /opt/kubernetes/ssl/kubelet*
systemctl restart kubelet
systemctl restart kube-proxy
Approve the new certificate request on the Master node:
kubectl get csr
kubectl certificate approve xxxx

Authorize the apiserver to access the kubelet
For security, the kubelet rejects anonymous access, so the apiserver must be explicitly authorized. A common symptom is that kubectl logs cannot fetch pod logs, failing with an error like:
Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=proxy) ( pods/log calico-node-gnh4r)
Run the following on any master node to grant access:
cat > /root/apiserver-to-kubelet-rbac.yaml << EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:kube-apiserver-to-kubelet
rules:
- apiGroups:
  - ""
  resources:
  - nodes/proxy
  - nodes/stats
  - nodes/log
  - nodes/spec
  - nodes/metrics
  - pods/log
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:kube-apiserver
  namespace: ""
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:kube-apiserver-to-kubelet
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: kube-apiserver
EOF
kubectl apply -f /root/apiserver-to-kubelet-rbac.yaml
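Optionally, you can confirm the new binding is effective by asking the API server whether the kube-apiserver user may now reach the kubelet subresources:

# Check authorization while impersonating the kube-apiserver user
kubectl auth can-i get nodes --subresource=proxy --as=kube-apiserver
kubectl auth can-i get pods --subresource=log --as=kube-apiserver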
Deploy the CNI network
Kubernetes supports several network types; see the referenced documentation for details. This article installs the Calico network.
On any master node, download the operator manifest and apply it to create the operator:
cd /root/
wget https://raw.githubusercontent.com/projectcalico/calico/v3.27.2/manifests/tigera-operator.yaml
kubectl create -f tigera-operator.yaml

Note: the CRD bundle is large and kubectl apply may exceed the request size limit, so kubectl create is recommended.
Watch the tigera-operator pods and wait until they are all Running before continuing:
watch kubectl get pod -n tigera-operator

On any master node, download the Calico custom resources manifest:
cd /root/
wget https://raw.githubusercontent.com/projectcalico/calico/v3.27.2/manifests/custom-resources.yaml

Edit custom-resources.yaml and adjust the cidr setting to your environment.
Note: the cidr must be the same network as the --cluster-cidr parameter in /opt/kubernetes/cfg/kube-controller-manager.conf.
...
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
...

After editing, install it:
kubectl apply -f /root/custom-resources.yaml

Watch the Calico pods; once they are all Running, the CNI deployment is complete:
watch kubectl get pods -n calico-system
watch kubectl get pods -n calico-apiserver
Smoke-testing the environment
On any master node, create an nginx pod, expose it as a NodePort Service, and check that it is reachable from outside the cluster:
kubectl create deployment web --image=nginx
kubectl expose deployment web --port=80 --type=NodePort
kubectl get service

NAME   TYPE       CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
web    NodePort   10.0.0.34    <none>        80:32163/TCP   61s

Open http://<Node_IP>:32163 in a browser; if the nginx welcome page appears, the environment is working.
Note: remember to clean up the test resources after verification:
kubectl delete service web
kubectl delete deployment web
Deploy CoreDNS
CoreDNS provides DNS resolution for Kubernetes Services so that applications can reach a Service by its name.
The DNS service watches the Kubernetes API and creates a DNS record for every Service.
ClusterIP A record format: <service-name>.<namespace-name>.svc.<domain_suffix>, for example my-svc.my-namespace.svc.cluster.local.
Clusters deployed with kubeadm install CoreDNS automatically; with a binary deployment you have to install it yourself.
Download the coredns.yaml.base file from GitHub to the /root/ directory of any master node, rename it to coredns.yaml, and adjust the parameters listed below.
Replace __MACHINE_GENERATED_WARNING__ with "This is a file generated from the base underscore template file: coredns.yaml.base".
Replace __PILLAR__DNS__DOMAIN__ (or __DNS__DOMAIN__) with cluster.local. If you want a non-default domain such as koenli.net, it must match the clusterDomain parameter in /opt/kubernetes/cfg/kubelet-config.yml on the node nodes, and you must also add the domain to the hosts field of the apiserver certificate and reissue it.
Replace __PILLAR__DNS__MEMORY__LIMIT__ (or __DNS__MEMORY__LIMIT__) with 170Mi; this memory limit can be adjusted to the resources actually available in your environment.
Replace __PILLAR__DNS__SERVER__ (or __DNS__SERVER__) with 10.0.0.2; this IP must match the clusterDNS field in /opt/kubernetes/cfg/kubelet-config.yml on the Node nodes.
Note: the image registry referenced in the upstream yaml is hosted outside China and may be unreachable; replacing it with the Alibaba Cloud mirror is recommended.
Official registry: registry.k8s.io/coredns/coredns
Alibaba Cloud mirror: registry.cn-hangzhou.aliyuncs.com/google_containers/coredns
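The placeholder substitutions and the image swap can also be scripted rather than edited by hand; a sketch, assuming the placeholder names listed above appear verbatim in coredns.yaml:

cd /root/
sed -i -e 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' \
       -e 's/__DNS__DOMAIN__/cluster.local/g' \
       -e 's/__PILLAR__DNS__MEMORY__LIMIT__/170Mi/g' \
       -e 's/__DNS__MEMORY__LIMIT__/170Mi/g' \
       -e 's/__PILLAR__DNS__SERVER__/10.0.0.2/g' \
       -e 's/__DNS__SERVER__/10.0.0.2/g' \
       -e 's#registry.k8s.io/coredns/coredns#registry.cn-hangzhou.aliyuncs.com/google_containers/coredns#g' \
       coredns.yaml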
Below is my final file content after the replacements:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 apiVersion: v1 kind: ServiceAccount metadata: name: coredns namespace: kube-system labels: kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: labels: kubernetes.io/bootstrapping: rbac-defaults addonmanager.kubernetes.io/mode: Reconcile name: system:coredns rules: - apiGroups: - "" resources: - endpoints - services - pods - namespaces verbs: - list - watch - apiGroups: - discovery.k8s.io resources: - endpointslices verbs: - list - watch --- apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: annotations: rbac.authorization.kubernetes.io/autoupdate: "true" labels: kubernetes.io/bootstrapping: rbac-defaults addonmanager.kubernetes.io/mode: EnsureExists name: system:coredns roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: system:coredns subjects: - kind: ServiceAccount name: coredns namespace: kube-system --- apiVersion: v1 kind: ConfigMap metadata: name: coredns namespace: kube-system labels: addonmanager.kubernetes.io/mode: EnsureExists data: Corefile: | .:53 { errors health { lameduck 5s } ready kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure fallthrough in-addr.arpa ip6.arpa ttl 30 } prometheus :9153 forward . 
/etc/resolv.conf { max_concurrent 1000 } cache 30 loop reload loadbalance } --- apiVersion: apps/v1 kind: Deployment metadata: name: coredns namespace: kube-system labels: k8s-app: kube-dns kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile kubernetes.io/name: "CoreDNS" spec: strategy: type: RollingUpdate rollingUpdate: maxUnavailable: 1 selector: matchLabels: k8s-app: kube-dns template: metadata: labels: k8s-app: kube-dns spec: securityContext: seccompProfile: type: RuntimeDefault priorityClassName: system-cluster-critical serviceAccountName: coredns affinity: podAntiAffinity: preferredDuringSchedulingIgnoredDuringExecution: - weight: 100 podAffinityTerm: labelSelector: matchExpressions: - key: k8s-app operator: In values: ["kube-dns" ] topologyKey: kubernetes.io/hostname tolerations: - key: "CriticalAddonsOnly" operator: "Exists" nodeSelector: kubernetes.io/os: linux containers: - name: coredns image: registry.cn-hangzhou.aliyuncs.com/google_containers/coredns:v1.11.1 imagePullPolicy: IfNotPresent resources: limits: memory: 170Mi requests: cpu: 100m memory: 70Mi args: [ "-conf" , "/etc/coredns/Corefile" ] volumeMounts: - name: config-volume mountPath: /etc/coredns readOnly: true ports: - containerPort: 53 name: dns protocol: UDP - containerPort: 53 name: dns-tcp protocol: TCP - containerPort: 9153 name: metrics protocol: TCP livenessProbe: httpGet: path: /health port: 8080 scheme: HTTP initialDelaySeconds: 60 timeoutSeconds: 5 successThreshold: 1 failureThreshold: 5 readinessProbe: httpGet: path: /ready port: 8181 scheme: HTTP securityContext: allowPrivilegeEscalation: false capabilities: add: - NET_BIND_SERVICE drop: - ALL readOnlyRootFilesystem: true dnsPolicy: Default volumes: - name: config-volume configMap: name: coredns items: - key: Corefile path: Corefile --- apiVersion: v1 kind: Service metadata: name: kube-dns namespace: kube-system annotations: prometheus.io/port: "9153" prometheus.io/scrape: "true" labels: k8s-app: kube-dns kubernetes.io/cluster-service: "true" addonmanager.kubernetes.io/mode: Reconcile kubernetes.io/name: "CoreDNS" spec: selector: k8s-app: kube-dns clusterIP: 10.0 .0 .2 ports: - name: dns port: 53 protocol: UDP - name: dns-tcp port: 53 protocol: TCP - name: metrics port: 9153 protocol: TCP
Install it with:
kubectl apply -f /root/coredns.yaml

Watch the CoreDNS pods; once they are all Running, the deployment is complete:
watch kubectl get pod -n kube-system

On any master node, create a busybox pod and, inside it, resolve a Service name with nslookup; if the name resolves to an IP address, the DNS service is working.
cat > /root/bs.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox:1.28.4
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
    name: busybox
  restartPolicy: Always
EOF
kubectl apply -f /root/bs.yaml
watch kubectl get pods
kubectl exec -ti busybox sh
/ # nslookup kubernetes
Server:    10.0.0.2
Address 1: 10.0.0.2 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes
Address 1: 10.0.0.1 kubernetes.default.svc.cluster.local
/ # exit
kubectl delete -f /root/bs.yaml
Deploy the Kubernetes Dashboard
Note: Kubernetes Dashboard v7.0.0 and later can only be installed with Helm. Pick whichever version is compatible with your Kubernetes version; see the later "Install Helm" section for installing Helm itself.

v7.0.0 and later
Run the following on any master node:
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard

Note: if installing with the default values fails because kubernetes-dashboard-kong reports port 8444 already in use, install with kong TLS disabled instead:
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard --create-namespace --namespace kubernetes-dashboard --set kong.admin.tls.enabled=false

Watch the Kubernetes Dashboard pods and wait until they are all Running before continuing:
watch kubectl get pods -n kubernetes-dashboard

Run kubectl edit svc kubernetes-dashboard-kong-proxy -n kubernetes-dashboard, change the Service type to NodePort, and set a nodePort:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: kong-proxy-tls
    nodePort: 30001
    port: 443
    protocol: TCP
    targetPort: 8443
  selector:
    app.kubernetes.io/component: app
    app.kubernetes.io/instance: kubernetes-dashboard
    app.kubernetes.io/name: kong
  sessionAffinity: None
  type: NodePort
Create a service account and bind it to the built-in cluster-admin cluster role:
cat > /root/dashboard-adminuser.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
kubectl apply -f /root/dashboard-adminuser.yaml

Get a login token:
kubectl create token admin-user -n kubernetes-dashboard

Use the token printed above to log in to the Kubernetes Dashboard at https://<NODE_IP>:30001.
Before v7.0.0
On any master node, download the Kubernetes Dashboard yaml file to the /root directory.
Note: this article uses v2.2.0 as an example; get the download URL for your version from the GitHub releases page.
cd /root/
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.2.0/aio/deploy/recommended.yaml

Edit recommended.yaml, find the kubernetes-dashboard Service, and set its type to NodePort and nodePort to 30001 (or any port you prefer):
...
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001
  selector:
    k8s-app: kubernetes-dashboard
...
After editing, deploy the Kubernetes Dashboard:
kubectl apply -f /root/recommended.yaml

Watch the Kubernetes Dashboard pods and wait until they are all Running before continuing:
watch kubectl get pods -n kubernetes-dashboard

Create a service account and bind it to the built-in cluster-admin cluster role:
cat > /root/dashboard-adminuser.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
EOF
kubectl apply -f /root/dashboard-adminuser.yaml

Get the login token:
kubectl -n kubernetes-dashboard describe secret $(kubectl -n kubernetes-dashboard get secret | grep admin-user | awk '{print $1}')

Use the token printed above to log in to the Kubernetes Dashboard at https://<NODE_IP>:30001.
Note: the protocol must be HTTPS.
Note: if Chrome does not offer the "Proceed to x.x.x.x (unsafe)" option when you open the Kubernetes Dashboard, the following steps may help.
1. On the master node where the Kubernetes Dashboard was deployed, delete the default secret and recreate it from the self-signed certificates (adjust the certificate paths to your environment):
kubectl delete secret kubernetes-dashboard-certs -n kubernetes-dashboard
kubectl create secret generic kubernetes-dashboard-certs --from-file=/opt/kubernetes/ssl/apiserver-key.pem --from-file=/opt/kubernetes/ssl/apiserver.pem -n kubernetes-dashboard

2. Edit /root/recommended.yaml and specify the certificate and key files under args (search for auto-generate-certificates to jump to the right place):
args:
  - --auto-generate-certificates
  - --tls-key-file=apiserver-key.pem
  - --tls-cert-file=apiserver.pem

3. Re-apply recommended.yaml:
kubectl apply -f /root/recommended.yaml

4. Confirm that the Kubernetes Dashboard pods are Running:
watch kubectl get pods -n kubernetes-dashboard

5. Visit https://<NODE_IP>:30001 again.
At this point a highly available Kubernetes cluster with 3 Masters and 3 Nodes is fully deployed. If you also want kubectl command completion, ingress-nginx, or Helm, continue with the following sections.
Configure kubectl command completion
Run the following on all master nodes:
yum install bash-completion -y
source /usr/share/bash-completion/bash_completion
sed -i '$a\source <(kubectl completion bash)' /etc/profile
source /etc/profile
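If you also like using k as a shorthand for kubectl, completion can be wired up for the alias as well; an optional addition that is not part of the original setup:

# Add a k alias and register kubectl's completion function for it
echo "alias k=kubectl" >> /etc/profile
echo "complete -o default -F __start_kubectl k" >> /etc/profile
source /etc/profile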
Install ingress-nginx
Go to the GitHub repository, switch to the desired version, and locate the deploy/static/provider/baremetal/deploy.yaml (or deploy/static/mandatory.yaml) manifest. Download it and upload it to the /root directory of any master node, renaming it to ingress.yaml (or simply copy the file contents into a new file named ingress.yaml on the machine).
Note: this article uses 1.10.0 as an example.

Edit ingress.yaml: in Deployment.spec.template.spec enable hostNetwork and add a Service definition, as shown below.
Note: the three image references in the upstream yaml (two of them identical) point to registries hosted outside China and may be unreachable; replace them with one of the mirrors below.
Official registry: registry.k8s.io/ingress-nginx/controller
Mirrors: 1. registry.cn-hangzhou.aliyuncs.com/google_containers/nginx-ingress-controller  2. dyrnq/ingress-nginx-controller
Official registry: registry.k8s.io/ingress-nginx/kube-webhook-certgen
Mirrors: 1. registry.cn-hangzhou.aliyuncs.com/google_containers/kube-webhook-certgen  2. dyrnq/kube-webhook-certgen
...
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.10.0
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  minReadySeconds: 0
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: controller
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/part-of: ingress-nginx
        app.kubernetes.io/version: 1.10.0
    spec:
      hostNetwork: true
      containers:
      - args:
        - /nginx-ingress-controller
        - --election-id=ingress-nginx-leader
        - --controller-class=k8s.io/ingress-nginx
        ...
---
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: https
    port: 443
    targetPort: 443
    protocol: TCP
  selector:
    app.kubernetes.io/name: ingress-nginx
Install ingress-nginx:
cd /root/
kubectl apply -f ingress.yaml

Watch the ingress-nginx pods; once they are all Completed or Running, the deployment is finished:
watch kubectl get pods -n ingress-nginx

Verify ingress-nginx:
kubectl create deployment web --image=nginx
kubectl expose deployment web --port=80
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
kubectl create ingress web --class=nginx \
  --rule="test.koenli.com/*=web:80"
kubectl get pod -o wide -n ingress-nginx | grep ingress-nginx-controller
curl --resolve test.koenli.com:80:10.211.55.8 http://test.koenli.com

Note: remember to clean up the test resources after verification:
kubectl delete ingress web
kubectl delete service web
kubectl delete deployment web
Install Helm
Note: Helm has two major versions, Helm2 and Helm3, which are mutually incompatible. Install whichever one matches your Kubernetes version and business requirements.

Helm2
Download the Helm package for the desired version (2.17.0 here), upload it to the /root/helm directory on all master nodes (create the directory first if it does not exist), and install the Helm client:
cd /root/helm/
tar zxf helm-v2.17.0-linux-amd64.tar.gz
cd linux-amd64/
cp -a helm /usr/local/bin/

Create the Tiller RBAC manifest and apply it:
cat > /root/helm/tiller-rbac.yaml << EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller-cluster-rule
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
EOF
kubectl apply -f /root/helm/tiller-rbac.yaml
On one of the master nodes, initialize the Helm server side:
helm init --service-account tiller --skip-refresh

Check the Tiller pod:
kubectl get pods -n kube-system | grep tiller

Once it is Running, run helm version; output like the following means both the Helm client and server are installed:
Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
Note: if the pod STATUS is ImagePullBackOff, the image pull failed. You can run kubectl edit pods tiller-deploy-xxxxxxxx -n kube-system to edit Tiller's deployment and switch the image to sapcc/tiller:[tag], which mirrors https://gcr.io/kubernetes-helm/tiller/
...
image: sapcc/tiller:v2.17.0
...
After saving, check the Tiller pod again and, once it is Running, run helm version to confirm the installation:
kubectl get pods -n kube-system | grep tiller

Note: if helm version fails with an error like the following:
Client: &version.Version{SemVer:"v2.17.0", GitCommit:"a690bad98af45b015bd3da1a41f6218b1a451dbe", GitTreeState:"clean"}
E1213 15:58:40.605638 10274 portforward.go:400] an error occurred forwarding 34583 -> 44134: error forwarding port 44134 to pod 1e92153b279110f9464193c4ea7d6314ac69e70ce60e7319df9443e379b52ed4, uid : unable to do port forwarding: socat not found
Fix: install socat on all node nodes.
Helm3
Download the Helm package for the desired version (3.14.4 here), upload it to the /root/helm directory on all master nodes (create the directory first if it does not exist), then install Helm and verify it:
cd /root/helm/
tar zxf helm-v3.14.4-linux-amd64.tar.gz
cd linux-amd64/
cp -a helm /usr/local/bin/
helm version
Troubleshooting

Problem 1
Symptom: while deploying the Calico network the pods stay Pending, their Events show nothing, and the /var/log/message log on the Node nodes contains errors like:
Apr 16 09:28:38 easyk8s1 kube-scheduler: E0416 09:28:38.406230 529 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:anonymous" cannot list resource "pods" in API group "" at the cluster scope
Apr 16 09:28:59 easyk8s1 kube-scheduler: E0416 09:28:59.968506 529 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: csidrivers.storage.k8s.io is forbidden: User "system:anonymous" cannot list resource "csidrivers" in API group "storage.k8s.io" at the cluster scope
Apr 16 09:29:02 easyk8s1 kube-scheduler: E0416 09:29:02.490493 529 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.ReplicationController: failed to list *v1.ReplicationController: replicationcontrollers is forbidden: User "system:anonymous" cannot list resource "replicationcontrollers" in API group "" at the cluster scope
Apr 16 09:29:03 easyk8s1 kube-scheduler: E0416 09:29:03.649892 529 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:159: Failed to watch *v1.Node: failed to list *v1.Node: nodes is forbidden: User "system:anonymous" cannot list resource "nodes" in API group "" at the cluster scope
Apr 16 09:29:08 easyk8s1 kube-scheduler: W0416 09:29:08.086045 529 reflector.go:539] vendor/k8s.io/client-go/informers/factory.go:159: failed to list *v1.StatefulSet: statefulsets.apps is forbidden: User "system:anonymous" cannot list resource "statefulsets" in API group "apps" at the cluster scope

Fix: grant the anonymous user permissions:
kubectl create clusterrolebinding system:anonymous --clusterrole=cluster-admin --user=system:anonymous

Problem 2
Symptom: while deploying the Calico network the pods never reach Running, and their Events show errors like:
Warning FailedCreatePodSandBox 40s (x6 over 4m29s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "registry.k8s.io/pause:3.9": Error response from daemon: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.9": dial tcp 173.194.174.82:443: i/o timeout
Analysis: the Event shows that pulling registry.k8s.io/pause:3.9 failed. But /opt/kubernetes/cfg/kubelet.conf explicitly sets --pod-infra-container-image to an image from registry.aliyuncs.com/google_containers/pause:${VERSION}; why is the image still pulled from registry.k8s.io? The reason is that --pod-infra-container-image is deprecated and will be removed in a future release: the pause image is now taken from the CRI (here cri-dockerd). So when using cri-dockerd you must also pass --pod-infra-container-image in its service unit; otherwise it falls back to pulling from k8s.gcr.io or registry.k8s.io.
Fix: edit /usr/lib/systemd/system/cri-docker.service and append --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9 to the ExecStart=/usr/bin/cri-dockerd line to point at the desired pause image.
Note: choose the pause image version reported in the Event error message.
...
[Service]
Type=notify
ExecStart=/usr/bin/cri-dockerd --container-runtime-endpoint fd:// --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
...

Restart cri-dockerd:
systemctl daemon-reload
systemctl restart cri-docker.socket
systemctl restart cri-docker.service
systemctl status cri-docker.socket
systemctl status cri-docker.service

References