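Introduction to etcd
etcd is a distributed, consistent key-value store for shared configuration and service discovery. It is an open-source project started by CoreOS and released under the Apache license.
Note
In Kubernetes, etcd stores all data that needs to be persisted. This article focuses on etcd as used by Kubernetes: it covers common etcd commands and how to back up and restore etcd so that the Kubernetes cluster keeps running reliably.
Lab environment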
| Hostname | Spec | OS | IP address | Role | Components |
| --- | --- | --- | --- | --- | --- |
| k8s-master1 | 2 cores, 2 GB RAM | CentOS 7.5 | 10.211.55.4 | master/node | kube-apiserver kube-controller-manager kube-scheduler etcd keepalived haproxy docker kubelet kube-proxy |
| k8s-master2 | 2 cores, 2 GB RAM | CentOS 7.5 | 10.211.55.5 | master/node | kube-apiserver kube-controller-manager kube-scheduler etcd keepalived haproxy docker kubelet kube-proxy |
| k8s-master3 | 2 cores, 2 GB RAM | CentOS 7.5 | 10.211.55.6 | master/node | kube-apiserver kube-controller-manager kube-scheduler etcd keepalived haproxy docker kubelet kube-proxy |
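Common etcdctl settings
The etcdctl command
- In a kubeadm-deployed Kubernetes cluster, etcdctl lives inside the etcd pod; on the host, a matching version of the etcdctl client can be installed with yum or from the release binaries
- In a binary-deployed Kubernetes cluster, the etcdctl path depends on the actual environment
CA certificate file
- Specified with --cacert; with etcd 3.3 you must first select the v3 API (export ETCDCTL_API=3), which is the default from 3.4 onward
- In a kubeadm-deployed cluster, the default certificate directory is /etc/kubernetes/pki/etcd/
- In a binary-deployed cluster, the certificate path depends on the actual environment
Client certificate file
- Specified with --cert; with etcd 3.3 you must first select the v3 API (export ETCDCTL_API=3)
- In a kubeadm-deployed cluster, the default certificate directory is /etc/kubernetes/pki/etcd/
- In a binary-deployed cluster, the certificate path depends on the actual environment
Client key file
- Specified with --key; with etcd 3.3 you must first select the v3 API (export ETCDCTL_API=3)
- In a kubeadm-deployed cluster, the default certificate directory is /etc/kubernetes/pki/etcd/
- In a binary-deployed cluster, the certificate path depends on the actual environment
Endpoints
- Specified with --endpoints in the form https://<ip>:<port>; separate multiple nodes with commas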
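As a small illustration of the --endpoints format, the sketch below assembles the comma-separated value from a port and a list of node IPs. The helper name join_endpoints is hypothetical and not part of etcdctl; the IPs are the lab nodes from this article.

```shell
# Build an --endpoints value from a client port and a list of node IPs.
# join_endpoints is a hypothetical helper, not an etcdctl feature.
join_endpoints() {
  je_port="$1"; shift
  je_out=""
  for je_ip in "$@"; do
    # Prepend a comma only when something has already been collected
    je_out="${je_out:+${je_out},}https://${je_ip}:${je_port}"
  done
  printf '%s\n' "$je_out"
}

# The three lab nodes on the etcd client port 2379
join_endpoints 2379 10.211.55.4 10.211.55.5 10.211.55.6
```

The result can be passed directly, for example as --endpoints="$(join_endpoints 2379 10.211.55.4 10.211.55.5 10.211.55.6)".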
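Default data directory
- In a kubeadm-deployed cluster, etcd stores its data in /var/lib/etcd/ by default
- In a binary-deployed cluster, the data directory depends on the actual environment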
Common etcd commands
Note
In the example commands below, adjust the values of --cacert, --cert, --key, --endpoints and similar parameters to match your environment
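To avoid repeating these flags on every call, etcdctl v3 also reads them from environment variables with the ETCDCTL_ prefix. A minimal sketch, assuming the binary-deployment paths and endpoints used in this article:

```shell
# Export the connection settings once so later etcdctl calls can omit the flags.
# Paths and endpoints are this article's binary-deployment values; adjust as needed.
export ETCDCTL_API=3
export ETCDCTL_CACERT=/opt/etcd/ssl/ca.pem
export ETCDCTL_CERT=/opt/etcd/ssl/etcd.pem
export ETCDCTL_KEY=/opt/etcd/ssl/etcd-key.pem
export ETCDCTL_ENDPOINTS=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379

# With these set, a health check would shrink to:
#   /opt/etcd/bin/etcdctl endpoint health
```

Do not pass the same setting both as a flag and as a variable; pick one source per setting.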
Check cluster status
Binary deployment

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 endpoint health

kubeadm deployment

    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' endpoint health --cluster"

Get the etcd version
Binary deployment

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 version

kubeadm deployment

    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' version"

List all keys in etcd
Binary deployment

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 get / --prefix --keys-only

kubeadm deployment

    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' get / --prefix --keys-only"

Get a single key
Using /registry/services/endpoints/default/kubernetes as the example
Binary deployment

    export ETCDCTL_API=3

    # Print the key and its value
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 get /registry/services/endpoints/default/kubernetes

    # Print only the names of keys that start with this path
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 get /registry/services/endpoints/default/kubernetes --prefix --keys-only

Note
The output contains a few non-printable characters because etcd stores the Protocol Buffers serialization of the object, not the original JSON
kubeadm deployment

    # Print the key and its value
    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' get /registry/services/endpoints/default/kubernetes"

    # Print only the names of keys that start with this path
    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' get /registry/services/endpoints/default/kubernetes --prefix --keys-only"

Note
The output contains a few non-printable characters because etcd stores the Protocol Buffers serialization of the object, not the original JSON
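For a quick look at such a value in a terminal, the non-printable bytes can be filtered out. This is only a rough sketch for inspection: printable_only is a hypothetical helper, it discards bytes rather than decoding the protobuf message, and the sample input is made up.

```shell
# Keep only tab, newline and printable ASCII so the value is easier to read.
# printable_only is a hypothetical helper; it drops bytes, so never use its
# output as a faithful copy of the stored value.
printable_only() {
  LC_ALL=C tr -cd '\11\12\40-\176'
}

# Made-up byte string standing in for the output of `etcdctl get`
printf 'k8s\000\001v1\022Endpoints\n' | printable_only
```

In practice the helper would sit at the end of a pipeline, for example `etcdctl ... get <key> | printable_only`.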
Delete a key
Using /registry/services/endpoints/default/kubernetes as the example
Binary deployment

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 del /registry/services/endpoints/default/kubernetes

kubeadm deployment

    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' del /registry/services/endpoints/default/kubernetes"

etcd backup
Note
Backup commands may differ between etcd versions. The backup only needs to be run on one node of the cluster.
Backing up from the command line
Binary deployment

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379 snapshot save /mnt/etcd-snapshot-`date +%Y%m%d`.db

kubeadm deployment
Run the following command on any master node to check the etcd version

    kubectl exec -ti etcd-`hostname` -n kube-system -- etcdctl version

Download the binary package for that version from https://github.com/etcd-io/etcd/releases (this article uses 3.4.13) and upload it to the /root directory on every master node

    # Install etcdctl on the host
    cd /root/
    tar zxf etcd-v3.4.13-linux-amd64.tar.gz
    cp -a etcd-v3.4.13-linux-amd64/etcdctl /usr/local/bin/

    # Take the snapshot
    ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' snapshot save /mnt/etcd-snapshot-`date +%Y%m%d`.db

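A snapshot is only useful if it can be read back, so it may be worth checking each file right after saving it. The sketch below is one way to do that, assuming etcdctl 3.4 is on the PATH; check_snapshot is a hypothetical helper name. On etcd 3.5 and later the same subcommand is also offered by etcdutl.

```shell
# Refuse a snapshot file that is missing or empty, then ask etcdctl to read it.
# check_snapshot is a hypothetical helper; `snapshot status` is an etcdctl subcommand.
check_snapshot() {
  cs_file="$1"
  if [ ! -s "$cs_file" ]; then
    echo "snapshot missing or empty: $cs_file" >&2
    return 1
  fi
  # Prints the snapshot hash, revision, key count and size as a table
  ETCDCTL_API=3 etcdctl snapshot status "$cs_file" --write-out=table
}
```

Usage sketch: check_snapshot /mnt/etcd-snapshot-20210806.db, treating a non-zero exit status as a failed backup.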
etcd restore
Stop kube-apiserver and etcd
Binary deployment
On every master node in turn, run the following command to stop the kube-apiserver service

    systemctl stop kube-apiserver

Then stop the etcd service on every etcd node in turn
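The command for this step is not shown in the source. A minimal sketch, assuming etcd runs as a systemd unit named etcd, in the same way kube-apiserver does above (the unit name and the ETCD_UNIT override are assumptions):

```shell
# Stop etcd on the current node.
# The unit name "etcd" is an assumption; set ETCD_UNIT if your unit is named differently.
stop_etcd() {
  systemctl stop "${ETCD_UNIT:-etcd}"
}
```

Run stop_etcd on each etcd node, or simply run the equivalent systemctl stop command by hand.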
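kubeadm deployment
On every master node in turn, run the following command to stop kube-apiserver and etcd (both run as static pods)

    mv /etc/kubernetes/manifests/ /etc/kubernetes/manifests.bak

Back up the old data on every etcd node
Binary deployment
On every etcd node in turn, run the following command to move the old data aside (adjust the etcd data path to your environment)

    mv /var/lib/etcd/default.etcd/ /var/lib/etcd/default.etcd.bak/

kubeadm deployment
On every master node in turn, run the following command to move the old data aside

    mv /var/lib/etcd/ /var/lib/etcd.bak
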
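If a directory with the .bak name already exists from an earlier restore, mv places the data inside it instead of renaming. One way to avoid that is a timestamped suffix, sketched below; move_aside is a hypothetical helper and the suffix format is an arbitrary choice.

```shell
# Move a data directory aside under a timestamped name so repeated restores
# never collide with an older backup directory.
# move_aside is a hypothetical helper; it prints the new path on success.
move_aside() {
  ma_src="${1%/}"                                  # drop one trailing slash
  ma_dst="${ma_src}.bak-$(date +%Y%m%d%H%M%S)"
  mv "$ma_src" "$ma_dst" && printf '%s\n' "$ma_dst"
}
```

Usage sketch: move_aside /var/lib/etcd/ on a kubeadm node, or move_aside /var/lib/etcd/default.etcd/ on a binary deployment.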
Restore the data
Binary deployment
Before restoring, copy the etcd snapshot file (for example etcd-snapshot-20210806.db) to every etcd node, here assumed to be under /mnt/, then run the restore command on each etcd node in turn
Note
The parameter values must match those in each node's etcd.conf; --initial-cluster must list the name and peer URL of every member of the etcd cluster

    # Single-node etcd
    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --data-dir="/var/lib/etcd/default.etcd"

    # Three-node cluster: run on etcd-1 (10.211.55.4)
    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --name="etcd-1" \
      --initial-cluster="etcd-1=https://10.211.55.4:2380,etcd-2=https://10.211.55.5:2380,etcd-3=https://10.211.55.6:2380" \
      --initial-cluster-token="etcd-cluster" \
      --initial-advertise-peer-urls="https://10.211.55.4:2380" \
      --data-dir="/var/lib/etcd/default.etcd"

    # Three-node cluster: run on etcd-2 (10.211.55.5)
    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --name="etcd-2" \
      --initial-cluster="etcd-1=https://10.211.55.4:2380,etcd-2=https://10.211.55.5:2380,etcd-3=https://10.211.55.6:2380" \
      --initial-cluster-token="etcd-cluster" \
      --initial-advertise-peer-urls="https://10.211.55.5:2380" \
      --data-dir="/var/lib/etcd/default.etcd"

    # Three-node cluster: run on etcd-3 (10.211.55.6)
    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --name="etcd-3" \
      --initial-cluster="etcd-1=https://10.211.55.4:2380,etcd-2=https://10.211.55.5:2380,etcd-3=https://10.211.55.6:2380" \
      --initial-cluster-token="etcd-cluster" \
      --initial-advertise-peer-urls="https://10.211.55.6:2380" \
      --data-dir="/var/lib/etcd/default.etcd"

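The per-node restore commands differ only in --name and --initial-advertise-peer-urls, so hand-editing them is an easy place to slip. The sketch below prints the command for one member from shared settings; print_restore_cmd is a hypothetical helper that only prints and never runs etcdctl, and the values are the binary-deployment ones from this article.

```shell
# Settings shared by all members; values are the ones used in this article.
SNAPSHOT=/mnt/etcd-snapshot-20210806.db
DATA_DIR=/var/lib/etcd/default.etcd
INITIAL_CLUSTER="etcd-1=https://10.211.55.4:2380,etcd-2=https://10.211.55.5:2380,etcd-3=https://10.211.55.6:2380"

# Print the restore command for one member, given its name and IP.
# print_restore_cmd is a hypothetical helper; review its output before running it.
print_restore_cmd() {
  printf 'ETCDCTL_API=3 /opt/etcd/bin/etcdctl snapshot restore %s' "$SNAPSHOT"
  printf ' --name=%s' "$1"
  printf ' --initial-cluster=%s' "$INITIAL_CLUSTER"
  printf ' --initial-cluster-token=etcd-cluster'
  printf ' --initial-advertise-peer-urls=https://%s:2380' "$2"
  printf ' --data-dir=%s\n' "$DATA_DIR"
}

# Command for the second member
print_restore_cmd etcd-2 10.211.55.5
```

Keeping --initial-cluster in one variable means every member is restored with the same cluster definition.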
kubeadm deployment
Before restoring, copy the etcd snapshot file (for example etcd-snapshot-20210806.db) to every master node, here assumed to be under /mnt/, then run the restore command on each etcd node in turn

    # Single-node etcd
    export ETCDCTL_API=3
    etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --data-dir="/var/lib/etcd/"

    # Three-node cluster: run on k8s-master1 (10.211.55.4)
    export ETCDCTL_API=3
    etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --name="k8s-master1" \
      --initial-cluster="k8s-master1=https://10.211.55.4:2380,k8s-master2=https://10.211.55.5:2380,k8s-master3=https://10.211.55.6:2380" \
      --initial-cluster-token="etcd-cluster" \
      --initial-advertise-peer-urls="https://10.211.55.4:2380" \
      --data-dir="/var/lib/etcd/"

    # Three-node cluster: run on k8s-master2 (10.211.55.5)
    export ETCDCTL_API=3
    etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --name="k8s-master2" \
      --initial-cluster="k8s-master1=https://10.211.55.4:2380,k8s-master2=https://10.211.55.5:2380,k8s-master3=https://10.211.55.6:2380" \
      --initial-cluster-token="etcd-cluster" \
      --initial-advertise-peer-urls="https://10.211.55.5:2380" \
      --data-dir="/var/lib/etcd/"

    # Three-node cluster: run on k8s-master3 (10.211.55.6)
    export ETCDCTL_API=3
    etcdctl snapshot restore /mnt/etcd-snapshot-20210806.db \
      --name="k8s-master3" \
      --initial-cluster="k8s-master1=https://10.211.55.4:2380,k8s-master2=https://10.211.55.5:2380,k8s-master3=https://10.211.55.6:2380" \
      --initial-cluster-token="etcd-cluster" \
      --initial-advertise-peer-urls="https://10.211.55.6:2380" \
      --data-dir="/var/lib/etcd/"

Start etcd and kube-apiserver
Binary deployment
Start the etcd service on every etcd node in turn
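The command for this step is not shown in the source. A minimal sketch, assuming etcd runs as a systemd unit named etcd (the unit name and the ETCD_UNIT override are assumptions):

```shell
# Start etcd on the current node.
# The unit name "etcd" is an assumption; set ETCD_UNIT if your unit is named differently.
start_etcd() {
  systemctl start "${ETCD_UNIT:-etcd}"
}
```

Run start_etcd on each etcd node, or run the equivalent systemctl start command by hand, before checking the cluster status.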
Check the etcd cluster status

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl --cacert=/opt/etcd/ssl/ca.pem --cert=/opt/etcd/ssl/etcd.pem --key=/opt/etcd/ssl/etcd-key.pem --endpoints=https://10.211.55.4:2379,https://10.211.55.5:2379,https://10.211.55.6:2379 endpoint health

On every master node in turn, run the following command to start the kube-apiserver service

    systemctl start kube-apiserver

kubeadm deployment
On every master node in turn, run the following command to start etcd and kube-apiserver

    mv /etc/kubernetes/manifests.bak/ /etc/kubernetes/manifests

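After the manifests are moved back, the kubelet needs some time to recreate the static pods, so kubectl may fail at first. A small retry loop can bridge that gap; wait_for is a hypothetical helper and the attempt count and delay in the usage line are arbitrary.

```shell
# Retry a command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
  wf_attempts="$1"; wf_delay="$2"; shift 2
  wf_i=1
  while [ "$wf_i" -le "$wf_attempts" ]; do
    # Success of the wrapped command ends the loop immediately
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Usage sketch: poll the API server for up to about 5 minutes
# wait_for 60 5 kubectl get node
```

Once the call returns 0, the API server is answering requests and the status checks can run.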
Check the etcd cluster status

    kubectl exec -ti etcd-`hostname` -n kube-system -- sh -c "ETCDCTL_API=3 etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints='https://10.211.55.4:2379' endpoint health --cluster"

Check the Kubernetes cluster status

    kubectl get cs
    kubectl get node

Automatic backup
etcd holds all the data a Kubernetes cluster needs to persist, so losing it is fatal for the whole cluster. Relying on someone to run backups by hand on schedule is not realistic, so in production a shell script plus crontab can take etcd backups automatically.
Binary deployment
Create the backup directory

    mkdir -p /opt/etcd/backup

Write the backup script /opt/etcd/backup_etcd.sh with the following content:

    #!/usr/bin/env bash

    CACERT="/opt/etcd/ssl/ca.pem"
    CERT="/opt/etcd/ssl/etcd.pem"
    KEY="/opt/etcd/ssl/etcd-key.pem"
    ENDPOINTS="https://10.211.55.4:2379"

    export ETCDCTL_API=3
    /opt/etcd/bin/etcdctl \
      --cacert="${CACERT}" \
      --cert="${CERT}" \
      --key="${KEY}" \
      --endpoints="${ENDPOINTS}" \
      snapshot save /opt/etcd/backup/etcd-snapshot-`date +%Y%m%d`.db

    # Remove snapshots last modified more than 30 days ago
    find /opt/etcd/backup/ -name '*.db' -mtime +30 -exec rm -f {} \;

Configure the crontab job (add the entry with crontab -e; crontab -l should then list it). The job runs at 02:00 every day

    crontab -l
    0 2 * * * sh /opt/etcd/backup_etcd.sh

kubeadm deployment
Create the backup directory

    mkdir -p /backup

Write the backup script /backup/backup_etcd.sh with the following content:

    #!/usr/bin/env bash

    CACERT="/etc/kubernetes/pki/etcd/ca.crt"
    CERT="/etc/kubernetes/pki/etcd/server.crt"
    KEY="/etc/kubernetes/pki/etcd/server.key"
    ENDPOINTS="https://10.211.55.4:2379"

    export ETCDCTL_API=3
    etcdctl \
      --cacert="${CACERT}" \
      --cert="${CERT}" \
      --key="${KEY}" \
      --endpoints="${ENDPOINTS}" \
      snapshot save /backup/etcd-snapshot-`date +%Y%m%d`.db

    # Remove snapshots last modified more than 30 days ago
    find /backup/ -name '*.db' -mtime +30 -exec rm -f {} \;

Configure the crontab job (add the entry with crontab -e; crontab -l should then list it). The job runs at 02:00 every day

    crontab -l
    0 2 * * * sh /backup/backup_etcd.sh

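The cleanup step of the backup scripts can be isolated into a function, which makes the retention period a parameter and keeps the file pattern quoted so the shell hands it to find unexpanded. A minimal sketch; prune_snapshots is a hypothetical helper name.

```shell
# Delete *.db files in a directory that were last modified more than N days ago.
# prune_snapshots is a hypothetical helper: prune_snapshots <directory> <days>
# The quoted pattern is matched by find itself, not expanded by the shell
# against whatever happens to be in the current directory.
prune_snapshots() {
  ps_dir="$1"
  ps_days="$2"
  find "$ps_dir" -type f -name '*.db' -mtime +"$ps_days" -exec rm -f {} \;
}
```

Usage sketch: prune_snapshots /backup 30 at the end of the backup script keeps roughly the last month of snapshots.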
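Summary
Backing up a Kubernetes cluster mainly means backing up the etcd cluster. When restoring, the overall order is roughly:
stop kube-apiserver --> stop etcd --> restore etcd data --> start etcd --> start kube-apiserver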