Installing Metrics Server

What is Metrics Server


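Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.

Metrics Server collects resource metrics from the kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. The Metrics API can also be accessed with kubectl top, which makes it easier to debug autoscaling.

Metrics Server is not meant for purposes other than autoscaling. For example, do not use it to forward metrics to a monitoring solution, or as the metrics source for one. In those cases, collect metrics directly from the kubelet /metrics/resource endpoint.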
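As a rough illustration of the two data paths described above, the sketch below wraps two kubectl calls: one reads node metrics from the Metrics API, the other scrapes a kubelet's /metrics/resource endpoint through the API server's node proxy. The function names are hypothetical helpers, and the calls assume a working kubeconfig with permission to read these paths.

```shell
# Read node metrics from the Metrics API (served by Metrics Server
# through the API server aggregation layer).
metrics_api_nodes() {
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}

# Scrape one kubelet's /metrics/resource endpoint via the API server
# node proxy. This is the path a monitoring system would use instead
# of Metrics Server. $1 is the node name.
kubelet_resource_metrics() {
  node="$1"
  kubectl get --raw "/api/v1/nodes/${node}/proxy/metrics/resource"
}
```

For example, kubelet_resource_metrics easyk8s1 would print Prometheus-format metrics for that node, assuming a node with that name exists.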

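Metrics Server offers:

  • A single Deployment that works on most clusters
  • Fast autoscaling, collecting metrics every 15 seconds
  • Resource efficiency, using 1 millicore of CPU and 2 MB of memory for each node in the cluster
  • Scalable support for clusters of up to 5000 nodes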
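The per-node figures above allow a back-of-the-envelope sizing. The helper below is a hypothetical sketch that assumes usage scales linearly with node count, which is a simplification; treat the result as a starting point for resource requests, not a guarantee.

```shell
# Rough linear estimate from the figures quoted above:
# 1 millicore of CPU and 2 MB of memory per node.
# $1 is the number of nodes in the cluster.
estimate_metrics_server_resources() {
  nodes="$1"
  echo "cpu=$((nodes * 1))m memory=$((nodes * 2))MB"
}
```

Calling estimate_metrics_server_resources 100 prints cpu=100m memory=200MB, which is in line with the cpu: 100m and memory: 200Mi requests in the Deployment manifest shown later in this article.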

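Use cases


Supported use cases

  • CPU/memory-based horizontal autoscaling (Horizontal Pod Autoscaler)
  • Automatically adjusting or suggesting the resources that containers need (Vertical Pod Autoscaler)

Unsupported use cases

  • Non-Kubernetes clusters
  • An accurate source of resource usage metrics
  • Horizontal autoscaling based on resources other than CPU/memory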

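Requirements


Metrics Server has specific requirements for cluster and network configuration. Not every cluster distribution meets them by default. Before using Metrics Server, make sure your cluster distribution satisfies the following:

  • kube-apiserver must be able to reach Metrics Server at its container IP address (or node IP if hostNetwork is enabled)
  • kube-apiserver must have the aggregation layer enabled
  • Nodes must have Webhook authentication and authorization enabled
  • Kubelet certificates must be signed by the cluster certificate authority (or certificate validation must be disabled by passing --kubelet-insecure-tls to Metrics Server)
  • The container runtime must implement the container metrics RPCs (or have cAdvisor support)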

Compatibility matrix


Metrics Server   Metrics API group/version   Supported Kubernetes version
0.6.x            metrics.k8s.io/v1beta1      *1.19+
0.5.x            metrics.k8s.io/v1beta1      *1.8+
0.4.x            metrics.k8s.io/v1beta1      *1.8+
0.3.x            metrics.k8s.io/v1beta1      1.8-1.21
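The matrix can be encoded as a small lookup, which may help in install scripts. The function below is a hypothetical sketch: it takes a Metrics Server series such as 0.6 and a Kubernetes version in major.minor form such as 1.19, assumes Kubernetes major version 1, and ignores whatever caveat the asterisks in the table refer to.

```shell
# Return 0 if the Metrics Server series ($1, e.g. "0.6") supports the
# Kubernetes version ($2, e.g. "1.19") according to the matrix above.
# Return 1 if unsupported, 2 if the series is not listed.
metrics_server_supports() {
  ms="$1"
  k8s_minor="${2#*.}"   # "1.19" -> "19"
  case "$ms" in
    0.6) [ "$k8s_minor" -ge 19 ] ;;
    0.5|0.4) [ "$k8s_minor" -ge 8 ] ;;
    0.3) [ "$k8s_minor" -ge 8 ] && [ "$k8s_minor" -le 21 ] ;;
    *) return 2 ;;
  esac
}
```

For example, metrics_server_supports 0.5 1.22 && echo supported prints supported, while the same check for series 0.3 fails because that series stops at 1.21.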

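Creating a Metrics Server certificate (optional)


If Metrics Server will not use an external SSL certificate, skip this step; Metrics Server generates a self-signed certificate automatically.

Change to the directory that holds the cluster CA certificate (this article uses /root/k8s-cert/ as an example) and create the metrics-server-csr.json file: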

cd /root/k8s-cert
cat > metrics-server-csr.json << EOF
{
  "CN": "metrics-server",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ShenZhen",
      "ST": "ShenZhen",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

Generate the certificate for Metrics Server:

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes metrics-server-csr.json | cfssljson -bare metrics-server

Create the Secret for Metrics Server, named metrics-server-certs, and confirm that it exists:

kubectl create secret generic metrics-server-certs --from-file=/root/k8s-cert/metrics-server-key.pem --from-file=/root/k8s-cert/metrics-server.pem -n kube-system
kubectl get secret -n kube-system | grep metrics-server-certs
kubectl get secret metrics-server-certs -n kube-system -o yaml
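Because --from-file uses each file's base name as the Secret key, and those keys become file names when the Secret is mounted, the keys need to match the file names passed to --tls-cert-file and --tls-private-key-file later on. The sketch below checks this; the function names are hypothetical, and the go-template output format is standard kubectl.

```shell
# List the data keys of a Secret, one per line.
# $1 = namespace, $2 = secret name.
secret_keys() {
  kubectl -n "$1" get secret "$2" \
    -o go-template='{{range $k, $v := .data}}{{$k}}{{"\n"}}{{end}}'
}

# Verify that metrics-server-certs holds the two file names that the
# Deployment arguments in this article refer to.
check_metrics_server_secret() {
  keys="$(secret_keys kube-system metrics-server-certs)" || return 1
  for want in metrics-server.pem metrics-server-key.pem; do
    printf '%s\n' "$keys" | grep -qxF "$want" \
      || { echo "missing key: $want"; return 1; }
  done
  echo "secret keys match the --tls-* file names"
}
```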

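Installing Metrics Server


This walkthrough installs v0.5.0 as an example.

Visit https://github.com/kubernetes-sigs/metrics-server/tree/master and select the matching tag.

Using an external SSL certificate

In the Installation section for that tag, find the install command for the version, copy the URL from the command, and download the YAML with the commands below. The commands here use the releases/latest URL; to pin v0.5.0, substitute the URL copied from that tag.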

cd /root/
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Edit components.yaml and add the lines marked with comments below. The secretName must match the name of the Metrics Server Secret created earlier.

...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        # Added: paths to the certificate and private key
        - --tls-cert-file=/certs/metrics-server.pem
        - --tls-private-key-file=/certs/metrics-server-key.pem
        image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        # Added: mount path for the Secret
        - name: metrics-server-certs
          mountPath: /certs
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
      # Added: volume backed by the Secret
      - name: metrics-server-certs
        secret:
          secretName: metrics-server-certs
...

Apply the YAML:

kubectl apply -f /root/components.yaml

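Without an external SSL certificate

In the Installation section for that tag, find the install command for the version, copy the URL from the command, and run the commands below to download and apply the YAML.

Note
Downloading the YAML file first and then applying it makes later configuration changes easier.

cd /root/
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl apply -f /root/components.yaml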
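After applying the manifest with either variant, it can help to check the rollout and the aggregated API in one step rather than polling kubectl get pod by hand. The sketch below is a hypothetical helper; it assumes the Deployment name metrics-server and the APIService name v1beta1.metrics.k8s.io from the standard components.yaml.

```shell
# Print the "Available" condition of the Metrics API registration
# ("True" once the API server can reach Metrics Server).
metrics_api_status() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Wait for the Deployment to roll out, then confirm the APIService is
# available. $1 is an optional timeout (default 120s).
wait_for_metrics_server() {
  kubectl -n kube-system rollout status deployment/metrics-server \
    --timeout="${1:-120s}" || return 1
  [ "$(metrics_api_status)" = "True" ]
}
```

If wait_for_metrics_server times out, the Pod logs are the next place to look.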

Problems encountered


After applying the YAML file, kubectl get pod -n kube-system shows that the Metrics Server Pod never becomes ready.

The Pod logs contain the following errors:

# kubectl logs -n kube-system metrics-server-68bf9d85fb-gbf5x
E0825 09:20:05.767459 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.211.55.7:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 10.211.55.7 because it doesn't contain any IP SANs" node="easyk8s1"
E0825 09:20:05.769695 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.211.55.8:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 10.211.55.8 because it doesn't contain any IP SANs" node="easyk8s2"
E0825 09:20:05.785750 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.211.55.9:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 10.211.55.9 because it doesn't contain any IP SANs" node="easyk8s3"

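The cause is that Metrics Server connects to port 10250 of the kubelet on every node to collect metrics. That port serves HTTPS, so the connection has to validate the kubelet's TLS certificate, and as the log shows, the certificate the kubelet presents contains no IP SANs for the address being dialed. There are two ways to resolve this.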
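To confirm the diagnosis, the SAN entries of a kubelet's serving certificate can be inspected directly. The sketch below is a hypothetical pair of helpers built on openssl s_client and openssl x509; it assumes port 10250 is reachable from where the command runs. An empty result means the certificate carries no SAN extension, which matches the error in the log.

```shell
# Print the SAN entries found in `openssl x509 -text` output on stdin.
# Prints nothing if the certificate has no SAN extension.
extract_sans() {
  grep -A1 'X509v3 Subject Alternative Name' | tail -n 1 \
    | sed 's/^[[:space:]]*//'
}

# Fetch the serving certificate from a kubelet and print its SANs.
# $1 = node IP or hostname, $2 = port (default 10250).
kubelet_cert_sans() {
  host="$1"
  port="${2:-10250}"
  openssl s_client -connect "${host}:${port}" </dev/null 2>/dev/null \
    | openssl x509 -noout -text | extract_sans
}
```

For example, kubelet_cert_sans 10.211.55.7 would query the first node from the log above.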

Method 1: skip verification of the CA that issued the kubelet serving certificate (not recommended in production)

Edit components.yaml and add the --kubelet-insecure-tls argument:

...
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls # Added line
...

Re-apply the YAML:

kubectl delete -f /root/components.yaml
kubectl apply -f /root/components.yaml
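As an alternative to editing the file and re-applying it, the same argument could be appended to the running Deployment with a JSON patch. This is a sketch, not the article's method: the function name is hypothetical, it assumes metrics-server is the first container in the Pod template, and a later kubectl apply of an unmodified components.yaml would undo the change.

```shell
# Append --kubelet-insecure-tls to the args of the first container in
# the metrics-server Deployment. The trailing "-" in the JSON patch
# path means "append to the end of the array".
add_kubelet_insecure_tls() {
  kubectl -n kube-system patch deployment metrics-server --type=json \
    -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}
```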

Wait a moment, then run kubectl get pod -n kube-system again to confirm that the Metrics Server Pod is ready, and run kubectl top node and kubectl top pod -A to check that resource metrics are returned.

Method 2: enable the serving certificate request flow in TLS bootstrapping (recommended in production)

On every node, edit the kubelet's config.yaml file (for clusters deployed with kubeadm this is /var/lib/kubelet/config.yaml; for binary deployments use the path from your environment, this article uses /opt/kubernetes/cfg/kubelet-config.yml as an example) and append the line serverTLSBootstrap: true.

# Cluster deployed with kubeadm
# vim /var/lib/kubelet/config.yaml
...
syncFrequency: 0s
volumeStatsAggPeriod: 0s
serverTLSBootstrap: true # Added line

# Cluster deployed from binaries
# vim /opt/kubernetes/cfg/kubelet-config.yml
...
maxOpenFiles: 1000000
maxPods: 110
serverTLSBootstrap: true # Added line
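When there are many nodes, the edit can be scripted so that running it twice does no harm. The function below is a hypothetical sketch; it assumes the setting is a top-level key and that the file ends with a newline, and it should be run with sufficient privileges on each node.

```shell
# Ensure "serverTLSBootstrap: true" is set in a kubelet config file.
# $1 is the path to the file. Safe to run repeatedly.
enable_server_tls_bootstrap() {
  config="$1"
  if grep -Eq '^serverTLSBootstrap:[[:space:]]*true[[:space:]]*(#.*)?$' "$config"; then
    echo "already enabled in $config"
  elif grep -q '^serverTLSBootstrap:' "$config"; then
    # The key exists with another value: rewrite it via a temp file.
    _tmp="$(mktemp)"
    sed 's/^serverTLSBootstrap:.*/serverTLSBootstrap: true/' "$config" > "$_tmp" \
      && cat "$_tmp" > "$config"
    rm -f "$_tmp"
    echo "updated $config"
  else
    printf 'serverTLSBootstrap: true\n' >> "$config"
    echo "appended to $config"
  fi
}
```

For a kubeadm cluster this would be called as enable_server_tls_bootstrap /var/lib/kubelet/config.yaml.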

Restart the kubelet:

systemctl restart kubelet
systemctl status kubelet

After the kubelet restarts, new CSRs appear:

# kubectl get csr
NAME        AGE    SIGNERNAME                      REQUESTOR              CONDITION
csr-5ffk2   29s    kubernetes.io/kubelet-serving   system:node:easyk8s2   Pending
csr-dh8gp   18s    kubernetes.io/kubelet-serving   system:node:easyk8s3   Pending
csr-jrgx5   107s   kubernetes.io/kubelet-serving   system:node:easyk8s1   Pending

Decoding the request field of a CSR with base64 -d and inspecting its contents shows that the certificate request now carries SAN entries.

# Pick any CSR and get its request value
kubectl get csr csr-5ffk2 -o yaml
...
request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkhqQ0J4QUlCQURBMk1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14SFRBYkJnTlZCQU1URkhONQpjM1JsYlRwdWIyUmxPbVZoYzNsck9ITXlNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVaYVhvCmxKeS9lRktnKzh6UUpLUTZsWTJXY1RLdzBFVm9MTzJlZUlKcUNVcUlCTFBZeU9iMGd5Y3Z6MHJkdy81SGI1WWYKc1h0YWVvNnk1aUhVemF0TEpxQXNNQ29HQ1NxR1NJYjNEUUVKRGpFZE1Cc3dHUVlEVlIwUkJCSXdFSUlJWldGegplV3M0Y3pLSEJBclROd2d3Q2dZSUtvWkl6ajBFQXdJRFNRQXdSZ0loQU8yN0NrRUJteHZWTFBXTXdXR3FvQ3dRCnJOYkNMS29ERHlsWDJlMTZveUtEQWlFQTFNeUNXL2VkNVNnSHJBVGtyTFh6VE5aU3RmV2lCenNmSW5Vcm1LeFAKbFo4PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
...

# Decode it and redirect the output to csr.pem
echo -n "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkhqQ0J4QUlCQURBMk1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14SFRBYkJnTlZCQU1URkhONQpjM1JsYlRwdWIyUmxPbVZoYzNsck9ITXlNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVaYVhvCmxKeS9lRktnKzh6UUpLUTZsWTJXY1RLdzBFVm9MTzJlZUlKcUNVcUlCTFBZeU9iMGd5Y3Z6MHJkdy81SGI1WWYKc1h0YWVvNnk1aUhVemF0TEpxQXNNQ29HQ1NxR1NJYjNEUUVKRGpFZE1Cc3dHUVlEVlIwUkJCSXdFSUlJWldGegplV3M0Y3pLSEJBclROd2d3Q2dZSUtvWkl6ajBFQXdJRFNRQXdSZ0loQU8yN0NrRUJteHZWTFBXTXdXR3FvQ3dRCnJOYkNMS29ERHlsWDJlMTZveUtEQWlFQTFNeUNXL2VkNVNnSHJBVGtyTFh6VE5aU3RmV2lCenNmSW5Vcm1LeFAKbFo4PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K" | base64 -d > /root/csr.pem

# Inspect the request contents
# openssl req -in /root/csr.pem -noout -text
Certificate Request:
    Data:
        Version: 0 (0x0)
        Subject: O=system:nodes, CN=system:node:easyk8s2
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:65:a5:e8:94:9c:bf:78:52:a0:fb:cc:d0:24:a4:
                    3a:95:8d:96:71:32:b0:d0:45:68:2c:ed:9e:78:82:
                    6a:09:4a:88:04:b3:d8:c8:e6:f4:83:27:2f:cf:4a:
                    dd:c3:fe:47:6f:96:1f:b1:7b:5a:7a:8e:b2:e6:21:
                    d4:cd:ab:4b:26
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        Attributes:
        Requested Extensions:
            X509v3 Subject Alternative Name:
                DNS:easyk8s2, IP Address:10.211.55.8
    Signature Algorithm: ecdsa-with-SHA256
         30:46:02:21:00:ed:bb:0a:41:01:9b:1b:d5:2c:f5:8c:c1:61:
         aa:a0:2c:10:ac:d6:c2:2c:aa:03:0f:29:57:d9:ed:7a:a3:22:
         83:02:21:00:d4:cc:82:5b:f7:9d:e5:28:07:ac:04:e4:ac:b5:
         f3:4c:d6:52:b5:f5:a2:07:3b:1f:22:75:2b:98:ac:4f:95:9f
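The copy-and-paste of the base64 string can be avoided by reading the request field with jsonpath and decoding it in a pipeline. The helper name below is hypothetical; .spec.request is the field shown in the YAML output above.

```shell
# Print the PEM-encoded certificate request of a CSR object.
# $1 is the CSR name.
csr_request_pem() {
  kubectl get csr "$1" -o jsonpath='{.spec.request}' | base64 -d
}
```

Piping the result into openssl, as in csr_request_pem csr-5ffk2 | openssl req -noout -text, should print the same request details without the intermediate /root/csr.pem file.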

Approve the CSRs:

# kubectl certificate approve csr-5ffk2 csr-dh8gp csr-jrgx5
certificatesigningrequest.certificates.k8s.io/csr-5ffk2 approved
certificatesigningrequest.certificates.k8s.io/csr-dh8gp approved
certificatesigningrequest.certificates.k8s.io/csr-jrgx5 approved
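With many nodes, typing each CSR name becomes tedious. The sketch below selects pending requests for the kubernetes.io/kubelet-serving signer and approves them in one call. The function names are hypothetical, and the awk filter assumes the column layout shown above (signer name in the third column, condition in the last). Approving requests without reviewing them weakens the security benefit of this method, so inspect the list before approving.

```shell
# Print the names of pending CSRs for the kubelet-serving signer.
pending_kubelet_serving_csrs() {
  kubectl get csr --no-headers \
    | awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Approve every CSR returned by the filter above.
approve_kubelet_serving_csrs() {
  names="$(pending_kubelet_serving_csrs)"
  [ -n "$names" ] || { echo "no pending kubelet-serving CSRs"; return 0; }
  # Word splitting is intentional: one CSR name per argument.
  kubectl certificate approve $names
}
```

Running pending_kubelet_serving_csrs on its own first gives a list to review before calling approve_kubelet_serving_csrs.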

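Once the requests are approved, each kubelet has a serving certificate signed by the CA that the API server uses.

Wait a moment, then run kubectl get pod -n kube-system again to confirm that the Metrics Server Pod is ready, and run kubectl top node and kubectl top pod -A to check that resource metrics are returned.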
