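What Is Metrics Server
Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.
Metrics Server collects resource metrics from the kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. The Metrics API can also be queried with kubectl top, which makes autoscaling easier to debug.
Metrics Server is not meant for non-autoscaling purposes. For example, do not use it to forward metrics to a monitoring solution, or as the metrics source for one. In those cases, collect metrics directly from the kubelet /metrics/resource endpoint.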
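Metrics Server offers:
A single Deployment that works on most clusters
Fast autoscaling, collecting metrics every 15 seconds
Resource efficiency, using 1 millicore of CPU and 2 MB of memory for each node in the cluster
Scalability, supporting clusters of up to 5,000 nodes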
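
To make the per-node figures concrete, here is a minimal sketch that multiplies them by a node count. The function name estimate_metrics_server_resources is hypothetical, and the result is only a rough starting point for sizing, not a measured value.

```shell
#!/bin/sh
# Rough sizing from the per-node figures quoted above:
# 1 millicore of CPU and 2 MB of memory per node.
# The function name is illustrative, not part of any tool.
estimate_metrics_server_resources() {
  nodes=$1
  cpu_millicores=$((nodes * 1))
  memory_mb=$((nodes * 2))
  echo "cpu=${cpu_millicores}m memory=${memory_mb}MB"
}

estimate_metrics_server_resources 100
```

For 100 nodes this prints cpu=100m memory=200MB, which lines up with the resource requests in the components.yaml manifest used later in this article (cpu: 100m, memory: 200Mi).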
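Use Cases
Supported scenarios
Unsupported scenarios
Non-Kubernetes clusters
An accurate source of resource usage metrics
Horizontal autoscaling based on resources other than CPU/memory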
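Requirements
Metrics Server has specific requirements for cluster and network configuration. Not every cluster distribution meets them by default. Before using Metrics Server, make sure your distribution satisfies the following:
Metrics Server must be reachable from kube-apiserver by container IP address (or by node IP, if hostNetwork is enabled)
kube-apiserver must have the aggregation layer enabled
Nodes must have Webhook authentication and authorization enabled on the kubelet
The kubelet certificate must be signed by the cluster certificate authority (or certificate validation must be disabled by passing --kubelet-insecure-tls to Metrics Server)
The container runtime must implement the container metrics RPCs (or have cAdvisor support)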
Compatibility Matrix
Metrics Server    Metrics API group/version    Supported Kubernetes version
0.6.x             metrics.k8s.io/v1beta1       *1.19+
0.5.x             metrics.k8s.io/v1beta1       *1.8+
0.4.x             metrics.k8s.io/v1beta1       *1.8+
0.3.x             metrics.k8s.io/v1beta1       1.8-1.21
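
The matrix can be encoded as a small lookup when scripting an install. The sketch below is illustrative only: metrics_server_supports is a hypothetical helper, it takes the Kubernetes minor version as a bare number (19 for 1.19), and it ignores whatever caveat the asterisk in the table refers to.

```shell
# Returns 0 when the Kubernetes minor version falls in the range the
# matrix lists for the given Metrics Server series, 1 when it does not,
# and 2 for a series that is not in the matrix.
# Usage: metrics_server_supports <series> <k8s_minor>
metrics_server_supports() {
  series=$1
  k8s_minor=$2
  case "$series" in
    0.6) [ "$k8s_minor" -ge 19 ] ;;
    0.5|0.4) [ "$k8s_minor" -ge 8 ] ;;
    0.3) [ "$k8s_minor" -ge 8 ] && [ "$k8s_minor" -le 21 ] ;;
    *) return 2 ;;
  esac
}
```

For example, `metrics_server_supports 0.6 18` fails, while `metrics_server_supports 0.5 18` succeeds.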
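Create a Certificate for Metrics Server (Optional)
If Metrics Server will not use an external SSL certificate, skip this step; Metrics Server generates a self-signed certificate automatically.
Change to the directory that holds the cluster CA certificate (/root/k8s-cert/ in this article) and create metrics-server-csr.json: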

cd /root/k8s-cert
cat > metrics-server-csr.json << EOF
{
  "CN": "metrics-server",
  "hosts": [""],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "ShenZhen",
      "ST": "ShenZhen",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
EOF

Generate the certificate for Metrics Server:

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=kubernetes metrics-server-csr.json | cfssljson -bare metrics-server

Create the Secret for Metrics Server, named metrics-server-certs, then confirm it exists:

kubectl create secret generic metrics-server-certs --from-file=/root/k8s-cert/metrics-server-key.pem --from-file=/root/k8s-cert/metrics-server.pem -n kube-system
kubectl get secret -n kube-system | grep metrics-server-certs
kubectl get secret metrics-server-certs -n kube-system -o yaml

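Install Metrics Server
This article installs v0.5.0 as an example.
Open https://github.com/kubernetes-sigs/metrics-server/tree/master and select the matching tag.
Using an external SSL certificate
In the Installation section, find the install command for that version, copy the URL from it, and download the YAML with the command below. The URL shown here points at the latest release; to install a specific version such as v0.5.0, substitute the URL copied from that tag.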

cd /root/
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Edit components.yaml and add the parts marked "# added" below: the two --tls-* arguments, the /certs volume mount, and the Secret volume. The secretName must match the name of the Secret created earlier for Metrics Server.

...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --tls-cert-file=/certs/metrics-server.pem # added
        - --tls-private-key-file=/certs/metrics-server-key.pem # added
        image: k8s.gcr.io/metrics-server/metrics-server:v0.5.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
        - name: metrics-server-certs # added
          mountPath: /certs # added
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
      - name: metrics-server-certs # added
        secret: # added
          secretName: metrics-server-certs # added
...

Apply the YAML:

kubectl apply -f /root/components.yaml

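Without an external SSL certificate
In the Installation section, find the install command for that version, copy the URL from it, and run the commands below to download and apply the YAML.
Note: downloading the YAML file first and then applying it makes later configuration changes easier.

cd /root/
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl apply -f /root/components.yaml
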
Problems Encountered
After applying the YAML file, kubectl get pod -n kube-system shows that the Metrics Server Pod never becomes ready.
The Pod log contains the following errors:

E0825 09:20:05.767459 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.211.55.7:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 10.211.55.7 because it doesn't contain any IP SANs" node="easyk8s1"
E0825 09:20:05.769695 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.211.55.8:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 10.211.55.8 because it doesn't contain any IP SANs" node="easyk8s2"
E0825 09:20:05.785750 1 scraper.go:139] "Failed to scrape node" err="Get \"https://10.211.55.9:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 10.211.55.9 because it doesn't contain any IP SANs" node="easyk8s3"

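The cause: Metrics Server connects to port 10250 of the kubelet on every node to fetch metrics. That port serves HTTPS, so the connection has to validate the kubelet's TLS certificate, and as the log shows, the certificate each kubelet presents contains no IP SANs, so validation against the node IP fails. There are two ways to fix this.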
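
On a larger cluster it helps to know which nodes are affected. The sketch below filters log lines of the form shown above and prints the node names. The helper name failed_x509_nodes is hypothetical, and the pattern assumes the log format matches the sample lines in this article.

```shell
# Reads Metrics Server log lines on stdin and prints each node whose
# scrape failed with an x509 certificate error, one name per line.
# Assumes each failing line ends with node="<name>" as in the sample log.
failed_x509_nodes() {
  grep 'Failed to scrape node' \
    | grep 'x509:' \
    | sed -n 's/.* node="\([^"]*\)".*/\1/p' \
    | sort -u
}
```

One possible use is `kubectl logs -n kube-system deployment/metrics-server | failed_x509_nodes`.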
Method 1: skip verification of the CA behind the kubelet serving certificate (not recommended for production)
Edit components.yaml and add the --kubelet-insecure-tls argument:

...
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
...

Re-apply the YAML:

kubectl delete -f /root/components.yaml
kubectl apply -f /root/components.yaml

Wait a moment, run kubectl get pod -n kube-system again to confirm that the Metrics Server Pod is ready, then run kubectl top node and kubectl top pod -A to check that resource metrics are returned.
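Method 2: enable serving-certificate requests in the kubelet TLS bootstrap flow (recommended for production)
On every node, edit the kubelet config.yaml file and append the line serverTLSBootstrap: true at the end. For clusters deployed with kubeadm the file is /var/lib/kubelet/config.yaml; for clusters deployed from binaries the path depends on your environment (this article uses /opt/kubernetes/cfg/kubelet-config.yml). The two listings below show the end of the file in each case:

...
syncFrequency: 0s
volumeStatsAggPeriod: 0s
serverTLSBootstrap: true

...
maxOpenFiles: 1000000
maxPods: 110
serverTLSBootstrap: true
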
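
Because this edit has to be repeated on every node, a small idempotent helper can be convenient. The following is a sketch under stated assumptions: enable_server_tls_bootstrap is a hypothetical name, the config path is passed in by the caller, and a file that already contains a top-level serverTLSBootstrap key (even one set to false) is left untouched for manual review.

```shell
# Appends "serverTLSBootstrap: true" to a kubelet config file unless a
# top-level serverTLSBootstrap key is already present.
enable_server_tls_bootstrap() {
  config_file=$1
  if grep -q '^serverTLSBootstrap:' "$config_file"; then
    echo "serverTLSBootstrap already present in $config_file"
    return 0
  fi
  # Add a newline first when the file does not end with one, so the new
  # key is not glued onto the last existing line.
  if [ -n "$(tail -c 1 "$config_file")" ]; then
    echo >> "$config_file"
  fi
  echo 'serverTLSBootstrap: true' >> "$config_file"
  echo "serverTLSBootstrap enabled in $config_file"
}
```

A call such as `enable_server_tls_bootstrap /var/lib/kubelet/config.yaml` can be run more than once without duplicating the line.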
Restart the kubelet:

systemctl restart kubelet
systemctl status kubelet

After the kubelet restarts, new CSRs appear; kubectl get csr lists them:

NAME        AGE    SIGNERNAME                      REQUESTOR              CONDITION
csr-5ffk2   29s    kubernetes.io/kubelet-serving   system:node:easyk8s2   Pending
csr-dh8gp   18s    kubernetes.io/kubelet-serving   system:node:easyk8s3   Pending
csr-jrgx5   107s   kubernetes.io/kubelet-serving   system:node:easyk8s1   Pending

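
When many nodes restart at once, the pending serving-certificate requests can be picked out of the table with a short filter. This is a sketch only: pending_kubelet_serving_csrs is a hypothetical name, and it assumes the column layout shown above, with the signer name in the third column and the condition in the last.

```shell
# Reads `kubectl get csr` table output on stdin and prints the names of
# Pending requests whose signer is kubernetes.io/kubelet-serving.
pending_kubelet_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}
```

For example, `kubectl get csr | pending_kubelet_serving_csrs` prints one CSR name per line. Review the list and the requestor of each entry before approving anything.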
Decoding the request field of a CSR with base64 -d and inspecting the result (for example with openssl req -in /root/csr.pem -noout -text) shows that the certificate request now carries SAN entries.

kubectl get csr csr-5ffk2 -o yaml
...
  request: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkhqQ0J4QUlCQURBMk1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14SFRBYkJnTlZCQU1URkhONQpjM1JsYlRwdWIyUmxPbVZoYzNsck9ITXlNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVaYVhvCmxKeS9lRktnKzh6UUpLUTZsWTJXY1RLdzBFVm9MTzJlZUlKcUNVcUlCTFBZeU9iMGd5Y3Z6MHJkdy81SGI1WWYKc1h0YWVvNnk1aUhVemF0TEpxQXNNQ29HQ1NxR1NJYjNEUUVKRGpFZE1Cc3dHUVlEVlIwUkJCSXdFSUlJWldGegplV3M0Y3pLSEJBclROd2d3Q2dZSUtvWkl6ajBFQXdJRFNRQXdSZ0loQU8yN0NrRUJteHZWTFBXTXdXR3FvQ3dRCnJOYkNMS29ERHlsWDJlMTZveUtEQWlFQTFNeUNXL2VkNVNnSHJBVGtyTFh6VE5aU3RmV2lCenNmSW5Vcm1LeFAKbFo4PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K
...

echo -n "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0KTUlJQkhqQ0J4QUlCQURBMk1SVXdFd1lEVlFRS0V3eHplWE4wWlcwNmJtOWtaWE14SFRBYkJnTlZCQU1URkhONQpjM1JsYlRwdWIyUmxPbVZoYzNsck9ITXlNRmt3RXdZSEtvWkl6ajBDQVFZSUtvWkl6ajBEQVFjRFFnQUVaYVhvCmxKeS9lRktnKzh6UUpLUTZsWTJXY1RLdzBFVm9MTzJlZUlKcUNVcUlCTFBZeU9iMGd5Y3Z6MHJkdy81SGI1WWYKc1h0YWVvNnk1aUhVemF0TEpxQXNNQ29HQ1NxR1NJYjNEUUVKRGpFZE1Cc3dHUVlEVlIwUkJCSXdFSUlJWldGegplV3M0Y3pLSEJBclROd2d3Q2dZSUtvWkl6ajBFQXdJRFNRQXdSZ0loQU8yN0NrRUJteHZWTFBXTXdXR3FvQ3dRCnJOYkNMS29ERHlsWDJlMTZveUtEQWlFQTFNeUNXL2VkNVNnSHJBVGtyTFh6VE5aU3RmV2lCenNmSW5Vcm1LeFAKbFo4PQotLS0tLUVORCBDRVJUSUZJQ0FURSBSRVFVRVNULS0tLS0K" | base64 -d > /root/csr.pem

Certificate Request:
    Data:
        Version: 0 (0x0)
        Subject: O=system:nodes, CN=system:node:easyk8s2
        Subject Public Key Info:
            Public Key Algorithm: id-ecPublicKey
                Public-Key: (256 bit)
                pub:
                    04:65:a5:e8:94:9c:bf:78:52:a0:fb:cc:d0:24:a4:
                    3a:95:8d:96:71:32:b0:d0:45:68:2c:ed:9e:78:82:
                    6a:09:4a:88:04:b3:d8:c8:e6:f4:83:27:2f:cf:4a:
                    dd:c3:fe:47:6f:96:1f:b1:7b:5a:7a:8e:b2:e6:21:
                    d4:cd:ab:4b:26
                ASN1 OID: prime256v1
                NIST CURVE: P-256
        Attributes:
        Requested Extensions:
            X509v3 Subject Alternative Name:
                DNS:easyk8s2, IP Address:10.211.55.8
    Signature Algorithm: ecdsa-with-SHA256
         30:46:02:21:00:ed:bb:0a:41:01:9b:1b:d5:2c:f5:8c:c1:61:
         aa:a0:2c:10:ac:d6:c2:2c:aa:03:0f:29:57:d9:ed:7a:a3:22:
         83:02:21:00:d4:cc:82:5b:f7:9d:e5:28:07:ac:04:e4:ac:b5:
         f3:4c:d6:52:b5:f5:a2:07:3b:1f:22:75:2b:98:ac:4f:95:9f

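
Copying the base64 string by hand is error-prone, so the extraction can be wrapped in one step. This is a minimal sketch, assuming kubectl is configured for the cluster and that base64 accepts -d as in the command above; csr_request_pem is a hypothetical name.

```shell
# Prints the PEM-encoded certificate request stored in a CSR object.
# .spec.request holds the request as a base64 string.
csr_request_pem() {
  kubectl get csr "$1" -o jsonpath='{.spec.request}' | base64 -d
}
```

The output can be piped straight into a viewer, for example `csr_request_pem csr-5ffk2 | openssl req -noout -text`.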
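Approve the CSRs (for example with kubectl certificate approve followed by the CSR names). The output looks like this:

certificatesigningrequest.certificates.k8s.io/csr-5ffk2 approved
certificatesigningrequest.certificates.k8s.io/csr-dh8gp approved
certificatesigningrequest.certificates.k8s.io/csr-jrgx5 approved
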
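Once the requests are approved, each kubelet holds a serving certificate issued by the API server's CA.
Wait a moment, run kubectl get pod -n kube-system again to confirm that the Metrics Server Pod is ready, then run kubectl top node and kubectl top pod -A to check that resource metrics are returned.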
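
The final check, which applies after either method, can be scripted instead of polling by hand. The sketch below is one way to do it: verify_metrics_server is a hypothetical name, and the 120-second timeout is an arbitrary assumption to adjust for your cluster.

```shell
# Waits for the metrics-server Deployment to finish rolling out, then
# queries the Metrics API through kubectl top.
# Stops at, and returns the status of, the first command that fails.
verify_metrics_server() {
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s \
    && kubectl top node \
    && kubectl top pod -A
}
```

If kubectl top still fails right after the rollout completes, wait a little and retry, since Metrics Server needs to collect metrics before it can serve them.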
References