Manually Fixing Kubernetes Issue #78421 on v1.13-v1.15

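Background


In 2019 we started moving our company's services to a microservice architecture and gradually adopted Kubernetes. The version in use at the time was 1.14.6, and for better performance kube-proxy ran in ipvs mode. We then found that after a Service was deleted, the kube-proxy log reported the ipvs rules as successfully removed, yet running ipvsadm -Ln on the node showed that the rules were still there. A quick search showed that others in the community had hit the same problem; see https://github.com/kubernetes/kubernetes/issues/78421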
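To make the symptom easier to confirm on a node, a small helper can scan ipvsadm -Ln output for a virtual server that should already be gone. This is only a sketch: the function name ipvs_has_service and the address 10.96.0.10:53 are hypothetical, and the parsing assumes the usual "TCP|UDP  ip:port scheduler" layout of virtual-server lines.

```shell
# Report whether a virtual server (ip:port) still appears in
# `ipvsadm -Ln` output read from stdin. Hypothetical helper.
ipvs_has_service() {
  awk -v vip="$1" '
    # Virtual-server lines start with the protocol; real-server lines start with "->"
    ($1 == "TCP" || $1 == "UDP") && $2 == vip { found = 1 }
    END { exit !found }
  '
}

# Example, run on a node after deleting the Service (the address is made up):
#   ipvsadm -Ln | ipvs_has_service 10.96.0.10:53 && echo "stale ipvs rule still present"
```

The helper exits with status 0 when the rule is still present, so it can be used directly in a shell condition.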

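The issue shows that the problem affects Kubernetes v1.13 through v1.15, and the fix PR for each of those versions can be found from it.

However, those PRs were only merged into 1.13.11, 1.14.7, 1.15.4 and later, while we were running 1.14.6. That left two ways to resolve the issue: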

  • Upgrade Kubernetes to 1.14.7 or later
  • Build Kubernetes 1.14.6 ourselves with the fix applied and replace the affected components

We chose the second option. This article records the build process and also serves as a tutorial on building Kubernetes yourself.
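Whether a given cluster needs the manual fix at all follows from the release numbers above. As a rough sketch, the check can be scripted; the function name has_fix is hypothetical, the thresholds are the ones quoted in this article, and sort -V (GNU coreutils) is assumed to be available.

```shell
# Exit 0 if VERSION already contains the fix, 1 if it does not,
# 2 if the minor release is outside the v1.13-v1.15 range discussed here.
has_fix() {
  hf_ver="${1#v}"                      # accept both "v1.14.6" and "1.14.6"
  case "$hf_ver" in
    1.13.*) hf_min=1.13.11 ;;
    1.14.*) hf_min=1.14.7 ;;
    1.15.*) hf_min=1.15.4 ;;
    *) return 2 ;;
  esac
  # With version sort, the threshold sorting first means VERSION >= threshold
  [ "$(printf '%s\n%s\n' "$hf_min" "$hf_ver" | sort -V | head -n 1)" = "$hf_min" ]
}

# Example:
#   has_fix v1.14.6 || echo "this release still needs the patch"
```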

Walkthrough


Requirements

  • Operating system: CentOS 7.x
  • At least 2 CPU cores and 8 GB of RAM

Getting the source

Fetch the source code, using v1.14.6 as the example:

yum install git wget -y
cd /root/
git clone --branch v1.14.6 --single-branch --depth 1 https://github.com/kubernetes/kubernetes

Check the kube-cross image tag. The Go version used to build Kubernetes corresponds to this tag, minus the trailing -1:

# cat /root/kubernetes/build/build-image/cross/VERSION
v1.12.9-1

Note
Kubernetes v1.14.6 is built with Go 1.12.9. The Go version differs between Kubernetes releases, so install exactly the version specified in this file.
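The mapping from the kube-cross tag to the Go version can be written down as a short sketch. The function name go_version_from_cross_tag is hypothetical; it only assumes the tag has the v<go-version>-<revision> shape seen above.

```shell
# Turn a kube-cross tag such as "v1.12.9-1" into the Go version "1.12.9".
go_version_from_cross_tag() {
  gv_tag="${1#v}"        # drop the leading "v"
  echo "${gv_tag%-*}"    # drop the trailing "-<revision>"
}

# Example:
#   go_version_from_cross_tag "$(cat /root/kubernetes/build/build-image/cross/VERSION)"
```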

Installing Go

Download the matching Go release from https://golang.org/dl/ and extract it:

tar zxf go1.12.9.linux-amd64.tar.gz -C /usr/local

Configure the Go environment variables:

# Quote the delimiter so that $PATH is written literally and expanded at login
cat >> /etc/profile << 'EOF'
export PATH=$PATH:/usr/local/go/bin
export GOPATH=/root/go
EOF
source /etc/profile

Verify that Go is installed correctly:

# go version
go version go1.12.9 linux/amd64
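Because the build depends on the exact Go release, the comparison can be automated instead of being read off by eye. The following is a minimal sketch; go_version_matches is a hypothetical helper that takes the expected version and the text printed by go version.

```shell
# Exit 0 if the `go version` output in $2 reports exactly version $1.
go_version_matches() {
  gm_expected="$1"
  # The third field of the output looks like "go1.12.9"; strip the "go" prefix
  gm_actual=$(printf '%s\n' "$2" | awk '{ sub(/^go/, "", $3); print $3 }')
  [ "$gm_actual" = "$gm_expected" ]
}

# Example:
#   go_version_matches 1.12.9 "$(go version)" || echo "unexpected Go version" >&2
```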

Installing Docker

Remove any existing Docker installation, if present:

yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine

Configure the docker-ce repository:

# Install the required packages: yum-utils provides yum-config-manager; device-mapper-persistent-data and lvm2 are needed by the devicemapper storage driver
yum install -y yum-utils \
device-mapper-persistent-data \
lvm2

# Add the stable repository
yum-config-manager \
--add-repo \
https://download.docker.com/linux/centos/docker-ce.repo

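Note
If sites outside mainland China are unreachable from your network, use the Alibaba Cloud mirror of the Docker repository instead:
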
wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

With the repository configured, install Docker v18.09.9:

yum install docker-ce-18.09.9 docker-ce-cli-18.09.9 containerd.io -y

Start Docker and enable it at boot:

systemctl start docker
systemctl enable docker
systemctl status docker

Configure the Alibaba Cloud registry mirror (optional):

mkdir -p /etc/docker
cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://lerc8rqe.mirror.aliyuncs.com"]
}
EOF

systemctl daemon-reload
systemctl restart docker
docker info

Preparing the build

Install patch:

yum install patch -y

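Run the command below to force the tree state to clean. Otherwise, because the source is modified after checkout, the version string of the build carries a -dirty suffix, for example v1.14.6-dirty.

Note
You can skip this command if you instead run git add and git commit after applying the patch later on.
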
cd /root/kubernetes
sed -ri 's#KUBE_GIT_TREE_STATE="dirty"#KUBE_GIT_TREE_STATE="clean"#g' hack/lib/version.sh
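If the sed expression silently matches nothing, the -dirty suffix only shows up after a long build, so a quick check right after the edit can save time. This is a sketch under the assumption that version.sh assigns the state with the literal strings used in the sed command above; tree_state_forced_clean is a hypothetical name.

```shell
# Exit 0 only if FILE no longer assigns KUBE_GIT_TREE_STATE="dirty"
# and contains at least one KUBE_GIT_TREE_STATE="clean" assignment.
tree_state_forced_clean() {
  if grep -q 'KUBE_GIT_TREE_STATE="dirty"' "$1"; then
    return 1
  fi
  grep -q 'KUBE_GIT_TREE_STATE="clean"' "$1"
}

# Example:
#   tree_state_forced_clean /root/kubernetes/hack/lib/version.sh \
#     || echo "the sed replacement did not take effect" >&2
```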

Pull the kube-cross image. The tag must match the one in /root/kubernetes/build/build-image/cross/VERSION:

docker pull k8s.gcr.io/kube-cross:v1.12.9-1

Note
If the pull fails because of network problems, pull the mirrored copy of the image through the Alibaba Cloud mirror and retag it:

docker pull mirrorgooglecontainers/kube-cross:v1.12.9-1
docker tag mirrorgooglecontainers/kube-cross:v1.12.9-1 k8s.gcr.io/kube-cross:v1.12.9-1

Applying the merged fix as a patch

The URL of the merged commit is:
https://github.com/kubernetes/kubernetes/pull/81482/commits/c92ecefd49b9a48b9868f2173e8c84b88a7816ed

Note
Remember to append .patch to the URL when running the commands below.

cd /root/kubernetes
wget https://github.com/kubernetes/kubernetes/pull/81482/commits/c92ecefd49b9a48b9868f2173e8c84b88a7816ed.patch

patch -p1 < c92ecefd49b9a48b9868f2173e8c84b88a7816ed.patch

The following output means the patch was applied successfully:

patching file pkg/proxy/ipvs/proxier.go
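The ".patch" suffix is easy to forget, so the URL can be derived from the commit URL rather than typed by hand. A minimal sketch, with commit_patch_url as a hypothetical helper name:

```shell
# Print the downloadable patch URL for a GitHub commit URL.
# Any existing ".patch" suffix is removed first, so the call is idempotent.
commit_patch_url() {
  echo "${1%.patch}.patch"
}

# Example:
#   wget "$(commit_patch_url https://github.com/kubernetes/kubernetes/pull/81482/commits/c92ecefd49b9a48b9868f2173e8c84b88a7816ed)"
```

With GNU patch, running patch -p1 --dry-run against the downloaded file first shows whether it applies cleanly without touching the source tree.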

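Building

Building the binaries

With everything in place, run the commands below to start the build.

Note

  • KUBE_BUILD_PLATFORMS sets the target platform
  • To build a single component, for example kubectl, append WHAT=cmd/kubectl to the make command
  • GOFLAGS=-v enables verbose output
  • GOGCFLAGS="-N -l" disables compiler optimizations and inlining, which makes the binaries easier to debug; it does not make them smaller, so it can be left out if you do not need to debug them
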
cd /root/kubernetes
make clean
KUBE_BUILD_PLATFORMS=linux/amd64 KUBE_GIT_VERSION=v1.14.6 ./build/run.sh make all GOFLAGS=-v GOGCFLAGS="-N -l"

Output like the following indicates the build has finished. The resulting binaries are placed under _output/dockerized/bin/linux/amd64/:

...
k8s.io/kubernetes/pkg/kubelet/cadvisor/testing
k8s.io/kubernetes/pkg/kubelet/container/testing
k8s.io/kubernetes/test/utils
k8s.io/kubernetes/pkg/kubemark
k8s.io/kubernetes/cmd/kubemark
+++ [0824 23:43:07] Placing binaries
+++ [0824 23:43:17] Syncing out of container
+++ [0824 23:43:17] Stopping any currently running rsyncd container
+++ [0824 23:43:17] Starting rsyncd container
+++ [0824 23:43:18] Running rsync
+++ [0824 23:43:25] Stopping any currently running rsyncd container
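Before copying the binaries anywhere, it is worth confirming that the version string is the expected one and carries no -dirty suffix. The sketch below assumes the binary prints a line such as "Kubernetes v1.14.6" for --version; version_is_clean is a hypothetical helper.

```shell
# Exit 0 if the version text in $1 contains the expected version $2
# and does not contain "-dirty".
version_is_clean() {
  case "$1" in
    *-dirty*) return 1 ;;
    *"$2"*)   return 0 ;;
    *)        return 1 ;;
  esac
}

# Example:
#   version_is_clean "$(_output/dockerized/bin/linux/amd64/kube-proxy --version)" v1.14.6 \
#     || echo "unexpected version string" >&2
```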

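Building the Docker images

With everything in place, run the commands below to start the build.

Note

  • KUBE_BUILD_PLATFORMS sets the target platform
  • KUBE_BUILD_CONFORMANCE=n and KUBE_BUILD_HYPERKUBE=n control whether the conformance-amd64 and hyperkube-amd64 images are built; the default is y (build them), and n here skips them
  • To build a single component, for example kubectl, append WHAT=cmd/kubectl to the make command
  • GOFLAGS=-v enables verbose output
  • GOGCFLAGS="-N -l" disables compiler optimizations and inlining, which makes the binaries easier to debug; it does not make them smaller, so it can be left out if you do not need to debug them
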
cd /root/kubernetes
make clean
KUBE_BUILD_PLATFORMS=linux/amd64 KUBE_BUILD_CONFORMANCE=n KUBE_BUILD_HYPERKUBE=n KUBE_GIT_VERSION=v1.14.6 make release-images GOFLAGS=-v GOGCFLAGS="-N -l"

Output like the following indicates the build has finished. The binaries and the Docker image tarballs are placed under _output/release-stage/server/linux-amd64/kubernetes/server/bin/:

+++ [0825 00:41:14] Syncing out of container
+++ [0825 00:41:18] Building images: linux-amd64
+++ [0825 00:41:18] Starting docker build for image: cloud-controller-manager-amd64
+++ [0825 00:41:18] Starting docker build for image: kube-apiserver-amd64
+++ [0825 00:41:18] Starting docker build for image: kube-controller-manager-amd64
+++ [0825 00:41:18] Starting docker build for image: kube-scheduler-amd64
+++ [0825 00:41:18] Starting docker build for image: kube-proxy-amd64
+++ [0825 00:41:22] Deleting docker image k8s.gcr.io/kube-scheduler:v1.14.6
+++ [0825 00:41:22] Deleting docker image k8s.gcr.io/kube-proxy:v1.14.6
+++ [0825 00:41:26] Deleting docker image k8s.gcr.io/cloud-controller-manager:v1.14.6
+++ [0825 00:41:26] Deleting docker image k8s.gcr.io/kube-controller-manager:v1.14.6
+++ [0825 00:41:27] Deleting docker image k8s.gcr.io/kube-apiserver:v1.14.6
+++ [0825 00:41:28] Docker builds done

Note
If the build fails with errors like the following, the most likely cause is that the Docker base images required by the build cannot be pulled because k8s.gcr.io is unreachable from your network:

+++ [0825 00:16:34] Syncing out of container
+++ [0825 00:16:37] Building images: linux-amd64
+++ [0825 00:16:38] Starting docker build for image: cloud-controller-manager-amd64
+++ [0825 00:16:38] Starting docker build for image: kube-apiserver-amd64
+++ [0825 00:16:38] Starting docker build for image: kube-controller-manager-amd64
+++ [0825 00:16:38] Starting docker build for image: kube-scheduler-amd64
+++ [0825 00:16:38] Starting docker build for image: kube-proxy-amd64
Sending build context to Docker daemon 39.2MB
Step 1/2 : FROM k8s.gcr.io/debian-base-amd64:v1.0.0
Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
!!! [0825 00:17:09] Call tree:
!!! [0825 00:17:09] 1: /root/kubernetes/build/lib/release.sh:231 kube::release::create_docker_images_for_server(...)
!!! [0825 00:17:09] 2: build/release-images.sh:42 kube::release::build_server_images(...)
Sending build context to Docker daemon 36.63MB
Step 1/2 : FROM k8s.gcr.io/debian-iptables-amd64:v11.0.2
Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
!!! [0825 00:17:34] Call tree:
!!! [0825 00:17:34] 1: /root/kubernetes/build/lib/release.sh:231 kube::release::create_docker_images_for_server(...)
!!! [0825 00:17:34] 2: build/release-images.sh:42 kube::release::build_server_images(...)
Sending build context to Docker daemon 115MB
Step 1/2 : FROM k8s.gcr.io/debian-base-amd64:v1.0.0
Get https://k8s.gcr.io/v2/: dial tcp: lookup k8s.gcr.io on 10.211.55.1:53: read udp 10.211.55.6:33611->10.211.55.1:53: i/o timeout
!!! [0825 00:17:49] Call tree:
!!! [0825 00:17:49] 1: /root/kubernetes/build/lib/release.sh:231 kube::release::create_docker_images_for_server(...)
!!! [0825 00:17:49] 2: build/release-images.sh:42 kube::release::build_server_images(...)
Sending build context to Docker daemon 99.87MB
Step 1/2 : FROM k8s.gcr.io/debian-base-amd64:v1.0.0
Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
!!! [0825 00:17:54] Call tree:
!!! [0825 00:17:54] 1: /root/kubernetes/build/lib/release.sh:231 kube::release::create_docker_images_for_server(...)
!!! [0825 00:17:54] 2: build/release-images.sh:42 kube::release::build_server_images(...)
Sending build context to Docker daemon 167MB
Step 1/2 : FROM k8s.gcr.io/debian-base-amd64:v1.0.0
Get https://k8s.gcr.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
!!! [0825 00:18:09] Call tree:
!!! [0825 00:18:09] 1: /root/kubernetes/build/lib/release.sh:231 kube::release::create_docker_images_for_server(...)
!!! [0825 00:18:09] 2: build/release-images.sh:42 kube::release::build_server_images(...)
!!! [0825 00:18:09] previous Docker build failed
!!! [0825 00:18:09] Call tree:
!!! [0825 00:18:09] 1: /root/kubernetes/build/lib/release.sh:231 kube::release::create_docker_images_for_server(...)
!!! [0825 00:18:09] 2: build/release-images.sh:42 kube::release::build_server_images(...)
make: *** [release-images] Error 1

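Workaround
Manually pull the corresponding images through the Alibaba Cloud mirror and retag them. Use the image names and tags that appear in your own error output:
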
docker pull mirrorgooglecontainers/debian-base-amd64:v1.0.0
docker pull mirrorgooglecontainers/debian-iptables-amd64:v11.0.2

docker tag mirrorgooglecontainers/debian-base-amd64:v1.0.0 k8s.gcr.io/debian-base-amd64:v1.0.0
docker tag mirrorgooglecontainers/debian-iptables-amd64:v11.0.2 k8s.gcr.io/debian-iptables-amd64:v11.0.2
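Since the images to pull depend on what the failed build reports, the pull and tag commands can be generated from the build log instead of being typed by hand. The sketch below only prints the commands and executes nothing; mirror_cmds_from_log is a hypothetical helper, and it assumes the log contains "FROM k8s.gcr.io/..." step lines like those shown above.

```shell
# Read a build log on stdin and print docker pull/tag commands for every
# distinct k8s.gcr.io base image named in a "FROM" step.
# $1 is the mirror repository prefix (defaults to mirrorgooglecontainers).
mirror_cmds_from_log() {
  mc_mirror="${1:-mirrorgooglecontainers}"
  awk '$0 ~ /: FROM k8s\.gcr\.io\// { print $NF }' | sort -u |
  while read -r mc_image; do
    mc_name="${mc_image#k8s.gcr.io/}"     # e.g. debian-base-amd64:v1.0.0
    echo "docker pull ${mc_mirror}/${mc_name}"
    echo "docker tag ${mc_mirror}/${mc_name} ${mc_image}"
  done
}

# Example (build.log is a saved copy of the failed build output):
#   mirror_cmds_from_log < build.log
```

Review the printed commands before running them; piping the output to sh would execute them as-is.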

Then edit /root/kubernetes/build/lib/release.sh and remove "${docker_build_opts[@]}" so that the image build stops trying to pull the base images again:

# Before
"${DOCKER[@]}" build "${docker_build_opts[@]}" -q -t "${docker_image_tag}" "${docker_build_path}" >/dev/null

# After
"${DOCKER[@]}" build -q -t "${docker_image_tag}" "${docker_build_path}" >/dev/null

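Once the build is complete, roll out the result according to how the cluster was deployed. For a binary deployment, replace the binaries on the master and worker nodes one node at a time, then restart the services. For a kubeadm deployment, load the newly built Docker images on every node, then update the image field in kube-apiserver.yaml, kube-controller-manager.yaml, and kube-scheduler.yaml under /etc/kubernetes/manifests/ accordingly; the change takes effect as soon as the files are saved. Finally, run kubectl edit daemonset kube-proxy -n kube-system to change the kube-proxy image, which also takes effect immediately.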
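For the kubeadm case, loading the image tarballs on each node can be sketched as follows. The helper only prints the docker load commands so they can be reviewed first; print_image_load_cmds is a hypothetical name, and the directory is whichever location the tarballs were copied to on the node.

```shell
# Print (do not run) a `docker load` command for each image tarball in DIR.
print_image_load_cmds() {
  for pl_tarball in "$1"/*.tar; do
    [ -e "$pl_tarball" ] || continue    # the directory had no *.tar files
    echo "docker load -i $pl_tarball"
  done
}

# Example:
#   print_image_load_cmds /root/kubernetes/_output/release-stage/server/linux-amd64/kubernetes/server/bin
```

After loading, docker images on the node should list the freshly built tags before the manifests are edited.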
