After a cluster upgrade, one of three masters can't connect back to the cluster. I have a HA cluster running in us-east-1a, us-east-1b and us-east-1c, my master that is running in us-east-1a can't join back to the cluster.
I tried to scale down the master-us-east-1a instance group to zero nodes and back it to one node but the EC2 machine starts with the same problem, can't join back to the cluster again, seems to starts with a backup or something.
I tried to connect to the master to restart the services, maybe protukube or docker, but I can't solve the problem too.
Connecting via ssh in the master I noticed that the flannel service is not running in this machine. I tried to run manually via docker without success. Seems that flannel is the network service that should be running and is not.
Thanks in advance.
attachments
> kubectl get nodes
NAME                             STATUS     ROLES    AGE   VERSION
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      node     33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      master   33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      node     33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      master   33d   v1.11.9
ip-xxx-xxx-xxx-xxx.ec2.internal  Ready      node     33d   v1.11.9
-
> sudo systemctl status kubelet
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.026553    2502 kubelet_node_status.go:441] Recording NodeHasSufficientPID event message for node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: I0110 21:00:55.027005    2502 kubelet_node_status.go:79] Attempting to register node ip-xxx-xxx-xxx-xxx.ec2.internal
Jan 10 21:00:55 ip-xxx-xxx-xxx-xxx kubelet[2502]: E0110 21:00:55.027764    2502 kubelet_node_status.go:103] Unable to register node "ip-xxx-xxx-xxx-xxx.ec2.internal" with API server: Post https://127.0.0.1/api/v1/nodes: dial tcp 127.0.0.1:443: connect: connection refused
-
> sudo docker logs k8s_kube-apiserver_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_16
F0110 20:59:35.581865       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 /registry [http://127.0.0.1:4001]    true false 1000 0xc42013c480 <nil> 5m0s 1m0s}), err (dial tcp 127.0.0.1:4001: connect: connection refused)
-
> sudo docker version
Client:
 Version:      17.03.2-ce
 API version:  1.27
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:31:19 2017
 OS/Arch:      linux/amd64
Server:
 Version:      17.03.2-ce
 API version:  1.27 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   f5ec1e2
 Built:        Tue Jun 27 02:31:19 2017
 OS/Arch:      linux/amd64
 Experimental: false
-
> kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.9", GitCommit:"16236ce91790d4c75b79f6ce96841db1c843e7d2", GitTreeState:"clean", BuildDate:"2019-03-25T06:40:24Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
The connection to the server 127.0.0.1 was refused - did you specify the right host or port?
-
> sudo docker images
REPOSITORY                           TAG                 IMAGE ID            CREATED             SIZE
protokube                            1.15.0              6b00e7216827        7 weeks ago         288 MB
k8s.gcr.io/kube-proxy                v1.11.9             e18fcce798b8        9 months ago        98.1 MB
k8s.gcr.io/kube-controller-manager   v1.11.9             634ccbd18a0f        9 months ago        155 MB
k8s.gcr.io/kube-apiserver            v1.11.9             ef9a84756d40        9 months ago        187 MB
k8s.gcr.io/kube-scheduler            v1.11.9             e00d30bd3a71        9 months ago        56.9 MB
k8s.gcr.io/pause-amd64               3.0                 99e59f495ffa        3 years ago         747 kB
kopeio/etcd-manager                  3.0.20190930        7937b67f722f        50 years ago        656 MB
-
> sudo docker ps
CONTAINER ID        IMAGE                                                                                                        COMMAND                  CREATED             STATUS              PORTS               NAMES
b4eb0ec9e6a2        k8s.gcr.io/kube-scheduler@sha256:372ab1014701f60b67a65d94f94d30d19335294d98746edcdfcb8808ed5aee3c            "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_kube-scheduler_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
8f827dc0eade        kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8                  "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_etcd-manager_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
5bebb169b8b3        k8s.gcr.io/kube-controller-manager@sha256:aa9b9dac085a65c47746fa8739cf70e9d7e9a356a836ad2ef073da0d7b136db2   "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_kube-controller-manager_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
4467d550824e        k8s.gcr.io/kube-proxy@sha256:a63c81fe4d3e9575cc0a29c4866a2975b01a07c0f473ab2cf1e88ebf78739f80                "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_kube-proxy_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
0a5c23006e18        kopeio/etcd-manager@sha256:cb0ed7c56dadbc0f4cd515906d72b30094229d6e0a9fcb7aa44e23680bf9a3a8                  "/bin/sh -c 'mkfif..."   15 hours ago        Up 15 hours                             k8s_etcd-manager_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
3efa9ae55618        k8s.gcr.io/pause-amd64:3.0                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_22cd6fe287e6f4bae556504b3245f385_0
4e451bc007ac        k8s.gcr.io/pause-amd64:3.0                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-scheduler-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_105cd5bac4edf48f265f31eb756b971a_0
7c5c301e034a        k8s.gcr.io/pause-amd64:3.0                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-apiserver-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_134d55c1b1c3bf3583911989a14353da_0
d88f075fa61f        k8s.gcr.io/pause-amd64:3.0                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_etcd-manager-main-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_a6a467f6b78a7c7bc15ec1f64799516d_0
69e8844e9c14        k8s.gcr.io/pause-amd64:3.0                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_kube-controller-manager-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_564bccf38cd14aa0f647593e69b159ab_0
05e67c2e8f98        k8s.gcr.io/pause-amd64:3.0                                                                                   "/pause"                 15 hours ago        Up 15 hours                             k8s_POD_etcd-manager-events-ip-xxx-xxx-xxx-xxx.ec2.internal_kube-system_9f2a8de168741a0263161532f42e97b4_0
eee0a4d563c0        protokube:1.15.0                                                                                             "/usr/bin/protokub..."   15 hours ago        Up 15 hours                             hungry_shirley
The Kubelet is trying to register the master node us-east-1a with an API Server endpoint https://127.0.0.1:443. I believe this should be API server endpoint of any of the other two masters. Kubelet uses kubelet.conf file to talk to the API Server to register node.Change the server in kubelet.conf file located at /etc/kubernetes to point to one of the below:
After changing kubelet.conf restart kubelet.
Edit: Since you are using etcd manager can you try the Kubernetes service unavailable / flannel issues troubleshooting step described here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With