After upgrading Kubernetes from 1.18.13 to 1.19.5, I randomly get the error below for some pods. After some time the pod fails to start (it's a simple pod that doesn't belong to a deployment):
Warning FailedMount 99s kubelet Unable to attach or mount volumes: unmounted volumes=[red-tmp data logs docker src red-conf], unattached volumes=[red-tmp data logs docker src red-conf]: timed out waiting for the condition
We have a pretty standard setup without any special customizations.
Example of pod definition:
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: provision
    ver: latest
  name: provision
  namespace: red
spec:
  containers:
  - args:
    - wait
    command:
    - provision.sh
    image: app-tests
    imagePullPolicy: IfNotPresent
    name: provision
    volumeMounts:
    - mountPath: /opt/app/be
      name: src
    - mountPath: /opt/app/be/conf
      name: red-conf
    - mountPath: /opt/app/be/tmp
      name: red-tmp
    - mountPath: /var/lib/app
      name: data
    - mountPath: /var/log/app
      name: logs
    - mountPath: /var/run/docker.sock
      name: docker
  dnsConfig:
    options:
    - name: ndots
      value: "2"
  dnsPolicy: ClusterFirst
  enableServiceLinks: false
  restartPolicy: Never
  volumes:
  - hostPath:
      path: /opt/agent/projects/app-backend
      type: Directory
    name: src
  - name: red-conf
    persistentVolumeClaim:
      claimName: conf
  - name: red-tmp
    persistentVolumeClaim:
      claimName: tmp
  - name: data
    persistentVolumeClaim:
      claimName: data
  - name: logs
    persistentVolumeClaim:
      claimName: logs
  - hostPath:
      path: /var/run/docker.sock
      type: Socket
    name: docker
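To see which kinds of volumes the kubelet is still waiting on, it can help to cross-reference the names in the FailedMount event with the volume sources in the pod spec. A minimal Python sketch, assuming the pod's `spec.volumes` has been exported as JSON (for example with `kubectl get pod provision -n red -o json`) and copied in here by hand; the helper names `unmounted_volumes`, `volume_source` and `classify` are hypothetical:

```python
import re

# Volume list from the pod spec above, in the shape that
# `kubectl get pod provision -n red -o json` returns under spec.volumes.
POD_VOLUMES = [
    {"name": "src", "hostPath": {"path": "/opt/agent/projects/app-backend", "type": "Directory"}},
    {"name": "red-conf", "persistentVolumeClaim": {"claimName": "conf"}},
    {"name": "red-tmp", "persistentVolumeClaim": {"claimName": "tmp"}},
    {"name": "data", "persistentVolumeClaim": {"claimName": "data"}},
    {"name": "logs", "persistentVolumeClaim": {"claimName": "logs"}},
    {"name": "docker", "hostPath": {"path": "/var/run/docker.sock", "type": "Socket"}},
]

# The FailedMount message from the question.
EVENT = ("Unable to attach or mount volumes: unmounted volumes=[red-tmp data logs docker src red-conf], "
         "unattached volumes=[red-tmp data logs docker src red-conf]: timed out waiting for the condition")


def unmounted_volumes(message):
    """Extract the names listed in 'unmounted volumes=[...]' from a FailedMount message."""
    match = re.search(r"unmounted volumes=\[([^\]]*)\]", message)
    return match.group(1).split() if match else []


def volume_source(volume):
    """Describe what backs a pod volume: a PVC claim name or a hostPath path."""
    if "persistentVolumeClaim" in volume:
        return "pvc:" + volume["persistentVolumeClaim"]["claimName"]
    if "hostPath" in volume:
        return "hostPath:" + volume["hostPath"]["path"]
    return "other"


def classify(message, volumes):
    """Map each unmounted volume name to its source, or 'not-in-spec' if the pod lacks it."""
    by_name = {v["name"]: v for v in volumes}
    return {name: volume_source(by_name[name]) if name in by_name else "not-in-spec"
            for name in unmounted_volumes(message)}


if __name__ == "__main__":
    for name, source in classify(EVENT, POD_VOLUMES).items():
        print(f"{name:10} {source}")
```

Run against the event above, this shows that the hostPath volumes `src` and `docker` are listed as unmounted alongside the PVC-backed ones.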
PV:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: red-conf
  labels:
    namespace: red
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 2Gi
  hostPath:
    path: /var/lib/docker/k8s/red-conf
  persistentVolumeReclaimPolicy: Retain
  storageClassName: red-conf
  volumeMode: Filesystem
PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: conf
  namespace: red
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
  storageClassName: red-conf
  volumeMode: Filesystem
  volumeName: red-conf
The tmp, data and logs PVs have the same setup as conf except for the path. Each has its own folder:
/var/lib/docker/k8s/red-tmp
/var/lib/docker/k8s/red-data
/var/lib/docker/k8s/red-logs
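Because each PVC pins its PV through `volumeName`, one quick sanity check is that every pair still agrees on the fields Kubernetes compares when binding: storage class, access modes, volume mode and capacity. A simplified Python sketch, assuming the objects are available as dicts (for example from `kubectl get pv red-conf -o json` and `kubectl get pvc conf -n red -o json`). The helper names are hypothetical, and the check deliberately ignores label selectors, default storage classes and `claimRef`:

```python
# The conf PV/PVC from the question as plain dicts (fields as in their YAML above).
PV_CONF = {
    "metadata": {"name": "red-conf"},
    "spec": {"accessModes": ["ReadWriteMany"], "capacity": {"storage": "2Gi"},
             "storageClassName": "red-conf", "volumeMode": "Filesystem"},
}
PVC_CONF = {
    "metadata": {"name": "conf", "namespace": "red"},
    "spec": {"accessModes": ["ReadWriteMany"], "resources": {"requests": {"storage": "2Gi"}},
             "storageClassName": "red-conf", "volumeMode": "Filesystem",
             "volumeName": "red-conf"},
}

# Binary suffixes only; decimal suffixes such as "G" are not handled by this sketch.
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def parse_quantity(q):
    """Convert a quantity like '2Gi' to bytes; only plain integers and Ki/Mi/Gi/Ti are handled."""
    for suffix, factor in UNITS.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


def binding_problems(pv, pvc):
    """Return reasons why `pvc` might not bind to `pv`; an empty list means no mismatch found."""
    problems = []
    pv_spec, pvc_spec = pv["spec"], pvc["spec"]
    if pvc_spec.get("volumeName") not in (None, pv["metadata"]["name"]):
        problems.append("volumeName points at a different PV")
    if pv_spec.get("storageClassName", "") != pvc_spec.get("storageClassName", ""):
        problems.append("storageClassName differs")
    if not set(pvc_spec["accessModes"]) <= set(pv_spec["accessModes"]):
        problems.append("PV does not offer all requested accessModes")
    if pv_spec.get("volumeMode", "Filesystem") != pvc_spec.get("volumeMode", "Filesystem"):
        problems.append("volumeMode differs")
    requested = parse_quantity(pvc_spec["resources"]["requests"]["storage"])
    if parse_quantity(pv_spec["capacity"]["storage"]) < requested:
        problems.append("PV capacity is smaller than the request")
    return problems


if __name__ == "__main__":
    print(binding_problems(PV_CONF, PVC_CONF))
```

With the conf pair from the question it returns an empty list, meaning no mismatch on these fields.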
Currently I don't have any clue how to diagnose the issue :(
I would be glad to get some advice. Thanks in advance.
I recommend starting your troubleshooting by reviewing the VolumeAttachment objects to see which node each PV is attached to. Perhaps your volume is still linked to a node that was evicted and replaced by a new one.
You can use this command to check your PV name and status:
kubectl get pv
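If the plain table is hard to scan, the JSON output can be filtered for PVs that are not in the `Bound` phase (for example `Released` or `Failed`). A minimal Python sketch, assuming the input has the shape of `kubectl get pv -o json`; the sample data and the `unbound_pvs` helper are hypothetical and only illustrate the fields used:

```python
import json

# Hypothetical sample in the shape of `kubectl get pv -o json` (only the fields used below).
PV_LIST_JSON = json.dumps({"items": [
    {"metadata": {"name": "red-conf"},
     "spec": {"claimRef": {"namespace": "red", "name": "conf"}},
     "status": {"phase": "Bound"}},
    {"metadata": {"name": "red-tmp"},
     "spec": {"claimRef": {"namespace": "red", "name": "tmp"}},
     "status": {"phase": "Released"}},
    {"metadata": {"name": "red-logs"}, "spec": {}, "status": {"phase": "Available"}},
]})


def unbound_pvs(pv_list_json):
    """Return (PV name, phase, namespace/claim) for every PV whose status.phase is not 'Bound'."""
    result = []
    for pv in json.loads(pv_list_json)["items"]:
        phase = pv.get("status", {}).get("phase", "Unknown")
        ref = pv["spec"].get("claimRef")
        claim = f'{ref["namespace"]}/{ref["name"]}' if ref else "-"
        if phase != "Bound":
            result.append((pv["metadata"]["name"], phase, claim))
    return result


if __name__ == "__main__":
    for name, phase, claim in unbound_pvs(PV_LIST_JSON):
        print(f"{name:10} {phase:10} {claim}")
```

In practice `PV_LIST_JSON` would be replaced by the real output of `kubectl get pv -o json`.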
Then, to see which node each VolumeAttachment points to, use the following command:
kubectl get volumeattachment
Once you have the name of your PV and the node it is attached to, you can check whether the PV is tied to the correct node, or to a previous node that is no longer working or was removed. When a node fails or is removed, its pods are rescheduled onto another available node from the pool, but the old attachment can be left behind. To see which nodes are ready and running, use this command:
kubectl get nodes
If you find that your PV is tied to a node that no longer exists, delete the stale VolumeAttachment with the following command:
kubectl delete volumeattachment [csi-volumeattachment_name]
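The comparison described above (attachments versus nodes that are actually Ready) can be automated over the JSON output of the two commands. A minimal Python sketch, assuming inputs shaped like `kubectl get volumeattachment -o json` and `kubectl get nodes -o json`; the sample data, node names and PV names are hypothetical. Note that VolumeAttachment objects are typically created only for volumes that go through an attacher, such as CSI drivers, so for plain hostPath PVs like those in the question the list may well be empty:

```python
import json

# Hypothetical sample data in the shape of `kubectl get nodes -o json`
# and `kubectl get volumeattachment -o json` (only the fields used below).
NODES_JSON = json.dumps({"items": [
    {"metadata": {"name": "node-a"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-b"},
     "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
]})
ATTACHMENTS_JSON = json.dumps({"items": [
    {"metadata": {"name": "csi-aaa"},
     "spec": {"nodeName": "node-a", "source": {"persistentVolumeName": "pv-1"}}},
    {"metadata": {"name": "csi-bbb"},
     "spec": {"nodeName": "node-c", "source": {"persistentVolumeName": "pv-2"}}},
    {"metadata": {"name": "csi-ccc"},
     "spec": {"nodeName": "node-b", "source": {"persistentVolumeName": "pv-3"}}},
]})


def ready_nodes(nodes_json):
    """Names of nodes whose Ready condition has status 'True'."""
    ready = set()
    for node in json.loads(nodes_json)["items"]:
        for cond in node.get("status", {}).get("conditions", []):
            if cond["type"] == "Ready" and cond["status"] == "True":
                ready.add(node["metadata"]["name"])
    return ready


def stale_attachments(attachments_json, nodes_json):
    """(attachment, PV, node) for every VolumeAttachment whose node is missing or not Ready.

    These are only candidates for `kubectl delete volumeattachment`; check each one by hand first.
    """
    ready = ready_nodes(nodes_json)
    stale = []
    for va in json.loads(attachments_json)["items"]:
        node = va["spec"]["nodeName"]
        pv = va["spec"].get("source", {}).get("persistentVolumeName")
        if node not in ready:
            stale.append((va["metadata"]["name"], pv, node))
    return stale


if __name__ == "__main__":
    for name, pv, node in stale_attachments(ATTACHMENTS_JSON, NODES_JSON):
        print(f"candidate: {name} (PV {pv}) on node {node}")
```

Treating a `NotReady`/`Unknown` node the same as a missing one is a deliberate simplification here; a node that is only temporarily unhealthy may still legitimately hold the attachment.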
If you need to review a detailed guide for this troubleshooting, you can follow this link.