Cluster Maintenance
Upgrading Kubernetes: rolling upgrades, version skew policy
- pod eviction timeout: how long the control plane waits after a node stops responding before it considers the node dead and evicts its pods; the default is 5 minutes
To upgrade a node:
- drain all workloads from the node using
kubectl drain node-01
- this evicts the pods from the node; pods managed by a controller (for example a Deployment) are recreated on other nodes
- drain also marks the node as unschedulable, so no new pods are placed on it during the upgrade
- upgrade the node, and when it is done run
kubectl uncordon node-01
this marks the node as schedulable again. There is also kubectl cordon node-01
which only marks the node as unschedulable, without evicting the pods already running on it
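The drain, upgrade, uncordon sequence above can be sketched as one shell function. This is only a sketch under assumptions: the function name upgrade_node is made up for these notes, the upgrade command is whatever the caller passes in, and --ignore-daemonsets is added on the assumption that the node runs DaemonSet-managed pods (drain refuses to continue past them otherwise).

```shell
#!/usr/bin/env bash
# Sketch: take one node out of service, upgrade it, and put it back.

# upgrade_node NODE UPGRADE_CMD...
#   NODE        - node name as shown by `kubectl get nodes`
#   UPGRADE_CMD - hypothetical command that performs the actual upgrade
upgrade_node() {
  local node="$1"
  shift

  # Cordon the node and evict its pods.
  # --ignore-daemonsets is an assumption about the cluster (see lead-in).
  kubectl drain "$node" --ignore-daemonsets || return 1

  # Run the caller-supplied upgrade step; keep the node cordoned on failure.
  "$@" || { echo "upgrade failed; $node left cordoned" >&2; return 1; }

  # Mark the node as schedulable again.
  kubectl uncordon "$node"
}
```

Usage would look like upgrade_node node-01 followed by the real upgrade command for that node. If the upgrade step fails, the node stays cordoned so that nothing is scheduled onto a half-upgraded node.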
Backup and restore: etcd backups, cluster snapshots
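When backing up cluster state we need to back up 3 things
Resource configs
- if the resources are defined declaratively, you can restore them from the source code repository
- or you can export the resources from the cluster and apply them later (note that "all" covers only a subset of resource types, not e.g. ConfigMaps or Secrets)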
kubectl get all --all-namespaces -o yaml > all-resources.yaml
or you can use a third-party tool such as Velero
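A small sketch of the export approach above, wrapped in a function that writes a timestamped file. The function name backup_resources and the file naming are made up for these notes. Adding configmaps and secrets to the resource list is an assumption about what is worth keeping, since "all" does not include them; extend the list for other resource types you rely on.

```shell
#!/usr/bin/env bash
# Sketch: dump resource definitions from every namespace to a dated YAML file.

# backup_resources [OUT_DIR]
#   OUT_DIR - directory for the dump (defaults to the current directory)
backup_resources() {
  local out_dir="${1:-.}"
  local out_file
  out_file="$out_dir/all-resources-$(date +%Y%m%d-%H%M%S).yaml"

  # "all" is only a subset of resource types, so configmaps and secrets
  # are listed explicitly. Secrets are only base64-encoded in the dump,
  # so store the file somewhere protected.
  kubectl get all,configmaps,secrets --all-namespaces -o yaml > "$out_file" || return 1

  # Print the path so callers can pick the file up.
  echo "$out_file"
}
```

Calling backup_resources /some/backup/dir prints the path of the file it wrote.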
ETCD
- etcd usually listens for client requests on port 2379
To take a snapshot of etcd:
- save a snapshot using
etcdctl snapshot save snapshot.db
- to restore from this backup later, do the following
- stop kube-apiserver using
service kube-apiserver stop
- run
etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-from-backup
this writes the snapshot into a new data directory with fresh cluster metadata, so the restored etcd member does not accidentally join an existing cluster
- then edit the etcd configuration file
/etc/kubernetes/manifests/etcd.yaml
so that etcd uses the restored data: set --data-dir to the new directory
and point the data volume at the new directory as well
- reload the service configuration
systemctl daemon-reload
and restart etcd
service etcd restart
- start kube-apiserver using
service kube-apiserver start
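The two etcdctl steps above (save and restore) can be sketched as shell functions. This covers only the etcdctl calls, not stopping and starting kube-apiserver or editing the etcd configuration. Assumptions not taken from these notes: the function names etcd_backup and etcd_restore are made up, the endpoint is https://127.0.0.1:2379, and the certificate paths under /etc/kubernetes/pki/etcd follow a kubeadm-style layout. On a cluster with TLS enabled the save command normally needs these flags; adjust them to your setup.

```shell
#!/usr/bin/env bash
# Sketch: etcd snapshot save and restore with explicit endpoint and TLS flags.

# Assumed kubeadm-style defaults; override through the environment.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_PKI="${ETCD_PKI:-/etc/kubernetes/pki/etcd}"

# etcd_backup [SNAPSHOT_FILE]
etcd_backup() {
  local snapshot="${1:-snapshot.db}"
  # ETCDCTL_API=3 selects the v3 API, which provides the snapshot commands.
  ETCDCTL_API=3 etcdctl snapshot save "$snapshot" \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_PKI/ca.crt" \
    --cert="$ETCD_PKI/server.crt" \
    --key="$ETCD_PKI/server.key"
}

# etcd_restore [SNAPSHOT_FILE] [DATA_DIR]
etcd_restore() {
  local snapshot="${1:-snapshot.db}"
  local data_dir="${2:-/var/lib/etcd-from-backup}"
  # Restore works on the local snapshot file, so no endpoint is needed.
  ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir "$data_dir"
}
```

Typical use is etcd_backup snapshot.db on a schedule, and etcd_restore snapshot.db /var/lib/etcd-from-backup as the restore step, with the remaining steps done as listed above.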
Troubleshooting: kubectl logs, describe, exec, events, etc.