Cluster Maintenance

Upgrading Kubernetes: rolling upgrades, version skew policy

Pod eviction timeout: how long the control plane waits after a node stops responding before it treats the node as dead and evicts its pods. The default is 5 minutes.

To upgrade a node:

Drain the node of its workloads using kubectl drain node-01
1. This evicts the pods from the node; pods managed by a controller (for example a Deployment) are recreated on other nodes
2. It also marks the node as unschedulable, so no new pods land on it during the upgrade
Perform the upgrade on the node. When it is done, run kubectl uncordon node-01 to mark the node as schedulable again. There is also kubectl cordon node-01, which only marks the node as unschedulable without evicting the pods already running on it.
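
The drain, upgrade, uncordon cycle above can be sketched as a small shell function. This is only a sketch: maintain_node is a hypothetical helper name, the upgrade step is passed in by the caller because it depends on how the cluster was installed, and --ignore-daemonsets is an assumption that is usually needed because drain does not evict DaemonSet-managed pods.

```shell
# Sketch of the drain -> upgrade -> uncordon cycle for a single node.
# maintain_node is a hypothetical helper, not part of kubectl.
# Usage: maintain_node <node-name> <upgrade command...>
maintain_node() {
  local node="$1"
  shift

  # Evict the pods and mark the node unschedulable.
  # Assumption: DaemonSet pods exist, so --ignore-daemonsets is passed.
  kubectl drain "$node" --ignore-daemonsets || return 1

  # Run the caller-supplied upgrade command; stop here if it fails so the
  # node stays cordoned for inspection.
  "$@" || return 1

  # Let the scheduler place pods on the node again.
  kubectl uncordon "$node"
}
```

For example, maintain_node node-01 followed by whatever command performs the upgrade on that node. If the drain or the upgrade fails, the node is left unschedulable on purpose, so it does not receive workloads in a half-upgraded state.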

Backup and restore: etcd backups, cluster snapshots

When backing up cluster state we need to back up 3 things

Resource configs

You can restore from the source code repository if the resources are managed the declarative way.
You can also dump the resources to apply later with kubectl get all --all-namespaces -o yaml > all-resources.yaml (note that get all covers only a subset of resource types), or use third-party tools such as Velero.
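
A minimal sketch of the resource dump as a reusable function. backup_resources, the default output directory, and the second file name are hypothetical. The extra ConfigMap and Secret dump is an addition to the notes, included because kubectl get all does not return those types.

```shell
# Sketch: dump resource definitions into a backup directory.
# backup_resources and the file names are illustrative assumptions.
# Usage: backup_resources [output-dir]
backup_resources() {
  local out_dir="${1:-./cluster-backup}"
  mkdir -p "$out_dir" || return 1

  # The command from the notes: workloads and services in every namespace.
  kubectl get all --all-namespaces -o yaml \
    > "$out_dir/all-resources.yaml" || return 1

  # "get all" skips ConfigMaps and Secrets, so dump them separately.
  # The resulting file contains secret data; store it accordingly.
  kubectl get configmaps,secrets --all-namespaces -o yaml \
    > "$out_dir/configmaps-secrets.yaml"
}
```

Other resource types that get all leaves out (custom resources, for instance) would need their own kubectl get line, which is one reason a dedicated tool can be more convenient.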

ETCD

etcd usually listens for client requests on port 2379. To take a snapshot of etcd and restore it later:
1. Export a snapshot using etcdctl snapshot save snapshot.db
2. Later, to restore from this backup, do the following
  1. Stop kube-apiserver using service kube-apiserver stop
  2. Run etcdctl snapshot restore snapshot.db --data-dir /var/lib/etcd-from-backup. This writes the restored data to a new data directory and initializes new cluster metadata, so the restored member does not conflict with the existing etcd
  3. Edit the etcd configuration (/etc/kubernetes/manifests/etcd.yaml when etcd runs as a static pod) to use the new directory: update --data-dir and the volume that points to the data directory
  4. Reload and restart etcd using systemctl daemon-reload and then service etcd restart
  5. Start kube-apiserver using service kube-apiserver start
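
The snapshot and restore commands above usually need the API version and, on a TLS-secured cluster, the endpoint and certificates. The sketch below wraps both steps. The function names are hypothetical, and the certificate paths assume a kubeadm-style layout under /etc/kubernetes/pki/etcd, so adjust them to your cluster.

```shell
# Sketch: wrappers around etcdctl snapshot save / restore.
# Assumptions: etcd v3 API, kubeadm-style certificate locations.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-https://127.0.0.1:2379}"
ETCD_PKI="${ETCD_PKI:-/etc/kubernetes/pki/etcd}"

# Usage: etcd_snapshot_save [snapshot-file]
etcd_snapshot_save() {
  local file="${1:-snapshot.db}"
  ETCDCTL_API=3 etcdctl \
    --endpoints="$ETCD_ENDPOINT" \
    --cacert="$ETCD_PKI/ca.crt" \
    --cert="$ETCD_PKI/server.crt" \
    --key="$ETCD_PKI/server.key" \
    snapshot save "$file"
}

# Usage: etcd_snapshot_restore [snapshot-file] [new-data-dir]
# Restore works on the local file only, so no endpoint or certificates.
etcd_snapshot_restore() {
  local file="${1:-snapshot.db}"
  local data_dir="${2:-/var/lib/etcd-from-backup}"
  ETCDCTL_API=3 etcdctl snapshot restore "$file" --data-dir "$data_dir"
}
```

These functions cover only steps 1 and 2.2. Stopping and starting kube-apiserver and pointing etcd at the new data directory still have to be done as described in the list.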

Troubleshooting: kubectl logs, describe, exec, events, etc.