Maintenance considerations
This section provides the prerequisites and guidance to follow when you have to drain Kubernetes workers dedicated to ThingPark Enterprise.
- ThingPark Enterprise 7.1.3 is required to correctly handle disruptions (see the disruption budget check below)
- It is advised to back up data before any maintenance operation
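The disruption handling mentioned above relies on Kubernetes voluntary-disruption mechanisms during node drains. As a minimal sketch, assuming the workloads publish PodDisruptionBudgets and that your kubectl context points at the ThingPark Enterprise namespace (both assumptions), you can review what a drain is allowed to evict before starting any maintenance:
# List PodDisruptionBudgets; ALLOWED DISRUPTIONS shows how many pods of each group can be evicted at a time
kubectl get pdb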
Generic requirements
To maintain ThingPark Enterprise compute capacity, it is encouraged to keep the Kubernetes compute resources in each Availability Zone as specified in Sizing hardware.
In all cases, the available Kubernetes resources must allow workloads to be scheduled across two Availability Zones.
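One way to verify how the worker nodes are spread over Availability Zones is to display the standard topology.kubernetes.io/zone node label:
# Show each node together with its Availability Zone
kubectl get nodes --label-columns topology.kubernetes.io/zone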
Azure Kubernetes Service examples
Kubernetes cluster version upgrade
This is an example of a Kubernetes cluster version upgrade on a cluster where only ThingPark Enterprise is installed.
Assumptions:
- The following procedure shows how to upgrade both the Kubernetes control plane and the workers from version 1.21 to 1.24 using the az CLI (see the upgrade availability check below)
- ThingPark Enterprise is deployed on the default node pool
- The deployment uses the L segment sizing
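Before starting, you can list the Kubernetes versions the cluster can actually be upgraded to. This is a sketch using the standard az aks get-upgrades command with the same placeholders as the rest of the procedure:
# List the upgrade versions available for this cluster
az aks get-upgrades --resource-group <resourceGroupName> \
--name <aksClusterName> \
--output table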
Start by upgrading the control plane to the latest 1.22 patch version. A 1.22 control plane is compatible with workers from 1.20 to 1.22.
az aks upgrade --resource-group <resourceGroupName> \
--name <aksClusterName> \
--kubernetes-version 1.22 \
--control-plane-only
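Before moving on, you may confirm that the control plane reached the expected version, for example by reusing the az aks show command shown at the end of this procedure:
# The reported Kubernetes version should now be a 1.22 patch release
az aks show --resource-group <resourceGroupName> \
--name <aksClusterName> \
--output table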
Create the spare node pool following the same sizing as the main one
az aks nodepool add --cluster-name <aksClusterName> \
--name spare \
--resource-group <resourceGroupName> \
--kubernetes-version 1.22 \
--node-count 3 \
--zones 1 2 3 \
--node-vm-size Standard_D4s_v4 \
--node-osdisk-type Managed \
--node-osdisk-size 128 \
--max-pods 50
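Once the spare node pool is provisioned, a quick sanity check is to list the nodes with their node pool and zone; the agentpool label is set by AKS on its nodes and topology.kubernetes.io/zone is the standard zone label:
# The three spare nodes should be registered and spread over the three zones
kubectl get nodes --label-columns agentpool,topology.kubernetes.io/zone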
Start moving the workloads by draining the nodes of the default node pool. The nodes should be drained one by one; for each node:
- Drain the node:
kubectl drain <nodeID> --delete-emptydir-data --ignore-daemonsets
- Check that all Deployments and StatefulSets are back to a fully healthy state before draining the next node:
kubectl get sts
kubectl get deploy
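Before upgrading the default node pool, you can check that no ThingPark Enterprise pods are left on its nodes. A sketch for a single node, using the same <nodeID> placeholder as the drain commands (only DaemonSet pods should remain):
# List the pods still scheduled on a given node
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<nodeID>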
At this point the default node pool must be free of all ThingPark Enterprise workloads. It can be upgraded:
# Speed up the upgrade by allowing all nodes to be upgraded at the same time
az aks nodepool update --cluster-name <aksClusterName> \
--name default \
--resource-group <resourceGroupName> \
--max-surge 100%
# Upgrade to the latest 1.22 patch
az aks nodepool upgrade --cluster-name <aksClusterName> \
--kubernetes-version 1.22 \
--name default \
--resource-group <resourceGroupName>
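To confirm that the node pool upgrade completed, you can check its provisioning state and version, for example:
# ProvisioningState should report Succeeded and KubernetesVersion a 1.22 patch release
az aks nodepool show --cluster-name <aksClusterName> \
--name default \
--resource-group <resourceGroupName> \
--output table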
Continue by upgrading both the control plane and the workers of the default node pool, first to the 1.23 release and then to the 1.24 release
# 1.23 upgrade
az aks upgrade --resource-group <resourceGroupName> \
--name <aksClusterName> \
--kubernetes-version 1.23 \
--control-plane-only
az aks nodepool upgrade --cluster-name <aksClusterName> \
--kubernetes-version 1.23 \
--name default \
--resource-group <resourceGroupName>
# 1.24 upgrade
az aks upgrade --resource-group <resourceGroupName> \
--name <aksClusterName> \
--kubernetes-version 1.24 \
--control-plane-only
az aks nodepool upgrade --cluster-name <aksClusterName> \
--kubernetes-version 1.24 \
--name default \
--resource-group <resourceGroupName>
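After this step, all the default node pool nodes should report a 1.24 kubelet version:
# The VERSION column should show 1.24.x for every node
kubectl get nodes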
Drain the spare nodes to move the workloads back onto the default node pool. For each node:
- Drain the node:
kubectl drain <nodeID> --delete-emptydir-data --ignore-daemonsets
- Check that all Deployments and StatefulSets are back to a fully healthy state before draining the next node:
kubectl get sts
kubectl get deploy
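Before deleting the spare node pool, you can check that the ThingPark Enterprise pods were rescheduled onto the default node pool nodes, assuming as above that your kubectl context points at the ThingPark Enterprise namespace:
# The NODE column should only reference default node pool nodes
kubectl get pods -o wide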
Delete the spare node pool
az aks nodepool delete --cluster-name <aksClusterName> \
--name spare \
--resource-group <resourceGroupName>
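To confirm that only the default node pool remains:
# The spare node pool should no longer be listed
az aks nodepool list --cluster-name <aksClusterName> \
--resource-group <resourceGroupName> \
--output table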
Check the cluster provisioning state
$ az aks show --resource-group <resourceGroupName> \
--name <aksClusterName> \
--output table
Name Location ResourceGroup KubernetesVersion CurrentKubernetesVersion ProvisioningState Fqdn
--------------- ---------- --------------- ------------------- -------------------------- ------------------- -------------------------------------------------
<aksClusterName> westeurope <resourceGroupName> 1.24.0 1.24.0 Succeeded <aksClusterName>-d1661175.hcp.westeurope.azmk8s.io
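As a final check, verify that all ThingPark Enterprise Deployments and StatefulSets are fully available again, as done after each drain:
kubectl get sts
kubectl get deploy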