06. How to configure HI GIO Kubernetes cluster autoscale

Overview

Step-by-step guide on how to configure HI GIO Kubernetes cluster autoscale

Install tanzu-cli
Create cluster-autoscaler deployment from tanzu package using tanzu-cli
Enable cluster autoscale for your cluster
Test cluster autoscale
Delete cluster-autoscaler deployment and clean up test resource

Procedure

Pre-requisites:


An Ubuntu bastion host that can connect to your Kubernetes cluster
Permission to access your Kubernetes cluster
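Both prerequisites can be checked from the bastion before starting. The sketch below is only an illustration: the function names require_cmd and check_prereqs are hypothetical helpers that are not part of the original guide, and it assumes kubectl is already configured with a kubeconfig for your cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original guide): succeed only if the
# given command is available on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required command: $1" >&2
  return 1
}

# Hypothetical helper: check the tooling first, then confirm that the current
# kubeconfig context can reach the cluster and is allowed to list nodes.
check_prereqs() {
  require_cmd kubectl || return 1
  kubectl cluster-info >/dev/null || return 1
  kubectl auth can-i list nodes
}
```

Run check_prereqs on the bastion. A non-zero exit status means kubectl is missing, the cluster is unreachable, or the current user is not allowed to list nodes.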

Procedure:


Install tanzu-cli

#Install tanzu-cli to ubuntu
sudo apt update
sudo apt install -y ca-certificates curl gpg
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://storage.googleapis.com/tanzu-cli-installer-packages/keys/TANZU-PACKAGING-GPG-RSA-KEY.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/tanzu-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/tanzu-archive-keyring.gpg] https://storage.googleapis.com/tanzu-cli-installer-packages/apt tanzu-cli-jessie main" | sudo tee /etc/apt/sources.list.d/tanzu.list
sudo apt update
sudo apt install -y tanzu-cli
#Verify tanzu-cli installation
tanzu version

To install tanzu-cli in other environments, please refer to the documentation below:

https://techdocs.broadcom.com/us/en/vmware-tanzu/cli/tanzu-cli/1-5/cli/index.html

(Optional) To configure tanzu shell completion, run the command below and follow the instructions in its output

tanzu completion --help

Create cluster-autoscaler deployment from tanzu package using tanzu-cli

Switch to your Kubernetes context

kubectl config use-context <your context name>

List the cluster-autoscaler versions available as a tanzu package and note the version that matches your Kubernetes version
```
tanzu package available list cluster-autoscaler.tanzu.vmware.com
```

Create a kubeconfig secret named cluster-autoscaler-mgmt-config-secret in the kube-system namespace of the cluster

kubectl create secret generic cluster-autoscaler-mgmt-config-secret \
--from-file=value=<path to your kubeconfig file> \
-n kube-system

Please do not change the secret name (cluster-autoscaler-mgmt-config-secret) and namespace (kube-system)

Create cluster-autoscaler-values.yaml file

arguments:
  ignoreDaemonsetsUtilization: true
  maxNodeProvisionTime: 15m
  maxNodesTotal: 0 #Leave this value as 0. We will define the max and min number of nodes later.
  metricsPort: 8085
  scaleDownDelayAfterAdd: 10m
  scaleDownDelayAfterDelete: 10s
  scaleDownDelayAfterFailure: 3m
  scaleDownUnneededTime: 10m
clusterConfig:
  clusterName: "demo-autoscale-tkg" #adjust here
  clusterNamespace: "demo-autoscale-tkg-ns" #adjust here
paused: false

Required values:

clusterName: your cluster name
clusterNamespace: your cluster namespace
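If you maintain several clusters, the values file above can be generated instead of edited by hand. This is a minimal sketch: write_autoscaler_values is a hypothetical helper name, and it writes exactly the values shown above, substituting only the two required fields.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original guide): write the values file
# shown above, substituting only clusterName and clusterNamespace.
# Usage: write_autoscaler_values <cluster name> <cluster namespace> [output file]
write_autoscaler_values() {
  cluster_name=$1
  cluster_namespace=$2
  out_file=${3:-cluster-autoscaler-values.yaml}

  # Both required values must be provided.
  if [ -z "$cluster_name" ] || [ -z "$cluster_namespace" ]; then
    echo "usage: write_autoscaler_values <cluster name> <cluster namespace> [output file]" >&2
    return 1
  fi

  cat > "$out_file" <<EOF
arguments:
  ignoreDaemonsetsUtilization: true
  maxNodeProvisionTime: 15m
  maxNodesTotal: 0 #Leave this value as 0. The max and min number of nodes are defined later.
  metricsPort: 8085
  scaleDownDelayAfterAdd: 10m
  scaleDownDelayAfterDelete: 10s
  scaleDownDelayAfterFailure: 3m
  scaleDownUnneededTime: 10m
clusterConfig:
  clusterName: "$cluster_name"
  clusterNamespace: "$cluster_namespace"
paused: false
EOF
}
```

For example, `write_autoscaler_values demo-autoscale-tkg demo-autoscale-tkg-ns` writes cluster-autoscaler-values.yaml in the current directory with the sample names used in this guide.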

Install cluster-autoscaler

#Set --version to one of the versions listed above that matches your Kubernetes version
#Do not change --namespace tkg-system, this is the default namespace for tanzu packages
tanzu package install cluster-autoscaler \
--package cluster-autoscaler.tanzu.vmware.com \
--version <version available> \
--values-file 'cluster-autoscaler-values.yaml' \
--namespace tkg-system

The cluster-autoscaler will deploy into the kube-system namespace.

Run the command below to verify cluster-autoscaler deployment

kubectl get deployments.apps -n kube-system cluster-autoscaler

Configure the minimum and maximum number of nodes in your cluster

Get machinedeployments name and namespace

kubectl get machinedeployments.cluster.x-k8s.io -A

Set cluster-api-autoscaler-node-group-min-size and cluster-api-autoscaler-node-group-max-size

kubectl annotate machinedeployment <machinedeployment name> cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=<number min> -n <machinedeployment namespace>
kubectl annotate machinedeployment <machinedeployment name> cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=<number max> -n <machinedeployment namespace>
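Because the two annotations are set separately, it is easy to end up with a minimum larger than the maximum. The sketch below validates the bounds first and then prints the two annotate commands for review. The function name print_node_group_bounds is hypothetical, and the --overwrite flag is an addition that is not in the commands above; it is included on the assumption that you may re-run the step to change existing values.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original guide): validate the bounds,
# then print the two kubectl annotate commands without executing them.
# Usage: print_node_group_bounds <machinedeployment> <namespace> <min> <max>
print_node_group_bounds() {
  md_name=$1
  md_namespace=$2
  min_size=$3
  max_size=$4
  prefix="cluster.x-k8s.io/cluster-api-autoscaler-node-group"

  # Both sizes must be non-negative integers.
  for value in "$min_size" "$max_size"; do
    case "$value" in
      ''|*[!0-9]*)
        echo "min and max must be non-negative integers" >&2
        return 1
        ;;
    esac
  done

  # The minimum must not exceed the maximum.
  if [ "$min_size" -gt "$max_size" ]; then
    echo "min ($min_size) must not be greater than max ($max_size)" >&2
    return 1
  fi

  # --overwrite (an assumption, see above) lets the command update a value
  # that was already set.
  echo "kubectl annotate machinedeployment $md_name ${prefix}-min-size=$min_size -n $md_namespace --overwrite"
  echo "kubectl annotate machinedeployment $md_name ${prefix}-max-size=$max_size -n $md_namespace --overwrite"
}
```

Review the printed commands and, once they look right, run them by piping the output to sh, for example `print_node_group_bounds <machinedeployment name> <machinedeployment namespace> 1 5 | sh`.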

Enable cluster autoscale for your cluster

This step requires provider-level permission, so please ask your cloud provider to enable cluster autoscale for your cluster.

Test cluster autoscale

Get the current number of nodes
```
kubectl get nodes
```

There is currently only one worker node.

Create test-autoscale.yaml file

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
      topologySpreadConstraints: #Spreads pods evenly across nodes (pod counts on any two nodes may differ by at most maxSkew)
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx

Apply test-autoscale.yaml to deploy 2 nginx replicas in the default namespace. The second pod cannot be scheduled on the existing worker node, which triggers the creation of a new worker node
```
kubectl apply -f test-autoscale.yaml
```

List the nginx pods, then inspect the events of the Pending pod (replace the pod name below with your own)

kubectl get pods

kubectl describe pod nginx-589656b9b5-mcm5j | grep -A 10 Events

One nginx pod has a status of Pending, and its events show FailedScheduling followed by TriggeredScaleUp:

Warning  FailedScheduling  2m53s  default-scheduler   0/2 nodes are available: 1 node(s) didn't match pod topology spread constraints, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
Normal   TriggeredScaleUp  2m43s  cluster-autoscaler  pod triggered scale-up: [{MachineDeployment/demo-autoscale-tkg-ns/demo-autoscale-tkg-worker-node-pool-1 1->2 (max: 5)}]

Wait for the new node to be provisioned. Once the new worker node is ready, the Pending nginx pod changes to Running.
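Instead of re-running kubectl get nodes by hand, the wait can be scripted. This is a rough sketch: count_ready_nodes and wait_for_nodes are hypothetical helper names, the 900-second default simply mirrors maxNodeProvisionTime: 15m from cluster-autoscaler-values.yaml, and only nodes whose STATUS column is exactly Ready are counted.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original guide): count the nodes whose
# STATUS column is exactly "Ready". Reads the output of
# `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Hypothetical helper: poll the cluster until at least <expected> nodes are Ready.
# Usage: wait_for_nodes <expected> [timeout seconds] [interval seconds]
wait_for_nodes() {
  expected=$1
  timeout=${2:-900}   # assumption: mirrors maxNodeProvisionTime (15m)
  interval=${3:-15}
  waited=0
  while [ "$waited" -lt "$timeout" ]; do
    ready=$(kubectl get nodes --no-headers | count_ready_nodes)
    if [ "$ready" -ge "$expected" ]; then
      echo "$ready nodes are Ready"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $expected Ready nodes" >&2
  return 1
}
```

In the scenario above (one control-plane node and one worker before the test), `wait_for_nodes 3` would return once the second worker node is Ready. Adjust the expected count to your own cluster.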

Clean up test resource
```
kubectl delete -f test-autoscale.yaml
```

After the test nginx deployment is deleted, the cluster waits a few minutes before removing the unneeded node (see the scaleDownUnneededTime value in the cluster-autoscaler-values.yaml file).

Delete cluster-autoscaler deployment (Optional)

If you no longer want your cluster to autoscale, you can delete the cluster-autoscaler deployment using tanzu-cli

tanzu package installed delete cluster-autoscaler -n tkg-system -y

End.