Overview

Step-by-step guide to configuring cluster autoscaling for a HI GIO Kubernetes cluster:

  • Install tanzu-cli

  • Create cluster-autoscaler deployment from tanzu package using tanzu-cli

  • Enable cluster autoscale for your cluster

  • Test cluster autoscale

  • Delete cluster-autoscaler deployment and clean up test resource

Procedure

  • Pre-requisites:
  • An Ubuntu bastion host that can connect to your Kubernetes cluster

  • Permission to access your Kubernetes cluster

  • Procedure:
  1. Install tanzu-cli

    # Install tanzu-cli on Ubuntu
    sudo apt update
    sudo apt install -y ca-certificates curl gpg
    sudo mkdir -p /etc/apt/keyrings
    curl -fsSL https://storage.googleapis.com/tanzu-cli-installer-packages/keys/TANZU-PACKAGING-GPG-RSA-KEY.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/tanzu-archive-keyring.gpg
    echo "deb [signed-by=/etc/apt/keyrings/tanzu-archive-keyring.gpg] https://storage.googleapis.com/tanzu-cli-installer-packages/apt tanzu-cli-jessie main" | sudo tee /etc/apt/sources.list.d/tanzu.list
    sudo apt update
    sudo apt install -y tanzu-cli
    #Verify tanzu-cli installation
    tanzu version

To install tanzu-cli in other environments, please refer to the documentation below:

https://techdocs.broadcom.com/us/en/vmware-tanzu/cli/tanzu-cli/1-5/cli/index.html

(Optional) To configure tanzu shell completion, run the command below and follow the instructions in its output:

tanzu completion --help
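Before moving on, it can help to confirm that the CLIs used in the rest of this guide are on the PATH. The following is a minimal sketch; the function name check_tools is hypothetical and not part of tanzu-cli or kubectl.

```shell
# Report which of the given command-line tools are available on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
  local tool
  local missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_tools tanzu kubectl` on the bastion should print one line per tool.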

  2. Create cluster-autoscaler deployment from tanzu package using tanzu-cli

  • Switch to your Kubernetes context

    kubectl config use-context <your context name>
  • List the available cluster-autoscaler versions in the tanzu package and note the version that matches your Kubernetes version

    tanzu package available list cluster-autoscaler.tanzu.vmware.com
  • Create a kubeconfig secret named cluster-autoscaler-mgmt-config-secret in the kube-system namespace of your cluster

    kubectl create secret generic cluster-autoscaler-mgmt-config-secret \
    --from-file=value=<path to your kubeconfig file> \
    -n kube-system

Please do not change the secret name (cluster-autoscaler-mgmt-config-secret) or the namespace (kube-system).
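To confirm that the secret was created under the expected name and namespace, a check along these lines may be useful. This is a sketch only; the function name check_autoscaler_secret is hypothetical, and it assumes the secret stores the kubeconfig under the key value, as in the command above.

```shell
# Check that the kubeconfig secret exists in kube-system and that its
# "value" key (set by --from-file=value=...) is not empty.
check_autoscaler_secret() {
  local data
  data="$(kubectl get secret cluster-autoscaler-mgmt-config-secret \
    -n kube-system -o jsonpath='{.data.value}')" || return 1
  if [ -n "$data" ]; then
    echo "secret present, key 'value' is populated"
  else
    echo "secret present, but key 'value' is empty or missing" >&2
    return 1
  fi
}
```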

  • Create cluster-autoscaler-values.yaml file

    arguments:
      ignoreDaemonsetsUtilization: true
      maxNodeProvisionTime: 15m
      maxNodesTotal: 0 #Leave this value as 0. We will define the max and min number of nodes later.
      metricsPort: 8085
      scaleDownDelayAfterAdd: 10m
      scaleDownDelayAfterDelete: 10s
      scaleDownDelayAfterFailure: 3m
      scaleDownUnneededTime: 10m
    clusterConfig:
      clusterName: "demo-autoscale-tkg" #adjust here
      clusterNamespace: "demo-autoscale-tkg-ns" #adjust here
    paused: false

Required values:

  • clusterName: your cluster name

  • clusterNamespace: your cluster namespace
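If you manage several clusters, the values file can be generated from the two required values instead of being edited by hand. The sketch below writes the same settings shown above; the function name write_autoscaler_values is hypothetical.

```shell
# Write cluster-autoscaler-values.yaml for a given cluster name and namespace.
# Usage: write_autoscaler_values <cluster name> <cluster namespace> [output file]
write_autoscaler_values() {
  local name="${1:-}"
  local namespace="${2:-}"
  local out="${3:-cluster-autoscaler-values.yaml}"
  if [ -z "$name" ] || [ -z "$namespace" ]; then
    echo "usage: write_autoscaler_values <cluster name> <cluster namespace> [output file]" >&2
    return 1
  fi
  # The arguments below are copied from the values file in this guide.
  cat > "$out" <<EOF
arguments:
  ignoreDaemonsetsUtilization: true
  maxNodeProvisionTime: 15m
  maxNodesTotal: 0
  metricsPort: 8085
  scaleDownDelayAfterAdd: 10m
  scaleDownDelayAfterDelete: 10s
  scaleDownDelayAfterFailure: 3m
  scaleDownUnneededTime: 10m
clusterConfig:
  clusterName: "$name"
  clusterNamespace: "$namespace"
paused: false
EOF
}
```

For the example cluster in this guide, the call would be `write_autoscaler_values demo-autoscale-tkg demo-autoscale-tkg-ns`.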

  • Install cluster-autoscaler

    # Set --version to the version listed above that matches your Kubernetes version.
    # Do not change --namespace tkg-system; it is the default namespace for tanzu packages.
    tanzu package install cluster-autoscaler \
    --package cluster-autoscaler.tanzu.vmware.com \
    --version <version available> \
    --values-file 'cluster-autoscaler-values.yaml' \
    --namespace tkg-system

The cluster-autoscaler is deployed into the kube-system namespace.

Run the command below to verify the cluster-autoscaler deployment:

kubectl get deployments.apps -n kube-system cluster-autoscaler
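Rather than checking the deployment repeatedly by hand, you could wait for the rollout to finish. This is a sketch; the function name wait_for_autoscaler and the 5 minute default timeout are assumptions, not values from this guide.

```shell
# Wait until the cluster-autoscaler deployment in kube-system is rolled out.
# Usage: wait_for_autoscaler [timeout]   (timeout defaults to 5m, an arbitrary choice)
wait_for_autoscaler() {
  local timeout="${1:-5m}"
  kubectl rollout status deployment/cluster-autoscaler \
    -n kube-system --timeout="$timeout"
}
```

The package status itself can also be inspected with `tanzu package installed get cluster-autoscaler -n tkg-system`.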
  • Configure the minimum and maximum number of nodes in your cluster

    • Get the MachineDeployment name and namespace

      kubectl get machinedeployments.cluster.x-k8s.io -A
    • Set the cluster-api-autoscaler-node-group-min-size and cluster-api-autoscaler-node-group-max-size annotations

      kubectl annotate machinedeployment <machinedeployment name> cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=<number min> -n <machinedeployment namespace>
      kubectl annotate machinedeployment <machinedeployment name> cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=<number max> -n <machinedeployment namespace>
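The two annotate commands can be wrapped in a small helper that validates the sizes first. The sketch below is one possible approach; the function name set_node_group_size is hypothetical, and the --overwrite flag is added on the assumption that you may want to change sizes that were already set.

```shell
# Set both autoscaler size annotations on one MachineDeployment.
# Usage: set_node_group_size <machinedeployment name> <namespace> <min> <max>
set_node_group_size() {
  local md="${1:-}"
  local ns="${2:-}"
  local min="${3:-}"
  local max="${4:-}"
  local n
  if [ -z "$md" ] || [ -z "$ns" ]; then
    echo "machinedeployment name and namespace are required" >&2
    return 1
  fi
  # Reject empty or non-numeric sizes before calling kubectl.
  for n in "$min" "$max"; do
    case "$n" in
      ''|*[!0-9]*) echo "min and max must be non-negative integers" >&2; return 1 ;;
    esac
  done
  if [ "$min" -gt "$max" ]; then
    echo "min must not be greater than max" >&2
    return 1
  fi
  kubectl annotate machinedeployment "$md" -n "$ns" --overwrite \
    "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size=$min" \
    "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size=$max"
}
```

For example, `set_node_group_size demo-autoscale-tkg-worker-node-pool-1 demo-autoscale-tkg-ns 1 5` would match the limits visible in the scale-up event later in this guide.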

  3. Enable cluster autoscale for your cluster

This step requires provider-level permissions, so please ask your cloud provider to perform it.

  4. Test cluster autoscale

  • Get the current number of nodes

    kubectl get nodes

In this example, the cluster currently has only one worker node.
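To record the starting point as a number, the worker nodes can be counted directly. This is a sketch; the function name count_worker_nodes is hypothetical, and it assumes control plane nodes carry the node-role.kubernetes.io/control-plane label.

```shell
# Count nodes that do not carry the control-plane role label.
count_worker_nodes() {
  kubectl get nodes --no-headers \
    --selector='!node-role.kubernetes.io/control-plane' | wc -l | tr -d ' '
}
```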

  • Create test-autoscale.yaml file

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx
      namespace: default
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx
            ports:
            - containerPort: 80
          topologySpreadConstraints: #Spreads pods evenly across nodes (pod counts per node may differ by at most maxSkew)
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app: nginx
  • Apply the test-autoscale.yaml file to deploy 2 replicas of the nginx pod in the default namespace (this triggers the creation of a new worker node)

    kubectl apply -f test-autoscale.yaml
  • Check the nginx pods and the events of the pending pod

    kubectl get pods
    # Replace the pod name below with the name of your Pending nginx pod
    kubectl describe pod nginx-589656b9b5-mcm5j | grep -A 10 Events

One nginx pod has a status of Pending because scheduling both replicas on the single worker node would violate the topology spread constraint. Its events show FailedScheduling followed by TriggeredScaleUp:

Warning  FailedScheduling  2m53s  default-scheduler   0/2 nodes are available: 1 node(s) didn't match pod topology spread constraints, 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
Normal   TriggeredScaleUp  2m43s  cluster-autoscaler  pod triggered scale-up: [{MachineDeployment/demo-autoscale-tkg-ns/demo-autoscale-tkg-worker-node-pool-1 1->2 (max: 5)}]
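The scale-up event can also be listed without knowing the pod name by filtering events on their reason. This is a sketch; the function name scale_up_events is hypothetical.

```shell
# List TriggeredScaleUp events in a namespace, oldest first.
# Usage: scale_up_events [namespace]   (namespace defaults to "default")
scale_up_events() {
  local ns="${1:-default}"
  kubectl get events -n "$ns" \
    --field-selector reason=TriggeredScaleUp \
    --sort-by=.lastTimestamp
}
```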
  • Wait for the new node to be provisioned. Once it is ready, the new worker node is listed by kubectl get nodes and the pending nginx pod changes to Running
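The wait can be scripted by polling the node list until enough worker nodes report Ready. The sketch below is an assumption-laden example: the function name wait_for_ready_workers is hypothetical, the defaults of 90 attempts at 10 second intervals are chosen to roughly match the 15m maxNodeProvisionTime in the values file, and control plane nodes are assumed to carry the node-role.kubernetes.io/control-plane label.

```shell
# Poll until at least <target> worker nodes report Ready, or give up.
# Usage: wait_for_ready_workers <target> [attempts] [seconds between attempts]
wait_for_ready_workers() {
  local target="${1:-1}"
  local attempts="${2:-90}"
  local pause="${3:-10}"
  local ready=0
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # Column 2 of "kubectl get nodes" is STATUS; count exact "Ready" rows only.
    ready="$(kubectl get nodes --no-headers \
      --selector='!node-role.kubernetes.io/control-plane' \
      | awk '$2 == "Ready"' | wc -l | tr -d ' ')"
    if [ "$ready" -ge "$target" ]; then
      echo "ready workers: $ready"
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "timed out with $ready ready workers (wanted $target)" >&2
  return 1
}
```

In the test above, `wait_for_ready_workers 2` would return once the second worker node is Ready.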
  • Clean up test resource

    kubectl delete -f test-autoscale.yaml

After the nginx test deployment is deleted, the cluster waits a few minutes before removing the unneeded node (see the scaleDownUnneededTime value in the cluster-autoscaler-values.yaml file).

  5. Delete cluster-autoscaler deployment (Optional)

If you no longer want your cluster to autoscale, you can delete the cluster-autoscaler deployment using tanzu-cli:

tanzu package installed delete cluster-autoscaler -n tkg-system -y
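After the package is deleted, you may want to confirm that the deployment is gone from kube-system. This is a sketch; the function name autoscaler_removed is hypothetical. Note that it treats any kubectl failure, including a connection problem, as "not found".

```shell
# Confirm that the cluster-autoscaler deployment no longer exists in kube-system.
# Any kubectl error (including connectivity issues) is reported as "not found".
autoscaler_removed() {
  if kubectl get deployment cluster-autoscaler -n kube-system >/dev/null 2>&1; then
    echo "cluster-autoscaler deployment still exists"
    return 1
  fi
  echo "cluster-autoscaler deployment not found"
}
```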

End.