Task 4 - Auto Scaling Deployment
Objective: Automate Deployment Scaling with HPA
Learn to configure the Horizontal Pod Autoscaler (HPA) for a deployment and simulate traffic to test scaling.
Using kubectl autoscale
When Kubernetes has Resouce Metrics API installed, We can using the kubectl autoscale command kubectl autoscale deployment
to automatically scale a deployment based on CPU utilization (or any other metric) requires that the Kubernetes Metrics Server (or an equivalent metrics API) is installed and operational in your cluster. The Metrics Server collects resource metrics from Kubelets and exposes them in the Kubernetes API server through the Metrics API for use by Horizontal Pod Autoscaler (HPA) and other components.
using the kubectl autoscale command to automatically scale a deployment based on CPU utilization (or any other metric) requires that the Kubernetes Metrics Server (or an equivalent metrics API) is installed and operational in your cluster. The Metrics Server collects resource metrics from Kubelets and exposes them in the Kubernetes API server through the Metrics API for use by Horizontal Pod Autoscaler (HPA) and other components.
Enable resource-API
The Resource Metrics API in Kubernetes is crucial for providing core metrics about Pods and nodes within a cluster, such as CPU and memory usage to enable feature like Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA) and enable efficent resource scheduling.
- copy/paste below command to enable resource-api
curl --insecure --retry 3 --retry-connrefused -fL "https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml" -o components.yaml
sed -i '/- --metric-resolution/a \ \ \ \ \ \ \ \ - --kubelet-insecure-tls' components.yaml
kubectl apply -f components.yaml
kubectl rollout status deployment metrics-server -n kube-system
use kubectl top node
and kubectl top pod
to check the Pod and node resource usage
Create a deployment with resource constrain
In this deployment , we add some resource restriction like memory and cpu for a POD.
when POD reach the CPU or memory limit, if HPA configured, new POD will be created according HPA policy.
- create deployment with CPU and Memory constraints
cat <<EOF | tee nginx-deployment_resource.yaml apiVersion: apps/v1 kind: Deployment metadata: name: nginx-deployment labels: app: nginx spec: replicas: 2 selector: matchLabels: app: nginx template: metadata: labels: app: nginx spec: containers: - name: nginx image: nginx:latest ports: - containerPort: 80 resources: requests: memory: "64Mi" # Minimum memory requested to start the container cpu: "10m" # 100 millicpu (0.1 CPU) requested to start the container limits: memory: "128Mi" # Maximum memory limit for the container cpu: "40m" # 200 millicpu (0.2 CPU) maximum limit for the container EOF kubectl apply -f nginx-deployment_resource.yaml kubectl rollout status deployment nginx-deployment cat << EOF | tee nginx-deployment_clusterIP.yaml apiVersion: v1 kind: Service metadata: labels: app: nginx name: nginx-deployment namespace: default spec: ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - port: 80 protocol: TCP targetPort: 80 selector: app: nginx sessionAffinity: None type: ClusterIP EOF kubectl apply -f nginx-deployment_clusterIP.yaml
check the deployment and service
kubectl get deployment nginx-deployment kubectl get svc nginx-deployment
Use autoscale (HPA) to scale your application
- We can use
kubectl autoscale
command or use create a hpa yaml file then follow akubectl apply -f
to create hpa.
use kubectl command to create hpa
kubectl autoscale deployment nginx-deployment --name=nginx-deployment-hpa --min=2 --max=10 --cpu-percent=50 --save-config
expected Outcome
horizontalpodautoscaler.autoscaling/nginx-deployment-hpa autoscaled
or use yaml file to create hpa
cat << EOF | tee > nginx-deployment-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: nginx-deployment-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: nginx-deployment
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 50
EOF
kubectl apply -f nginx-deployment-hpa.yaml
- Target CPU Utilization: This is set to 50%. It means the HPA will aim to adjust the number of Pods so that the average CPU utilization across all Pods is around 50% of the allocated CPU resources for each Pod.
- Scaling Out: If the average CPU utilization across all Pods in the nginx-deployment exceeds 50%, the HPA will increase the number of Pods, making more resources available to handle the workload, until it reaches the maximum limit of 10 Pods.
- Scaling In: If the average CPU utilization drops below 50%, indicating that the resources are underutilized, the HPA will decrease the number of Pods to reduce resource usage, but it won’t go below the minimum of 2 Pod.
- Check Result
use kubectl get hpa nginx-deployment-hpa
to check deployment
kubectl get hpa nginx-deployment-hpa
Expected Outcome
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
nginx-deployment-hpa Deployment/nginx-deployment 0%/50% 2 10 2 23s
use kubectl get deployment nginx-deployment
to check the change of deployment.
use kubectl get hpa
and kubectl describe hpa
to check the size of replicas.
Send http traffic to application
since the nginx-deployment service is cluster-ip type service which can only be accessed from cluster internal, so we need to create a POD which can send http traffic to nginx-deployment service.
- create deployment for generate http traffic, in this deployment, we will use wget to similuate the real traffic towards ngnix-deployment cluster-ip service which has service name
http://nginx-deployment.default.svc.cluster.local
.
cat <<EOF | tee infinite-calls-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: infinite-calls
labels:
app: infinite-calls
spec:
replicas: 2
selector:
matchLabels:
app: infinite-calls
template:
metadata:
name: infinite-calls
labels:
app: infinite-calls
spec:
containers:
- name: infinite-calls
image: busybox
command:
- /bin/sh
- -c
- "while true; do wget -q -O- http://nginx-deployment.default.svc.cluster.local; done"
EOF
kubectl apply -f infinite-calls-deployment.yaml
- check the creation of infinite-calls deployment
kubectl get deployment infinite-calls
- check the log from infinite-calls Pods. {.items[0]} means use first Pod
podName=$(kubectl get pod -l app=infinite-calls -o=jsonpath='{.items[0].metadata.name}')
kubectl logs po/$podName
you will see the response from nginx web server container. use ctr-c to stop.
use kubectl top pod
and kubectl top node
to check the resource usage status
user expected to see the number of Pod increased
You shall see that expected Pod now increased automatically without use attention.
kubectl get pod -l app=nginx
expected outcome
NAME READY STATUS RESTARTS AGE nginx-deployment-55c7f467f8-f2qbp 1/1 Running 0 19m nginx-deployment-55c7f467f8-hxs79 1/1 Running 0 2m2s nginx-deployment-55c7f467f8-jx2k9 1/1 Running 0 19m nginx-deployment-55c7f467f8-r7vdv 1/1 Running 0 3m2s nginx-deployment-55c7f467f8-w6r8l 1/1 Running 0 3m17s
Use
kubectl get hpa
shall tell you that hpa is action which increased the replicas from 2 to other numbers.kubectl get hpa
expected outcome
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE nginx-deployment-hpa Deployment/nginx-deployment 50%/50% 2 10 5 11m
check hpa detail
kubectl describe hpa
expected outcome
Name: nginx-deployment-hpa Namespace: default Labels: <none> Annotations: autoscaling.alpha.kubernetes.io/conditions: [{"type":"AbleToScale","status":"True","lastTransitionTime":"2024-02-22T07:57:16Z","reason":"ReadyForNewScale","message":"recommended size... autoscaling.alpha.kubernetes.io/current-metrics: [{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":48,"currentAverageValue":"4m"}}] CreationTimestamp: Thu, 22 Feb 2024 07:57:01 +0000 Reference: Deployment/nginx-deployment Target CPU utilization: 50% Current CPU utilization: 48% Min replicas: 2 Max replicas: 10 Deployment pods: 5 current / 5 desired Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal SuccessfulRescale 4m28s horizontal-pod-autoscaler New size: 3; reason: cpu resource utilization (percentage of request) above target Normal SuccessfulRescale 4m13s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target Normal SuccessfulRescale 3m13s horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
delete infinite-calls to stop generate the traffic
kubectl delete deployment infinite-calls
after few minutes later, due to no more traffic is hitting the nginx server. hpa will scale in the number of Pod to save resource.
kubectl get hpa
expected output
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE nginx-deployment-hpa Deployment/nginx-deployment 0%/50% 2 10 5 12m
use
kubectl describe hpa
will tell you the reason why hpa scale in the number of Pod.
kubectl describe hpa
expected outcome
Name: nginx-deployment-hpa
Namespace: default
Labels: <none>
Annotations: autoscaling.alpha.kubernetes.io/conditions:
[{"type":"AbleToScale","status":"True","lastTransitionTime":"2024-02-22T07:57:16Z","reason":"ReadyForNewScale","message":"recommended size...
autoscaling.alpha.kubernetes.io/current-metrics:
[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]
CreationTimestamp: Thu, 22 Feb 2024 07:57:01 +0000
Reference: Deployment/nginx-deployment
Target CPU utilization: 50%
Current CPU utilization: 0%
Min replicas: 2
Max replicas: 10
Deployment pods: 2 current / 2 desired
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 13m horizontal-pod-autoscaler New size: 3; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 13m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 12m horizontal-pod-autoscaler New size: 5; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 7m55s horizontal-pod-autoscaler New size: 8; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 54s horizontal-pod-autoscaler New size: 5; reason: All metrics below target
Normal SuccessfulRescale 39s horizontal-pod-autoscaler New size: 2; reason: All metrics below target
clean up
kubectl delete hpa nginx-deployment-hpa
kubectl delete deployment nginx-deployment
kubectl delete svc nginx-deployment
Summary
Choosing the right scaling method depends on your specific needs, such as whether you need to quickly adjust resources, maintain performance under varying loads, or integrate scaling into a CI/CD pipeline. Manual methods like kubectl scale or editing the deployment are straightforward for immediate needs, while kubectl autoscale and HPA provide more dynamic, automated scaling based on actual usage, making them better suited for production environments with fluctuating workloads.