What is Kubernetes Elasticity?
Kubernetes Elasticity is the ability of a cluster to scale pods up and down (Horizontal Pod Autoscaling) and to scale nodes up and down automatically when pending pods need capacity, for example when the number of users accessing your website suddenly increases or decreases. This dynamic, automatic scaling of Kubernetes nodes is what we call Kubernetes Elasticity.
How does it happen?
The cluster autoscaler continuously watches the pods; if it finds a pod that cannot be scheduled, then based on the PodCondition it chooses to add a node. For scaling down, it looks at average node utilization while respecting constraints such as pod disruption budgets. We will look at both mechanisms in detail below.
Let’s understand the difference between scalability and elasticity.
Scalability: Imagine a water tub being filled from a tap. We want to ensure that once the first tub is 80% full, another tub is placed and the water is diverted to it. This is simple to do with tubs: you introduce a pipe connection between the first and second tub at the appropriate mark. Of course, you will need to maintain a stock of tubs for as long as you want to scale.
Thanks to the cloud, we don’t have to maintain a stock of physical servers for autoscaling. In this toy setup, the terms map as follows:
Tub – the unit of scaling (What to scale?)
80% mark – metric and trigger for scaling (When to scale?)
Pipe – the operation which enables scaling in this case (How to scale?)
Elasticity: Elasticity adapts to both workload increases and workload decreases by provisioning and de-provisioning resources in an autonomic manner.
Let’s take a similar example with water tubs. As the figure above shows, when one of the tubs is not in use, it is automatically removed. Elasticity works the same way: when the workload increases, resources are added to keep up the performance of the application, and when the workload decreases, resources are terminated to save cost in your cloud environment.
How does Kubernetes Elasticity work?
Kubernetes Elasticity is also known as Kubernetes autoscaling on the cloud.
It consists of two mechanisms:
- Horizontal Pod Autoscaler (HPA)
- Nodes/Cluster Autoscaler
Horizontal Pod Autoscaler (HPA)
The horizontal pod autoscaler is a control loop which watches and scales the pods in a deployment. This is done by creating an HPA object that refers to a deployment or replication controller. You can define the threshold as well as the minimum and maximum number of replicas to which the deployment should scale. The original version of HPA, which is GA (autoscaling/v1), only supports CPU as a metric that can be monitored. The current version of HPA, which is in beta (autoscaling/v2beta1), supports memory and other custom metrics. Once you create an HPA object and it is able to query the metrics for those pods, you can see it reporting the details.
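As a minimal sketch, an HPA object might look like the following; the Deployment name "web-app" and the 75% CPU target are illustrative, not from any particular cluster:

```yaml
# Illustrative HPA (autoscaling/v1, CPU only); "web-app" is a hypothetical Deployment
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 75
```

After applying it, `kubectl get hpa` reports the current and target utilization along with the replica counts.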
There are a few tweaks that you can make to the behaviour of the horizontal pod autoscaler by adding flags to the controller manager:
- Determine how frequently the HPA evaluates the given metrics on a pool of pods by using the flag --horizontal-pod-autoscaler-sync-period on the controller manager. The default sync period is 30 seconds.
- The delay between two upscale operations defaults to 3 minutes and can be controlled with the flag --horizontal-pod-autoscaler-upscale-delay.
- Similarly, the delay between two downscale operations defaults to 5 minutes and is adjustable with the flag --horizontal-pod-autoscaler-downscale-delay.
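Between those sync periods, the replica count the HPA converges to follows the standard proportional rule: scale the current replica count by the ratio of the observed metric to the target, rounding up. A minimal sketch in Python (function name is ours, not a Kubernetes API):

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Standard HPA scaling rule: scale replicas proportionally to the
    ratio of the observed metric to its target, rounding up."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods averaging 90% CPU against a 75% target -> scale out to 5
print(desired_replicas(4, 90, 75))
```

So a deployment running hotter than its target grows, and one running cooler shrinks, subject to the min/max bounds and the upscale/downscale delays above.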
Nodes/Cluster Autoscaler
The cluster autoscaler is used in Kubernetes to scale the cluster, i.e. the nodes, dynamically. It watches the pods continuously and, if it finds that a pod cannot be scheduled, then based on the PodCondition it chooses to scale up. This is far more effective than looking at the CPU percentage of nodes in aggregate. Since node creation can take a minute or more depending on your cloud provider and other factors, it may take some time until the pod can be scheduled. Within a cluster, you might have multiple node pools, for example a node pool for billing applications and another node pool for machine learning workloads. Also, the nodes can be spread across AZs in a region, and how you scale might vary based on your topology.
For scaling down, it looks at the average utilization of a node, but other factors come into play. For example, if a pod with a pod disruption budget is running on a node and cannot be re-scheduled elsewhere, then that node cannot be removed from the cluster. The cluster autoscaler provides a way to gracefully terminate nodes and gives up to 10 minutes for pods to relocate.
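As an illustration, on AWS the cluster autoscaler is typically started with flags naming the node group and its size bounds; the Auto Scaling group name "my-node-asg" below is hypothetical:

```shell
# Illustrative cluster-autoscaler invocation for an AWS Auto Scaling group
./cluster-autoscaler \
  --cloud-provider=aws \
  --nodes=2:10:my-node-asg \
  --scale-down-unneeded-time=10m   # how long a node must be unneeded before removal
```

The min:max bounds on --nodes keep the autoscaler from shrinking the cluster below, or growing it beyond, what you have budgeted for.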
Kubernetes node scale-down logs
No Single Point of Failure
In a production environment it is good practice to put the Kubernetes masters in a scaling group of your cloud infrastructure, and to run more than one master across different availability zones.
That way, if one availability zone goes down, another master can take control of its nodes.
Metrics & Cloud Provider
For measuring metrics, your cluster should have Heapster working, or API aggregation enabled along with Kubernetes custom metrics. The metrics API server is the preferred method from Kubernetes 1.9 onwards. For provisioning nodes, you should have the appropriate cloud provider enabled and configured in your cluster.
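On recent clusters the Metrics API is commonly backed by metrics-server, which can be installed from the project's release manifest; verify the manifest matches your cluster version before applying:

```shell
# Install metrics-server to back the Metrics API, then verify it works
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
kubectl top nodes
```

If `kubectl top nodes` returns per-node CPU and memory figures, the HPA has the metrics it needs.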
Prerequisites for autoscaling
Let’s take an example.
If you want to run a Dockerized (containerized) web application in production, then:
- Deploy the containerized web application through Kubernetes.
- Autoscale the nodes, with a maximum and minimum number of nodes, across different availability zones of your cloud infrastructure.
- Put the Kubernetes masters in a scaling group, so that if a master's availability zone goes down, a new master is automatically created and takes control of the existing nodes.
- Install a metrics server to feed the Metrics API.
- Enable autoscaling on the Kubernetes deployment resource with a threshold, e.g. if CPU utilization goes beyond 75%, the number of pods increases. If the pods keep increasing and one can no longer be scheduled, then based on the PodCondition the cluster autoscaler chooses to scale up the nodes.
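The deployment-level autoscaling step above can also be done imperatively with `kubectl autoscale`; the Deployment name "web-app" is hypothetical:

```shell
# Create an HPA for a hypothetical Deployment "web-app": 75% CPU target, 2-10 replicas
kubectl autoscale deployment web-app --cpu-percent=75 --min=2 --max=10
```

This creates the same kind of HPA object as the YAML-based approach, which is usually preferred for version-controlled production setups.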
We have been able to autoscale our AWS-deployed Kubernetes cluster, which is extremely useful. This can be used in production to quickly scale the cluster out and back in. Perhaps even more important, during idle moments it runs a minimum-size cluster, and during heavy workloads it scales back up to full capacity, saving quite some money.
Author : Rishabh Gupta