Scaling Applications in Kubernetes
Scaling is at the heart of running apps in Kubernetes. The platform gives you flexible tools to tweak your app’s capacity as demand shifts, so you can handle spikes without breaking a sweat.
Understanding ReplicaSets
ReplicaSets are controllers that keep a specified number of identical pod replicas running at all times, replacing pods that fail or are deleted.
Want to see your current ReplicaSets?
kubectl get replicasets
You’ll usually interact with ReplicaSets through Deployments, which offer easier updates and more features. If a pod dies or disappears, ReplicaSets bring it right back.
A typical ReplicaSet config includes:
- selector: Picks which pods to manage
- replicas: Sets how many pods you want
- template: Defines what each pod should look like
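The three fields above map directly onto a manifest. Here is a minimal sketch of a ReplicaSet; the name `my-app-rs`, the label `app: my-app`, and the image are placeholders for illustration:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: my-app-rs            # placeholder name
spec:
  replicas: 3                # how many pods you want
  selector:
    matchLabels:
      app: my-app            # picks which pods to manage
  template:                  # what each pod should look like
    metadata:
      labels:
        app: my-app          # must match the selector above
    spec:
      containers:
      - name: my-app
        image: nginx:1.25    # placeholder image
```

Note that the pod template's labels must satisfy the selector, or the API server rejects the ReplicaSet.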
Autoscaling Pods
The Horizontal Pod Autoscaler (HPA) automatically changes how many pods are running in a deployment or ReplicaSet, based on metrics like CPU usage.
To set up autoscaling, try:
kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80
This creates an HPA that keeps between 2 and 10 replicas, scaling up when average CPU utilization rises above the 80% target and back down when it falls below it.
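The same autoscaler can be written declaratively. This is roughly what the command above produces, expressed as an `autoscaling/v2` manifest (the Deployment name `my-app` is assumed):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:            # what to scale
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # target average CPU across pods
```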
Check your HPAs with:
kubectl get hpa
Some best practices for autoscaling:
- Set realistic min and max values
- Pick metrics that matter (CPU, memory, or custom ones)
- Configure stabilization windows (the HPA's cooldown mechanism) to stop constant scaling up and down
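Cooldowns are configured through the `behavior` field of an `autoscaling/v2` HPA spec. A sketch of a scale-down policy (the specific window and rate here are illustrative, not recommendations):

```yaml
# Fragment of an HPA spec, not a complete manifest.
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most 1 pod per minute
```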
Don’t forget, HPAs need the metrics server running in your cluster to work right.
Managing Resource Quotas
ResourceQuotas set limits on how much a namespace can use, so no one team or app hogs all the cluster’s resources.
To create a quota, use:
kubectl create quota team-quota --hard=requests.cpu=2,limits.cpu=4,requests.memory=1Gi,limits.memory=2Gi
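The declarative equivalent of that command is a ResourceQuota manifest, which is easier to version-control:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "2"      # sum of CPU requests across the namespace
    limits.cpu: "4"        # sum of CPU limits
    requests.memory: 1Gi
    limits.memory: 2Gi
```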
To see quotas in a namespace:
kubectl get resourcequota
ResourceQuotas can limit:
| Resource Type | Examples |
|---|---|
| Compute | CPU, memory |
| Storage | PVCs, storage classes |
| Object counts | pods, services, configmaps |
Make sure you set resource requests and limits for every container. When a quota constrains a resource, any pod that omits requests or limits for that resource is rejected by the API server.
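Concretely, each container needs a `resources` stanza like the following (the values are placeholders; size them for your workload):

```yaml
# Fragment of a pod spec showing the required resources block.
spec:
  containers:
  - name: my-app
    image: nginx:1.25      # placeholder image
    resources:
      requests:            # counted against requests.cpu / requests.memory
        cpu: 250m
        memory: 128Mi
      limits:              # counted against limits.cpu / limits.memory
        cpu: 500m
        memory: 256Mi
```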
Quota policies can work together with Limit Ranges to fine-tune resource allocation at the namespace or pod level.
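For example, a LimitRange can supply namespace-wide defaults so that pods without explicit requests and limits still pass quota validation (the name and values below are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
spec:
  limits:
  - type: Container
    default:             # applied as limits when none are specified
      cpu: 500m
      memory: 256Mi
    defaultRequest:      # applied as requests when none are specified
      cpu: 250m
      memory: 128Mi
```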