Sometimes, running just one instance of an application in Kubernetes isn’t enough. To maintain stable performance during times of increased application load, you may need to deploy multiple instances – and then, when traffic slows down, you may want to scale your deployment back down to fewer instances to avoid wasting resources.

This is why the kubectl scale command is so powerful. With this command, you can easily add or remove application instances – or replicas, as Kubernetes calls them – in order to maintain an optimal balance between application performance and resource utilization.

Keep reading for a deep dive into how kubectl scale works, how to use it to scale deployments, and which considerations to keep in mind when adding or removing replicas in Kubernetes.

What is kubectl scale deployment?

Kubectl scale deployment is a Kubernetes command that lets you add or remove instances of a running application.

Diagram of Kubernetes architecture showing deployments managing ReplicaSets, which in turn manage multiple pods.

In this context, an application instance – also known as a replica – is a copy of a running pod. Using the kubectl scale deployment command, you can add replicas, which means you’ll have more copies of the same pod running. This can help meet increases in application traffic or requests, since the more copies of the same application pod you have in operation, the more load your application can handle. In addition, kubectl scale deployment can decrease replicas when you no longer need as many pod copies.

Kubectl scale deployment use cases

As examples of why you might want to use kubectl scale, consider the following common use cases.

1. Handling traffic spikes

The traffic directed at an application can spike during different times of the day, days of the week, or seasons of the year. For example, a pod that hosts a retail application is likely to see more traffic during the holiday shopping season.

To accommodate the increased traffic without degrading performance, you can scale deployments up.

2. Maintaining high availability

Scaling up helps improve workload availability. The more replicas of a pod you run, the lower the risk that the failure of any single pod makes the application unavailable.

This doesn’t mean that scaling up is the only factor that matters in maintaining high availability; it’s not, because you should also monitor and troubleshoot application performance issues regardless of how many replicas you have. However, adding replicas can provide some cushion against issues that might cause unavailability.

3. Optimizing resource utilization

Deployment scaling helps to achieve optimal rates of resource utilization by striking a healthy balance between how many pod replicas are running (and, hence, how much memory and CPU they are consuming), and how much traffic the replicas need to support. If you have more replicas than you need, you waste resources – and the money necessary to purchase those resources.

4. Performing load testing

The kubectl scale command can be useful for load testing scenarios where you want to see how well your application performs under heavier load. By scaling replicas down while traffic stays constant, you increase the load placed on each remaining replica, so deployment scaling is a way of testing behavior under load without actually having to direct more requests to the app.

Basic usage of kubectl scale deployment

The basic syntax for the kubectl scale deployment command is as follows:

kubectl scale --replicas=<desired replica count> deployment/<deployment name>

Scaling a deployment: Example

For example, the following command tells Kubernetes to set the replica count of the deployment named my-deployment to three:

kubectl scale --replicas=3 deployment/my-deployment

In addition to deployments, you can scale replica sets, replication controllers, and stateful sets, or point the command at a YAML file with the -f flag. The syntax is the same as above, except you replace deployment/my-deployment with the type and name of the object you want to scale (for example, rs/my-rs or statefulset/my-statefulset).

For instance, imagine you have four replication controllers named rc1, rc2, rc3 and rc4. To scale all of these replication controllers at once, you could use the following command:

kubectl scale --replicas=3 rc/rc1 rc/rc2 rc/rc3 rc/rc4

There are also various optional arguments for the scaling command. For example, if you want the scale operation to proceed only when the current version of the resource matches a specific value, pass that value with the --resource-version flag.
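As a rough sketch of how a precondition like this can be used, the function below wraps kubectl scale with the related --current-replicas flag, which lets the operation proceed only if the resource currently has the expected replica count. The function name scale_if_current and its argument order are assumptions made for illustration; they are not part of kubectl.

```shell
# Scale a deployment only if it currently has the expected replica count.
# Usage: scale_if_current <deployment-name> <expected-current-replicas> <desired-replicas>
# (scale_if_current is a hypothetical helper name, not a kubectl feature.)
scale_if_current() {
  # --current-replicas is a precondition: kubectl refuses to scale if the
  # current size of the resource does not match this value.
  kubectl scale --current-replicas="$2" --replicas="$3" "deployment/$1"
}
```

For example, scale_if_current my-deployment 2 3 would scale my-deployment to three replicas only if it is currently set to two. If the precondition does not hold, kubectl reports an error and leaves the deployment unchanged.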

Scaling down deployments

To scale a deployment down, simply specify a replica count that is lower than the existing total. You can check the current number of replicas by running kubectl describe deployment <deployment-name>.
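If you only need the number itself, for instance inside a script, one possible approach is to read the spec.replicas field with a JSONPath query. The helper name current_replicas below is an assumption made for this example.

```shell
# Print the desired replica count recorded in a deployment's spec.
# Usage: current_replicas <deployment-name>
# (current_replicas is a hypothetical helper name.)
current_replicas() {
  kubectl get "deployment/$1" -o jsonpath='{.spec.replicas}'
}
```

This returns the desired count from the deployment spec, which may briefly differ from the number of pods actually running while a scale operation is in progress.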

If you currently have 5 replicas and you want to scale down to 3, you’d use the following scale operation command:

kubectl scale --replicas=3 deployments/my-deployment

Scaling down to zero

To reduce the replicas of a deployment to zero, specify 0 as the replica count. For example:

kubectl scale --replicas=0 deployment/my-deployment

Scaling to zero will stop all pods associated with the deployment, essentially reducing resource utilization to nothing. However, you can easily scale back up later by increasing the replica count to a number greater than zero – so by scaling to zero, you keep your deployment readily available and can re-enable the application quickly without actually having to redeploy it.
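One thing kubectl scale does not do is remember the replica count you had before scaling to zero. The sketch below shows one way to handle that, by storing the old count in an annotation before scaling down and reading it back when scaling up. The function names and the previous-replicas annotation key are hypothetical choices for this example, not a Kubernetes convention.

```shell
# Save the current replica count in an annotation, then scale to zero.
# Usage: pause_deployment <deployment-name>
pause_deployment() {
  replicas=$(kubectl get "deployment/$1" -o jsonpath='{.spec.replicas}') || return 1
  # "previous-replicas" is an illustrative annotation key, chosen for this sketch.
  kubectl annotate "deployment/$1" "previous-replicas=$replicas" --overwrite || return 1
  kubectl scale --replicas=0 "deployment/$1"
}

# Scale back up to the replica count saved by pause_deployment.
# Usage: resume_deployment <deployment-name>
resume_deployment() {
  replicas=$(kubectl get "deployment/$1" -o jsonpath='{.metadata.annotations.previous-replicas}') || return 1
  if [ -z "$replicas" ]; then
    echo "no saved replica count found on deployment/$1" >&2
    return 1
  fi
  kubectl scale --replicas="$replicas" "deployment/$1"
}
```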

Scaling down all deployments in a namespace

In some cases, you may wish to scale down all deployments in a given namespace. This can be helpful if, for example, you want to free up resources by scaling down all pods hosted in a dev/test namespace.

To do this, specify the namespace name and include the --all flag when running the kubectl scale command. For example:

kubectl scale --replicas=0 deploy -n some-namespace --all

Scaling up deployments

Scaling up deployments is as simple as scaling down: You simply use the scale command and specify a higher number of replicas than the current count.

For example, if you currently have three replicas but want to scale up to five, use:

kubectl scale --replicas=5 deployments/my-deployment

Automating scaling with Horizontal Pod Autoscaler (HPA)

The kubectl scale command lets you scale applications up or down manually. But what if you want them to scale automatically in response to changes in resource utilization, without waiting for an admin to log in and scale them?

In that case, you can use the Horizontal Pod Autoscaler (HPA), a Kubernetes feature that automatically scales pod replicas based on metrics like pod CPU and memory usage.

Diagram explaining how Horizontal Pod Autoscaler (HPA) works, with a control loop querying metrics, calculating replicas, and scaling pods in a Kubernetes cluster every 15 seconds.

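To use the HPA with CPU or memory utilization targets, the containers in your deployment must define resource requests, because utilization is calculated as a percentage of the requested amount. Limits are not required by the HPA, but they are commonly set alongside requests. The cluster also needs a metrics source, such as metrics-server. You can set requests and limits by including specs like the following in the deployment definition: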

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "200m"
    memory: "256Mi"

Then, create an HPA resource that defines the conditions for autoscaling. For example, the following tells the HPA to maintain a target average CPU utilization rate of 50 percent. It also sets the allowable replica range as 1 to 10 (meaning the HPA can create a minimum of 1 and a maximum of 10 replicas).

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

With this configuration in place, the HPA will monitor the deployment and add replicas when average CPU utilization across its pods rises above 50 percent of the requested CPU (since spreading the load over more replicas generally lowers utilization per pod). It will also remove replicas when utilization falls below 50 percent, always staying within the configured range of 1 to 10 replicas.
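The core of the HPA calculation, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The shell function below is a simplified sketch of that arithmetic, not the controller's actual implementation: it ignores the tolerance band, stabilization windows, and pods that are not yet ready. The name hpa_desired_replicas is made up for this example.

```shell
# Estimate the replica count an HPA would aim for (simplified).
# Usage: hpa_desired_replicas <current-replicas> <observed-utilization-%> \
#                             <target-utilization-%> <min-replicas> <max-replicas>
hpa_desired_replicas() {
  # Integer ceiling of current * observed / target.
  desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  # Clamp the result to the configured minReplicas..maxReplicas range.
  if [ "$desired" -lt "$4" ]; then desired="$4"; fi
  if [ "$desired" -gt "$5" ]; then desired="$5"; fi
  echo "$desired"
}
```

With the configuration above, four replicas averaging 100 percent utilization against a 50 percent target would give hpa_desired_replicas 4 100 50 1 10, which evaluates to 8. The same four replicas averaging 30 percent would give 3.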

Managing replica sets and replication controllers

It’s important not to confuse deployment scaling with replica sets or replication controllers. In Kubernetes, replica sets and replication controllers are objects that keep a specified number of pod replicas running. A deployment manages its pods through a replica set.

Diagram illustrating a ReplicaSet in Kubernetes managing multiple pods, ensuring six replicas are maintained even when a pod fails or is terminated.

You can define the replica count when creating a deployment by including a replicas: field in the deployment spec. The deployment then creates and manages a replica set with that count. You can also create a standalone replication controller, which is separate from the deployment.

When you create a replica set or replication controller, Kubernetes will maintain the selected number of replicas automatically, unless you tell it not to.

Scaling down a replica set

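You can use kubectl scale to modify the replica count via manual scaling if desired. For example, if you have a replica set named my-rs that originally included 5 replicas, but you want to scale it down to 3, you could do so with the command below. Note that if the replica set is owned by a deployment, the deployment controller will reset the count to the value in the deployment spec, so in that case you should scale the deployment instead.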

kubectl scale --replicas=3 rs/my-rs

Considerations for scaling in multi-tenant environments

When scaling in a multi-tenant Kubernetes cluster – meaning one that hosts multiple workloads and is used by multiple users or groups – it’s important to keep in mind that scaling pods up could impact the performance of other workloads. If you scale too aggressively, you might not leave enough memory or CPU available for other users’ applications to function normally.

One way to help mitigate this issue is to enforce namespace quotas, which set limits on how many resources are available within each namespace. If cluster admins give each user or group a dedicated namespace and enforce quotas on them, scaling up in one namespace won’t constrain the resource availability of other users or groups.
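As an illustration of what such a quota might look like, the function below prints a ResourceQuota manifest that caps the total CPU and memory requests in a namespace. The function name render_namespace_quota, the quota name compute-quota, and the example values are assumptions chosen for this sketch.

```shell
# Print a ResourceQuota manifest that caps total CPU and memory requests
# in a namespace.
# Usage: render_namespace_quota <namespace> <cpu-requests> <memory-requests>
# (The function name and the quota name "compute-quota" are hypothetical.)
render_namespace_quota() {
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: $1
spec:
  hard:
    requests.cpu: "$2"
    requests.memory: $3
EOF
}

# Example (requires access to a cluster):
#   render_namespace_quota team-a 4 8Gi | kubectl apply -f -
```

Keep in mind that once a quota on CPU or memory requests is in place, pods in that namespace generally need to declare those requests, or Kubernetes may reject them. Scaling a deployment beyond what the quota allows leaves the extra replicas uncreated rather than taking resources from other namespaces.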

Limitations of kubectl scale

While kubectl scale is a valuable command, it has limitations:

  • Manual usage: You must invoke kubectl scale manually, which makes it a poor solution in cases where workload resource utilization fluctuates very quickly. In that case, you’re better off using an automated scaling solution like HPA.
  • Performance impacts: As noted above, scaling one workload up could deprive other workloads of adequate resources, leading to Kubernetes cluster instability.
  • Exacerbation of performance problems: In cases where a workload is underperforming due to issues like memory leaks, scaling up could make the problem worse because it will lead to even higher levels of resource consumption. It may improve workload performance in the short term, but it’s not a substitute for fixing deeper performance flaws in an app.

For these reasons, it’s important to know when alternative solutions are better than kubectl scale. Again, Horizontal Pod Autoscaling may be a better choice for use cases where scaling needs to occur automatically. In addition, monitoring and observability tools can help to reduce the need for workload scaling by allowing teams to get ahead of performance issues before it becomes necessary to scale a deployment. 

Best practices for scaling a deployment

| Practice | Description |
|---|---|
| Understand workload requirements | Know how many resources your workloads are likely to need so you can select the appropriate number of replicas. |
| Scale gradually | Avoid adding or removing too many replicas at once. |
| Consider horizontal scaling | Automatically scale based on pod resource utilization metrics. |
| Consider vertical scaling | Automatically add more CPU and memory to pods based on resource utilization (not available in all Kubernetes distributions). |
| Consider cluster autoscaling | Automatically add nodes to reduce the risk of resource-constrained pods. |
| Monitor and troubleshoot | Monitor workloads to identify root-cause performance problems and avoid scaling as a Band-Aid solution. |

If you do choose to use kubectl scale, consider the following best practices to make the process as smooth as possible:

Understand workload requirements

Make sure you know how much CPU, memory and other resources your workloads typically require relative to request rates so that you can determine whether scaling up is actually necessary.

Blindly scaling as a way of addressing performance degradation that stems from problems within an app is a recipe for failure because eventually, your Kubernetes cluster will run out of resources.

Scale gradually

In general, it’s best to perform scale operations gradually by adding just one or two replicas at a time. Scaling more rapidly risks adding too much load to your cluster at once. You may also tie up resources by creating more replicas than you need, leading to higher costs and potentially undercutting the performance of other deployments.
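A gradual scale-up or scale-down can be scripted. The sketch below changes the replica count in small steps and waits for the deployment to settle after each one, using kubectl rollout status. The function name scale_gradually, its arguments, and the 120-second timeout are assumptions chosen for illustration.

```shell
# Move a deployment from one replica count to another in small steps.
# Usage: scale_gradually <deployment-name> <current-replicas> <target-replicas> [step]
# (scale_gradually is a hypothetical helper; step defaults to 1.)
scale_gradually() {
  deployment="$1"; current="$2"; target="$3"; step="${4:-1}"
  if [ "$step" -lt 1 ]; then
    echo "step must be at least 1" >&2
    return 1
  fi
  while [ "$current" -ne "$target" ]; do
    if [ "$current" -lt "$target" ]; then
      current=$(( current + step ))
      # Do not overshoot the target when scaling up.
      if [ "$current" -gt "$target" ]; then current="$target"; fi
    else
      current=$(( current - step ))
      # Do not undershoot the target when scaling down.
      if [ "$current" -lt "$target" ]; then current="$target"; fi
    fi
    kubectl scale --replicas="$current" "deployment/$deployment" || return 1
    # Wait until the deployment reports the new replicas as available.
    kubectl rollout status "deployment/$deployment" --timeout=120s || return 1
  done
}
```

For example, scale_gradually my-deployment 3 6 2 would scale to five replicas, wait, and then scale to six. The function stops at the first step that fails or times out.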

Consider horizontal scaling

As noted above, the Horizontal Pod Autoscaler can be superior to kubectl scale for use cases that require rapid, automated scaling.

Consider vertical scaling

Some Kubernetes distributions support vertical scaling, which automatically changes pod memory and CPU allocations without changing the replica count. Vertical scaling can be useful in situations where you want to maintain a consistent number of replicas but still allow your workload to scale.

Consider cluster autoscaling

Some Kubernetes distributions support cluster autoscaling, which adds nodes to the cluster automatically when resource availability becomes constrained.

Cluster autoscaling isn’t a substitute for deployment scaling, but it can sometimes make deployment scaling unnecessary by making additional nodes available. That way, Kubernetes can schedule resource-hungry pods onto nodes with spare capacity, where the pods can perform adequately without requiring additional replicas.

Monitor and troubleshoot

Monitoring your deployments and clusters, and troubleshooting performance issues, are essential for getting to the root cause of performance problems.

Again, deployment scaling can be a short-term solution in cases where a workload is underperforming due to bugs in the application itself. But it’s not a substitute for fixing underlying bugs and optimizing the performance of the workload itself. Scaling a deployment might ease performance issues in the short term, but be prepared to use other tools – like kubectl logs --tail and Kubernetes events monitoring – to investigate root-cause issues in cases where performance degradation is not a simple mismatch between application load and available replicas.
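As a starting point for that kind of investigation, the sketch below prints recent log lines and namespace events for a deployment. The function name inspect_deployment and the choice of 100 log lines are assumptions for this example.

```shell
# Print recent logs and events for a deployment to help spot root causes.
# Usage: inspect_deployment <deployment-name> <namespace>
# (inspect_deployment is a hypothetical helper name.)
inspect_deployment() {
  # Last 100 log lines; kubectl picks one of the deployment's pods.
  kubectl logs "deployment/$1" -n "$2" --tail=100
  # Namespace events sorted by time, so the most recent appear last.
  kubectl get events -n "$2" --sort-by=.lastTimestamp
}
```

Events such as failed scheduling, OOM kills, or repeated restarts usually point to a problem that adding replicas alone will not fix.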

How to scale a deployment with help from groundcover

As a Kubernetes monitoring and observability solution, groundcover won’t scale your deployments for you. But it will clue you into circumstances where you should consider scaling – such as pods that are dropping requests or experiencing high latency.

groundcover dashboard showing workload metrics with graphs for pod and node resource usage, including CPU and memory, in a Kubernetes cluster.

In addition, groundcover helps you get to the root cause of complex performance issues in Kubernetes. This means you can make informed decisions about whether your deployment simply needs to be scaled to achieve stable performance, or you have other issues – like buggy code – that you should address to optimize performance and resource utilization.

Kubectl scale: A valuable tool for Kubernetes performance optimization

While kubectl’s scale command is not the right solution for every performance challenge your pods may face, it can help to improve performance under many circumstances – which is why knowing how to scale deployments (as well as replica sets and the various other types of resources that kubectl can scale) is a key pillar of Kubernetes performance management.
