Configure Horizontal Pod Autoscaling (HPA) for Milvus
Overview
Horizontal Pod Autoscaling (HPA) is a Kubernetes feature that automatically adjusts the number of Pods in a deployment based on resource utilization, such as CPU or memory. In Milvus, HPA can be applied to stateless components like proxy, queryNode, dataNode, and indexNode to dynamically scale the cluster in response to workload changes.
This guide explains how to configure HPA for Milvus components using the Milvus Operator.
Prerequisites
- A running Milvus cluster deployed with Milvus Operator.
- Access to kubectl for managing Kubernetes resources.
- Familiarity with Milvus architecture and Kubernetes HPA.
Configure HPA with Milvus Operator
To enable HPA in a Milvus cluster managed by the Milvus Operator, follow these steps:
Set Replicas to -1:
In the Milvus custom resource (CR), set the replicas field to -1 for the component you want to scale with HPA. This delegates scaling control to HPA instead of the operator. You can edit the CR directly or use the following kubectl patch command to quickly switch to HPA control:

    kubectl patch milvus <your-release-name> --type='json' -p='[{"op": "replace", "path": "/spec/components/proxy/replicas", "value": -1}]'

Replace <your-release-name> with the name of your Milvus cluster.
To verify that the change has been applied, run:

    kubectl get milvus <your-release-name> -o jsonpath='{.spec.components.proxy.replicas}'

The expected output should be -1, confirming that the proxy component is now under HPA control.
Alternatively, you can define it in the CR YAML:

    apiVersion: milvus.io/v1beta1
    kind: Milvus
    metadata:
      name: <your-release-name>
    spec:
      mode: cluster
      components:
        proxy:
          replicas: -1

Define an HPA Resource:
Create an HPA resource to target the deployment of the desired component. Below is an example for the proxy component:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-release-milvus-proxy-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-release-milvus-proxy
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 60
      behavior:
        scaleUp:
          policies:
            - type: Pods
              value: 1
              periodSeconds: 30
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Pods
              value: 1
              periodSeconds: 60

Replace my-release in metadata.name and spec.scaleTargetRef.name with your actual Milvus cluster name (e.g., <your-release-name>-milvus-proxy-hpa and <your-release-name>-milvus-proxy).
Apply the HPA Configuration:
Save the HPA manifest as hpa.yaml, then deploy it using the following command:

    kubectl apply -f hpa.yaml

To verify that the HPA has been successfully created, run:

    kubectl get hpa

You should see output similar to:

    NAME                          REFERENCE                            TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
    my-release-milvus-proxy-hpa   Deployment/my-release-milvus-proxy   <some>/60%   2         10        2          <time>

The NAME and REFERENCE fields will reflect your cluster name (e.g., <your-release-name>-milvus-proxy-hpa and Deployment/<your-release-name>-milvus-proxy).
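To make the relationship between the TARGETS and REPLICAS columns concrete, the Kubernetes documentation describes the core HPA calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The following is a minimal sketch of that arithmetic for a single metric. The function name hpa_desired_replicas is hypothetical and is not part of kubectl or Milvus, and the sketch deliberately ignores the controller's tolerance band, the handling of multiple metrics (the HPA uses the largest result), and any behavior policies that rate-limit changes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl or Milvus): estimates the replica
# count the HPA would request for ONE metric, ignoring tolerance and
# rate-limiting behavior policies.
# Arguments: current replicas, current utilization %, target utilization %,
#            minReplicas, maxReplicas.
hpa_desired_replicas() {
  local current_replicas=$1 current_util=$2 target_util=$3 min=$4 max=$5
  # Integer ceiling of (current_replicas * current_util / target_util).
  local desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
  # Clamp to the minReplicas..maxReplicas range.
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

hpa_desired_replicas 2 90 60 2 10   # 2 Pods at 90% with a 60% target: prints 3
hpa_desired_replicas 4 20 60 2 10   # 4 Pods at 20%: prints 2 (held at minReplicas)
```

With the example HPA in this guide, the scaleUp and scaleDown policies would additionally cap how fast the Deployment moves toward the computed value.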
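The key fields in the HPA resource are:
- scaleTargetRef: Specifies the deployment to scale (e.g., my-release-milvus-proxy).
- minReplicas and maxReplicas: Set the scaling range (2 to 10 Pods in this example).
- metrics: Configures scaling based on CPU and memory utilization, targeting 60% average usage. Utilization is measured against each Pod's resource requests, so the target Pods must define CPU and memory requests.
- behavior: Limits how fast the replica count changes. In this example, scale-up adds at most one Pod every 30 seconds, and scale-down removes at most one Pod every 60 seconds after a 300-second stabilization window.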
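Because the release name appears in both metadata.name and spec.scaleTargetRef.name, it can be convenient to generate the manifest instead of editing it by hand. The following is a minimal sketch under stated assumptions: RELEASE and OUT are illustrative shell variables rather than Milvus or kubectl settings, only the proxy component is covered because that is the naming this guide confirms, and the behavior section is omitted for brevity.

```shell
#!/usr/bin/env bash
# Sketch: render the proxy HPA manifest for a given Milvus release name.
# RELEASE and OUT are illustrative variables, not Milvus or kubectl settings.
RELEASE="${RELEASE:-my-release}"
OUT="${OUT:-hpa.yaml}"
# Deployment name pattern used for the proxy component in this guide.
TARGET="${RELEASE}-milvus-proxy"

cat > "$OUT" <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${TARGET}-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${TARGET}
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 60
EOF

echo "Wrote $OUT targeting Deployment/$TARGET"
```

For example, running the script with RELEASE set to your cluster name writes hpa.yaml, which can then be applied with kubectl apply -f hpa.yaml. For components other than proxy, check the actual Deployment names with kubectl get deployments rather than assuming the same pattern.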
Conclusion
HPA allows Milvus to efficiently adapt to varying workloads. By using the kubectl patch command, you can quickly switch a component to HPA control without manually editing the full CR. For more details, refer to the Kubernetes HPA documentation.