Download the Prometheus tarball for your operating system.
Go to the directory holding the Prometheus file, and ensure that Prometheus is properly installed:
$ ./prometheus --version
You can add the path to Prometheus to PATH. This makes it easy to start Prometheus from any shell.
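As a minimal sketch of the PATH change, assuming Prometheus was extracted to a hypothetical `$HOME/prometheus` directory (adjust this to your own location):

```shell
# Hypothetical install location; replace it with the directory
# that actually holds the prometheus binary.
PROMETHEUS_HOME="$HOME/prometheus"

# Append the directory to PATH for the current shell session.
export PATH="$PATH:$PROMETHEUS_HOME"
```

To keep the change across sessions, the same `export` line could be added to your shell profile, for example `~/.bashrc` if you use Bash.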
$ ./pushgateway
You must start Pushgateway before starting the Milvus server.
Enable the Prometheus monitor in server_config.yaml and set the address and port number of Pushgateway:
metric:
  enable: true # Set the value to true to enable the Prometheus monitor.
  address: <your_IP_address> # Set the IP address of Pushgateway.
  port: 9091 # Set the port number of Pushgateway.
In a Kubernetes cluster, you need to set server_config.yaml for each node that you want to monitor.
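Before starting Milvus, it can help to confirm that Pushgateway answers on the address and port you configured. The sketch below assumes Pushgateway runs on `localhost:9091` and that `curl` is installed; both are assumptions, so substitute your own address.

```shell
# Assumed Pushgateway address; use the address and port from server_config.yaml.
PUSHGATEWAY_URL="http://localhost:9091"

# Pushgateway serves its collected metrics at /metrics.
# --fail turns HTTP errors into a non-zero exit status.
if curl --silent --fail --max-time 5 "$PUSHGATEWAY_URL/metrics" >/dev/null; then
  echo "Pushgateway is reachable at $PUSHGATEWAY_URL"
else
  echo "Pushgateway is not reachable at $PUSHGATEWAY_URL" >&2
fi
```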
Go to the Prometheus root directory, and download the starter Prometheus configuration file for Milvus:
$ wget https://raw.githubusercontent.com/milvus-io/docs/master/v1.1.0/assets/monitoring/prometheus.yml \
  -O prometheus.yml
Download the starter alerting rules for Milvus to the Prometheus root directory:
$ wget -P rules https://raw.githubusercontent.com/milvus-io/docs/master/v1.1.0/assets/monitoring/alert_rules.yml
Edit the Prometheus configuration file according to your needs:
global: Configures parameters such as scrape_interval and evaluation_interval.
global:
  scrape_interval: 2s # Set the scrape interval to 2s.
  evaluation_interval: 2s # Set the evaluation interval to 2s.
alerting: Sets the address and port of Alertmanager.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
rule_files: Specifies the file that defines the alerting rules.
rule_files:
  - "alert_rules.yml"
scrape_configs: Sets the targets for scraping data.
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']
See Prometheus Configuration for more information about the Prometheus configuration file.
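Once the configuration file is edited, one way to check it and start Prometheus is sketched below. It assumes you are in the Prometheus root directory, that the file is named `prometheus.yml`, and that the `promtool` binary shipped in the tarball sits next to `prometheus`.

```shell
# Assumed configuration file name in the current directory.
CONFIG_FILE="prometheus.yml"

if [ -x ./promtool ] && [ -x ./prometheus ]; then
  # Validate the configuration first; start the server only if the check passes.
  # Prometheus runs in the foreground; stop it with Ctrl+C.
  ./promtool check config "$CONFIG_FILE" && ./prometheus --config.file="$CONFIG_FILE"
else
  echo "Run this from the Prometheus root directory" >&2
fi
```

Running the check first means that a syntax error in the configuration or in a referenced rule file is reported before the server starts.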
After you start Prometheus, you can view and graph the metrics that Milvus provides in the Prometheus web interface. See Milvus Metrics for more information.
Proactively monitoring metrics helps you identify emerging issues. It is equally important to create alerting rules for events that require immediate intervention.
This section includes the most important events for which you must create alerting rules.
Server is down
- Rule: Send an alert when the Milvus server is down.
- How to detect: If the Milvus server is down, No Data is displayed for various metrics on the monitoring dashboard.
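One possible way to express this rule is sketched below. It is not taken from the starter alert_rules.yml. It relies on the `push_time_seconds` metric that Pushgateway records for each pushed group. The file name `milvus_push_stale.yml`, the `job="milvus"` label, and the 60-second threshold are all assumptions that you would need to adapt to your deployment.

```shell
# Write a hypothetical rule file; the job label and threshold are assumptions.
cat > milvus_push_stale.yml <<'EOF'
groups:
  - name: milvus_example
    rules:
      - alert: MilvusPushStale
        # Fires when the last push for the assumed job is older than 60 seconds.
        expr: time() - push_time_seconds{job="milvus"} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No metrics pushed by Milvus for over a minute"
EOF

# Validate the rule file if promtool is available in the current directory.
if [ -x ./promtool ]; then
  ./promtool check rules milvus_push_stale.yml
fi
```

For Prometheus to load the rule, the file would also have to be listed under `rule_files` in the Prometheus configuration file.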
CPU/GPU temperature is too high
- Rule: Send an alert when the CPU/GPU temperature exceeds 80 degrees Celsius.
- How to detect: Check the GPU Temperature metric on the monitoring dashboard.
Download the latest Alertmanager tarball for your operating system.
Ensure that Alertmanager is properly installed:
$ alertmanager --version
You can add the path to Alertmanager to PATH. This makes it easy to start Alertmanager from any shell.
Create the Alertmanager configuration file to specify the desired receivers for notifications, and add it to the Alertmanager root directory.
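A minimal configuration might look like the sketch below, which routes every alert to a single webhook receiver. The file name `alertmanager.yml`, the receiver name, and the webhook URL are placeholders chosen for illustration, not values from the Milvus documentation.

```shell
# Write a minimal, hypothetical Alertmanager configuration.
cat > alertmanager.yml <<'EOF'
route:
  # Every alert goes to the receiver named below.
  receiver: default
receivers:
  - name: default
    webhook_configs:
      # Placeholder endpoint; replace it with your own notification service.
      - url: http://localhost:5001/
EOF

# Validate the file if amtool (shipped with Alertmanager) is on PATH.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config alertmanager.yml
fi
```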
Start the Alertmanager server, with the --config.file flag pointing to the configuration file:
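The command might look like the following sketch, which assumes that `alertmanager` is on PATH and that the configuration file is named `alertmanager.yml` in the current directory. The file name is an assumption, so use the name of the file you created.

```shell
# Assumed file name; use the path of the configuration file you created.
CONFIG_FILE="alertmanager.yml"

if command -v alertmanager >/dev/null 2>&1; then
  # Runs in the foreground; stop it with Ctrl+C.
  alertmanager --config.file="$CONFIG_FILE"
else
  echo "alertmanager not found on PATH" >&2
fi
```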
Use your browser to open http://<hostname of machine running alertmanager>:9093, and use the Alertmanager UI to define rules for muting alerts.