This is because, after restarting, Milvus needs to load data from disk into memory for the first vector search. You can set preload_collection in server_config.yaml to load as many collections as the memory permits; Milvus then loads these collections into memory each time it restarts. Otherwise, you can call load_collection() to load collections into memory.
Check whether the value of cache.cache_size in server_config.yaml is greater than the size of the collection.
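As a rough aid for the check above, the following sketch estimates the raw size of a collection and compares it with a cache size. It assumes 4-byte float vector components and ignores index compression and metadata, so treat the result as an approximation only. The function names are hypothetical and not part of any Milvus API.

```python
def raw_vector_size_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Approximate raw size of a collection of float vectors (assumption: 4 bytes per component)."""
    return num_vectors * dim * bytes_per_component


def fits_in_cache(num_vectors: int, dim: int, cache_size_gb: float) -> bool:
    """Return True if the estimated raw size is smaller than the configured cache size."""
    cache_bytes = cache_size_gb * 1024 ** 3
    return raw_vector_size_bytes(num_vectors, dim) < cache_bytes


if __name__ == "__main__":
    # 50 million 128-dimensional vectors against a 16 GB and a 32 GB cache.
    for cache_gb in (16, 32):
        print(cache_gb, fits_in_cache(50_000_000, 128, cache_gb))
```

An indexed collection (for example, one using a quantizing index) can be considerably smaller than this raw estimate, so the estimate errs on the conservative side.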
- Ensure that the value of cache.cache_size in server_config.yaml is greater than the size of the collection.
- Ensure that all segments are indexed.
- Check if there are other processes on the server consuming CPU resources.
- Adjust the values of
- If the search performance is unstable, you can add -e OMP_NUM_THREADS=NUM when starting up Milvus, where NUM is 2/3 of the number of CPU cores.
See Performance tuning for more information.
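The OMP_NUM_THREADS suggestion in the list above can be computed with a small helper. This is only a sketch: rounding down and the minimum of one thread are assumptions, and the function name is hypothetical.

```python
import os
from typing import Optional


def recommended_omp_threads(cpu_cores: Optional[int] = None) -> int:
    """Return roughly 2/3 of the CPU cores, rounded down, but never less than 1."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, (2 * cores) // 3)


if __name__ == "__main__":
    threads = recommended_omp_threads()
    # Prints the flag to add when starting up Milvus.
    print(f"-e OMP_NUM_THREADS={threads}")
```

On a 12-core machine, for example, this yields 8 threads.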
In general terms, the recommended value of nlist is 4 × sqrt(n), where n is the total number of entities in a segment.
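The 4 × sqrt(n) rule of thumb above is easy to evaluate for a given segment. The sketch below assumes the recommendation refers to nlist, as the test results later in this section suggest; rounding to the nearest integer is also an assumption, and the function name is hypothetical.

```python
import math


def recommended_nlist(entities_per_segment: int) -> int:
    """Apply the 4 * sqrt(n) rule of thumb, where n is the number of entities in a segment."""
    if entities_per_segment < 0:
        raise ValueError("entities_per_segment must be non-negative")
    return round(4 * math.sqrt(entities_per_segment))


if __name__ == "__main__":
    for n in (1_000_000, 1_048_576):
        print(n, recommended_nlist(n))
```

For a segment of 1,000,000 entities this gives 4000, and for 1,048,576 entities it gives 4096. Treat the result as a starting point for testing rather than a fixed setting.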
Choosing nprobe is a trade-off between search performance and accuracy, and the right value depends on your dataset and scenario. It is recommended to run several rounds of tests to determine the value of nprobe.
The following charts are from a test running on the sift50m dataset with the IVF_SQ8 index. The test compares search performance and recall rate between different nlist/nprobe pairs.
We only show the results of GPU-enabled Milvus here, because the two distributions of Milvus show similar results.
Key takeaways: This test shows that the recall rate increases with the
Key takeaways: When nlist is 4096 and nprobe is 128, Milvus shows the best search performance.
If the size of the dataset is smaller than the value of index_file_size that you set when creating a collection, Milvus does not create an index for this dataset. Therefore, queries on a small dataset may take longer. You can call create_index to build the index.
It is very likely that Milvus is using CPU for query. If you want to use GPU for query, you need to set the value of gpu_search_threshold in server_config.yaml to be less than nq (number of vectors per query).
You can use gpu_search_threshold to set the threshold: when nq is less than this value, Milvus uses CPU for queries; otherwise, Milvus uses GPU instead.
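The threshold rule above amounts to a single comparison. The sketch below only models the described behavior for illustration; it is not Milvus code, and the function name is hypothetical.

```python
def search_device(nq: int, gpu_search_threshold: int) -> str:
    """Model the described rule: CPU when nq is below the threshold, GPU otherwise."""
    return "CPU" if nq < gpu_search_threshold else "GPU"


if __name__ == "__main__":
    threshold = 1000  # example value, chosen only for illustration
    for nq in (1, 999, 1000, 5000):
        print(nq, search_device(nq, threshold))
```

With the example threshold of 1000, a query with nq of 999 is modeled as running on CPU, while nq of 1000 or more is modeled as running on GPU.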
We do not recommend enabling GPU when the query number is small.
This is because the data has not been flushed from memory to disk. To ensure that data can be searched immediately after insertion, you can call flush. However, calling this method too often creates too many small files and slows down search.
Milvus processes queries in parallel. An nq less than 100 and a small dataset do not require a high level of parallelism, hence the CPU usage stays low.
You need to set index_file_size when creating a collection from a client. This parameter specifies the size of each segment, and its default value is 1024 (in MB). When the size of newly inserted vectors reaches the specified volume, Milvus packs these vectors into a new segment. In other words, newly inserted vectors do not go into a segment until they grow to the specified volume. When it comes to creating indexes, Milvus creates one index file for each segment. When conducting a vector search, Milvus searches all index files one by one.
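To get a feel for how index_file_size shapes a collection, the sketch below estimates how many full segments a dataset would produce and how much data would remain unpacked. It is a simplified model under stated assumptions: raw 4-byte float components, 1 MB taken as 1024 × 1024 bytes, and no metadata overhead. The function name is hypothetical.

```python
from typing import Tuple


def estimate_segments(num_vectors: int, dim: int, index_file_size_mb: int = 1024,
                      bytes_per_component: int = 4) -> Tuple[int, int]:
    """Return (full_segments, leftover_bytes) for a dataset of raw float vectors."""
    total_bytes = num_vectors * dim * bytes_per_component
    segment_bytes = index_file_size_mb * 1024 * 1024
    # Data only forms a segment once it reaches the configured size.
    full_segments, leftover_bytes = divmod(total_bytes, segment_bytes)
    return full_segments, leftover_bytes


if __name__ == "__main__":
    for size_mb in (256, 1024, 2048):
        print(size_mb, estimate_segments(50_000_000, 128, size_mb))
```

A larger index_file_size yields fewer segments, and therefore fewer index files to search one by one.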
As a rule of thumb, you can expect a 30% to 50% increase in search performance after changing the value of index_file_size from 1024 to 2048. Note that an overly large index_file_size value may cause Milvus to fail to load a segment into memory or graphics memory. For example, if the graphics memory is 2 GB and index_file_size is 3 GB, each segment is too large to load.
In situations where vectors are not frequently inserted, we recommend setting the value of index_file_size to 1024 MB or 2048 MB. Otherwise, we recommend setting the value to 256 MB or 512 MB to keep unindexed files from getting too large.
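The recommendations above can be combined with the memory constraint into one small helper. This is a sketch only: the strict "smaller than available memory" check is an assumption, and the function and parameter names are hypothetical.

```python
from typing import List, Optional


def suggest_index_file_size_mb(frequent_inserts: bool,
                               device_memory_mb: Optional[int] = None) -> List[int]:
    """Return candidate index_file_size values (in MB) following the guidance above."""
    candidates = [256, 512] if frequent_inserts else [1024, 2048]
    if device_memory_mb is not None:
        # Assumption: a segment must be strictly smaller than the memory it is loaded into.
        candidates = [size for size in candidates if size < device_memory_mb]
    return candidates


if __name__ == "__main__":
    print(suggest_index_file_size_mb(frequent_inserts=True))
    print(suggest_index_file_size_mb(frequent_inserts=False, device_memory_mb=2048))
```

An empty result means none of the recommended values fit the given memory, in which case a smaller value needs to be chosen by testing.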
See Performance Tuning > Index for more information.
When the client and the server are running on the same physical machine, it takes about 0.8 second to import 100,000 128-dimensional vectors (to an SSD disk). More specifically, the performance depends on the I/O speed of your disk.
- If the newly inserted vectors have not grown to the specified volume to trigger index creation, Milvus needs to load these data directly from disk to memory for a vector search.
- As of v0.9.0, if Milvus has started creating indexes for the newly inserted vectors, an incoming vector search interrupts the index creation process, causing a delay of about one second.
If your batch query is on a small scale (nq < 64), Milvus combines the query requests, in which case multi-threading helps. Otherwise, the resources are already exhausted, so multi-threading does not help much.
Generally speaking, CPU-only query works for situations where nq (number of vectors per query) is small, whilst GPU-enabled query works best with a large nq, say 500.
Milvus needs to load data from memory into graphics memory for a GPU-enabled query. A GPU-enabled query is faster only when this load time is negligible compared to the query time.