What is the impact of using disk-based ANN methods (where part of the index is on SSD/HDD) on query latency compared to fully in-memory indices?

Using disk-based approximate nearest neighbor (ANN) methods, where part of the index is stored on SSDs or HDDs, typically results in higher query latency compared to fully in-memory indices. This is because accessing data from disk involves slower I/O operations, even when using fast storage like SSDs. In-memory indices avoid this overhead by keeping all data in RAM, which has orders-of-magnitude faster read speeds. For example, a query that requires traversing multiple layers of a hierarchical ANN index might take microseconds in memory but milliseconds on disk due to the need to fetch data blocks from storage.
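In Milvus this trade-off shows up directly in the index type you choose. The sketch below contrasts an in-memory HNSW index with a disk-based DISKANN index using pymilvus's MilvusClient; the URI, collection name, field name, and parameter values are illustrative placeholders, not a complete deployment.

```python
from pymilvus import MilvusClient

# Hypothetical local deployment and collection; adjust to your setup.
client = MilvusClient(uri="http://localhost:19530")

index_params = client.prepare_index_params()

# Option 1: fully in-memory graph index. Lowest query latency,
# but the entire graph and the vectors must fit in RAM.
index_params.add_index(
    field_name="embedding",
    index_type="HNSW",
    metric_type="L2",
    params={"M": 16, "efConstruction": 200},
)

# Option 2 (alternative): DiskANN keeps most of the index on NVMe SSD
# and only a compressed representation in RAM. Expect higher but still
# millisecond-scale latency at a fraction of the memory cost.
# index_params.add_index(
#     field_name="embedding",
#     index_type="DISKANN",
#     metric_type="L2",
# )

client.create_index(collection_name="docs", index_params=index_params)
```

At query time, DISKANN exposes a `search_list` parameter that trades extra disk reads for recall, much as HNSW's `ef` does in memory.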

The performance gap depends on how well the disk-based system balances data locality and I/O efficiency. Systems like DiskANN optimize for disk use by structuring data to minimize random access and leveraging sequential reads where possible. Even with these optimizations, though, an SSD takes roughly 100 microseconds for a small random read, while a RAM access often completes in under 100 nanoseconds. That difference compounds in queries requiring multiple disk seeks: a search involving 10 disk accesses could take 1-2 milliseconds on SSD, while an in-memory equivalent might finish in 0.1 milliseconds. HDD-based systems fare far worse, with seek times often exceeding 5 milliseconds per access, making them impractical for low-latency applications.
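Those figures can be made concrete with a small back-of-envelope model. The per-access latencies below are the order-of-magnitude numbers quoted above, and the fixed compute term (distance calculations, heap maintenance) is an assumed constant rather than a measurement.

```python
# Order-of-magnitude latency per random access (rough figures, not benchmarks).
LATENCY_S = {
    "ram": 100e-9,  # ~100 ns per RAM read
    "ssd": 100e-6,  # ~100 us per small SSD read
    "hdd": 5e-3,    # ~5 ms per HDD seek
}

def query_latency_s(accesses: int, medium: str, compute_s: float = 50e-6) -> float:
    """Estimate one ANN query: storage accesses plus a fixed compute budget."""
    return accesses * LATENCY_S[medium] + compute_s

for medium in ("ram", "ssd", "hdd"):
    print(f"{medium.upper():>3}: {query_latency_s(10, medium) * 1e3:6.2f} ms for 10 accesses")

# RAM:   0.05 ms  -> storage cost is negligible; compute dominates
# SSD:   1.05 ms  -> consistent with the 1-2 ms figure above
# HDD:  50.05 ms  -> far too slow for low-latency serving
```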

Despite higher latency, disk-based ANN methods enable handling much larger datasets at lower cost. A 1 TB dataset might require expensive server-grade RAM to index fully in memory, whereas a disk-based system can run on affordable SSDs. Developers often accept slower queries when scaling beyond memory limits; a recommendation system with billions of vectors, for example, might use disk-backed indices to avoid prohibitive hardware costs. Hybrid approaches, like caching frequently accessed data in memory while keeping the rest on disk, can mitigate latency for common queries. Ultimately, the choice comes down to balancing cost, dataset size, and acceptable response times for the specific use case.
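A minimal sketch of the hybrid idea, assuming a block-structured index file on disk: an in-memory LRU cache serves hot index blocks from RAM, and only cold blocks pay the SSD penalty. The block size, cache capacity, and file layout here are illustrative assumptions.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed size of one on-disk index block

class CachedIndexFile:
    """LRU cache of index blocks: hot blocks stay in RAM, cold ones hit disk."""

    def __init__(self, path: str, capacity_blocks: int = 10_000):
        self.file = open(path, "rb")
        self.capacity = capacity_blocks
        self.cache: "OrderedDict[int, bytes]" = OrderedDict()

    def read_block(self, block_id: int) -> bytes:
        if block_id in self.cache:             # RAM hit: ~100 ns territory
            self.cache.move_to_end(block_id)   # mark as most recently used
            return self.cache[block_id]
        self.file.seek(block_id * BLOCK_SIZE)  # cache miss: ~100 us on SSD
        data = self.file.read(BLOCK_SIZE)
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:    # evict least recently used
            self.cache.popitem(last=False)
        return data
```

This is the same principle DiskANN applies internally: compressed vectors stay in RAM to steer the graph traversal, while full-precision vectors and adjacency lists are fetched from SSD only for the most promising candidates.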
