AISAQ

Compatible with Milvus 2.6.4+

AISAQ is a disk-based vector index that extends DISKANN to handle billion-scale datasets with a minimal DRAM footprint.

Unlike DISKANN, which keeps compressed vectors in memory, AISAQ follows a "near-zero DRAM" architecture: all data structures, including the compressed vectors, are held on SSD.

AISAQ enables running ultra-high scale databases using standard servers while offering operation modes to balance performance and storage costs.

How AISAQ works

[Figure: AISAQ vs. DISKANN storage layouts]

The diagram above compares the storage layouts of DISKANN, AISAQ-Performance, and AISAQ-Scale, showing how data (raw vectors, edge lists, and PQ codes) is distributed between RAM and disk.

Foundation: DISKANN recap

In DISKANN, the raw vectors and edge lists are stored on disk, while PQ-compressed vectors are kept in memory (DRAM).

When DISKANN traverses to a node (e.g., vector 0):

  • It loads the raw vector (raw_vector_0) and its edge list (edgelist_0) from disk.

  • The edge list indicates which neighbors to visit next (nodes 2, 3, and 5 in this example).

  • The raw vector is used to calculate the exact distance to the query vector for ranking.

  • The PQ data in memory is used for approximate distance filtering to guide the next traversal.

Because the PQ data is already cached in DRAM, each node visit requires only one disk I/O, achieving high query speed with moderate memory usage.
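The traversal described above can be sketched in a few lines of Python. This is an illustrative toy, not the Milvus implementation: "disk" reads are simulated by dict lookups, and "PQ codes" are simulated by coarsely rounded vectors held in memory. Approximate (PQ) distances steer the traversal; exact distances, computed from the raw vector fetched on each node visit, produce the final ranking.

```python
import heapq

# Toy dataset: node id -> raw vector (lives on "disk")
raw = {
    0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0),
    3: (1.0, 1.0), 4: (2.0, 2.0), 5: (0.5, 0.5),
}
# Edge lists (also on "disk"); node 0's neighbors are 2, 3, 5 as in the example
edges = {0: [2, 3, 5], 1: [3, 5], 2: [0, 4], 3: [1, 4], 4: [2, 3], 5: [0, 1]}
# Stand-in for PQ codes kept in DRAM: vectors rounded to a coarse grid
pq = {i: (round(v[0]), round(v[1])) for i, v in raw.items()}

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search(query, entry=0, search_list=3):
    # Candidate pool ordered by approximate (PQ) distance
    candidates = [(dist2(pq[entry], query), entry)]
    visited, exact = set(), {}
    while candidates:
        _, node = heapq.heappop(candidates)
        if node in visited:
            continue
        visited.add(node)
        # One simulated disk I/O: fetch raw vector + edge list together
        exact[node] = dist2(raw[node], query)      # exact distance for ranking
        for nb in edges[node]:
            if nb not in visited:
                heapq.heappush(candidates, (dist2(pq[nb], query), nb))
        # Keep only the search_list best approximate candidates
        candidates = heapq.nsmallest(search_list, candidates)
        heapq.heapify(candidates)
    return sorted(exact, key=exact.get)            # ids, nearest first

print(search((0.4, 0.6))[:3])
```

The key property mirrored here is that the per-visit disk I/O fetches the raw vector and edge list in one read, while the cheap in-memory PQ distances decide which node to visit next.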

For a detailed explanation of these components and parameters, refer to DISKANN.

AISAQ Operation Modes

AISAQ offers two modes of operation to address two distinct use cases:

Performance mode: optimized for applications that require low latency and high throughput at scale, such as online semantic search.

Scale mode: optimized for applications with more relaxed latency constraints, such as RAG and offline semantic search, while enabling cost-efficient expansion of datasets to ultra-high scale.

AISAQ-performance mode

AISAQ-performance achieves “Near-Zero DRAM footprint” by moving PQ data from memory to disk while maintaining low IOPS through data colocation and redundancy.

  • Each node’s raw vector, edge list, and its neighbors’ PQ data are stored together on disk.

  • This layout ensures that visiting a node (e.g., vector 0) still requires only a single disk I/O.

  • Since PQ data is redundantly stored near multiple nodes, the index file size increases significantly, consuming more disk space.
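The space/I/O trade-off of this colocated layout can be estimated with simple arithmetic. The sizes below are assumptions for illustration (128-dim float32 vectors, 56 edges, 32-byte PQ codes), not the actual Knowhere on-disk record format.

```python
dim = 128
raw_vector_bytes = dim * 4      # float32 raw vector
max_degree = 56
edge_list_bytes = max_degree * 4  # neighbor ids, 4 bytes each
pq_code_bytes = 32              # assumed PQ code size per vector

# DISKANN node record: raw vector + edge list (PQ codes live in DRAM)
diskann_node = raw_vector_bytes + edge_list_bytes

# AISAQ-performance node record: additionally colocates the PQ codes of all
# neighbors, so each PQ code is duplicated across many node records on disk.
aisaq_perf_node = diskann_node + max_degree * pq_code_bytes

print(diskann_node, aisaq_perf_node)  # bytes per node on disk
```

Under these assumptions the per-node record grows severalfold, which is the disk-space cost that buys back the single-I/O-per-visit behavior without any PQ data in DRAM.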

AISAQ-scale mode

AISAQ-scale focuses on reducing disk space usage while meeting the performance requirements of its target applications.

In this mode:

  • PQ data is stored separately on disk, without redundancy.

  • This design minimizes index size but leads to more I/O operations during graph traversal.

  • To mitigate the IOPS overhead, AISAQ introduces two optimizations:

    • A rearrange algorithm that sorts PQ vectors by priority to improve data locality.

    • A PQ cache in DRAM (pq_read_page_cache_size) that caches frequently accessed PQ data.
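The second optimization can be pictured as a page-level LRU cache, in the spirit of pq_read_page_cache_size. This is a minimal sketch for intuition only, not the Knowhere implementation.

```python
from collections import OrderedDict

class PQPageCache:
    """Tiny LRU cache over 'pages' of PQ vectors."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()      # page id -> page contents
        self.hits = self.misses = 0

    def read_page(self, page_id, load_from_disk):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)   # mark as recently used
            return self.pages[page_id]
        self.misses += 1
        data = load_from_disk(page_id)        # simulated disk I/O
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict least recently used
        return data

cache = PQPageCache(capacity_pages=2)
disk = lambda pid: f"page-{pid}".encode()
for pid in [0, 1, 0, 2, 0, 1]:                # access pattern with reuse
    cache.read_page(pid, disk)
print(cache.hits, cache.misses)
```

The rearrange step matters because an LRU page cache only helps when PQ vectors that are accessed together actually share pages; sorting by access priority increases that locality.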

Example configuration

# milvus.yaml
knowhere:
  AISAQ:
    build:
      max_degree: 56 # Maximum number of connections (edges) per node in the Vamana graph
      search_list_size: 100 # Size of the candidate pool used when searching for each node's nearest neighbors during index construction
      inline_pq: -1 # Number of PQ vectors stored inline per index node; -1 auto-fills unused space
      rearrange: true # Re-arrange PQ vectors to improve data locality (ignored in performance mode)
      num_entry_points: 100 # Number of candidate entry points for search entry-point selection
      pq_code_budget_gb_ratio: 0.125 # Size of the PQ codes relative to the uncompressed data
      disk_pq_code_budget_gb_ratio: 0.25 # Size of the on-disk PQ codes used for re-ranking, relative to the uncompressed data
      pq_cache_size: 0 # PQ vectors cache size in DRAM, in bytes (ignored in performance mode)
      search_cache_budget_gb_ratio: 0 # DRAM budget for caching frequently accessed index nodes
    search:
      search_list: 16 # Size of the candidate pool maintained during graph traversal
      beamwidth: 8 # Maximum number of parallel disk I/O requests for index nodes
      vectors_beamwidth: 1 # Maximum number of parallel disk I/O requests for groups of neighboring PQ vectors (ignored in performance mode)
      pq_read_page_cache_size: 5242880 # 5 MiB; per-search-thread PQ read cache in DRAM, in bytes (ignored in performance mode)
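The same settings can be held as plain parameter dicts, which makes it easy to enforce the consistency rules stated in the tables below before applying them. Note this is a sketch: the exact parameter names accepted by a client API may differ from the knowhere-level names shown here.

```python
build_params = {
    "max_degree": 56,
    "search_list_size": 100,
    "inline_pq": -1,                  # -1: auto-fill unused space with PQ vectors
    "rearrange": True,                # scale mode only
    "num_entry_points": 100,
    "pq_code_budget_gb_ratio": 0.125,
    "disk_pq_code_budget_gb_ratio": 0.25,
    "pq_cache_size": 0,
    "search_cache_budget_gb_ratio": 0,
}

search_params = {
    "search_list": 16,                       # must be >= top_k
    "beamwidth": 8,
    "vectors_beamwidth": 1,
    "pq_read_page_cache_size": 5 * 1024 * 1024,  # 5 MiB per search thread
}

# Consistency rules implied by the parameter descriptions on this page:
assert build_params["search_list_size"] >= build_params["max_degree"]
assert 0 < build_params["pq_code_budget_gb_ratio"] <= 0.25
assert search_params["vectors_beamwidth"] <= search_params["beamwidth"]
```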

AISAQ parameters

AISAQ inherits several parameters from DISKANN: max_degree, search_list_size, and pq_code_budget_gb_ratio.

Index-building params

These parameters influence how the AISAQ index is constructed. Adjusting them can affect the index size, build time, and search quality.

Parameter

Description

Value Range

Tuning Suggestion

max_degree

Controls the maximum number of connections (edges) each data point can have in the Vamana graph.

Type: Integer

Range: [1, 512]

Default value: 56

Higher values create denser graphs, potentially increasing recall (finding more relevant results) but also increasing memory usage and build time. In most cases, we recommend you set a value within this range: [10, 100].

search_list_size

During index construction, this parameter defines the size of the candidate pool used when searching for the nearest neighbors for each node. For every node being added to the graph, the algorithm maintains a list of the search_list_size best candidates found so far. The search for neighbors stops when this list can no longer be improved. From this final candidate pool, the top max_degree nodes are selected to form the final edges.

Type: Integer

Range: [1, 512]

Default value: 100

A larger search_list_size increases the likelihood of finding the true nearest neighbors for each node, which can lead to a higher-quality graph and better search performance (recall). However, this comes at the cost of a significantly longer index build time. It should always be set to a value greater than or equal to max_degree.

inline_pq

Number of PQ vectors stored inline per Index node (read when node is accessed, to reduce IO)

Type: Integer

Range: [-1, max_degree]

Default value: -1

Higher values of inline_pq improve performance but increase disk space.

Set inline_pq=0 for AISAQ in scale mode.

Set inline_pq=-1 to automatically fill any unused space in the index with PQ vectors for further optimization of AISAQ in scale mode.

Set inline_pq=max_degree for AISAQ in performance mode.

inline_pq values between 0 and max_degree provide an adjustable balance between performance and disk-space consumption.

rearrange

Re-arrange the PQ vectors data structure to improve data locality and reduce disk accesses during search (ignored in performance mode).

Type: Boolean

Range: [true, false]

Default value: true

When true, this reduces I/O during search at the cost of a minor increase in memory usage and index build time.

num_entry_points

Number of candidate entry points to optimize search entry-point selection.

Type: Integer

Range: [0, 1000]

Default value: 100

Higher values may reduce search time by starting the search from a closer entry point.

Use higher values for large segments (e.g., 1000 for segments of 10M vectors or more).

pq_code_budget_gb_ratio

Controls the size of the PQ codes (compressed representations of data points) compared to the size of the uncompressed data.

Type: Float

Range: (0.0, 0.25]

Default value: 0.125

A higher ratio stores more information about the original vectors, leading to more accurate search results, but increases computational cost during search.

In most cases, we recommend you set a value within this range: (0.0417, 0.25].
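A quick worked example shows what this ratio means in absolute terms. The dataset size here (10M vectors, 768-dim float32) is an assumption for illustration.

```python
num_vectors = 10_000_000
dim = 768
raw_bytes = num_vectors * dim * 4            # float32 = 4 bytes per component
pq_ratio = 0.125                             # default pq_code_budget_gb_ratio

pq_budget_bytes = int(raw_bytes * pq_ratio)  # budget for the PQ codes
print(f"raw: {raw_bytes / 2**30:.1f} GiB, PQ budget: {pq_budget_bytes / 2**30:.2f} GiB")
```

At the default ratio, the PQ codes take one eighth of the raw data size; roughly 3.6 GiB of PQ codes for about 28.6 GiB of raw vectors in this example.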

disk_pq_code_budget_gb_ratio

Controls the size of the PQ codes of the high precision vectors stored in the index (used for re-ranking), compared to the size of the uncompressed data.

Type: Float

Range: [0, 0.25]

Default value: 0.25

With the default value of 0.25, vectors will be quantized to 25% of their original size (4× compression), reducing disk footprint with relatively minimal accuracy impact.

Set a value of 0 to store full-precision vectors in the disk index for re-ranking. A larger value offers a higher recall rate but increases disk usage.

pq_cache_size

PQ vectors cache size in DRAM (bytes). The PQ vectors cache is loaded during Index load and used during search to reduce IOs (ignored in performance mode).

Type: Integer

Range: [0, 1073741824]

Default value: 0

Larger cache improves query performance but increases DRAM usage.

search_cache_budget_gb_ratio

Controls the amount of DRAM used for caching frequently accessed index nodes. This cache is loaded during index load and used during search to reduce I/O.

Type: Float

Range: [0.0, 0.3)

Default value: 0

A higher value allocates more memory for caching, reducing disk IOs but consuming more system memory. A lower value uses less memory for caching, potentially increasing the need for disk access.

Index-search params

These parameters influence how AISAQ performs searches. Adjusting them can impact search speed, latency, and resource usage.

Parameter

Description

Value Range

Tuning Suggestion

search_list

During a search operation, this parameter determines the size of the candidate pool that the algorithm maintains as it traverses the graph. A larger value increases the chances of finding the true nearest neighbors (higher recall) but also increases search latency.

Type: Integer

Range: [top_k, int32_max]

Default value: 16

For a good balance between performance and accuracy, it is recommended to set this value to be equal to or slightly larger than the number of results you want to retrieve (top_k).

beamwidth

Controls the degree of parallelism during search by determining the maximum number of parallel disk I/O requests to read the index nodes.

Type: Integer

Range: [1, 16]

Default value: 8

Higher values increase parallelism, which can speed up search on systems with powerful CPUs and SSDs. However, setting it too high might lead to excessive resource contention.

In most cases, we recommend you set a value of 2.

vectors_beamwidth

Controls the degree of parallelism during search by determining the maximum number of parallel disk I/O requests to read groups of neighboring PQ vectors (ignored in performance mode).

Type: Integer

Range: [1, 4] (must be <= beamwidth)

Default value: 1

Higher values increase parallelism, which can speed up search on systems with powerful CPUs and SSDs. However, setting it too high might lead to excessive resource contention, as each neighboring PQ vector group may contain up to max_degree vectors.

In most cases, we recommend you set a value of 1.

pq_read_page_cache_size

PQ read cache size in DRAM per search thread (bytes). It caches frequently accessed data pages containing PQ vectors (ignored in performance mode and applicable only when rearrange is true).

The PQ read cache memory is reused across all AISAQ segments.

Type: Integer

Range: [0, 33554432]

Default value: 5242880 (5MiB)

Larger cache improves query performance but increases DRAM usage.

Recommended values: 2 MiB for small segments (1M vectors), 5 MiB for medium segments (50M vectors), and 10 MiB for large segments (250M vectors).
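That sizing guidance can be captured in a small helper. The helper and its thresholds between the quoted segment sizes are hypothetical (this function is not part of Milvus); only the 2/5/10 MiB anchor points come from this page.

```python
MIB = 1024 * 1024

def suggested_pq_read_page_cache_size(num_vectors: int) -> int:
    """Pick a pq_read_page_cache_size (bytes) from the segment's vector count."""
    if num_vectors <= 1_000_000:        # small segments
        return 2 * MIB
    if num_vectors <= 50_000_000:       # medium segments
        return 5 * MIB
    return 10 * MIB                     # large segments (e.g., 250M vectors)

print(suggested_pq_read_page_cache_size(50_000_000) // MIB, "MiB")
```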
