Storing raw vectors, compressed representations, or references to vectors involves distinct trade-offs between retrieval speed and storage efficiency. Raw vectors are the original, uncompressed numerical representations (e.g., 512-dimensional float32 arrays). They provide the fastest retrieval because no decoding is required, but they consume significant storage. Compressed representations, such as quantized or encoded versions of vectors, reduce storage costs but require computational effort to decompress during retrieval. References (e.g., database IDs or external keys) minimize local storage by pointing to vectors stored elsewhere, but retrieval speed depends on the external system’s latency. Each approach balances storage savings against retrieval performance differently.
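For concreteness, here is a minimal Python sketch of the three options using NumPy. The int8 scalar quantizer is a toy stand-in for production schemes like product quantization, and all names and sizes are illustrative:

```python
import numpy as np

DIM = 512  # dimensionality from the example above

# 1. Raw vector: full-precision float32, directly usable in similarity math.
raw = np.random.rand(DIM).astype(np.float32)        # 512 * 4 = 2048 bytes

# 2. Compressed representation: toy int8 scalar quantization
#    (a stand-in for real schemes such as product quantization).
scale = np.abs(raw).max() / 127.0
compressed = np.round(raw / scale).astype(np.int8)  # 512 bytes, lossy

# 3. Reference: an 8-byte ID pointing at a vector stored elsewhere.
reference = np.int64(42)                            # 8 bytes, requires a lookup

# The compressed form must be decoded before use, which costs CPU time
# and recovers only an approximation of the original vector.
approx = compressed.astype(np.float32) * scale

print(raw.nbytes, compressed.nbytes, reference.nbytes)  # 2048 512 8
```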
Storage savings vary dramatically. A raw 1024-dimensional float32 vector occupies 4 KB (1024 * 4 bytes). Storing 1 million such vectors requires ~4 GB. Compression methods like product quantization (PQ) reduce this by splitting each vector into subvectors and mapping each subvector to an entry in a learned codebook. For example, splitting a vector into 256 subvectors and storing each as an 8-bit codebook index shrinks it from 4096 bytes to 256 bytes, cutting total storage to ~256 MB, a 16x reduction. References, such as 8-byte database IDs, require only ~8 MB for 1 million entries. However, compressed vectors lose precision, impacting retrieval accuracy, while references merely offload storage to another system (e.g., a database) without reducing the total data volume. The choice depends on whether you prioritize local storage reduction or are willing to manage external dependencies.
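That arithmetic is easy to verify. A back-of-the-envelope sketch, assuming 256 PQ subvectors at 8 bits each (the configuration behind the 16x figure; codebook overhead is ignored):

```python
# Storage math for 1 million 1024-dimensional vectors.
n, dim = 1_000_000, 1024

raw_bytes = n * dim * 4   # float32: 4 bytes per dimension
pq_bytes = n * 256 * 1    # 256 subvectors, 1 byte (8-bit index) each
ref_bytes = n * 8         # 8-byte database IDs

print(raw_bytes / 1e9)    # 4.096  -> the "~4 GB" above
print(pq_bytes / 1e6)     # 256.0  -> the "~256 MB" above
print(ref_bytes / 1e6)    # 8.0    -> the "~8 MB" above
```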
Retrieval speed is heavily influenced by data accessibility. Raw vectors allow immediate use in similarity calculations, which is critical for low-latency applications like real-time recommendation systems. Compressed vectors require decoding: for example, PQ-based systems must reconstruct approximate vectors from codebooks, adding overhead. While libraries like FAISS optimize this with precomputed lookup tables, the process still adds latency compared to raw vectors. References introduce the most variability: fetching a vector via a network call to a database could take milliseconds, making them unsuitable for high-throughput scenarios. However, if references point to in-memory caches, latency can be minimized. Developers must weigh whether storage savings justify slower retrieval or increased system complexity (e.g., maintaining a separate vector store).
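As an illustration, here is a rough FAISS sketch contrasting exact search over raw vectors (IndexFlatL2) with PQ-compressed search (IndexPQ). The dimensionality and PQ parameters are arbitrary choices for a quick demo, not tuning recommendations:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 64                    # small dimensionality to keep the demo fast
rng = np.random.default_rng(0)
xb = rng.random((10_000, d), dtype=np.float32)  # database vectors
xq = rng.random((5, d), dtype=np.float32)       # query vectors

# Raw vectors: exact search, no decoding, largest memory footprint.
flat = faiss.IndexFlatL2(d)
flat.add(xb)

# Compressed vectors: PQ with 8 subvectors x 8 bits, i.e. 8 bytes per
# vector instead of 256. Requires training and sacrifices precision.
pq = faiss.IndexPQ(d, 8, 8)
pq.train(xb)
pq.add(xb)

D_flat, I_flat = flat.search(xq, 5)
D_pq, I_pq = pq.search(xq, 5)  # approximate results via codebook lookups

# Overlap between exact and PQ top-5 results shows the accuracy cost
# of compression.
overlap = np.mean([len(set(a) & set(b)) / 5 for a, b in zip(I_flat, I_pq)])
print(f"top-5 overlap: {overlap:.2f}")
```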