To use FAISS with Sentence Transformer embeddings for efficient similarity search, you first generate text embeddings using a Sentence Transformer model, then build a FAISS index to store and search these vectors. FAISS optimizes search speed and memory usage through techniques like vector quantization and approximate nearest neighbor (ANN) algorithms. For example, you might use the all-MiniLM-L6-v2 model from Sentence Transformers to convert text into 384-dimensional vectors, then index them using FAISS's IVF-based indexes (IndexIVFFlat or IndexIVFPQ) to balance speed and accuracy. This setup allows you to query the index with a new text vector and retrieve the closest matches in milliseconds, even with millions of entries.
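Unlike a flat index, an IVF index must be trained on representative vectors before anything can be added to it. Here is a minimal sketch of that workflow; the dimension, cluster count, and random placeholder data are illustrative assumptions, not values from a real deployment:

import numpy as np
import faiss

d = 384        # embedding dimension (e.g., all-MiniLM-L6-v2 output size)
nlist = 100    # number of IVF clusters; a placeholder value
vectors = np.random.rand(10000, d).astype("float32")  # stand-in for real embeddings

quantizer = faiss.IndexFlatL2(d)                 # coarse quantizer holding cluster centroids
index = faiss.IndexIVFFlat(quantizer, d, nlist)  # IVF index with uncompressed (flat) lists
index.train(vectors)   # IVF indexes must be trained before vectors can be added
index.add(vectors)
index.nprobe = 10      # clusters scanned per query: higher is more accurate but slower

The nprobe parameter is the main accuracy-speed knob for IVF indexes, since it controls how many of the nlist clusters are scanned at query time.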
Here's how to integrate the two libraries: start by encoding your text data into embeddings. Using Sentence Transformers, load a pre-trained model like paraphrase-MiniLM-L3-v2 and generate embeddings for your dataset. Next, initialize a FAISS index, for instance IndexFlatL2 for exact searches or IndexIVFFlat for faster approximate searches. Add the embeddings to the index using index.add(embeddings). When querying, encode the search text into a vector and call index.search(query_vector, k) to retrieve the top k matches. For example, a code snippet might look like:
from sentence_transformers import SentenceTransformer
import faiss

# Load the embedding model and encode the corpus (returns a float32 NumPy array)
model = SentenceTransformer('paraphrase-MiniLM-L3-v2')
sentences = ["First text", "Second text"]  # extend with the rest of your dataset
embeddings = model.encode(sentences)

# Build an exact L2-distance index sized to the embedding dimension
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Encode the query the same way, then retrieve the top k nearest neighbors
query = "Search text"
query_embedding = model.encode([query])
k = 5
distances, indices = index.search(query_embedding, k)
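The search call returns parallel arrays of distances and row indices into the data you added. A small follow-up loop (an illustrative addition, not part of the snippet above) maps those indices back to the source sentences; note that FAISS pads results with -1 when the index holds fewer than k vectors:

# Row 0 corresponds to the first (and only) query vector
for dist, idx in zip(distances[0], indices[0]):
    if idx == -1:  # FAISS returns -1 when fewer than k matches exist
        continue
    print(f"{sentences[idx]} (L2 distance: {dist:.4f})")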
Practical considerations include choosing the right FAISS index type. For small datasets (under 10,000 entries), IndexFlatL2 provides exact results but is slower for large data. For scalability, use IndexIVFFlat or IndexIVFPQ, which partition data into clusters (e.g., 100 clusters for 1M vectors) to reduce search scope. If using cosine similarity, normalize embeddings before indexing, since FAISS defaults to L2 distance; for example, apply faiss.normalize_L2(embeddings) to align with cosine similarity, as in the sketch below. Additionally, leverage GPU acceleration via faiss.StandardGpuResources() for datasets exceeding 1M entries. Always test recall rates (e.g., 90%+ accuracy) to ensure the index configuration meets your accuracy-speed tradeoff.
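As a minimal sketch of those last two points (the model name and GPU device 0 here are assumptions for illustration): cosine similarity can be implemented by L2-normalizing the embeddings and searching an inner-product index, and an existing index can be moved to a GPU when a faiss-gpu build is installed:

from sentence_transformers import SentenceTransformer
import faiss

model = SentenceTransformer('paraphrase-MiniLM-L3-v2')
embeddings = model.encode(["First text", "Second text"])

# Normalize in place so inner product equals cosine similarity
faiss.normalize_L2(embeddings)
index = faiss.IndexFlatIP(embeddings.shape[1])  # inner-product metric
index.add(embeddings)

# Move the index to GPU 0 if a GPU-enabled FAISS build is available
if hasattr(faiss, "StandardGpuResources"):
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)

query = model.encode(["Search text"])
faiss.normalize_L2(query)  # queries must be normalized the same way
scores, indices = index.search(query, 2)  # scores are cosine similarities

With unit-length vectors, L2 rankings are identical to cosine rankings, so either metric works; IndexFlatIP simply returns the cosine scores directly.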