
What is the method to integrate Sentence Transformer embeddings into an information retrieval system (for example, using them in an Elasticsearch or OpenSearch index)?

To integrate Sentence Transformer embeddings into an information retrieval system like Elasticsearch or OpenSearch, you need to generate dense vector representations of text and configure the search engine to use them for similarity-based queries. The process involves three main steps: embedding generation, index configuration, and query handling. Here’s how to approach it:

1. Generate and Store Embeddings

First, use a pre-trained Sentence Transformer model (e.g., all-MiniLM-L6-v2) to convert text into dense vector embeddings. For example, you might process all documents in your dataset offline with a Python script, producing 384-dimensional vectors (the output size of the MiniLM model). These embeddings are stored as fields in your Elasticsearch/OpenSearch documents. When configuring the index, define a dense_vector field type with the correct number of dimensions. In Elasticsearch, the mapping would look like:

"mappings": {
 "properties": {
 "text_embedding": {
 "type": "dense_vector",
 "dims": 384
 }
 }
}

During data ingestion, populate this field with the precomputed vectors.
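
For illustration, the offline embedding-and-indexing step might look like the Python sketch below (using the sentence-transformers and elasticsearch packages). The index name docs, the text and text_embedding field names, and the localhost connection are assumptions for this example, not fixed requirements.

# Minimal sketch: embed documents with a Sentence Transformer and bulk-index
# them into Elasticsearch. Index name, field names, and the URL are assumptions.
from elasticsearch import Elasticsearch, helpers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings
es = Elasticsearch("http://localhost:9200")

docs = [
    "Milvus is an open-source vector database.",
    "BM25 is a classic lexical ranking function.",
]

# Encode in batches; returns one vector per input text.
embeddings = model.encode(docs, batch_size=100, show_progress_bar=True)

# Bulk-index each document together with its precomputed embedding.
actions = (
    {"_index": "docs", "_source": {"text": text, "text_embedding": vec.tolist()}}
    for text, vec in zip(docs, embeddings)
)
helpers.bulk(es, actions)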

2. Configure Similarity Search

To perform semantic search, convert the user’s query text into an embedding using the same model, then use a vector similarity metric (e.g., cosine similarity) to find matching documents. In Elasticsearch, this is done with a script_score query:

{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'text_embedding') + 1.0",
        "params": { "query_vector": [0.12, -0.45, ...] }
      }
    }
  }
}
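
At query time, the query_vector in that request comes from encoding the user’s text with the same model. A minimal Python sketch of this step, again assuming an index named docs on a local cluster:

# Minimal sketch: embed the query text and run the script_score search above.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
es = Elasticsearch("http://localhost:9200")

query_vector = model.encode("how do vector databases work?").tolist()

response = es.search(
    index="docs",
    query={
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilarity(params.query_vector, 'text_embedding') + 1.0",
                "params": {"query_vector": query_vector},
            },
        }
    },
    size=5,
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"])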

OpenSearch offers built-in k-NN support, allowing direct configuration of approximate nearest neighbor (ANN) search in the index settings for faster performance on large datasets.
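
As a rough sketch of that setup with the opensearch-py client (assuming the k-NN plugin is enabled; the index name, HNSW parameters, and connection details are illustrative):

# Minimal sketch: create a k-NN-enabled index and run an approximate
# nearest-neighbor query. Names and parameters are assumptions; the default
# HNSW engine varies by OpenSearch version.
from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
model = SentenceTransformer("all-MiniLM-L6-v2")

client.indices.create(
    index="docs_knn",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text_embedding": {
                    "type": "knn_vector",
                    "dimension": 384,
                    "method": {"name": "hnsw", "space_type": "cosinesimil"},
                }
            }
        },
    },
)

# Approximate k-NN search: retrieve the 10 nearest neighbors of the query vector.
query_vector = model.encode("how do vector databases work?").tolist()
results = client.search(
    index="docs_knn",
    body={"query": {"knn": {"text_embedding": {"vector": query_vector, "k": 10}}}},
)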

3. Optimize for Performance

Precompute embeddings during indexing to avoid runtime overhead. For large datasets, use OpenSearch’s ANN support with HNSW graphs or Elasticsearch’s native approximate kNN search (HNSW-indexed dense_vector fields, available since Elasticsearch 8.0). Consider hybrid approaches: combine vector search with traditional keyword-based scoring (e.g., BM25) using a weighted sum in the script_score to balance semantic and lexical relevance, as sketched below. Batching embedding generation (e.g., 100 texts at a time) and using GPU acceleration (if available) can reduce processing time. Finally, monitor latency and accuracy trade-offs: exact vector searches are slower but more precise, while ANN methods like HNSW sacrifice minimal accuracy for much faster results.
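
One way to express such a weighted hybrid in Elasticsearch is a script_score wrapped around a BM25 match query, where _score inside the script is the lexical score of the inner query. The weights and field names below are illustrative assumptions rather than recommended values:

# Minimal sketch: weighted sum of BM25 (_score of the inner match query) and
# cosine similarity on the embedding field. Weights and names are assumptions.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
es = Elasticsearch("http://localhost:9200")

query_text = "vector database"
query_vector = model.encode(query_text).tolist()

hybrid_query = {
    "script_score": {
        "query": {"match": {"text": query_text}},  # supplies the BM25 _score
        "script": {
            "source": (
                "params.bm25_weight * _score + params.vector_weight * "
                "(cosineSimilarity(params.query_vector, 'text_embedding') + 1.0)"
            ),
            "params": {
                "bm25_weight": 0.5,
                "vector_weight": 0.5,
                "query_vector": query_vector,
            },
        },
    }
}
response = es.search(index="docs", query=hybrid_query, size=5)

Note that only documents matching the inner match query are scored here; keeping match_all as the inner query avoids that filter but removes the BM25 contribution from the sum.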
