To integrate Haystack with Elasticsearch or OpenSearch, you configure a document store and use Haystack’s built-in components to interact with your search engine. Haystack 1.x provides dedicated document store classes (ElasticsearchDocumentStore and OpenSearchDocumentStore) that handle communication with these engines. First, install the required packages: pip install farm-haystack[elasticsearch] for Elasticsearch or pip install farm-haystack[opensearch] for OpenSearch. Configure the document store by specifying the host, port, and authentication details. For example, initialize ElasticsearchDocumentStore(host="localhost", port=9200, index="documents") or OpenSearchDocumentStore(host="aws-opensearch-instance", port=443, scheme="https"). Ensure your Elasticsearch/OpenSearch instance is running and accessible before proceeding.
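The configuration step can be sketched as follows. This is a minimal example assuming Haystack 1.x (the farm-haystack package); the ES_* environment variable names and the two helper functions are hypothetical and chosen only for illustration. Haystack is imported inside connect so that the settings helper can be used on its own.

```python
import os


def elasticsearch_settings(env=None):
    """Build keyword arguments for ElasticsearchDocumentStore.

    The ES_* variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("ES_HOST", "localhost"),
        "port": int(env.get("ES_PORT", "9200")),
        "scheme": env.get("ES_SCHEME", "http"),
        "username": env.get("ES_USERNAME", ""),
        "password": env.get("ES_PASSWORD", ""),
        "index": env.get("ES_INDEX", "documents"),
    }


def connect(settings):
    """Open the document store; this needs a reachable Elasticsearch instance."""
    # Imported here so the settings helper works without Haystack installed.
    from haystack.document_stores import ElasticsearchDocumentStore

    return ElasticsearchDocumentStore(**settings)


# Usage (requires a running cluster):
#   document_store = connect(elasticsearch_settings())
```

Keeping the settings in a plain dictionary makes it easy to switch engines: the same pattern should work with OpenSearchDocumentStore, whose scheme defaults to "https" rather than "http".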
Next, write documents to the store and set up retrieval. Convert your data into Haystack Document objects (e.g., Document(content="Your text", meta={"source": "file1"})), then use document_store.write_documents(docs) to index them. For search, create a retriever like BM25Retriever(document_store=document_store) to perform keyword-based searches. Haystack’s retrievers work with pipelines, allowing you to combine components like preprocessors or rankers. For example, a basic pipeline might include just the retriever: pipeline = Pipeline(); pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"]). Execute searches with pipeline.run(query="your query") to get results ranked by relevance. If using OpenSearch, the process is identical except for the document store class name.
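Put together, the indexing and retrieval steps might look like the sketch below. It assumes Haystack 1.x and an already connected document store; to_records and index_and_search are hypothetical helper names, and the (source, text) input shape is an assumption made for the example.

```python
def to_records(pairs):
    """Turn (source, text) pairs into content/meta dicts, skipping blank texts."""
    records = []
    for source, text in pairs:
        text = text.strip()
        if text:
            records.append({"content": text, "meta": {"source": source}})
    return records


def index_and_search(document_store, pairs, query, top_k=5):
    """Index the texts, then run a BM25 query through a one-node pipeline."""
    # Imported here so to_records can be used without Haystack installed.
    from haystack import Document, Pipeline
    from haystack.nodes import BM25Retriever

    docs = [Document(content=r["content"], meta=r["meta"]) for r in to_records(pairs)]
    document_store.write_documents(docs)

    retriever = BM25Retriever(document_store=document_store)
    pipeline = Pipeline()
    pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])

    # Per-node parameters are passed under the node's name.
    result = pipeline.run(query=query, params={"Retriever": {"top_k": top_k}})
    return result["documents"]
```

The returned list holds Document objects, so each hit exposes its text as content and its metadata as meta.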
For advanced use cases, consider hybrid search (combining keyword and vector search) or performance optimizations. To enable hybrid search in Haystack 1.x, add a dense retriever such as EmbeddingRetriever with a sentence-transformers model to generate vector representations, call document_store.update_embeddings(retriever) to store them in the index, then use a JoinDocuments node to merge results from the BM25 and dense retrievers. Optimize indexing performance by batching document writes with document_store.write_documents(docs, batch_size=500). For security, configure SSL/TLS in the document store parameters (e.g., scheme="https", verify_certs=True, ca_certs="/path/to/cert"). Check version compatibility: Haystack 1.22 supports Elasticsearch 7.x-8.x and OpenSearch 1.x-2.x. If errors occur, verify that the search engine version matches Haystack’s requirements, for example by requesting the cluster’s root endpoint (GET / on the configured host and port), which reports the server version.
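The hybrid setup can be sketched as below, assuming Haystack 1.x. The first function is a plain-Python illustration of reciprocal rank fusion, one common way to merge ranked lists; it is not Haystack's exact scoring code. The second wires two retrievers into a JoinDocuments node. The function names, the node names, and the model choice are assumptions for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids; illustrative only, k=60 is a common default."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def build_hybrid_pipeline(document_store, model="sentence-transformers/all-MiniLM-L6-v2"):
    """Combine BM25 and dense retrieval behind a JoinDocuments node."""
    # Imported here so the fusion helper can be used without Haystack installed.
    from haystack import Pipeline
    from haystack.nodes import BM25Retriever, EmbeddingRetriever, JoinDocuments

    sparse = BM25Retriever(document_store=document_store)
    # Assumption: the store was created with an embedding_dim matching the model
    # (384 for all-MiniLM-L6-v2; the document store default is 768).
    dense = EmbeddingRetriever(document_store=document_store, embedding_model=model)
    # Compute and store embeddings for documents already in the index.
    document_store.update_embeddings(dense)

    pipeline = Pipeline()
    pipeline.add_node(component=sparse, name="BM25", inputs=["Query"])
    pipeline.add_node(component=dense, name="Dense", inputs=["Query"])
    pipeline.add_node(
        component=JoinDocuments(join_mode="reciprocal_rank_fusion"),
        name="Join",
        inputs=["BM25", "Dense"],
    )
    return pipeline
```

Rank-based fusion avoids having to put BM25 scores and embedding similarities on a common scale, which is why it is a frequent default for hybrid retrieval.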