What are the core components of Haystack?

Haystack is an open-source framework for building search and question-answering systems. Its core components include Document Stores, Retrievers, Readers, and Pipelines, which work together to process, retrieve, and analyze text data. Each component serves a distinct purpose, enabling developers to build scalable and customizable search pipelines. The framework emphasizes modularity, allowing users to swap tools or models depending on their needs while maintaining a consistent workflow.

Document Stores and Retrievers form the foundation of Haystack’s data handling. Document Stores, backed by systems such as Elasticsearch, FAISS, or Milvus, hold text extracted from unstructured sources (e.g., PDFs, web pages) in an indexed form optimized for fast retrieval. Retrievers then query these stores to fetch the documents most relevant to a query. For example, a sparse retriever based on BM25 scores documents by keyword matching, while a dense retriever like DensePassageRetriever compares neural embeddings to find semantically similar texts, even when they share no words with the query. Hybrid retrieval combines both approaches, which often improves accuracy. Developers can choose a retriever based on their use case: keyword-heavy tasks might favor BM25, while semantic search benefits from dense retrievers.
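To make the sparse versus dense distinction concrete, here is a minimal sketch in plain Python. It does not use Haystack's API: `bm25_scores` is a hand-written version of the standard Okapi BM25 formula, and the two-dimensional document and query vectors are hypothetical values invented for illustration (a real dense retriever would compute embeddings with a neural model).

```python
import math
from collections import Counter

# Toy corpus standing in for the contents of a document store.
DOCS = [
    "the cat sat on the mat",
    "dogs are loyal pets",
    "a kitten is a young cat",
]

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical embeddings: axis 0 is roughly "feline", axis 1 roughly "canine".
DOC_VECTORS = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
QUERY_VECTOR = [1.0, 0.0]  # hypothetical embedding of the query "feline"

sparse = bm25_scores("feline", DOCS)                      # keyword matching
dense = [cosine(QUERY_VECTOR, v) for v in DOC_VECTORS]    # embedding similarity
```

Because no document contains the literal word "feline", every BM25 score is zero, while the embedding comparison still ranks the two cat-related documents above the dog-related one. That gap is the reason to pick a dense or hybrid setup for semantic search.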

Readers and Pipelines handle downstream processing. Readers extract answer spans from retrieved documents, typically using transformer models like BERT or RoBERTa. For instance, FARMReader can be fine-tuned on custom datasets to improve answer precision. Pipelines tie components together, defining workflows like “retrieve-then-read” for QA systems. A typical ExtractiveQAPipeline connects a retriever and a reader: it first fetches candidate documents and then scans them for answers. For generative tasks, pipelines can include a PromptNode to generate responses using models like GPT-3.5. (FARMReader, ExtractiveQAPipeline, and PromptNode are Haystack 1.x names.) Haystack also offers a REST API for deployment and an annotation tool for labeling data. This modular design lets developers adapt the framework to scenarios ranging from document search to chatbots, without reinventing core infrastructure.
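The retrieve-then-read flow can be sketched in a few lines of plain Python. This is not Haystack code: `KeywordRetriever`, `SentenceReader`, and `RetrieveThenRead` are hypothetical stand-ins written for this example, and simple word overlap takes the place of both BM25 and the transformer reader. The point is only the shape of the workflow, where each stage has a small interface and can be swapped independently.

```python
import re

def words(text):
    """Lowercased set of word tokens in a string."""
    return set(re.findall(r"\w+", text.lower()))

class KeywordRetriever:
    """Hypothetical retriever: ranks documents by word overlap with the query."""
    def __init__(self, documents):
        self.documents = documents

    def run(self, query, top_k=2):
        terms = words(query)
        scored = [(len(terms & words(doc)), doc) for doc in self.documents]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # Keep only the top_k documents that share at least one word with the query.
        return [doc for score, doc in scored[:top_k] if score > 0]

class SentenceReader:
    """Hypothetical reader: returns the sentence with the largest word overlap."""
    def run(self, query, documents):
        terms = words(query)
        best, best_score = None, 0
        for doc in documents:
            for sentence in re.split(r"(?<=[.!?])\s+", doc):
                score = len(terms & words(sentence))
                if score > best_score:
                    best, best_score = sentence, score
        return best

class RetrieveThenRead:
    """Chains a retriever and a reader; either one can be replaced on its own."""
    def __init__(self, retriever, reader):
        self.retriever = retriever
        self.reader = reader

    def run(self, query, top_k=2):
        documents = self.retriever.run(query, top_k=top_k)
        answer = self.reader.run(query, documents)
        return {"documents": documents, "answer": answer}

DOCUMENTS = [
    "Milvus is a vector database. It stores embeddings for similarity search.",
    "BM25 ranks documents by keyword overlap. It needs no neural model.",
    "Paris is the capital of France.",
]

pipeline = RetrieveThenRead(KeywordRetriever(DOCUMENTS), SentenceReader())
result = pipeline.run("What is the capital of France?")
print(result["answer"])
```

Replacing `KeywordRetriever` with an embedding-based retriever, or `SentenceReader` with a model-backed reader, would leave `RetrieveThenRead` unchanged, which mirrors the modularity described above.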
