To implement a custom component in a Haystack pipeline, start by writing a class that follows Haystack’s component interface. Pipelines are built from reusable components (retrievers, readers, or your own logic) that pass data from one to the next. In Haystack 2.x, a component is a plain Python class marked with the @component decorator; it needs no base class, and its run() method defines how inputs are transformed into outputs. (Haystack 1.x used a different interface: nodes subclassed BaseComponent and implemented run() and run_batch().) For example, to build a document preprocessor, define run() so that it accepts a list of Haystack Document objects and returns a dictionary containing the processed documents, and put your processing logic (e.g., text cleaning) inside it. The decorator, together with @component.output_types on run(), tells the pipeline which inputs and outputs the component exposes.
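As an illustration of the preprocessor idea, here is a minimal sketch that assumes Haystack 2.x, where components are classes decorated with @component. The class name WhitespaceNormalizer and its cleaning rule are hypothetical, chosen only for this example. The try/except block supplies small stand-ins for Document and component, which are not part of Haystack, so that the snippet also runs where Haystack 2.x is not installed.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

try:
    from haystack import Document, component  # available in Haystack 2.x
except ImportError:
    # Stand-ins (NOT part of Haystack) so the sketch runs without it installed.
    @dataclass
    class Document:
        content: Optional[str] = None
        meta: Dict[str, Any] = field(default_factory=dict)

    class _ComponentStub:
        def __call__(self, cls):
            return cls

        def output_types(self, **_types):
            return lambda func: func

    component = _ComponentStub()


@component
class WhitespaceNormalizer:
    """Hypothetical preprocessor: collapses runs of whitespace in each document."""

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]) -> Dict[str, List[Document]]:
        cleaned = []
        for doc in documents:
            text = re.sub(r"\s+", " ", doc.content or "").strip()
            # Build a new Document instead of mutating the caller's object.
            cleaned.append(Document(content=text, meta=dict(doc.meta)))
        return {"documents": cleaned}


result = WhitespaceNormalizer().run(documents=[Document(content="  Hello \n\n  world  ")])
print(result["documents"][0].content)  # Hello world
```

The keys of the returned dictionary are the output names that other components connect to, so they should match the names declared in @component.output_types.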
Here’s a concrete example: suppose you want to filter documents by a keyword. Create a class KeywordFilter with a run() method that checks each document’s content. In Haystack 2.x, run() receives its inputs as keyword arguments (such as documents) and returns a dictionary whose keys are the output names (here, documents, holding the filtered list). Apply the @component decorator to the class to make it usable in a pipeline. Configuration parameters (e.g., target_keyword) go in the __init__ method. Test the component on its own by calling run() with sample documents and checking that the output matches expectations. Because the component declares its input and output types, Haystack can check them against neighboring components when the pipeline is wired together.
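The KeywordFilter described above might look like the following sketch, again assuming Haystack 2.x. Matching is case-insensitive here, which is an assumption of this example and not something the component interface requires. As a convenience, the try/except block defines stand-ins for Document and component (not part of Haystack) so the standalone check at the bottom runs even without Haystack 2.x installed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

try:
    from haystack import Document, component  # available in Haystack 2.x
except ImportError:
    # Stand-ins (NOT part of Haystack) so the sketch runs without it installed.
    @dataclass
    class Document:
        content: Optional[str] = None
        meta: Dict[str, Any] = field(default_factory=dict)

    class _ComponentStub:
        def __call__(self, cls):
            return cls

        def output_types(self, **_types):
            return lambda func: func

    component = _ComponentStub()


@component
class KeywordFilter:
    """Keeps only the documents whose content contains target_keyword."""

    def __init__(self, target_keyword: str):
        # Configuration is passed at construction time, not on every run() call.
        self.target_keyword = target_keyword

    @component.output_types(documents=List[Document])
    def run(self, documents: List[Document]) -> Dict[str, List[Document]]:
        needle = self.target_keyword.lower()  # case-insensitive match (assumption)
        kept = [doc for doc in documents if needle in (doc.content or "").lower()]
        return {"documents": kept}


# Test the component on its own; no pipeline is needed for this.
samples = [
    Document(content="Milvus is a vector database."),
    Document(content="Haystack builds search pipelines."),
]
filtered = KeywordFilter(target_keyword="milvus").run(documents=samples)["documents"]
assert [doc.content for doc in filtered] == ["Milvus is a vector database."]
```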
Finally, add the custom component to your pipeline. In Haystack 2.x, create a Pipeline object, register each component with add_component(), and link them with connect(). (Haystack 1.x used add_node() with an inputs argument instead.) For instance, place KeywordFilter after a document retriever to process results before passing them to a reader. Make sure output and input names line up between components (e.g., the retriever emits documents, which KeywordFilter expects as input). To trace data flow, pass include_outputs_from to Pipeline.run() to get the intermediate outputs of selected components, or enable debug-level logging for the haystack logger. Following this structure lets you extend pipelines with domain-specific logic while staying compatible with the rest of Haystack.
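The wiring step can be sketched as follows, assuming Haystack 2.x and its in-memory document store and BM25 retriever. The function name run_demo, the sample documents, and the component names "retriever" and "keyword_filter" are hypothetical. Imports are kept inside the function only so the snippet can be loaded where Haystack is not installed; in a real project they would normally sit at module level.

```python
from typing import List


def run_demo(query: str = "vector database", keyword: str = "milvus") -> List[str]:
    """Build a retriever -> KeywordFilter pipeline and return the surviving texts."""
    # Requires Haystack 2.x; imported lazily (see the note above).
    from haystack import Document, Pipeline, component
    from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
    from haystack.document_stores.in_memory import InMemoryDocumentStore

    @component
    class KeywordFilter:
        def __init__(self, target_keyword: str):
            self.target_keyword = target_keyword

        @component.output_types(documents=List[Document])
        def run(self, documents: List[Document]):
            needle = self.target_keyword.lower()
            kept = [d for d in documents if needle in (d.content or "").lower()]
            return {"documents": kept}

    store = InMemoryDocumentStore()
    store.write_documents([
        Document(content="Milvus is an open-source vector database."),
        Document(content="A vector database stores embeddings for search."),
        Document(content="Pipelines connect components together."),
    ])

    pipeline = Pipeline()
    pipeline.add_component("retriever", InMemoryBM25Retriever(document_store=store))
    pipeline.add_component("keyword_filter", KeywordFilter(target_keyword=keyword))
    # "<component name>.<socket name>": the retriever's `documents` output
    # feeds the filter's `documents` input.
    pipeline.connect("retriever.documents", "keyword_filter.documents")

    result = pipeline.run({"retriever": {"query": query}})
    return [doc.content for doc in result["keyword_filter"]["documents"]]
```

Calling run_demo() returns the contents of the retrieved documents that mention the keyword. If the two socket types did not match, connect() would raise an error when the pipeline is built, before any data flows through it.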