To implement advanced filtering in Haystack queries, you define conditions as filter dictionaries that narrow down search results. Haystack’s filtering system works with document stores such as Elasticsearch, OpenSearch, and Weaviate, and lets you specify logical conditions on document metadata. For example, to keep only documents whose category field equals “news”, you would write {"field": "meta.category", "operator": "==", "value": "news"} (in Haystack 2.x, metadata fields are addressed with the meta. prefix). These filters can be passed to retrievers or pipelines to constrain search results programmatically.
Advanced filtering becomes powerful when you combine multiple conditions with the logical operators AND, OR, and NOT. For instance, to find documents where category is “tech” and publish_date is later than January 1, 2023, you would construct a nested filter:
{
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "tech"},
        {"field": "meta.publish_date", "operator": ">", "value": "2023-01-01"}
    ]
}
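Nested filters like the one above can also be assembled in code rather than written by hand. The following is a minimal sketch: condition and combine are hypothetical helper names, not part of Haystack, and the meta. prefix on field names assumes the Haystack 2.x convention for metadata fields.

```python
from typing import Any, Dict

FilterDict = Dict[str, Any]


def condition(field: str, operator: str, value: Any) -> FilterDict:
    """Build one comparison condition (hypothetical helper, not a Haystack API)."""
    return {"field": field, "operator": operator, "value": value}


def combine(operator: str, *conditions: FilterDict) -> FilterDict:
    """Wrap conditions in a logical operator: "AND", "OR", or "NOT"."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unknown logical operator: {operator}")
    return {"operator": operator, "conditions": list(conditions)}


# Same structure as the hand-written nested filter: tech documents after 2023-01-01.
tech_since_2023 = combine(
    "AND",
    condition("meta.category", "==", "tech"),
    condition("meta.publish_date", ">", "2023-01-01"),
)
```

Building filters through small functions keeps the dictionary shape in one place, which helps when the same conditions are reused across several queries.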
You can also use the comparison operators >, <, >=, and <= on numerical or date fields. For membership tests (e.g., category in ["science", "history"]), use {"field": "meta.category", "operator": "in", "value": ["science", "history"]}. Most document stores support these operations natively, but syntax may vary slightly between backends.
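To make the semantics of these operators concrete, here is a small pure-Python sketch that evaluates a filter dictionary against one document's metadata. It is illustrative only and is not Haystack's implementation: the function name matches is hypothetical, dates are assumed to be ISO-formatted strings (so plain string comparison orders them correctly), a missing field is assumed to fail ordering comparisons, and NOT is assumed to negate the conjunction of its conditions.

```python
import operator as op
from typing import Any, Dict

# Comparison operators applied as: comparator(document_value, filter_value)
COMPARATORS = {
    "==": op.eq, "!=": op.ne,
    ">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le,
    "in": lambda doc_value, allowed: doc_value in allowed,
    "not in": lambda doc_value, allowed: doc_value not in allowed,
}
ORDERING = {">", ">=", "<", "<="}


def matches(filters: Dict[str, Any], meta: Dict[str, Any]) -> bool:
    """Return True if the metadata dict satisfies the filter (sketch only)."""
    if "field" in filters:  # a single comparison condition
        field = filters["field"]
        if field.startswith("meta."):
            field = field[len("meta."):]
        value = meta.get(field)
        if value is None and filters["operator"] in ORDERING:
            return False  # assumption: missing fields fail ordering comparisons
        return COMPARATORS[filters["operator"]](value, filters["value"])

    # Otherwise this is a logical node with child conditions.
    results = [matches(child, meta) for child in filters["conditions"]]
    if filters["operator"] == "AND":
        return all(results)
    if filters["operator"] == "OR":
        return any(results)
    if filters["operator"] == "NOT":
        return not all(results)  # assumption: NOT negates the AND of its children
    raise ValueError(f"unknown logical operator: {filters['operator']}")


doc_meta = {"category": "science", "publish_date": "2023-06-15"}
query = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "in", "value": ["science", "history"]},
        {"field": "meta.publish_date", "operator": ">=", "value": "2023-01-01"},
    ],
}
is_match = matches(query, doc_meta)
```

A local evaluator like this can serve as a reference when checking whether a backend interprets a filter the way you expect.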
To integrate filtering into a Haystack pipeline, pass the filter through the filters parameter of a retriever’s run() method, either by calling the retriever directly or via the data passed to a pipeline’s run() method. For example:
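from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

# Assumed setup: a small in-memory store so the example is self-contained
document_store = InMemoryDocumentStore()
document_store.write_documents([Document(content="AI trends in 2024", meta={"category": "tech"})])

pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))

# Filters use the field/operator/value form; metadata keys take the "meta." prefix
filters = {"field": "meta.category", "operator": "==", "value": "tech"}
result = pipeline.run(data={"retriever": {"query": "AI trends", "filters": filters}})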
If you use Elasticsearch, you can leverage its query DSL for geo-queries or regex patterns by crafting filters that align with its syntax. Always test filters against your specific document store, because unsupported operations (e.g., fuzzy matching in FAISS) may cause errors. For complex use cases, combine filters with Haystack’s query builders or custom preprocessing to ensure compatibility.
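One form of custom preprocessing is to check a filter against the operators a backend is known to support before sending the query. The sketch below walks a nested filter and reports unsupported comparison operators. The function name unsupported_operators and the backend_ops set are hypothetical; the real capability list has to come from your document store's documentation.

```python
from typing import Any, Dict, Set


def unsupported_operators(filters: Dict[str, Any], supported: Set[str]) -> Set[str]:
    """Collect comparison operators in a nested filter that are not in `supported`."""
    if "field" in filters:  # leaf: a single comparison condition
        operator = filters["operator"]
        return set() if operator in supported else {operator}
    found: Set[str] = set()
    for child in filters.get("conditions", []):  # logical node: recurse into children
        found |= unsupported_operators(child, supported)
    return found


# Hypothetical capability list for some backend (an assumption for illustration).
backend_ops = {"==", "!=", "in"}

query = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "tech"},
        {"field": "meta.publish_date", "operator": ">", "value": "2023-01-01"},
    ],
}

problems = unsupported_operators(query, backend_ops)
if problems:
    print(f"Filter uses operators this backend may not support: {sorted(problems)}")
```

Running such a check before retrieval surfaces incompatibilities as a clear message instead of a backend-specific error at query time.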