
Decompounder

The decompounder filter splits compound words into individual components based on a specified dictionary, making it easier to search for parts of compound terms. This filter is particularly useful for languages that frequently use compound words, such as German.

Configuration

The decompounder filter is a custom filter in Milvus. To use it, specify "type": "decompounder" in the filter configuration, along with a word_list parameter that provides the dictionary of word components to recognize.

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "decompounder",  # Specifies the filter type as decompounder
        "word_list": ["dampf", "schiff", "fahrt", "brot", "backen", "automat"],
    }],
}

The decompounder filter accepts the following configurable parameter.

Parameter: word_list
Description: A list of word components used to split compound terms. This dictionary determines how compound words are decomposed into individual terms.

The decompounder filter operates on the terms generated by the tokenizer, so it must be used in combination with a tokenizer.

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. Milvus then tokenizes and filters the text in that field with the specified analyzer. For details, refer to Example use.

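Example output

Here’s an example of how the decompounder filter processes text:

Original text:

"dampfschifffahrt brotbackautomat"

Expected output (with word_list: ["dampf", "schiff", "fahrt", "brot", "backen", "automat"]):

["dampf", "schiff", "fahrt", "brotbackautomat"]

Note that brotbackautomat is returned unsplit: word_list contains "backen" rather than "back", so the term cannot be broken down entirely into listed components.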
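The behavior shown in this example can be approximated in plain Python. The sketch below is only an illustrative model, not the Milvus implementation. It assumes that a term is split only when consecutive word_list entries cover it completely, and that any term that cannot be fully covered is emitted unchanged. The helper names decompound and analyze are hypothetical, and the regular expression is a rough stand-in for the standard tokenizer.

```python
import re


def decompound(token, word_list):
    """Split token into dictionary words, or return it unchanged.

    Assumption: a token is split only if consecutive dictionary entries
    cover it completely; a partial match leaves the token intact.
    """
    # Try longer entries first so the longest match wins at each position.
    # Empty entries are skipped because they would never advance the scan.
    candidates = sorted((w for w in word_list if w), key=len, reverse=True)
    parts, pos = [], 0
    while pos < len(token):
        match = next((w for w in candidates if token.startswith(w, pos)), None)
        if match is None:
            return [token]  # not fully decomposable: keep the original token
        parts.append(match)
        pos += len(match)
    return parts


def analyze(text, word_list):
    # Stand-in for the standard tokenizer: split on runs of word characters.
    tokens = re.findall(r"\w+", text)
    return [part for token in tokens for part in decompound(token, word_list)]


word_list = ["dampf", "schiff", "fahrt", "brot", "backen", "automat"]
result = analyze("dampfschifffahrt brotbackautomat", word_list)
print(result)
```

Under these assumptions, the model returns the same four terms as the expected output above. In the same model, adding "back" to word_list makes brotbackautomat split into "brot", "back", and "automat", which shows how the choice of dictionary entries controls the result.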
