
Stop​

The stop filter removes specified stop words from tokenized text, helping to eliminate common, less meaningful words. You can configure the list of stop words using the stop_words parameter.​

Configuration​

The stop filter is a custom filter in Milvus. To use it, specify "type": "stop" in the filter configuration, along with a stop_words parameter that provides a list of stop words.

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "stop",  # Specifies the filter type as stop
        "stop_words": ["of", "to", "_english_"],  # Defines custom stop words and includes the English stop word list
    }],
}

The stop filter accepts the following configurable parameters.​

Parameter | Description
stop_words | A list of words to be removed from tokenization. By default, the predefined _english_ list, containing common English stop words, is used. The details of _english_ can be found here.

The stop filter operates on the terms generated by the tokenizer, so it must be used in combination with a tokenizer.

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. This allows Milvus to process the text in that field using the specified analyzer for efficient tokenization and filtering. For details, refer to Example use.​

Example output​

Here’s an example of how the stop filter processes text:​

Original text:​

"The stop filter allows control over common stop words for text processing."

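Expected output (with stop_words: ["the", "over", "_english_"]):

["The", "stop", "filter", "allows", "control", "common", "stop", "words", "text", "processing"]

The tokens "over" and "for" are removed. The capitalized "The" remains even though "the" is listed as a stop word, which indicates that matching is case-sensitive and that this analyzer does not lowercase tokens before the stop filter runs.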
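The behavior in this example can be approximated in plain Python. This is only a sketch, not the Milvus implementation: ENGLISH_SUBSET is a hypothetical stand-in for the built-in _english_ list (the real list is defined by Milvus and is longer), the regular expression is a rough substitute for the standard tokenizer, and exact, case-sensitive matching is an assumption inferred from the expected output.

```python
import re

# Hypothetical stand-in for the built-in _english_ list (assumption).
ENGLISH_SUBSET = {"a", "an", "and", "for", "in", "of", "the", "to"}


def expand_stop_words(stop_words):
    """Replace the "_english_" placeholder with the stand-in word set."""
    expanded = set()
    for word in stop_words:
        if word == "_english_":
            expanded |= ENGLISH_SUBSET
        else:
            expanded.add(word)
    return expanded


def stop_filter(tokens, stop_words):
    """Drop every token that exactly matches a stop word (case-sensitive)."""
    blocked = expand_stop_words(stop_words)
    return [token for token in tokens if token not in blocked]


text = "The stop filter allows control over common stop words for text processing."
tokens = re.findall(r"\w+", text)  # rough approximation of the standard tokenizer
result = stop_filter(tokens, ["the", "over", "_english_"])
print(result)
```

In this sketch, lowercasing the tokens before calling stop_filter would also remove the leading "The", since it would then match the stop word "the".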
