Length

The length filter removes tokens that exceed a specified maximum length, letting you control which tokens are retained during text processing.

Configuration

The length filter is a custom filter in Milvus, specified by setting "type": "length" in the filter configuration. Configure it as a dictionary in the filter list of analyzer_params to set the length limit.

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "length",  # Specifies the filter type as length
        "max": 10,  # Sets the maximum token length to 10 characters
    }],
}

The length filter accepts the following configurable parameters.

| Parameter | Description |
|-----------|-------------|
| max | Sets the maximum token length. Tokens longer than this length are removed. |

The length filter operates on the terms generated by the tokenizer, so it must be used in combination with a tokenizer.

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. Milvus then tokenizes and filters the text in that field with the specified analyzer. For details, refer to Example use.

Example output

Here’s an example of how the length filter processes text:

Example text:

"The length filter allows control over token length requirements for text processing."

Expected output (with max: 10):

["length", "filter", "allows", "control", "over", "token", "length", "for", "text"]
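To build intuition for the filtering step on its own, here is a minimal pure-Python approximation. It is not Milvus's implementation: simple_tokenize is a hypothetical regex-based stand-in for the standard tokenizer, length_filter follows the parameter description above (tokens longer than max are removed), and the sample sentence is made up. The sample deliberately avoids tokens whose length equals the limit, since behavior exactly at the boundary is not spelled out here.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in for a tokenizer: extract runs of word characters."""
    return re.findall(r"\w+", text)


def length_filter(tokens, max_len):
    """Drop tokens longer than max_len characters, keeping the rest in order."""
    return [token for token in tokens if len(token) <= max_len]


sample = "A short token survives but incomprehensibilities does not"
tokens = simple_tokenize(sample)
kept = length_filter(tokens, max_len=10)

print(kept)
# ['A', 'short', 'token', 'survives', 'but', 'does', 'not']
```

Only the 21-character token is dropped. The filter never alters the tokens it keeps, which is why it has to run after a tokenizer has already split the text.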
