ASCII folding​

The asciifolding filter converts characters outside the Basic Latin Unicode block (the first 128 code points, U+0000 to U+007F, which correspond to ASCII) into their ASCII equivalents. For instance, it transforms í to i. This makes text processing simpler and more consistent, especially for multilingual content.

Configuration​

The asciifolding filter is built into Milvus. To use it, add its name to the filter list in analyzer_params.

analyzer_params = {
    "tokenizer": "standard",
    "filter": ["asciifolding"],
}

The asciifolding filter operates on the terms generated by the tokenizer, so it must be used in combination with a tokenizer.

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. Milvus then tokenizes and filters the text in that field with the specified analyzer. For details, refer to Example use.

Example output​

Here’s an example of how the asciifolding filter processes text:​

Original text:​

"Café Möller serves crème brûlée and piñatas."

Expected output:​

["Cafe", "Moller", "serves", "creme", "brulee", "and", "pinatas"]​
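To make the tokenizer-then-filter order concrete, the expected output above can be approximated in plain Python. This is only a rough sketch and not how Milvus implements the filter: the function names fold_to_ascii and analyze are hypothetical, the regular expression is a simplified stand-in for the standard tokenizer, and the folding relies on Unicode decomposition. Characters that have no decomposition (for example ß or ø) pass through this sketch unchanged, and the behavior of the built-in filter for them is not covered here.

```python
import re
import unicodedata


def fold_to_ascii(token: str) -> str:
    """Approximate ASCII folding (assumption: NFKD decomposition is enough).

    Decomposes each character into a base letter plus combining marks,
    then drops the combining marks.
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list[str]:
    # Simplified stand-in for the "standard" tokenizer: runs of word characters.
    tokens = re.findall(r"\w+", text)
    # The filter operates on each term the tokenizer produced.
    return [fold_to_ascii(tok) for tok in tokens]


print(analyze("Café Möller serves crème brûlée and piñatas."))
```

For the sample sentence, this sketch yields the same seven tokens as the expected output above, with punctuation dropped by the tokenizer step and diacritics removed by the folding step.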
