
Standard

The standard tokenizer in Milvus splits text on spaces and punctuation marks, which makes it suitable for most languages.

Configuration

To configure an analyzer using the standard tokenizer, set tokenizer to standard in analyzer_params.

analyzer_params = {
    "tokenizer": "standard",
}

The standard tokenizer can be combined with one or more filters. For example, the following code defines an analyzer that uses the standard tokenizer and the lowercase filter:

analyzer_params = {
    "tokenizer": "standard",
    "filter": ["lowercase"]
}

For a simpler setup, you can use the standard analyzer, which combines the standard tokenizer with the lowercase filter.
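To make the effect of the lowercase filter concrete, the following pure-Python sketch mimics it on a list of tokens. It does not call Milvus: the FILTERS registry and the apply_filters helper are hypothetical names introduced for illustration, and the sketch assumes that filters run in the order in which they are listed.

```python
# Illustrative only: mimics a "lowercase" filter outside Milvus.
from typing import Callable, Dict, List

# Hypothetical registry mapping filter names to per-token functions.
FILTERS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
}


def apply_filters(tokens: List[str], filter_names: List[str]) -> List[str]:
    """Apply each named filter to every token, in the listed order (assumed)."""
    for name in filter_names:
        tokens = [FILTERS[name](token) for token in tokens]
    return tokens


analyzer_params = {
    "tokenizer": "standard",
    "filter": ["lowercase"],
}

# Tokens as a tokenizer might emit them, with casing preserved.
tokens = ["The", "Milvus", "vector"]
lowered = apply_filters(tokens, analyzer_params["filter"])
print(lowered)
```

The sketch only shows the division of labor: the tokenizer produces tokens, and each filter then transforms them.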

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. Milvus then tokenizes and filters the text in that field with the specified analyzer. For details, refer to Example use.

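Example output

The following example shows how the standard tokenizer processes text:

Original text:

"The Milvus vector database is built for scale!"

Expected output:

["The", "Milvus", "vector", "database", "is", "built", "for", "scale"]

The tokenizer drops the trailing exclamation mark and preserves the original casing, because no lowercase filter is applied.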
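For quick experimentation without a running Milvus instance, the splitting behavior described on this page can be roughly approximated in plain Python. The sketch below is not Milvus's implementation: the function name approx_standard_tokenize is hypothetical, and the regular expression is an assumption that simply keeps runs of word characters. The real tokenizer may treat apostrophes, hyphens, or non-Latin scripts differently.

```python
# Rough approximation of splitting on spaces and punctuation.
# This is NOT the Milvus tokenizer, only a local stand-in for illustration.
import re
from typing import List


def approx_standard_tokenize(text: str) -> List[str]:
    """Return runs of word characters, dropping spaces and punctuation."""
    return re.findall(r"\w+", text)


tokens = approx_standard_tokenize("The Milvus vector database is built for scale!")
print(tokens)
```

For the sample sentence on this page, the approximation yields the same eight tokens as the expected output, with casing preserved.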
