Whitespace

The whitespace tokenizer splits text into terms wherever whitespace separates words. It leaves case and punctuation unchanged.

Configuration

To configure an analyzer that uses the whitespace tokenizer, set tokenizer to whitespace in analyzer_params.

analyzer_params = {
    "tokenizer": "whitespace",
}

The whitespace tokenizer can be combined with one or more filters. For example, the following code defines an analyzer that uses the whitespace tokenizer and the lowercase filter:

analyzer_params = {
    "tokenizer": "whitespace",
    "filter": ["lowercase"]
}

After defining analyzer_params, apply it to a VARCHAR field when you define the collection schema. Milvus then uses the specified analyzer to tokenize and filter the text in that field. For details, refer to Example use.

Example output

The following example shows how the whitespace tokenizer processes text:

Original text:

"The Milvus vector database is built for scale!"

Expected output:

["The", "Milvus", "vector", "database", "is", "built", "for", "scale!"]
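As a rough mental model only, the same result can be reproduced in plain Python. This is not how Milvus implements the tokenizer. The helpers whitespace_tokenize and lowercase_filter are hypothetical names introduced for illustration, and the second one approximates what a lowercase filter would do to the resulting terms.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Hypothetical helper: split on runs of whitespace, keeping case and punctuation."""
    return text.split()


def lowercase_filter(tokens: list[str]) -> list[str]:
    """Hypothetical helper: approximate a lowercase filter applied after tokenization."""
    return [token.lower() for token in tokens]


tokens = whitespace_tokenize("The Milvus vector database is built for scale!")
print(tokens)
# ['The', 'Milvus', 'vector', 'database', 'is', 'built', 'for', 'scale!']

print(lowercase_filter(tokens))
# ['the', 'milvus', 'vector', 'database', 'is', 'built', 'for', 'scale!']
```

Note that "scale!" keeps its exclamation mark in both cases, because neither step removes punctuation.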