English​

The english analyzer in Milvus is designed to process English text, applying language-specific rules for tokenization and filtering.​

Definition​

The english analyzer uses the following components:​

  • Tokenizer: Uses the standard tokenizer to split text into discrete word units.​

  • Filters: Includes multiple filters for comprehensive text processing:​

    • lowercase: Converts all tokens to lowercase, enabling case-insensitive searches.​

    • stemmer: Reduces words to their root form to support broader matching (e.g., “running” becomes “run”).​

    • stop: Removes common English stop words (such as “the” and “is”) so that matching focuses on the key terms in the text.

The functionality of the english analyzer is equivalent to the following custom analyzer configuration:​

analyzer_params = {
    "tokenizer": "standard",
    "filter": [
        "lowercase",
        {
            "type": "stemmer",
            "language": "english"
        },
        {
            "type": "stop",
            "stop_words": "_english_"
        }
    ]
}
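To make the order of these stages concrete, here is a minimal plain-Python sketch of the same pipeline shape: tokenize, lowercase, stem, then drop stop words. It is an illustration only and does not call Milvus. The names tokenize, toy_stem, analyze, and DEMO_STOP_WORDS are hypothetical, the stop list is a tiny stand-in for the built-in _english_ set, and toy_stem strips only a couple of suffixes, so it will not reproduce the real stemmer's output in general.

```python
import re

# Tiny illustrative stop list; the built-in _english_ set is assumed to be larger.
DEMO_STOP_WORDS = {"a", "an", "and", "in", "is", "the", "was"}


def tokenize(text):
    """Split text into word tokens (a rough stand-in for the standard tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text)


def toy_stem(token):
    """Strip a few common suffixes. This is NOT the real English stemming algorithm."""
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "lsz":
            stem = stem[:-1]
        return stem
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text, stop_words=DEMO_STOP_WORDS):
    """Run the stages in the same order as the filter list: lowercase, stem, stop."""
    tokens = tokenize(text)
    tokens = [t.lower() for t in tokens]
    tokens = [toy_stem(t) for t in tokens]
    return [t for t in tokens if t not in stop_words]


print(analyze("The dog was Running in the gardens"))
# → ['dog', 'run', 'garden']
```

Because the stop filter runs last in this sketch, stop words are compared against tokens that have already been lowercased and stemmed, which is why “The” is removed even though it is capitalized in the input.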

Configuration​

To apply the english analyzer to a field, set type to english in analyzer_params and include optional parameters as needed.

analyzer_params = {
    "type": "english",
}

The english analyzer accepts the following optional parameters: ​

  • stop_words: An array of stop words to remove from the tokenized output. Defaults to _english_, a built-in set of common English stop words.

Example configuration with custom stop words:​

analyzer_params = {
    "type": "english",
    "stop_words": ["a", "an", "the"]
}
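The effect of a custom list can be sketched in plain Python. This sketch assumes that a custom stop_words array replaces the built-in set rather than extending it, which is how the default wording reads but is not confirmed here. The function remove_stop_words is hypothetical, and the token list is an illustrative guess at the sample sentence from this page after lowercasing and stemming.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not in the given stop-word list."""
    stop_set = set(stop_words)
    return [t for t in tokens if t not in stop_set]


# Illustrative tokens: lowercased and stemmed, before any stop-word removal.
tokens = ["the", "milvus", "vector", "databas", "is", "built", "for", "scale"]

# Only "a", "an", and "the" are treated as stop words here.
custom = remove_stop_words(tokens, ["a", "an", "the"])
print(custom)
# → ['milvus', 'vector', 'databas', 'is', 'built', 'for', 'scale']
```

Under that assumption, words such as “is” and “for” stay in the output because they are not in the custom list, so a short custom list can keep more terms searchable than the default.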

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. This allows Milvus to process the text in that field using the specified analyzer for efficient tokenization and filtering. For details, refer to Example use.​

Example output​

Here’s how the english analyzer processes text.​

Original text:​

"The Milvus vector database is built for scale!"

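Expected output:

["milvus", "vector", "databas", "built", "scale"]

The stop words “The”, “is”, and “for” are removed, the remaining tokens are lowercased, and the stemmer shortens “database” to “databas”. The trailing punctuation is dropped during tokenization.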