BM25EmbeddingFunction

BM25EmbeddingFunction is a class in pymilvus that encodes text into sparse embeddings using the BM25 model, for use in retrieval with Milvus.

pymilvus.model.sparse.bm25.BM25EmbeddingFunction

Constructor

Constructs a BM25EmbeddingFunction for common use cases.

BM25EmbeddingFunction(
    analyzer: Analyzer = None,
    corpus: Optional[List] = None,
    k1: float = 1.5,
    b: float = 0.75,
    epsilon: float = 0.25,
    num_workers: Optional[int] = None,
)

PARAMETERS:

analyzer (object) -

An Analyzer object used to tokenize texts. Defaults to a built-in English language analyzer if not specified. For more information, refer to
corpus (list) -

A list of strings representing the corpus of documents used to fit the model.
k1 (float) -

The BM25 k1 parameter, a float defaulting to 1.5. This controls term frequency saturation, that is, how quickly the contribution of repeated occurrences of a term levels off.
b (float) -

The BM25 b parameter, a float defaulting to 0.75. This controls field length normalization.
epsilon (float) -

A float defaulting to 0.25. This is used to smooth idf values.
num_workers (int) -

The number of worker processes to use for parallelization. Defaults to the number of CPU cores if not specified.
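
To illustrate how k1 and b interact, here is a minimal sketch of the textbook Okapi BM25 per-term score in plain Python. It is not taken from the pymilvus source; the function name bm25_term_score and its arguments are hypothetical and chosen only for illustration, and the idf value is passed in directly rather than computed from a corpus.

```python
def bm25_term_score(tf, doc_len, avg_doc_len, idf, k1=1.5, b=0.75):
    """Score one term in one document with the textbook Okapi BM25 formula.

    Hypothetical helper for illustration only; not part of pymilvus.
    """
    # Length normalization: b=0 ignores document length, b=1 applies it fully.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # Term frequency saturation: as tf grows, the score approaches idf * (k1 + 1).
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)


# Same term frequency, different document lengths.
short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, idf=1.0)
long_doc = bm25_term_score(tf=2, doc_len=200, avg_doc_len=100, idf=1.0)

# With the default b, the shorter document scores higher for the same tf.
print(short_doc > long_doc)
```

In this formulation, raising k1 lets repeated terms keep adding to the score for longer before it saturates, while lowering b reduces the penalty applied to long documents.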

Examples

from pymilvus.model.sparse.bm25.tokenizers import build_default_analyzer
from pymilvus.model.sparse import BM25EmbeddingFunction

# Built-in analyzers are available for several languages; "en" selects English.
analyzer = build_default_analyzer(language="en")

corpus = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

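bm25_ef = BM25EmbeddingFunction(analyzer)

# Fit the model on the corpus so that term statistics are available for encoding.
bm25_ef.fit(corpus)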
