Function
A Function instance generates vector embeddings from user-provided raw data or applies a reranking strategy to the search results in Milvus.
class pymilvus.Function
Constructor
This constructor initializes a new Function instance that transforms the user’s raw data into vector embeddings or applies a reranking strategy to the search results. The process is automated, which simplifies similarity search operations.
Function(
    name: str,
    function_type: FunctionType,
    input_field_names: Union[str, List[str]],
    output_field_names: Union[str, List[str]],
    description: str = "",
    params: Optional[Dict] = None,
)
PARAMETERS:
name (str) - [REQUIRED]
The name of the function. This identifier is used to reference the function within queries and collections.
function_type (FunctionType) - [REQUIRED]
The type of function to use. Possible values:
FunctionType.BM25: Generates sparse vectors based on the BM25 ranking algorithm from a VARCHAR field.
FunctionType.TEXTEMBEDDING: Generates dense vectors that capture semantic meaning from a VARCHAR field.
FunctionType.RERANK: Applies reranking strategies to the search results.
input_field_names (Union[str, List[str]]) - [REQUIRED]
The name of the field containing the raw data that requires conversion to a vector representation. This parameter accepts only one field name.
output_field_names (Union[str, List[str]]) - The name of the field where the generated embeddings will be stored. This should correspond to a vector field defined in the collection schema. This parameter accepts only one field name.
notes
This applies only when you set function_type to FunctionType.BM25 or FunctionType.TEXTEMBEDDING.
params (dict) - A configuration dictionary for the embedding or reranking function. Supported keys vary by function_type:
FunctionType.BM25: No parameters required. Pass an empty dictionary or omit params entirely.
FunctionType.TEXTEMBEDDING:
provider (str) - The embedding model provider. Possible values are as follows:
openai (OpenAI)
azure_openai (Microsoft Azure OpenAI)
dashscope (DashScope)
bedrock (Amazon Bedrock)
vertexai (Google Cloud Vertex AI)
voyageai (Voyage AI)
cohere (Cohere)
siliconflow (SiliconFlow)
model_name (str) - The name of the embedding model to use. The value varies with the provider. For details, refer to the documentation page of the respective provider.
credential (str) - The label of a credential defined in the top-level credential: section of milvus.yaml.
When provided, Milvus retrieves the matching key pair or API token and signs the request on the server side.
When omitted (None), Milvus falls back to the credential explicitly configured for the target model provider in milvus.yaml.
If the label is unknown or the referenced key is missing, the call fails.
dim (str) - The number of dimensions for the output embeddings. For OpenAI’s third-generation models, you can shorten the full vector to reduce cost and latency without a significant loss of semantic information. For more information, refer to the OpenAI announcement blog post.
notes
If you shorten the vector dimension, ensure the dim value specified in the schema’s add_field method for the vector field matches the final output dimension of your embedding function.
FunctionType.RERANK: Configure params based on the reranker type:
Weighted Ranker
params = {
    "reranker": "weighted",  # Required
    "weights": [0.1, 0.9],   # List[float], weights per search path ∈ [0,1]
    "norm_score": True       # Optional
}
reranker (str): Specifies the reranking method to use. Must be set to "weighted" to use Weighted Ranker.
weights (List[float]): Array of weights corresponding to each search path; values ∈ [0,1]. For details, refer to Mechanism of Weighted Ranker.
norm_score (boolean): Whether to normalize raw scores (using arctan) before weighting. For details, refer to Mechanism of Weighted Ranker.
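To make the roles of weights and norm_score concrete, the following is a minimal pure-Python sketch of weighted score fusion. It is not the Milvus implementation: the helper names arctan_normalize and weighted_fuse are hypothetical, and the exact arctan mapping used here (0.5 + atan(score) / pi) is an assumption chosen only to show a normalization onto (0, 1).

```python
import math


def arctan_normalize(score: float) -> float:
    """Map a raw score onto (0, 1) with arctan (assumed form, for illustration)."""
    return 0.5 + math.atan(score) / math.pi


def weighted_fuse(paths, weights, norm_score=False):
    """Combine per-path {doc_id: score} results into one weighted score per document."""
    if len(paths) != len(weights):
        raise ValueError("one weight is required per search path")
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("weights must lie in [0, 1]")
    fused = {}
    for hits, weight in zip(paths, weights):
        for doc_id, score in hits.items():
            if norm_score:
                score = arctan_normalize(score)
            # A document missing from a path simply contributes nothing for that path.
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical search paths (for example, sparse and dense) with raw scores.
sparse_hits = {"a": 0.2, "b": 0.9}
dense_hits = {"a": 0.8, "c": 0.5}
ranked = weighted_fuse([sparse_hits, dense_hits], weights=[0.1, 0.9])
```

With weights [0.1, 0.9], the second search path dominates the final order, which is why document "b" (strong only in the first path) ends up last.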
RRF Ranker
params = {
    "reranker": "rrf",  # Required
    "k": 100            # Optional (default: 60)
}
reranker (str): Specifies the reranking method to use. Must be set to "rrf" to use RRF Ranker.
k (int): Smoothing parameter that controls the impact of document ranks; a higher k reduces sensitivity to top ranks. Value range: (0, 16384); default: 60. For details, refer to Mechanism of RRF Ranker.
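The effect of k is easier to see in code. The sketch below uses the standard Reciprocal Rank Fusion formula, where each result list contributes 1 / (k + rank) to a document's score. It is an illustration only, not Milvus source code: the function name rrf_fuse is hypothetical, and 1-based ranks are an assumption.

```python
def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    if not 0 < k < 16384:
        raise ValueError("k must lie in the open range (0, 16384)")
    scores = {}
    for ranking in ranked_lists:
        # Assumption: ranks are 1-based, so the top hit contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical search paths, each already ordered by relevance.
fused = rrf_fuse([["a", "b", "c"], ["a", "c", "d"]], k=60)
```

Because k is added to every rank, a larger k shrinks the gap between the contributions of rank 1 and rank 2, which is what "reduces sensitivity to top ranks" refers to.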
Decay Ranker
params = {
    "reranker": "decay",        # Specify decay reranker. Must be "decay"
    "function": "gauss",        # Choose decay function type: "gauss", "exp", or "linear"
    "origin": 1720000000,       # Reference point (e.g., Unix timestamp)
    "scale": 7 * 24 * 60 * 60,  # 7 days in seconds
    "offset": 24 * 60 * 60,     # 1 day no-decay zone
    "decay": 0.5                # Half score at scale distance
}
reranker (str): Specifies the reranking method to use. Must be set to "decay" to enable decay ranking functionality.
function (str): Specifies which mathematical decay ranker to apply. Possible values: "gauss", "exp", "linear". For details, refer to Choose the right decay ranker.
origin (int): Reference point from which the decay score is calculated.
scale (int): Distance or time at which relevance drops to the decay value.
offset (int): Creates a “no-decay zone” around the origin where items maintain full scores (decay score = 1.0).
decay (float): Score value at the scale distance; controls curve steepness.
For details on decay ranking, refer to Decay Ranker Overview.
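The interaction of origin, scale, offset, and decay can be sketched in a few lines of Python. The formulas below follow a common formulation of gauss, exp, and linear decay (the one used by Elasticsearch's decay functions) and are assumed here for illustration; they are not taken from Milvus source code. The function name decay_score is hypothetical, and the sketch assumes the decay value is reached at scale distance beyond the offset.

```python
import math


def decay_score(value, origin, scale, offset=0.0, decay=0.5, function="gauss"):
    """Return a decay score in [0, 1] for a field value relative to origin."""
    if scale <= 0 or not 0.0 < decay < 1.0:
        raise ValueError("scale must be positive and decay must lie in (0, 1)")
    # Inside the no-decay zone the effective distance is zero, so the score is 1.0.
    distance = max(0.0, abs(value - origin) - offset)
    if function == "gauss":
        sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
        return math.exp(-(distance ** 2) / (2.0 * sigma_sq))
    if function == "exp":
        rate = math.log(decay) / scale
        return math.exp(rate * distance)
    if function == "linear":
        span = scale / (1.0 - decay)
        return max(0.0, (span - distance) / span)
    raise ValueError('function must be "gauss", "exp", or "linear"')


# Same values as the params example above.
ORIGIN = 1720000000
SCALE = 7 * 24 * 60 * 60
OFFSET = 24 * 60 * 60

inside_zone = decay_score(ORIGIN + 3600, ORIGIN, SCALE, OFFSET)
at_scale = decay_score(ORIGIN + OFFSET + SCALE, ORIGIN, SCALE, OFFSET)
```

Under these assumptions, an item one hour from the origin keeps a full score, and an item at scale distance past the no-decay zone scores the decay value (0.5) for all three function types. The curves differ only in how they behave between and beyond those two points.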
Model Ranker
TEI Provider:
params = {
    "reranker": "model",                              # Specify model reranker. Must be "model"
    "provider": "tei",                                # Choose provider: "tei" or "vllm"
    "queries": ["machine learning for time series"],  # Query text
    "endpoint": "http://model-service:8080",          # Model service endpoint
    "maxBatch": 32,                                   # Optional (default: 32)
    "truncate": True,                                 # Optional: Truncate the inputs that are longer than the maximum supported size
    "truncation_direction": "Right",                  # Optional: Direction to truncate the inputs
}
vLLM Provider:
params = {
    "reranker": "model",                           # Specifies model-based reranking
    "provider": "vllm",                            # Specifies vLLM service
    "queries": ["renewable energy developments"],  # Query text
    "endpoint": "http://localhost:8080",           # vLLM service address
    "maxBatch": 64,                                # Optional: batch size
    "truncate_prompt_tokens": 256,                 # Optional: Use last 256 tokens
}
reranker (str): Must be set to "model" to enable model reranking.
provider (str): The model service provider to use for reranking. Possible values: "tei" or "vllm". For details, refer to Choose a model provider for your needs.
queries (List[str]): List of query strings used by the reranking model to calculate relevance scores.
endpoint (str): URL of the model service.
maxBatch (int): Maximum number of documents to process in a single batch. Default: 32.
truncate (bool): [TEI only] Whether to truncate inputs that exceed the maximum supported size. For details, refer to TEI Ranker.
truncation_direction (str): [TEI only] Direction for truncation ("Left" or "Right"). For details, refer to TEI Ranker.
truncate_prompt_tokens (int): [vLLM only] Number of tokens to keep from the end of the prompt when truncating. For details, refer to vLLM Ranker.
description (str) - [OPTIONAL]
A brief description of the function’s purpose. This can be useful for documentation or clarity in larger projects. Defaults to an empty string.
RETURN TYPE:
Instance of Function that encapsulates the specific processing behavior for converting raw data to vector embeddings.
RETURNS:
A Function object that can be registered with a Milvus collection, facilitating automatic embedding generation during data insertion.
EXCEPTIONS:
UnknownFunctionType
This exception will be raised when an unsupported or unrecognized function type is specified.
FunctionIncorrectInputOutputType
This exception will be raised when one or more field names in input_field_names or output_field_names are not strings.
FunctionDuplicateInputs
This exception will be raised when there are duplicate field names in input_field_names.
FunctionDuplicateOutputs
This exception will be raised when there are duplicate field names in output_field_names.
FunctionCommonInputOutput
This exception will be raised when there is an overlap between input_field_names and output_field_names, meaning that the same field name is present in both.
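The field-name conditions listed above can be checked before calling the constructor. The sketch below mirrors those four conditions in plain Python. It is a hypothetical pre-flight helper, not pymilvus code: the name check_field_names is invented for illustration, and ValueError stands in for whatever exception class pymilvus actually raises.

```python
def check_field_names(input_field_names, output_field_names):
    """Validate field names against the conditions listed above; return them as lists."""
    # A single string is accepted and treated as a one-element list.
    inputs = [input_field_names] if isinstance(input_field_names, str) else list(input_field_names)
    outputs = [output_field_names] if isinstance(output_field_names, str) else list(output_field_names)

    for field_name in inputs + outputs:
        if not isinstance(field_name, str):
            raise ValueError("FunctionIncorrectInputOutputType: field names must be strings")
    if len(inputs) != len(set(inputs)):
        raise ValueError("FunctionDuplicateInputs: duplicate names in input_field_names")
    if len(outputs) != len(set(outputs)):
        raise ValueError("FunctionDuplicateOutputs: duplicate names in output_field_names")
    if set(inputs) & set(outputs):
        raise ValueError("FunctionCommonInputOutput: a name appears in both inputs and outputs")
    return inputs, outputs


checked = check_field_names("document", ["dense"])
```

Running a check like this in application code surfaces a naming mistake with a local error message before any request reaches Milvus.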
Examples
Use BM25
from pymilvus import Function, FunctionType

# use BM25
bm25_function = Function(
    name="bm25_fn",
    input_field_names=["document_content"],
    output_field_names=["sparse_vector"],
    function_type=FunctionType.BM25,
)
Use TEXTEMBEDDING
from pymilvus import Function, FunctionType

# use TEXTEMBEDDING
text_embedding_function = Function(
    name="openai_embedding",                     # Unique identifier for this embedding function
    function_type=FunctionType.TEXTEMBEDDING,    # Type of embedding function
    input_field_names=["document"],              # Scalar field to embed
    output_field_names=["dense"],                # Vector field to store embeddings
    params={                                     # Provider-specific configuration (highest priority)
        "provider": "openai",                    # Embedding model provider
        "model_name": "text-embedding-3-small",  # Embedding model
        # "credential": "apikey1",               # Optional: Credential label specified in milvus.yaml
        # Optional parameters:
        # "dim": "1536",                         # Optionally shorten the output vector dimension
        # "user": "user123"                      # Optional: identifier for API tracking
    }
)
Use RERANK
from pymilvus import Function, FunctionType

# use RERANK
model_ranker = Function(
    name="semantic_ranker",               # Function identifier
    input_field_names=["document"],       # VARCHAR field to use for reranking
    function_type=FunctionType.RERANK,    # Must be set to RERANK
    params={
        "reranker": "model",              # Specify model reranker. Must be "model"
        "provider": "tei",                # Choose provider: "tei" or "vllm"
        "queries": ["machine learning for time series"],  # Query text
        "endpoint": "http://model-service:8080",          # Model service endpoint
        # "maxBatch": 32                  # Optional: batch size for processing
    }
)