
Get & Scalar Query

This guide demonstrates how to get entities by ID and conduct scalar filtering. Scalar filtering retrieves entities that match the specified filtering conditions.

Overview

A scalar query filters entities in a collection based on a defined condition using boolean expressions. The query result is a set of entities that match the defined condition. Unlike a vector search, which identifies the closest vector to a given vector in a collection, queries filter entities based on specific criteria.

In Milvus, a filter is always a string comprising field names joined by operators. This guide includes various filter examples. For details about the operators, go to the Reference section.
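Because a filter is just a string, you can assemble it in plain Python before passing it to Milvus. The following is a minimal sketch, not part of the Milvus SDK: the helper names `in_filter`, `range_filter`, and `all_of` are hypothetical and only illustrate how the filter strings used later in this guide are put together.

```python
import json


def in_filter(field, values, negate=False):
    """Build a membership filter such as: color in ["green", "purple"]."""
    # json.dumps double-quotes strings and leaves numbers bare, which matches
    # the literal style of the filter examples in this guide.
    literals = ", ".join(json.dumps(v) for v in values)
    op = "not in" if negate else "in"
    return f"{field} {op} [{literals}]"


def range_filter(field, low, high):
    """Build an exclusive range filter such as: 1000 < tag < 1500."""
    return f"{low} < {field} < {high}"


def all_of(*conditions):
    """Join conditions with `and`, wrapping each one in parentheses."""
    return " and ".join(f"({c})" for c in conditions)


expr = all_of('color == "red"', range_filter("tag", 1000, 1500))
print(expr)  # (color == "red") and (1000 < tag < 1500)
```

The resulting string can be passed as the `filter` argument in the query examples that follow.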

The code snippets on this page use the new MilvusClient (Python) to interact with Milvus. New MilvusClient SDKs for other languages will be released in future updates.

Preparations

The following steps connect to Milvus, quickly set up a collection, and insert over 1,000 randomly generated entities into it.

Step 1: Create a collection

import random

from pymilvus import MilvusClient

# 1. Set up a Milvus client
client = MilvusClient(
    uri="http://localhost:19530"
)

# 2. Create a collection
client.create_collection(
    collection_name="quick_setup",
    dimension=5,
)

Step 2: Insert randomly generated entities

# 3. Insert randomly generated vectors 
colors = ["green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey"]
data = [ {
        "id": i, 
        "vector": [ random.uniform(-1, 1) for _ in range(5) ], 
        "color": random.choice(colors), 
        "tag": random.randint(1000, 9999) 
    } for i in range(1000) ]

for i in data:
    i["color_tag"] = "{}_{}".format(i["color"], i["tag"])

print(data[0])

# Output
#
# {
#     "id": 0,
#     "vector": [
#         0.5913205104316952,
#         -0.5474675922381218,
#         0.9433357315736743,
#         0.22479148416151284,
#         0.28294612647978834
#     ],
#     "color": "grey",
#     "tag": 5024,
#     "color_tag": "grey_5024"
# }

# 4. Insert entities to the collection
res = client.insert(
    collection_name="quick_setup",
    data=data
)

print(res)

# Output
#
# {
#     "insert_count": 1000
# }

Step 3: Create partitions and insert more entities

# 5. Create two partitions
client.create_partition(collection_name="quick_setup", partition_name="partitionA")
client.create_partition(collection_name="quick_setup", partition_name="partitionB")

# 6. Insert 500 entities in partition A
data = [ {
        "id": i + 1000, 
        "vector": [ random.uniform(-1, 1) for _ in range(5) ], 
        "color": random.choice(colors), 
        "tag": random.randint(1000, 9999) 
    } for i in range(500) ]

for i in data:
    i["color_tag"] = "{}_{}".format(i["color"], i["tag"])

res = client.insert(
    collection_name="quick_setup",
    data=data,
    partition_name="partitionA"
)

print(res)

# Output
#
# {
#     "insert_count": 500
# }

# 7. Insert 300 entities in partition B
data = [ {
        "id": i + 1500, 
        "vector": [ random.uniform(-1, 1) for _ in range(5) ], 
        "color": random.choice(colors), 
        "tag": random.randint(1000, 9999) 
    } for i in range(300) ]

for i in data:
    i["color_tag"] = "{}_{}".format(i["color"], i["tag"])

res = client.insert(
    collection_name="quick_setup",
    data=data,
    partition_name="partitionB"
)

print(res)

# Output
#
# {
#     "insert_count": 300
# }
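The three inserts above use ID offsets of 0, 1000, and 1500 so that primary keys never collide across the default partition, partitionA, and partitionB. The sketch below reproduces that data generation without a Milvus connection and checks the ID ranges. The `make_entities` helper and the fixed random seed are assumptions added for illustration only.

```python
import random

random.seed(42)  # assumption: seeded only so this sketch is reproducible

colors = ["green", "blue", "yellow", "red", "black", "white",
          "purple", "pink", "orange", "brown", "grey"]


def make_entities(start_id, count):
    """Generate `count` entities with consecutive IDs starting at `start_id`."""
    entities = []
    for i in range(count):
        color = random.choice(colors)
        tag = random.randint(1000, 9999)
        entities.append({
            "id": start_id + i,
            "vector": [random.uniform(-1, 1) for _ in range(5)],
            "color": color,
            "tag": tag,
            "color_tag": f"{color}_{tag}",
        })
    return entities


# Same sizes and ID offsets as the three insert calls in this guide.
batches = {
    "_default": make_entities(0, 1000),
    "partitionA": make_entities(1000, 500),
    "partitionB": make_entities(1500, 300),
}

all_ids = [e["id"] for batch in batches.values() for e in batch]
print(len(all_ids), len(set(all_ids)))  # 1800 1800
```

Each batch could then be passed as `data` to `client.insert()` with the matching `partition_name`, as shown in the steps above.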

Get Entities by ID

If you know the IDs of the entities you are interested in, you can use the get() method.

# 8. Get entities by ID
res = client.get(
    collection_name="quick_setup",
    ids=[0, 1, 2]
)

print(res)

# Output
#
# [
#     {
#         "id": 0,
#         "vector": [
#             0.68824464,
#             0.6552274,
#             0.33593303,
#             -0.7099536,
#             -0.07070546
#         ],
#         "color_tag": "green_2006",
#         "color": "green"
#     },
#     {
#         "id": 1,
#         "vector": [
#             -0.98531723,
#             0.33456197,
#             0.2844234,
#             0.42886782,
#             0.32753858
#         ],
#         "color_tag": "white_9298",
#         "color": "white"
#     },
#     {
#         "id": 2,
#         "vector": [
#             -0.9886812,
#             -0.44129863,
#             -0.29859528,
#             0.06059075,
#             -0.43817034
#         ],
#         "color_tag": "grey_5312",
#         "color": "grey"
#     }
# ]
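Fetching by ID can also be thought of as a scalar filter on the primary key field. The sketch below builds such a filter string. It assumes the primary key field is named `id`, as in the quick-setup collection used here, and the `ids_filter` helper is hypothetical. Whether get() is implemented this way internally is an assumption, not something this guide states.

```python
import json


def ids_filter(ids, pk_field="id"):
    """Build a primary-key membership filter such as: id in [0, 1, 2]."""
    ids = list(ids)
    if not ids:
        raise ValueError("ids must not be empty")
    # json.dumps renders integers bare and strings in double quotes.
    return f"{pk_field} in {json.dumps(ids)}"


print(ids_filter([0, 1, 2]))  # id in [0, 1, 2]
```

You could pass this string as the `filter` argument of `client.query()` when you want to combine an ID lookup with other conditions.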

Get entities from partitions

You can also get entities from specific partitions.

# 9. Get entities from partitions
res = client.get(
    collection_name="quick_setup",
    ids=[0, 1, 2],
    partition_names=["_default"]
)

print(res)

# Output
#
# [
#     {
#         "color_tag": "green_2006",
#         "color": "green",
#         "id": 0,
#         "vector": [
#             0.68824464,
#             0.6552274,
#             0.33593303,
#             -0.7099536,
#             -0.07070546
#         ]
#     },
#     {
#         "color_tag": "white_9298",
#         "color": "white",
#         "id": 1,
#         "vector": [
#             -0.98531723,
#             0.33456197,
#             0.2844234,
#             0.42886782,
#             0.32753858
#         ]
#     },
#     {
#         "color_tag": "grey_5312",
#         "color": "grey",
#         "id": 2,
#         "vector": [
#             -0.9886812,
#             -0.44129863,
#             -0.29859528,
#             0.06059075,
#             -0.43817034
#         ]
#     }
# ]

Use Basic Operators

In this section, you will find examples of how to use basic operators in scalar filtering. You can apply these filters to vector searches and data deletions too.

  • Filter entities whose tag values fall between 1,000 and 1,500 (exclusive).

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        filter="1000 < tag < 1500",
        output_fields=["color_tag"],
        # highlight-end
        limit=3
    )
    
    # Output
    #
    # 
    
  • Filter entities with their color values set to brown.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        filter='color == "brown"',
        output_fields=["color_tag"],
        # highlight-end
        limit=3
    )
    
    # Output
    #
    # 
    
  • Filter entities whose color values are neither green nor purple.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        filter='color not in ["green", "purple"]',
        output_fields=["color_tag"],
        # highlight-end
        limit=3
    )
    
    # Output
    #
    # 
    
  • Filter entities whose color tags start with red.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        filter='color_tag like "red%"',
        output_fields=["color_tag"],
        # highlight-end
        limit=3
    )
    
    # Output
    #
    # 
    
  • Filter entities with their colors set to red and tag values within the range from 1,000 to 1,500.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        filter='(color == "red") and (1000 < tag < 1500)',
        output_fields=["color_tag"],
        # highlight-end
        limit=3
    )
    
    # Output
    #
    # 
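To build intuition for what each filter above selects, the following sketch applies equivalent conditions to a small in-memory list using plain Python. It is a local emulation for illustration only and does not reflect how Milvus evaluates filters. The sample rows and the `ids` helper are made up for this example.

```python
rows = [
    {"id": 0, "color": "red", "tag": 1200, "color_tag": "red_1200"},
    {"id": 1, "color": "green", "tag": 4300, "color_tag": "green_4300"},
    {"id": 2, "color": "brown", "tag": 1499, "color_tag": "brown_1499"},
    {"id": 3, "color": "red", "tag": 1500, "color_tag": "red_1500"},
]


def ids(predicate):
    """Return the IDs of the sample rows that satisfy `predicate`."""
    return [r["id"] for r in rows if predicate(r)]


# filter="1000 < tag < 1500" (both bounds are exclusive)
in_range = ids(lambda r: 1000 < r["tag"] < 1500)

# filter='color == "brown"'
brown = ids(lambda r: r["color"] == "brown")

# filter='color not in ["green", "purple"]'
not_green_or_purple = ids(lambda r: r["color"] not in ["green", "purple"])

# filter='color_tag like "red%"' (prefix match)
red_prefix = ids(lambda r: r["color_tag"].startswith("red"))

# filter='(color == "red") and (1000 < tag < 1500)'
red_in_range = ids(lambda r: r["color"] == "red" and 1000 < r["tag"] < 1500)

print(in_range, brown, not_green_or_purple, red_prefix, red_in_range)
# [0, 2] [2] [0, 2, 3] [0, 3] [0]
```

Note that the entity with tag 1500 is excluded from the range results because `<` is a strict comparison.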
    

Use Advanced Operators

In this section, you will find examples of how to use advanced operators in scalar filtering. You can apply these filters to vector searches and data deletions too.

Count entities

  • Count the total number of entities in a collection.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        output_fields=["count(*)"]
        # highlight-end
    )
    
    # Output
    #
    # 
    
  • Count the total number of entities in specific partitions.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        output_fields=["count(*)"],
        partition_names=["partitionA"]
        # highlight-end
    )
    
    # Output
    #
    # 
    
    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        output_fields=["count(*)"],
        partition_names=["partitionB"]
        # highlight-end
    )
    
    # Output
    #
    # 
    
  • Count the number of entities that match a filtering condition.

    res = client.query(
        collection_name="quick_setup",
        # highlight-start
        filter='(color == "red") and (1000 < tag < 1500)',
        output_fields=["count(*)"],
        # highlight-end
    )
    
    # Output
    #
    # 
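The three counting variants above differ only in whether a filter and a partition list are supplied. The sketch below gathers the arguments in one place. The `count_query_args` helper is hypothetical, and it assumes that `query()` accepts `collection_name`, `filter`, `output_fields`, and `partition_names` as keyword arguments.

```python
def count_query_args(collection_name, filter="", partition_names=None):
    """Assemble keyword arguments for a count(*) query."""
    args = {
        "collection_name": collection_name,
        "filter": filter,
        # count(*) should be the only output field.
        "output_fields": ["count(*)"],
    }
    if partition_names:
        args["partition_names"] = list(partition_names)
    return args


args = count_query_args("quick_setup", partition_names=["partitionA"])
print(args)

# With a running Milvus instance and a connected client, you could then call:
# res = client.query(**args)
```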
    

Reference on scalar filters

Basic Operators

A boolean expression is always a string comprising field names joined by operators. In this section, you will learn more about basic operators.

Operator      Description
and (&&)      True if both operands are true
or (||)       True if either operand is true
+, -, *, /    Addition, subtraction, multiplication, and division
**            Exponent
%             Modulus
<, >          Less than, greater than
==, !=        Equal to, not equal to
<=, >=        Less than or equal to, greater than or equal to
not           Reverses the result of a given condition.
like          Compares a value to similar values using wildcard operators. For example, like "prefix%" matches strings that begin with "prefix".
in            Tests if an expression matches any value in a list of values.

Advanced operators

  • count(*)

    Counts the exact number of entities in the collection. Use this as an output field to get the exact number of entities in a collection or partition.

    Notes

    This applies to loaded collections. You should use it as the only output field.
