
Use the MilvusClient

This page shows how to use the MilvusClient provided by PyMilvus. The MilvusClient is a simplified wrapper around PyMilvus that is easier to use and hides most of the complexity of the original SDK.

Ensure that Milvus is running.

The MilvusClient connects to the service in a single, unified way: through a URI. A few examples of valid URIs are:

  1. "http://localhost:19530"
  2. "https://user:password@mysite:19530"
  3. "https://username:password@in01-12a.aws-us-west-2.vectordb.zillizcloud.com:19538"

When you use an HTTPS connection, the URI is expected to include a username and password.
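
To make the URI rules above concrete, here is a minimal sketch that splits a URI into its parts using only the Python standard library and checks that an HTTPS URI carries credentials. The helpers `describe_uri` and `has_required_credentials` are hypothetical names used for illustration; they are not part of PyMilvus, and the client may validate URIs differently.

```python
from urllib.parse import urlsplit


def describe_uri(uri: str) -> dict:
    """Split a connection URI into scheme, host, port and credentials."""
    parts = urlsplit(uri)
    return {
        "secure": parts.scheme == "https",
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
    }


def has_required_credentials(uri: str) -> bool:
    """Return False for an HTTPS URI that lacks a username or password."""
    info = describe_uri(uri)
    if not info["secure"]:
        # Plain HTTP URIs are not expected to carry credentials.
        return True
    return bool(info["user"]) and bool(info["password"])


info = describe_uri("https://user:password@mysite:19530")
```

Running a check like this before constructing the client gives a clearer error than a failed connection attempt.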

Now let's go over a quick example using the MilvusClient.

Basics

Create the Client

Most of the information needed to use the client is provided in the constructor call. There are two main use cases for the client: creating a new Milvus collection or using a previously created collection.

If you are creating a new collection, you must specify the vector_field name, because it cannot be inferred from the inserted data. If you want to manage the primary keys for this collection yourself, you must also specify pk_field; otherwise an auto-generated int field is used. If a collection with the same name already exists in the Milvus instance, you must set overwrite to True to remove the previous collection.

If you want to connect to a previously created collection, the only values you need to provide are the uri and the collection_name. The rest of the information is inferred from the collection itself.

from pymilvus import MilvusClient

client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector", 
    # pk_field= "id", # If you wanted to provide your own PK
    overwrite=True,
)
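
The two use cases described above need different sets of constructor arguments. The following sketch assembles them in plain Python so that the difference is explicit. `build_client_kwargs` is a hypothetical helper, not part of PyMilvus, and the rule that pk_field and overwrite are only meaningful together with vector_field is an assumption drawn from the description above.

```python
def build_client_kwargs(collection_name, uri, *, vector_field=None,
                        pk_field=None, overwrite=False):
    """Assemble keyword arguments for the MilvusClient constructor."""
    kwargs = {"collection_name": collection_name, "uri": uri}

    if vector_field is None:
        # Existing collection: everything else is inferred from the collection.
        if pk_field is not None or overwrite:
            raise ValueError(
                "pk_field and overwrite apply when creating a collection, "
                "which also requires vector_field"
            )
        return kwargs

    # New collection: the vector field name cannot be parsed from the data.
    kwargs["vector_field"] = vector_field
    if pk_field is not None:
        kwargs["pk_field"] = pk_field
    if overwrite:
        kwargs["overwrite"] = True
    return kwargs


existing = build_client_kwargs("qux", "http://localhost:19530")
new = build_client_kwargs("qux", "http://localhost:19530",
                          vector_field="float_vector", overwrite=True)
```

The resulting dictionary can then be passed on as `MilvusClient(**new)`.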

Insert Data

With the MilvusClient created, we can begin to insert data. Data is inserted as a list of dictionaries, where each dict corresponds to a row in the collection. Each dict must include values for all the columns in the collection; otherwise the insert throws an exception.

If the client was created on a collection that doesn't exist, or overwrite is set to True, the first entry in the list of dicts is used to construct the schema of the collection. All subsequent inserts must contain the same fields as the first dict. If no index parameters were supplied at construction time, a default HNSW index is used to index the data.

data = [
    {
        "float_vector": [1,2,3],
        "id": 1,
        "text": "foo"
    },
    {
        "float_vector": [4,5,6],
        "id": 2,
        "text": "bar"
    },
    {
        "float_vector": [7,8,9],
        "id": 3,
        "text": "baz"
    }
]
client.insert_data(data)
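
Because every row must carry the same fields as the first one, a quick client-side check before inserting can catch mismatches early. The following is a minimal sketch in plain Python; `check_rows` is a hypothetical helper, not part of PyMilvus, and it only compares field names, not types or vector dimensions.

```python
def check_rows(rows):
    """Compare every row's field names against those of the first row.

    Returns a list of (row_index, missing_fields, extra_fields) tuples,
    one for each row that differs from the first.
    """
    if not rows:
        return []
    expected = set(rows[0])
    problems = []
    for index, row in enumerate(rows[1:], start=1):
        fields = set(row)
        missing = sorted(expected - fields)
        extra = sorted(fields - expected)
        if missing or extra:
            problems.append((index, missing, extra))
    return problems


rows = [
    {"float_vector": [1, 2, 3], "id": 1, "text": "foo"},
    {"float_vector": [4, 5, 6], "id": 2},                # "text" is missing
    {"float_vector": [7, 8, 9], "id": 3, "text": "baz"},
]
problems = check_rows(rows)
```

An empty result means the batch is consistent with its first row and can be passed to insert_data.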

Search the Data

Once the data has been inserted into Milvus, we can search the collection. The search takes one or more search vectors and the number of results to return for each (top_k). Optionally, you can also supply search parameters. The search parameters should correspond to the index_parameters if you supplied them at construction time. If they are not supplied, MilvusClient uses default search parameters.

res = client.search_data(
    data = [[1,3,5], [7,8,9]],
    top_k = 2,
)
# [[
#     {'data': {'id': 1, 'internal_pk_4537': 441340318978146436, 'text': 'foo'}, 'score': 5.0},
#     {'data': {'id': 2, 'internal_pk_4537': 441340318978146437, 'text': 'bar'}, 'score': 14.0}
# ],
# [
#     {'data': {'id': 3, 'internal_pk_4537': 441340318978146438, 'text': 'baz'}, 'score': 0.0},
#     {'data': {'id': 2, 'internal_pk_4537': 441340318978146437, 'text': 'bar'}, 'score': 27.0}
# ]]

The search results come as a list of lists. For each search vector, you receive a list of dicts, with each dict containing the distance (under the score key) and the corresponding result data. If you do not need all of the data, you can adjust what is returned using the return_fields argument.
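
To show where the scores in the output above come from, here is a brute-force sketch in plain Python that produces the same list-of-lists shape. It assumes the score is the squared Euclidean distance, which is consistent with the values shown (5.0, 14.0, 0.0, 27.0) but is an inference from the output rather than a documented setting. `brute_force_search` is a hypothetical helper; the real search uses the index and also returns the internal primary key.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows, queries, top_k, vector_field="float_vector"):
    """Return, for each query vector, the top_k closest rows as score dicts."""
    results = []
    for query in queries:
        hits = [
            {
                # Everything except the vector itself is returned as data.
                "data": {k: v for k, v in row.items() if k != vector_field},
                "score": squared_l2(query, row[vector_field]),
            }
            for row in rows
        ]
        hits.sort(key=lambda hit: hit["score"])  # smaller distance first
        results.append(hits[:top_k])
    return results


rows = [
    {"float_vector": [1, 2, 3], "id": 1, "text": "foo"},
    {"float_vector": [4, 5, 6], "id": 2, "text": "bar"},
    {"float_vector": [7, 8, 9], "id": 3, "text": "baz"},
]
res = brute_force_search(rows, [[1, 3, 5], [7, 8, 9]], top_k=2)
```

Each inner list corresponds to one search vector, in the order the vectors were passed in.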

Advanced

Partitions

The MilvusClient supports partitions in its current release. Partitions can be specified when the MilvusClient is constructed and added later on. Here is a quick example of using partitions.

from pymilvus import MilvusClient


client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector",
    partitions = ["zaz"],
    overwrite=True,
)

data = [
    {
        "float_vector": [1,2,3],
        "id": 1,
        "text": "foo"
    },
]
client.insert_data(data, partition="zaz")

client.add_partitions(["zoo"])

data = [
    {
        "float_vector": [4,5,6],
        "id": 2,
        "text": "bar"
    },
]
client.insert_data(data, partition="zoo")

res = client.search_data(
    data = [1,3,5],
    top_k = 2,
)

# [[
#     {'data': {'id': 1, 'internal_pk_3bd4': 441363276234227849, 'text': 'foo'}, 'score': 5.0},
#     {'data': {'id': 2, 'internal_pk_3bd4': 441363276234227866, 'text': 'bar'}, 'score': 14.0}
# ]]


res = client.search_data(
    data = [1,3,5],
    top_k = 2,
    partitions=["zaz"]
)

# [[
#     {'data': {'id': 1, 'internal_pk_3bd4': 441363276234227849, 'text': 'foo'}, 'score': 5.0}
# ]]

res = client.search_data(
    data = [1,3,5],
    top_k = 2,
    partitions=["zoo"]
)

# [[
#     {'data': {'id': 2, 'internal_pk_3bd4': 441363276234227866, 'text': 'bar'}, 'score': 14.0}
# ]]
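
In the example above, each call to insert_data targets a single partition. When a batch contains rows for several partitions, the rows can be grouped first. The sketch below does this in plain Python; `group_by_partition` and the routing function are hypothetical, not part of PyMilvus.

```python
from collections import defaultdict


def group_by_partition(rows, route):
    """Group rows into per-partition batches.

    `route` is a caller-supplied function that maps a row to a partition name.
    """
    batches = defaultdict(list)
    for row in rows:
        batches[route(row)].append(row)
    return dict(batches)


rows = [
    {"float_vector": [1, 2, 3], "id": 1, "text": "foo"},
    {"float_vector": [4, 5, 6], "id": 2, "text": "bar"},
    {"float_vector": [7, 8, 9], "id": 3, "text": "baz"},
]

# Hypothetical routing rule: odd ids go to "zaz", even ids go to "zoo".
batches = group_by_partition(rows, lambda row: "zaz" if row["id"] % 2 else "zoo")

# Each batch could then be inserted with one call per partition, for example:
# for name, batch in batches.items():
#     client.insert_data(batch, partition=name)
```

The partitions named by the routing function must already exist, either from the constructor or from add_partitions.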

Filtering

Filtering narrows search results to entries whose metadata matches an expression. The same expressions can also be used to query data directly by metadata.

from pymilvus import MilvusClient

client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector", 
    # pk_field= "id", # If you wanted to provide your own PK
    overwrite=True,
)

data = [
    {
        "float_vector": [1,2,3],
        "id": 1,
        "text": "foo"
    },
    {
        "float_vector": [4,5,6],
        "id": 2,
        "text": "bar"
    },
    {
        "float_vector": [7,8,9],
        "id": 3,
        "text": "baz"
    }
]
client.insert_data(data)

res = client.search_data(
    data = [1,3,5],
    top_k = 2,
    filter_expression = "id > 1"
)

# [[
#     {'score': 14.0, 'data': {'id': 2, 'text': 'bar', 'internal_pk_5465': 441363276234227922}},
#     {'score': 77.0, 'data': {'id': 3, 'text': 'baz', 'internal_pk_5465': 441363276234227923}}
# ]]

res = client.query_data(
    filter_expression = "id == 1"
)

# [
#   {'id': 1, 'text': 'foo', 'internal_pk_5465': 441363276234227921}
# ]
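
Filter expressions such as "id > 1" are plain strings, so they are often built from program values. The sketch below shows one way to do that; `compare` and `one_of` are hypothetical helpers, not part of PyMilvus. It assumes that string literals are written in double quotes and that membership tests use the `field in [...]` form, so check the expression rules of your Milvus version before relying on it.

```python
import json

ALLOWED_OPERATORS = {"==", "!=", ">", ">=", "<", "<="}


def literal(value):
    """Render an int, float or str as an expression literal."""
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"unsupported literal: {value!r}")
    # json.dumps leaves numbers as-is and wraps strings in double quotes.
    return json.dumps(value)


def compare(field, operator, value):
    """Build a comparison such as: id > 1"""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f"{field} {operator} {literal(value)}"


def one_of(field, values):
    """Build a membership test such as: id in [1, 2]"""
    return f"{field} in [{', '.join(literal(v) for v in values)}]"


expr = compare("id", ">", 1)
```

The resulting string can be passed as the filter_expression argument, for example `client.query_data(filter_expression=expr)`.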

Vector Retrieval and Deletion

As a vector database, Milvus can return the stored vectors and delete entries. Both operations require the primary keys (pks) of the entries you want to act on, so retrieve those first. Here is an example.

from pymilvus import MilvusClient

client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector", 
    pk_field= "text", 
    overwrite=True,
)

data = [
    {
        "float_vector": [1,2,3],
        "id": 1,
        "text": "foo"
    },
    {
        "float_vector": [4,5,6],
        "id": 2,
        "text": "bar"
    },
    {
        "float_vector": [7,8,9],
        "id": 3,
        "text": "baz"
    }
]

client.insert_data(data)

res = client.query_data(
    filter_expression = "id == 1"
)

# [
#   {'id': 1, 'text': 'foo'}
# ]

res = client.get_vectors_by_pk(pks = res[0]["text"])

# [
#     {'float_vector': [1.0, 2.0, 3.0], 'text': 'foo'}
# ]

client.delete_by_pk(pks = res[0]["text"])

res = client.query_data(
    filter_expression = "id == 1"
)

# []
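
When a query matches several rows, their primary keys have to be collected before the vectors can be fetched or the entries deleted. The following sketch gathers them from query results in plain Python; `collect_pks` is a hypothetical helper, not part of PyMilvus.

```python
def collect_pks(rows, pk_field):
    """Return the primary keys found in query results, without duplicates.

    The order of first appearance is preserved.
    """
    seen = set()
    pks = []
    for row in rows:
        pk = row[pk_field]
        if pk not in seen:
            seen.add(pk)
            pks.append(pk)
    return pks


# Rows shaped like the query_data output above, with "text" as the pk field.
query_result = [
    {"id": 1, "text": "foo"},
    {"id": 2, "text": "bar"},
    {"id": 1, "text": "foo"},
]
pks = collect_pks(query_result, pk_field="text")

# Each key can then be used as in the example above, one call per key:
# for pk in pks:
#     client.delete_by_pk(pks=pk)
```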