Use the MilvusClient
This page shows how to use the MilvusClient that ships with PyMilvus. The MilvusClient is a simplified wrapper around PyMilvus that is easier to use and hides most of the complexity of the original SDK.
The MilvusClient connects to the service in one unified way: through a URI. A few examples of valid URIs are:
- "http://localhost:19530"
- "https://user:password@mysite:19530"
- "https://username:password@in01-12a.aws-us-west-2.vectordb.zillizcloud.com:19538"
When using an HTTPS connection, a username and password are expected.
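To see which pieces of information such a URI carries, the sketch below splits one with Python's standard urllib.parse module. It only illustrates the URI format; it is not a description of how the MilvusClient parses the URI internally, and describe_uri is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a connection URI into its parts (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "secure": parts.scheme == "https",
        "host": parts.hostname,
        "port": parts.port,
        "user": parts.username,
        "password": parts.password,
    }


print(describe_uri("http://localhost:19530"))
print(describe_uri("https://user:password@mysite:19530"))
```

For the plain HTTP URI the user and password entries come back as None, which makes it easy to spot an HTTPS URI that is missing its credentials.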
Now let's go over a quick example using the MilvusClient.
Basics
Create the Client
Most of the information needed to use the client is provided in the construction call. There are two main use cases for the client: creating a new Milvus collection or using a previously created collection.
When creating a new collection, you must specify the vector_field name, because it cannot be inferred from the inserted data. If you want to manage the primary keys of the collection yourself, you must also specify pk_field; otherwise an auto-generated int field is used. If a collection with the same name already exists in the Milvus instance, you must set overwrite to True to remove the previous collection.
To connect to a previously created collection, only the uri and the collection_name need to be provided; the rest of the information is inferred from the collection itself.
from pymilvus import MilvusClient

client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector",
    # pk_field="id",  # If you want to provide your own PK
    overwrite=True,
)
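The two use cases differ only in which constructor arguments are required. The sketch below captures that rule in a hypothetical helper, build_client_kwargs, which assembles the keyword arguments named in the text above. The helper is illustrative and not part of PyMilvus.

```python
def build_client_kwargs(collection_name, uri, vector_field=None,
                        pk_field=None, overwrite=False):
    """Assemble MilvusClient keyword arguments (hypothetical helper).

    Without vector_field, the arguments describe a connection to an
    existing collection; with it, they describe a new collection.
    """
    kwargs = {"collection_name": collection_name, "uri": uri}
    if vector_field is None:
        # Existing collection: everything else is inferred from it.
        return kwargs
    kwargs["vector_field"] = vector_field
    if pk_field is not None:
        # Only needed when managing primary keys manually.
        kwargs["pk_field"] = pk_field
    if overwrite:
        # Removes a previous collection with the same name.
        kwargs["overwrite"] = True
    return kwargs


existing = build_client_kwargs("qux", "http://localhost:19530")
new = build_client_kwargs("qux", "http://localhost:19530",
                          vector_field="float_vector", overwrite=True)
print(existing)
print(new)
```

Either dictionary could then be unpacked into the constructor, for example MilvusClient(**existing).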
Insert Data
With the MilvusClient created, we can begin to insert data. Data is inserted as a list of dictionaries, where each dict corresponds to a row in the collection. Each dict must include values for all the columns in the collection; otherwise the insert raises an exception.
If the client was created for a collection that doesn't exist, or overwrite is set to True, the first entry in the list of dicts is used to construct the schema of the collection. All subsequent inserts must contain the same fields as the first dict. If no index parameters were supplied at construction time, a default HNSW index is used to index the data.
data = [
    {
        "float_vector": [1, 2, 3],
        "id": 1,
        "text": "foo"
    },
    {
        "float_vector": [4, 5, 6],
        "id": 2,
        "text": "bar"
    },
    {
        "float_vector": [7, 8, 9],
        "id": 3,
        "text": "baz"
    }
]
client.insert_data(data)
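Because a row with missing columns makes the insert fail, it can help to check the rows before sending them. A minimal sketch, assuming the first row defines the expected fields as described above; check_rows is a hypothetical helper, not a PyMilvus function.

```python
def check_rows(rows):
    """Return (index, missing_fields) for rows lacking a field of the first row."""
    if not rows:
        return []
    expected = set(rows[0])
    problems = []
    for index, row in enumerate(rows):
        missing = expected - set(row)
        if missing:
            problems.append((index, sorted(missing)))
    return problems


rows = [
    {"float_vector": [1, 2, 3], "id": 1, "text": "foo"},
    {"float_vector": [4, 5, 6], "id": 2},  # "text" is missing
]
print(check_rows(rows))  # [(1, ['text'])]
```

An empty list means every row carries at least the fields of the first one, so the rows can be handed to insert_data.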
Search the Data
Once the data has been inserted into Milvus, we can search the collection. The search takes the search vector or vectors and the number of results wanted for each (top_k). You can optionally supply search parameters as well; these should correspond to the index parameters if you supplied them at construction time. If none are supplied, MilvusClient uses default search parameters.
res = client.search_data(
    data=[[1, 3, 5], [7, 8, 9]],
    top_k=2,
)
# [[
# {'data': {'id': 1, 'internal_pk_4537': 441340318978146436, 'text': 'foo'}, 'score': 5.0},
# {'data': {'id': 2, 'internal_pk_4537': 441340318978146437, 'text': 'bar'}, 'score': 14.0}
# ],
# [
# {'data': {'id': 3, 'internal_pk_4537': 441340318978146438, 'text': 'baz'}, 'score': 0.0},
# {'data': {'id': 2, 'internal_pk_4537': 441340318978146437, 'text': 'bar'}, 'score': 27.0}
# ]]
The search results come as a list of lists. For each search vector, you receive a list of dicts, with each dict containing the distance (under the score key) and the corresponding result data (under the data key). If not all of the data is needed, you can adjust what is returned using the return_fields argument.
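The scores in the sample output above are consistent with squared Euclidean (L2) distance: for the first hit, 0 + 1 + 4 = 5. Assuming that metric, the following pure-Python sketch reproduces the result structure with a brute-force scan. It is an illustration only; brute_force_search is a hypothetical helper, and the real collection is searched through its index rather than scanned row by row.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(rows, queries, top_k, vector_field="float_vector"):
    """Return, per query vector, the top_k closest rows as score/data dicts."""
    results = []
    for query in queries:
        hits = [
            {
                "data": {k: v for k, v in row.items() if k != vector_field},
                "score": squared_l2(query, row[vector_field]),
            }
            for row in rows
        ]
        hits.sort(key=lambda hit: hit["score"])
        results.append(hits[:top_k])
    return results


rows = [
    {"float_vector": [1, 2, 3], "id": 1, "text": "foo"},
    {"float_vector": [4, 5, 6], "id": 2, "text": "bar"},
    {"float_vector": [7, 8, 9], "id": 3, "text": "baz"},
]
res = brute_force_search(rows, [[1, 3, 5], [7, 8, 9]], top_k=2)
for hits in res:
    print([(hit["data"]["id"], hit["score"]) for hit in hits])
# [(1, 5.0), (2, 14.0)]
# [(3, 0.0), (2, 27.0)]
```

The outer list has one entry per search vector and each inner list is ordered from closest to farthest, which mirrors the shape of the sample output.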
Advanced
Partitions
The MilvusClient supports partitions in its current release. Partitions can be specified at MilvusClient construction or added later. Here is a quick example of using partitions.
from pymilvus import MilvusClient

# Create the collection with an initial partition named "zaz".
client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector",
    partitions=["zaz"],
    overwrite=True,
)

data = [
    {
        "float_vector": [1, 2, 3],
        "id": 1,
        "text": "foo"
    },
]
client.insert_data(data, partition="zaz")

# Add a second partition after construction and insert into it.
client.add_partitions(["zoo"])
data = [
    {
        "float_vector": [4, 5, 6],
        "id": 2,
        "text": "bar"
    },
]
client.insert_data(data, partition="zoo")

# Without the partitions argument, the search returns hits from both partitions.
res = client.search_data(
    data=[1, 3, 5],
    top_k=2,
)
# [[
# {'data': {'id': 1, 'internal_pk_3bd4': 441363276234227849, 'text': 'foo'}, 'score': 5.0},
# {'data': {'id': 2, 'internal_pk_3bd4': 441363276234227866, 'text': 'bar'}, 'score': 14.0}
# ]]

# Restrict the search to the "zaz" partition.
res = client.search_data(
    data=[1, 3, 5],
    top_k=2,
    partitions=["zaz"]
)
# [[
# {'data': {'id': 1, 'internal_pk_3bd4': 441363276234227849, 'text': 'foo'}, 'score': 5.0}
# ]]

# Restrict the search to the "zoo" partition.
res = client.search_data(
    data=[1, 3, 5],
    top_k=2,
    partitions=["zoo"]
)
# [[
# {'data': {'id': 2, 'internal_pk_3bd4': 441363276234227866, 'text': 'bar'}, 'score': 14.0}
# ]]
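One way to picture the three searches above is to treat each partition as a separate bucket of rows and to scan only the buckets named in partitions, or all of them when none is named. The sketch below is a toy model under that assumption, using squared L2 distance, which matches the sample scores. It is not how Milvus stores partitions, and search_partitions is a hypothetical helper.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_partitions(store, query, top_k, partitions=None):
    """Scan only the named partitions; scan all of them if none are named."""
    names = partitions if partitions is not None else list(store)
    candidates = [row for name in names for row in store[name]]
    candidates.sort(key=lambda row: squared_l2(query, row["float_vector"]))
    return [row["id"] for row in candidates[:top_k]]


store = {
    "zaz": [{"float_vector": [1, 2, 3], "id": 1, "text": "foo"}],
    "zoo": [{"float_vector": [4, 5, 6], "id": 2, "text": "bar"}],
}
print(search_partitions(store, [1, 3, 5], top_k=2))                      # [1, 2]
print(search_partitions(store, [1, 3, 5], top_k=2, partitions=["zaz"]))  # [1]
print(search_partitions(store, [1, 3, 5], top_k=2, partitions=["zoo"]))  # [2]
```

Note that a partition-restricted search can return fewer than top_k hits, as in the last two calls, because only one row lives in each partition.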
Filtering
Filtering can be used to narrow search results to entries whose metadata matches an expression, or to query data based on metadata alone.
from pymilvus import MilvusClient

client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector",
    # pk_field="id",  # If you want to provide your own PK
    overwrite=True,
)

data = [
    {
        "float_vector": [1, 2, 3],
        "id": 1,
        "text": "foo"
    },
    {
        "float_vector": [4, 5, 6],
        "id": 2,
        "text": "bar"
    },
    {
        "float_vector": [7, 8, 9],
        "id": 3,
        "text": "baz"
    }
]
client.insert_data(data)

# Vector search restricted to rows that match the filter expression.
res = client.search_data(
    data=[1, 3, 5],
    top_k=2,
    filter_expression="id > 1"
)
# [[
# {'score': 14.0, 'data': {'id': 2, 'text': 'bar', 'internal_pk_5465': 441363276234227922}},
# {'score': 77.0, 'data': {'id': 3, 'text': 'baz', 'internal_pk_5465': 441363276234227923}}
# ]]

# Metadata-only query: no search vector, just the filter expression.
res = client.query_data(
    filter_expression="id == 1"
)
# [
# {'id': 1, 'text': 'foo', 'internal_pk_5465': 441363276234227921}
# ]
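Conceptually, a filtered search keeps only the rows that satisfy the expression and ranks what remains, which is why id 3 appears in the result above even though it is far from the search vector. The toy sketch below illustrates that idea with a Python predicate standing in for the expression "id > 1" and with squared L2 distance assumed as the metric. The helper filtered_search is hypothetical, and this does not reflect how Milvus evaluates expressions internally.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return float(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(rows, query, top_k, predicate):
    """Rank only the rows accepted by predicate; return (id, score) pairs."""
    kept = [row for row in rows if predicate(row)]
    scored = [(row["id"], squared_l2(query, row["float_vector"])) for row in kept]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


rows = [
    {"float_vector": [1, 2, 3], "id": 1, "text": "foo"},
    {"float_vector": [4, 5, 6], "id": 2, "text": "bar"},
    {"float_vector": [7, 8, 9], "id": 3, "text": "baz"},
]
print(filtered_search(rows, [1, 3, 5], 2, lambda row: row["id"] > 1))
# [(2, 14.0), (3, 77.0)]
```

The closest row overall (id 1, score 5.0) is excluded by the predicate before ranking, so the two remaining rows fill the top_k slots.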
Vector Retrieval and Deletion
Because Milvus is a vector database, the MilvusClient can return the stored vectors and delete entries. Both operations require the primary keys (pks) of the entries to act on, so these must be fetched first. Here is an example.
from pymilvus import MilvusClient

# Use the "text" field as the primary key.
client = MilvusClient(
    collection_name="qux",
    uri="http://localhost:19530",
    vector_field="float_vector",
    pk_field="text",
    overwrite=True,
)

data = [
    {
        "float_vector": [1, 2, 3],
        "id": 1,
        "text": "foo"
    },
    {
        "float_vector": [4, 5, 6],
        "id": 2,
        "text": "bar"
    },
    {
        "float_vector": [7, 8, 9],
        "id": 3,
        "text": "baz"
    }
]
client.insert_data(data)

# Find the entry to act on; its pk is stored in the "text" field.
res = client.query_data(
    filter_expression="id == 1"
)
# [
# {'id': 1, 'text': 'foo'}
# ]

# Fetch the stored vector for that pk.
res = client.get_vectors_by_pk(pks=res[0]["text"])
# [
# {'float_vector': [1.0, 2.0, 3.0], 'text': 'foo'}
# ]

# Delete the entry; res now holds the retrieval result, which still includes the pk.
client.delete_by_pk(pks=res[0]["text"])

# The entry no longer matches the filter.
res = client.query_data(
    filter_expression="id == 1"
)
# []
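When a filter matches several rows, the primary keys have to be collected from the query result before retrieval or deletion. The sketch below shows that step in plain Python. Here extract_pks is a hypothetical helper, and whether get_vectors_by_pk and delete_by_pk accept a list of keys in a single call is an assumption that should be checked against the installed PyMilvus version.

```python
def extract_pks(rows, pk_field):
    """Collect the primary-key value of every row returned by a query."""
    return [row[pk_field] for row in rows]


# Shape of a query result when pk_field="text", as in the example above.
query_result = [
    {"id": 2, "text": "bar"},
    {"id": 3, "text": "baz"},
]
pks = extract_pks(query_result, pk_field="text")
print(pks)  # ['bar', 'baz']
```

If list arguments turn out not to be supported, the same list can be looped over, passing one key per call as the example above does.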