A partition is a group of entities in one collection that share the same label. Entities inserted without a label are assigned a default label by Milvus.
A partition is manageable: a group of entities with the same label in one collection can be managed as a unit.
Constructor | Description |
---|---|
Partition(collection, name, description='', **kwargs) | Milvus partition. |

API | Description |
---|---|
description | Return the description text. |
name | Return the partition name. |
is_empty | Return whether the Partition is empty. |
num_entities | Return the number of entities. |

API | Description |
---|---|
drop() | Drop the Partition, as well as its corresponding index files. |
load() | Load the Partition from disk to memory. |
release() | Release the Partition from memory. |
insert() | Insert data into partition. |
delete() | Delete entities with an expression condition. |
search() | Vector similarity search with an optional boolean expression as filters. |
query() | Query with a set of criteria. |
pymilvus.Partition(collection, name, description='', **kwargs)
description
Return the description text.
Returns: The partition description text, when the operation succeeds.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_description", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.description
'comedy films'
name
Return the partition name.
Returns: The partition name, when the operation succeeds.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_name", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.name
'comedy'
is_empty
Return whether the partition is empty.
Returns: Whether the partition is empty.
True: The partition is empty.
False: The partition is not empty.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_is_empty", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.is_empty
True
num_entities
Return the number of entities.
Returns: The number of entities in this partition.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_num_entities", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> data = [
... [i for i in range(10)],
... [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
drop(timeout=None, **kwargs)
Drop the partition, as well as its corresponding index files.
Parameters:
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
Raises:
PartitionNotExistException -- When the partition does not exist.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_drop", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.drop()
load(timeout=None, **kwargs)
Load the partition from disk to memory.
Parameters:
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
Raises:
InvalidArgumentException -- If an argument is not valid.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_load", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.load()
release(timeout=None, **kwargs)
Release the partition from memory.
Parameters:
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
Raises:
PartitionNotExistException -- When the partition does not exist.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_release", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.load()
>>> partition.release()
insert(data, timeout=None, **kwargs)
Insert data into the partition.
Parameters:
data (list-like (list, tuple) object or pandas.DataFrame) -- The data to insert. The dimension of data must match the number of columns.
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
Returns: A MutationResult object with an insert_count property, the number of entities inserted into Milvus, and a primary_keys property, the list of primary keys of the inserted entities.
Return type: MutationResult
Raises:
PartitionNotExistException -- When the partition does not exist.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_insert", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> data = [
... [i for i in range(10)],
... [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
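The data passed to insert in the example above is column-oriented: one inner list per schema field, all of the same length. When records start out as rows, they can be transposed first. The following is a minimal plain-Python sketch that makes no Milvus calls; `rows_to_columns` is a hypothetical helper name, not part of pymilvus.

```python
def rows_to_columns(rows):
    """Transpose row records into column lists.

    Hypothetical helper for illustration; it is not a pymilvus API.
    """
    if not rows:
        return []
    width = len(rows[0])
    # Every row must supply exactly one value per schema field.
    for i, row in enumerate(rows):
        if len(row) != width:
            raise ValueError(f"row {i} has {len(row)} values, expected {width}")
    # zip(*rows) groups the i-th value of every row into one column.
    return [list(column) for column in zip(*rows)]


# Rows of (film_id, films), matching the two-field schema used above.
rows = [(i, [float(i), float(i)]) for i in range(3)]
data = rows_to_columns(rows)
print(data)  # [[0, 1, 2], [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]]
```

The resulting list has the same layout as the data variable in the example above, so it could be passed to partition.insert in the same way.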
delete(expr, timeout=None, **kwargs)
Delete entities with an expression condition.
Parameters:
expr (str) -- The expression that specifies the entities to delete.
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
Returns: A MutationResult object with a delete_count property, the number of entities that will be deleted.
Return type: MutationResult
Raises:
RpcError -- If gRPC encounters an error.
ParamError -- If parameters are invalid.
BaseException -- If the result returned from the server is not ok.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", DataType.FLOAT_VECTOR, dim=2)
... ])
>>> test_collection = Collection("test_partition_delete", schema)
>>> test_partition = test_collection.create_partition("comedy", "comedy films")
>>> data = [
... [i for i in range(10)],
... [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> test_partition.insert(data)
(insert count: 10, delete count: 0, upsert count: 0, timestamp: 431044482906718212)
>>> test_partition.num_entities
10
>>> test_partition.delete("film_id in [0, 1]")
(insert count: 0, delete count: 2, upsert count: 0, timestamp: 431044582560759811)
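The expr string in the example above is written by hand. When the primary keys come from a Python list, the same kind of expression can be assembled programmatically. The sketch below is plain Python and assumes integer primary keys; `build_in_expr` is a hypothetical helper name, not part of pymilvus.

```python
def build_in_expr(field, ids):
    """Build an 'in' expression such as 'film_id in [0, 1]'.

    Hypothetical helper for illustration; it is not a pymilvus API.
    """
    if not ids:
        raise ValueError("ids must not be empty")
    # Accept only true integers; bool is a subclass of int, so exclude it.
    if not all(isinstance(i, int) and not isinstance(i, bool) for i in ids):
        raise TypeError("ids must be integers")
    return f"{field} in [{', '.join(str(i) for i in ids)}]"


expr = build_in_expr("film_id", [0, 1])
print(expr)  # film_id in [0, 1]
```

The resulting string matches the expression passed to test_partition.delete in the example above.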
search(data, anns_field, param, limit, expr=None, output_fields=None, timeout=None, round_decimal=-1, **kwargs)
Vector similarity search with an optional boolean expression as a filter.
Parameters:
data (list[list[float]]) -- The query vectors. The length of data is the number of queries (nq), and the dimension of every vector in data must equal the dimension of the collection's vector field.
anns_field (str) -- The vector field of the collection to search on.
param (dict) -- The search parameters, such as nprobe.
limit (int) -- The maximum number of records to return; also known as topk.
round_decimal (int) -- The number of decimal places in the returned distances.
expr (str) -- The boolean expression used to filter attributes.
output_fields (list[str]) -- The fields to return in the search result; not supported now.
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
kwargs --
    _async (bool) -- Whether to invoke the call asynchronously. When True, the method returns a SearchFuture object; otherwise, it returns the results from the server directly.
    _callback (function) -- The callback function invoked after the server responds successfully. It only takes effect when _async is set to True.
    consistency_level (str/int) -- The consistency level to use when searching in the partition. For details, see https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/how-guarantee-ts-works.md. Note: this parameter overrides the consistency level specified when the collection was created. If no consistency level is specified here, the search uses the level set when the collection was created.
    guarantee_timestamp (int) -- Instructs Milvus to see all operations performed before the provided timestamp. If no such timestamp is provided, Milvus searches all operations performed to date. Note: only used with the Customized consistency level.
    graceful_time (int) -- Only used with the Bounded consistency level. If graceful_time is set, PyMilvus uses the current timestamp minus graceful_time as the guarantee_timestamp. Defaults to 5s if not set.
    travel_timestamp (int) -- A timestamp that makes the search return results based on a view of the data at the specified point in time.
Returns: SearchResult -- An iterable, 2d-array-like object. The first dimension is the number of query vectors (nq); the second dimension is the limit (topk).
Raises:
RpcError -- If gRPC encounters an error.
ParamError -- If parameters are invalid.
BaseException -- If the result returned from the server is not ok.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> import random
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_search", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> # insert
>>> data = [
... [i for i in range(10)],
... [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
>>> partition.load()
>>> # search
>>> search_param = {
... "data": [[1.0, 1.0]],
... "anns_field": "films",
... "param": {"metric_type": "L2"},
... "limit": 2,
... "expr": "film_id > 0",
... }
>>> res = partition.search(**search_param)
>>> assert len(res) == 1
>>> hits = res[0]
>>> assert len(hits) == 2
>>> print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
- Total hits: 2, hits ids: [8, 5]
>>> print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
- Top1 hit id: 8, distance: 0.10143111646175385, score: 0.10143111646175385
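To make the shape of the result concrete (one row per query vector, at most limit hits per row), the sketch below runs a brute-force search over a few in-memory vectors. It is plain Python for illustration only and says nothing about how Milvus executes a search; `brute_force_search` and its `predicate` argument are hypothetical names, with the predicate standing in for the boolean expression.

```python
def brute_force_search(data, vectors, ids, limit, predicate=None):
    """Toy exhaustive search; hypothetical helper, not a pymilvus API."""
    results = []
    for query in data:  # one result row per query vector (nq rows)
        candidates = []
        for entity_id, vector in zip(ids, vectors):
            if predicate is not None and not predicate(entity_id):
                continue  # stands in for the boolean filter expression
            # Squared Euclidean distance between the query and the stored vector.
            distance = sum((q - v) ** 2 for q, v in zip(query, vector))
            candidates.append((distance, entity_id))
        candidates.sort()  # nearest first
        results.append(candidates[:limit])  # at most `limit` hits per query
    return results


ids = [0, 1, 2, 3]
vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.0], [5.0, 5.0]]
res = brute_force_search([[1.0, 1.0]], vectors, ids, limit=2,
                         predicate=lambda film_id: film_id > 0)
hit_ids = [entity_id for _, entity_id in res[0]]
print(len(res), hit_ids)  # 1 [1, 2]
```

As in the example above, len(res) equals the number of query vectors and each row holds no more than limit hits.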
query(expr, output_fields=None, timeout=None, **kwargs)
Query with a set of criteria and return a list of records that match the query exactly.
Parameters:
expr (str) -- The query expression.
output_fields (list[str]) -- A list of fields to return.
timeout (float) -- An optional duration, in seconds, to allow for the RPC. When timeout is None, the client waits until the server responds or an error occurs.
kwargs --
    consistency_level (str/int) -- The consistency level to use during a query on the collection. For details, see https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/how-guarantee-ts-works.md. Note: this parameter overrides the consistency level specified when the collection was created. If no consistency level is specified here, the query uses the level set when the collection was created.
    guarantee_timestamp (int) -- Instructs Milvus to see all operations performed before the provided timestamp. If no such timestamp is specified, Milvus queries all operations performed to date. Note: only used with the Customized consistency level.
    graceful_time (int) -- Only used with the Bounded consistency level. If graceful_time is set, PyMilvus uses the current timestamp minus graceful_time as the guarantee_timestamp. Defaults to 5s if not set.
    travel_timestamp (int) -- A timestamp that makes the query return results based on a view of the data at the specified point in time.
Returns: A list that contains all results.
Return type: list
Raises:
RpcError -- If gRPC encounters an error.
ParamError -- If parameters are invalid.
BaseException -- If the result returned from the server is not ok.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> import random
>>> connections.connect()
>>> schema = CollectionSchema([
... FieldSchema("film_id", DataType.INT64, is_primary=True),
... FieldSchema("film_date", DataType.INT64),
... FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_query", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> # insert
>>> data = [
... [i for i in range(10)],
... [i + 2000 for i in range(10)],
... [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
>>> partition.load()
>>> # query
>>> expr = "film_id in [ 0, 1 ]"
>>> res = partition.query(expr, output_fields=["film_date"])
>>> assert len(res) == 2
>>> print(f"- Query results: {res}")
- Query results: [{'film_id': 0, 'film_date': 2000}, {'film_id': 1, 'film_date': 2001}]