Partition¶
A partition is a group of entities in one collection that share the same label. Entities inserted without a label are tagged with a default label by Milvus.
A partition is manageable, which means you can manage a group of entities with the same label in one collection as a unit.
Constructor¶
Constructor | Description |
---|---|
Partition() | Milvus partition. |
Attributes¶
API | Description |
---|---|
description | Return the description text. |
name | Return the partition name. |
is_empty | Return whether the partition is empty. |
num_entities | Return the number of entities. |
Methods¶
API | Description |
---|---|
drop() | Drop the partition, as well as its corresponding index files. |
load() | Load the partition from disk to memory. |
release() | Release the partition from memory. |
insert() | Insert data into the partition. |
delete() | Delete entities with an expression condition. |
search() | Vector similarity search with an optional boolean expression as a filter. |
query() | Query with a set of criteria. |
API References¶
-
class pymilvus.Partition(collection, name, description='', **kwargs)¶
-
property description¶
Return the description text.
- Return str
Partition description text, returned when the operation is successful.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_description", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.description
'comedy films'
-
property name¶
Return the partition name.
- Return str
Partition name, returned when the operation is successful.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_name", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.name
'comedy'
-
property is_empty¶
Return whether the partition is empty.
- Return bool
Whether the partition is empty.
True: The partition is empty.
False: The partition is not empty.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_is_empty", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.is_empty
True
-
property num_entities¶
Return the number of entities.
- Return int
Number of entities in this partition.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_num_entities", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> data = [
...     [i for i in range(10)],
...     [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
-
drop(timeout=None, **kwargs)¶
Drop the partition, as well as its corresponding index files.
- Parameters
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Raises
PartitionNotExistException -- When the partition does not exist.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_drop", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.drop()
-
load(timeout=None, **kwargs)¶
Load the partition from disk to memory.
- Parameters
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Raises
InvalidArgumentException -- If an argument is not valid.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_load", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.load()
-
release(timeout=None, **kwargs)¶
Release the partition from memory.
- Parameters
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Raises
PartitionNotExistException -- When the partition does not exist.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_release", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.load()
>>> partition.release()
-
insert(data, timeout=None, **kwargs)¶
Insert data into the partition.
- Parameters
data (list-like (list, tuple) object or pandas.DataFrame) -- The data to insert. The dimension of data must align with the number of columns.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Returns
A MutationResult object with a property named insert_count, which is the number of entities inserted into Milvus, and a property named primary_keys, which is a list of the primary keys of the inserted entities.
- Return type
MutationResult
- Raises
PartitionNotExistException -- When the partition does not exist.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_insert", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> data = [
...     [i for i in range(10)],
...     [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
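The insert example passes data as one list per field, in schema order. When the source records are row-oriented, they have to be pivoted first. The following is a minimal sketch of that step in plain Python. The helper name rows_to_columns and the sample records are hypothetical and not part of PyMilvus.

```python
from typing import Any, Dict, List, Sequence


def rows_to_columns(
    rows: Sequence[Dict[str, Any]], field_order: Sequence[str]
) -> List[List[Any]]:
    """Pivot row dictionaries into one list per field, in schema order.

    Hypothetical helper for illustration; it is not a PyMilvus function.
    """
    columns = []
    for name in field_order:
        # A missing key raises KeyError, so incomplete rows fail early.
        columns.append([row[name] for row in rows])
    return columns


# Sample records matching the film_id / films schema used in this section.
rows = [
    {"film_id": 0, "films": [0.0, 1.0]},
    {"film_id": 1, "films": [0.0, 1.0]},
    {"film_id": 2, "films": [0.0, 1.0]},
]
data = rows_to_columns(rows, ["film_id", "films"])

# One column per schema field, and every column has one value per entity.
assert len(data) == 2
assert all(len(column) == len(rows) for column in data)
```

The resulting data list has the same shape as the one built inline in the example, so it could be passed to partition.insert(data) on a connected client.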
-
delete(expr, timeout=None, **kwargs)¶
Delete entities with an expression condition.
- Parameters
expr (str) -- The expression that specifies the entities to be deleted.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Returns
A MutationResult object with a property named delete_count, which is the number of entities that will be deleted.
- Return type
MutationResult
- Raises
RpcError -- If gRPC encounters an error.
ParamError -- If parameters are invalid.
BaseException -- If the return result from the server is not ok.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", DataType.FLOAT_VECTOR, dim=2)
... ])
>>> test_collection = Collection("test_partition_delete", schema)
>>> test_partition = test_collection.create_partition("comedy", "comedy films")
>>> data = [
...     [i for i in range(10)],
...     [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> test_partition.insert(data)
(insert count: 10, delete count: 0, upsert count: 0, timestamp: 431044482906718212)
>>> test_partition.num_entities
10
>>> test_partition.delete("film_id in [0, 1]")
(insert count: 0, delete count: 2, upsert count: 0, timestamp: 431044582560759811)
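The delete example hard-codes its expression string. When the primary keys come from a variable, the same string can be assembled programmatically. This is a small sketch under the assumption that the keys are integers, as with the INT64 film_id field above. The helper name build_in_expr is hypothetical and not part of PyMilvus.

```python
from typing import Iterable


def build_in_expr(field: str, keys: Iterable[int]) -> str:
    """Build an expression of the form '<field> in [k1, k2, ...]'.

    Hypothetical helper for illustration; it is not a PyMilvus function.
    """
    keys = list(keys)
    if not keys:
        raise ValueError("at least one key is required")
    for key in keys:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(key, bool) or not isinstance(key, int):
            raise TypeError(f"expected an integer key, got {key!r}")
    return f"{field} in [{', '.join(str(key) for key in keys)}]"


# Reproduces the expression string used in the delete example.
expr = build_in_expr("film_id", [0, 1])
assert expr == "film_id in [0, 1]"
```

Rejecting non-integer keys up front keeps arbitrary text out of the expression string.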
-
search(data, anns_field, param, limit, expr=None, output_fields=None, timeout=None, round_decimal=-1, **kwargs)¶
Vector similarity search with an optional boolean expression as a filter.
- Parameters
data (list[list[float]]) -- The query vectors. The length of data is the number of queries (nq), and the dimension of every vector in data must equal that of the collection's vector field.
anns_field (str) -- The vector field of the collection to search on.
param (dict) -- The search parameters, such as nprobe.
limit -- The maximum number of records to return, also called topk.
round_decimal (int) -- The number of decimal places of the returned distance.
expr (str) -- The boolean expression used to filter attributes.
output_fields (list[str]) -- The fields to return in the search result; not supported now.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
kwargs --
_async (bool) -- Indicates whether to invoke asynchronously. When the value is True, the method returns a SearchFuture object; otherwise, the method returns results from the server directly.
_callback (function) -- The callback function that is invoked after the server responds successfully. It only takes effect when _async is set to True.
consistency_level (str/int) -- Which consistency level to use when searching in the partition. For details, see https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/how-guarantee-ts-works.md. Note: this parameter overrides the consistency level specified when the collection was created. If no consistency level is specified here, the search uses the one set when the collection was created.
guarantee_timestamp (int) -- Instructs Milvus to see all operations performed before the provided timestamp. If no timestamp is provided, Milvus searches all operations performed to date. Note: only used in Customized consistency level.
graceful_time (int) -- Only used in bounded consistency level. If graceful_time is set, PyMilvus uses the current timestamp minus graceful_time as the guarantee_timestamp. This option is 5s by default if not set.
travel_timestamp (int) -- A timestamp that users can specify in a search to get results based on a data view at a specified point in time.
- Returns
SearchResult is iterable and is a 2d-array-like class. The first dimension is the number of vectors to query (nq), and the second dimension is the limit (topk).
- Return type
SearchResult
- Raises
RpcError -- If gRPC encounters an error.
ParamError -- If parameters are invalid.
BaseException -- If the return result from the server is not ok.
- Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> import random
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_search", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> # insert
>>> data = [
...     [i for i in range(10)],
...     [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
>>> partition.load()
>>> # search
>>> search_param = {
...     "data": [[1.0, 1.0]],
...     "anns_field": "films",
...     "param": {"metric_type": "L2"},
...     "limit": 2,
...     "expr": "film_id > 0",
... }
>>> res = partition.search(**search_param)
>>> assert len(res) == 1
>>> hits = res[0]
>>> assert len(hits) == 2
>>> print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
- Total hits: 2, hits ids: [8, 5]
>>> print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
- Top1 hit id: 8, distance: 0.10143111646175385, score: 0.10143111646175385
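To make the nq by limit shape of the search result concrete, here is a brute-force sketch in plain Python that needs no server. It is only an illustration of the result layout: it uses ordinary Euclidean distance from the standard library, and the distance values and ordering Milvus returns depend on its index and metric implementation. The function name brute_force_search and the sample vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def brute_force_search(
    queries: Sequence[Sequence[float]],
    vectors: Sequence[Sequence[float]],
    ids: Sequence[int],
    limit: int,
) -> List[List[Tuple[float, int]]]:
    """Return, for each query, the `limit` nearest (distance, id) pairs.

    Hypothetical stand-in for illustration; it is not how Milvus searches.
    """
    results = []
    for query in queries:
        # Score every stored vector against this query, nearest first.
        scored = sorted(
            (math.dist(query, vector), entity_id)
            for vector, entity_id in zip(vectors, ids)
        )
        results.append(scored[:limit])
    return results


vectors = [[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]]
ids = [10, 11, 12]
res = brute_force_search([[1.0, 1.0]], vectors, ids, limit=2)

# First dimension: one entry per query vector (nq).
assert len(res) == 1
# Second dimension: at most `limit` hits per query (topk).
assert len(res[0]) == 2
```

Indexing follows the same pattern as in the example above: res[0] holds the hits for the first query vector, and each hit carries an id and a distance.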
-
query(expr, output_fields=None, timeout=None, **kwargs)¶
Query with a set of criteria and return a list of records that match the query exactly.
- Parameters
expr (str) -- The query expression.
output_fields (list[str]) -- A list of fields to return.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
kwargs --
consistency_level (str/int) -- Which consistency level to use during a query on the collection. For details, see https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/how-guarantee-ts-works.md. Note: this parameter overrides the consistency level specified when the collection was created. If no consistency level is specified here, the query uses the one set when the collection was created.
guarantee_timestamp (int) -- Instructs Milvus to see all operations performed before the provided timestamp. If no timestamp is specified, Milvus queries all operations performed to date. Note: only used in Customized consistency level.
graceful_time (int) -- Only used in bounded consistency level. If graceful_time is set, PyMilvus uses the current timestamp minus graceful_time as the guarantee_timestamp. This option is 5s by default if not set.
travel_timestamp (int) -- A timestamp that users can specify in a query to get results based on a data view at a specified point in time.
- Returns
A list that contains all results.
- Return type
list
- Raises
RpcError -- If gRPC encounters an error.
ParamError -- If parameters are invalid.
BaseException -- If the return result from the server is not ok.
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> import random
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("film_date", DataType.INT64),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_query", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> # insert
>>> data = [
...     [i for i in range(10)],
...     [i + 2000 for i in range(10)],
...     [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
>>> partition.load()
>>> # query
>>> expr = "film_id in [ 0, 1 ]"
>>> res = partition.query(expr, output_fields=["film_date"])
>>> assert len(res) == 2
>>> print(f"- Query results: {res}")
- Query results: [{'film_id': 0, 'film_date': 2000}, {'film_id': 1, 'film_date': 2001}]
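The graceful_time description says the guarantee timestamp is the current timestamp minus graceful_time. The arithmetic can be sketched as follows, under two assumptions that this page does not state: that Milvus timestamps pack a millisecond wall-clock value above 18 low-order logical bits, and that graceful_time is expressed in seconds. The names LOGICAL_BITS, to_hybrid_ts, and bounded_guarantee_ts are hypothetical and not part of the documented API.

```python
LOGICAL_BITS = 18  # assumption: low 18 bits hold a logical counter


def to_hybrid_ts(unix_seconds: float) -> int:
    """Pack a Unix time in seconds into a hybrid timestamp (assumed layout)."""
    return int(unix_seconds * 1000) << LOGICAL_BITS


def bounded_guarantee_ts(now_seconds: float, graceful_time: float = 5.0) -> int:
    """Current time minus graceful_time, expressed as a hybrid timestamp.

    Hypothetical helper; graceful_time is assumed to be in seconds.
    """
    return to_hybrid_ts(now_seconds - graceful_time)


now = 1_000_000.0  # fixed value so the result is reproducible
ts = bounded_guarantee_ts(now, graceful_time=5.0)

# The guarantee timestamp lies exactly 5000 ms before "now".
assert ts == to_hybrid_ts(now - 5.0)
assert (to_hybrid_ts(now) >> LOGICAL_BITS) - (ts >> LOGICAL_BITS) == 5000
```

A larger graceful_time moves the guarantee timestamp further into the past, so a bounded-consistency query may miss more of the most recent writes.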
-
property