Partition

A partition is a group of entities in one collection that share the same label. Entities inserted without a label are assigned a default label by Milvus.

A partition is manageable: operations on it apply to the whole group of entities that carry the same label in the collection.
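The labeling idea above can be pictured with a toy in-memory model. This is only a sketch in plain Python, not how Milvus stores data: the ToyCollection class and its methods are hypothetical, and the name _default for the default label is an assumption made for this illustration.

```python
from collections import defaultdict

DEFAULT_PARTITION = "_default"  # assumed name of the default label


class ToyCollection:
    """In-memory stand-in for a collection; not the Milvus implementation."""

    def __init__(self):
        # Maps a partition label to the list of entities stored under it.
        self._partitions = defaultdict(list)

    def insert(self, entity, partition_name=None):
        # Entities inserted without a label fall into the default partition.
        label = partition_name or DEFAULT_PARTITION
        self._partitions[label].append(entity)
        return label

    def num_entities(self, partition_name):
        return len(self._partitions.get(partition_name, []))

    def drop_partition(self, partition_name):
        # Managing a partition acts on the whole labeled group at once.
        return len(self._partitions.pop(partition_name, []))


films = ToyCollection()
films.insert({"film_id": 0}, "comedy")
films.insert({"film_id": 1}, "comedy")
films.insert({"film_id": 2})  # no label given
```

Dropping "comedy" in this model removes both labeled entities in one call, while the unlabeled entity stays in the default group.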

Constructor

  • Partition() -- Milvus partition.

Attributes

  • description -- Return the description text.

  • name -- Return the partition name.

  • is_empty -- Return whether the Partition is empty.

  • num_entities -- Return the number of entities.

Methods

  • drop() -- Drop the Partition, as well as its corresponding index files.

  • load() -- Load the Partition from disk to memory.

  • release() -- Release the Partition from memory.

  • insert() -- Insert data into partition.

  • delete() -- Delete entities with an expression condition.

  • search() -- Vector similarity search with an optional boolean expression as filters.

  • query() -- Query with a set of criteria.

API References

class pymilvus.Partition(collection, name, description='', **kwargs)
property description

Return the description text.

Return str

The partition description text, returned when the operation succeeds.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_description", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.description
'comedy films'
property name

Return the partition name.

Return str

The partition name, returned when the operation succeeds.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_name", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.name
'comedy'
property is_empty

Return whether the partition is empty.

Return bool

Whether the partition is empty

  • True: The partition is empty.

  • False: The partition is not empty.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_is_empty", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.is_empty
True
property num_entities

Return the number of entities.

Return int

Number of entities in this partition.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_num_entities", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> data = [
...     [i for i in range(10)],
...     [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
drop(timeout=None, **kwargs)

Drop the partition, as well as its corresponding index files.

Parameters

timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises

PartitionNotExistException -- When the partition does not exist.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_drop", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.drop()
load(timeout=None, **kwargs)

Load the partition from disk to memory.

Parameters

timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises

InvalidArgumentException -- If argument is not valid

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_load", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.load()
release(timeout=None, **kwargs)

Release the partition from memory.

Parameters

timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises

PartitionNotExistException -- When the partition does not exist.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_release", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> partition.load()
>>> partition.release()
insert(data, timeout=None, **kwargs)

Insert data into partition.

Parameters
  • data (list-like(list, tuple) object or pandas.DataFrame) -- The data to insert. Its dimension must match the number of columns (fields) in the collection.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Returns

A MutationResult object whose insert_count property reports how many entities have been inserted into Milvus and whose primary_keys property lists the primary keys of the inserted entities.

Return type

MutationResult

Raises

PartitionNotExistException -- When the partition does not exist.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_partition_insert", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> data = [
...     [i for i in range(10)],
...     [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
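The data argument in the example above is column-based: one inner list per field, all of the same length. A minimal sketch of a pre-insert sanity check follows. check_columns is a hypothetical helper written for this illustration and is not part of pymilvus; the field names are taken from the example schema.

```python
def check_columns(data, field_names):
    """Validate column-based insert data against a list of field names.

    Hypothetical helper: returns the row count, raises ValueError on mismatch.
    """
    if len(data) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} columns, got {len(data)}"
        )
    # Every column must describe the same number of entities.
    lengths = {len(column) for column in data}
    if len(lengths) > 1:
        raise ValueError(f"columns have different lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


data = [
    [i for i in range(10)],                             # film_id column
    [[float(i) for i in range(2)] for _ in range(10)],  # films column
]
row_count = check_columns(data, ["film_id", "films"])
```

A check like this only catches shape mistakes on the client side; type and dimension validation against the real schema is left to the library and the server.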
delete(expr, timeout=None, **kwargs)

Delete entities with an expression condition.

Parameters
  • expr (str) -- The expression that specifies the entities to delete.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Returns

A MutationResult object whose delete_count property reports how many entities will be deleted.

Return type

MutationResult

Raises
  • RpcError -- If gRPC encounters an error

  • ParamError -- If parameters are invalid

  • BaseException -- If the return result from server is not ok

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", DataType.FLOAT_VECTOR, dim=2)
... ])
>>> test_collection = Collection("test_partition_delete", schema)
>>> test_partition = test_collection.create_partition("comedy", "comedy films")
>>> data = [
...     [i for i in range(10)],
...     [[float(i) for i in range(2)] for _ in range(10)],
... ]
>>> test_partition.insert(data)
(insert count: 10, delete count: 0, upsert count: 0, timestamp: 431044482906718212)
>>> test_partition.num_entities
10
>>> test_partition.delete("film_id in [0, 1]")
(insert count: 0, delete count: 2, upsert count: 0, timestamp: 431044582560759811)
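The expression passed to delete() in the example is an ordinary string, so it can be assembled from a list of primary keys. The sketch below assumes integer keys and the "in [...]" form shown above; ids_to_expr is a hypothetical helper and not a pymilvus function.

```python
def ids_to_expr(field_name, ids):
    """Build an expression such as 'film_id in [0, 1]' from integer keys."""
    if not ids:
        raise ValueError("at least one id is required")
    # int() raises on non-numeric input such as "abc", so only digits
    # and separators end up inside the expression string.
    values = ", ".join(str(int(i)) for i in ids)
    return f"{field_name} in [{values}]"


expr = ids_to_expr("film_id", [0, 1])
```

The resulting string could then be passed as the expr argument, for instance test_partition.delete(expr).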
search(data, anns_field, param, limit, expr=None, output_fields=None, timeout=None, round_decimal=-1, **kwargs)

Vector similarity search with an optional boolean expression as filters.

Parameters
  • data (list[list[float]]) -- The vectors to search with. The length of data is the number of queries (nq), and the dimension of every vector in data must equal the dimension of the collection's vector field.

  • anns_field (str) -- The vector field of the collection to search on.

  • param (dict) -- The parameters of the search, such as nprobe.

  • limit -- The maximum number of records to return, also known as topk.

  • round_decimal (int) -- The number of decimal places to which the returned distance is rounded.

  • expr (str) -- The boolean expression used to filter on attributes.

  • output_fields (list[str]) -- The fields to return in the search result. Not supported now.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

  • kwargs --

    • _async (bool) -- Whether to invoke the method asynchronously. When the value is True, the method returns a SearchFuture object; otherwise, the method returns the results from the server directly.

    • _callback (function) -- The callback function invoked after the server responds successfully. It only takes effect when _async is set to True.

    • consistency_level (str/int) -- The consistency level to use when searching in the partition. For details, see https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/how-guarantee-ts-works.md. Note: this parameter overrides the consistency level specified when the collection was created. If no consistency level is specified, the search uses the consistency level set when the collection was created.

    • guarantee_timestamp (int) -- Instructs Milvus to see all operations performed before the provided timestamp. If no timestamp is provided, Milvus searches all operations performed to date. Note: only used with the Customized consistency level.

    • graceful_time (int) -- Only used with the Bounded consistency level. If graceful_time is set, PyMilvus uses the current timestamp minus graceful_time as the guarantee_timestamp. Defaults to 5s if not set.

    • travel_timestamp (int) -- A timestamp that makes the search return results based on a data view at the specified point in time.

Returns

An iterable, 2D-array-like SearchResult object: the first dimension is the number of query vectors (nq), and the second dimension is limit (topk).

Return type

SearchResult

Raises
  • RpcError -- If gRPC encounters an error.

  • ParamError -- If parameters are invalid.

  • BaseException -- If the return result from server is not ok.

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> import random
>>> connections.connect()

>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_search", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> # insert
>>> data = [
...     [i for i in range(10)],
...     [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
>>> partition.load()
>>> # search
>>> search_param = {
...     "data": [[1.0, 1.0]],
...     "anns_field": "films",
...     "param": {"metric_type": "L2"},
...     "limit": 2,
...     "expr": "film_id > 0",
... }
>>> res = partition.search(**search_param)
>>> assert len(res) == 1
>>> hits = res[0]
>>> assert len(hits) == 2
>>> print(f"- Total hits: {len(hits)}, hits ids: {hits.ids} ")
- Total hits: 2, hits ids: [8, 5]
>>> print(f"- Top1 hit id: {hits[0].id}, distance: {hits[0].distance}, score: {hits[0].score} ")
- Top1 hit id: 8, distance: 0.10143111646175385, score: 0.10143111646175385
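To make the shape of the result concrete (one list of hits per query vector, each at most limit long), here is a brute-force sketch in plain Python. It is not how Milvus executes a search: toy_search is a hypothetical function, squared Euclidean distance is used for simplicity, and a Python predicate stands in for the boolean expression.

```python
import heapq


def toy_search(data, vectors, limit, predicate=None):
    """Brute-force search over an {id: vector} mapping.

    Returns one list of (id, distance) hits per query vector, nearest first.
    """
    results = []
    for query in data:
        candidates = []
        for entity_id, vector in vectors.items():
            # The optional predicate plays the role of the boolean expression.
            if predicate is not None and not predicate(entity_id):
                continue
            distance = sum((q - v) ** 2 for q, v in zip(query, vector))
            candidates.append((entity_id, distance))
        # Keep only the `limit` nearest candidates for this query.
        results.append(
            heapq.nsmallest(limit, candidates, key=lambda hit: hit[1])
        )
    return results


vectors = {0: [1.0, 1.0], 1: [0.0, 0.0], 2: [1.0, 0.5], 3: [0.9, 1.0]}
res = toy_search(
    [[1.0, 1.0], [0.0, 0.0]],  # nq = 2 query vectors
    vectors,
    limit=2,
    predicate=lambda film_id: film_id > 0,  # mirrors "film_id > 0"
)
```

Here len(res) equals the number of query vectors and each res[i] holds the two nearest entities that pass the filter, which mirrors the nq by topk layout described under Returns.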
query(expr, output_fields=None, timeout=None, **kwargs)

Query with a set of criteria, and results in a list of records that match the query exactly.

Parameters
  • expr (str) -- The query expression

  • output_fields (list[str]) -- A list of fields to return

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

  • kwargs --

    • consistency_level (str/int) -- The consistency level to use during a query on the partition. For details, see https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/how-guarantee-ts-works.md. Note: this parameter overrides the consistency level specified when the collection was created. If no consistency level is specified, the query uses the consistency level set when the collection was created.

    • guarantee_timestamp (int) -- Instructs Milvus to see all operations performed before the provided timestamp. If no timestamp is specified, Milvus queries all operations performed to date. Note: only used with the Customized consistency level.

    • graceful_time (int) -- Only used with the Bounded consistency level. If graceful_time is set, PyMilvus uses the current timestamp minus graceful_time as the guarantee_timestamp. Defaults to 5s if not set.

    • travel_timestamp (int) -- A timestamp that makes the query return results based on a data view at the specified point in time.
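The graceful_time rule above (current timestamp minus graceful_time) can be sketched as plain arithmetic. Everything in this block is an assumption made for illustration: the hybrid timestamp layout (milliseconds in the high bits, an 18-bit logical counter in the low bits), the unit of seconds for graceful_time, and the helper names, none of which are taken from this page.

```python
import time

LOGICAL_BITS = 18  # assumed width of the logical counter in a hybrid timestamp


def make_hybrid_ts(physical_ms, logical=0):
    """Pack a millisecond clock value and a logical counter into one int."""
    return (physical_ms << LOGICAL_BITS) + logical


def split_hybrid_ts(ts):
    """Return (physical_ms, logical) for a hybrid timestamp."""
    return ts >> LOGICAL_BITS, ts & ((1 << LOGICAL_BITS) - 1)


def bounded_guarantee_ts(graceful_time_s=5, now_ms=None):
    """Current time minus graceful_time, expressed as a hybrid timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return make_hybrid_ts(now_ms - graceful_time_s * 1000)


# A fixed clock value keeps the example reproducible.
ts = bounded_guarantee_ts(graceful_time_s=5, now_ms=1_700_000_000_000)
```

Under these assumptions, a larger graceful_time moves the guarantee timestamp further into the past, so the query tolerates staler data in exchange for waiting less.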

Returns

A list that contains all results

Return type

list

Raises
  • RpcError -- If gRPC encounters an error

  • ParamError -- If parameters are invalid

  • BaseException -- If the return result from server is not ok

Example
>>> from pymilvus import connections, Collection, Partition, FieldSchema, CollectionSchema, DataType
>>> import random
>>> connections.connect()

>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("film_date", DataType.INT64),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_query", schema)
>>> partition = Partition(collection, "comedy", "comedy films")
>>> # insert
>>> data = [
...     [i for i in range(10)],
...     [i + 2000 for i in range(10)],
...     [[random.random() for _ in range(2)] for _ in range(10)],
... ]
>>> partition.insert(data)
>>> partition.num_entities
10
>>> partition.load()
>>> # query
>>> expr = "film_id in [ 0, 1 ]"
>>> res = partition.query(expr, output_fields=["film_date"])
>>> assert len(res) == 2
>>> print(f"- Query results: {res}")
- Query results: [{'film_id': 0, 'film_date': 2000}, {'film_id': 1, 'film_date': 2001}]