Collection

The schema of a collection is fixed when the collection is created. A collection schema consists of one or more fields and must contain a vector field. A field is to a collection what a column is to an RDBMS table: all values in one field share the same data type.

A collection is a set of entities, which are also called rows. An entity contains data for every field. Each entity can be labeled; a group of entities with the same label is called a partition. An entity without a label is assigned a default label by Milvus.
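As a rough illustration of the data model described above, the following sketch groups labeled entities into partitions using plain Python. It does not use pymilvus_orm. The function name `group_into_partitions`, the `label` key, and the use of `"_default"` as the default label (borrowed from the insert() description on this page) are assumptions made for illustration only.

```python
from collections import defaultdict

DEFAULT_LABEL = "_default"  # assumed name of the default partition


def group_into_partitions(entities):
    """Group entities (plain dicts) by their optional 'label' key."""
    partitions = defaultdict(list)
    for entity in entities:
        # An entity without a label falls into the default partition.
        label = entity.get("label") or DEFAULT_LABEL
        partitions[label].append(entity)
    return dict(partitions)


entities = [
    {"id": 1, "embedding": [0.1, 0.2], "label": "2020"},
    {"id": 2, "embedding": [0.3, 0.4]},
    {"id": 3, "embedding": [0.5, 0.6], "label": "2020"},
]
partitions = group_into_partitions(entities)
```

Here `partitions` maps each label to the entities that carry it, so the unlabeled entity ends up under `"_default"`.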

Constructor

Constructor     Description
Collection()    Milvus client

Attributes

API             Description
schema          Return the schema of collection.
description     Return the description text about the collection.
name            Return the collection name.
is_empty        Return whether the collection is empty.
num_entities    Return the number of entities.
primary_field   Return the primary field of collection.
partitions      Return all partitions of the collection.
indexes         Return all indexes of the collection.

APIs References

class pymilvus_orm.Collection(name, data=None, schema=None, **kwargs)

This class corresponds to a collection in Milvus.

property schema

Return the schema of collection.

Return schema.CollectionSchema

Schema of collection.

property description

Return the description text about the collection.

Return str

Collection description text, returned when the operation is successful.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="test get description")
>>> collection = Collection(name="test_collection", schema=schema, _using="default")
>>> collection.description
'test get description'

property name

Return the collection name.

Return str

Collection name, returned when the operation is successful.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="test get collection name")
>>> collection = Collection(name="test_collection", schema=schema, _using="default")
>>> collection.name
'test_collection'

property is_empty

Return whether the collection is empty. This property relies on num_entities.

Return bool

Whether the collection is empty.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="test collection is empty")
>>> collection = Collection(name="test_collection", schema=schema)
>>> collection.is_empty
True
>>> data = [[1,2,3,4]]
>>> collection.insert(data)
[424769928069057860, 424769928069057861, 424769928069057862, 424769928069057863]
>>> collection.is_empty
False
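
The relationship between is_empty and num_entities can be sketched without a server. The class below is a hypothetical in-memory stand-in, not the pymilvus_orm implementation; the name `InMemoryCollection` and its `insert` behavior are assumptions for illustration.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a collection; not the pymilvus_orm class."""

    def __init__(self):
        self._entities = []

    def insert(self, values):
        # Store each value as one entity and return the new entity count.
        self._entities.extend(values)
        return len(self._entities)

    @property
    def num_entities(self):
        return len(self._entities)

    @property
    def is_empty(self):
        # Emptiness is derived from the entity count rather than stored.
        return self.num_entities == 0


collection = InMemoryCollection()
empty_before = collection.is_empty
collection.insert([1, 2, 3, 4])
empty_after = collection.is_empty
```

Deriving is_empty from the count keeps the two values consistent by construction.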
property num_entities

Return the number of entities.

Return int

Number of entities in this collection.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="get collection entities num")
>>> collection = Collection(name="test_collection", schema=schema)
>>> collection.num_entities
0
>>> data = [[1,2,3,4]]
>>> collection.insert(data)
[424769928069057860, 424769928069057861, 424769928069057862, 424769928069057863]
>>> collection.num_entities
4

property primary_field

Return the primary field of collection.

Return schema.FieldSchema

The primary field of collection.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> schema = CollectionSchema(fields=[field], description="get collection entities num")
>>> collection = Collection(name="test_collection", schema=schema)
>>> collection.primary_field
<pymilvus_orm.schema.FieldSchema object at 0x7f64f6a3cc40>
>>> collection.primary_field.name
'int64'

drop(**kwargs)

Drop the collection, as well as its corresponding index files.

Parameters

kwargs --

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="drop collection")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> data = pd.DataFrame(data={"int64" : int64_series})
>>> collection.insert(data=data)
>>> collection.num_entities
>>> collection.drop()
>>> from pymilvus_orm import utility
>>> utility.has_collection("test_collection")
False

load(field_names=None, index_names=None, partition_names=None, **kwargs)

Load the collection from disk to memory.

Parameters
  • field_names (list[str]) -- The specified fields to load.

  • index_names (list[str]) -- The specified indexes to load.

  • partition_names (list[str]) -- The specified partitions to load.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • ParamError -- If parameters are invalid.

  • BaseException -- If a field, index, or partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> field = FieldSchema("int64", DataType.INT64, is_primary=False, description="int64")
>>> schema = CollectionSchema([field], description="collection schema has a int64 field")
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> collection = Collection(name="test_collection", schema=schema)
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> data = pd.DataFrame(data={"int64" : int64_series})
>>> collection.insert(data)
>>> collection.load() # load collection to memory
>>> assert not collection.is_empty
>>> assert collection.num_entities == 10

release(**kwargs)

Release the collection from memory.

Parameters

kwargs --

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If collection hasn't been loaded.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> field = FieldSchema("int64", DataType.INT64, is_primary=False, description="int64")
>>> schema = CollectionSchema([field], description="collection schema has a int64 field")
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> collection = Collection(name="test_collection", schema=schema)
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> data = pd.DataFrame(data={"int64" : int64_series})
>>> collection.insert(data)
>>> collection.load()   # load collection to memory
>>> assert not collection.is_empty
>>> assert collection.num_entities == 10
>>> collection.release()    # release the collection from memory

insert(data, partition_name=None, **kwargs)

Insert data into collection.

Parameters
  • data (list-like (list, tuple) object or pandas.DataFrame) -- The data to insert. The dimension of data must align with the number of columns (fields) in the collection.

  • partition_name (str) -- The name of the partition to insert the data into. If no partition name is passed, the data is inserted into the "_default" partition.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • ParamError -- If parameters are invalid.

  • BaseException -- If partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> field = FieldSchema("int64", DataType.INT64, is_primary=False, description="int64")
>>> schema = CollectionSchema([field], description="collection schema has a int64 field")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> data = [[random.randint(1, 100) for _ in range(10)]]
>>> collection.insert(data)
>>> collection.load()
>>> assert not collection.is_empty
>>> assert collection.num_entities == 10
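
The list form of data is column-oriented: one inner list per field, as in the example above. When records start out row-oriented, a small conversion step can produce that layout. The sketch below is plain Python and does not call pymilvus_orm; the helper names `rows_to_columns` and `check_column_data` are hypothetical, and the checks reflect only the alignment rule stated in the parameter description.

```python
def rows_to_columns(rows, field_names):
    """Turn a list of row dicts into one list of values per field."""
    return [[row[name] for row in rows] for name in field_names]


def check_column_data(columns, field_names):
    """Raise ValueError if the column layout does not match the fields."""
    if len(columns) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} columns, got {len(columns)}"
        )
    if len({len(column) for column in columns}) > 1:
        raise ValueError("all columns must hold the same number of values")


field_names = ["year", "embedding"]
rows = [
    {"year": 2019, "embedding": [0.1, 0.2]},
    {"year": 2020, "embedding": [0.3, 0.4]},
    {"year": 2021, "embedding": [0.5, 0.6]},
]
data = rows_to_columns(rows, field_names)
check_column_data(data, field_names)
```

The resulting `data` has the same shape as the `[years, embeddings]` argument used in the search() example on this page.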
search(data, anns_field, param, limit, expression, partition_names=None, output_fields=None, timeout=None, **kwargs)

Vector similarity search with an optional boolean expression as filters.

Parameters
  • data (list[list[float]]) -- The query vectors. The length of data is the number of queries (nq); the dimension of every vector in data must equal that of the collection's vector field.

  • anns_field (str) -- The vector field of the collection to search on.

  • param (dict) -- The search parameters, such as nprobe.

  • limit (int) -- The maximum number of records to return, also known as topk.

  • expression (str) -- The boolean expression used to filter attributes.

  • partition_names (list[str]) -- The names of partitions to search.

  • output_fields (list[str]) -- The fields to return in the search result; not supported yet.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

  • kwargs --

    • _async (bool) -- Whether to invoke the call asynchronously. When True, the method returns a SearchResultFuture object; otherwise, it returns the results from the server directly.

    • _callback (function) -- The callback function invoked after the server responds successfully. It only takes effect when _async is set to True.

Returns

SearchResult: an iterable, 2D-array-like object. The first dimension is the number of query vectors (nq); the second dimension is limit (topk).

Return type

SearchResult

Raises
  • RpcError -- If gRPC encounters an error.

  • ParamError -- If parameters are invalid.

  • BaseException -- If the result returned from the server is not OK.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> dim = 128
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> nb = 3000
>>> nq = 10
>>> limit = 10
>>> years = [i for i in range(nb)]
>>> embeddings = [[random.random() for _ in range(dim)] for _ in range(nb)]
>>> collection.insert([years, embeddings])
>>> collection.load()
>>> search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
>>> res = collection.search(embeddings[:10], "embedding", search_params, limit, "year > 20")
>>> assert len(res) == nq
>>> for hits in res:
...     assert len(hits) == limit
>>> hits = res[0]
>>> assert len(hits.ids) == limit
>>> top1 = hits[0]
>>> print(top1.id)
>>> print(top1.distance)
>>> print(top1.score)
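
To make the nq × limit shape of the result concrete, here is a minimal brute-force sketch in plain Python. It is not how Milvus executes a search (no index, no nprobe) and it does not use pymilvus_orm; the function name `brute_force_search`, the `(index, distance)` hit tuples, and the use of a Python callable in place of the boolean expression string are all assumptions for illustration.

```python
import math


def brute_force_search(queries, vectors, years, limit, predicate):
    """Return one list of (index, L2 distance) hits per query vector."""
    results = []
    for query in queries:
        hits = []
        for idx, (vector, year) in enumerate(zip(vectors, years)):
            # Apply the attribute filter before ranking by distance.
            if not predicate(year):
                continue
            distance = math.sqrt(
                sum((a - b) ** 2 for a, b in zip(query, vector))
            )
            hits.append((idx, distance))
        hits.sort(key=lambda hit: hit[1])
        results.append(hits[:limit])  # keep at most `limit` nearest hits
    return results


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
years = [10, 25, 30, 40]
queries = [[0.0, 0.0], [3.0, 3.0]]
res = brute_force_search(queries, vectors, years, 2, lambda year: year > 20)
```

The outer list has one entry per query (nq = 2 here) and each inner list holds at most `limit` hits, mirroring the two dimensions described under Returns.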
property partitions

Return all partitions of the collection.

Return list[Partition]

List of Partition objects, returned when the operation is successful.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.partitions
[{"name": "_default", "description": "", "num_entities": 0}]

partition(partition_name) → pymilvus_orm.partition.Partition

Return the partition corresponding to name. Return None if it does not exist.

Parameters

partition_name (str) -- The name of the partition to get.

Return Partition

Partition object corresponding to partition_name.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.partition("partition")
>>> collection.partition("_default")
{"name": "_default", "description": "", "num_entities": 0}

has_partition(partition_name) → bool

Checks if a specified partition exists.

Parameters

partition_name (str) -- The name of the partition to check

Return bool

Whether a specified partition exists.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.create_partition(partition_name="partition", description="test partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.partition("partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.has_partition("partition")
True
>>> collection.has_partition("partition2")
False

drop_partition(partition_name, **kwargs)

Drop the partition and its corresponding index files.

Parameters
  • partition_name (str) -- The name of the partition to drop.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.create_partition(partition_name="partition", description="test partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.partition("partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.has_partition("partition")
True
>>> collection.drop_partition("partition")
>>> collection.has_partition("partition")
False

property indexes

Return all indexes of the collection.

Return list[Index]

List of Index objects, returned when the operation is successful.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.indexes
[]

index(index_name='') → pymilvus_orm.index.Index

Return the index corresponding to name.

Parameters

index_name (str) -- The name of the index to get.

Return Index

Index object corresponding to index_name.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If index doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.indexes
[<pymilvus_orm.index.Index object at 0x7f4435587e20>]
>>> collection.index()
<pymilvus_orm.index.Index object at 0x7f44355a1460>

create_index(field_name, index_params, index_name='', **kwargs) → pymilvus_orm.index.Index

Create an index on the specified field according to the index parameters. Return an Index object.

Parameters
  • field_name (str) -- The name of the field to create an index for.

  • index_params (dict) -- Indexing parameters.

  • index_name (str) -- The name of the index to create.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • ParamError -- If index parameters are invalid.

  • BaseException -- If field doesn't exist.

  • BaseException -- If the index has already been created.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.indexes
[<pymilvus_orm.index.Index object at 0x7f4435587e20>]
>>> collection.index()
<pymilvus_orm.index.Index object at 0x7f44355a1460>

has_index(index_name='') → bool

Checks whether a specified index exists.

Parameters

index_name (str) -- The name of the index to check.

Return bool

Whether the specified index exists.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.indexes
[<pymilvus_orm.index.Index object at 0x7f4435587e20>]
>>> collection.index()
<pymilvus_orm.index.Index object at 0x7f44355a1460>
>>> collection.has_index()
True

drop_index(index_name='', **kwargs)

Drop index and its corresponding index files.

Parameters
  • index_name (str) -- The name of the index to drop.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If index doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7feaddc9cb80>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.has_index()
True
>>> collection.drop_index()
>>> collection.has_index()
False

Methods

API               Description
drop()            Drop the collection, as well as its corresponding index files.
load()            Load the collection from disk to memory.
release()         Release the collection from memory.
insert()          Insert data into collection.
search()          Vector similarity search with an optional boolean expression as filters.
partition()       Return the partition corresponding to name.
has_partition()   Checks if a specified partition exists.
drop_partition()  Drop the partition and its corresponding index files.
index()           Return the index corresponding to name.
create_index()    Create index on a specified column according to the index parameters.
has_index()       Checks whether a specified index exists.
drop_index()      Drop index and its corresponding index files.

APIs References

class pymilvus_orm.Collection(name, data=None, schema=None, **kwargs)

This is a class corresponding to collection in milvus.

drop(**kwargs)

Drop the collection, as well as its corresponding index files.

Parameters

kwargs --

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, client waits until server response or error occur.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, descrition="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="drop collection")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> data = pd.DataFrame(data={"int64" : int64_series})
>>> collection.insert(data=data)
>>> collection.num_entities
>>> collection.drop()
>>> from pymilvus_orm import utility
>>> utility.has_collection("test_collection")
False
load(field_names=None, index_names=None, partition_names=None, **kwargs)

Load the collection from disk to memory.

Parameters
  • field_names (list[str]) -- The specified fields to load.

  • index_names (list[str]) -- The specified indexes to load.

  • partition_names (list[str]) -- The specified partitions to load.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, client waits until server response or error occur.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • ParamError -- If parameters are invalid.

  • BaseException -- If fields, index or partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> field = FieldSchema("int64", DataType.INT64, is_primary=False, description="int64")
>>> schema = CollectionSchema([field], description="collection schema has a int64 field")
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> collection = Collection(name="test_collection", schema=schema)
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> data = pd.DataFrame(data={"int64" : int64_series})
>>> collection.insert(data)
>>> collection.load() # load collection to memory
>>> assert not collection.is_empty
>>> assert collection.num_entities == 10
release(**kwargs)

Release the collection from memory.

Parameters

kwargs --

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, client waits until server response or error occur.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If collection hasn't been loaded.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> field = FieldSchema("int64", DataType.INT64, is_primary=False, description="int64")
>>> schema = CollectionSchema([field], description="collection schema has a int64 field")
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> collection = Collection(name="test_collection", schema=schema)
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> data = pd.DataFrame(data={"int64" : int64_series})
>>> collection.insert(data)
>>> collection.load()   # load collection to memory
>>> assert not collection.is_empty
>>> assert collection.num_entities == 10
>>> collection.release()    # release the collection from memory
insert(data, partition_name=None, **kwargs)

Insert data into collection.

Parameters
  • data (list-like(list, tuple) object or pandas.DataFrame) -- The specified data to insert, the dimension of data needs to align with column number

  • partition_name (str) -- The partition name which the data will be inserted to, if partition name is not passed, then the data will be inserted to "_default" partition

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, client waits until server response or error occur.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • ParamError -- If parameters are invalid.

  • BaseException -- If partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> field = FieldSchema("int64", DataType.INT64, is_primary=False, description="int64")
>>> schema = CollectionSchema([field], description="collection schema has a int64 field")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> data = [[random.randint(1, 100) for _ in range(10)]]
>>> collection.insert(data)
>>> collection.load()
>>> assert not collection.is_empty
>>> assert collection.num_entities == 10
search(data, anns_field, param, limit, expression, partition_names=None, output_fields=None, timeout=None, **kwargs)

Vector similarity search with an optional boolean expression as filters.

Parameters
  • data (list[list[float]]) -- The query vectors. The length of data is the number of queries (nq), and the dimension of every vector in data must equal the dimension of the collection's vector field.

  • anns_field (str) -- The name of the collection's vector field to search on.

  • param (dict) -- The search parameters, such as nprobe.

  • limit (int) -- The maximum number of records to return per query, also known as topk.

  • expression (str) -- The boolean expression used to filter on attributes.

  • partition_names (list[str]) -- The names of partitions to search.

  • output_fields (list[str]) -- The fields to return in the search result. Not currently supported.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

  • kwargs --

    • _async (bool) -- Whether to invoke the search asynchronously. When True, the method returns a SearchResultFuture object; otherwise, it returns the results from the server directly.

    • _callback (function) -- The callback function invoked after the server responds successfully. It only takes effect when _async is set to True.
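
The future-plus-callback pattern described for _async and _callback can be sketched with the standard library alone. This is not how pymilvus_orm implements SearchResultFuture; submit_search and fake_search are hypothetical names, and a thread pool stands in for the RPC layer. The sketch only shows the contract: the caller gets a future back immediately, and the callback runs only when the call succeeds.

```python
from concurrent.futures import ThreadPoolExecutor


def submit_search(executor, search_fn, query, callback=None):
    """Run search_fn(query) in the background and return a Future.

    The callback, if given, is called with the result only when the
    call finishes without raising.
    """
    future = executor.submit(search_fn, query)

    def _on_done(done):
        if callback is not None and done.exception() is None:
            callback(done.result())

    future.add_done_callback(_on_done)
    return future


def fake_search(query):
    # Stand-in for a server round trip: one list of hits per query vector.
    return [[("hit", index)] for index, _ in enumerate(query)]


received = []
with ThreadPoolExecutor(max_workers=1) as pool:
    future = submit_search(pool, fake_search, [[0.1], [0.2]], callback=received.append)
    results = future.result()  # blocks until the background call completes
# Leaving the with-block waits for the worker thread, so the callback has run.
```
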

Returns

SearchResult: An iterable, 2d-array-like object. The first dimension is the number of query vectors (nq); the second dimension is the limit (topk).

Return type

SearchResult

Raises
  • RpcError -- If gRPC encounters an error.

  • ParamError -- If parameters are invalid.

  • BaseException -- If the return result from server is not ok.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm import connections
>>> from pymilvus_orm.types import DataType
>>> connections.create_connection()
<milvus.client.stub.Milvus object at 0x7f8579002dc0>
>>> dim = 128
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=dim)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> nb = 3000
>>> nq = 10
>>> limit = 10
>>> years = [i for i in range(nb)]
>>> embeddings = [[random.random() for _ in range(dim)] for _ in range(nb)]
>>> collection.insert([years, embeddings])
>>> collection.load()
>>> search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
>>> res = collection.search(embeddings[:nq], "embedding", search_params, limit, "year > 20")
>>> assert len(res) == nq
>>> for hits in res:
...     assert len(hits) == limit
>>> hits = res[0]
>>> assert len(hits.ids) == limit
>>> top1 = hits[0]
>>> print(top1.id)
>>> print(top1.distance)
>>> print(top1.score)
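
To make the result layout concrete (one list of hits per query vector, each list capped at limit), here is a brute-force sketch in plain Python. It is an illustration only and not how Milvus executes a search: brute_force_search is a hypothetical function, the Euclidean distance from math.dist is chosen just for the demonstration, and a Python predicate stands in for the boolean expression.

```python
import math


def brute_force_search(vectors, attrs, queries, limit, predicate=None):
    """Return one list of (id, distance) hits per query, nearest first."""
    results = []
    for query in queries:
        hits = [
            (entity_id, math.dist(query, vector))
            for entity_id, vector in enumerate(vectors)
            # Entities that fail the filter never become candidates.
            if predicate is None or predicate(attrs[entity_id])
        ]
        hits.sort(key=lambda hit: hit[1])
        results.append(hits[:limit])
    return results


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
years = [10, 25, 30, 40]
queries = [[0.0, 0.0], [3.0, 3.0]]

# Two queries, two hits each, restricted to entities with year > 20.
res = brute_force_search(vectors, years, queries, limit=2,
                         predicate=lambda year: year > 20)
```

Here len(res) equals the number of queries and each inner list holds limit hits, mirroring the assertions in the example above. Entity 0 is the closest match to the first query but is excluded because its year fails the filter.
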
partition(partition_name) → pymilvus_orm.partition.Partition

Return the partition corresponding to the given name. Return None if the partition does not exist.

Parameters

partition_name (str) -- The name of the partition to get.

Return Partition

Partition object corresponding to partition_name.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.partition("partition")
>>> collection.partition("_default")
{"name": "_default", "description": "", "num_entities": 0}
has_partition(partition_name) → bool

Checks if a specified partition exists.

Parameters

partition_name (str) -- The name of the partition to check.

Return bool

Whether a specified partition exists.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.create_partition(partition_name="partition", description="test partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.partition("partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.has_partition("partition")
True
>>> collection.has_partition("partition2")
False
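
A common use of has_partition is a get-or-create step. The sketch below is a hedged illustration: get_or_create_partition is a hypothetical helper, and FakeCollection is an in-memory stand-in that only mimics the has_partition, partition, and create_partition calls shown in the example above, so the snippet runs without a Milvus server. With a real Collection, the check and the create are separate requests, so another client could create the partition in between.

```python
def get_or_create_partition(collection, name, description=""):
    """Return the named partition, creating it first if it is missing."""
    if collection.has_partition(name):
        return collection.partition(name)
    return collection.create_partition(partition_name=name, description=description)


class FakeCollection:
    """In-memory stand-in used only to exercise the helper without a server."""

    def __init__(self):
        self._partitions = {"_default": {"name": "_default", "description": ""}}

    def has_partition(self, partition_name):
        return partition_name in self._partitions

    def partition(self, partition_name):
        return self._partitions.get(partition_name)

    def create_partition(self, partition_name, description=""):
        record = {"name": partition_name, "description": description}
        self._partitions[partition_name] = record
        return record


fake = FakeCollection()
created = get_or_create_partition(fake, "partition", description="test partition")
again = get_or_create_partition(fake, "partition")  # second call reuses it
```
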
drop_partition(partition_name, **kwargs)

Drop the partition and its corresponding index files.

Parameters
  • partition_name (str) -- The name of the partition to drop.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If partition doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> field = FieldSchema("int64", DataType.INT64, description="int64", is_primary=False)
>>> schema = CollectionSchema(fields=[field], description="collection description")
>>> collection = Collection(name="test_collection", schema=schema, alias="default")
>>> collection.create_partition(partition_name="partition", description="test partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.partition("partition")
{"name": "partition", "description": "", "num_entities": 0}
>>> collection.has_partition("partition")
True
>>> collection.drop_partition("partition")
>>> collection.has_partition("partition")
False
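
Because drop_partition raises when the partition does not exist, callers that only want the partition gone may prefer to check first. The following is a minimal sketch under that assumption: drop_partition_if_exists is a hypothetical helper, and FakeCollection is a stand-in written for this example so that it runs without a server.

```python
def drop_partition_if_exists(collection, name, timeout=None):
    """Drop the partition if present; return True when something was dropped."""
    if not collection.has_partition(name):
        return False
    collection.drop_partition(name, timeout=timeout)
    return True


class FakeCollection:
    """In-memory stand-in that raises on a missing partition."""

    def __init__(self, names):
        self._names = set(names)

    def has_partition(self, partition_name):
        return partition_name in self._names

    def drop_partition(self, partition_name, **kwargs):
        if partition_name not in self._names:
            raise RuntimeError("partition doesn't exist")
        self._names.remove(partition_name)


fake = FakeCollection(["_default", "partition"])
first = drop_partition_if_exists(fake, "partition")   # partition is removed
second = drop_partition_if_exists(fake, "partition")  # nothing left to drop
```
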
index(index_name='') → pymilvus_orm.index.Index

Return the index corresponding to name.

Parameters

index_name (str) -- The name of the index to get.

Return Index

Index object corresponding to index_name.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If index doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.indexes
[<pymilvus_orm.index.Index object at 0x7f4435587e20>]
>>> collection.index()
<pymilvus_orm.index.Index object at 0x7f44355a1460>
create_index(field_name, index_params, index_name='', **kwargs) → pymilvus_orm.index.Index

Create an index on the specified field according to the index parameters. Return the Index object.

Parameters
  • field_name (str) -- The name of the field to create an index for.

  • index_params (dict) -- Indexing parameters.

  • index_name (str) -- The name of the index to create.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • ParamError -- If index parameters are invalid.

  • BaseException -- If field doesn't exist.

  • BaseException -- If index has been created.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.indexes
[<pymilvus_orm.index.Index object at 0x7f4435587e20>]
>>> collection.index()
<pymilvus_orm.index.Index object at 0x7f44355a1460>
has_index(index_name='') → bool

Checks whether a specified index exists.

Parameters

index_name (str) -- The name of the index to check.

Return bool

Whether the specified index exists.

Raises

CollectionNotExistException -- If collection doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7f9a190ca898>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.indexes
[<pymilvus_orm.index.Index object at 0x7f4435587e20>]
>>> collection.index()
<pymilvus_orm.index.Index object at 0x7f44355a1460>
>>> collection.has_index()
True
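
Since create_index raises if an index has already been created, has_index can guard the call so that setup code is safe to run more than once. The sketch below is illustrative: ensure_index is a hypothetical helper, and FakeCollection is a stand-in that copies only the method signatures listed in this reference so the snippet runs without a server.

```python
def ensure_index(collection, field_name, index_params):
    """Create the index only when the collection has none; return the index."""
    if collection.has_index():
        return collection.index()
    return collection.create_index(field_name, index_params)


class FakeCollection:
    """In-memory stand-in that refuses to create a second index."""

    def __init__(self):
        self._index = None
        self.create_calls = 0

    def has_index(self, index_name=""):
        return self._index is not None

    def index(self, index_name=""):
        return self._index

    def create_index(self, field_name, index_params, index_name="", **kwargs):
        if self._index is not None:
            raise RuntimeError("index has been created")
        self.create_calls += 1
        self._index = {"field_name": field_name, "params": index_params}
        return self._index


params = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
fake = FakeCollection()
first = ensure_index(fake, "embedding", params)
second = ensure_index(fake, "embedding", params)  # no second create_index call
```
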
drop_index(index_name='', **kwargs)

Drop index and its corresponding index files.

Parameters
  • index_name (str) -- The name of the index to drop.

  • kwargs --

    • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • BaseException -- If index doesn't exist.

Example

>>> from pymilvus_orm.collection import Collection
>>> from pymilvus_orm.schema import FieldSchema, CollectionSchema
>>> from pymilvus_orm.types import DataType
>>> from pymilvus_orm import connections
>>> connections.create_connection(alias="default")
<milvus.client.stub.Milvus object at 0x7feaddc9cb80>
>>> year_field = FieldSchema("year", DataType.INT64, is_primary=False, description="year")
>>> embedding_field = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128)
>>> schema = CollectionSchema(fields=[year_field, embedding_field])
>>> collection = Collection(name="test_collection", schema=schema)
>>> index = {"index_type": "IVF_FLAT", "params": {"nlist": 128}, "metric_type": "L2"}
>>> collection.create_index("embedding", index)
Status(code=0, message='')
>>> collection.has_index()
True
>>> collection.drop_index()
>>> collection.has_index()
False