
Utility

Methods

API

Description

loading_progress(collection_name, [partition_names, using])

Query the progress of loading.

wait_for_loading_complete(collection_name, [partition_names, timeout, using])

Wait until loading is complete.

index_building_progress(collection_name, [index_name, using])

Query the progress of index building.

wait_for_index_building_complete(collection_name, [index_name, timeout, using])

Wait until index building is complete.

has_collection(collection_name, [using])

Check if a specified collection exists.

has_partition(collection_name, partition_name, [using])

Check if a specified partition exists.

list_collections([timeout, using])

List all collections.

drop_collection(collection_name, [timeout, using])

Drop a collection by name.

calc_distance(vectors_left, vectors_right, params, [timeout, using])

Calculate distance between two vector arrays.

get_query_segment_info(collection_name, [timeout, using])

Get segments information from query nodes.

mkts_from_hybridts(hybridts, [milliseconds, delta])

Generate a hybrid timestamp from an existing one.

mkts_from_unixtime(epoch, [milliseconds, delta])

Generate a hybrid timestamp from Unix time.

mkts_from_datetime(d_time, [milliseconds, delta])

Generate a hybrid timestamp from a datetime.

hybridts_to_unixtime(hybridts)

Convert hybrid timestamp to UNIX Epoch time.

hybridts_to_datetime(hybridts, [tz])

Convert hybrid timestamp to datetime.

create_alias(collection_name, alias, [timeout, using])

Specify alias for a collection.

alter_alias(collection_name, alias, [timeout, using])

Change the alias of a collection to another collection.

drop_alias(alias, [timeout, using])

Delete the alias.

API References

pymilvus.utility.loading_progress(collection_name, partition_names=None, using='default')

Show loading progress of sealed segments in percentage.

Parameters
  • collection_name (str) -- The name of the collection being loaded.

  • partition_names (str list) -- The names of the partitions being loaded.

Return dict

{'loading_progress': '100%'}

Raises

PartitionNotExistException -- If partition doesn't exist.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import pandas as pd
>>> import random
>>> connections.connect()
>>> fields = [
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", DataType.FLOAT_VECTOR, dim=8),
... ]
>>> schema = CollectionSchema(fields)
>>> collection = Collection("test_loading_progress", schema)
>>> data = pd.DataFrame({
...     "film_id" : pd.Series(data=list(range(10, 20)), index=list(range(10))),
...     "films": [[random.random() for _ in range(8)] for _ in range (10)],
... })
>>> collection.insert(data)
>>> collection.create_index("films", {"index_type": "IVF_FLAT", "params": {"nlist": 8}, "metric_type": "L2"})
>>> collection.load(_async=True)
>>> utility.loading_progress("test_loading_progress")
{'loading_progress': '100%'}
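
The returned percentage is a string, so it usually has to be converted before it can be compared with a threshold. The helper below is a minimal sketch that assumes the result has the shape shown in the example output above; `parse_loading_progress` is a hypothetical name, not part of pymilvus.

```python
def parse_loading_progress(result: dict) -> int:
    """Return the loading progress as an integer percentage (0-100)."""
    # Assumed shape, taken from the example output: {'loading_progress': '100%'}
    text = result["loading_progress"]
    return int(text.rstrip("%"))


print(parse_loading_progress({"loading_progress": "100%"}))  # 100
print(parse_loading_progress({"loading_progress": "35%"}))   # 35
```
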
pymilvus.utility.wait_for_loading_complete(collection_name, partition_names=None, timeout=None, using='default')

Block until loading is done, or raise an exception after the timeout.

Parameters
  • collection_name (str) -- The name of the collection whose loading to wait for.

  • partition_names (str list) -- The names of the partitions whose loading to wait for.

  • timeout (int) -- The timeout for this method, in seconds.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • PartitionNotExistException -- If partition doesn't exist.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="get collection entities num")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> float_vector_series = [[random.random() for _ in range(_DIM)] for _ in range(10)]
>>> data = pd.DataFrame({"int64" : int64_series, "float_vector": float_vector_series})
>>> collection.insert(data)
>>> collection.load() # load collection to memory
>>> utility.wait_for_loading_complete("test_collection")
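
The "block until done, or raise after the timeout" behavior described above can be pictured as a polling loop. The sketch below is not pymilvus code: `wait_until_complete` and `get_progress` are hypothetical names, and the progress source is simulated so the example runs without a server.

```python
import time


def wait_until_complete(get_progress, timeout=None, interval=0.5):
    """Poll get_progress() until it reports 100, or raise TimeoutError.

    get_progress is a hypothetical zero-argument callable returning an
    integer percentage; it stands in for a real progress query.
    """
    start = time.monotonic()
    while True:
        if get_progress() >= 100:
            return
        # timeout=None means wait indefinitely, as in the signature above.
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"not complete after {timeout} seconds")
        time.sleep(interval)


# Simulated progress source: reports 0, then 50, then 100.
_steps = iter([0, 50, 100])
wait_until_complete(lambda: next(_steps), timeout=5, interval=0.01)
print("loading complete")
```
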
pymilvus.utility.index_building_progress(collection_name, index_name='', using='default')

Show the number of indexed entities versus the number of total entities.

Parameters
  • collection_name (str) -- The name of the collection whose index is being built.

  • index_name (str) -- The name of the index being built. The default index name is used if index_name is not specified.

Return dict

Index building progress is a dict containing the number of indexed entities and the number of total entities: {'total_rows': total_rows, 'indexed_rows': indexed_rows}

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • IndexNotExistException -- If index doesn't exist.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(5000, 10000)), index=list(range(5000)))
>>> vectors = [[random.random() for _ in range(_DIM)] for _ in range(5000)]
>>> data = pd.DataFrame({"int64" : int64_series, "float_vector": vectors})
>>> collection.insert(data)
>>> collection.load() # load collection to memory
>>> index_param = {
...     "metric_type": "L2",
...     "index_type": "IVF_FLAT",
...     "params": {"nlist": 1024}
... }
>>> collection.create_index("float_vector", index_param)
>>> utility.index_building_progress("test_collection", "")
>>> utility.loading_progress("test_collection")
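
The two row counts in the returned dict can be turned into a single percentage. This is a minimal sketch based only on the documented keys `total_rows` and `indexed_rows`; the helper name and the sample values are made up for illustration.

```python
def index_progress_percent(progress: dict) -> float:
    """Convert an index-building progress dict to a percentage."""
    total = progress["total_rows"]
    indexed = progress["indexed_rows"]
    if total == 0:
        # Treating an empty collection as fully indexed is an assumption.
        return 100.0
    return 100.0 * indexed / total


sample = {"total_rows": 5000, "indexed_rows": 1250}  # made-up values
print(f"{index_progress_percent(sample):.1f}%")  # 25.0%
```
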
pymilvus.utility.wait_for_index_building_complete(collection_name, index_name='', timeout=None, using='default')

Block until index building is done, or raise an exception after the timeout.

Parameters
  • collection_name (str) -- The name of the collection to wait for.

  • index_name (str) -- The name of the index to wait for.

  • timeout (int) -- The timeout for this method, in seconds.

Raises
  • CollectionNotExistException -- If collection doesn't exist.

  • IndexNotExistException -- If index doesn't exist.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(5000, 10000)), index=list(range(5000)))
>>> vectors = [[random.random() for _ in range(_DIM)] for _ in range(5000)]
>>> data = pd.DataFrame({"int64" : int64_series, "float_vector": vectors})
>>> collection.insert(data)
>>> collection.load() # load collection to memory
>>> index_param = {
...     "metric_type": "L2",
...     "index_type": "IVF_FLAT",
...     "params": {"nlist": 1024}
... }
>>> collection.create_index("float_vector", index_param)
>>> utility.wait_for_index_building_complete("test_collection") # block until the index is built
>>> utility.index_building_progress("test_collection", "")
pymilvus.utility.has_collection(collection_name, using='default')

Checks whether a specified collection exists.

Parameters

collection_name (str) -- The name of the collection to check.

Return bool

Whether the collection exists.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> utility.has_collection("test_collection")
pymilvus.utility.has_partition(collection_name, partition_name, using='default')

Checks if a specified partition exists in a collection.

Parameters
  • collection_name (str) -- The name of the collection that the partition belongs to.

  • partition_name (str) -- The name of the partition to check.

Return bool

Whether the partition exists.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> utility.has_partition("test_collection", "_default")
pymilvus.utility.list_collections(timeout=None, using='default') → list

Returns a list of all collection names.

Parameters

timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Return list[str]

List of collection names, returned when the operation succeeds.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> utility.list_collections()
pymilvus.utility.drop_collection(collection_name, timeout=None, using='default')

Drop a collection by name.

Parameters
  • collection_name (str) -- The name of the collection to drop.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> schema = CollectionSchema(fields=[
...     FieldSchema("int64", DataType.INT64, description="int64", is_primary=True),
...     FieldSchema("float_vector", DataType.FLOAT_VECTOR, is_primary=False, dim=128),
... ])
>>> collection = Collection(name="drop_collection_test", schema=schema)
>>> utility.has_collection("drop_collection_test")
True
>>> utility.drop_collection("drop_collection_test")
>>> utility.has_collection("drop_collection_test")
False
pymilvus.utility.calc_distance(vectors_left, vectors_right, params=None, timeout=None, using='default')

Calculate distance between two vector arrays.

Parameters
  • vectors_left (dict) -- The vectors on the left of the operator, in one of these forms:

    {"ids": [1, 2, 3, .... n], "collection": "c_1", "partition": "p_1", "field": "v_1"}
    {"float_vectors": [[1.0, 2.0], [3.0, 4.0], ... [9.0, 10.0]]}
    {"bin_vectors": [b'\x94', b'N', ... b'\xca']}

  • vectors_right (dict) -- The vectors on the right of the operator, in one of these forms:

    {"ids": [1, 2, 3, .... n], "collection": "col_1", "partition": "p_1", "field": "v_1"}
    {"float_vectors": [[1.0, 2.0], [3.0, 4.0], ... [9.0, 10.0]]}
    {"bin_vectors": [b'\x94', b'N', ... b'\xca']}

  • params (dict) -- Key-value pair parameters:

    "metric_type" or "metric": "L2", "IP", "HAMMING", or "TANIMOTO". Default is "L2".
    "sqrt": True or False. Default is False. Only for "L2" distance.
    "dim": Set this value if the dimension is not a multiple of 8; otherwise the dimension is calculated from the list length. Only for "HAMMING" and "TANIMOTO".

    Examples of supported params:

    {"metric_type": "L2", "sqrt": True}
    {"metric_type": "IP"}
    {"metric_type": "HAMMING", "dim": 17}
    {"metric_type": "TANIMOTO"}

    Note: metric types are case insensitive.

Returns

2-d array of distances

Return type

list[list[int]] for "HAMMING" or list[list[float]] for others

Assume the vectors_left are L_1, L_2, L_3 and the vectors_right are R_a, R_b. Calling the distance between L_n and R_m "D_n_m", the returned distances are arranged like this:

[[D_1_a, D_1_b],
 [D_2_a, D_2_b],
 [D_3_a, D_3_b]]

Note: if a vector doesn't exist in the collection, the returned distance is "-1.0".

Example
>>> import random
>>> from pymilvus import connections, utility
>>> connections.connect()
>>> vectors_l = [[random.random() for _ in range(64)] for _ in range(5)]
>>> vectors_r = [[random.random() for _ in range(64)] for _ in range(10)]
>>> op_l = {"float_vectors": vectors_l}
>>> op_r = {"float_vectors": vectors_r}
>>> params = {"metric": "L2", "sqrt": True}
>>> results = utility.calc_distance(vectors_left=op_l, vectors_right=op_r, params=params)
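
To make the layout of the returned 2-d array concrete, here is a pure-Python sketch that builds the same row-per-left-vector, column-per-right-vector matrix for the L2 metric. It does not call Milvus; `l2_distance_matrix` is a hypothetical helper, and it assumes that `sqrt=False` means the squared distance is returned, as the parameter description suggests.

```python
import math


def l2_distance_matrix(left, right, sqrt=False):
    """Return distances[n][m] = L2 distance between left[n] and right[m]."""
    rows = []
    for l_vec in left:
        row = []
        for r_vec in right:
            squared = sum((a - b) ** 2 for a, b in zip(l_vec, r_vec))
            # Assumption: without sqrt, the squared distance is reported.
            row.append(math.sqrt(squared) if sqrt else squared)
        rows.append(row)
    return rows


left = [[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]]  # L_1, L_2, L_3
right = [[0.0, 0.0], [3.0, 0.0]]             # R_a, R_b
for row in l2_distance_matrix(left, right, sqrt=True):
    print(row)
# [0.0, 3.0]
# [5.0, 4.0]
# [4.0, 5.0]
```

Three left vectors and two right vectors give a 3 x 2 result, matching the D_n_m arrangement described above.
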
pymilvus.utility.get_query_segment_info(collection_name, timeout=None, using='default')

Notifies Proxy to return segments information from query nodes.

Parameters
  • collection_name (str) -- The name of the collection to get segment information for.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Returns

QuerySegmentInfo: the growing segments' information in the query cluster.

Return type

QuerySegmentInfo

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="get collection entities num")
>>> collection = Collection(name="test_get_segment_info", schema=schema)
>>> import random
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> float_vector_series = [[random.random() for _ in range(_DIM)] for _ in range(10)]
>>> data = pd.DataFrame({"int64" : int64_series, "float_vector": float_vector_series})
>>> collection.insert(data)
>>> collection.load() # load collection to memory
>>> res = utility.get_query_segment_info("test_get_segment_info")
pymilvus.utility.mkts_from_hybridts(hybridts, milliseconds=0.0, delta=None)

Generate a hybrid timestamp based on an existing hybrid timestamp, a timedelta, and an incremental time interval.

Parameters
  • hybridts (int) -- The original hybrid timestamp used to generate a new hybrid timestamp. A non-negative integer ranging from 0 to 18446744073709551615.

  • milliseconds (float) -- Incremental time interval, in milliseconds.

  • delta (datetime.timedelta) -- A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.

Return int

The hybrid timestamp, a non-negative integer ranging from 0 to 18446744073709551615.

Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="get collection entities num")
>>> collection = Collection(name="test_collection", schema=schema)
>>> import random
>>> import pandas as pd
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> float_vector_series = [[random.random() for _ in range(_DIM)] for _ in range(10)]
>>> data = pd.DataFrame({"int64" : int64_series, "float_vector": float_vector_series})
>>> m = collection.insert(data)
>>> ts_new = utility.mkts_from_hybridts(m.timestamp, milliseconds=1000.0)
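
A hybrid timestamp combines a physical time with a logical counter in one integer. The sketch below illustrates how shifting such a value by `milliseconds` and `delta` could work. It is not pymilvus's implementation: the 18-bit width of the logical part is an assumption about Milvus's timestamp layout, and `compose` and `shift_hybridts` are hypothetical names.

```python
from datetime import timedelta

LOGICAL_BITS = 18                       # assumed width of the logical counter
LOGICAL_MASK = (1 << LOGICAL_BITS) - 1


def compose(physical_ms, logical=0):
    """Pack a millisecond timestamp and a logical counter into one integer."""
    return (physical_ms << LOGICAL_BITS) | (logical & LOGICAL_MASK)


def shift_hybridts(hybridts, milliseconds=0.0, delta=None):
    """Move the physical part forward, keeping the logical part unchanged."""
    physical_ms = hybridts >> LOGICAL_BITS
    logical = hybridts & LOGICAL_MASK
    if delta is not None:
        milliseconds += delta.total_seconds() * 1000.0
    return compose(physical_ms + int(milliseconds), logical)


ts = compose(physical_ms=1_600_000_000_000, logical=7)
later = shift_hybridts(ts, milliseconds=1000.0, delta=timedelta(seconds=2))
print((later >> LOGICAL_BITS) - (ts >> LOGICAL_BITS))  # 3000
print(later & LOGICAL_MASK)                            # 7
```
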
pymilvus.utility.mkts_from_unixtime(epoch, milliseconds=0.0, delta=None)

Generate a hybrid timestamp based on Unix Epoch time, a timedelta, and an incremental time interval.

Parameters
  • epoch (float) -- The known Unix Epoch time used to generate a hybrid timestamp. The Unix Epoch time is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT).

  • milliseconds (float) -- Incremental time interval, in milliseconds.

  • delta (datetime.timedelta) -- A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.

Return int

The hybrid timestamp, a non-negative integer ranging from 0 to 18446744073709551615.

Example
>>> import time
>>> from pymilvus import utility
>>> epoch_t = time.time()
>>> ts = utility.mkts_from_unixtime(epoch_t, milliseconds=1000.0)
pymilvus.utility.mkts_from_datetime(d_time, milliseconds=0.0, delta=None)

Generate a hybrid timestamp based on a datetime, a timedelta, and an incremental time interval.

Parameters
  • d_time (datetime.datetime) -- The known datetime used to generate a hybrid timestamp.

  • milliseconds (float) -- Incremental time interval, in milliseconds.

  • delta (datetime.timedelta) -- A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.

Return int

The hybrid timestamp, a non-negative integer ranging from 0 to 18446744073709551615.

Example
>>> import datetime
>>> from pymilvus import utility
>>> d = datetime.datetime.now()
>>> ts = utility.mkts_from_datetime(d, milliseconds=1000.0)
pymilvus.utility.hybridts_to_unixtime(hybridts)

Convert a hybrid timestamp to UNIX Epoch time, ignoring the logical part.

Parameters

hybridts (int) -- The known hybrid timestamp to convert to UNIX Epoch time. A non-negative integer ranging from 0 to 18446744073709551615.

Return float

The Unix Epoch time is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT).

Example
>>> import time
>>> from pymilvus import utility
>>> epoch1 = time.time()
>>> ts = utility.mkts_from_unixtime(epoch1)
>>> epoch2 = utility.hybridts_to_unixtime(ts)
>>> assert abs(epoch1 - epoch2) < 0.001 # the round trip keeps millisecond resolution only
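
A round trip through a hybrid timestamp is not expected to return the exact input when the physical part is stored in whole milliseconds. The sketch below shows the effect with plain integer arithmetic. It assumes an 18-bit logical part and millisecond physical resolution; `to_hybridts` and `to_unixtime` are hypothetical stand-ins, not the pymilvus functions.

```python
LOGICAL_BITS = 18  # assumed width of the logical counter


def to_hybridts(epoch):
    """Keep the whole milliseconds of epoch as the physical part; logical part is 0."""
    return int(epoch * 1000) << LOGICAL_BITS


def to_unixtime(hybridts):
    """Drop the logical part and convert milliseconds back to seconds."""
    return (hybridts >> LOGICAL_BITS) / 1000.0


epoch1 = 1_600_000_000.123456
epoch2 = to_unixtime(to_hybridts(epoch1))
print(epoch2)                        # 1600000000.123
print(abs(epoch1 - epoch2) < 0.001)  # True
```
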
pymilvus.utility.hybridts_to_datetime(hybridts, tz=None)

Convert a hybrid timestamp to the datetime according to timezone.

Parameters
  • hybridts (int) -- The known hybrid timestamp to convert to datetime. A non-negative integer ranging from 0 to 18446744073709551615.

  • tz (datetime.timezone) -- Timezone defined by a fixed offset from UTC. If tz is None or not specified, the hybridts is converted to the platform’s local date and time.

Return datetime

The datetime object.

Raises

Exception -- If parameter tz is not of type datetime.timezone.

Example
>>> import time
>>> from pymilvus import utility
>>> epoch_t = time.time()
>>> ts = utility.mkts_from_unixtime(epoch_t)
>>> d = utility.hybridts_to_datetime(ts)
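
The effect of the `tz` argument can be previewed with the standard library alone. This sketch assumes the conversion behaves like `datetime.fromtimestamp` applied to the Unix time of the hybrid timestamp, which matches the description above but is not confirmed against the pymilvus source.

```python
from datetime import datetime, timedelta, timezone

epoch = 1_600_000_000.0  # 2020-09-13T12:26:40 UTC

local_dt = datetime.fromtimestamp(epoch)                 # naive, platform-local time
utc_dt = datetime.fromtimestamp(epoch, tz=timezone.utc)  # aware, UTC
plus8 = datetime.fromtimestamp(epoch, tz=timezone(timedelta(hours=8)))

print(local_dt.tzinfo)     # None
print(utc_dt.isoformat())  # 2020-09-13T12:26:40+00:00
print(plus8.isoformat())   # 2020-09-13T20:26:40+08:00
```
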
pymilvus.utility.create_alias(collection_name: str, alias: str, timeout=None, using='default')

Specify an alias for a collection. An alias cannot be duplicated: you can't assign the same alias to different collections. You can, however, specify multiple aliases for one collection, for example:

before create_alias("collection_1", "bob"):

aliases of collection_1 are ["tom"]

after create_alias("collection_1", "bob"):

aliases of collection_1 are ["tom", "bob"]

Parameters
  • collection_name (str) -- The name of the collection to assign the alias to.

  • alias (str) -- The alias of the collection.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If the collection does not exist.

  • BaseException -- If the alias failed to create.

Example
>>> from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_create_alias", schema)
>>> utility.create_alias(collection.name, "alias")
Status(code=0, message='')
pymilvus.utility.alter_alias(collection_name: str, alias: str, timeout=None, using='default')

Change the alias of a collection to another collection. Raises an error if the alias doesn't exist. An alias cannot be duplicated: you can't assign the same alias to different collections. This API changes the collection that owns an alias, for example:

before alter_alias("collection_2", "bob"):

collection_1's aliases = ["bob"] collection_2's aliases = []

after alter_alias("collection_2", "bob"):

collection_1's aliases = [] collection_2's aliases = ["bob"]

Parameters
  • collection_name (str) -- The name of the collection to which the alias is reassigned.

  • alias (str) -- The alias of the collection.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If the collection does not exist.

  • BaseException -- If the alias failed to alter.

Example
>>> from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_alter_alias", schema)
>>> utility.alter_alias(collection.name, "alias")
if the alias exists, return Status(code=0, message='')
otherwise return Status(code=1, message='alias does not exist')
pymilvus.utility.drop_alias(alias: str, timeout=None, using='default')

Delete the alias. There is no need to provide a collection name, because an alias can only be assigned to one collection and the server knows which collection it belongs to. For example:

before drop_alias("bob"):

aliases of collection_1 are ["tom", "bob"]

after drop_alias("bob"):

aliases of collection_1 are ["tom"]

Parameters
  • alias (str) -- The alias to drop.

  • timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.

Raises
  • CollectionNotExistException -- If the collection does not exist.

  • BaseException -- If the alias doesn't exist.

Example
>>> from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_drop_alias", schema)
>>> utility.create_alias(collection.name, "alias")
>>> utility.drop_alias("alias")
Status(code=0, message='')
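
The rules for create_alias, alter_alias, and drop_alias described above amount to a mapping in which each alias points to exactly one collection. The toy model below is an in-memory sketch of those rules, not Milvus code; the `AliasRegistry` class and its error types are invented for illustration.

```python
class AliasRegistry:
    """Toy in-memory model: each alias maps to exactly one collection."""

    def __init__(self):
        self._owner = {}  # alias -> collection name

    def create_alias(self, collection_name, alias):
        if alias in self._owner:
            raise ValueError(f"alias {alias!r} already exists")
        self._owner[alias] = collection_name

    def alter_alias(self, collection_name, alias):
        if alias not in self._owner:
            raise KeyError(f"alias {alias!r} does not exist")
        self._owner[alias] = collection_name  # move the alias to its new owner

    def drop_alias(self, alias):
        if alias not in self._owner:
            raise KeyError(f"alias {alias!r} does not exist")
        del self._owner[alias]

    def aliases_of(self, collection_name):
        return sorted(a for a, c in self._owner.items() if c == collection_name)


reg = AliasRegistry()
reg.create_alias("collection_1", "tom")
reg.create_alias("collection_1", "bob")
print(reg.aliases_of("collection_1"))  # ['bob', 'tom']
reg.alter_alias("collection_2", "bob")
print(reg.aliases_of("collection_1"), reg.aliases_of("collection_2"))  # ['tom'] ['bob']
reg.drop_alias("bob")
print(reg.aliases_of("collection_2"))  # []
```
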