Utility
Methods
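API | Description |
---|---|
loading_progress(collection_name, [partition_names, using]) | Query the progress of loading. |
wait_for_loading_complete(collection_name, [partition_names, timeout, using]) | Wait until loading is complete. |
index_building_progress(collection_name, [index_name, using]) | Query the progress of index building. |
wait_for_index_building_complete(collection_name, [index_name, timeout, using]) | Wait until index building is complete. |
has_collection(collection_name, [using]) | Check if a specified collection exists. |
has_partition(collection_name, partition_name, [using]) | Check if a specified partition exists. |
list_collections([timeout, using]) | List all collections. |
drop_collection(collection_name, [timeout, using]) | Drop a collection by name. |
calc_distance(vectors_left, vectors_right, params, [timeout, using]) | Calculate distance between two vector arrays. |
get_query_segment_info(collection_name, [timeout, using]) | Get segments information from query nodes. |
mkts_from_hybridts(hybridts, [milliseconds, delta]) | Generate hybrid timestamp with a known one. |
mkts_from_unixtime(epoch, [milliseconds, delta]) | Generate hybrid timestamp with Unix time. |
mkts_from_datetime(d_time, [milliseconds, delta]) | Generate hybrid timestamp with datetime. |
hybridts_to_unixtime(hybridts) | Convert hybrid timestamp to UNIX Epoch time. |
hybridts_to_datetime(hybridts, [tz]) | Convert hybrid timestamp to datetime. |
create_alias(collection_name, alias, [timeout, using]) | Specify alias for a collection. |
alter_alias(collection_name, alias, [timeout, using]) | Change the alias of a collection to another collection. |
drop_alias(alias, [timeout, using]) | Delete the alias. |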
API References
-
pymilvus.utility.loading_progress(collection_name, partition_names=None, using='default')
Show loading progress of sealed segments in percentage.
- Parameters
collection_name (str) -- The name of the collection being loaded.
partition_names (list[str]) -- The names of the partitions being loaded.
- Return dict
{'loading_progress': '100%'}
- Raises
PartitionNotExistException -- If the partition doesn't exist.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import pandas as pd
>>> import random
>>> connections.connect()
>>> fields = [
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", DataType.FLOAT_VECTOR, dim=8),
... ]
>>> schema = CollectionSchema(fields)
>>> collection = Collection("test_loading_progress", schema)
>>> data = pd.DataFrame({
...     "film_id": pd.Series(data=list(range(10, 20)), index=list(range(10))),
...     "films": [[random.random() for _ in range(8)] for _ in range(10)],
... })
>>> collection.insert(data)
>>> collection.create_index("films", {"index_type": "IVF_FLAT", "params": {"nlist": 8}, "metric_type": "L2"})
>>> collection.load(_async=True)  # start loading without blocking
>>> utility.loading_progress("test_loading_progress")
{'loading_progress': '100%'}
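As a rough sketch of how the returned percentage string might be consumed, the helper below parses it and polls until loading reaches 100%. The names `parse_progress` and `poll_until_loaded` are hypothetical and not part of pymilvus. The progress source is passed in as a callable so the sketch runs without a Milvus server; against a live server one could pass `lambda: utility.loading_progress("test_loading_progress")`. It assumes the result carries the `'loading_progress'` key shown in the example output.

```python
import time


def parse_progress(result):
    """Turn a dict like {'loading_progress': '42%'} into the integer 42."""
    return int(result["loading_progress"].rstrip("%"))


def poll_until_loaded(get_progress, interval=0.5, timeout=30.0):
    """Call get_progress() until it reports 100% or the timeout elapses.

    get_progress is any zero-argument callable returning the progress dict.
    Returns True if loading completed, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if parse_progress(get_progress()) >= 100:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for utility.loading_progress(...) so the sketch runs offline.
_fake_results = iter([
    {"loading_progress": "0%"},
    {"loading_progress": "60%"},
    {"loading_progress": "100%"},
])
loaded = poll_until_loaded(lambda: next(_fake_results), interval=0.01)
print(loaded)
```

When blocking is acceptable, `wait_for_loading_complete` covers the same need without a hand-written loop.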
-
pymilvus.utility.wait_for_loading_complete(collection_name, partition_names=None, timeout=None, using='default')
Block until loading is done, or raise an exception after the timeout.
- Parameters
collection_name (str) -- The name of the collection whose loading to wait for.
partition_names (list[str]) -- The names of the partitions whose loading to wait for.
timeout (int) -- The timeout for this method, in seconds.
- Raises
CollectionNotExistException -- If the collection doesn't exist.
PartitionNotExistException -- If the partition doesn't exist.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import random
>>> import pandas as pd
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="get collection entities num")
>>> collection = Collection(name="test_collection", schema=schema)
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> float_vector_series = [[random.random() for _ in range(_DIM)] for _ in range(10)]
>>> data = pd.DataFrame({"int64": int64_series, "float_vector": float_vector_series})
>>> collection.insert(data)
>>> collection.load()  # load collection to memory
>>> utility.wait_for_loading_complete("test_collection")
-
pymilvus.utility.index_building_progress(collection_name, index_name='', using='default')
Show the number of indexed entities versus the total number of entities.
- Parameters
collection_name (str) -- The name of the collection whose index is being built.
index_name (str) -- The name of the index being built. The default index name is used if index_name is not specified.
- Return dict
A dict containing the number of indexed entities and the total number of entities: {'total_rows': total_rows, 'indexed_rows': indexed_rows}
- Raises
CollectionNotExistException -- If the collection doesn't exist.
IndexNotExistException -- If the index doesn't exist.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import random
>>> import pandas as pd
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> int64_series = pd.Series(data=list(range(5000, 10000)), index=list(range(5000)))
>>> vectors = [[random.random() for _ in range(_DIM)] for _ in range(5000)]
>>> data = pd.DataFrame({"int64": int64_series, "float_vector": vectors})
>>> collection.insert(data)
>>> collection.load()  # load collection to memory
>>> index_param = {
...     "metric_type": "L2",
...     "index_type": "IVF_FLAT",
...     "params": {"nlist": 1024},
... }
>>> collection.create_index("float_vector", index_param)
>>> utility.index_building_progress("test_collection", "")
>>> utility.loading_progress("test_collection")
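The returned dict holds two row counts rather than a percentage. A minimal sketch of turning it into a fraction is shown below; `index_progress_ratio` is a hypothetical helper, and treating an empty collection as fully indexed is an assumption made here, not documented behavior.

```python
def index_progress_ratio(progress):
    """Return indexed_rows / total_rows as a float between 0.0 and 1.0.

    `progress` is a dict shaped like the documented return value:
    {'total_rows': ..., 'indexed_rows': ...}.
    """
    total = progress["total_rows"]
    indexed = progress["indexed_rows"]
    if total == 0:
        # Assumption: nothing to index counts as complete.
        return 1.0
    return indexed / total


# Hand-written sample dict standing in for a real server response.
sample = {"total_rows": 5000, "indexed_rows": 1250}
print(f"{index_progress_ratio(sample):.0%}")
```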
-
pymilvus.utility.wait_for_index_building_complete(collection_name, index_name='', timeout=None, using='default')
Block until index building is done, or raise an exception after the timeout.
- Parameters
collection_name (str) -- The name of the collection to wait for.
index_name (str) -- The name of the index to wait for.
timeout (int) -- The timeout for this method, in seconds.
- Raises
CollectionNotExistException -- If the collection doesn't exist.
IndexNotExistException -- If the index doesn't exist.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import random
>>> import pandas as pd
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> int64_series = pd.Series(data=list(range(5000, 10000)), index=list(range(5000)))
>>> vectors = [[random.random() for _ in range(_DIM)] for _ in range(5000)]
>>> data = pd.DataFrame({"int64": int64_series, "float_vector": vectors})
>>> collection.insert(data)
>>> collection.load()  # load collection to memory
>>> index_param = {
...     "metric_type": "L2",
...     "index_type": "IVF_FLAT",
...     "params": {"nlist": 1024},
... }
>>> collection.create_index("float_vector", index_param)
>>> utility.wait_for_index_building_complete("test_collection", "")  # block until the index is built
>>> utility.index_building_progress("test_collection", "")
>>> utility.loading_progress("test_collection")
-
pymilvus.utility.has_collection(collection_name, using='default')
Checks whether a specified collection exists.
- Parameters
collection_name (str) -- The name of the collection to check.
- Return bool
Whether the collection exists.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> utility.has_collection("test_collection")
-
pymilvus.utility.has_partition(collection_name, partition_name, using='default')
Checks if a specified partition exists in a collection.
- Parameters
collection_name (str) -- The name of the collection that the partition belongs to.
partition_name (str) -- The name of the partition to check.
- Return bool
Whether the partition exists.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> utility.has_partition("test_collection", "_default")
-
pymilvus.utility.
list_collections
(timeout=None, using='default') → list
Returns a list of all collection names.
- Parameters
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Return list[str]
List of collection names, returned when the operation is successful.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="test")
>>> collection = Collection(name="test_collection", schema=schema)
>>> utility.list_collections()
-
pymilvus.utility.
drop_collection
(collection_name, timeout=None, using='default')
Drop a collection by name.
- Parameters
collection_name (str) -- The name of the collection to drop.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> connections.connect(alias="default")
>>> schema = CollectionSchema(fields=[
...     FieldSchema("int64", DataType.INT64, description="int64", is_primary=True),
...     FieldSchema("float_vector", DataType.FLOAT_VECTOR, is_primary=False, dim=128),
... ])
>>> collection = Collection(name="drop_collection_test", schema=schema)
>>> utility.has_collection("drop_collection_test")
True
>>> utility.drop_collection("drop_collection_test")
>>> utility.has_collection("drop_collection_test")
False
-
pymilvus.utility.
calc_distance
(vectors_left, vectors_right, params=None, timeout=None, using='default')
Calculate distance between two vector arrays.
- Parameters
vectors_left (dict) -- The vectors on the left of the operator, in one of these forms:
{"ids": [1, 2, 3, .... n], "collection": "c_1", "partition": "p_1", "field": "v_1"}
{"float_vectors": [[1.0, 2.0], [3.0, 4.0], ... [9.0, 10.0]]}
{"bin_vectors": [b'', b'N', ... b'\xca']}
vectors_right (dict) -- The vectors on the right of the operator, in one of these forms:
{"ids": [1, 2, 3, .... n], "collection": "col_1", "partition": "p_1", "field": "v_1"}
{"float_vectors": [[1.0, 2.0], [3.0, 4.0], ... [9.0, 10.0]]}
{"bin_vectors": [b'', b'N', ... b'\xca']}
params (dict) -- Key-value pair parameters:
"metric_type" / "metric": "L2", "IP", "HAMMING", or "TANIMOTO". Default is "L2".
"sqrt": True or False. Default is False. Only for "L2" distance.
"dim": Set this value if the dimension is not a multiple of 8; otherwise the dimension is calculated from the list length. Only for "HAMMING" and "TANIMOTO".
Examples of supported params:
{"metric_type": "L2", "sqrt": True}
{"metric_type": "IP"}
{"metric_type": "HAMMING", "dim": 17}
{"metric_type": "TANIMOTO"}
Note: metric types are case insensitive.
- Returns
2-d array of distances
- Return type
list[list[int]] for "HAMMING" or list[list[float]] for others.
Assume vectors_left is L_1, L_2, L_3 and vectors_right is R_a, R_b. The distance between L_n and R_m is called "D_n_m". The returned distances are arranged like this:
[[D_1_a, D_1_b],
[D_2_a, D_2_b],
[D_3_a, D_3_b]]
Note: if some vectors don't exist in the collection, the returned distance is "-1.0".
- Example
>>> import random
>>> from pymilvus import connections, utility
>>> connections.connect()
>>> vectors_l = [[random.random() for _ in range(64)] for _ in range(5)]
>>> vectors_r = [[random.random() for _ in range(64)] for _ in range(10)]
>>> op_l = {"float_vectors": vectors_l}
>>> op_r = {"float_vectors": vectors_r}
>>> params = {"metric": "L2", "sqrt": True}
>>> results = utility.calc_distance(vectors_left=op_l, vectors_right=op_r, params=params)
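To make the row and column layout of the result concrete, here is a pure-Python sketch that builds the same kind of matrix locally. It is not how Milvus computes distances; `l2_distance_matrix` is a hypothetical helper, and returning the squared distance when `sqrt` is False is an assumption inferred from the `"sqrt"` parameter description.

```python
import math


def l2_distance_matrix(left, right, sqrt=False):
    """Return rows[n][m] = L2 distance between left[n] and right[m].

    Assumption: with sqrt=False the squared distance is returned.
    """
    rows = []
    for l_vec in left:
        row = []
        for r_vec in right:
            squared = sum((a - b) ** 2 for a, b in zip(l_vec, r_vec))
            row.append(math.sqrt(squared) if sqrt else squared)
        rows.append(row)
    return rows


left = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]]  # L_1, L_2, L_3
right = [[0.0, 0.0], [3.0, 0.0]]             # R_a, R_b

distances = l2_distance_matrix(left, right, sqrt=True)
# One row per left vector, one column per right vector.
print(len(distances), len(distances[0]))  # 3 2
print(distances[1])                       # [5.0, 4.0] -> D_2_a, D_2_b
```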
-
pymilvus.utility.
get_query_segment_info
(collection_name, timeout=None, using='default')
Notifies Proxy to return segments information from query nodes.
- Parameters
collection_name (str) -- The name of the collection to get segment information for.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Returns
Information about the growing segments in the query cluster.
- Return type
QuerySegmentInfo
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import random
>>> import pandas as pd
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="get collection entities num")
>>> collection = Collection(name="test_get_segment_info", schema=schema)
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> float_vector_series = [[random.random() for _ in range(_DIM)] for _ in range(10)]
>>> data = pd.DataFrame({"int64": int64_series, "float_vector": float_vector_series})
>>> collection.insert(data)
>>> collection.load()  # load collection to memory
>>> res = utility.get_query_segment_info("test_get_segment_info")
-
pymilvus.utility.
mkts_from_hybridts
(hybridts, milliseconds=0.0, delta=None)
Generate a hybrid timestamp based on an existing hybrid timestamp, a timedelta, and an incremental time interval.
- Parameters
hybridts (int) -- The original hybrid timestamp used to generate a new hybrid timestamp. A non-negative integer in the range 0 to 18446744073709551615.
milliseconds (float) -- Incremental time interval, in milliseconds.
delta (datetime.timedelta) -- A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
- Return int
The hybrid timestamp, a non-negative integer in the range 0 to 18446744073709551615.
- Example
>>> from pymilvus import Collection, FieldSchema, CollectionSchema, DataType, connections, utility
>>> import random
>>> import pandas as pd
>>> connections.connect(alias="default")
>>> _DIM = 128
>>> field_int64 = FieldSchema("int64", DataType.INT64, description="int64", is_primary=True)
>>> field_float_vector = FieldSchema("float_vector", DataType.FLOAT_VECTOR, description="float_vector", is_primary=False, dim=_DIM)
>>> schema = CollectionSchema(fields=[field_int64, field_float_vector], description="get collection entities num")
>>> collection = Collection(name="test_collection", schema=schema)
>>> int64_series = pd.Series(data=list(range(10, 20)), index=list(range(10)))
>>> float_vector_series = [[random.random() for _ in range(_DIM)] for _ in range(10)]
>>> data = pd.DataFrame({"int64": int64_series, "float_vector": float_vector_series})
>>> m = collection.insert(data)
>>> ts_new = utility.mkts_from_hybridts(m.timestamp, milliseconds=1000.0)
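The upper bound quoted for hybrid timestamps, 18446744073709551615, is the largest unsigned 64-bit integer (2**64 - 1). A small range check, written as a sketch with the hypothetical names `MAX_HYBRIDTS` and `is_valid_hybridts`, could look like this:

```python
MAX_HYBRIDTS = 2**64 - 1  # 18446744073709551615, the documented upper bound


def is_valid_hybridts(value):
    """Return True if value is an int within the documented hybrid timestamp range."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 0 <= value <= MAX_HYBRIDTS


print(is_valid_hybridts(0))
print(is_valid_hybridts(2**64))
```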
-
pymilvus.utility.
mkts_from_unixtime
(epoch, milliseconds=0.0, delta=None)
Generate a hybrid timestamp based on Unix Epoch time, a timedelta, and an incremental time interval.
- Parameters
epoch (float) -- The known Unix Epoch time used to generate a hybrid timestamp. The Unix Epoch time is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT).
milliseconds (float) -- Incremental time interval, in milliseconds.
delta (datetime.timedelta) -- A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
- Return int
The hybrid timestamp, a non-negative integer in the range 0 to 18446744073709551615.
- Example
>>> import time
>>> from pymilvus import utility
>>> epoch_t = time.time()
>>> ts = utility.mkts_from_unixtime(epoch_t, milliseconds=1000.0)
-
pymilvus.utility.
mkts_from_datetime
(d_time, milliseconds=0.0, delta=None)
Generate a hybrid timestamp based on a datetime, a timedelta, and an incremental time interval.
- Parameters
d_time (datetime.datetime) -- The known datetime used to generate a hybrid timestamp.
milliseconds (float) -- Incremental time interval, in milliseconds.
delta (datetime.timedelta) -- A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
- Return int
The hybrid timestamp, a non-negative integer in the range 0 to 18446744073709551615.
- Example
>>> import datetime
>>> from pymilvus import utility
>>> d = datetime.datetime.now()
>>> ts = utility.mkts_from_datetime(d, milliseconds=1000.0)
-
pymilvus.utility.
hybridts_to_unixtime
(hybridts)
Convert a hybrid timestamp to UNIX Epoch time, ignoring the logical part.
- Parameters
hybridts (int) -- The known hybrid timestamp to convert to UNIX Epoch time. A non-negative integer in the range 0 to 18446744073709551615.
- Return float
The Unix Epoch time is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT).
- Example
>>> import time
>>> from pymilvus import utility
>>> epoch1 = time.time()
>>> ts = utility.mkts_from_unixtime(epoch1)
>>> epoch2 = utility.hybridts_to_unixtime(ts)
>>> # Compare with a tolerance: the round trip may drop sub-millisecond precision.
>>> assert abs(epoch1 - epoch2) < 1e-3
-
pymilvus.utility.
hybridts_to_datetime
(hybridts, tz=None)
Convert a hybrid timestamp to a datetime according to the timezone.
- Parameters
hybridts (int) -- The known hybrid timestamp to convert to datetime. A non-negative integer in the range 0 to 18446744073709551615.
tz (datetime.timezone) -- Timezone defined by a fixed offset from UTC. If argument tz is None or not specified, the hybridts is converted to the platform’s local date and time.
- Return datetime
The datetime object.
- Raises
Exception -- If parameter tz is not of type datetime.timezone.
- Example
>>> import time
>>> from pymilvus import utility
>>> epoch_t = time.time()
>>> ts = utility.mkts_from_unixtime(epoch_t)
>>> d = utility.hybridts_to_datetime(ts)
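The `tz` argument expects a `datetime.timezone`, which the standard library builds from a fixed UTC offset. The sketch below uses only the standard library and does not call pymilvus; it shows how such a value is constructed and how the same instant renders under two offsets. Whether `hybridts_to_datetime` uses `datetime.fromtimestamp` internally is not stated in this reference, so treat the parallel as illustrative.

```python
from datetime import datetime, timedelta, timezone

# A fixed-offset timezone of the kind the tz parameter expects.
utc_plus_8 = timezone(timedelta(hours=8))

epoch = 0.0  # January 1, 1970, midnight UTC
as_utc = datetime.fromtimestamp(epoch, tz=timezone.utc)
as_plus_8 = datetime.fromtimestamp(epoch, tz=utc_plus_8)

print(as_utc.isoformat())     # 1970-01-01T00:00:00+00:00
print(as_plus_8.isoformat())  # 1970-01-01T08:00:00+08:00
print(as_utc == as_plus_8)    # True: same instant, different wall-clock rendering
```

A value built this way could then be passed as `utility.hybridts_to_datetime(ts, tz=utc_plus_8)`.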
-
pymilvus.utility.
create_alias
(collection_name: str, alias: str, timeout=None, using='default')
Specify an alias for a collection. An alias cannot be duplicated: you can't assign the same alias to different collections. You can, however, specify multiple aliases for one collection. For example:
- before create_alias("collection_1", "bob"):
aliases of collection_1 are ["tom"]
- after create_alias("collection_1", "bob"):
aliases of collection_1 are ["tom", "bob"]
- Parameters
collection_name (str) -- The name of the collection to assign the alias to.
alias (str) -- The alias of the collection.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Raises
CollectionNotExistException -- If the collection does not exist.
BaseException -- If the alias failed to create.
- Example
>>> from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_create_alias", schema)
>>> utility.create_alias(collection.name, "alias")
Status(code=0, message='')
-
pymilvus.utility.
alter_alias
(collection_name: str, alias: str, timeout=None, using='default')
Change the alias of a collection to another collection. Raises an error if the alias doesn't exist. An alias cannot be duplicated: you can't assign the same alias to different collections. This API can change the collection that owns an alias. For example:
- before alter_alias("collection_2", "bob"):
collection_1's aliases = ["bob"]
collection_2's aliases = []
- after alter_alias("collection_2", "bob"):
collection_1's aliases = []
collection_2's aliases = ["bob"]
- Parameters
collection_name (str) -- The name of the collection to which this alias is going to be reassigned.
alias (str) -- The alias of the collection.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Raises
CollectionNotExistException -- If the collection does not exist.
BaseException -- If the alias failed to alter.
- Example
>>> from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_alter_alias", schema)
>>> utility.alter_alias(collection.name, "alias")
If the alias exists, this returns Status(code=0, message=''); otherwise it returns Status(code=1, message='alias does not exist').
-
pymilvus.utility.
drop_alias
(alias: str, timeout=None, using='default')
Delete the alias. There is no need to provide a collection name, because an alias can only be assigned to one collection and the server knows which collection it belongs to. For example:
- before drop_alias("bob"):
aliases of collection_1 are ["tom", "bob"]
- after drop_alias("bob"):
aliases of collection_1 are ["tom"]
- Parameters
alias (str) -- The alias to drop.
timeout (float) -- An optional duration of time in seconds to allow for the RPC. When timeout is set to None, the client waits until the server responds or an error occurs.
- Raises
CollectionNotExistException -- If the collection does not exist.
BaseException -- If the alias doesn't exist.
- Example
>>> from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility
>>> connections.connect()
>>> schema = CollectionSchema([
...     FieldSchema("film_id", DataType.INT64, is_primary=True),
...     FieldSchema("films", dtype=DataType.FLOAT_VECTOR, dim=2)
... ])
>>> collection = Collection("test_collection_drop_alias", schema)
>>> utility.create_alias(collection.name, "alias")
>>> utility.drop_alias("alias")
Status(code=0, message='')
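The ownership rules described for create_alias, alter_alias, and drop_alias can be summarized with a toy in-memory model. This is only a sketch of the documented semantics, not Milvus code: the `AliasRegistry` class and its exception types are hypothetical, and it ignores collection existence checks entirely.

```python
class AliasRegistry:
    """Toy model: each alias maps to exactly one collection."""

    def __init__(self):
        self._owner = {}  # alias -> collection name

    def create_alias(self, collection_name, alias):
        # An alias cannot be duplicated across collections.
        if alias in self._owner:
            raise ValueError(f"alias {alias!r} already exists")
        self._owner[alias] = collection_name

    def alter_alias(self, collection_name, alias):
        # Reassign an existing alias to another collection.
        if alias not in self._owner:
            raise KeyError(f"alias {alias!r} does not exist")
        self._owner[alias] = collection_name

    def drop_alias(self, alias):
        # No collection name needed: the alias identifies its owner.
        if alias not in self._owner:
            raise KeyError(f"alias {alias!r} does not exist")
        del self._owner[alias]

    def aliases_of(self, collection_name):
        return [a for a, c in self._owner.items() if c == collection_name]


registry = AliasRegistry()
registry.create_alias("collection_1", "tom")
registry.create_alias("collection_1", "bob")
print(registry.aliases_of("collection_1"))  # ['tom', 'bob']

registry.alter_alias("collection_2", "bob")
print(registry.aliases_of("collection_1"))  # ['tom']
print(registry.aliases_of("collection_2"))  # ['bob']

registry.drop_alias("bob")
print(registry.aliases_of("collection_2"))  # []
```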