
do_bulk_insert()

This operation bulk-inserts data from specified files.

Request Syntax

do_bulk_insert(
    collection_name: str,
    files: list[str],
    partition_name: str | None = None,
    timeout: float | None = None,
    using: str = "default",
    **kwargs,
)

PARAMETERS:

  • collection_name (str) -

    [REQUIRED]

    The name of the target collection of this operation.

  • files (list[str]) -

    [REQUIRED]

    A list of paths to the files that contain the source data.

    How can I prepare the source data files?

    • You can include a JSON file (.json) or a set of NumPy files (.npy) as the source data files.

    • A valid JSON file has a root key named rows, whose value is a list of dictionaries. Each dictionary represents an entity that matches the schema of the target collection.

    If the target collection allows dynamic fields, include the dynamic fields and their values in each entity dictionary.

    • A valid set of NumPy files should be named after the fields in the schema of the target collection, and the data in them should match the corresponding field definitions.

    If the target collection allows dynamic fields, create an extra file named $meta.npy to include the dynamic fields and their values.

    For details on preparing the source data files, refer to Insert Entities from Files.

    • You have to upload the source data files to the bucket defined by minio.bucketname in your Milvus configuration before running this operation.

    As an example, consider a Milvus instance set up using Docker Compose whose bucket name is a-bucket.

    • If you upload the source data files to this bucket, you should include only the file names with extensions in the files list. For example, files=["id.npy", "vector.npy"] or files=["data.json"].

    • If you upload the source data files to a sub-directory in this bucket, you should include the file paths relative to the bucket. For example, if the sub-directory is data, the parameter settings should be files=["data/id.npy", "data/vector.npy"] or files=["data/data.json"].

    • To find the name of the MinIO bucket your Milvus instance uses, check the minio.bucketname setting in your Milvus configuration, or log in to the MinIO server and look at its buckets.

  • partition_name (str) -

    The name of a partition in the specified collection.

    Setting this makes Milvus bulk-insert the data into the specified partition.

    Setting this to the name of a partition that does not exist results in a MilvusException.

  • using (str) -

    The alias of the employed connection.

    The default value is default, indicating that this operation employs the default connection.

  • timeout (float | None) -

    The timeout duration for this operation, in seconds. Setting this to None means that this operation waits until a response arrives or an error occurs.
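
The requirements on the files parameter can be illustrated with a short sketch. It writes a JSON source file with the rows root key and builds files lists relative to the bucket. The helper names write_rows_json and bucket_relative_paths, the id and vector fields, and the data sub-directory are assumptions made for illustration; they are not part of pymilvus, and uploading the file to the bucket is left out.

```python
import json
import posixpath
import tempfile
from pathlib import Path


def write_rows_json(entities, path):
    """Write entities to a JSON file whose root key is "rows"."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"rows": entities}, f)


def bucket_relative_paths(file_names, sub_directory=""):
    """Return paths relative to the bucket root, as expected by files."""
    return [posixpath.join(sub_directory, name) for name in file_names]


# Hypothetical entities; they must match the schema of your target collection.
entities = [
    {"id": 1, "vector": [0.1, 0.2, 0.3, 0.4]},
    {"id": 2, "vector": [0.5, 0.6, 0.7, 0.8]},
]

# Write the source file locally; it still has to be uploaded to the bucket.
local_path = Path(tempfile.mkdtemp()) / "data.json"
write_rows_json(entities, local_path)

# Files uploaded to the bucket root versus to a "data" sub-directory.
root_files = bucket_relative_paths(["data.json"])
sub_dir_files = bucket_relative_paths(["id.npy", "vector.npy"], "data")
print(root_files)     # ['data.json']
print(sub_dir_files)  # ['data/id.npy', 'data/vector.npy']
```

Either list can then be passed as the files argument once the files are in the bucket.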

RETURN TYPE:

int

RETURNS: A bulk-insert task ID.

EXCEPTIONS:

  • MilvusException

    This exception will be raised when any error occurs during this operation.
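
One way to handle this exception is sketched below. The wrapper name try_bulk_insert is hypothetical, and the sketch assumes that a connection has already been established with connections.connect() and that MilvusException is importable from the top-level pymilvus package.

```python
def try_bulk_insert(collection_name, files, partition_name=None):
    """Return the bulk-insert task ID, or None if Milvus reports an error."""
    # Imported here so the sketch can be loaded without pymilvus installed.
    from pymilvus import MilvusException, utility

    try:
        return utility.do_bulk_insert(
            collection_name=collection_name,
            files=files,
            partition_name=partition_name,
        )
    except MilvusException as e:
        # For example, the named partition does not exist.
        print(f"Bulk insert failed: {e}")
        return None
```

Returning None keeps the caller's control flow simple; re-raising the exception is equally reasonable if the caller should decide how to recover.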

Examples

from pymilvus import connections, utility

# Connect to localhost:19530
connections.connect()

# Bulk-insert data from a set of NumPy files already uploaded to the MinIO server
utility.do_bulk_insert(
    collection_name="test_collection",
    files=["data/id.npy", "data/vector.npy"],
)

# 446781855410073001

# Bulk-insert data from a JSON file already uploaded to the MinIO server
utility.do_bulk_insert(
    collection_name="test_collection",
    files=["data/data.json"],
)

# 446781855410077319
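
The returned task ID can be used to follow the task afterwards. The sketch below is a minimal polling loop, not a definitive implementation. It assumes pymilvus's utility.get_bulk_insert_state() and the BulkInsertState constants, which this page does not describe; the function name wait_for_bulk_insert and its poll_interval and max_wait parameters are hypothetical.

```python
import time


def wait_for_bulk_insert(task_id, poll_interval=2.0, max_wait=300.0):
    """Poll a bulk-insert task until it completes, fails, or max_wait elapses."""
    # Imported here so the sketch can be loaded without pymilvus installed.
    from pymilvus import BulkInsertState, utility

    deadline = time.monotonic() + max_wait
    while True:
        task = utility.get_bulk_insert_state(task_id=task_id)
        if task.state == BulkInsertState.ImportCompleted:
            return True
        if task.state == BulkInsertState.ImportFailed:
            print(f"Bulk insert failed: {task.failed_reason}")
            return False
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} did not finish in {max_wait} s")
        time.sleep(poll_interval)
```

A typical call passes the value returned by do_bulk_insert(), for example wait_for_bulk_insert(task_id).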
