
Insert Entities from Files

Milvus 2.2 now supports inserting a batch of entities from a file. Compared to the insert() method, this feature reduces network transmission between the Milvus client, proxy, Pulsar, and data nodes. You can now import a batch of entities from a file into a collection with just one line of code.

This topic describes how to insert multiple entities in a batch from a JSON file.

Prepare a JSON file

Organize the data you want to insert in a row-based JSON file. You can name the file as you wish, but the root key in the file must be "rows".

In the file, each entity corresponds to a dictionary. The keys of the dictionary are the field names, and the values are the corresponding field values. The entities in the file must match the collection schema.

For binary vectors, use a uint8 array. Each uint8 value represents 8 dimensions, and each value must fall within [0, 255]. For example, [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1] is a 16-dimensional binary vector and should be written as [128, 7] in a JSON file.
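As an illustration, the bit-to-byte conversion can be sketched in plain Python. The helper name below is ours for illustration, not part of pymilvus:

```python
def pack_binary_vector(bits):
    """Pack a list of 0/1 dimensions into uint8 values, 8 dimensions per byte (MSB first)."""
    if len(bits) % 8 != 0:
        raise ValueError("binary vector dimension must be a multiple of 8")
    return [
        sum(bit << (7 - i) for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    ]

# The 16-dimensional example above packs into two uint8 values.
print(pack_binary_vector([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]))  # [128, 7]
```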

The file size should be no greater than 1 GB.

The following is an example of a row-based JSON file.

{
  "rows":[
    {"book_id": 101, "word_count": 13, "book_intro": [1.1, 1.2]},
    {"book_id": 102, "word_count": 25, "book_intro": [2.1, 2.2]},
    {"book_id": 103, "word_count": 7, "book_intro": [3.1, 3.2]},
    {"book_id": 104, "word_count": 12, "book_intro": [4.1, 4.2]},
    {"book_id": 105, "word_count": 34, "book_intro": [5.1, 5.2]}
  ]
}
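A file like this can also be generated programmatically. Here is a minimal sketch using only the standard library, with field names taken from the example above:

```python
import json

# Rows matching the example schema: an int primary key, a scalar field,
# and a 2-dimensional float vector.
rows = [
    {"book_id": 100 + i, "word_count": wc, "book_intro": [i + 0.1, i + 0.2]}
    for i, wc in zip(range(1, 6), [13, 25, 7, 12, 34])
]

# The root key must be "rows"; the file name is up to you.
with open("test.json", "w") as f:
    json.dump({"rows": rows}, f, indent=2)
```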

Insert entities from files

1. Upload data files

You can use either MinIO or a local hard disk for storage in Milvus.

Local hard disk storage is only available in Milvus Standalone.

  • To use MinIO for storage, upload the data files to the bucket defined by minio.bucketName in the milvus.yaml configuration file.

  • To use local storage, copy the data files into a directory on the local disk.

2. Insert entities

from pymilvus import utility
task_id = utility.do_bulk_insert(
    collection_name="book",
    partition_name="2022",
    files=["test.json"]
)
| Parameter | Description |
| --- | --- |
| collection_name | The name of the collection to insert entities into. |
| files | The file to insert. The value should be a file path relative to the storage root path or a MinIO bucket. Currently, only one JSON file path is allowed. |
| partition_name (optional) | The name of the partition to insert entities into. |
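Because a data file may not exceed 1 GB (see the Limits section), a quick size pre-check before submitting the task can avoid a failed import. A minimal sketch; the helper name is ours, not a pymilvus API:

```python
import os

MAX_DATA_FILE_BYTES = 1 * 1024 ** 3  # 1 GB per-file limit

def check_import_file(path):
    """Hypothetical pre-check: ensure the file is within the 1 GB size limit."""
    size = os.path.getsize(path)
    if size > MAX_DATA_FILE_BYTES:
        raise ValueError(f"{path} is {size} bytes, over the 1 GB limit")
    return size
```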

List tasks

Check task state

Since the utility.do_bulk_insert() method is asynchronous, you need to check if a file import task is completed.

from pymilvus import utility, BulkInsertState

task = utility.get_bulk_insert_state(task_id=task_id)
print("Task state:", task.state_name())
print("Imported files:", task.files)
print("Collection name:", task.collection_name)
print("Partition name:", task.partition_name)
print("Start time:", task.create_time_str)
print("Imported row count:", task.row_count)
print("Entities ID array generated by this task:", task.ids())

if task.state == BulkInsertState.ImportFailed:
    print("Failed reason:", task.failed_reason)
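Since the task runs asynchronously, a common pattern is to poll until it reaches a terminal state. The helper below is our own sketch, written against any state-fetching callable so the polling logic is independent of the client; in practice you would pass something like lambda: utility.get_bulk_insert_state(task_id=task_id).state_name(), and the terminal state names follow the table below:

```python
import time

# Terminal states: the task will not progress past these.
TERMINAL_STATES = {"Failed", "Completed", "Failed and cleaned"}

def wait_for_bulk_insert(get_state_name, poll_interval=2.0, timeout=600.0):
    """Poll a state-fetching callable until the task finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state_name()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_interval)
    raise TimeoutError("bulk insert task did not reach a terminal state in time")
```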

The following table lists the state of a file import task.

| State | Code | Description |
| --- | --- | --- |
| Pending | 0 | The task is pending. |
| Failed | 1 | The task failed. Use task.failed_reason to understand why. |
| Started | 2 | The task has been assigned to a data node and will be executed soon. |
| Persisted | 5 | New data segments have been generated and persisted. |
| Completed | 6 | New data segments are indexed if a collection index is specified. Otherwise, the task state changes from Persisted to Completed directly. |
| Failed and cleaned | 7 | The task failed and all temporary data generated by it has been cleaned up. |

List all tasks

tasks = utility.list_bulk_insert_tasks(collection_name="book", limit=10)
for task in tasks:
    print(task)
| Parameter | Description |
| --- | --- |
| collection_name (optional) | Specify the target collection name to list all tasks on this collection. Leave the value empty to list all tasks recorded by the Milvus root coord. |
| limit (optional) | Specify this parameter to limit the number of returned tasks. |

See System Configurations for more information about import task configurations.

Limits

| Feature | Maximum limit |
| --- | --- |
| Max size of the task pending list | 32 |
| Max size of a data file | 1 GB |

What's next

Learn more about the basic operations of Milvus:
