VolumeBulkWriter

A VolumeBulkWriter instance rewrites your raw data locally in a format that Milvus understands, and then uploads the resulting files to a remote volume in Zilliz Cloud.

class pymilvus.bulk_writer.VolumeBulkWriter(LocalBulkWriter)

Constructor

VolumeBulkWriter(
    schema: CollectionSchema,
    remote_path: str,
    cloud_endpoint: str,
    api_key: str,
    volume_name: str,
    chunk_size: int = 1024 * MB,
    file_type: BulkFileType = BulkFileType.PARQUET,
    config: Optional[dict] = None,
    **kwargs,
)

PARAMETERS:

schema (CollectionSchema) - [REQUIRED]

The schema of the target collection into which the rewritten data is to be imported.

remote_path (str) - [REQUIRED]

The path to the directory in the remote volume that is to hold the rewritten data.

cloud_endpoint (str) - [REQUIRED]

The endpoint URL of the Zilliz Cloud instance.

api_key (str) - [REQUIRED]

The API key used to authenticate with the Zilliz Cloud instance.

volume_name (str) - [REQUIRED]

The name of the remote volume in Zilliz Cloud to which the files are uploaded.

chunk_size (int) -

The maximum size of a file segment.

While rewriting your raw data, the writer splits the data into batches and stores each batch in a separate file.

The value defaults to 1,073,741,824 bytes (1 GB).

file_type (BulkFileType) -

The file type of the output files.

The value defaults to BulkFileType.PARQUET.

config (dict) -

Optional configuration parameters for the bulk writer.

The value defaults to None.
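To make the chunk_size arithmetic concrete, the following is a minimal sketch that estimates how many file segments a dataset of a given size would be split into. The helper estimate_segment_count is hypothetical and not part of pymilvus, and it assumes MB means 1024 * 1024 bytes, which is consistent with the 1,073,741,824-byte default stated above. The real number of output files can differ, because the size on disk depends on the file type and its encoding.

```python
MB = 1024 * 1024  # assumption: bytes per MB, consistent with the documented default


def estimate_segment_count(total_bytes: int, chunk_size: int = 1024 * MB) -> int:
    """Hypothetical helper: ceiling of total_bytes / chunk_size."""
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (total_bytes + chunk_size - 1) // chunk_size


default_chunk_size = 1024 * MB
print(default_chunk_size)  # 1073741824

five_gb = 5 * 1024 * MB
print(estimate_segment_count(five_gb))                       # 5
print(estimate_segment_count(five_gb, chunk_size=512 * MB))  # 10
```

A smaller chunk_size yields more, smaller files; a larger one yields fewer files but needs more local buffering before each file is written.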

Notes

A VolumeBulkWriter is a context manager and can be used in a with statement. When the context exits, the local working directory is cleaned up.

Properties

The following are the properties of the VolumeBulkWriter class.

data_path (str)

Returns the remote path where the uploaded files are stored.

batch_files (List[List[str]])

Returns the list of uploaded file batches. Each inner list contains the remote paths of files uploaded in a single commit.
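The nested shape of batch_files can be easier to see with a small sketch. The value below is a made-up stand-in for writer.batch_files: the paths are hypothetical and only illustrate the List[List[str]] structure, with one inner list per commit. The helpers files_per_commit and flatten_batches are also assumptions for illustration, not part of pymilvus.

```python
from typing import List, Tuple

# Hypothetical stand-in for `writer.batch_files`; real paths will differ.
batch_files: List[List[str]] = [
    ["/data/bulk_import/commit-1/1.parquet"],
    [
        "/data/bulk_import/commit-2/1.parquet",
        "/data/bulk_import/commit-2/2.parquet",
    ],
]


def files_per_commit(batches: List[List[str]]) -> List[Tuple[int, int]]:
    """Return (commit number, file count) pairs, numbering commits from 1."""
    return [(number, len(files)) for number, files in enumerate(batches, start=1)]


def flatten_batches(batches: List[List[str]]) -> List[str]:
    """Collapse the per-commit lists into one flat list of remote paths."""
    return [path for files in batches for path in files]


for number, count in files_per_commit(batch_files):
    print(f"commit {number}: {count} file(s)")

all_paths = flatten_batches(batch_files)
print(len(all_paths))  # 3
```

Keeping the nested form preserves which files belong to the same commit; the flat form is mainly convenient for logging or counting.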

Examples

from pymilvus.bulk_writer.volume_bulk_writer import VolumeBulkWriter
from pymilvus.bulk_writer import BulkFileType
from pymilvus import CollectionSchema, FieldSchema, DataType

# Define collection schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=128),
]
# The second argument of CollectionSchema is a description, not a collection name
schema = CollectionSchema(fields, description="example_collection")

# Create VolumeBulkWriter
with VolumeBulkWriter(
    schema=schema,
    remote_path="/data/bulk_import",
    cloud_endpoint="https://your-cloud-endpoint.zillizcloud.com",
    api_key="your-api-key",
    volume_name="my-volume",
    chunk_size=1024 * 1024 * 1024,
    file_type=BulkFileType.PARQUET,
) as writer:
    # Append rows
    for i in range(1000):
        writer.append_row({
            "id": i,
            "vector": [0.1] * 128,
        })

    # Commit and upload
    writer.commit()

    # Inspect where the files were uploaded and which files each commit produced
    print(writer.data_path)
    print(writer.batch_files)
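The example above appends the same constant vector for every row. As a minimal sketch of preparing more varied rows, the snippet below builds row dictionaries that match the example schema (an integer id and a 128-dimensional float vector) and checks them locally before they would be passed to append_row. The helpers make_row and check_row are hypothetical and not part of pymilvus; the writer may perform its own validation, so this is only an early local check.

```python
import random
from typing import Any, Dict

DIM = 128  # matches dim=128 on the "vector" field in the example schema


def make_row(row_id: int, dim: int = DIM, seed: int = 0) -> Dict[str, Any]:
    """Hypothetical helper: build one row with a reproducible random vector."""
    rng = random.Random(seed + row_id)  # per-row seed keeps results repeatable
    return {"id": row_id, "vector": [rng.random() for _ in range(dim)]}


def check_row(row: Dict[str, Any], dim: int = DIM) -> None:
    """Hypothetical helper: raise if the row does not fit the example schema."""
    if not isinstance(row.get("id"), int):
        raise TypeError("'id' must be an int")
    vector = row.get("vector")
    if not isinstance(vector, list) or len(vector) != dim:
        raise ValueError(f"'vector' must be a list of {dim} floats")
    if not all(isinstance(value, float) for value in vector):
        raise TypeError("'vector' must contain only floats")


rows = [make_row(i) for i in range(1000)]
for row in rows:
    check_row(row)  # each checked row could then be passed to writer.append_row(row)

print(len(rows))               # 1000
print(len(rows[0]["vector"]))  # 128
```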
