RemoteBulkWriter

A RemoteBulkWriter instance writes your raw data in a format that Milvus understands into an AWS-S3-compatible bucket.

class pymilvus.RemoteBulkWriter

Constructor

Constructs a RemoteBulkWriter object with a set of parameters, such as schema, remote_path, and connect_param.

notes

A RemoteBulkWriter object rewrites your raw data in a format that Milvus understands and stores the result in an AWS-S3-compatible bucket.

from pymilvus import CollectionSchema
from pymilvus.bulk_writer import RemoteBulkWriter, BulkFileType

writer = RemoteBulkWriter(
    schema=CollectionSchema(),
    remote_path="string",
    connect_param=RemoteBulkWriter.ConnectParam(),
    chunk_size=512*1024*1024,
    file_type=BulkFileType.PARQUET
)

PARAMETERS:

  • schema (CollectionSchema) -

    [REQUIRED]

    The schema of a target collection to which the rewritten data is to be imported.

  • remote_path (str) -

    [REQUIRED]

    The path to the directory that is to hold the rewritten data.

  • connect_param (ConnectParam) -

    The parameters used to connect to a remote bucket.

  • chunk_size (int) -

    The maximum size of a file segment.

    While rewriting your raw data, BulkWriter splits it into segments.

    The value defaults to 536,870,912 bytes (512 MB).

    How does BulkWriter segment my data?

    The way BulkWriter segments your data varies with the target file type.

    If the generated file exceeds the specified segment size, BulkWriter creates multiple files, each no larger than the segment size, and names them with sequence numbers.

  • file_type (BulkFileType) -

    The type of the output file.

    The value defaults to BulkFileType.PARQUET.

    Possible options are BulkFileType.JSON, BulkFileType.PARQUET, BulkFileType.CSV.

  • config (dict) -

    A dictionary specifying optional configurations for processing CSV files. This parameter is available only when file_type is set to BulkFileType.CSV. Example configuration:

    config={
        "sep": "\t",
        "nullkey": "NULL"
    }
    
    • sep (string)

      The delimiter of the CSV file. The value must be a string of length 1 and defaults to ",". The following strings are not allowed: "\0", "\n", "\r", "\"".

    • nullkey (string)

      A special string that represents a null value. The value defaults to an empty string: "".
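
The sep and nullkey rules above can be illustrated with Python's standard csv module. This is only a sketch of the documented semantics, not the BulkWriter implementation: validate_sep and rows_to_csv are hypothetical helpers introduced here, and the sample rows are made up for illustration.

```python
import csv
import io

# Delimiters the documentation lists as not allowed.
FORBIDDEN_SEPS = {"\0", "\n", "\r", '"'}


def validate_sep(sep):
    """Return True if sep is a single character that is not forbidden."""
    return isinstance(sep, str) and len(sep) == 1 and sep not in FORBIDDEN_SEPS


def rows_to_csv(rows, config=None):
    """Hypothetical helper: render rows as CSV text using sep and nullkey."""
    config = config or {}
    sep = config.get("sep", ",")        # documented default delimiter
    nullkey = config.get("nullkey", "")  # documented default null marker
    if not validate_sep(sep):
        raise ValueError(f"invalid sep: {sep!r}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    for row in rows:
        # Replace None with the configured null marker before writing.
        writer.writerow([nullkey if value is None else value for value in row])
    return buffer.getvalue()


text = rows_to_csv(
    [[1, "apple", None], [2, None, 3.5]],
    {"sep": "\t", "nullkey": "NULL"},
)
print(text)
```

With the example configuration, null values appear as NULL and fields are separated by tabs; with no configuration, nulls become empty fields separated by commas.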

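The chunk_size behavior described in the parameter list can be pictured with a small, library-independent sketch. It assumes rows are measured by their JSON-serialized size and that files are named 1.json, 2.json, and so on; both are assumptions for illustration, and split_into_files is a hypothetical helper rather than a pymilvus function. The actual segmentation logic varies with the target file type.

```python
import json

CHUNK_SIZE = 512 * 1024 * 1024  # 536,870,912 bytes, the documented default


def split_into_files(rows, chunk_size=CHUNK_SIZE):
    """Group rows into sequentially numbered files that stay within chunk_size.

    Hypothetical helper for illustration; not part of pymilvus.
    A single row larger than chunk_size still gets a file of its own.
    """
    files = {}  # file name -> rows assigned to that file
    current, current_bytes, seq = [], 0, 1
    for row in rows:
        row_bytes = len(json.dumps(row).encode("utf-8"))
        # Close the current file when this row would push it past the limit.
        if current and current_bytes + row_bytes > chunk_size:
            files[f"{seq}.json"] = current
            current, current_bytes, seq = [], 0, seq + 1
        current.append(row)
        current_bytes += row_bytes
    if current:
        files[f"{seq}.json"] = current
    return files


# Use a tiny limit so the split is visible with only a few rows.
rows = [{"id": i, "vector": [0.1, 0.2]} for i in range(10)]
files = split_into_files(rows, chunk_size=100)
print(list(files))
```

A smaller chunk_size yields more, smaller files; the default of 512 MB keeps the number of output files low for most datasets.
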
RETURN TYPE:

RemoteBulkWriter

RETURNS:

A RemoteBulkWriter object.

EXCEPTIONS:

  • SchemaNotReadyException

    Raised when the provided schema is invalid.

Properties

  • data_path (pathlib.PosixPath) -

    The path to the output directory.

  • batch_files (list) -

    A list of the generated file names.

Classes

The following are the classes of the RemoteBulkWriter class:

  • ConnectParam

Methods

The following are the methods of the RemoteBulkWriter class:
