milvus-logo
LFAI
< Docs
  • Java
    • v1
      • BulkWriter

LocalBulkWriter

A LocalBulkWriter instance rewrites your raw data locally in a format that Milvus understands.

LocalBulkWriter(LocalBulkWriterParam bulkWriterParam)

Methods of LocalBulkWriter:

Method

Description

Parameters

appendRow(JsonObject rowData)

Append a row into buffer. Once the buffer size exceeds a threshold, the writer will persist the buffer to data file.

rowData: A gson.JsonObject to store the data of a row.
For each field:
- If dataType is Bool/Int8/Int16/Int32/Int64/Float/Double/Varchar, use JsonObject.addProperty(key, value) to input;
- If dataType is FloatVector, use JsonObject.add(key, gson.toJsonTree(List[Float]) to input;
- If dataType is BinaryVector/Float16Vector/BFloat16Vector, use JsonObject.add(key, gson.toJsonTree(byte[])) to input;
- If dataType is SparseFloatVector, use JsonObject.add(key, gson.toJsonTree(SortedMap[Long, Float])) to input;
- If dataType is Array, use JsonObject.add(key, gson.toJsonTree(List of Boolean/Integer/Short/Long/Float/Double/String)) to input;
- If dataType is JSON, use JsonObject.add(key, JsonElement) to input;

commit(boolean async)

Force persist data files and complete the writer.

async: Set to true to wait until all data files are persisted.

getBatchFiles()

Returns a List<List<String>gt; of the persisted data files. Each List<String> is a batch files that can be input as a job for the bulkinsert interface.

N/A

LocalBulkWriterParam

Use the LocalBulkWriterParam.Builder to construct a LocalBulkWriterParam object.

import io.milvus.bulkwriter.LocalBulkWriterParam;
LocalBulkWriterParam.Builder builder = LocalBulkWriterParam.newBuilder();

Methods of LocalBulkWriterParam.Builder:

Method

Description

Parameters

withCollectionSchema(CollectionSchemaParam collectionSchema)

Sets the collection schema. See the CollectionSchemaParam description in the Collection.createCollection() section.

collectionSchema: collection schema

withLocalPath(tring localPath)

Sets the local path to output the data files.

localPath: A local path.

withChunkSize(int chunkSize)

Sets the maximum size of a data chunk.
While rewriting your raw data, This tool splits your raw data into chunks.
The value defaults to 128 MB.

chunkSize: the maximum size of a data chunk.

withFileType(BulkFileType fileType)

The type of the output file. Currently, only PARQUET is available.

fileType: The output file type.

build()

Constructs a LocalBulkWriterParam object

N/A

Example

import io.milvus.bulkwriter.*;
import io.milvus.bulkwriter.common.clientenum.BulkFileType;
import io.milvus.param.collection.CollectionSchemaParam;

CollectionSchemaParam collectionSchema = CollectionSchemaParam.newBuilder()
        .addFieldType(FieldType.newBuilder()
                .withName("id")
                .withDataType(DataType.Int64)
                .withPrimaryKey(true)
                .withAutoID(false)
                .build())
        .addFieldType(FieldType.newBuilder()
                .withName("vector")
                .withDataType(DataType.FloatVector)
                .withDimension(DIM)
                .build())
        .build();
        
LocalBulkWriterParam bulkWriterParam = LocalBulkWriterParam.newBuilder()
        .withCollectionSchema(collectionSchema)
        .withLocalPath("/tmp/bulk_writer")
        .withFileType(fileType)
        .withChunkSize(512 * 1024 * 1024)
        .build();

try (LocalBulkWriter localBulkWriter = new LocalBulkWriter(bulkWriterParam)) {
    Gson gson = new Gson();
    for (int i = 0; i < 100000; i++) {
        JsonObject row = new JsonObject();
        row.addProperty("id", i);
        row.add("vector", gson.toJsonTree(GeneratorUtils.genFloatVector(DIM)));

        localBulkWriter.appendRow(row);
    }

    localBulkWriter.commit(false);
    List<List<String>> batchFiles = localBulkWriter.getBatchFiles();
    System.out.printf("Local writer done! output local files: %s%n", batchFiles);
} catch (Exception e) {
    System.out.println("Local writer catch exception: " + e);
    throw e;
}

Try Managed Milvus for Free

Zilliz Cloud is hassle-free, powered by Milvus and 10x faster.

Get Started
Feedback

Was this page helpful?