From Elasticsearch
This guide provides a comprehensive, step-by-step process for migrating data from Elasticsearch to Milvus 2.x. By following this guide, you will be able to efficiently transfer your data, leveraging Milvus 2.x advanced features and improved performance.
Prerequisites
- Software versions:
- Source Elasticsearch: 7.x or 8.x
- Target Milvus: 2.x
- For installation details, refer to Installing Elasticsearch and Install Milvus.
- Required tools:
- Milvus-migration tool. For installation details, refer to Install Migration Tool.
- Supported data types for migration: The fields to migrate from the source Elasticsearch index are of the following types - dense_vector, keyword, text, long, integer, double, float, boolean, object. Data types not listed here are currently not supported for migration. Refer to Field mapping reference for detailed information on data mappings between Milvus collections and Elasticsearch indexes.
- Elasticsearch index requirements:
- The source Elasticsearch index must contain a vector field of the
dense_vectortype. Migration cannot start without a vector field.
- The source Elasticsearch index must contain a vector field of the
Configure the migration file
Save the example migration config file as migration.yaml and modify the configs based on your actual conditions. You are free to put the config file in any local directory.
dumper: # configs for the migration job.
worker:
workMode: "elasticsearch" # operational mode of the migration job.
reader:
bufferSize: 2500 # buffer size to read from Elasticsearch in each batch. A value ranging from 2000 to 4000 is recommended.
meta: # meta configs for the source Elasticsearch index and target Milvus 2.x collection.
mode: "config" # specifies the source for meta configs. currently, onlly `config` is supported.
version: "8.9.1"
index: "qatest_index" # identifies the Elasticsearch index to migrate data from.
fields: # fields within the Elasticsearch index to be migrated.
- name: "my_vector" # name of the Elasticsearch field.
type: "dense_vector" # data type of the Elasticsearch field.
dims: 128 # dimension of the vector field. required only when `type` is `dense_vector`.
- name: "id"
pk: true # specifies if the field serves as a primary key.
type: "long"
- name: "num"
type: "integer"
- name: "double1"
type: "double"
- name: "text1"
maxLen: 1000 # max. length of data fields. required only for `keyword` and `text` data types.
type: "text"
- name: "bl1"
type: "boolean"
- name: "float1"
type: "float"
milvus: # configs specific to creating the collection in Milvus 2.x
collection: "Collection_01" # name of the Milvus collection. defaults to the Elasticsearch index name if not specified.
closeDynamicField: false # specifies whether to disable the dynamic field in the collection. defaults to `false`.
shardNum: 2 # number of shards to be created in the collection.
consistencyLevel: Strong # consistency level for Milvus collection.
source: # connection configs for the source Elasticsearch server
es:
urls:
- "http://10.15.1.***:9200" # address of the source Elasticsearch server.
username: "" # username for the Elasticsearch server.
password: "" # password for the Elasticsearch server.
target:
mode: "remote" # storage location for dumped files. valid values: `remote` and `local`.
remote: # configs for remote storage
outputDir: "migration/milvus/test" # output directory path in the cloud storage bucket.
cloud: "aws" # cloud storage service provider. Examples: `aws`, `gcp`, `azure`, etc.
region: "us-west-2" # region of the cloud storage; can be any value if using local Minio.
bucket: "zilliz-aws-us-****-*-********" # bucket name for storing data; must align with configs in milvus.yaml for Milvus 2.x.
useIAM: true # whether to use an IAM Role for connection.
checkBucket: false # checks if the specified bucket exists in the storage.
milvus2x: # connection configs for the target Milvus 2.x server
endpoint: "http://10.102.*.**:19530" # address of the target Milvus server.
username: "****" # username for the Milvus 2.x server.
password: "******" # password for the Milvus 2.x server.
The following table describes the parameters in the example config file. For a full list of configs, refer to Milvus Migration: Elasticsearch to Milvus 2.x.
dumperParameter Description dumper.worker.workModeThe operational mode of the migration job. Set to elasticsearchwhen migrating from Elasticsearch indexes.dumper.worker.reader.bufferSizeBuffer size to read from Elasticsearch in each batch. Unit: KB. metaParameter Description meta.modeSpecifies the source for meta configs. Currently, only configis supported.meta.indexIdentifies the Elasticsearch index to migrate data from. meta.fieldsFields within the Elasticsearch index to be migrated. meta.fields.nameName of the Elasticsearch field. meta.fields.maxLenMaximum length of the field. This parameter is required only when meta.fields.typeiskeywordortext.meta.fields.pkSpecifies if the field serves as the primary key. meta.fields.typeData type of the Elasticsearch field. Currently, the following data types in Elasticsearch are supported: dense_vector, keyword, text, long, integer, double, float, boolean, object. meta.fields.dimsDimension of the vector field. This parameter is required only when meta.fields.typeisdense_vector.meta.milvusConfigs specific to creating the collection in Milvus 2.x. meta.milvus.collectionName of the Milvus collection. Defaults to the Elasticsearch index name if not specified. meta.milvus.closeDynamicFieldSpecifies whether to disable the dynamic field in the collection. Defaults to false. For more information on dynamic fields, refer to Enable Dynamic Field.meta.milvus.shardNumNumber of shards to be created in the collection. For more information on shards, refer to Terminology. meta.milvus.consistencyLevelConsistency level for the collection in Milvus. For more information, refer to Consistency. sourceParameter Description source.esConnection configs for the source Elasticsearch server. source.es.urlsAddress of the source Elasticsearch server. source.es.usernameUsername for the Elasticsearch server. source.es.passwordPassword for the Elasticsearch server. targetParameter Description target.modeStorage location for dumped files. Valid values:
-local: Store dumped files on local disks.
-remote: Store dumped files on object storage.target.remote.outputDirOutput directory path in the cloud storage bucket. target.remote.cloudCloud storage service provider. Example values: aws,gcp,azure.target.remote.regionCloud storage region. It can be any value if you use local MinIO. target.remote.bucketBucket name for storing data. The value must be the same as the config in Milvus 2.x. For more information, refer to System Configurations. target.remote.useIAMWhether to use an IAM Role for connection. target.remote.checkBucketWhether to check if the specified bucket exists in object storage. target.milvus2xConnection configs for the target Milvus 2.x server. target.milvus2x.endpointAddress of the target Milvus server. target.milvus2x.usernameUsername for the Milvus 2.x server. This parameter is required if user authentication is enabled for your Milvus server. For more information, refer to Enable Authentication. target.milvus2x.passwordPassword for the Milvus 2.x server. This parameter is required if user authentication is enabled for your Milvus server. For more information, refer to Enable Authentication.
Start the migration task
Start the migration task with the following command. Replace {YourConfigFilePath} with the local directory where the config file migration.yaml resides.
./milvus-migration start --config=/{YourConfigFilePath}/migration.yaml
The following is an example of a successful migration log output:
[task/load_base_task.go:94] ["[LoadTasker] Dec Task Processing-------------->"] [Count=0] [fileName=testfiles/output/zwh/migration/test_mul_field4/data_1_1.json] [taskId=442665677354739304]
[task/load_base_task.go:76] ["[LoadTasker] Progress Task --------------->"] [fileName=testfiles/output/zwh/migration/test_mul_field4/data_1_1.json] [taskId=442665677354739304]
[dbclient/cus_field_milvus2x.go:86] ["[Milvus2x] begin to ShowCollectionRows"]
[loader/cus_milvus2x_loader.go:66] ["[Loader] Static: "] [collection=test_mul_field4_rename1] [beforeCount=50000] [afterCount=100000] [increase=50000]
[loader/cus_milvus2x_loader.go:66] ["[Loader] Static Total"] ["Total Collections"=1] [beforeTotalCount=50000] [afterTotalCount=100000] [totalIncrease=50000]
[migration/es_starter.go:25] ["[Starter] migration ES to Milvus finish!!!"] [Cost=80.009174459]
[starter/starter.go:106] ["[Starter] Migration Success!"] [Cost=80.00928425]
[cleaner/remote_cleaner.go:27] ["[Remote Cleaner] Begin to clean files"] [bucket=a-bucket] [rootPath=testfiles/output/zwh/migration]
[cmd/start.go:32] ["[Cleaner] clean file success!"]
Verify the result
Once the migration task is executed, you can make API calls or use Attu to view the number of entities migrated. For more information, refer to Attu and get_collection_stats().
Field mapping reference
Review the table below to understand how field types in Elasticsearch indexes are mapped to field types in Milvus collections.
For more information on supported data types in Milvus, refer to Supported data types.
| Elasticsearch Field Type | Milvus Field Type | Description |
|---|---|---|
| dense_vector | FloatVector | Vector dimensions remain unchanged during migration. |
| keyword | VarChar | Set Max Length (1 to 65,535). Strings exceeding the limit can trigger migration errors. |
| text | VarChar | Set Max Length (1 to 65,535). Strings exceeding the limit can trigger migration errors. |
| long | Int64 | - |
| integer | Int32 | - |
| double | Double | - |
| float | Float | - |
| boolean | Bool | - |
| object | JSON | - |