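From Elasticsearch

This guide provides a comprehensive, step-by-step process for migrating data from Elasticsearch to Milvus 2.x. By following it, you can transfer your data efficiently and take advantage of the advanced features and improved performance of Milvus 2.x.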

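Prerequisites

Software versions:
- Source Elasticsearch: 7.x or 8.x
- Target Milvus: 2.x
- For installation details, see Installing Elasticsearch and Install Milvus.
Required tools:
- Milvus-migration tool. For installation details, refer to Install Migration Tool.
Supported data types for migration:
- Fields to be migrated from the source Elasticsearch index must be of one of the following types: dense_vector, keyword, text, long, integer, double, float, boolean, object. Data types not listed here are currently not supported for migration. For details on how data is mapped between Milvus collections and Elasticsearch indexes, refer to Field mapping reference.
Elasticsearch index requirements:
- The source Elasticsearch index must contain a vector field of the dense_vector type. Migration cannot start without a vector field.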
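Before you run a migration, it can help to check an index mapping against the two requirements above. The following is a minimal sketch, not part of the Milvus-migration tool: the function name `check_index_fields` and the sample mapping are hypothetical, and the input is assumed to be the `properties` section of an Elasticsearch index mapping that you have already retrieved.

```python
# Types listed in this guide as supported for migration.
SUPPORTED_TYPES = {
    "dense_vector", "keyword", "text", "long", "integer",
    "double", "float", "boolean", "object",
}


def check_index_fields(properties):
    """Return (unsupported_field_names, has_dense_vector) for a mapping.

    `properties` is the `properties` dict of an Elasticsearch index mapping,
    for example {"my_vector": {"type": "dense_vector", "dims": 128}}.
    """
    unsupported = []
    has_vector = False
    for name, spec in properties.items():
        # Object fields may omit "type" in Elasticsearch mappings.
        field_type = spec.get("type", "object")
        if field_type not in SUPPORTED_TYPES:
            unsupported.append(name)
        if field_type == "dense_vector":
            has_vector = True
    return sorted(unsupported), has_vector


# Hypothetical sample mapping used only for illustration.
sample = {
    "my_vector": {"type": "dense_vector", "dims": 128},
    "id": {"type": "long"},
    "created": {"type": "date"},  # `date` is not in the supported list
}
unsupported, has_vector = check_index_fields(sample)
print("unsupported fields:", unsupported)
print("has dense_vector field:", has_vector)
```

Fields reported as unsupported would have to be left out of the field list you migrate, and an index without a dense_vector field cannot be migrated at all.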

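Configure the migration file

Save the example migration config file as migration.yaml, then modify the configs to match your environment. You can put the config file in any local directory.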

dumper: # configs for the migration job.
  worker:
    workMode: "elasticsearch" # operational mode of the migration job.
    reader:
      bufferSize: 2500 # buffer size to read from Elasticsearch in each batch. A value ranging from 2000 to 4000 is recommended.
meta: # meta configs for the source Elasticsearch index and target Milvus 2.x collection.
  mode: "config" # specifies the source for meta configs. currently, only `config` is supported.
  version: "8.9.1"
  index: "qatest_index" # identifies the Elasticsearch index to migrate data from.
  fields: # fields within the Elasticsearch index to be migrated.
  - name: "my_vector" # name of the Elasticsearch field.
    type: "dense_vector" # data type of the Elasticsearch field.
    dims: 128 # dimension of the vector field. required only when `type` is `dense_vector`.
  - name: "id"
    pk: true # specifies if the field serves as a primary key.
    type: "long"
  - name: "num"
    type: "integer"
  - name: "double1"
    type: "double"
  - name: "text1"
    maxLen: 1000 # max. length of data fields. required only for `keyword` and `text` data types.
    type: "text"
  - name: "bl1"
    type: "boolean"
  - name: "float1"
    type: "float"
  milvus: # configs specific to creating the collection in Milvus 2.x
    collection: "Collection_01" # name of the Milvus collection. defaults to the Elasticsearch index name if not specified.
    closeDynamicField: false # specifies whether to disable the dynamic field in the collection. defaults to `false`.
    shardNum: 2 # number of shards to be created in the collection.
    consistencyLevel: Strong # consistency level for Milvus collection.
source: # connection configs for the source Elasticsearch server
  es:
    urls:
    - "http://10.15.1.***:9200" # address of the source Elasticsearch server.
    username: "" # username for the Elasticsearch server.
    password: "" # password for the Elasticsearch server.
target:
  mode: "remote" # storage location for dumped files. valid values: `remote` and `local`.
  remote: # configs for remote storage
    outputDir: "migration/milvus/test" # output directory path in the cloud storage bucket.
    cloud: "aws" # cloud storage service provider. Examples: `aws`, `gcp`, `azure`, etc.
    region: "us-west-2" # region of the cloud storage; can be any value if using local Minio.
    bucket: "zilliz-aws-us-****-*-********" # bucket name for storing data; must align with configs in milvus.yaml for Milvus 2.x.
    useIAM: true # whether to use an IAM Role for connection.
    checkBucket: false # checks if the specified bucket exists in the storage.
  milvus2x: # connection configs for the target Milvus 2.x server
    endpoint: "http://10.102.*.**:19530" # address of the target Milvus server.
    username: "****" # username for the Milvus 2.x server.
    password: "******" # password for the Milvus 2.x server.

The following tables describe the parameters in the example config file. For a full list of configs, refer to Milvus Migration: Elasticsearch to Milvus 2.x.

dumper

Parameter	Description
`dumper.worker.workMode`	The operational mode of the migration job. Set to `elasticsearch` when migrating from Elasticsearch indexes.
`dumper.worker.reader.bufferSize`	Buffer size to read from Elasticsearch in each batch. Unit: KB.

meta

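Parameter	Description
`meta.mode`	Specifies the source for meta configs. Currently, only `config` is supported.
`meta.index`	Identifies the Elasticsearch index to migrate data from.
`meta.fields`	Fields within the Elasticsearch index to be migrated.
`meta.fields.name`	Name of the Elasticsearch field.
`meta.fields.maxLen`	Maximum length of the field. This parameter is required only when `meta.fields.type` is `keyword` or `text`.
`meta.fields.pk`	Specifies whether the field serves as the primary key.
`meta.fields.type`	Data type of the Elasticsearch field. The following Elasticsearch data types are currently supported: dense_vector, keyword, text, long, integer, double, float, boolean, object.
`meta.fields.dims`	Dimension of the vector field. This parameter is required only when `meta.fields.type` is `dense_vector`.
`meta.milvus`	Configs specific to creating the collection in Milvus 2.x.
`meta.milvus.collection`	Name of the Milvus collection. Defaults to the Elasticsearch index name if not specified.
`meta.milvus.closeDynamicField`	Specifies whether to disable the dynamic field in the collection. Defaults to `false`. For more information on dynamic fields, refer to Enable Dynamic Field.
`meta.milvus.shardNum`	Number of shards to be created in the collection. For more information on shards, refer to Terminology.
`meta.milvus.consistencyLevel`	Consistency level of the collection in Milvus. For more information, refer to Consistency.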
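The conditional requirements in the table above (`dims` for `dense_vector`, `maxLen` for `keyword` and `text`) are easy to miss when editing the file by hand. The sketch below checks a `meta.fields` list against them. It is an illustration under stated assumptions, not the tool's own validation: `validate_fields` is a hypothetical helper, the field list is copied from the example config as a Python literal, and the 1 to 65,535 range for `maxLen` is taken from the field mapping reference in this guide.

```python
def validate_fields(fields):
    """Return a list of problems found in a `meta.fields` list."""
    problems = []
    for field in fields:
        name = field.get("name")
        field_type = field.get("type")
        # `dims` is required only for dense_vector fields.
        if field_type == "dense_vector" and "dims" not in field:
            problems.append(f"{name}: `dims` is required for dense_vector")
        # `maxLen` is required only for keyword and text fields.
        if field_type in ("keyword", "text"):
            max_len = field.get("maxLen")
            if max_len is None:
                problems.append(f"{name}: `maxLen` is required for {field_type}")
            elif not 1 <= max_len <= 65535:
                problems.append(f"{name}: `maxLen` must be between 1 and 65535")
    # The source index must contain a dense_vector field.
    if not any(f.get("type") == "dense_vector" for f in fields):
        problems.append("no dense_vector field: migration cannot start")
    return problems


# Field list copied from the example migration.yaml.
example_fields = [
    {"name": "my_vector", "type": "dense_vector", "dims": 128},
    {"name": "id", "pk": True, "type": "long"},
    {"name": "num", "type": "integer"},
    {"name": "double1", "type": "double"},
    {"name": "text1", "maxLen": 1000, "type": "text"},
    {"name": "bl1", "type": "boolean"},
    {"name": "float1", "type": "float"},
]
for problem in validate_fields(example_fields):
    print(problem)
```

If you prefer to validate the file itself, the same list can be obtained by loading migration.yaml with a YAML parser and reading `meta.fields`.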

source

Parameter	Description
`source.es`	Connection configs for the source Elasticsearch server.
`source.es.urls`	Address of the source Elasticsearch server.
`source.es.username`	Username for the Elasticsearch server.
`source.es.password`	Password for the Elasticsearch server.

target

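Parameter	Description
`target.mode`	Storage location for dumped files. Valid values: `local` (store dumped files on local disks) and `remote` (store dumped files on object storage).
`target.remote.outputDir`	Output directory path in the cloud storage bucket.
`target.remote.cloud`	Cloud storage service provider. Example values: `aws`, `gcp`, `azure`.
`target.remote.region`	Cloud storage region. It can be any value if you use local MinIO.
`target.remote.bucket`	Name of the bucket for storing data. The value must be the same as the config in Milvus 2.x. For more information, refer to System Configurations.
`target.remote.useIAM`	Whether to use an IAM role for connection.
`target.remote.checkBucket`	Whether to check if the specified bucket exists in object storage.
`target.milvus2x`	Connection configs for the target Milvus 2.x server.
`target.milvus2x.endpoint`	Address of the target Milvus server.
`target.milvus2x.username`	Username for the Milvus 2.x server. This parameter is required if user authentication is enabled for your Milvus server. For more information, refer to Enable Authentication.
`target.milvus2x.password`	Password for the Milvus 2.x server. This parameter is required if user authentication is enabled for your Milvus server. For more information, refer to Enable Authentication.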

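Start the migration task

Start the migration task with the following command. Replace {YourConfigFilePath} with the local directory where the config file migration.yaml resides.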

./milvus-migration start --config=/{YourConfigFilePath}/migration.yaml

The following is an example of the log output for a successful migration:

[task/load_base_task.go:94] ["[LoadTasker] Dec Task Processing-------------->"] [Count=0] [fileName=testfiles/output/zwh/migration/test_mul_field4/data_1_1.json] [taskId=442665677354739304]
[task/load_base_task.go:76] ["[LoadTasker] Progress Task --------------->"] [fileName=testfiles/output/zwh/migration/test_mul_field4/data_1_1.json] [taskId=442665677354739304]
[dbclient/cus_field_milvus2x.go:86] ["[Milvus2x] begin to ShowCollectionRows"]
[loader/cus_milvus2x_loader.go:66] ["[Loader] Static: "] [collection=test_mul_field4_rename1] [beforeCount=50000] [afterCount=100000] [increase=50000]
[loader/cus_milvus2x_loader.go:66] ["[Loader] Static Total"] ["Total Collections"=1] [beforeTotalCount=50000] [afterTotalCount=100000] [totalIncrease=50000]
[migration/es_starter.go:25] ["[Starter] migration ES to Milvus finish!!!"] [Cost=80.009174459]
[starter/starter.go:106] ["[Starter] Migration Success!"] [Cost=80.00928425]
[cleaner/remote_cleaner.go:27] ["[Remote Cleaner] Begin to clean files"] [bucket=a-bucket] [rootPath=testfiles/output/zwh/migration]
[cmd/start.go:32] ["[Cleaner] clean file success!"]
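If you collect these logs, the per-collection counts can be pulled out of the `[Loader] Static:` line for a quick sanity check. The sketch below is a hypothetical helper that assumes the `[key=value]` layout shown in the sample log above; the log format is not a documented interface and may differ between tool versions.

```python
import re

# Matches bracketed pairs such as [beforeCount=50000] or [collection=name].
PAIR = re.compile(r"\[(\w+)=([^\]]*)\]")


def parse_loader_stats(line):
    """Extract collection name and row counts from a loader statistics line.

    Returns None when the line does not carry a `collection` field.
    """
    pairs = dict(PAIR.findall(line))
    if "collection" not in pairs:
        return None
    return {
        "collection": pairs["collection"],
        "before": int(pairs["beforeCount"]),
        "after": int(pairs["afterCount"]),
        "increase": int(pairs["increase"]),
    }


# Line copied from the sample log output.
log_line = (
    '[loader/cus_milvus2x_loader.go:66] ["[Loader] Static: "] '
    "[collection=test_mul_field4_rename1] [beforeCount=50000] "
    "[afterCount=100000] [increase=50000]"
)
stats = parse_loader_stats(log_line)
# The increase should equal the difference between the two counts.
consistent = stats["after"] - stats["before"] == stats["increase"]
print(stats["collection"], stats["increase"], consistent)
```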

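Verify the result

After the migration task has run, you can make API calls or use Attu to view the number of entities migrated. For more information, refer to Attu and get_collection_stats().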
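As one way to make the API call, the sketch below reads the row count through pymilvus. It assumes pymilvus is installed and uses `MilvusClient.get_collection_stats()`, which returns a dictionary containing `row_count`. The helper names, and the URI, token, and collection name you pass in, are placeholders to replace with your own values.

```python
def migrated_row_count(client, collection_name):
    """Return the row count Milvus reports for a collection."""
    stats = client.get_collection_stats(collection_name=collection_name)
    return int(stats["row_count"])


def check_against_server(uri, token, collection_name, expected):
    """Connect to Milvus and compare the row count with an expected value.

    `uri` is the Milvus endpoint and `token` is "username:password" when
    authentication is enabled. All arguments are supplied by the caller.
    """
    # Imported here so the helper above can be exercised without a server.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri, token=token)
    return migrated_row_count(client, collection_name) == expected
```

For example, calling `check_against_server` with your endpoint, the collection name from `meta.milvus.collection`, and the number of documents in the source index indicates whether the two counts agree.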

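Field mapping reference

Review the table below to understand how field types in Elasticsearch indexes are mapped to field types in Milvus collections.

For more information on supported data types in Milvus, refer to Supported data types.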

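Elasticsearch Field Type	Milvus Field Type	Description
dense_vector	FloatVector	Vector dimensions remain unchanged during migration.
keyword	VarChar	Set the maximum length (1 to 65,535). Strings exceeding the limit trigger migration errors.
text	VarChar	Set the maximum length (1 to 65,535). Strings exceeding the limit trigger migration errors.
long	Int64	-
integer	Int32	-
double	Double	-
float	Float	-
boolean	Bool	-
object	JSON	-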
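The mapping table can also be expressed as a lookup, which is convenient when you want to preview the Milvus schema a given field list would produce. This is an illustrative sketch only and does not reflect how the migration tool is implemented: `to_milvus_field` and the output keys `dim` and `max_length` are hypothetical names chosen for the example.

```python
# Elasticsearch field type -> Milvus field type, as listed in the table above.
ES_TO_MILVUS = {
    "dense_vector": "FloatVector",
    "keyword": "VarChar",
    "text": "VarChar",
    "long": "Int64",
    "integer": "Int32",
    "double": "Double",
    "float": "Float",
    "boolean": "Bool",
    "object": "JSON",
}


def to_milvus_field(field):
    """Describe the Milvus field that a `meta.fields` entry maps to."""
    es_type = field["type"]
    if es_type not in ES_TO_MILVUS:
        raise ValueError(f"unsupported Elasticsearch type: {es_type}")
    described = {"name": field["name"], "type": ES_TO_MILVUS[es_type]}
    if es_type == "dense_vector":
        # Vector dimensions remain unchanged during migration.
        described["dim"] = field["dims"]
    if es_type in ("keyword", "text"):
        described["max_length"] = field["maxLen"]
    return described


vector = to_milvus_field({"name": "my_vector", "type": "dense_vector", "dims": 128})
text = to_milvus_field({"name": "text1", "type": "text", "maxLen": 1000})
print(vector)
print(text)
```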
