
MinIO Stops Accepting Community Changes: Evaluating RustFS as an S3-Compatible Object Storage Backend for Milvus

  • Tutorials
January 14, 2026
Min Yin

The author of this article, Min Yin, is one of the most active community contributors to Milvus. It is published here with his permission.

MinIO is an open-source, high-performance, S3-compatible object storage system widely used for AI/ML, analytics, and other data-intensive workloads. It has also long been the default object storage choice for many Milvus deployments. Recently, however, the MinIO team updated the project's GitHub README to state that it no longer accepts new changes.

In practice, MinIO has spent the past few years gradually shifting its attention to commercial products, tightening its licensing and distribution model, and scaling back active development in the community repository. Moving the open-source project into maintenance mode is a natural outcome of this broader transition.

For Milvus users who rely on MinIO by default, this change is hard to ignore. Object storage sits at the core of Milvus's persistence layer, and over time its reliability depends not only on how well it works today but on whether the system can keep evolving alongside the workloads it supports.

Against this backdrop, this article examines RustFS as a potential alternative. RustFS is a Rust-based, S3-compatible object storage system that emphasizes memory safety and modern systems design. It is still experimental, and the discussion here is not a production recommendation.

Milvus Architecture and Where Object Storage Fits

Milvus 2.6 uses a fully decoupled storage-compute architecture. In this model, the storage layer consists of three independent components, each playing a distinct role.

Etcd stores metadata, Pulsar or Kafka handles the streaming log, and object storage (typically MinIO or an S3-compatible service) provides persistence for vector data and index files. Because storage and compute are separate, Milvus can scale compute nodes independently while relying on a shared, reliable storage backend.

The Role of Object Storage in Milvus

Object storage is Milvus's persistence layer. Raw vector data is persisted as binlogs, and index structures such as HNSW and IVF_FLAT are stored there as well.

This design makes compute nodes stateless. Instead of storing data locally, query nodes load segments and indexes from object storage on demand. Nodes can therefore be added and removed freely, recover quickly from failures, and support dynamic load balancing across the cluster without rebalancing data at the storage layer.

my-milvus-bucket/
├── files/                          # rootPath (default)
│   ├── insert_log/                 # insert binlogs
│   │   └── {Collection_ID}/
│   │       └── {Partition_ID}/
│   │           └── {Segment_ID}/
│   │               └── {Field_ID}/
│   │                   └── {Log_ID}     # Per-field binlog files
│   │
│   ├── delta_log/                  # Delete binlogs
│   │   └── {Collection_ID}/
│   │       └── {Partition_ID}/
│   │           └── {Segment_ID}/
│   │               └── {Log_ID}        
│   │
│   ├── stats_log/                  # Statistical data (e.g., Bloom filters)
│   │   └── {Collection_ID}/
│   │       └── {Partition_ID}/
│   │           └── {Segment_ID}/
│   │               └── {Field_ID}/
│   │                   └── {Log_ID}
│   │
│   └── index_files/                # Index files
│       └── {Build_ID}_{Index_Version}_{Segment_ID}_{Field_ID}/
│           ├── index_file_0
│           ├── index_file_1
│           └── ...
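To make the layout concrete, here is a small sketch that builds an insert-binlog object key following the structure above. The helper name and the sample IDs are illustrative only, not part of Milvus's API:

```python
def insert_binlog_key(collection_id, partition_id, segment_id, field_id, log_id,
                      root_path="files"):
    # Object key for a per-field insert binlog, mirroring the layout shown above:
    # {rootPath}/insert_log/{Collection_ID}/{Partition_ID}/{Segment_ID}/{Field_ID}/{Log_ID}
    return (f"{root_path}/insert_log/{collection_id}/{partition_id}/"
            f"{segment_id}/{field_id}/{log_id}")

print(insert_binlog_key(449, 0, 12, 101, 7))
# files/insert_log/449/0/12/101/7
```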

Why Milvus Uses the S3 API

Rather than defining a custom storage protocol, Milvus uses the S3 API as its object storage interface. S3 has become the de facto standard for object storage: major cloud providers such as AWS S3, Alibaba Cloud OSS, and Tencent Cloud COS support it natively, while open-source systems such as MinIO, Ceph RGW, SeaweedFS, and RustFS are fully S3-compatible as well.

By standardizing on the S3 API, Milvus can rely on a mature, well-tested Go SDK instead of maintaining a separate integration for each storage backend. The abstraction also gives users deployment flexibility: MinIO for local development, cloud object storage for production, Ceph or RustFS for private environments. As long as an S3-compatible endpoint is available, Milvus does not need to know or care which storage system sits underneath.

# Milvus configuration file milvus.yaml
minio:
  address: localhost
  port: 9000
  accessKeyID: minioadmin
  secretAccessKey: minioadmin
  useSSL: false
  bucketName: milvus-bucket
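These settings map directly onto a generic S3 endpoint URL, which is why any S3 SDK can talk to the same backend. A minimal sketch (the helper function is hypothetical, not part of Milvus):

```python
def s3_endpoint(minio_cfg):
    # Derive the S3 endpoint URL from the `minio` section of milvus.yaml.
    scheme = "https" if minio_cfg.get("useSSL") else "http"
    return f"{scheme}://{minio_cfg['address']}:{minio_cfg['port']}"

cfg = {"address": "localhost", "port": 9000, "useSSL": False}
print(s3_endpoint(cfg))  # http://localhost:9000
```

Any S3 client (for example, boto3's `endpoint_url` parameter) can then be pointed at this URL using the same accessKeyID/secretAccessKey pair.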

Evaluating RustFS as an S3-Compatible Object Storage Backend for Milvus

Project Overview

RustFS is a distributed object storage system written in Rust. It is currently in alpha (version 1.0.0-alpha.68) and aims to combine MinIO's operational simplicity with Rust's strengths in memory safety and performance. See the project's GitHub for more details.

RustFS is still under active development, and its distributed mode has not yet been officially released. It is therefore not recommended for production or mission-critical workloads at this stage.

Architecture Design

RustFS adopts a design conceptually similar to MinIO's: an HTTP server exposes the S3-compatible API, an object manager handles object metadata, and a storage engine manages data blocks. At the storage layer, RustFS relies on standard filesystems such as XFS or ext4.

For the planned distributed mode, RustFS intends to use etcd for metadata coordination, with multiple RustFS nodes forming a cluster. This design aligns closely with common object storage architectures, so users with MinIO experience should find RustFS familiar.

Compatibility with Milvus

Before considering RustFS as an alternative object storage backend, it is worth evaluating whether it meets Milvus's core storage requirements.

| Milvus Requirement | S3 API | RustFS Support |
| --- | --- | --- |
| Vector data persistence | PutObject, GetObject | ✅ Fully supported |
| Index file management | ListObjects, DeleteObject | ✅ Fully supported |
| Segment compaction operations | Multipart Upload | ✅ Fully supported |
| Consistency guarantees | Strong read-after-write consistency | ✅ (single node) |

Based on the evaluation above, RustFS's current S3 API implementation meets Milvus's baseline functional requirements, which makes it suitable for hands-on testing in non-production environments.
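One way to verify these operations yourself is a small smoke test. The sketch below (the function name and bucket are illustrative; it expects a boto3-style S3 client pointed at your endpoint, and the bucket must already exist) exercises each API listed in the table:

```python
def s3_smoke_test(s3, bucket="milvus-smoke-test"):
    """Exercise the S3 calls Milvus depends on against a boto3-style client."""
    # PutObject / GetObject: vector data persistence
    s3.put_object(Bucket=bucket, Key="probe", Body=b"hello")
    assert s3.get_object(Bucket=bucket, Key="probe")["Body"].read() == b"hello"

    # ListObjects: index file management
    keys = [o["Key"] for o in s3.list_objects_v2(Bucket=bucket)["Contents"]]
    assert "probe" in keys

    # Multipart upload: segment compaction (non-final parts must be >= 5 MiB)
    mpu = s3.create_multipart_upload(Bucket=bucket, Key="big")
    part = s3.upload_part(Bucket=bucket, Key="big", PartNumber=1,
                          UploadId=mpu["UploadId"],
                          Body=b"x" * (5 * 1024 * 1024))
    s3.complete_multipart_upload(
        Bucket=bucket, Key="big", UploadId=mpu["UploadId"],
        MultipartUpload={"Parts": [{"PartNumber": 1, "ETag": part["ETag"]}]})

    # DeleteObject: cleanup
    s3.delete_object(Bucket=bucket, Key="probe")
    s3.delete_object(Bucket=bucket, Key="big")
    return True
```

With boto3, the client would be created as `boto3.client("s3", endpoint_url="http://localhost:9000", aws_access_key_id="minioadmin", aws_secret_access_key="minioadmin")`, matching the deployment described later in this post.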

Hands-On: Replacing MinIO with RustFS in Milvus

The goal of this exercise is to replace the default MinIO object storage service and deploy Milvus 2.6.7 with RustFS as the object storage backend.

Prerequisites

  1. Docker and Docker Compose (version ≥ 20.10) are installed, and the system can pull images and run containers normally.

  2. A local directory is available for object data storage, such as /volume/data/ (or a custom path).

  3. Host port 9000 is open externally, or another port is configured accordingly.

  4. The RustFS container runs as a non-root user (rustfs). Make sure the host data directory is owned by UID 10001.

Step 1: Create Data Directories and Set Permissions

# Create the project directory
mkdir -p milvus-rustfs && cd milvus-rustfs
# Create the data directories (no spaces inside the brace expansion)
mkdir -p volumes/{rustfs,etcd,milvus}
# Update permissions for the RustFS directory
sudo chown -R 10001:10001 volumes/rustfs

Download the official Docker Compose file

wget https://github.com/milvus-io/milvus/releases/download/v2.6.7/milvus-standalone-docker-compose.yml -O docker-compose.yml

Step 2: Modify the Object Storage Service

Define the RustFS service

Note: make sure the access key and secret key match the credentials configured in the Milvus service.

rustfs:
  container_name: milvus-rustfs
  image: registry.cn-hangzhou.aliyuncs.com/rustfs/rustfs:latest
  environment:
    RUSTFS_ACCESS_KEY: minioadmin
    RUSTFS_SECRET_KEY: minioadmin
    RUSTFS_CONSOLE_ENABLE: "true"
    RUSTFS_REGION: us-east-1
    # RUSTFS_SERVER_DOMAINS: localhost  # Optional; not required for local deployments
  ports:
    - "9001:9001"
    - "9000:9000"
  volumes:
    - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/rustfs:/data
  command: >
    --address :9000
    --console-enable
    /data
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:9000/health"]
    interval: 30s
    timeout: 20s
    retries: 3

Complete the configuration

Note: Milvus's storage configuration currently assumes MinIO-style defaults and does not yet allow custom access key or secret key values. When using RustFS as a replacement, you must use the same default credentials Milvus expects.

version: '3.5'
services:
  etcd:
    container_name: milvus-etcd
    image: registry.cn-hangzhou.aliyuncs.com/etcd/etcd:v3.5.25
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
    command: etcd -advertise-client-urls=http://etcd:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    healthcheck:
      test: ["CMD", "etcdctl", "endpoint", "health"]
      interval: 30s
      timeout: 20s
      retries: 3
  rustfs:
    container_name: milvus-rustfs
    image: registry.cn-hangzhou.aliyuncs.com/rustfs/rustfs:latest
    environment:
      RUSTFS_ACCESS_KEY: minioadmin
      RUSTFS_SECRET_KEY: minioadmin
      RUSTFS_CONSOLE_ENABLE: "true"
      RUSTFS_REGION: us-east-1
      # RUSTFS_SERVER_DOMAINS: localhost  # Optional; not required for local deployments
    ports:
      - "9001:9001"
      - "9000:9000"
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/rustfs:/data
    command: >
      --address :9000
      --console-enable
      /data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/health"]
      interval: 30s
      timeout: 20s
      retries: 3
  standalone:
    container_name: milvus-standalone
    image: registry.cn-hangzhou.aliyuncs.com/milvus/milvus:v2.6.7
    command: ["milvus", "run", "standalone"]
    security_opt:
      - seccomp:unconfined
    environment:
      MINIO_REGION: us-east-1
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: rustfs:9000
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
      MINIO_USE_SSL: "false"
      MQ_TYPE: rocksmq
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]
      interval: 30s
      start_period: 90s
      timeout: 20s
      retries: 3
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "rustfs"
networks:
  default:
    name: milvus

Start the services

docker-compose -f docker-compose.yml up -d

Check service status

docker-compose ps -a

Access the RustFS Web UI

Open the RustFS web interface in a browser: http://localhost:9001

Log in with the default credentials: both the username and password are minioadmin.

Test the Milvus Service

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# connect to Milvus
connections.connect(
    alias="default",
    host="localhost",
    port="19530",
)
print("✓ Successfully connected to Milvus!")

# create test collection
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields=fields, description="test collection")
collection = Collection(name="test_collection", schema=schema)
print("✓ Test collection created!")
print("✓ RustFS verified as the S3 storage backend!")
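Once data has actually been inserted and flushed, you can double-check that Milvus wrote binlogs into RustFS by listing the bucket with any S3 client and tallying keys by collection. A sketch with hypothetical sample keys (the helper is illustrative, not part of pymilvus):

```python
from collections import Counter

def binlogs_per_collection(keys, root_path="files"):
    # Tally insert binlog objects by collection ID; keys follow
    # {root_path}/insert_log/{collection}/{partition}/{segment}/{field}/{log}.
    prefix = f"{root_path}/insert_log/"
    counts = Counter()
    for key in keys:
        if key.startswith(prefix):
            counts[key[len(prefix):].split("/", 1)[0]] += 1
    return dict(counts)

sample_keys = [
    "files/insert_log/449/0/12/101/1",
    "files/insert_log/449/0/12/102/1",
    "files/delta_log/449/0/12/1",
]
print(binlogs_per_collection(sample_keys))  # {'449': 2}
```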

Step 3: Storage Performance Testing (Experimental)

Test Design

Two Milvus deployments were set up on identical hardware (16 cores / 32 GB memory / NVMe SSD), using RustFS and MinIO respectively as the object storage backend. The test dataset consisted of 1,000,000 vectors with 768 dimensions, using an HNSW index with parameters M = 16 and efConstruction = 200. Data was inserted in batches of 5,000.

The following metrics were evaluated: Insert throughput, Index build time, Cold and warm load time, Search latency, Storage footprint.

Test Code

Note: Only the core parts of the test code are shown below.

def milvus_load_bench(dim=768, rows=1_000_000, batch=5000):
    collection = Collection(...)

    # Insert test
    t0 = time.perf_counter()
    for i in range(0, rows, batch):
        collection.insert([rng.random((batch, dim), dtype=np.float32).tolist()])
    insert_time = time.perf_counter() - t0

    # Index build
    collection.flush()
    collection.create_index(
        field_name="embedding",
        index_params={"index_type": "HNSW", ...},
    )

    # Load test (one cold start + two warm starts)
    collection.release()
    load_times = []
    for i in range(3):
        if i > 0:
            collection.release()
            time.sleep(2)
        collection.load()
        load_times.append(...)

    # Search test
    search_times = []
    for _ in range(3):
        collection.search(queries, limit=10, ...)


Test Command

python bench.py milvus-load-bench --dim 768 --rows 1000000 --batch 5000 \
  --index-type HNSW --repeat-load 3 --release-before-load --do-search --drop-after


Performance Results
  • RustFS

Pros: faster writes (+57%), lower storage usage (-57%), and faster warm loads (+67%), making it suitable for write-heavy, cost-sensitive workloads.

Cons: much slower queries (7.96 ms vs. 1.85 ms, roughly 330% higher latency) with noticeable variance (up to 17.14 ms), and 43% slower index builds. Not suitable for query-intensive applications.

  • MinIO

Pros: excellent query performance (1.85 ms average latency), mature small-file and random I/O optimizations, and a well-established ecosystem.

| Metric | RustFS | MinIO | Difference |
| --- | --- | --- | --- |
| Insert Throughput | 4,472 rows/s | 2,845 rows/s | +57% |
| Index Build Time | 803 s | 562 s | -43% |
| Load (Cold Start) | 22.7 s | 18.3 s | -24% |
| Load (Warm Start) | 0.009 s | 0.027 s | +67% |
| Search Latency | 7.96 ms | 1.85 ms | -330% |
| Storage Usage | 7.8 GB | 18.0 GB | -57% |

RustFS significantly outperforms MinIO in write performance and storage efficiency, with both showing roughly 57% improvement. This demonstrates the system-level advantages of the Rust ecosystem. However, the 330% gap in query latency limits RustFS’s suitability for query-intensive workloads.
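As a quick consistency check, the percentage figures follow directly from the raw numbers in the table; a throwaway sketch (not part of the benchmark code):

```python
def rel_diff(a, b):
    # How much larger a is than b, in percent (rounded).
    return round((a / b - 1) * 100)

# RustFS vs. MinIO, raw numbers taken from the table above.
print(rel_diff(4472, 2845))              # 57:  insert throughput advantage
print(rel_diff(803, 562))                # 43:  index builds are 43% slower
print(rel_diff(7.96, 1.85))              # 330: the search latency gap
print(round((1 - 7.8 / 18.0) * 100))     # 57:  storage savings
print(round((1 - 0.009 / 0.027) * 100))  # 67:  warm-load speedup
```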

For production environments, cloud-managed object storage services like AWS S3 are recommended first, as they are mature, stable, and require no operational effort. Self-hosted solutions are better suited to specific scenarios: RustFS for cost-sensitive or write-intensive workloads, MinIO for query-intensive use cases, and Ceph for data sovereignty. With further optimization in random read performance, RustFS has the potential to become a strong self-hosted option.

Conclusion

MinIO’s decision to stop accepting new community contributions is disappointing for many developers, but it won’t disrupt Milvus users. Milvus depends on the S3 API—not any specific vendor implementation—so swapping storage backends is straightforward. This S3-compatibility layer is intentional: it ensures Milvus stays flexible, portable, and decoupled from vendor lock-in.

For production deployments, cloud-managed services such as AWS S3 and Alibaba Cloud OSS remain the most reliable options. They’re mature, highly available, and drastically reduce the operational load compared to running your own object storage. Self-hosted systems like MinIO or Ceph still make sense in cost-sensitive environments or where data sovereignty is non-negotiable, but they require significantly more engineering overhead to operate safely at scale.

RustFS is interesting—Apache 2.0-licensed, Rust-based, and community-driven—but it’s still early. The performance gap is noticeable, and the distributed mode hasn’t shipped yet. It’s not production-ready today, but it’s a project worth watching as it matures.

If you’re comparing object storage options for Milvus, evaluating MinIO replacements, or weighing the operational trade-offs of different backends, we’d love to hear from you.

Join our [Discord channel](https://discord.com/invite/8uyFbECzPX) and share your thoughts. You can also book a 20-minute one-on-one session to get insights, guidance, and answers to your questions through [Milvus Office Hours](https://milvus.io/blog/join-milvus-office-hours-to-get-support-from-vectordb-experts.md).
