
Snapshot Use Cases

Compatible with Milvus 3.0.x

In this guide, you will find common use cases for snapshots.

Data backup and restoration

Snapshots are quick, point-in-time images of data, suitable for fast rollbacks or testing over short retention periods (days to weeks). Backups, in contrast, are independent, complete copies stored separately for long-term disaster recovery (weeks to years), and they offer better protection against total storage failure.

The following table compares snapshots and backups.

|                 | Backup                                 | Snapshot                                     |
| --------------- | -------------------------------------- | -------------------------------------------- |
| Backup creation | Copies all data files (time-consuming) | Creates metadata only (in milliseconds)      |
| Restoration     | Imports data and rebuilds indexes      | Copies existing data and index files only    |
| Performance     | Slow and resource-intensive            | Fast and lightweight (in seconds to minutes) |
| System impact   | High I/O and CPU usage                 | Minimal impact                               |

Creating a snapshot usually takes milliseconds, and restoring it takes seconds to minutes, depending on the data volume.

For more details on snapshot limits, restrictions, and their system impacts, refer to Snapshots.

Create snapshots

Before creating a snapshot, stop writing data to the target collection and call flush() so that no recently written data is missing from the snapshot.

When naming a snapshot, use a clear, descriptive name, such as "daily_backup_20240101" or "v2.1_production_release", and avoid generic names, such as "backup1" or "test". Descriptive names make it easy to tell snapshots apart across versions, environments, and stages.
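The naming advice above can be applied consistently with a small helper. The following is a minimal sketch, not part of pymilvus: the function build_snapshot_name and its environment, purpose, and date components are hypothetical and chosen only for illustration.

```python
import re
from datetime import date


def build_snapshot_name(environment: str, purpose: str, day: date) -> str:
    """Build a descriptive snapshot name such as 'prod_daily_backup_20240101'.

    Hypothetical helper for illustration; not part of the pymilvus API.
    """
    parts = [environment, purpose]
    # Replace every run of characters other than letters, digits, dots, and
    # underscores with a single underscore, then trim stray underscores.
    cleaned = [re.sub(r"[^A-Za-z0-9._]+", "_", p.strip()).strip("_") for p in parts]
    if not all(cleaned):
        raise ValueError("environment and purpose must not be empty")
    # Append the date as YYYYMMDD so names sort chronologically.
    return "_".join(cleaned + [day.strftime("%Y%m%d")])


print(build_snapshot_name("prod", "daily backup", date(2024, 1, 1)))
# prints: prod_daily_backup_20240101
```

Putting the date last in YYYYMMDD form keeps names sortable and makes it easy to identify old snapshots later.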

The code examples below assume that you already have a collection named my_collection.

from pymilvus import MilvusClient

client = MilvusClient(
    uri="http://localhost:19530",
    token="root:Milvus"
)

# Recommended: Flush data before creating snapshot to ensure all data is included
client.flush(collection_name="my_collection")

# Create snapshot for entire collection
client.create_snapshot(
    collection_name="my_collection",
    snapshot_name="backup_20240101",
    description="Daily backup for January 1st, 2024"
)

// Go
import (
    "context"
    "log"

    "github.com/milvus-io/milvus/client/v2/milvusclient"
)

client, err := milvusclient.New(context.Background(), &milvusclient.ClientConfig{
    Address: "localhost:19530",
    Token: "root:Milvus",
})
if err != nil {
    log.Fatal(err)
}

// Recommended: Flush data before creating snapshot to ensure all data is included
err = client.Flush(context.Background(), milvusclient.NewFlushOption("my_collection"))
if err != nil {
    log.Fatal(err)
}

// Create snapshot
createOpt := milvusclient.NewCreateSnapshotOption("backup_20240101", "my_collection").
    WithDescription("Daily backup for January 1st, 2024")

err = client.CreateSnapshot(context.Background(), createOpt)
if err != nil {
    log.Fatal(err)
}

Restore snapshots

You can restore a snapshot to a new collection. This operation is asynchronous and returns a job ID for tracking the restoration progress.

The restoration uses a copy-segment mechanism instead of data import. This mechanism is more efficient because it

  • directly copies segment files (binlogs, deltalogs, index files) from snapshot storage

  • preserves field IDs and index IDs to ensure compatibility with existing data files, and

  • avoids data rewriting and index rebuilding, resulting in significantly faster restore times

As a result, restoration is 10 to 100 times faster than traditional backup and restore methods.

To restore a snapshot, do as follows:

# Restore snapshot to new collection
job_id = client.restore_snapshot(
    snapshot_name="backup_20240101",
    collection_name="restored_collection",
)

// Go
restoreOpt := milvusclient.NewRestoreSnapshotOption(
    "backup_20240101",
    "restored_collection",
)

jobID, err := client.RestoreSnapshot(context.Background(), restoreOpt)
if err != nil {
    log.Fatal(err)
}
log.Printf("restore job started: %v", jobID)
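
Because restoration is asynchronous, a caller typically polls the job until it finishes. The following is a minimal sketch of such a polling loop. It deliberately takes the state query as a caller-supplied function, because the SDK call that reports a restore job's status is not shown in this guide. The helper name wait_for_restore and the state names "Completed" and "Failed" are assumptions for illustration.

```python
import time
from typing import Callable


def wait_for_restore(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 2.0,
) -> str:
    """Poll get_state() until the restore job reaches a terminal state.

    get_state is a caller-supplied function that returns the job's current
    state as a string. The terminal state names below are assumptions.
    """
    terminal_states = {"Completed", "Failed"}  # assumed state names
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"restore still in state {state!r} after {timeout_s}s")
        time.sleep(poll_interval_s)


# Demo with a stand-in for the real state query.
fake_states = iter(["Pending", "InProgress", "Completed"])
print(wait_for_restore(lambda: next(fake_states), poll_interval_s=0.01))
# prints: Completed
```

In practice, get_state would wrap whichever SDK call reports the status of the job ID returned by restore_snapshot(). Check the API reference for that call and its actual state values.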

Drop snapshots

You can drop a snapshot if it is no longer needed. You are advised to remove old snapshots regularly to save storage.

client.drop_snapshot(
    snapshot_name="backup_20240101"
)

// Go
dropOpt := milvusclient.NewDropSnapshotOption("backup_20240101")
if err := client.DropSnapshot(context.Background(), dropOpt); err != nil {
    log.Fatal(err)
}
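
Regular cleanup is easier when snapshot names end in a date. The following is a minimal sketch of a retention check, assuming names carry a trailing _YYYYMMDD suffix as in "backup_20240101". The helper expired_snapshots is hypothetical, and the list of names is passed in by the caller rather than fetched from Milvus.

```python
import re
from datetime import date, datetime, timedelta
from typing import Iterable, List

# Matches a trailing _YYYYMMDD suffix, for example "backup_20240101".
_DATE_SUFFIX = re.compile(r"_(\d{8})$")


def expired_snapshots(names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Return names whose trailing date is more than keep_days before today.

    Names without a valid date suffix are never selected.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = _DATE_SUFFIX.search(name)
        if not match:
            continue
        try:
            taken_on = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real calendar date
        if taken_on < cutoff:
            expired.append(name)
    return expired


names = ["backup_20240101", "backup_20240120", "v2.1_production_release"]
for name in expired_snapshots(names, today=date(2024, 1, 31), keep_days=14):
    # Each selected name could then be passed to client.drop_snapshot(snapshot_name=name).
    print("would drop:", name)
# prints: would drop: backup_20240101
```

Skipping names without a date suffix is a conservative choice: release snapshots such as "v2.1_production_release" are never dropped automatically.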

Data processing with Spark

Snapshots enable efficient offline data processing by providing stable, consistent data sources for analytical workloads. You can directly access snapshot data stored in object storage with Spark or other big data processing frameworks without impacting the live Milvus cluster.

The following code assumes you have created a snapshot named "analytics_snapshot_20260321", stored it in an object storage bucket, and obtained the object storage access credentials.

Step 1: Get snapshot metadata

Before using Spark to access snapshot data, get snapshot metadata to locate the data files in object storage.

# Get snapshot metadata
snapshot_info = client.describe_snapshot(
    snapshot_name="analytics_snapshot_20260321",
    include_collection_info=True
)

# Locate data files in S3
s3_path = f"s3a://{snapshot_info.s3_location}/binlogs/"

Step 2: Initiate a Spark session

With the data files in object storage, initiate a Spark session and read the data into a dataframe.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("VectorAnalytics") \
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY") \
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY") \
    .getOrCreate()