

What datasets are commonly used for image search?

Datasets for image search are typically large collections of labeled or annotated images used to train and evaluate systems that match queries to relevant images. Three categories dominate: general-purpose datasets, domain-specific collections, and benchmark datasets designed for testing retrieval accuracy. Here are the most widely used options and their applications.

General-purpose datasets like MS COCO and ImageNet are foundational. MS COCO contains over 330,000 images with detailed object annotations, segmentation masks, and captions, making it useful for training models to recognize objects and their context—critical for semantic image search. ImageNet, with 14 million images labeled across 20,000 categories, is often used for pre-training feature extractors (e.g., ResNet) that power embedding-based search systems. Flickr-based datasets like Flickr30k or Flickr8k provide text captions paired with images, enabling text-to-image retrieval tasks. These datasets emphasize diversity in content and are commonly used to train multimodal systems where search queries can be textual or visual.
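The embedding-based search these feature extractors enable can be sketched end to end without the training step. The snippet below is a minimal illustration, not any particular library's API: random vectors stand in for the features a pretrained backbone such as ResNet would produce, and retrieval is brute-force cosine similarity over the gallery.

```python
import numpy as np

def build_index(embeddings):
    # L2-normalize so a dot product equals cosine similarity
    return embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

def search(index, query, k=3):
    q = query / np.linalg.norm(query)
    scores = index @ q              # cosine similarity to every gallery image
    top = np.argsort(-scores)[:k]   # indices of the k best matches
    return [(int(i), float(scores[i])) for i in top]

# Random vectors stand in for real ResNet features of a 1,000-image gallery.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(1000, 512))
index = build_index(gallery)

# Query with a slightly perturbed copy of image 42; it should rank first.
query = gallery[42] + 0.05 * rng.normal(size=512)
results = search(index, query)
```

At production scale the brute-force `index @ q` step is replaced by an approximate nearest-neighbor index, but the normalize-then-rank logic is the same.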

For specialized use cases, domain-specific datasets are preferred. Stanford Online Products (about 120,000 images of roughly 22,000 products) is designed for metric learning in e-commerce search, where fine-grained similarity matters. GLAMI-1M, focused on fashion, includes clothing items with attributes like color and style for training attribute-aware search models. Landmark retrieval often uses ROxford and RParis, which contain photos of famous landmarks under varying conditions (lighting, angles) to test robustness. These datasets address challenges like distinguishing subtle visual differences or handling noisy real-world queries.
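The "fine-grained similarity" objective behind metric-learning datasets like Stanford Online Products is commonly a triplet loss: embeddings of the same product are pulled together while different products are pushed at least a margin apart. A minimal NumPy sketch (the 2-D vectors and margin value are illustrative placeholders, not from any specific paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Loss is zero once the positive is closer than the negative by the margin.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

anchor        = np.array([1.0, 0.0])
positive      = np.array([0.9, 0.1])  # another photo of the same product
easy_negative = np.array([0.0, 1.0])  # a clearly different product
hard_negative = np.array([0.8, 0.2])  # a visually similar different product

loss = triplet_loss(anchor, positive, easy_negative)       # already separated
hard_loss = triplet_loss(anchor, positive, hard_negative)  # still violating
```

The hard-negative case is exactly why benchmark curators mine visually similar but distinct items: easy negatives contribute zero gradient, so they teach the model nothing.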

Benchmark datasets like ROxford and RParis (the revisited versions of the original Oxford5k and Paris6k benchmarks) and Google Landmarks provide standardized evaluation protocols. They include hard negative examples and query galleries with occlusion or viewpoint changes, helping developers test retrieval accuracy in challenging scenarios. Many research papers also use DeepFashion (clothing) or Food-101 (food images) to validate domain-specific search techniques. When choosing a dataset, prioritize ones aligned with your application's requirements, whether that is general object retrieval, attribute-based search, or handling complex visual variations.
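These standardized protocols typically score a system by mean average precision (mAP) over the query set. A small self-contained sketch of that metric, using hypothetical relevance judgments rather than any real benchmark's ground truth:

```python
def average_precision(ranked_relevance):
    # ranked_relevance: 1/0 relevance flags down the ranked list for one query
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)  # precision at each relevant hit
    return sum(precisions) / max(hits, 1)

def mean_average_precision(runs):
    return sum(average_precision(r) for r in runs) / len(runs)

# Hypothetical results for two queries against a landmark gallery:
# query 1 finds relevant images at ranks 1 and 3, query 2 only at rank 2.
runs = [[1, 0, 1, 0], [0, 1, 0, 0]]
map_score = mean_average_precision(runs)
```

The revisited protocols refine this further with Easy/Medium/Hard splits that decide which gallery images count as relevant, but the underlying mAP computation is the same.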
