
Are there different versions of Google embedding 2?

Google has released several embedding models, and “Gemini Embedding 2” represents a significant step forward. It is not a simple numbered revision of one earlier model; rather, it is positioned as the successor to Google’s previous text-only embedding models and offers substantial enhancements, most notably native multimodal support.

The key distinction of Gemini Embedding 2, also referred to as gemini-embedding-2-preview during its public preview, is its native multimodal design. Unlike earlier Google embedding models, which were primarily text-only, Gemini Embedding 2 can embed text, images, video, audio, and documents into a single unified embedding space. This enables cross-modal search, classification, and clustering, so that, for example, images can be retrieved using a text description. Earlier multimodal pipelines typically used separate encoders for each data type, requiring manual stitching to make the data searchable together and risking the loss of subtle cross-modal connections. Gemini Embedding 2 is built on the Gemini foundation model and inherits its multimodal understanding from the ground up, projecting all modalities into one joint embedding space through a shared transformer architecture.
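Because every modality lands in the same vector space, a text query vector and image vectors can be compared directly with cosine similarity. A minimal sketch of that idea, using tiny made-up vectors in place of real model output (a real pipeline would fetch these from the embedding API):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Placeholder vectors standing in for embeddings of a text query
# and of two images -- all assumed to live in one shared space.
text_query = [0.9, 0.1, 0.0]                  # e.g. "a red bicycle"
image_vectors = {
    "bike.jpg":   [0.8, 0.2, 0.1],
    "sunset.jpg": [0.1, 0.1, 0.9],
}

# Rank images against the text query; this cross-modal comparison is
# only meaningful because both modalities share one embedding space.
ranked = sorted(image_vectors,
                key=lambda k: cosine(text_query, image_vectors[k]),
                reverse=True)
print(ranked[0])  # the image vector closest to the text query
```

With a text-only model plus a separate image encoder, the two vector spaces would be unrelated, and this single `sorted` call would produce meaningless rankings; the shared space is what makes one similarity function serve every modality.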

Furthermore, Gemini Embedding 2 offers flexible output dimensionality, defaulting to a 3072-dimensional vector. Developers can specify a smaller output_dimensionality (such as 768 or 1536) to cut storage and compute costs while retaining most of the embedding quality. The model also supports custom task instructions, so embeddings can be tuned for specific use cases such as semantic similarity, classification, or document search. For text inputs, it supports a context window of up to 8192 tokens, a significant increase over prior limits. The resulting embeddings can be stored and indexed in a vector database such as Milvus to power AI applications built on semantic search and retrieval.
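One common way to picture a smaller output_dimensionality is matryoshka-style truncation: keep a leading slice of the full vector and re-normalize it to unit length. The sketch below illustrates only that geometric idea with a random stand-in vector, not real model output; check the model documentation for how the API actually produces reduced dimensions before relying on client-side truncation.

```python
import random
from math import sqrt

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale the slice
    to unit length -- a matryoshka-style dimensionality reduction."""
    head = vec[:dim]
    norm = sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

random.seed(0)
# Stand-in for a full 3072-dimensional embedding (random, for illustration).
full = [random.gauss(0.0, 1.0) for _ in range(3072)]

small = truncate_and_normalize(full, 768)
print(len(small))                           # 768 components
print(round(sum(x * x for x in small), 6))  # unit squared norm
```

A 768-dimensional vector needs a quarter of the storage of the 3072-dimensional default and is proportionally cheaper to index and search in a vector database, which is why the smaller settings are attractive when the quality trade-off is acceptable.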

In summary, “Google embedding 2” refers to gemini-embedding-2-preview, Google’s latest offering in embedding technology. It is a powerful, natively multimodal model that goes beyond its text-only predecessors (such as gemini-embedding-001, text-embedding-005, and text-multilingual-embedding-002) by integrating diverse data types into a single, semantically rich vector space. This advancement simplifies complex AI pipelines, improves accuracy, and gives developers a versatile tool for building more sophisticated retrieval and analytics systems across more than 100 languages.
