
What algorithm powers Google embedding 2?

Google Embedding 2, officially known as Gemini Embedding 2, is powered by the Gemini architecture and incorporates a technique called Matryoshka Representation Learning (MRL). Gemini Embedding 2 is Google’s first natively multimodal embedding model, meaning it can generate embeddings for various data types, including text, images, video, audio, and documents, within a single shared embedding space. This capability allows for unified retrieval and classification across different media types, simplifying complex data pipelines. For instance, you could search for an image using a text description or compare video clips to audio segments, all within the same embedding framework. The model expands on Google’s earlier text-only embeddings, offering state-of-the-art multimodal performance and capturing semantic intent across more than 100 languages.

A core algorithmic aspect of Gemini Embedding 2 is its use of Matryoshka Representation Learning (MRL). This technique enables flexible output dimensions, allowing developers to scale down the default 3,072-dimensional embeddings to smaller sizes like 1,536 or 768 dimensions. This adaptability is crucial for balancing performance, storage costs, and computational efficiency, as lower-dimensional embeddings require less storage and can speed up similarity searches in vector databases. Developers can specify custom task instructions to optimize the embeddings for their specific goals, such as code retrieval or search result optimization. The model also handles interleaved inputs, meaning it can process combinations of different modalities, like an image and text, in a single request to produce a unified embedding that captures relationships across these media types.
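The dimension-scaling idea behind MRL can be sketched in a few lines. The snippet below uses a random vector as a stand-in for a real 3,072-dimensional embedding (no API call is made); the key point is that a Matryoshka-style embedding is scaled down by simply keeping its leading components and re-normalizing:

```python
import numpy as np

# Hypothetical 3,072-dimensional embedding: random values stand in for
# a real Gemini Embedding 2 output, purely for illustration.
rng = np.random.default_rng(42)
full_embedding = rng.normal(size=3072)

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length,
    the standard way Matryoshka-style embeddings are scaled down."""
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

emb_1536 = truncate_embedding(full_embedding, 1536)
emb_768 = truncate_embedding(full_embedding, 768)

print(emb_1536.shape, emb_768.shape)  # (1536,) (768,)
```

Because MRL trains the model so that the most important information concentrates in the earliest dimensions, the truncated vectors remain useful for similarity search while cutting storage and compute roughly in proportion to the dimension reduction.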

The resulting high-dimensional vectors generated by Gemini Embedding 2 are designed to capture the semantic meaning and context of the input data. These embeddings are instrumental for a wide array of AI applications, including semantic search, clustering, classification, and Retrieval-Augmented Generation (RAG) workflows. When building such applications, these numerical representations can be stored and indexed efficiently in a vector database such as Milvus. This allows for fast and accurate similarity searches, where the geometric distance between vectors in the embedding space reflects the semantic similarity between the original data points. This approach significantly enhances the effectiveness of applications that rely on understanding the relationships between different pieces of information, regardless of their original modality.
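To make the similarity-search idea concrete, here is a minimal brute-force sketch using random unit vectors as stand-in embeddings. A vector database like Milvus performs the same scoring at scale with approximate nearest-neighbor indexes rather than an exhaustive scan; the toy data and the `top_k` helper below are illustrative assumptions, not Milvus APIs:

```python
import numpy as np

# Toy "database" of 100 unit-normalized 768-dim embeddings (random
# stand-ins for real document embeddings stored in a vector database).
rng = np.random.default_rng(0)
db = rng.normal(size=(100, 768))
db /= np.linalg.norm(db, axis=1, keepdims=True)

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Brute-force cosine-similarity search: score every stored vector
    against the query and return the indices of the k closest matches."""
    query = query / np.linalg.norm(query)
    scores = vectors @ query  # dot product == cosine similarity for unit vectors
    return np.argsort(scores)[::-1][:k]

# A slightly perturbed copy of item 7 should retrieve item 7 first.
query = db[7] + 0.01 * rng.normal(size=768)
print(top_k(query, db))
```

The geometric intuition is exactly the one described above: semantically similar inputs land close together in the embedding space, so ranking by cosine similarity (or another distance metric) surfaces the most related items first.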
