How does the distance metric used (cosine vs L2) interplay with the embedding model choice, and could a mismatch lead to suboptimal retrieval results?

The choice between cosine similarity and L2 (Euclidean) distance as a metric for vector comparison depends heavily on how the embedding model was trained. Embedding models are optimized to structure their output vectors in ways that align with specific distance metrics. For example, models trained using objectives like cosine similarity (e.g., contrastive loss in Siamese networks) learn to emphasize the angular relationships between vectors, making their embeddings more directionally meaningful. In contrast, models trained with L2-based losses (e.g., triplet loss with Euclidean margins) focus on minimizing absolute distances between similar items. If the retrieval system uses a metric that doesn’t match the model’s training objective, the geometric relationships between embeddings may not reflect their semantic similarity, leading to poor retrieval accuracy.
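As a toy illustration, the NumPy sketch below (with made-up vectors, not tied to any particular model) shows how the two metrics can rank the same candidates differently once magnitudes vary:

```python
import numpy as np

# Hypothetical query and two candidate embeddings (made-up values).
# cand_a points in almost the same direction as the query but has a
# large magnitude; cand_b is closer in absolute position but less aligned.
query  = np.array([1.0, 1.0])
cand_a = np.array([5.0, 4.5])   # well-aligned, large norm
cand_b = np.array([0.5, 1.4])   # nearby in space, less aligned

def cosine_sim(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def l2_dist(u, v):
    return np.linalg.norm(u - v)

for name, c in [("cand_a", cand_a), ("cand_b", cand_b)]:
    print(name,
          "cosine:", round(cosine_sim(query, c), 4),
          "L2:", round(l2_dist(query, c), 4))
# cosine ranks cand_a first (smaller angle);
# L2 ranks cand_b first (closer point in space).
```

With a real model, whichever of the two rankings better matches human relevance judgments is the one the training objective induced, which is exactly why the metric should follow the model.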

For instance, consider a model trained to maximize cosine similarity between related text snippets. Such embeddings encode meaning primarily in direction (angle) rather than magnitude. Using L2 distance here can be problematic because vectors separated by a small angle but with large magnitudes (e.g., due to rare terms in TF-IDF-like embeddings) may appear farther apart than their semantic similarity warrants. Conversely, a model trained with L2 objectives may produce embeddings where both direction and magnitude are meaningful (e.g., image embeddings where pixel intensity matters), and applying cosine similarity there discards the magnitude-based patterns the model learned. A commonly cited case is OpenAI’s text-embedding-ada-002 (usually paired with cosine): because the model normalizes its embeddings to unit length, L2 distance produces the same rankings as cosine, so the mismatch happens to be harmless. That equivalence, however, does not hold for models that emit unnormalized embeddings.
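The equivalence for unit-length vectors follows from ‖a − b‖² = ‖a‖² + ‖b‖² − 2a·b = 2 − 2·cos(a, b). The short check below verifies this numerically on random vectors (illustrative only; the 384-dimension choice is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=384)
b = rng.normal(size=384)

# Normalize to unit length, as models like text-embedding-ada-002 do.
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

cos = float(np.dot(a, b))            # cosine similarity of unit vectors
l2_sq = float(np.sum((a - b) ** 2))  # squared Euclidean distance

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
assert np.isclose(l2_sq, 2 - 2 * cos)
print(f"cos={cos:.6f}  ||a-b||^2={l2_sq:.6f}  2-2cos={2 - 2 * cos:.6f}")
```

Because the squared L2 distance is a strictly decreasing function of cosine similarity in this regime, the two metrics produce identical nearest-neighbor rankings on normalized embeddings.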

To avoid suboptimal results, developers should first check the model’s documentation or training setup. For example, Sentence-BERT models are often fine-tuned with cosine similarity, so they perform best with that metric. If the model’s training objective is unclear, experimenting with both metrics on a labeled validation set can identify the better choice. Additionally, normalization (scaling vectors to unit length) can reconcile the two metrics, since the L2 distance between unit vectors is a monotonic function of cosine similarity (‖a − b‖² = 2 − 2·cos(a, b), as shown above). However, normalization isn’t always appropriate: for models where magnitude carries meaning (e.g., embeddings whose norms encode confidence), forcing unit length discards useful information. In summary, alignment between the model’s design and the metric is critical, and a mismatch can cause significant performance degradation in retrieval tasks.
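If documentation is silent, a small validation harness along the following lines can settle the question empirically. Here `queries`, `docs`, and `relevant_idx` are hypothetical stand-ins for your own labeled query/document embeddings (the random data below is only a placeholder, so its accuracy numbers are meaningless):

```python
import numpy as np

def top1_accuracy(queries, docs, relevant_idx, metric="cosine"):
    """Fraction of queries whose top-1 retrieved doc is the labeled one."""
    if metric == "cosine":
        q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
        d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
        scores = q @ d.T                      # higher = more similar
        top1 = scores.argmax(axis=1)
    else:  # L2
        dists = np.linalg.norm(
            queries[:, None, :] - docs[None, :, :], axis=2)
        top1 = dists.argmin(axis=1)           # lower = more similar
    return float((top1 == relevant_idx).mean())

# Placeholder embeddings: 100 queries, 500 docs, 64 dims, random labels.
rng = np.random.default_rng(42)
queries = rng.normal(size=(100, 64))
docs = rng.normal(size=(500, 64))
relevant_idx = rng.integers(0, 500, size=100)

for m in ("cosine", "l2"):
    print(m, top1_accuracy(queries, docs, relevant_idx, metric=m))
```

Running this with real model outputs and real relevance labels, then keeping whichever metric scores higher, is a cheap safeguard against a silent metric mismatch.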
