

Why might one choose dot product as a similarity metric for certain applications (such as embeddings that are not normalized), and how does it relate to cosine similarity mathematically?

The dot product is chosen as a similarity metric in applications where the magnitude of embeddings carries meaningful information. Unlike cosine similarity, which normalizes vectors to focus solely on direction, the dot product retains both the direction and magnitude of the vectors. For example, in recommendation systems, embeddings might represent user preferences or item features. If a user’s embedding has a larger magnitude (e.g., due to higher activity or stronger preferences), the dot product naturally weights their similarity higher. This can be useful when the “intensity” of features matters—such as prioritizing active users’ preferences over casual ones. Similarly, in retrieval systems, document embeddings with larger magnitudes might indicate higher relevance or confidence, making the dot product a better fit for ranking.
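As a rough illustration (using made-up preference vectors, not output from any particular model), the sketch below shows how two users pointing in the same direction but with different magnitudes score differently under the dot product, while cosine similarity treats them identically:

```python
import numpy as np

# Hypothetical item embedding and two user embeddings that point in the
# same direction; the "active" user simply has a larger magnitude.
item = np.array([0.4, 0.8, 0.2])
casual_user = np.array([0.2, 0.4, 0.1])
active_user = np.array([1.0, 2.0, 0.5])   # 5x the casual user's vector

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.dot(casual_user, item), np.dot(active_user, item))  # 0.42 vs 2.10 -- magnitude matters
print(cosine(casual_user, item), cosine(active_user, item))  # 1.0 vs 1.0 -- direction only
```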

Mathematically, the dot product of two vectors a and b is a · b = ||a|| ||b|| cosθ, where θ is the angle between them, and ||a||, ||b|| are their magnitudes. Cosine similarity is defined as (a · b) / (||a|| ||b||), which simplifies to cosθ. In other words, cosine similarity is just the dot product divided by the product of the vectors’ magnitudes. When embeddings are normalized (unit length), the dot product and cosine similarity are equivalent. However, when magnitudes vary, the dot product incorporates this variation. For instance, if two embeddings point in the same direction but one has a larger magnitude, their dot product will be higher than their cosine similarity, reflecting the combined effect of alignment and intensity.
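A small worked check of this identity, using arbitrary illustrative vectors in the same numpy style as above:

```python
import numpy as np

a = np.array([3.0, 4.0])          # ||a|| = 5
b = np.array([4.0, 3.0])          # ||b|| = 5

dot = np.dot(a, b)                                           # 3*4 + 4*3 = 24
cos_theta = dot / (np.linalg.norm(a) * np.linalg.norm(b))    # 24 / 25 = 0.96

# a . b == ||a|| * ||b|| * cos(theta)
assert np.isclose(dot, np.linalg.norm(a) * np.linalg.norm(b) * cos_theta)

# After normalizing both vectors to unit length, the dot product
# and cosine similarity give the same value.
a_hat = a / np.linalg.norm(a)
b_hat = b / np.linalg.norm(b)
assert np.isclose(np.dot(a_hat, b_hat), cos_theta)
```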

The choice between dot product and cosine similarity depends on whether magnitude is relevant. Use the dot product when magnitudes signal importance—like in a model where embedding length correlates with confidence (e.g., a search engine ranking documents by both relevance and quality). Use cosine similarity when only directional alignment matters, such as comparing text embeddings where document length shouldn’t influence similarity. Computationally, the dot product is cheaper if embeddings aren’t pre-normalized, since it skips computing each vector’s norm and dividing by it. For example, in real-time systems processing millions of embeddings, skipping normalization reduces latency. However, if magnitudes are noisy or irrelevant, cosine similarity’s normalization ensures fairer comparisons by isolating directional agreement.
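A minimal sketch of this trade-off, assuming a small matrix of made-up document embeddings: dot-product ranking is a single matrix-vector product, while cosine ranking adds a per-vector norm computation, and the two can order results differently when magnitudes vary:

```python
import numpy as np

query = np.array([1.0, 0.0])

# Hypothetical document embeddings: doc 0 is perfectly aligned with the
# query but short; doc 1 is slightly off-axis but has a larger magnitude.
docs = np.array([
    [1.0, 0.0],     # aligned, small magnitude
    [3.0, 1.0],     # less aligned, large magnitude
])

# Dot-product scores: one matrix-vector product, no norms needed.
dot_scores = docs @ query                      # [1.0, 3.0]

# Cosine scores: extra norm computation and division per vector.
cos_scores = dot_scores / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
# [1.0, ~0.95]

print(np.argsort(-dot_scores))   # [1 0] -> the large-magnitude doc ranks first
print(np.argsort(-cos_scores))   # [0 1] -> the perfectly aligned doc ranks first
```

In a vector database such as Milvus, this choice typically surfaces as the index’s metric type (inner product versus cosine), so the decision above is usually made once, when the collection’s index is configured.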
