Cosine similarity is a mathematical measure used to determine how similar two vectors are, based on the cosine of the angle between them. It ranges from -1 to 1, where 1 means the vectors are identical in direction, 0 means they are orthogonal (no similarity), and -1 means they are diametrically opposite. In natural language processing (NLP), cosine similarity is often applied to normalized vectors (vectors scaled to unit length), which simplifies the calculation to the dot product of the vectors. This metric is preferred over Euclidean distance for text similarity tasks because it focuses on the direction of the vectors rather than their magnitude, making it robust to differences in sentence length or word frequency. For example, the sentences “I love programming” and “Coding is enjoyable” might have embeddings pointing in similar directions, resulting in a high cosine similarity score.
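To make the arithmetic concrete, here is a minimal sketch in Python using NumPy and made-up toy vectors. It shows the general formula (dot product divided by the product of the norms) and how it collapses to a plain dot product once the vectors are scaled to unit length.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration only.
a = np.array([0.2, 0.8, 0.1])
b = np.array([0.25, 0.7, 0.05])

# For unit-length (normalized) vectors the denominator is 1,
# so cosine similarity reduces to the dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)

print(cosine_similarity(a, b))        # general formula
print(float(np.dot(a_unit, b_unit)))  # same value via dot product of unit vectors
```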
Sentence Transformers are neural models specifically designed to generate dense vector representations (embeddings) of sentences that capture semantic meaning. Unlike traditional models like BERT, which produce token-level embeddings, Sentence Transformers are fine-tuned using techniques like contrastive learning or triplet loss so that semantically similar sentences end up closer in the embedding space. For instance, the model all-MiniLM-L6-v2 maps sentences to 384-dimensional vectors, where similar sentences (e.g., “The cat sits on the mat” and “A kitten is lying on the rug”) have embeddings with minimal angular distance. These embeddings are optimized for tasks like semantic search, clustering, and similarity comparison, as they preserve the semantic relationships between sentences in a compact numerical form.
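As a quick sketch of how this looks in practice, the snippet below loads the all-MiniLM-L6-v2 model with the sentence-transformers library and encodes the two example sentences above; the sentence list is illustrative.

```python
from sentence_transformers import SentenceTransformer

# Load the pretrained model discussed above (requires the sentence-transformers package).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sits on the mat",
    "A kitten is lying on the rug",
]

# encode() returns one embedding per sentence; for this model each is 384-dimensional.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```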
When using Sentence Transformers to measure sentence similarity, cosine similarity is applied to the embeddings of two sentences. First, the sentences are converted into embeddings using the model; in Python, for example, model.encode(sentences) generates the embeddings. Next, the cosine similarity between these embeddings is computed. If the embeddings are normalized (common in practice), this reduces to a simple dot product. A score close to 1 (e.g., 0.85) indicates high similarity, while a score near 0 suggests dissimilarity. Developers often use this approach in applications like recommendation systems (matching user queries to products) or chatbots (identifying paraphrased user inputs). For example, comparing “How do I reset my password?” and “What steps are needed to change my login credentials?” would yield a high cosine similarity, enabling the system to recognize the semantic equivalence. This combination of Sentence Transformers and cosine similarity provides an efficient and effective way to quantify semantic relationships between texts.
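Putting the two steps together, here is a small end-to-end sketch using the password-reset example from the paragraph above. It assumes the sentence-transformers library; normalize_embeddings=True scales each vector to unit length, and util.cos_sim computes the cosine similarity between the two embeddings.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "How do I reset my password?"
candidate = "What steps are needed to change my login credentials?"

# Encode both sentences; normalize_embeddings=True returns unit-length vectors,
# so cosine similarity is equivalent to a dot product.
embeddings = model.encode([query, candidate], normalize_embeddings=True)

# util.cos_sim returns the cosine-similarity matrix; take the single entry as a float.
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"cosine similarity: {score:.2f}")  # a value near 1 means the sentences are close paraphrases
```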
Zilliz Cloud is a managed vector database built on Milvus, well suited for building GenAI applications.