Training objectives like contrastive learning and triplet loss guide Sentence Transformers to produce embeddings where semantically similar sentences are closer in vector space while dissimilar ones are farther apart. These methods structure the learning process by comparing examples to refine the model’s understanding of similarity. For instance, contrastive learning uses pairs of sentences (positive or negative) to adjust embeddings, while triplet loss operates on triplets of examples (anchor, positive, negative) to enforce a margin-based separation. Both approaches rely on distance metrics like cosine similarity to quantify relationships between sentences.
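As a concrete illustration of how a distance metric like cosine similarity scores sentence relationships, the sketch below encodes a few sentences and compares them; the model name "all-MiniLM-L6-v2" is only an illustrative choice of pretrained checkpoint.

```python
# A minimal sketch: encode sentences and compare them with cosine similarity.
# The checkpoint name is an illustrative example, not a requirement.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

embeddings = model.encode([
    "The quick brown fox",   # sentence A
    "A fast dusky fox",      # paraphrase of A (expected to score higher)
    "A slow gray turtle",    # unrelated sentence (expected to score lower)
])

# Cosine similarity between the first sentence and the other two.
print(util.cos_sim(embeddings[0], embeddings[1]))
print(util.cos_sim(embeddings[0], embeddings[2]))
```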
In practice, contrastive learning trains the model by minimizing the distance between positive pairs and maximizing it for negative pairs. For example, given a pair of paraphrases (positive) like “The quick brown fox” and “A fast dusky fox,” the model reduces their embedding distance. Conversely, for a negative pair like “The quick brown fox” and “A slow gray turtle,” it increases the distance. Triplet loss extends this by using three inputs: an anchor (“A happy person”), a positive (“Someone feeling joyful”), and a negative (“A sad individual”). The loss function ensures the anchor is closer to the positive than the negative by a predefined margin. Mathematically, triplet loss is defined as loss = max(d(anchor, positive) − d(anchor, negative) + margin, 0), where d measures the distance between embeddings. This forces the model to create a clear hierarchy of similarity.
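To make the formula concrete, here is a small sketch that computes the triplet loss for one anchor/positive/negative set of embeddings; the Euclidean distance, the 0.5 margin, and the random vectors are all illustrative choices.

```python
# A minimal sketch of the triplet loss formula:
# loss = max(d(anchor, positive) - d(anchor, negative) + margin, 0)
# The distance metric, margin, and random embeddings are illustrative.
import torch
import torch.nn.functional as F

anchor = torch.randn(384)    # e.g. embedding of "A happy person"
positive = torch.randn(384)  # e.g. embedding of "Someone feeling joyful"
negative = torch.randn(384)  # e.g. embedding of "A sad individual"
margin = 0.5

d_ap = F.pairwise_distance(anchor.unsqueeze(0), positive.unsqueeze(0))  # d(anchor, positive)
d_an = F.pairwise_distance(anchor.unsqueeze(0), negative.unsqueeze(0))  # d(anchor, negative)

loss = torch.clamp(d_ap - d_an + margin, min=0.0)
print(loss.item())
```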
Developers implementing these objectives in Sentence Transformers typically use libraries like sentence-transformers, which abstract away much of the complexity. For contrastive learning, a Siamese network structure processes pairs through the same encoder, computing similarity scores. For triplet loss, triplets are encoded, and the loss is applied to their distances. Key considerations include selecting an appropriate margin (e.g., 0.5), mining hard negatives (challenging examples close to the anchor), and structuring batches to include diverse examples. For instance, training on Natural Language Inference (NLI) data might use entailment pairs as positives and contradiction pairs as negatives. By aligning these techniques with their data and use case, developers can fine-tune models for tasks like semantic search or clustering, where precise similarity judgments are critical.
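The sketch below shows how such fine-tuning typically looks with the sentence-transformers library's classic model.fit API; the base checkpoint, example triplets, margin, and hyperparameters are illustrative placeholders, and real training would draw triplets or labeled pairs from a dataset such as NLI.

```python
# A minimal fine-tuning sketch with triplet loss in sentence-transformers.
# The checkpoint, triplets, margin, and hyperparameters are illustrative only.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each InputExample holds (anchor, positive, negative) texts.
train_examples = [
    InputExample(texts=["A happy person",
                        "Someone feeling joyful",
                        "A sad individual"]),
    InputExample(texts=["The quick brown fox",
                        "A fast dusky fox",
                        "A slow gray turtle"]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Triplet loss with an explicit margin; losses.ContrastiveLoss would be used
# instead when the data comes as labeled positive/negative pairs.
train_loss = losses.TripletLoss(model=model, triplet_margin=0.5)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    warmup_steps=10,
)
```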