Is it possible to use Sentence Transformer models without the Sentence Transformers library (for example, using the Hugging Face Transformers API directly)?

Yes, it is possible to use Sentence Transformer models without the Sentence Transformers library by leveraging the Hugging Face Transformers API directly. Sentence Transformers are built on top of the Transformers library and typically combine pre-trained models like BERT with pooling layers to generate sentence embeddings. While the Sentence Transformers library simplifies this process with abstractions, you can replicate the same functionality using lower-level Transformers components. This involves manually handling tokenization, model inference, and pooling operations to produce embeddings from raw text inputs.

To achieve this, you would first load a pre-trained model (e.g., bert-base-uncased) using Hugging Face’s AutoModel and AutoTokenizer classes. Tokenization converts text into input IDs and attention masks, which are passed to the model to generate token-level embeddings. The critical step is applying a pooling strategy—such as mean pooling—to aggregate token embeddings into a fixed-length sentence vector. For example, after obtaining the model’s output, you could average the embeddings across the token sequence (excluding padding tokens using the attention mask). This replicates the default behavior of many Sentence Transformer models, which use mean pooling under the hood. Additionally, some models require normalization of the output vectors, which can be done manually using libraries like NumPy or PyTorch.
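As a concrete illustration, here is a minimal sketch of that workflow using the Transformers API directly. The model name, example sentences, and variable names are placeholders chosen for this example; any BERT-style encoder checkpoint would work the same way.

```python
# Minimal sketch: mean-pooled sentence embeddings with Hugging Face Transformers only.
# "bert-base-uncased" and the example sentences are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

sentences = ["Milvus is a vector database.", "Embeddings power semantic search."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoded)

token_embeddings = outputs.last_hidden_state            # (batch, seq_len, hidden)
mask = encoded["attention_mask"].unsqueeze(-1).float()  # (batch, seq_len, 1)

# Mean pooling: sum the token embeddings while zeroing out padding positions,
# then divide by the number of real (non-padding) tokens per sentence.
summed = (token_embeddings * mask).sum(dim=1)
counts = mask.sum(dim=1).clamp(min=1e-9)
sentence_embeddings = summed / counts                   # (batch, hidden)
print(sentence_embeddings.shape)                        # e.g. torch.Size([2, 768])
```

The attention mask is what keeps padding tokens from diluting the average, which is the same masking the Sentence Transformers pooling module applies internally.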

However, there are trade-offs. The Sentence Transformers library abstracts away boilerplate code, handles model-specific nuances such as maximum sequence lengths, and automatically applies the pooling and normalization configuration bundled with each pre-trained checkpoint. When using Transformers directly, you must replicate these details yourself. For instance, if a Sentence Transformer model was trained with a specific pooling method like CLS token pooling or max pooling, you need to implement that logic explicitly. Code maintenance also becomes more complex, since updates to the Transformers API may require adjustments to your own pooling code. Despite this, the approach works well for developers who prefer minimal dependencies or need fine-grained control. For example, you could load the all-MiniLM-L6-v2 checkpoint via AutoModel, apply mean pooling, and normalize the output to match the embeddings produced by the Sentence Transformers library.
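For that last example, a hedged sketch of what this might look like for the all-MiniLM-L6-v2 checkpoint: mean pooling followed by L2 normalization, which is the configuration documented for this model, so the resulting vectors should line up with what the Sentence Transformers library returns. The embed helper below is illustrative, not part of any library.

```python
# Sketch: replicating Sentence Transformers output for all-MiniLM-L6-v2
# using only AutoModel/AutoTokenizer, mean pooling, and L2 normalization.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

def embed(sentences):
    # Tokenize, run the encoder, mean-pool over non-padding tokens, then normalize.
    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(**encoded)
    mask = encoded["attention_mask"].unsqueeze(-1).float()
    pooled = (output.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    return F.normalize(pooled, p=2, dim=1)  # unit-length vectors, as this checkpoint expects

embeddings = embed(["How do I use Transformers directly?"])
print(embeddings.shape)  # torch.Size([1, 384]) for this model
```

Normalizing to unit length also means cosine similarity reduces to a dot product, which is convenient when the vectors are later indexed in a vector database such as Milvus.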
