How do you save a fine-tuned Sentence Transformer model and later load it for inference or deployment?

To save a fine-tuned Sentence Transformer model, use the save method provided by the library. After training your model, call model.save("output/path"), which writes the model weights, configuration, and tokenizer to the specified directory. This stores all components (e.g., the transformer architecture, pooling layer, and tokenizer) in a structured format, including files like config.json, modules.json, the weights file (pytorch_model.bin, or model.safetensors in newer library versions), and tokenizer-specific files. Saving everything together makes the model reproducible and compatible when you reload it later.
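As a minimal sketch (the base model name "all-MiniLM-L6-v2" and the output directory are placeholders, not values from this answer):

```python
from sentence_transformers import SentenceTransformer

# Placeholder base model; substitute the model you actually fine-tuned.
model = SentenceTransformer("all-MiniLM-L6-v2")

# ... fine-tuning code runs here ...

# Writes config.json, modules.json, the weights file, and tokenizer
# files into the directory so the model can be fully reconstructed.
model.save("output/my-finetuned-model")
```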

To load the saved model for inference, initialize a SentenceTransformer instance with the directory path. For example, model = SentenceTransformer("output/path") reconstructs the model using the saved configuration and weights. This works seamlessly across environments, provided the same dependencies (e.g., sentence-transformers and transformers library versions) are used. If you’ve modified the model architecture (e.g., added custom layers), ensure those changes are reflected in the code before loading to avoid errors. The loaded model can then generate embeddings via model.encode(text) as usual.
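Loading is symmetric: point the constructor at the saved directory instead of a model name. Assuming the placeholder directory written above:

```python
from sentence_transformers import SentenceTransformer

# Reconstructs the architecture, weights, pooling, and tokenizer
# from the files produced by model.save().
model = SentenceTransformer("output/my-finetuned-model")

embeddings = model.encode([
    "Milvus is a vector database.",
    "Sentence embeddings power semantic search.",
])
print(embeddings.shape)  # (2, embedding_dim); a NumPy array by default
```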

For deployment, consider packaging the model into a service. For instance, wrap the loaded model in a REST API using Flask or FastAPI, where incoming text requests are processed via model.encode(). Alternatively, serialize the model to ONNX format for faster inference in production systems. When deploying to cloud platforms like AWS SageMaker, package the model directory into a Docker container with the necessary dependencies. Always validate the loaded model’s performance with test inputs to ensure consistency between training and inference environments.
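As one possible shape for such a service, here is a minimal FastAPI sketch; the /encode route, request schema, and model path are illustrative assumptions, not a prescribed API:

```python
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()

# Load once at startup so every request reuses the same in-memory model.
model = SentenceTransformer("output/my-finetuned-model")

class EncodeRequest(BaseModel):
    texts: list[str]

@app.post("/encode")
def encode(req: EncodeRequest) -> dict:
    # model.encode returns a NumPy array; convert it for JSON serialization.
    embeddings = model.encode(req.texts)
    return {"embeddings": embeddings.tolist()}
```

Run it with, for example, `uvicorn app:app --host 0.0.0.0 --port 8000`, then POST JSON like {"texts": ["hello"]} to /encode and check that the returned embeddings match those produced in your training environment.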
