To save a fine-tuned Sentence Transformer model, use the save method provided by the library. After training, call model.save("output/path"), which writes the model weights, configuration, and tokenizer to the specified directory. This stores all components (e.g., the transformer architecture, pooling layer, and tokenizer) in a structured format, including files such as config.json, pytorch_model.bin (or model.safetensors in newer library versions), and the tokenizer files. Saving this way ensures reproducibility and compatibility when you reload the model later.
To load the saved model for inference, initialize a SentenceTransformer instance with the directory path. For example, model = SentenceTransformer("output/path") reconstructs the model from the saved configuration and weights. This works across environments, provided the same dependency versions (e.g., the sentence-transformers and transformers libraries) are installed. If you modified the model architecture (e.g., added custom layers), make sure those changes are present in the code before loading to avoid errors. The loaded model can then generate embeddings via model.encode(text) as usual.
For deployment, consider packaging the model into a service. For instance, wrap the loaded model in a REST API using Flask or FastAPI, where incoming text requests are processed via model.encode(). Alternatively, export the model to ONNX format for faster inference in production systems. When deploying to cloud platforms like AWS SageMaker, package the model directory into a Docker container with the necessary dependencies. Always validate the loaded model's performance with test inputs to ensure consistency between training and inference environments.