句子轉換器
Milvus 透過SentenceTransformerEmbeddingFunction類與Sentence Transformer預先訓練的模型整合。這個類別提供了使用預先訓練的 Sentence Transformer 模型來編碼文件和查詢的方法,並將嵌入回傳成與 Milvus 索引相容的密集向量。
要使用此功能,請安裝必要的相依性:
pip install --upgrade pymilvus
pip install "pymilvus[model]"
然後,實體化SentenceTransformerEmbeddingFunction:
from pymilvus import model
sentence_transformer_ef = model.dense.SentenceTransformerEmbeddingFunction(
model_name='all-MiniLM-L6-v2', # Specify the model name
device='cpu' # Specify the device to use, e.g., 'cpu' or 'cuda:0'
)
參數:
model_name(string)
用於編碼的 Sentence Transformer 模型名稱。預設值為all-MiniLM-L6-v2。您可以使用任何 Sentence Transformers 預先訓練的模型。如需可用模型的清單,請參閱預先訓練的模型。
裝置(字串)
要使用的裝置,cpu代表 CPU,cuda:n代表第 n 個 GPU 裝置。
要為文件建立嵌入式資料,請使用encode_documents()方法:
docs = [
"Artificial intelligence was founded as an academic discipline in 1956.",
"Alan Turing was the first person to conduct substantial research in AI.",
"Born in Maida Vale, London, Turing was raised in southern England.",
]
docs_embeddings = sentence_transformer_ef.encode_documents(docs)
# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", sentence_transformer_ef.dim, docs_embeddings[0].shape)
預期的輸出與下圖相似:
Embeddings: [array([-3.09392996e-02, -1.80662833e-02, 1.34775648e-02, 2.77156215e-02,
-4.86349640e-03, -3.12581174e-02, -3.55921760e-02, 5.76934684e-03,
2.80773244e-03, 1.35783911e-01, 3.59678417e-02, 6.17732145e-02,
...
-4.61330153e-02, -4.85207550e-02, 3.13997865e-02, 7.82178566e-02,
-4.75336798e-02, 5.21207601e-02, 9.04406682e-02, -5.36676683e-02],
dtype=float32)]
Dim: 384 (384,)
要為查詢建立嵌入式資料,請使用encode_queries()方法:
queries = ["When was artificial intelligence founded",
"Where was Alan Turing born?"]
query_embeddings = sentence_transformer_ef.encode_queries(queries)
# Print embeddings
print("Embeddings:", query_embeddings)
# Print dimension and shape of embeddings
print("Dim:", sentence_transformer_ef.dim, query_embeddings[0].shape)
預期輸出與下列內容相似:
Embeddings: [array([-2.52114702e-02, -5.29330298e-02, 1.14570223e-02, 1.95571519e-02,
-2.46500354e-02, -2.66519729e-02, -8.48201662e-03, 2.82961670e-02,
-3.65092754e-02, 7.50745758e-02, 4.28900979e-02, 7.18822703e-02,
...
-6.76431581e-02, -6.45996556e-02, -4.67132553e-02, 4.78532910e-02,
-2.31596199e-03, 4.13446948e-02, 1.06935494e-01, -1.08258888e-01],
dtype=float32)]
Dim: 384 (384,)