教師
Instructor是一個由指令調整的文字內嵌模型,只要提供任務指令,就可以產生適合任何任務(例如分類、檢索、聚類、文字評估等)和領域(例如科學、金融等)的文字內嵌,而不需要任何微調。
Milvus 透過 InstructorEmbeddingFunction 類別與 Instructor 的嵌入模型整合。該類提供使用 Instructor 嵌入模型編碼文件和查詢的方法,並將嵌入返回為與 Milvus 索引相容的密集向量。
要使用此功能,請安裝必要的相依性:
pip install --upgrade pymilvus
pip install "pymilvus[model]"
然後,實體化 InstructorEmbeddingFunction:
from pymilvus.model.dense import InstructorEmbeddingFunction
ef = InstructorEmbeddingFunction(
model_name="hkunlp/instructor-xl", # Defaults to `hkunlp/instructor-xl`
query_instruction="Represent the question for retrieval:",
doc_instruction="Represent the document for retrieval:"
)
參數:
model_name
(字串)要用做編碼的 Mistral AI 嵌入模型名稱。預設值為
hkunlp/instructor-xl
。如需詳細資訊,請參閱模型清單。query_instruction
(字串)指導模型如何為查詢或問題產生嵌入的特定任務指示。
doc_instruction
(字串)指導模型為文件產生內嵌的特定任務指示。
若要為文件建立內嵌,請使用encode_documents()
方法:
docs = [
"Artificial intelligence was founded as an academic discipline in 1956.",
"Alan Turing was the first person to conduct substantial research in AI.",
"Born in Maida Vale, London, Turing was raised in southern England.",
]
docs_embeddings = ef.encode_documents(docs)
# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", ef.dim, docs_embeddings[0].shape)
預期的輸出與下列內容相似:
Embeddings: [array([ 1.08575663e-02, 3.87877878e-03, 3.18090729e-02, -8.12458917e-02,
-4.68971021e-02, -5.85585833e-02, -5.95418774e-02, -8.55880603e-03,
-5.54775111e-02, -6.08020350e-02, 1.76202394e-02, 1.06648318e-02,
-5.89960292e-02, -7.46861771e-02, 6.60329172e-03, -4.25189249e-02,
...
-1.26921125e-02, 3.01475357e-02, 8.25323071e-03, -1.88470203e-02,
6.04814291e-03, -2.81618331e-02, 5.91602828e-03, 7.13866428e-02],
dtype=float32)]
Dim: 768 (768,)
要為查詢建立內嵌,請使用encode_queries()
方法:
queries = ["When was artificial intelligence founded",
"Where was Alan Turing born?"]
query_embeddings = ef.encode_queries(queries)
print("Embeddings:", query_embeddings)
print("Dim", ef.dim, query_embeddings[0].shape)
預期的輸出與下列內容相似:
Embeddings: [array([ 1.21721877e-02, 1.88485277e-03, 3.01732980e-02, -8.10302645e-02,
-6.13401756e-02, -3.98149453e-02, -5.18723316e-02, -6.76784338e-03,
-6.59285188e-02, -5.38365729e-02, -5.13435388e-03, -2.49210224e-02,
-5.74403182e-02, -7.03031123e-02, 6.63730130e-03, -3.42259370e-02,
...
7.36595877e-03, 2.85532661e-02, -1.55952033e-02, 2.13342719e-02,
1.51187545e-02, -2.82798670e-02, 2.69396193e-02, 6.16136603e-02],
dtype=float32)]
Dim 768 (768,)