OpenAI
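Milvus integrates with OpenAI's models through the OpenAIEmbeddingFunction class. This class provides methods for encoding documents and queries with pretrained OpenAI models, and returns the embeddings as dense vectors compatible with Milvus indexing. To use this feature, create an account on the OpenAI platform and obtain an API key from OpenAI.
Install the necessary dependencies: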
pip install --upgrade pymilvus
pip install "pymilvus[model]"
Then, instantiate the OpenAIEmbeddingFunction:
from pymilvus import model
openai_ef = model.dense.OpenAIEmbeddingFunction(
    model_name='text-embedding-3-large',  # Specify the model name
    api_key='YOUR_API_KEY',  # Provide your OpenAI API key
    dimensions=512  # Set the embedding dimensionality
)
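Parameters:
model_name (string)
The name of the OpenAI model to use for encoding. Valid options are text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 (default).
api_key (string)
The API key for accessing the OpenAI API.
base_url (string)
The base URL for accessing the OpenAI API. The value defaults to https://api.openai.com/v1. If you are accessing an OpenAI-compatible API endpoint from a different model provider, or a local vLLM instance such as http://localhost:8080/v1, specify that URL here.
dimensions (int)
The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models.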
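As a rough sketch of how these parameters fit together, the helper below gathers the constructor arguments from environment variables instead of hardcoding the API key. The helper name build_openai_ef_kwargs and the environment variable names (OPENAI_API_KEY, OPENAI_BASE_URL, OPENAI_EMBEDDING_MODEL, OPENAI_EMBEDDING_DIM) are assumptions made for illustration and are not part of pymilvus.

```python
import os


def build_openai_ef_kwargs(env=None):
    """Collect OpenAIEmbeddingFunction arguments from environment variables.

    The variable names are illustrative assumptions, not a pymilvus convention.
    """
    env = os.environ if env is None else env
    model_name = env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")
    kwargs = {"model_name": model_name, "api_key": env.get("OPENAI_API_KEY")}

    # Only pass base_url when pointing at a non-default, OpenAI-compatible endpoint.
    base_url = env.get("OPENAI_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url

    # dimensions is not supported by text-embedding-ada-002; this check only
    # recognizes the text-embedding-3 family by its name prefix.
    dims = env.get("OPENAI_EMBEDDING_DIM")
    if dims and model_name.startswith("text-embedding-3"):
        kwargs["dimensions"] = int(dims)
    return kwargs


# Usage (requires pymilvus[model] and a valid key):
# from pymilvus import model
# openai_ef = model.dense.OpenAIEmbeddingFunction(**build_openai_ef_kwargs())
```

Leaving base_url out when it is unset keeps the default endpoint described above in effect.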
To create embeddings for documents, use the encode_documents() method:
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
docs_embeddings = openai_ef.encode_documents(docs)
# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", openai_ef.dim, docs_embeddings[0].shape)
The expected output is similar to the following:
Embeddings: [array([ 1.76741909e-02, -2.04964578e-02, -1.09788161e-02, -5.27223349e-02,
4.23139781e-02, -6.64533582e-03, 4.21088142e-03, 1.04644023e-01,
5.10009527e-02, 5.32827862e-02, -3.26061808e-02, -3.66494283e-02,
...
-8.93232748e-02, 6.68255147e-03, 3.55093405e-02, -5.09071983e-02,
3.74144339e-03, 4.72541340e-02, 2.11916920e-02, 1.00753829e-02,
-5.76633997e-02, 9.68257990e-03, 4.62721288e-02, -4.33261096e-02])]
Dim: 512 (512,)
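The embeddings come back as a list of NumPy arrays, one per document. A minimal sketch of pairing them with their source text as row dictionaries, the list-of-dicts shape that MilvusClient.insert() takes as data, might look like the following. The helper name to_rows and the field names id, vector, and text are assumptions and would need to match your collection schema. Random vectors stand in for real embeddings so the snippet runs without an API key.

```python
import numpy as np


def to_rows(docs, embeddings):
    """Pair each document with its embedding as a dict row (field names are hypothetical)."""
    if len(docs) != len(embeddings):
        raise ValueError("docs and embeddings must have the same length")
    return [
        {"id": i, "vector": np.asarray(vec, dtype=np.float32).tolist(), "text": doc}
        for i, (doc, vec) in enumerate(zip(docs, embeddings))
    ]


# Placeholder vectors standing in for openai_ef.encode_documents(docs).
rng = np.random.default_rng(0)
docs = ["first document", "second document"]
fake_embeddings = [rng.normal(size=512) for _ in docs]

rows = to_rows(docs, fake_embeddings)
print(len(rows), len(rows[0]["vector"]))
```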
To create embeddings for queries, use the encode_queries() method:
queries = ["When was artificial intelligence founded",
"Where was Alan Turing born?"]
query_embeddings = openai_ef.encode_queries(queries)
# Print embeddings
print("Embeddings:", query_embeddings)
# Print dimension and shape of embeddings
print("Dim", openai_ef.dim, query_embeddings[0].shape)
The expected output is similar to the following:
Embeddings: [array([ 0.00530251, -0.01907905, -0.01672608, -0.05030033, 0.01635982,
-0.03169853, -0.0033602 , 0.09047844, 0.00030747, 0.11853652,
-0.02870182, -0.01526102, 0.05505067, 0.00993909, -0.07165466,
...
-9.78106782e-02, -2.22669560e-02, 1.21873049e-02, -4.83198799e-02,
5.32377362e-02, -1.90469325e-02, 5.62430918e-02, 1.02650477e-02,
-6.21757433e-02, 7.88027793e-02, 4.91846527e-04, -1.51633881e-02])]
Dim 512 (512,)
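In practice Milvus performs the vector search, but to see how query and document embeddings relate, here is a small NumPy sketch that ranks documents by cosine similarity to a query. The function name rank_by_cosine is hypothetical, and toy 3-dimensional vectors replace the real 512-dimensional embeddings.

```python
import numpy as np


def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    docs = np.vstack(doc_vecs).astype(float)
    query = np.asarray(query_vec, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector norms.
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    order = np.argsort(-sims)
    return order.tolist(), sims


# Toy vectors standing in for encode_documents() / encode_queries() output.
doc_vecs = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([0.7, 0.7, 0.0]),
]
query_vec = np.array([0.9, 0.1, 0.0])

order, sims = rank_by_cosine(query_vec, doc_vecs)
print(order)
```

With real embeddings, the same function could be called as rank_by_cosine(query_embeddings[0], docs_embeddings).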