SPLADE
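SPLADE embedding is a model that produces highly sparse representations of documents and queries. It inherits desirable properties from bag-of-words (BOW) models, such as exact term matching and efficiency.
Milvus integrates with the SPLADE model through the SpladeEmbeddingFunction class. This class provides methods for encoding documents and queries, and returns the embeddings as sparse vectors compatible with Milvus indexing.
To use this feature, install the necessary dependencies: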
pip install --upgrade pymilvus
pip install "pymilvus[model]"
To instantiate the SpladeEmbeddingFunction, use the following code:
from pymilvus import model
splade_ef = model.sparse.SpladeEmbeddingFunction(
    model_name="naver/splade-cocondenser-selfdistil",
    device="cpu"
)
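Parameters:
model_name (string)
The name of the SPLADE model to use for encoding. Valid options are naver/splade-cocondenser-ensembledistil (default), naver/splade_v2_max, naver/splade_v2_distil, and naver/splade-cocondenser-selfdistil. For more information, refer to Play with models.
device (string)
The device to use: cpu for the CPU, or cuda:n for the nth GPU device.
To create embeddings for documents, use the encode_documents() method: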
docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]
docs_embeddings = splade_ef.encode_documents(docs)
# Print embeddings
print("Embeddings:", docs_embeddings)
# since the output embeddings are in a 2D csr_array format, we convert them to a list for easier manipulation.
print("Sparse dim:", splade_ef.dim, list(docs_embeddings)[0].shape)
The expected output is similar to the following:
Embeddings: (0, 2001) 0.6392706036567688
(0, 2034) 0.024093208834528923
(0, 2082) 0.3230178654193878
...
(2, 23602) 0.5671860575675964
(2, 26757) 0.5770265460014343
(2, 28639) 3.1990697383880615
Sparse dim: 30522 (1, 30522)
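Each printed entry has the form (row, column) weight: the row is the position of the document in docs, and the column is an index into the 30522-dimensional sparse space. As a minimal sketch of how to read this output, the pure-Python snippet below groups the entries shown above by document. The helper name group_by_row is hypothetical and not part of pymilvus, and entries hidden behind "..." are not included.

```python
from collections import defaultdict

# (row, column, weight) triplets copied from the printed output above.
# Entries hidden behind "..." in that output are intentionally left out.
entries = [
    (0, 2001, 0.6392706036567688),
    (0, 2034, 0.024093208834528923),
    (0, 2082, 0.3230178654193878),
    (2, 23602, 0.5671860575675964),
    (2, 26757, 0.5770265460014343),
    (2, 28639, 3.1990697383880615),
]


def group_by_row(triplets):
    """Hypothetical helper: map each row index to a {column: weight} dict."""
    rows = defaultdict(dict)
    for row, col, weight in triplets:
        rows[row][col] = weight
    return dict(rows)


doc_vectors = group_by_row(entries)
for row, vec in sorted(doc_vectors.items()):
    # Column with the largest weight among the entries shown for this row.
    top_col = max(vec, key=vec.get)
    print(f"doc {row}: {len(vec)} shown entries, largest weight at column {top_col}")
```

Because only a handful of the 30522 columns carry a non-zero weight for each document, storing just the (column, weight) pairs is far more compact than a dense vector.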
To create embeddings for queries, use the encode_queries() method:
queries = ["When was artificial intelligence founded",
           "Where was Alan Turing born?"]
query_embeddings = splade_ef.encode_queries(queries)
# Print embeddings
print("Embeddings:", query_embeddings)
# since the output embeddings are in a 2D csr_array format, we convert them to a list for easier manipulation.
print("Sparse dim:", splade_ef.dim, list(query_embeddings)[0].shape)
The expected output is similar to the following:
Embeddings: (0, 2001) 0.6353746056556702
(0, 2194) 0.015553371049463749
(0, 2301) 0.2756537199020386
...
(1, 18522) 0.1282549500465393
(1, 23602) 0.13133203983306885
(1, 28639) 2.8150033950805664
Sparse dim: 30522 (1, 30522)
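Sparse embeddings of this kind are commonly compared with an inner product, so only the dimensions that are non-zero in both the query and the document contribute to the score. The sketch below illustrates this in pure Python, using only the entries displayed in the two outputs above. It assumes that row 1 of the query output corresponds to the second query and row 2 of the document output to the third document. The helper sparse_dot is hypothetical, and because the elided entries are left out, the resulting score is partial and only illustrative.

```python
def sparse_dot(a, b):
    """Hypothetical helper: inner product of two {index: weight} dicts."""
    # Iterate over the smaller dict so the loop touches fewer entries.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[index] for index, weight in a.items() if index in b)


# Displayed entries for row 1 of the query output
# (assumed to be "Where was Alan Turing born?").
query_vec = {
    18522: 0.1282549500465393,
    23602: 0.13133203983306885,
    28639: 2.8150033950805664,
}

# Displayed entries for row 2 of the document output
# (assumed to be "Born in Maida Vale, London, ...").
doc_vec = {
    23602: 0.5671860575675964,
    26757: 0.5770265460014343,
    28639: 3.1990697383880615,
}

shared = sorted(set(query_vec) & set(doc_vec))
score = sparse_dot(query_vec, doc_vec)
print("shared dimensions:", shared)
print("partial score:", round(score, 4))
```

Dimensions that appear in only one of the two vectors, such as 18522 and 26757 here, add nothing to the score, which is what gives sparse retrieval its exact-term-matching behavior.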