Jina AI

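Jina AI's embedding models are high-performance text embedding models that convert text input into numerical representations capturing the semantics of the text. These models excel in applications such as dense retrieval, semantic textual similarity, and multilingual understanding.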

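Milvus integrates with Jina AI's embedding models through the JinaEmbeddingFunction class. This class provides methods for encoding documents and queries with Jina AI embedding models, and it returns the embeddings as dense vectors compatible with Milvus indexing. To use this feature, obtain an API key from Jina AI.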

Install the necessary dependencies:

pip install --upgrade pymilvus
pip install "pymilvus[model]"

Then, instantiate JinaEmbeddingFunction:

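import os

from pymilvus.model.dense import JinaEmbeddingFunction

# Assumption: the key is stored in an environment variable named JINAAI_API_KEY.
JINAAI_API_KEY = os.environ["JINAAI_API_KEY"]

jina_ef = JinaEmbeddingFunction(
    model_name="jina-embeddings-v2-base-en", # Defaults to `jina-embeddings-v2-base-en`
    api_key=JINAAI_API_KEY # Provide your Jina AI API key
)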

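Parameters

  • model_name (string)

    The name of the Jina AI embedding model to use for encoding. You can specify any available Jina AI embedding model name, such as jina-embeddings-v2-base-en or jina-embeddings-v2-small-en. If this parameter is omitted, jina-embeddings-v2-base-en is used. For a list of available models, refer to Jina Embeddings.

  • api_key (string)

    The API key for accessing the Jina AI API.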

To create embeddings for documents, use the encode_documents() method:

docs = [
    "Artificial intelligence was founded as an academic discipline in 1956.",
    "Alan Turing was the first person to conduct substantial research in AI.",
    "Born in Maida Vale, London, Turing was raised in southern England.",
]

docs_embeddings = jina_ef.encode_documents(docs)

# Print embeddings
print("Embeddings:", docs_embeddings)
# Print dimension and shape of embeddings
print("Dim:", jina_ef.dim, docs_embeddings[0].shape)

The expected output is similar to the following:

Embeddings: [array([-4.88487840e-01, -4.28095880e-01,  4.90086500e-01, -1.63274320e-01,
        3.43437800e-01,  3.21476880e-01,  2.83173790e-02, -3.10403670e-01,
        4.76985040e-01, -1.77410420e-01, -3.84803180e-01, -2.19224200e-01,
       -2.52898000e-01,  6.62411900e-02, -8.58173100e-01,  1.05221800e+00,
...
       -2.04462400e-01,  7.14229800e-01, -1.66823000e-01,  8.72551440e-01,
        5.53560140e-01,  8.92506300e-01, -2.39408610e-01, -4.22413560e-01,
       -3.19551350e-01,  5.59153850e-01,  2.44338100e-01, -8.60452100e-01])]
Dim: 768 (768,)
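
Before inserting these vectors into a Milvus collection, it can help to confirm that every embedding has the dimension the collection expects. The following is a minimal sketch and not part of pymilvus: check_dimensions is a hypothetical helper, and the short vectors are made-up stand-ins for the real output of encode_documents().

```python
def check_dimensions(embeddings, expected_dim):
    """Return the indices of embeddings whose length differs from expected_dim.

    Works with any sequence type that supports len(), which includes the
    NumPy arrays shown in the output above.
    """
    return [i for i, vec in enumerate(embeddings) if len(vec) != expected_dim]


# Stand-in vectors (made up for illustration); real ones would have 768 values.
sample_embeddings = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8],  # deliberately too short
]

bad_indices = check_dimensions(sample_embeddings, expected_dim=3)
print("Embeddings with unexpected dimension:", bad_indices)
```

With real data, the call would look like check_dimensions(docs_embeddings, jina_ef.dim), and an empty list would mean that every vector matches.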

To create embeddings for queries, use the encode_queries() method:

queries = ["When was artificial intelligence founded", 
           "Where was Alan Turing born?"]

query_embeddings = jina_ef.encode_queries(queries)

print("Embeddings:", query_embeddings)
print("Dim", jina_ef.dim, query_embeddings[0].shape)

The expected output is similar to the following:

Embeddings: [array([-5.99164660e-01, -3.49827350e-01,  8.22405160e-01, -1.18632730e-01,
        5.78107540e-01,  1.09789170e-01,  2.91604200e-01, -3.29306450e-01,
        2.93779640e-01, -2.17880800e-01, -6.84535440e-01, -3.79752000e-01,
       -3.47541800e-01,  9.20846100e-02, -6.13804400e-01,  6.31312800e-01,
...
       -1.84993740e-02,  9.38629150e-01,  2.74858470e-02,  1.09396360e+00,
        3.96270750e-01,  7.44445800e-01, -1.95404050e-01, -6.08383200e-01,
       -3.75076300e-01,  3.87512200e-01,  8.11889650e-01, -3.76407620e-01])]
Dim 768 (768,)
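
In practice Milvus performs the similarity search itself once the vectors are indexed. To illustrate what comparing a query embedding with document embeddings means, here is a minimal standard-library sketch that ranks documents by cosine similarity. The helper names cosine_similarity and rank_documents are hypothetical, and the 3-dimensional vectors are made-up stand-ins for real 768-dimensional embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Made-up stand-ins; real inputs would be query_embeddings[0] and docs_embeddings.
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

ranking = rank_documents(query_vec, doc_vecs)
print("Documents ranked by similarity:", ranking)
```

The same functions should accept the NumPy arrays returned by encode_queries() and encode_documents(), since they only iterate over the values.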
