To use LangChain with different types of embeddings, you first need to understand how LangChain abstracts the process of integrating embeddings. LangChain provides a standardized interface for embedding models, allowing you to swap providers (e.g., OpenAI, Hugging Face, or custom models) without rewriting your entire pipeline. Start by installing LangChain and any required dependencies for your chosen embedding model. For example, using OpenAI's embeddings requires the `openai` package, while Hugging Face models might need `sentence-transformers`. Initialize the embedding class (e.g., `OpenAIEmbeddings` or `HuggingFaceEmbeddings`) with parameters like API keys or model names, then use methods like `embed_documents()` or `embed_query()` to generate vectors. This abstraction lets you focus on higher-level tasks like retrieval or similarity search while keeping the embedding logic consistent.
For example, using OpenAI embeddings involves initializing `OpenAIEmbeddings` with your API key and calling `embed_query("example text")` to generate a vector. If you prefer open-source models, Hugging Face's `HuggingFaceInstructEmbeddings` can be initialized with a model name like `"hkunlp/instructor-large"` and used the same way. LangChain also supports local models, such as `SentenceTransformerEmbeddings` from the `sentence-transformers` library. Each embedding type has unique configuration requirements: OpenAI relies on API calls, while Hugging Face models may require downloading weights or adjusting device settings (e.g., `model_kwargs={"device": "cuda"}`). The key is that LangChain's unified interface handles these differences behind the scenes, so your application code remains consistent regardless of the embedding provider.
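As a sketch of the open-source path, the snippet below assumes the `langchain-community` and `sentence-transformers` packages are installed; the model name is just one common choice, and the weights download automatically on first use. Because the interface is identical, swapping this in for `OpenAIEmbeddings` requires no changes elsewhere in the pipeline.

```python
# Sketch of a local Hugging Face model, assuming langchain-community and
# sentence-transformers are installed. On older LangChain versions the import
# is `from langchain.embeddings import HuggingFaceEmbeddings`.
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # downloads weights on first use
    model_kwargs={"device": "cpu"},  # switch to "cuda" to run on a GPU
    encode_kwargs={"normalize_embeddings": True},  # optional: unit-length vectors
)

# Same interface as any other provider, so the application code is unchanged.
vector = embeddings.embed_query("example text")
print(len(vector))  # 384 for all-MiniLM-L6-v2
```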
To integrate custom embeddings or adapt advanced workflows, subclass LangChain's `Embeddings` base class and implement `embed_documents()` and `embed_query()`. For instance, if you have a proprietary model, wrap its inference logic in these methods. When combining embeddings with vector stores like Chroma or FAISS, ensure the store's dimensions match your embeddings' output (e.g., OpenAI's 1536-dimensional vectors). If you switch embedding models, re-embed your documents to maintain compatibility. LangChain's `VectorStore` integrations simplify this by providing methods like `add_texts()` that internally use your chosen embeddings. This flexibility lets developers experiment with different models while maintaining a clean, scalable architecture.
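A rough sketch of that pattern follows. The `_DummyModel` stand-in is hypothetical (replace its `encode()` with your real inference call), and the Chroma usage assumes the `chromadb` package is installed.

```python
# Sketch of a custom embedding wrapper, assuming langchain-core and
# langchain-community (plus chromadb) are installed. `_DummyModel` is a
# hypothetical stand-in for a proprietary model.
from typing import List

from langchain_community.vectorstores import Chroma
from langchain_core.embeddings import Embeddings


class _DummyModel:
    """Hypothetical proprietary model; replace encode() with real inference."""

    def encode(self, text: str) -> List[float]:
        # Toy 4-dimensional "embedding" so the example runs end to end.
        return [float(len(text)), float(sum(map(ord, text)) % 97), 0.0, 1.0]


class MyCustomEmbeddings(Embeddings):
    """Exposes the proprietary model through LangChain's standard interface."""

    def __init__(self) -> None:
        self.model = _DummyModel()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Batch inference: one vector per input text.
        return [self.model.encode(t) for t in texts]

    def embed_query(self, text: str) -> List[float]:
        # Single-query inference; often the same path as documents.
        return self.model.encode(text)


# The vector store calls your embeddings internally via add_texts().
store = Chroma(embedding_function=MyCustomEmbeddings())
store.add_texts(["Document one.", "Document two."])
```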