How should I embed content for Nemotron 3 Super RAG systems?

When building RAG systems with Nemotron 3 Super, generate embeddings with NVIDIA NeMo Retriever or the Llama Nemotron Embed models, which belong to the same NVIDIA model ecosystem as Nemotron 3 Super.

For text content, Llama Nemotron Embed generates high-quality embeddings optimized for retrieval. For multimodal content mixing text and images, Llama Nemotron Embed VL produces unified embeddings across modalities. Both are designed to pair well with Nemotron 3 Super’s retrieval expectations.

Store these embeddings in Milvus using dense vector indexes for similarity search. For best results, chunk your source documents at the same granularity you will pass to Nemotron 3 Super during generation: if you pass individual chunks to the model, embed individual chunks; if you plan to combine multiple chunks into one passage, embed the combined passage. The article "Choosing Embedding Models for RAG in 2026" discusses embedding selection strategies in detail and can help you optimize for your specific domain and Nemotron 3 Super’s capabilities.
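To make the chunking advice concrete, here is a minimal sketch of word-based chunking with overlap, using only the Python standard library. The function name `chunk_words` and the `chunk_size` and `overlap` defaults are illustrative assumptions, not values recommended by NVIDIA or Milvus; a production pipeline would more likely count tokens with the embedding model's tokenizer and split on sentence or section boundaries.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words.

    Hypothetical helper for illustration: sizes are counted in whitespace-separated
    words, which only approximates the token counts a model actually sees.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for i, chunk in enumerate(chunk_words(sample, chunk_size=4, overlap=1)):
        print(i, chunk)
```

Each returned chunk is the unit you would embed and store as one row in Milvus. If you intend to pass several chunks to the model as one combined passage, build that combined passage first and embed it instead, so the stored vector describes the same text the model will read.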
