Nemotron 3 Super activates 12 billion parameters per forward pass, a compute load that data-center GPUs handle comfortably: an NVIDIA H100, L40, or A100 can run inference with good throughput.
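As a rough sanity check on the 12-billion figure, weight storage can be approximated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope estimate, not a sizing tool: the precision table and the helper name `weight_memory_gb` are illustrative assumptions, and the result covers only the active parameters. It leaves out any weights that are not active in a given pass, as well as the KV cache and activations, all of which add to the real footprint.

```python
# Approximate bytes needed to store one parameter at common precisions.
# (Illustrative table; real deployments may mix precisions.)
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return approximate weight storage in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Active parameters per forward pass, taken from the text above.
ACTIVE_PARAMS = 12e9

for precision in BYTES_PER_PARAM:
    size = weight_memory_gb(ACTIVE_PARAMS, precision)
    print(f"{precision}: {size:.0f} GB")
# prints:
# bf16: 24 GB
# fp8: 12 GB
# int4: 6 GB
```

Even at BF16, the active weights come to about 24 GB, which is below the 48 GB of memory on an L40. Whether the full model fits on one card depends on the total parameter count and the serving configuration, which this estimate does not capture.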
Milvus, an open-source vector database, is well suited to this use case and provides the retrieval infrastructure for production deployments.
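To make the retrieval step concrete, here is a minimal sketch of what a vector database does conceptually: it ranks stored embeddings by similarity to a query embedding and returns the top matches. This is deliberately not the Milvus API. It is a brute-force, pure-Python stand-in, and the document titles, the 3-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are all hypothetical. A production deployment would delegate this search to Milvus, which uses approximate nearest-neighbor indexes instead of scanning every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the texts of the k documents most similar to the query."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical documents with toy 3-dimensional embeddings.
# Real embeddings come from an embedding model and have far more dimensions.
DOCUMENTS = [
    ("GPU sizing guide", [0.9, 0.1, 0.0]),
    ("Vector index tuning", [0.1, 0.9, 0.1]),
    ("Release notes", [0.0, 0.2, 0.9]),
]

query = [0.2, 0.8, 0.0]
print(top_k(query, DOCUMENTS, k=2))
# prints: ['Vector index tuning', 'GPU sizing guide']
```

The retrieved texts would then be passed to the language model as context. The linear scan here costs time proportional to the number of stored vectors, which is the scaling problem a dedicated vector database is meant to address.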