You should generally use the default dimensionality of text-embedding-3-large, which is 3072, unless you have a strong reason to change it. The default is chosen to balance semantic richness and efficiency, and it works well for most search, retrieval, and clustering tasks. Starting with the default also avoids unnecessary complexity during initial development.
In some cases, developers reduce dimensionality to save storage or improve query performance. text-embedding-3-large supports this through the dimensions parameter of the OpenAI embeddings API, which returns a shorter vector. This can be useful for very large datasets or when latency and cost are strict constraints. However, fewer dimensions can reduce the model’s ability to represent subtle semantic differences. For example, fine-grained distinctions in technical documentation or legal text may be lost if the embedding space is compressed too aggressively. Any dimensionality change should therefore be validated with retrieval tests on real queries.
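To make the storage side of this trade-off concrete, here is a minimal sketch. It assumes vectors are stored as 4-byte floats and that shortening is done client-side by keeping the leading components and rescaling to unit length. The helper names shorten_embedding and storage_bytes are hypothetical and not part of any library, and the toy vector stands in for a real embedding.

```python
import math


def shorten_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit L2 norm.

    Assumption: leading components carry the most information, which is
    the premise behind shortening text-embedding-3 vectors.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage only; index overhead is ignored (assumed float32)."""
    return num_vectors * dim * bytes_per_value


# Toy 3-dimensional vector standing in for a real embedding.
short = shorten_embedding([3.0, 4.0, 12.0], 2)

# One million vectors at the default size versus a reduced size.
full_size = storage_bytes(1_000_000, 3072)     # 12_288_000_000 bytes
reduced_size = storage_bytes(1_000_000, 1024)  # 4_096_000_000 bytes
```

In this example, cutting from 3072 to 1024 dimensions reduces raw vector storage to one third. Whether retrieval quality survives that cut is a separate question that only tests on your own queries can answer.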
Vector databases like Milvus and Zilliz Cloud require a fixed dimension per collection, so the dimensionality must be decided before the collection is created, and changing it later means building a new collection. A common approach is to start with the default dimension in one collection and experiment with alternative dimensions in a separate collection for comparison. This allows you to measure recall, ranking quality, and performance before committing to a change in production.
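The comparison can also be prototyped offline before any collection is created. The sketch below is illustrative only: it uses brute-force cosine similarity over toy 4-dimensional vectors, treats the full-dimension results as the reference (an assumption; labeled relevance judgments are preferable when you have them), and simulates the reduced collection by keeping the first two components. The names cosine, top_k, and recall_at_k are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, docs, k):
    """Ids of the k documents most similar to the query (brute force)."""
    ranked = sorted(docs, key=lambda doc_id: cosine(query, docs[doc_id]),
                    reverse=True)
    return ranked[:k]


def recall_at_k(reference_ids, candidate_ids):
    """Fraction of reference ids that also appear among the candidates."""
    if not reference_ids:
        return 0.0
    return len(set(reference_ids) & set(candidate_ids)) / len(reference_ids)


# Toy data: the last component separates "b" from "d" for this query.
docs_full = {
    "a": [1.0, 0.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0, 0.4],
    "c": [0.0, 1.0, 0.0, 0.0],
    "d": [1.0, 0.05, 0.0, -0.9],
}
query_full = [1.0, 0.0, 0.0, 0.5]

# Simulate a reduced-dimension collection by keeping the first 2 components.
dim = 2
docs_reduced = {doc_id: vec[:dim] for doc_id, vec in docs_full.items()}
query_reduced = query_full[:dim]

reference = top_k(query_full, docs_full, k=2)
candidate = top_k(query_reduced, docs_reduced, k=2)
recall = recall_at_k(reference, candidate)
```

Here the reduced vectors lose the component that distinguished two documents, so only half of the reference results are recovered. Averaging this metric over a set of real queries gives a first signal of whether a smaller dimension is acceptable.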
For more information, see the text-embedding-3-large model page: https://zilliz.com/ai-models/text-embedding-3-large