Normalized Discounted Cumulative Gain (nDCG) is a metric used to evaluate the quality of ranked lists, such as search results or recommendations, by accounting for both the relevance of items and their positions. It compares the effectiveness of a ranking to an ideal ranking, producing a score between 0 and 1, where 1 represents a perfect ordering. nDCG is calculated by dividing the Discounted Cumulative Gain (DCG) of the ranked list by the DCG of an ideally ordered list (IDCG). Let’s break this down step by step.
First, DCG measures the cumulative gain of a ranked list by summing the relevance of each item, adjusted by a discount factor that reduces the weight of items appearing later in the list. The formula for DCG is typically:
DCG = Σ (relevance_i / log₂(position_i + 1))
For example, consider a search query returning documents with relevance scores [3, 2, 3, 0, 1] at positions 1 to 5. The DCG is calculated as:
DCG = 3/log₂(2) + 2/log₂(3) + 3/log₂(4) + 0/log₂(5) + 1/log₂(6) ≈ 3 + 1.26 + 1.5 + 0 + 0.39 ≈ 6.15
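The DCG formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name dcg is chosen here for the example, and it assumes the relevance scores are listed in ranked order, starting at position 1.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 discount.

    `relevances` holds the graded relevance of each result in ranked order
    (position 1 first). The helper name is illustrative, not a library API.
    """
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances, start=1)
    )

print(round(dcg([3, 2, 3, 0, 1]), 2))  # 6.15
```

The item at position 1 is divided by log₂(2) = 1 and so keeps its full relevance, while every later item is scaled down by a progressively larger divisor.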
Next, the Ideal DCG (IDCG) is computed by sorting the relevance scores in descending order and recalculating DCG. Using the same example, the ideal order is [3, 3, 2, 1, 0]. The IDCG becomes:
IDCG = 3/log₂(2) + 3/log₂(3) + 2/log₂(4) + 1/log₂(5) + 0/log₂(6) ≈ 3 + 1.89 + 1 + 0.43 + 0 ≈ 6.32
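The IDCG step amounts to sorting and reusing the same DCG calculation. The sketch below assumes the same linear-gain, log₂-discount formula as the example; dcg and idcg are hypothetical helper names used only for illustration.

```python
import math

def dcg(relevances):
    # Linear gain, log2 discount; position 1 is the top-ranked item.
    return sum(
        rel / math.log2(position + 1)
        for position, rel in enumerate(relevances, start=1)
    )

def idcg(relevances):
    # The ideal ranking places the most relevant items first.
    return dcg(sorted(relevances, reverse=True))

print(sorted([3, 2, 3, 0, 1], reverse=True))  # [3, 3, 2, 1, 0]
print(round(idcg([3, 2, 3, 0, 1]), 2))        # 6.32
```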
Finally, nDCG is the ratio of DCG to IDCG:
nDCG = DCG / IDCG
In the example, nDCG ≈ 6.15 / 6.32 ≈ 0.97. This indicates the ranking is close to ideal. A key advantage of nDCG is its normalization: because each score is measured against that query's own ideal ranking, results can be compared and averaged across queries with different numbers of relevant items or different relevance scales. However, edge cases like all-zero relevance scores require special handling, because IDCG is then 0 and the ratio is undefined (a common convention is to set nDCG to 0). Developers should also note variations in DCG formulas, such as using (2^relevance_i - 1) instead of raw relevance scores as the gain, which amplifies differences between higher and lower relevance grades.
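Putting the pieces together, the following sketch computes nDCG with the zero-IDCG guard and an optional exponential gain. The function names and the exponential_gain flag are assumptions made for this example, and returning 0.0 when no item is relevant is one convention among several.

```python
import math

def dcg(relevances, exponential_gain=False):
    total = 0.0
    for position, rel in enumerate(relevances, start=1):
        # Exponential gain (2^rel - 1) rewards high grades more strongly.
        gain = (2 ** rel - 1) if exponential_gain else rel
        total += gain / math.log2(position + 1)
    return total

def ndcg(relevances, exponential_gain=False):
    ideal = dcg(sorted(relevances, reverse=True), exponential_gain)
    if ideal == 0:
        # No relevant items: the ratio is undefined, so return 0 by convention.
        return 0.0
    return dcg(relevances, exponential_gain) / ideal

ranking = [3, 2, 3, 0, 1]
print(round(ndcg(ranking), 2))                         # 0.97
print(round(ndcg(ranking, exponential_gain=True), 2))  # 0.96
print(ndcg([0, 0, 0]))                                 # 0.0
```

With exponential gain, placing a grade-2 document ahead of a grade-3 document costs slightly more, which is why the score for the same ranking drops from about 0.97 to about 0.96.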
In practice, nDCG is widely used in search engines, recommendation systems, and other ranking tasks where position and relevance matter. Because it rewards both placing relevant items in the list and placing them near the top, it provides a balanced measure of ranking quality. When implementing nDCG, ensure consistency in the choice of logarithm base (typically base 2) and handle ties or missing values according to the problem’s requirements. Libraries such as scikit-learn (ndcg_score) and TensorFlow Ranking include built-in functions for nDCG, but understanding the underlying calculation helps debug and optimize custom implementations.
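To illustrate why the logarithm base should be kept consistent, the sketch below evaluates the same ranking with base 2 and with the natural logarithm. It assumes the discount form used in this article, log_b(position + 1); the base parameter and function names are hypothetical. Under that assumption the raw DCG values differ by a constant factor, so DCG figures computed with different bases are not comparable, while the factor cancels in the nDCG ratio.

```python
import math

def dcg(relevances, base=2):
    # Discount each item by log_base(position + 1).
    return sum(
        rel / math.log(position + 1, base)
        for position, rel in enumerate(relevances, start=1)
    )

def ndcg(relevances, base=2):
    ideal = dcg(sorted(relevances, reverse=True), base)
    return dcg(relevances, base) / ideal if ideal > 0 else 0.0

ranking = [3, 2, 3, 0, 1]
print(round(dcg(ranking, base=2), 2))       # 6.15
print(round(dcg(ranking, base=math.e), 2))  # 8.87
print(math.isclose(ndcg(ranking, 2), ndcg(ranking, math.e)))  # True
```

Other discount definitions may not share this property, so it is safest to fix one base and use it for both DCG and IDCG.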