Coder Perfect

SQL RANK() vs. ROW_NUMBER()

Problem

I'm confused about the differences between these two functions. Running the following SQL gives me two identical result sets. Can someone explain the difference?

SELECT ID, [Description], RANK()       OVER(PARTITION BY StyleID ORDER BY ID) as 'Rank'      FROM SubStyle
SELECT ID, [Description], ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) as 'RowNumber' FROM SubStyle

Asked by dotNET Hobbiest

Solution #1

You will only see the difference if you have ties within a partition for a particular ordering value.

In that case, RANK and DENSE_RANK are deterministic: all rows with the same value for both the ordering and partitioning columns get the same result, whereas ROW_NUMBER assigns incrementing values to the tied rows arbitrarily (non-deterministically).

Consider the following example. All rows have the same StyleID, so they are in the same partition, and within that partition the first 3 rows are tied when ordered by ID.

WITH T(StyleID, ID)
     AS (SELECT 1,1 UNION ALL
         SELECT 1,1 UNION ALL
         SELECT 1,1 UNION ALL
         SELECT 1,2)
SELECT *,
       RANK() OVER(PARTITION BY StyleID ORDER BY ID)       AS 'RANK',
       ROW_NUMBER() OVER(PARTITION BY StyleID ORDER BY ID) AS 'ROW_NUMBER',
       DENSE_RANK() OVER(PARTITION BY StyleID ORDER BY ID) AS 'DENSE_RANK'
FROM   T  

Returns

StyleID     ID       RANK      ROW_NUMBER      DENSE_RANK
----------- -------- --------- --------------- ----------
1           1        1         1               1
1           1        1         2               1
1           1        1         3               1
1           2        4         4               2

For the three identical rows, ROW_NUMBER increments while the RANK value stays the same and then jumps to 4. DENSE_RANK also assigns the same rank to all three rows, but the next distinct value is given a rank of 2.
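The behaviour described above can be sketched outside the database. The following Python snippet is only an illustration of the three numbering rules for a single partition, not how any SQL engine implements them; the function name ranking_columns is made up for this example.

```python
from itertools import groupby

def ranking_columns(values):
    """Return (value, row_number, rank, dense_rank) tuples for one partition.

    Hypothetical helper for illustration; ties are rows with equal values.
    """
    out = []
    row_number = 0
    dense_rank = 0
    for _, group in groupby(sorted(values)):
        rank = row_number + 1   # one plus the number of rows before this tie group
        dense_rank += 1         # one plus the number of distinct values before it
        for value in group:
            row_number += 1     # always increments, even within a tie group
            out.append((value, row_number, rank, dense_rank))
    return out

# Same data as the example above: three tied IDs followed by a distinct one.
rows = ranking_columns([1, 1, 1, 2])
for row in rows:
    print(row)
```

Note that the sketch numbers tied rows in whatever order the sort leaves them, which mirrors the arbitrary assignment ROW_NUMBER makes among ties.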

Answered by Martin Smith

Solution #2

ROW_NUMBER: Returns a unique number for each row, starting with 1. Rows with duplicate values are numbered arbitrarily.

RANK: Assigns a number to each row starting with 1, except that rows with duplicate values all receive the same rank, which leaves a gap in the sequence after each group of duplicates.

Answered by Ritesh Mengji

Solution #3

The relationship between ROW_NUMBER() and DENSE_RANK() is explored in this article (the RANK() function is not treated specifically). When you need a generated ROW_NUMBER() in a SELECT DISTINCT query, ROW_NUMBER() produces distinct values before the DISTINCT keyword gets a chance to remove duplicates. Take, for example, this query:

SELECT DISTINCT
  v, 
  ROW_NUMBER() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

… could lead to the following outcome (DISTINCT has no effect):

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| a |          2 |
| a |          3 |
| b |          4 |
| c |          5 |
| c |          6 |
| d |          7 |
| e |          8 |
+---+------------+

Whereas this query:

SELECT DISTINCT
  v, 
  DENSE_RANK() OVER (ORDER BY v) row_number
FROM t
ORDER BY v, row_number

… delivers the result you’re looking for in this case:

+---+------------+
| V | ROW_NUMBER |
+---+------------+
| a |          1 |
| b |          2 |
| c |          3 |
| d |          4 |
| e |          5 |
+---+------------+

Note that the ORDER BY clause of the DENSE_RANK() function will need all other columns from the SELECT DISTINCT clause to work properly.

The reason for this is that, logically, window functions are calculated before DISTINCT is applied.
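To see this evaluation order in action, here is a small sketch that runs both queries against an in-memory SQLite database from Python. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the table contents and the helper name distinct_rows are chosen for this illustration to match the result tables above.

```python
import sqlite3

# Assumption: the sqlite3 module is linked against SQLite >= 3.25,
# which is required for window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(c,) for c in "aaabccde"])

def distinct_rows(window_function):
    """Run SELECT DISTINCT with the given ranking function (hypothetical helper)."""
    sql = (
        f"SELECT DISTINCT v, {window_function}() OVER (ORDER BY v) AS n "
        "FROM t ORDER BY v, n"
    )
    return conn.execute(sql).fetchall()

row_number_rows = distinct_rows("ROW_NUMBER")  # every (v, n) pair is already unique
dense_rank_rows = distinct_rows("DENSE_RANK")  # ties share n, so DISTINCT collapses them

print(len(row_number_rows), "rows with ROW_NUMBER")
print(len(dense_rank_rows), "rows with DENSE_RANK")
```

With ROW_NUMBER all eight rows survive, because the numbering has already made each row distinct by the time DISTINCT runs; with DENSE_RANK only one row per value of v remains.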

Using PostgreSQL / Sybase / SQL standard syntax (WINDOW clause):

SELECT
  v,
  ROW_NUMBER() OVER (w) AS row_number,
  RANK()       OVER (w) AS rank,
  DENSE_RANK() OVER (w) AS dense_rank
FROM t
-- the window is named w because WINDOW is a reserved word in PostgreSQL
WINDOW w AS (ORDER BY v)
ORDER BY v

… you’ll get:

+---+------------+------+------------+
| V | ROW_NUMBER | RANK | DENSE_RANK |
+---+------------+------+------------+
| a |          1 |    1 |          1 |
| a |          2 |    1 |          1 |
| a |          3 |    1 |          1 |
| b |          4 |    4 |          2 |
| c |          5 |    5 |          3 |
| c |          6 |    5 |          3 |
| d |          7 |    7 |          4 |
| e |          8 |    8 |          5 |
+---+------------+------+------------+

Answered by Lukas Eder

Solution #4

A simple query without a partition clause is as follows:

select 
    sal, 
    RANK() over(order by sal desc) as Rank,
    DENSE_RANK() over(order by sal desc) as DenseRank,
    ROW_NUMBER() over(order by sal desc) as RowNumber
from employee 

Output:

    --------|-------|-----------|----------
    sal     |Rank   |DenseRank  |RowNumber
    --------|-------|-----------|----------
    5000    |1      |1          |1
    3000    |2      |2          |2
    3000    |2      |2          |3
    2975    |4      |3          |4
    2850    |5      |4          |5
    --------|-------|-----------|----------

Answered by DSR

Solution #5

Quite a bit:

RANK: a row's rank is one plus the number of rows ordered strictly before it, so tied rows share a rank and a gap follows them.

ROW_NUMBER: the distinct, sequential number of each row, with no gaps in the numbering.

http://www.bidn.com/blogs/marcoadf/bidn-blog/379/ranking-functions-row_number-vs-rank-vs-dense_rank-vs-ntile
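The counting definitions can be checked by hand. The sketch below applies them in Python to the salary values from Solution #4 (ordered descending); the function names are invented for this illustration, and the dense-rank rule (one plus the number of distinct values ordered before the row) is the usual definition rather than something stated in this answer.

```python
def rank_desc(values, v):
    # One plus the number of rows ordered strictly before v (descending order).
    return 1 + sum(1 for other in values if other > v)

def dense_rank_desc(values, v):
    # One plus the number of distinct values ordered strictly before v.
    return 1 + len({other for other in values if other > v})

# Salary values taken from the output table in Solution #4.
salaries = [5000, 3000, 3000, 2975, 2850]

ranks = [rank_desc(salaries, s) for s in salaries]
dense_ranks = [dense_rank_desc(salaries, s) for s in salaries]
row_numbers = list(range(1, len(salaries) + 1))  # sequential, no ties, no gaps

for row in zip(salaries, ranks, dense_ranks, row_numbers):
    print(row)
```

The two 3000 rows both get rank 2, the next row gets rank 4 (a gap) but dense rank 3 (no gap), and the row numbers simply run from 1 to 5.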

Answered by NotMe

Post is based on https://stackoverflow.com/questions/7747327/sql-rank-versus-row-number