Coder Perfect

In a group by clause, does the order of the columns matter?

Problem

If I have two columns, one with very high cardinality (number of unique values) and the other with very low cardinality, does it matter in which order I list them in the GROUP BY?

Here’s an example:

select
    d.dimensionName,
    d.dimensionCategory,
    sum(f.someFact)
from SomeFact f
join SomeDim d on f.dimensionKey = d.dimensionKey
group by
    d.dimensionName,     -- large number of unique values
    d.dimensionCategory  -- small number of unique values

Is there ever a time when it matters?

Asked by Jeff Meatball Yang

Solution #1

The GROUP BY clause does not care about the order.

MySQL and SQLite are the only databases I’m aware of that let you select columns that are omitted from the GROUP BY (non-standard, not portable), but the order doesn’t matter there either.
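A minimal sketch of this claim, using Python's built-in sqlite3 module and an in-memory database. The table and column names come from the question; the sample rows and the grouped helper are made up for illustration. Results are sorted before comparison because, without an ORDER BY, the order of returned rows is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SomeDim (dimensionKey INTEGER PRIMARY KEY,
                          dimensionName TEXT, dimensionCategory TEXT);
    CREATE TABLE SomeFact (dimensionKey INTEGER, someFact INTEGER);
    -- Hypothetical sample data, not from the original post.
    INSERT INTO SomeDim VALUES (1, 'alpha', 'A'), (2, 'beta', 'A'), (3, 'gamma', 'B');
    INSERT INTO SomeFact VALUES (1, 10), (1, 5), (2, 7), (3, 1), (3, 2);
""")

QUERY = """
    SELECT d.dimensionName, d.dimensionCategory, SUM(f.someFact)
    FROM SomeFact f
    JOIN SomeDim d ON f.dimensionKey = d.dimensionKey
    GROUP BY {columns}
"""

def grouped(columns):
    """Run the query with the given GROUP BY column list; sort rows for comparison."""
    return sorted(conn.execute(QUERY.format(columns=columns)).fetchall())

name_first = grouped("d.dimensionName, d.dimensionCategory")
category_first = grouped("d.dimensionCategory, d.dimensionName")

# Same groups and same sums, whichever column is listed first.
print(name_first == category_first)
```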

Answered by OMG Ponies

Solution #2

SQL is declarative.

In this case, you have told the optimiser how you want the data grouped, and it works out how to do it.

It does not evaluate the query line by line (procedurally), looking at one column first and then the next.

The main place where column order matters is indexes. An index on (col1, col2) is not the same as an index on (col2, col1). Not at all.
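To illustrate the point about indexes, here is a small sketch using SQLite through Python's sqlite3 module. The table t, the payload column and the index name are hypothetical. SQLite is used only because it is easy to run; other engines have their own plan output and may behave differently. The exact plan text varies between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 INTEGER, payload TEXT);
    CREATE INDEX idx_col1_col2 ON t (col1, col2);  -- col1 is the leading column
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Filtering on the leading index column lets SQLite seek into the index.
plan_leading = plan("SELECT payload FROM t WHERE col1 = 1")
# Filtering only on the second column cannot seek on this index.
plan_trailing = plan("SELECT payload FROM t WHERE col2 = 1")

print(plan_leading)
print(plan_trailing)
```

In this sketch the first plan should report a search using idx_col1_col2, while the second falls back to a scan. An index on (col2, col1) would reverse the two cases.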

Answered by gbn

Solution #3

Microsoft SQL Server has a legacy, non-standard capability called ROLLUP (the WITH ROLLUP syntax). ROLLUP is a GROUP BY extension that relies on the order of the GROUP BY columns to determine which levels of grouping are included in the result. However, this syntax is deprecated and should not be used in new code. Grouping sets, which are supported by SQL Server 2008 and later versions, are the standard SQL alternative.
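Why column order matters for ROLLUP can be sketched in plain Python. ROLLUP over a column list produces one grouping level per prefix of that list, so swapping the columns changes which subtotals appear. The helper names and sample rows below are hypothetical. This models the semantics only and is not how SQL Server implements it.

```python
from collections import defaultdict

def rollup_grouping_sets(columns):
    """ROLLUP(c1, ..., cn) groups by every prefix of the column list, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]

def rollup_sum(rows, columns, measure):
    """Hypothetical helper: sum `measure` for every grouping set of ROLLUP(columns)."""
    totals = defaultdict(int)
    for grouping_set in rollup_grouping_sets(columns):
        for row in rows:
            # The key records which columns were grouped and their values.
            key = tuple((col, row[col]) for col in grouping_set)
            totals[key] += row[measure]
    return dict(totals)

# Made-up sample data.
rows = [
    {"name": "alpha", "category": "A", "fact": 10},
    {"name": "beta", "category": "A", "fact": 7},
    {"name": "gamma", "category": "B", "fact": 3},
]

# Category first: subtotals per category, then a grand total.
print(rollup_grouping_sets(["category", "name"]))
# Name first: subtotals per name instead; no per-category subtotal.
print(rollup_grouping_sets(["name", "category"]))

by_category = rollup_sum(rows, ["category", "name"], "fact")
by_name = rollup_sum(rows, ["name", "category"], "fact")
```

With grouping sets, each level is listed explicitly, so the result no longer depends on the order in which the columns are written.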

Answered by nvogel

Solution #4

Adding this since it hasn’t been mentioned yet. The answers above are correct: the order of the columns after the “group by” clause does not affect the correctness of the query (i.e. the sum amount).

However, the order in which the rows are returned can depend on the order of the columns listed after the “group by” clause. This is engine-specific behavior; only an ORDER BY clause guarantees row order. Consider Table A, which has the following rows:

Col1 Col2 Col3
1   xyz 100
2   abc 200
3   xyz 300
3   xyz 400

SELECT *, SUM(Col3) FROM A GROUP BY Col2, Col1 returns the rows sorted in ascending order of Col2:

Col1 Col2 Col3 sum(Col3)
2   abc 200 200
1   xyz 100 100
3   xyz 300 700

Now change the column order in the group by to Col1, Col2, i.e. SELECT *, SUM(Col3) FROM A GROUP BY Col1, Col2. The retrieved rows are sorted in ascending order of Col1:

Col1 Col2 Col3 sum(Col3)
1   xyz 100 100
2   abc 200 200
3   xyz 300 700

Note: The summation amount (i.e. the correctness of the query) remains exactly the same.
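Since implicit ordering from GROUP BY varies between engines and versions, a safer way to get a particular row order is an explicit ORDER BY. The sketch below loads Table A into an in-memory SQLite database through Python's sqlite3 module; the run helper is hypothetical. Each call deliberately pairs a GROUP BY column order with the opposite ORDER BY, to show that the ORDER BY decides the row order while the sums stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Col1 INTEGER, Col2 TEXT, Col3 INTEGER);
    INSERT INTO A VALUES (1, 'xyz', 100), (2, 'abc', 200), (3, 'xyz', 300), (3, 'xyz', 400);
""")

def run(group_by, order_by):
    """Group by one column order, then sort explicitly by another."""
    sql = f"SELECT Col1, Col2, SUM(Col3) FROM A GROUP BY {group_by} ORDER BY {order_by}"
    return conn.execute(sql).fetchall()

# GROUP BY order and ORDER BY order are chosen independently.
by_col2 = run("Col1, Col2", "Col2, Col1")
by_col1 = run("Col2, Col1", "Col1, Col2")

print(by_col2)  # [(2, 'abc', 200), (1, 'xyz', 100), (3, 'xyz', 700)]
print(by_col1)  # [(1, 'xyz', 100), (2, 'abc', 200), (3, 'xyz', 700)]
```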

Answered by AaCodes

Solution #5

If I have two columns, one with very high cardinality and one with very low cardinality (unique # of values), does it matter in which order I group by?

Query-1

SELECT spec_id, catid, spec_display_value, COUNT(*) AS cnt
FROM tbl_product_spec
GROUP BY spec_id, catid, spec_display_value;

Query-2

SELECT spec_id, catid, spec_display_value, COUNT(*) AS cnt
FROM tbl_product_spec FORCE INDEX(idx_comp_spec_cnt)
GROUP BY catid, spec_id, spec_display_value;

Both queries return the same result; the order of the columns in a group by clause has no effect.

Answered by Gauravk

Post is based on https://stackoverflow.com/questions/3064677/does-the-order-of-columns-matter-in-a-group-by-clause