Coder Perfect

In MySQL, which is faster: SELECT DISTINCT or GROUP BY?

Problem

Suppose I have a table:

CREATE TABLE users (
  id int(10) unsigned NOT NULL auto_increment,
  name varchar(255) NOT NULL,
  profession varchar(255) NOT NULL,
  employer varchar(255) NOT NULL,
  PRIMARY KEY  (id)
)

and I want to get all of the unique values of the profession field. Which of these is faster (or recommended):

SELECT DISTINCT u.profession FROM users u

or

SELECT u.profession FROM users u GROUP BY u.profession

?

Asked by vava

Solution #1

They are essentially equivalent to each other (in fact, some databases implement DISTINCT as a GROUP BY under the hood).

If one of them is faster, it will be DISTINCT. Although the two are equivalent, a query optimizer would have to notice that your GROUP BY uses no aggregates over the group members, only the group keys. DISTINCT makes this explicit, so you can get away with a slightly less clever optimizer.

When in doubt, put it to the test!
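One way to put it to the test is sketched below in Python with the standard-library sqlite3 module. SQLite is used here only as a stand-in engine so the example is self-contained; that is an assumption, and its timings say nothing about MySQL, so run the same two queries against your own server and version. The row count and the profession names are made up for illustration.

```python
import sqlite3
import time

# In-memory SQLite database as a stand-in engine (assumption: not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " profession TEXT NOT NULL,"
    " employer TEXT NOT NULL)"
)

# Hypothetical sample data: 50,000 rows spread over 5 professions.
professions = ["baker", "driver", "nurse", "pilot", "welder"]
rows = [(f"user{i}", professions[i % 5], "acme") for i in range(50_000)]
conn.executemany(
    "INSERT INTO users (name, profession, employer) VALUES (?, ?, ?)", rows
)

QUERIES = {
    "distinct": "SELECT DISTINCT u.profession FROM users u",
    "group_by": "SELECT u.profession FROM users u GROUP BY u.profession",
}


def run(sql):
    """Run one query and return (set of professions, elapsed seconds)."""
    start = time.perf_counter()
    result = {row[0] for row in conn.execute(sql)}
    return result, time.perf_counter() - start


results = {}
for label, sql in QUERIES.items():
    values, elapsed = run(sql)
    results[label] = values
    print(f"{label}: {len(values)} values in {elapsed * 1000:.2f} ms")

# Both forms must return the same set of values; only the timing may differ.
assert results["distinct"] == results["group_by"]
```

Repeating each query several times and comparing the fastest runs gives a steadier picture than a single measurement.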

Answered by SquareCog

Solution #2

If you have an index on profession, these two are synonyms.

If you don’t, then use DISTINCT.

In MySQL, GROUP BY sorts the results. You can even do:

SELECT u.profession FROM users u GROUP BY u.profession DESC

and get your professions sorted in DESC order.
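The GROUP BY ... DESC shorthand is MySQL-specific, and as far as I know MySQL 8.0 no longer accepts it or sorts GROUP BY results implicitly. A more portable way to get the same ordering is an explicit ORDER BY. A minimal sketch, using Python's sqlite3 as a stand-in engine and made-up sample rows (both are assumptions for illustration):

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, profession TEXT NOT NULL)"
)
# Hypothetical sample rows with duplicates.
conn.executemany(
    "INSERT INTO users (profession) VALUES (?)",
    [("nurse",), ("baker",), ("pilot",), ("baker",), ("nurse",)],
)

# Deduplicate with GROUP BY, then state the ordering explicitly.
sql = (
    "SELECT u.profession FROM users u "
    "GROUP BY u.profession "
    "ORDER BY u.profession DESC"
)
sorted_professions = [row[0] for row in conn.execute(sql)]
print(sorted_professions)  # ['pilot', 'nurse', 'baker']
```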

DISTINCT creates a temporary table and uses it to detect duplicates. GROUP BY does the same thing, but then sorts the distinct results.

So

SELECT DISTINCT u.profession FROM users u

is faster, if you don’t have an index on profession.

Answered by Quassnoi

Solution #3

All of the answers above are correct for the case of DISTINCT on a single column vs. GROUP BY on a single column. Every database engine has its own implementation and optimizations, and if you care about the (usually very small) difference, you have to test against a specific server AND a specific version, because implementations may change.

However, if you select more than one column in the query, DISTINCT behaves fundamentally differently, because in that case it compares ALL columns of each row instead of just one.

So, if you have something along the lines of:

-- This will NOT return rows unique by [id], but unique by (id, name)
SELECT DISTINCT id, name FROM some_query_with_joins

-- This will return rows unique by [id]
SELECT id, name FROM some_query_with_joins GROUP BY id

It’s a common misconception that the DISTINCT keyword deduplicates rows by the first column you specify. In fact, DISTINCT applies to the whole select list.

So be careful not to assume that the answers above are correct in all cases. You might get confused and end up with wrong results when all you wanted was to optimize!
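The difference can be seen on a tiny data set. This is a sketch under stated assumptions: Python's sqlite3 stands in for the real engine, and the table joined with its four rows is hypothetical, playing the role of some_query_with_joins. SQLite accepts the non-aggregated name column next to GROUP BY id and picks its value from an arbitrary row of each group; MySQL with ONLY_FULL_GROUP_BY enabled rejects such a query unless name is functionally dependent on id.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not MySQL).
conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for some_query_with_joins.
conn.execute("CREATE TABLE joined (id INTEGER NOT NULL, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO joined (id, name) VALUES (?, ?)",
    [(1, "alice"), (1, "alice"), (1, "alicia"), (2, "bob")],
)

# DISTINCT compares the whole (id, name) pair, so id 1 appears twice.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM joined ORDER BY id, name"
).fetchall()

# GROUP BY id yields one row per id; the name chosen for id 1 is arbitrary.
grouped_rows = conn.execute(
    "SELECT id, name FROM joined GROUP BY id ORDER BY id"
).fetchall()

print(distinct_rows)      # [(1, 'alice'), (1, 'alicia'), (2, 'bob')]
print(len(grouped_rows))  # 2
```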

Answered by daniel.gindi

Solution #4

If you can, go for the simplest and shortest option. DISTINCT seems to be closer to what you’re after, because it gives you EXACTLY the answer you need and nothing else!

Answered by Tim

Solution #5

In some cases DISTINCT can be slower than GROUP BY in Postgres (I don’t know about other databases).

Tested example:

postgres=# select count(*) from (select distinct i from g) a;

count
10001
(1 row)

Time: 1563,109 ms

postgres=# select count(*) from (select i from g group by i) a;

count
10001
(1 row)

Time: 594,481 ms

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I

so be cautious…:)

Answered by OptilabWorker

Post is based on https://stackoverflow.com/questions/581521/whats-faster-select-distinct-or-group-by-in-mysql