How do you get the rows for each grouped set that contain the maximum value?
I’ve seen a few overly complex variations on this issue, but none of them have a good response. I’ve attempted to make the exam as simple as possible.
Given a table like that below, with person, group, and age columns, how would you get the oldest person in each group? (A tie within a group should give the first alphabetical result)
Person | Group | Age --- Bob | 1 | 32 Jill | 1 | 34 Shawn| 1 | 42 Jake | 2 | 29 Paul | 2 | 36 Laura| 2 | 39
Desired result set:
Shawn | 1 | 42 Laura | 2 | 39
Asked by Yarin
The right answer is:
SELECT o.* FROM `Persons` o # 'o' from 'oldest person in group' LEFT JOIN `Persons` b # 'b' from 'bigger age' ON o.Group = b.Group AND o.Age < b.Age WHERE b.Age is NULL # bigger age not found
Age in mn One or more rows from b will match any row from o that does not have the maximum value of its group in column Age.
The LEFT JOIN joins the oldest individual in the group (including those who are the only ones in their group) to a row of NULLs from b (‘no oldest age in the group’). When you use INNER JOIN, these rows are ignored because they don’t match.
Only the rows with NULLs in the fields derived from b are kept in the WHERE clause. They are the oldest individuals in each group.
This solution and many others are explained in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming
Answered by axiac
In mysql, there’s a super-easy way to do this:
select * from (select * from mytable order by `Group`, age desc, Person) x group by `Group`
This works because you can’t aggregate non-group-by fields in mysql; otherwise, mysql will just return the first row. The approach is to first sort the data so that the row you want is first in each group, then group by the columns for which you want the value.
You avoid the problems of returning numerous rows when there are many rows with the same maximum value, as well as complicated subqueries that try to find the max() and so on (as the other answers would do)
Please keep in mind that this is a mysql-only solution. All other databases I’m familiar with will throw a SQL syntax error stating something along the lines of “non aggregated columns are not listed in the group by clause” or something similar. Because this solution relies on undocumented behavior, the more cautious may want to include a test to ensure that it continues to work if MySQL changes its behavior in the future.
The sql-mode setting contains ONLY FULL GROUP BY by default since version 5.7, thus you must disable this option to make this work (edit the option file for the server to remove this setting).
Answered by Bohemian
You can join against a subquery that pulls the MAX(Group) and Age from the database. This method works with almost any RDBMS.
SELECT t1.* FROM yourTable t1 INNER JOIN ( SELECT `Group`, MAX(Age) AS max_age FROM yourTable GROUP BY `Group` ) t2 ON t1.`Group` = t2.`Group` AND t1.Age = t2.max_age;
Answered by Michael Berkowski
My simple SQLite (and presumably MySQL) solution is as follows:
SELECT *, MAX(age) FROM mytable GROUP BY `Group`;
It does not, however, operate with PostgreSQL and maybe other systems.
You can use the DISTINCT ON clause in PostgreSQL:
SELECT DISTINCT ON ("group") * FROM "mytable" ORDER BY "group", "age" DESC;
Answered by Igor Kulagin
I’m not sure if MySQL has a function called row number. If that’s the case, you can use it to achieve your goal. You can do something similar on SQL Server:
CREATE TABLE p ( person NVARCHAR(10), gp INT, age INT ); GO INSERT INTO p VALUES ('Bob', 1, 32); INSERT INTO p VALUES ('Jill', 1, 34); INSERT INTO p VALUES ('Shawn', 1, 42); INSERT INTO p VALUES ('Jake', 2, 29); INSERT INTO p VALUES ('Paul', 2, 36); INSERT INTO p VALUES ('Laura', 2, 39); GO SELECT t.person, t.gp, t.age FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY gp ORDER BY age DESC) row FROM p ) t WHERE t.row = 1;
Answered by user130268
Post is based on https://stackoverflow.com/questions/12102200/get-records-with-max-value-for-each-group-of-grouped-sql-results