Problem
What is the easiest (and hopefully not too sluggish) method of calculating the median in MySQL? I’ve found the mean using AVG(x), but I’m having trouble finding a straightforward technique to calculate the median. For the time being, I’m returning all of the rows to PHP, sorting them, and then selecting the center row, but surely there must be a simpler approach.
Example data:
id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3
Sorting by val yields 2 2 3 4 7 8 9, indicating that the median should be 4, as opposed to SELECT AVG(val), which yields 5.
Asked by davr
Solution #1
In MariaDB / MySQL, you can do the following:
SELECT AVG(dd.val) as median_val
FROM (
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
FROM data d, (SELECT @rownum:=0) r
WHERE d.val is NOT NULL
-- put some where clause here
ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );
After the first run, @rownum will contain the whole number of rows, according to Steve Cohen. This can be used to find the median without the requirement for a second pass or join.
When there are an even number of records, AVG(dd.val) and dd.row number IN(…) are also utilized to accurately produce a median. Reasoning:
SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3
Finally, MariaDB 10.3.3 and later includes the MEDIAN function.
Answered by velcrow
Solution #2
I recently found another answer in the comments section of the internet:
Make sure your columns are well-indexed, and that the index is used to filter and sort the data. Verify the plans with the explanations.
select count(*) from table --find the number of rows
Figure out what the “median” row number is. Use median row = floor(count / 2) instead.
Then choose one from the list:
select val from table order by val asc limit median_row,1
This should yield a single row with only the value you’re looking for.
Jacob
Answered by TheJacobTaylor
Solution #3
The acceptable solution didn’t work on my MySQL installation, returning an empty set, but this query did in every circumstance I tried it in:
SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5
LIMIT 1
Answered by zookatron
Solution #4
Unfortunately, neither TheJacobTaylor’s nor velcrow’s responses are correct for current MySQL versions.
Although Velcro’s answer is near, it fails to calculate correctly for result sets with an even number of rows. On odd-numbered sets, the median is either the middle number or the average of the two middle numbers, while on even-numbered sets, the median is the average of the two middle numbers.
So, here’s velcro’s approach, tweaked to work with both odd and even numbers:
SELECT AVG(middle_values) AS 'median' FROM (
SELECT t1.median_column AS 'middle_values' FROM
(
SELECT @row:=@row+1 as `row`, x.median_column
FROM median_table AS x, (SELECT @row:=0) AS r
WHERE 1
-- put some where clause here
ORDER BY x.median_column
) AS t1,
(
SELECT COUNT(*) as 'count'
FROM median_table x
WHERE 1
-- put same where clause here
) AS t2
-- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;
Follow these three simple steps to put it to use:
Answered by bob
Solution #5
I propose a more expedient method.
Calculate the number of rows:
FROM data, SELECT CEIL(COUNT(*)/2)
Then, in a sorted subquery, take the middle value:
SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;
I tested this with a 5x10e6 random number dataset, and it found the median in under 10 seconds.
Answered by Reggie Edwards
Post is based on https://stackoverflow.com/questions/1291152/simple-way-to-calculate-median-with-mysql