Coder Perfect

MySQL has a simple way to calculate median.


What is the easiest (and hopefully not too sluggish) method of calculating the median in MySQL? I’ve found the mean using AVG(x), but I’m having trouble finding a straightforward technique to calculate the median. For the time being, I’m returning all of the rows to PHP, sorting them, and then selecting the center row, but surely there must be a simpler approach.

Example data:

id | val
 1    4
 2    7
 3    2
 4    2
 5    9
 6    8
 7    3

Sorting by val yields 2 2 3 4 7 8 9, indicating that the median should be 4, as opposed to SELECT AVG(val), which yields 5.

Asked by davr

Solution #1

In MariaDB / MySQL, you can do the following:

SELECT AVG(dd.val) as median_val
SELECT d.val, @rownum:=@rownum+1 as `row_number`, @total_rows:=@rownum
  FROM data d, (SELECT @rownum:=0) r
  WHERE d.val is NOT NULL
  -- put some where clause here
  ORDER BY d.val
) as dd
WHERE dd.row_number IN ( FLOOR((@total_rows+1)/2), FLOOR((@total_rows+2)/2) );

After the first run, @rownum will contain the whole number of rows, according to Steve Cohen. This can be used to find the median without the requirement for a second pass or join.

When there are an even number of records, AVG(dd.val) and dd.row number IN(…) are also utilized to accurately produce a median. Reasoning:

SELECT FLOOR((3+1)/2),FLOOR((3+2)/2); -- when total_rows is 3, avg rows 2 and 2
SELECT FLOOR((4+1)/2),FLOOR((4+2)/2); -- when total_rows is 4, avg rows 2 and 3

Finally, MariaDB 10.3.3 and later includes the MEDIAN function.

Answered by velcrow

Solution #2

I recently found another answer in the comments section of the internet:

Make sure your columns are well-indexed, and that the index is used to filter and sort the data. Verify the plans with the explanations.

select count(*) from table --find the number of rows

Figure out what the “median” row number is. Use median row = floor(count / 2) instead.

Then choose one from the list:

select val from table order by val asc limit median_row,1

This should yield a single row with only the value you’re looking for.


Answered by TheJacobTaylor

Solution #3

The acceptable solution didn’t work on my MySQL installation, returning an empty set, but this query did in every circumstance I tried it in:

SELECT x.val from data x, data y
GROUP BY x.val
HAVING SUM(SIGN(1-SIGN(y.val-x.val)))/COUNT(*) > .5

Answered by zookatron

Solution #4

Unfortunately, neither TheJacobTaylor’s nor velcrow’s responses are correct for current MySQL versions.

Although Velcro’s answer is near, it fails to calculate correctly for result sets with an even number of rows. On odd-numbered sets, the median is either the middle number or the average of the two middle numbers, while on even-numbered sets, the median is the average of the two middle numbers.

So, here’s velcro’s approach, tweaked to work with both odd and even numbers:

SELECT AVG(middle_values) AS 'median' FROM (
  SELECT t1.median_column AS 'middle_values' FROM
      SELECT @row:=@row+1 as `row`, x.median_column
      FROM median_table AS x, (SELECT @row:=0) AS r
      WHERE 1
      -- put some where clause here
      ORDER BY x.median_column
    ) AS t1,
      SELECT COUNT(*) as 'count'
      FROM median_table x
      WHERE 1
      -- put same where clause here
    ) AS t2
    -- the following condition will return 1 record for odd number sets, or 2 records for even number sets.
    WHERE t1.row >= t2.count/2 and t1.row <= ((t2.count/2) +1)) AS t3;

Follow these three simple steps to put it to use:

Answered by bob

Solution #5

I propose a more expedient method.

Calculate the number of rows:


Then, in a sorted subquery, take the middle value:

SELECT max(val) FROM (SELECT val FROM data ORDER BY val limit @middlevalue) x;

I tested this with a 5x10e6 random number dataset, and it found the median in under 10 seconds.

Answered by Reggie Edwards

Post is based on