Tuesday, April 12, 2016

Calculating the Median in SQL Using Window Functions

Numerous approaches to calculating the median in SQL have been presented in books and online. Here's another one that is short and simple, and makes use of window functions. This example is specific to PostgreSQL, but could be adapted to any other DBMS that supports window functions. To illustrate this method, I'll use a set of test data in a table named median_test:

create table median_test (id text, value double precision);

insert into median_test 
    (id, value)
        generate_series(1,100) as num, 
        chr(cast(random()*10 as integer)+65) as id, 
        random() as value
    ) as dd;

To support the median calculation, two columns are added using window functions, one of which is the row number of the ordered values (but doubled), and the other of which is the total number of rows. The window functions allow these to be calculated for each data frame, which is determined here by values of the id column. In this example the additional columns are added in a temporary view, but this could instead be a subquery, a common table expression, or even an actual table.

create or replace temporary view d as
    row_number() over(partition by id order by value) * 2 as rownum2,
    count(*) over (partition by id) as rows

The median calculation is then carried out by averaging either the two central values when there is an even number of values, or the one central value when there is an odd number of values.

    avg(value) as median
    rownum2 in (rows, rows+1, rows+2)
group by

When there is an odd number of rows, the single median value will have a value of rownum2 equal to rows+1. When there is an even number of rows, the two central rows will have values of rownum2 equal to either rows or rows+2.

No comments:

Post a Comment