create table median_test (id text, value double precision); insert into median_test (id, value) select id, value from (select generate_series(1,100) as num, chr(cast(random()*10 as integer)+65) as id, random() as value ) as dd;
To support the median calculation, two columns are added using window functions, one of which is the row number of the ordered values (but doubled), and the other of which is the total number of rows. The window functions allow these to be calculated for each data frame, which is determined here by values of the id column. In this example the additional columns are added in a temporary view, but this could instead be a subquery, a common table expression, or even an actual table.
create or replace temporary view d as select id, value, row_number() over(partition by id order by value) * 2 as rownum2, count(*) over (partition by id) as rows from median_test;
The median calculation is then carried out by averaging either the two central values when there is an even number of values, or the one central value when there is an odd number of values.
select id, avg(value) as median from d where rownum2 in (rows, rows+1, rows+2) group by id;
When there is an odd number of rows, the single median value will have a value of rownum2 equal to rows+1. When there is an even number of rows, the two central rows will have values of rownum2 equal to either rows or rows+2.
No comments:
Post a Comment