Group by a range of values with PostgreSQL
Introduction
Here is a simple solution on how to group data within a range. As a basic example, you may have a list of team members with their birth year and you may want to know the average salary grouped by decade:
r_from | r_to | team_members | salary_avg | salary_min | salary_max
--------+------+--------------+------------+------------+------------
1960 | 1969 | 2 | 48150.00 | 39400 | 56900
1970 | 1979 | 4 | 36575.00 | 26800 | 44000
1980 | 1989 | 4 | 43125.00 | 23000 | 87000
1990 | 1999 | 3 | 45533.33 | 40000 | 55000
Grouped veggies
Try it yourself
If you'd like to try the above example for yourself, just create this table:
CREATE TEMP TABLE team (
id serial,
name text,
birth_year integer,
salary integer
);
INSERT INTO team (name, birth_year, salary)
VALUES ('Gabriel', 1970, 44000),
('Tom', 1972, 36000),
('Bill', 1978, 39500),
('Bob', 1980, 29000),
('Roger', 1976, 26800),
('Lucas', 1965, 56900),
('Jerome', 1984, 33500),
('Andrew', 1992, 41600),
('John', 1991, 40000),
('Paul', 1964, 39400),
('Richard', 1986, 23000),
('Joseph', 1988, 87000),
('Jason', 1990, 55000);
Create a range with generate_series
A simple solution would be to create a range with generate series:
gab@test> WITH series AS (
SELECT generate_series(1950, 2010, 10) AS r_from -- 1950 = min, 2010 = max, 10 = 10 year interval
), range AS (
SELECT r_from, (r_from + 9) AS r_to FROM series -- 9 = interval (10 years) minus 1
)
SELECT r_from, r_to
FROM range;
r_from | r_to
--------+------
1950 | 1959
1960 | 1969
1970 | 1979
1980 | 1989
1990 | 1999
2000 | 2009
2010 | 2019
(7 rows)
Group data by a range of values
You can use the example above to group data within a range:
gab@test> WITH series AS (
SELECT generate_series(1950, 2010, 10) AS r_from -- 1950 = min, 2010 = max, 10 = 10 year interval
), range AS (
SELECT r_from, (r_from + 9) AS r_to FROM series -- 9 = interval (10 years) minus 1
)
SELECT r_from, r_to,
(SELECT count(*) FROM team WHERE birth_year BETWEEN r_from AND r_to) as team_members,
round((SELECT AVG(salary) FROM team WHERE birth_year BETWEEN r_from AND r_to), 2) as salary_avg,
(SELECT MIN(salary) FROM team WHERE birth_year BETWEEN r_from AND r_to) as salary_min,
(SELECT MAX(salary) FROM team WHERE birth_year BETWEEN r_from AND r_to) as salary_max
FROM range;
r_from | r_to | team_members | salary_avg | salary_min | salary_max
--------+------+--------------+------------+------------+------------
1950 | 1959 | 0 | {NULL} | {NULL} | {NULL}
1960 | 1969 | 2 | 48150.00 | 39400 | 56900
1970 | 1979 | 4 | 36575.00 | 26800 | 44000
1980 | 1989 | 4 | 43125.00 | 23000 | 87000
1990 | 1999 | 3 | 45533.33 | 40000 | 55000
2000 | 2009 | 0 | {NULL} | {NULL} | {NULL}
2010 | 2019 | 0 | {NULL} | {NULL} | {NULL}
(7 rows)