PostgreSQL foreign data wrapper: Aggregate Pushdown
Introduction
PostgreSQL 10 implemented push aggregate functions to the remote server to improve foreign data wrapper performances (see release notes)
Without aggregate pushdown, when an aggregation function was used (COUNT, SUM, AVG...), all rows were forwarded to the foreign data wrapper which would aggregate them accordingly remotely.
PostgreSQL 10 added features to forward data already aggregated to the foreign data wrapper which dramatically improves performances.
Example
If you need more information on how to setup a foreign data wrapper, please read my dedicated article.
Sample table
We will create then access the following table with 2 foreign data wrappers, one with PostgreSQL 9.6 and the second with PostgreSQL 10:
-- Create table
CREATE TABLE remote_table (
id serial primary key,
num int,
date timestamp with time zone
);
-- Add indexes
CREATE INDEX ON remote_table (num);
CREATE INDEX ON remote_table (date);
-- Populate with some data
INSERT INTO remote_table (num, date)
SELECT floor(random() * 10 + 1)::int, generate_series(NOW() - INTERVAL '1000 days', NOW(), '1 day'::interval);
-- Grant SELECT privileges to "remote_user"
GRANT SELECT ON ALL TABLES IN SCHEMA public TO remote_user;
PostgreSQL 9.6: no aggregate pushdown
postgres=# SELECT num, count(*)
FROM fdw.remote_table
GROUP BY num
ORDER BY count(*) DESC
LIMIT 5;
num | count
-----+-------
5 | 117
1 | 113
7 | 111
4 | 109
10 | 101
(5 rows)
Time: 14.683 ms
Query explained:
postgres=# explain (verbose, costs off) SELECT num, count(*) FROM fdw.remote_table GROUP BY num ORDER BY count(*) DESC LIMIT 5;
QUERY PLAN
---------------------------------------------------------------------
Limit
Output: num, (count(*))
-> Sort
Output: num, (count(*))
Sort Key: (count(*)) DESC
-> HashAggregate
Output: num, count(*)
Group Key: remote_table.num
-> Foreign Scan on fdw.remote_table
Output: id, num, date
Remote SQL: SELECT num FROM public.remote_table
(11 rows)
Time: 1.507 ms
PostgreSQL 10: with aggregate pushdown
postgres=# SELECT num, count(*)
FROM fdw.remote_table
GROUP BY num
ORDER BY count(*) DESC
LIMIT 5;
num | count
-----+-------
1 | 107
7 | 105
10 | 104
5 | 103
2 | 101
(5 rows)
Time: 12.051 ms
Query explained:
postgres=# explain (verbose, costs off) SELECT num, count(*)
FROM fdw.remote_table
GROUP BY num
ORDER BY count(*) DESC
LIMIT 5;
QUERY PLAN
------------------------------------------------------------------------------------
Limit
Output: num, (count(*))
-> Sort
Output: num, (count(*))
Sort Key: (count(*)) DESC
-> Foreign Scan
Output: num, (count(*))
Relations: Aggregate on (fdw.remote_table)
Remote SQL: SELECT num, count(*) FROM public.remote_table GROUP BY 1
(9 rows)
Time: 0.790 ms
Understanding where the aggregation happens
With explain (verbose, costs off)
, the query plan will show the Remote SQL
query.
With PostgreSQL 9.6, the query executed on the remote server is:
SELECT num FROM public.remote_table
-- 1001 rows returned
and with PostgreSQL 10, it is:
SELECT num, count(*) FROM public.remote_table GROUP BY 1
-- 10 rows returned
It means that all rows matching the WHERE condition (all rows from the table in this case) are forwarded to the foreign data wrapper with PostgreSQL 9.6 which will then aggregate them locally. On PostgreSQL 10, rows are aggregated on the remote server and are forwarded pre-aggregated to the foreign data wrapper.
When the aggregation happens and when it does not
Pushing aggregate functions to the remote server will improve in the future and does not cover all queries.
As shown below, aggregation does not happen with a date function (like NOW()
). This is because dates are relative to the foreign data wrapper and are not immutables.
Aggregation does not happen with a date function:
SELECT num, count(*) FROM fdw.remote_table WHERE date > NOW() - INTERVAL '1 year' GROUP BY num ORDER BY count(*) DESC LIMIT 1;
-- Remote SQL:
SELECT num, date FROM public.remote_table
Even if the time zone is specified:
SELECT num, count(*) FROM fdw.remote_table WHERE date > (NOW() - INTERVAL '1 year') AT TIME ZONE 'US/Eastern' GROUP BY num ORDER BY count(*) DESC LIMIT 1;
-- Remote SQL:
SELECT num, date FROM public.remote_table
But it works with a string date:
SELECT num, count(*) FROM fdw.remote_table WHERE date > '2017-05-07' GROUP BY num ORDER BY count(*) DESC LIMIT 1;
-- Remote SQL:
SELECT num, count(*) FROM public.remote_table WHERE ((date > '2017-05-07 00:00:00+00'::timestamp with time zone)) GROUP BY 1
Since multiple elements are taken in consideration by the foreign data wrapper when deciding wether or not to use pushdown aggregation, I strongly recommend to use explain (verbose, costs off)
with all queries to understand the query plan and the remote query.