10 tips and tricks for PostgreSQL (part 1)
Index
Here are a few tips and tricks for PostgreSQL I have been using for some time. I invite you to have a look at them:
- Sum, averages... without subqueries with Window Functions
- Get the number of the current row
- Age Calculator
- Convert a PostgreSQL array to a PHP array
- Generate series of dates
- Manage NULLs in your ORDER BY (link to part 2 of this article)
- Access data from the previous and the next row with lag() and lead() (link to part 2 of this article)
- Difference between two dates (link to part 2 of this article)
- Export the result from a query to CSV (link to part 2 of this article)
- Customize psql with the .psqlrc file (link to part 2 of this article)
1. Sum, average and more without subqueries
You can use PostgreSQL's Window Functions to perform calculations across a set of rows related to the current row.
For example, let's create a table with a list of sales and the date of sale:
DROP TABLE IF EXISTS foo;
CREATE TEMP TABLE foo (date date, sale_amount numeric);
INSERT INTO foo VALUES ('2014-08-01', 11), ('2014-08-01', 120), ('2014-08-01', 80),
('2014-08-02', 14), ('2014-08-04', 180), ('2014-08-04', 70), ('2014-08-01', 18);
A basic query would display the date and amount of the sale:
gab@test> SELECT date, sale_amount
FROM foo
ORDER BY date;
date | sale_amount
------------+-------------
2014-08-01 | 18
2014-08-01 | 120
2014-08-01 | 80
2014-08-01 | 11
2014-08-02 | 14
2014-08-04 | 70
2014-08-04 | 180
(7 rows)
With a Window Function we can for example get the total number of sales and the average without using a subquery:
gab@test> SELECT date, sale_amount, SUM(sale_amount) OVER() as sales_total,
AVG(sale_amount) OVER() as sales_avg
FROM foo
ORDER BY date;
date | sale_amount | sales_total | sales_avg
------------+-------------+-------------+---------------------
2014-08-01 | 18 | 493 | 70.4285714285714286
2014-08-01 | 120 | 493 | 70.4285714285714286
2014-08-01 | 80 | 493 | 70.4285714285714286
2014-08-01 | 11 | 493 | 70.4285714285714286
2014-08-02 | 14 | 493 | 70.4285714285714286
2014-08-04 | 70 | 493 | 70.4285714285714286
2014-08-04 | 180 | 493 | 70.4285714285714286
(7 rows)
We can also get the total number of sales for each day as well as the average sale price for each day:
gab@test> SELECT date, sale_amount, SUM(sale_amount) OVER(PARTITION BY date) as sales_total_day,
AVG(sale_amount) OVER(PARTITION BY date) as sales_avg_day
FROM foo
ORDER BY date;
date | sale_amount | sales_total_day | sales_avg_day
------------+-------------+-----------------+----------------------
2014-08-01 | 18 | 229 | 57.2500000000000000
2014-08-01 | 120 | 229 | 57.2500000000000000
2014-08-01 | 80 | 229 | 57.2500000000000000
2014-08-01 | 11 | 229 | 57.2500000000000000
2014-08-02 | 14 | 14 | 14.0000000000000000
2014-08-04 | 70 | 250 | 125.0000000000000000
2014-08-04 | 180 | 250 | 125.0000000000000000
(7 rows)
2. Get the number of the current row
For our example, we will use the same schema as for the first tip:
DROP TABLE IF EXISTS foo;
CREATE TEMP TABLE foo (date date, sale_amount numeric);
INSERT INTO foo VALUES ('2014-08-01', 11), ('2014-08-01', 120), ('2014-08-01', 80),
('2014-08-02', 14), ('2014-08-04', 180), ('2014-08-04', 70), ('2014-08-01', 18);
Still using a PostgreSQL's Window Functions, you can use the row_number() function to get the number of the current row:
gab@test> SELECT ROW_NUMBER() OVER(ORDER BY date), date, sale_amount
FROM foo
ORDER BY date;
row_number | date | sale_amount
------------+------------+-------------
1 | 2014-08-01 | 18
2 | 2014-08-01 | 120
3 | 2014-08-01 | 80
4 | 2014-08-01 | 11
5 | 2014-08-02 | 14
6 | 2014-08-04 | 70
7 | 2014-08-04 | 180
(7 rows)
You could also get the row_number() for each row with a partition by date for example (to have it reset for each date):
gab@test> SELECT ROW_NUMBER() OVER(PARTITION BY date ORDER BY date), date, sale_amount
FROM foo
ORDER BY date;
row_number | date | sale_amount
------------+------------+-------------
1 | 2014-08-01 | 18
2 | 2014-08-01 | 120
3 | 2014-08-01 | 80
4 | 2014-08-01 | 11
1 | 2014-08-02 | 14
1 | 2014-08-04 | 70
2 | 2014-08-04 | 180
(7 rows)
3. Age Calculator
This is how old John Lennon would have been by the time I wrote this article:
gab@test> SELECT age('1940-10-09'::date);
age
-------------------------
73 years 8 mons 28 days
(1 row)
You can easily extract the year to make some other calculation with this data:
gab@test> SELECT EXTRACT(YEAR FROM age('1940-10-09'::date));
date_part
-----------
73
(1 row)
4. Convert a PostgreSQL array to a PHP array
Arrays are one of PostgreSQL data types. You can easily create multidimensional arrays and store numerous data in the same table. Here is a basic PostgreSQL array:
gab@test> SELECT ARRAY[1, 2, 3, 5];
array
-----------
{1,2,3,5}
(1 row)
To convert this array to a PHP array, you must first convert it to JSON with the function array_to_json() (you need PostgreSQL >= 9.2):
gab@test> SELECT array_to_json(ARRAY[1, 2, 3, 5]);
array_to_json
---------------
[1,2,3,5]
(1 row)
You can now easily convert it to PHP with json_decode():
<?php
$array = json_decode("[1,2,3,5]", true);
print_r($array);
?>
5. Generate series of dates
You can use the generate_series() function to generate a serie of dates:
gab@test> SELECT generate_series(NOW()::date, NOW() + INTERVAL '1 week', '1 day');
generate_series
------------------------
2014-07-06 00:00:00-04
2014-07-07 00:00:00-04
2014-07-08 00:00:00-04
2014-07-09 00:00:00-04
2014-07-10 00:00:00-04
2014-07-11 00:00:00-04
2014-07-12 00:00:00-04
2014-07-13 00:00:00-04
(8 rows)
6 -10. See part 2!
The second part of this article is available here: 10 tips and tricks for PostgreSQL (part 2)