PostgreSQL: querying Bigquery using bigquery_fdw foreign data wrapper
Introduction
This article is a step by step how-to to query BigQuery from a PostgreSQL instance using the foreign data wrapper bigquery_fdw.
Prerequisites:
- PostgreSQL >= 9.5 installed
- You are using a debian based server
The examples below are based on a Debian 9.4 instance with PostgreSQL 10.4.
Install dependencies
# Install required packages
apt-get update
apt-get install --yes postgresql-server-dev-10 python3-setuptools python3-dev make gcc git
# Install pip3
easy_install3 pip
For PostgresSQL 9.X, install postgresql-server-dev-9.X
instead of postgresql-server-dev-10
.
Install Multicorn and bigquery_fdw
# Install `setuptools` if necessary
pip3 install --upgrade setuptools
# Install Multicorn
git clone git://github.com/Kozea/Multicorn.git && cd Multicorn
export PYTHON_OVERRIDE=python3
make && make install
# Install bigquery_fdw
pip3 install bigquery-fdw
Create a BigQuery service account
- Go to IAM & admin > Service accounts in the Google cloud console
- Click on "CREATE SERVICE ACCOUNT"
- Choose a "Service account name"
- In "Roles", select "BigQuey > BigQuery User" and "BigQuey > BigQuery Data Viewer"
- Click "Furnish a new private key" and ensure that JSON is selected
- Click "CREATE"
The private key should be downloaded.
Store the private key
# Create directory
mkdir -p /opt/bigquery_fdw
# Copy key in directory
cp /current/path/to/key.json /opt/bigquery_fdw/key.json
# Set correct permissions
chown -R postgres.postgres /opt/bigquery_fdw
chmod -R 700 /opt/bigquery_fdw
Test the connectivity
Before creating a foreign table, you can use the bq_client_test
tool to test the connectivity to BigQuery from your server with the Json key.
You need to replace my_dataset.my_table
with an existing dataset and table name.
bq_client_test --key /opt/bigquery_fdw/key.json \
--query "SELECT count(*) FROM my_dataset.my_table"
* BigQuery client instance:
<google.cloud.bigquery.client.Client object at 0x7fa8f30dd278>
* Query instance:
<google.cloud.bigquery.job.QueryJob object at 0x7fa8f30dd128>
* Query results:
Row((155756,), {'f0_': 0})
Create the extension and server in PostgreSQL
Login to PostgreSQL:
su postgres
psql
Create the extension and the server:
CREATE EXTENSION multicorn;
CREATE SERVER bigquery_srv FOREIGN DATA WRAPPER multicorn
OPTIONS (
wrapper 'bigquery_fdw.fdw.ConstantForeignDataWrapper'
);
Create a table matching a foreign table
Create a table that matches BigQuery datatypes with PostgreSQL equivalent types (see conversion table).
CREATE FOREIGN TABLE my_bigquery_table (
column1 text,
column2 bigint
) SERVER bigquery_srv
OPTIONS (
fdw_dataset 'my_dataset',
fdw_table 'my_table',
fdw_key '/opt/bigquery_fdw/user.json'
);
You can now run queries from PostgreSQL to your BigQuery table:
postgres=# SELECT count(*) FROM my_bigquery_table;
count
--------
155756
(1 row)
postgres=# SELECT count(*) FROM my_bigquery_table WHERE some_column = 'some_value';
count
-------
22560
(1 row)
Try it yourself
Try it yourself with this how-to and the documentation available at https://github.com/gabfl/bigquery_fdw.