Using Polars & DuckDB with Postgres

This post will look at two relatively new programs for transforming data sets and their applicability to moving data into a Postgres database. Both programs have Python APIs that will be used for this exercise. Polars is written in Rust and also has a Rust API. DuckDB is written in C++ and has a multitude of APIs. Both programs also offer CLI versions that leverage the SQLite3 CLI; the DuckDB CLI is quite usable, the Polars CLI not so much. The data set to be transformed and transferred comes from the New York City taxi data page Trip data. The specific data is that for September 2023, in the form of a Parquet file, and numbers approximately 2.8 million rows. As to why this month, the query against this data will answer that.

The software used for this demo:

adbc-driver-manager==0.10.0
adbc-driver-postgresql==0.10.0
duckdb==0.10.0
polars==0.20.7
SQLAlchemy==2.0.23

Machine specifications:

Intel® Core™ i5-8400 CPU @ 2.80GHz × 6

Memory 16.0 GiB

The code to do this is actually quite straightforward:

import polars as pl
import duckdb
import sqlalchemy as sa
uri = "postgresql://postgres:postgres@localhost:5452/test"
pl.read_parquet('/home/aklaver/Downloads/yellow_tripdata_2023-09.parquet').write_database('nyc_taxi_pl', uri, engine='adbc', if_table_exists='replace')
# Wall time: 10.7 s
duckdb.load_extension('postgres')
duckdb.sql("ATTACH 'postgres:dbname=test port=5452 user=postgres host=localhost' AS postgres")
duckdb.sql('create or replace table postgres.nyc_taxi_duckdb as select * from read_parquet("/home/aklaver/Downloads/yellow_tripdata_2023-09.parquet")')
# Wall time: 5.4 s

For the Polars case write_database() takes the data frame created by read_parquet() and writes it out to the Postgres table nyc_taxi_pl. The if_table_exists='replace' argument means an existing version of the table will be dropped and then a new version of the table is created and populated with the data from the data frame. The structure of the table (column names and types) will be seen later when the query is run. Probably the most important part is the engine='adbc' argument, where 'adbc' means Arrow Database Connectivity from the Apache Arrow project. To use this the appropriate packages will need to be installed in the Python environment:

pip install adbc-driver-manager adbc-driver-postgresql

write_database() will also accept an SQLAlchemy connection. The issue is that
the SQLAlchemy connection uses INSERTs to move the data into the table and is
painfully slow. The adbc connection uses COPY and is fast.
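
For comparison, a minimal sketch of the slower SQLAlchemy path, assuming SQLAlchemy plus a Postgres driver such as psycopg2 are installed and using the same file and uri as above (the target table name here is just an illustrative choice):

# Same load as above, but routed through SQLAlchemy; expect INSERT-based
# loading and a much longer run time than the adbc/COPY version.
pl.read_parquet('/home/aklaver/Downloads/yellow_tripdata_2023-09.parquet').write_database(
    'nyc_taxi_pl_sa', uri, engine='sqlalchemy', if_table_exists='replace'
)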

The DuckDB case is best understood when you realize the code is a layer over the
DuckDB CLI interface and that the CLI emulates the SQLite CLI. See the DuckDB
CLI link above to get the deep dive. In this case the line:

duckdb.load_extension('postgres')

loads the DuckDB Postgres extension. In order for this to work, at some point prior to this the extension would have needed to be installed with

Python

duckdb.install_extension('postgres')

or

CLI

install postgres

Then:

duckdb.sql("ATTACH 'postgres:dbname=test port=5452 user=postgres host=localhost' AS postgres")

connects to the Postgres instance and attaches it to the DuckDB in-memory
database. At this point the Postgres database can be treated essentially as a
schema in the DuckDB database, which is where this:

duckdb.sql('create or replace table postgres.nyc_taxi_duckdb as select * from read_parquet("/home/aklaver/Downloads/yellow_tripdata_2023-09.parquet")')

comes into play. The query reads the Parquet file and then creates a Postgres
table if it does not exist or replaces it if it does exist.
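
As a quick illustration of treating Postgres as just another schema, and assuming the ATTACH above has been run, the newly created table can be queried straight through DuckDB, e.g. to sanity check the row count:

# Count the rows that landed in the Postgres table, via the attached catalog.
duckdb.sql("select count(*) from postgres.nyc_taxi_duckdb").show()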

A quick explanation of what led me to the query detailed in the next section.

polars_df = pl.read_parquet('/home/aklaver/Downloads/yellow_tripdata_2023-09.parquet')
polars_df.select("trip_distance").describe()

When I did this originally I ran polars_df.describe(), which showed all the columns
and their statistics. I have narrowed it down here to highlight the column that
caught my attention, in particular the max value of 265869.44 miles.
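
As an aside, the same outliers can be eyeballed in Polars itself before the data ever reaches Postgres. A rough sketch using the data frame created above:

# Filter for the suspiciously long trips; this mirrors the
# WHERE trip_distance >= 1000 clause in the SQL query further down.
outliers = polars_df.filter(pl.col("trip_distance") >= 1000)
print(outliers.select("trip_distance", "total_amount", "tpep_pickup_datetime", "tpep_dropoff_datetime"))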

Now to the Postgres query against the data. The query displays the total trip
distance, total fare amount, time of drop off, time of pickup, the difference
between the times as duration, and the average mph for the trip. Results are
limited to trip distances of >= 1000 miles; these are the high-achieving
cabs. Further, only the 10 fastest cabs, in descending order, are displayed.

Using Polars.

Note that for this case SQLAlchemy is being used instead of the adbc
driver that was used to add data to the table. The reason is the Postgres adbc driver is
a work in progress and at the time of writing does not support the interval
type represented by the duration values. The set_fmt_float="full" setting is done to
keep the large numbers from being displayed in scientific notation.

pl_qry = """SELECT
trip_distance, total_amount, tpep_dropoff_datetime, tpep_pickup_datetime,
tpep_dropoff_datetime - tpep_pickup_datetime as duration,
round(trip_distance/(extract(epoch from tpep_dropoff_datetime - tpep_pickup_datetime)/3600)) as mph
FROM
nyc_taxi_pl
WHERE
trip_distance >= 1000
ORDER BY
mph desc
LIMIT
10
"""

import sqlalchemy as sa

pl_con = sa.create_engine('postgresql://postgres:@localhost:5452/test')

with pl.Config(set_fmt_float="full"):
    print(pl.read_database(pl_qry, pl_con))

Using DuckDB. The query logic is the same; the table name is different to match
that in the Postgres database. The con = duckdb.connect(":default:") sets up
the connection to the DuckDB default in-memory database. Then, since the Postgres
database was attached to the DuckDB database, the FROM postgres.nyc_taxi_duckdb
fetches from Postgres.

duck_qry = """SELECT
trip_distance, total_amount, tpep_dropoff_datetime, tpep_pickup_datetime,
tpep_dropoff_datetime - tpep_pickup_datetime as duration,
round(trip_distance/(extract(epoch from tpep_dropoff_datetime - tpep_pickup_datetime)/3600)) as mph
FROM
postgres.nyc_taxi_duckdb
WHERE
trip_distance >= 1000
ORDER BY
mph desc
LIMIT
10
"""

con = duckdb.connect(":default:")
con.sql(duck_qry)
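
If a Polars data frame is wanted instead of DuckDB's tabular console output, the relation returned by con.sql() can be converted directly. A small sketch, assuming Polars is installed in the same environment:

# Convert the DuckDB relation to a Polars DataFrame for further work in Python.
duck_df = con.sql(duck_qry).pl()
print(duck_df)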

My takeaway is that the cab at the top of the list is the Millennium Falcon disguised as a NYC Yellow cab. Also, a 42,652.39 mile trip in 1 minute for $20.48 is the deal of the millennium.

Postgres Debian/Ubuntu Packaging (Part 2)

In a previous post Part 1 I went over how to install Postgres using the Debian/Ubuntu packages. It would be a good idea to go over that post before reading this one.

This post will cover the usage of the package specific commands that are installed.

First we will look at how the postgresql-client-common package wraps the Postgres native commands to work with multiple instances of Postgres installed on the same machine, native commands being, for instance, psql, pg_dump, pg_restore, etc.

To make this system work, wrapper code is present that runs the Postgres binaries
on a per-cluster basis. As was alluded to in the installation section, a cluster is identified as the combination of Postgres version and cluster name, where the cluster name has to be unique for a given Postgres version. The wrapper that handles this is in the postgresql-client-common package that was installed as part of sudo apt-get -y install postgresql. It is a Perl program that can be found at /usr/share/postgresql-common/pg_wrapper, for those that want to see what is pulling the levers behind the curtain. The connection to the Postgres commands, e.g. psql, pg_dump, pg_restore, etc., happens in /usr/bin, where these commands are symlinked to pg_wrapper.

Using man pg_wrapper will provide information on how it works in conjunction with the other commands. FYI, man postgresql-common will lead to the same information. The man entry details the order of precedence pg_wrapper uses to select the binary appropriate for the cluster's Postgres version. A quick, simple overview of common cases: the binaries used will be those belonging to the cluster that matches the below, in descending order of priority:

The cluster specified using --cluster version/cluster name.
From PGHOST environment variable.
From PGCLUSTER environment variable.
Port specified as -p or PGPORT environment variable.
With multiple clusters the one that is listening on port 5432.

Again, look at the man file for in-depth information on the interaction of the above. There are some commands that are not changed by the above as they always default to the newest installed version. These commands are psql, pg_archivecleanup, and pg_isready.

An example of how this works:

aklaver@arkansas:~$ pg_dump --version
pg_dump (PostgreSQL) 14.8 (Ubuntu 14.8-1.pgdg22.04+1)

-- The above returns 14.8 as that is the server version listening on port 5432.
-- If the ports were swapped and Postgres 15 was listening on 5432 then the
-- version returned would have been 15.3.


Below the cluster is specified by --cluster, so the binary will be the one
associated with it.

aklaver@arkansas:~$ pg_dump --cluster 14/main  --version
pg_dump (PostgreSQL) 14.8 (Ubuntu 14.8-1.pgdg22.04+1)

aklaver@arkansas:~$ pg_dump --cluster 15/main  --version
pg_dump (PostgreSQL) 15.3 (Ubuntu 15.3-1.pgdg22.04+1)

Just to prove the psql version is not affected by the cluster version:

aklaver@arkansas:~$ psql --cluster 14/main  --version 
psql (PostgreSQL) 15.3 (Ubuntu 15.3-1.pgdg22.04+1)

aklaver@arkansas:~$ psql --cluster 15/main  --version 
psql (PostgreSQL) 15.3 (Ubuntu 15.3-1.pgdg22.04+1)

Next we move on to the additional commands installed by the postgresql-common package.

These commands are unique to the packaging. They work at the cluster level and are used to manage clusters through their life cycle.

pg_backupcluster
pg_buildext
pg_createcluster
pg_ctlcluster
pg_dropcluster
pg_lsclusters
pg_renamecluster
pg_restorecluster
pg_upgradecluster
pg_virtualenv
pg_config * sort of unique

pg_config
It shares a name with the Postgres native pg_config command, but is actually a shell script that runs the latest version of the native pg_config. There is no man file for this command.

For the rest of the commands listed above, detailed information can be found on each using man <command name>. What follows is a simple explanation of the commands for the purpose of deciding what to use for a given situation. They are presented in a presumed order of importance/usage. For these commands the cluster is specified using the general form:

command [options] cluster-version cluster-name.

pg_lsclusters
Shows the status of installed clusters along with some basic configuration information.
The clusters can be filtered using a version and further with a cluster name.

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

aklaver@arkansas:~$ pg_lsclusters 14 main
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log

pg_ctlcluster
Wraps the Postgres pg_ctl command. It is used to control the actions (start, stop, restart, reload, status, promote) of the Postgres service. When run as root it redirects the actions to systemd, the exceptions being when options are passed directly to the underlying postgres and pg_ctl commands.

aklaver@arkansas:~$ sudo pg_ctlcluster 15 main start

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

NOTE: The action (stop/start/restart, etc.) can be before or after the cluster designation.

pg_createcluster
Creates a new cluster. The version/name combination needs to be unique across clusters.

aklaver@arkansas:~$ sudo pg_createcluster 14 new_instance
Creating new PostgreSQL cluster 14/new_instance ...
/usr/lib/postgresql/14/bin/initdb -D /var/lib/postgresql/14/new_instance --auth-local peer --auth-host scram-sha-256 --no-instructions
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/14/new_instance ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... America/Los_Angeles
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
Ver Cluster      Port Status Owner    Data directory                      Log file
14  new_instance 5434 down   postgres /var/lib/postgresql/14/new_instance /var/log/postgresql/postgresql-14-new_instance.log

pg_dropcluster
Completely removes a cluster and its associated files (conf, log, etc.). If the cluster is running, the --stop option will need to be used to first shut down the cluster.

aklaver@arkansas:~$ sudo pg_dropcluster --stop 15 main

pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log

pg_backupcluster
Backs up a cluster using either of the Postgres binaries pg_basebackup or pg_dump, and manages the backups. This includes expiring backups and WAL management. See the man file for all the options.

aklaver@arkansas:~$ sudo pg_backupcluster 14 main dump
Creating dump in /var/backups/postgresql/14-main/2023-06-27T181915Z.dump
Dumping postgres to /var/backups/postgresql/14-main/2023-06-27T181915Z.dump/postgres.dump …
Dumping redmine to /var/backups/postgresql/14-main/2023-06-27T181915Z.dump/redmine.dump …
Dumping template1 to /var/backups/postgresql/14-main/2023-06-27T181915Z.dump/template1.dump …
Dumping test to /var/backups/postgresql/14-main/2023-06-27T181915Z.dump/test.dump …

Besides the individual database dumps, which are in the pg_dump custom format, the global values, e.g. roles and tablespaces, are dumped. The cluster configuration (postgresql.conf, pg_hba.conf, etc.) is stored in the config.tar.gz file.

aklaver@arkansas:~$ sudo ls -al /var/backups/postgresql/14-main/2023-06-27T181915Z.dump/
total 828
drwxr-x--- 2 postgres postgres 4096 Jun 27 11:21 .
drwxr-xr-x 3 postgres postgres 4096 Jun 27 11:24 ..
-rw-r----- 1 postgres postgres 11656 Jun 27 11:19 config.tar.gz
-rw-r----- 1 postgres postgres 65 Jun 27 11:19 createcluster.opts
-rw-r----- 1 postgres postgres 253 Jun 27 11:19 databases.sql
-rw-r----- 1 postgres postgres 1124 Jun 27 11:19 globals.sql
-rw-r----- 1 postgres postgres 1072 Jun 27 11:19 postgres.dump
-rw-r----- 1 postgres postgres 798415 Jun 27 11:19 redmine.dump
-rw-r----- 1 postgres postgres 143 Jun 27 11:19 status
-rw-r----- 1 postgres postgres 1569 Jun 27 11:19 template1.dump
-rw-r----- 1 postgres postgres 1270 Jun 27 11:19 test.dump

Do a basebackup:

aklaver@arkansas:~$ sudo pg_backupcluster 14 main basebackup
Creating backup in /var/backups/postgresql/14-main/2023-06-27T183046Z.backup
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 5/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_3671245"
52498/52498 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 5/2E000100
pg_basebackup: waiting for background process to finish streaming …
pg_basebackup: syncing data to disk …
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed


The commands above were repeated to get multiple backups for the below.

aklaver@arkansas:~$ sudo pg_backupcluster 14 main list
Cluster 14 main backups in /var/backups/postgresql/14-main:
Dumps:
/var/backups/postgresql/14-main/2023-06-27T181915Z.dump: 815567 Bytes
/var/backups/postgresql/14-main/2023-06-27T183329Z.dump: 815567 Bytes
Basebackups:
/var/backups/postgresql/14-main/2023-06-27T183046Z.backup: 7544428 Bytes
/var/backups/postgresql/14-main/2023-06-27T183343Z.backup: 7544478 Bytes
Total: 16720040 Bytes

aklaver@arkansas:~$ sudo pg_backupcluster 14 main expiredumps 1
Removing /var/backups/postgresql/14-main/2023-06-27T181915Z.dump ..

aklaver@arkansas:~$ sudo pg_backupcluster 14 main expirebasebackups 1
Removing /var/backups/postgresql/14-main/2023-06-27T183046Z.backup …

aklaver@arkansas:~$ sudo pg_backupcluster 14 main list
Cluster 14 main backups in /var/backups/postgresql/14-main:
Dumps:
/var/backups/postgresql/14-main/2023-06-27T183329Z.dump: 815567 Bytes
Basebackups:
/var/backups/postgresql/14-main/2023-06-27T183343Z.backup: 7544478 Bytes
Total: 8360045 Bytes

pg_restorecluster
Restores a dump or basebackup made by pg_backupcluster to a new cluster. The example below uses the new cluster created in the pg_createcluster section.

aklaver@arkansas~$ sudo pg_restorecluster 14 new_instance
/usr/lib/postgresql/14/bin/initdb -D /var/lib/postgresql/14/new_instance --auth-local peer --auth-host scram-sha-256 --no-instructions --encoding UTF8 --lc-collate en_US.UTF-8 --lc-ctype en_US.UTF-8
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/14/new_instance … ok
creating subdirectories … ok
selecting dynamic shared memory implementation … posix
selecting default max_connections … 100
selecting default shared_buffers … 128MB
selecting default time zone … America/Los_Angeles
creating configuration files … ok
running bootstrap script … ok
performing post-bootstrap initialization … ok
syncing data to disk … ok
Ver Cluster Port Status Owner Data directory Log file
14 new_instance 5434 down postgres /var/lib/postgresql/14/new_instance /var/log/postgresql/postgresql-14-new_instance.log

Restoring /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/config.tar.gz to /etc/postgresql/14/new_instance …
Setting cluster_name = 14/new_instance
Setting data_directory = /var/lib/postgresql/14/new_instance
Setting external_pid_file = /var/run/postgresql/14-new_instance.pid
Setting hba_file = /etc/postgresql/14/new_instance/pg_hba.conf
Setting ident_file = /etc/postgresql/14/new_instance/pg_ident.conf
Setting port = 5434
Setting stats_temp_directory = /var/run/postgresql/14-new_instance.pg_stat_tmp
Starting cluster 14 new_instance …
Restoring /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/globals.sql …
Creating databases from /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/databases.sql …
Restoring /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/postgres.dump to database postgres …
Restoring /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/redmine.dump to database redmine …
Restoring /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/template1.dump to database template1 …
Restoring /var/backups/postgresql/14-main/2023-06-27T183329Z.dump/test.dump to database test …
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "redmine": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "test": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "redmine": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "test": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "redmine": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics
vacuumdb: processing database "test": Generating default (full) optimizer statistics

Ver Cluster Port Status Owner Data directory Log file
14 new_instance 5434 online postgres /var/lib/postgresql/14/new_instance /var/log/postgresql/postgresql-14-new_instance.log

pg_renamecluster
Renames a cluster.

aklaver@arkansas:~$ sudo pg_renamecluster 14 new_instance test
Stopping cluster 14 new_instance …
Starting cluster 14 test …
aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14 test 5434 online postgres /var/lib/postgresql/14/test /var/log/postgresql/postgresql-14-test.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

pg_upgradecluster
Moves a cluster from one major version to another. By default the --method used is dump, which uses pg_dump to dump the old version and pg_restore to restore to the new version. Using --method=upgrade will use pg_upgrade. See the man file for variations of upgrade, e.g. link and clone.

In the example below pg_dump/pg_restore are used and the new version (-v 15) is explicitly selected.
Without -v the newest installed version of Postgres will be used as the upgrade target.

aklaver@arkansas:~$ sudo pg_upgradecluster -v 15 14 test
Stopping old cluster…
Restarting old cluster with restricted connections…
Notice: extra pg_ctl/postgres options given, bypassing systemctl for start operation
Creating new PostgreSQL cluster 15/test …
/usr/lib/postgresql/15/bin/initdb -D /var/lib/postgresql/15/test --auth-local peer --auth-host scram-sha-256 --no-instructions --encoding UTF8 --lc-collate en_US.UTF-8 --lc-ctype en_US.UTF-8
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/15/test … ok
creating subdirectories … ok
selecting dynamic shared memory implementation … posix
selecting default max_connections … 100
selecting default shared_buffers … 128MB
selecting default time zone … America/Los_Angeles
creating configuration files … ok
running bootstrap script … ok
performing post-bootstrap initialization … ok
syncing data to disk … ok

Copying old configuration files…
Copying old start.conf…
Copying old pg_ctl.conf…
Starting new cluster…
Notice: extra pg_ctl/postgres options given, bypassing systemctl for start operation
Roles, databases, schemas, ACLs…

set_config

(1 row)

set_config

(1 row)

set_config

(1 row)

set_config

(1 row)

set_config

(1 row)

set_config

(1 row)

Fixing hardcoded library paths for stored procedures…
Upgrading database postgres…
Analyzing database postgres…
Fixing hardcoded library paths for stored procedures…
Upgrading database redmine…
Analyzing database redmine…
Fixing hardcoded library paths for stored procedures…
Upgrading database test…
Analyzing database test…
Fixing hardcoded library paths for stored procedures…
Upgrading database template1…
Analyzing database template1…
Stopping target cluster…
Stopping old cluster…
Disabling automatic startup of old cluster…
Starting upgraded cluster on port 5434…

Success. Please check that the upgraded cluster works. If it does,
you can remove the old cluster with
pg_dropcluster 14 test

Ver Cluster Port Status Owner Data directory Log file
14 test 5435 down postgres /var/lib/postgresql/14/test /var/log/postgresql/postgresql-14-test.log
Ver Cluster Port Status Owner Data directory Log file
15 test 5434 online postgres /var/lib/postgresql/15/test /var/log/postgresql/postgresql-15-test.log

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14 test 5435 down postgres /var/lib/postgresql/14/test /var/log/postgresql/postgresql-14-test.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log
15 test 5434 online postgres /var/lib/postgresql/15/test /var/log/postgresql/postgresql-15-test.log

Set up for using the upgrade method by dropping the 15/test cluster created with the dump/restore method above.

aklaver@localhost:~$ sudo pg_dropcluster --stop 15 test

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14 test 5435 down postgres /var/lib/postgresql/14/test /var/log/postgresql/postgresql-14-test.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

Using the pg_upgrade method.

aklaver@arkansas:~$ sudo pg_upgradecluster -v 15 -m upgrade 14 test
Restarting old cluster with restricted connections…
Notice: extra pg_ctl/postgres options given, bypassing systemctl for start operation
Stopping old cluster…
Creating new PostgreSQL cluster 15/test …
/usr/lib/postgresql/15/bin/initdb -D /var/lib/postgresql/15/test --auth-local peer --auth-host scram-sha-256 --no-instructions --encoding UTF8 --lc-collate en_US.UTF-8 --lc-ctype en_US.UTF-8
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/15/test … ok
creating subdirectories … ok
selecting dynamic shared memory implementation … posix
selecting default max_connections … 100
selecting default shared_buffers … 128MB
selecting default time zone … America/Los_Angeles
creating configuration files … ok
running bootstrap script … ok
performing post-bootstrap initialization … ok
syncing data to disk … ok

Copying old configuration files…
Copying old start.conf…
Copying old pg_ctl.conf…
/usr/lib/postgresql/15/bin/pg_upgrade -b /usr/lib/postgresql/14/bin -B /usr/lib/postgresql/15/bin -p 5435 -P 5434 -d /etc/postgresql/14/test -D /etc/postgresql/15/test
Finding the real data directory for the source cluster ok
Finding the real data directory for the target cluster ok

Performing Consistency Checks

Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for system-defined composite types in user tables ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
Checking for new cluster tablespace directories ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade

Analyzing all rows in the new cluster ok
Freezing all rows in the new cluster ok
Deleting files from new pg_xact ok
Copying old pg_xact to new server ok
Setting oldest XID for new cluster ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Copying user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to delete old cluster ok
Checking for extension updates ok

Upgrade Complete

Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
/usr/lib/postgresql/15/bin/vacuumdb --all --analyze-in-stages

Running this script will delete the old cluster's data files:
./delete_old_cluster.sh
pg_upgrade output scripts are in /var/log/postgresql/pg_upgradecluster-14-15-test.mKua
Disabling automatic startup of old cluster…

Success. Please check that the upgraded cluster works. If it does,
you can remove the old cluster with
pg_dropcluster 14 test

Ver Cluster Port Status Owner Data directory Log file
14 test 5434 down postgres /var/lib/postgresql/14/test /var/log/postgresql/postgresql-14-test.log
Ver Cluster Port Status Owner Data directory Log file
15 test 5435 down postgres /var/lib/postgresql/15/test /var/log/postgresql/postgresql-15-test.log


aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14 test 5434 down postgres /var/lib/postgresql/14/test /var/log/postgresql/postgresql-14-test.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log
15 test 5435 down postgres /var/lib/postgresql/15/test /var/log/postgresql/postgresql-15-test.log

NOTE: Be sure to look at the information under the Upgrade Complete section above before starting the new cluster and/or dropping the old cluster.

pg_buildext
Builds and installs Postgres extensions. The man page indicates this is more of a packaging
command and not really for end users.

pg_virtualenv
Creates a temporary Postgres instance for running tests on.

Postgres Debian/Ubuntu Packaging

Part 1

What follows is a simple explanation of how the Postgres Debian/Ubuntu packaging
system works in order to answer recurring questions that are posted on various
forums. This post will focus on the install and initial setup. The next post
will deal with the commands that are added by the packaging.

The Debian/Ubuntu packaging is designed to allow for multiple concurrent
instances of Postgres to be run on a single machine. Exactly what you can run is
going to depend on the repositories from which you fetch the packages. The
Debian and Ubuntu native repos will be 'pinned' to one version of Postgres. So
for the Ubuntu distribution being used here, 22.04, the version is Postgres 14.
For the previous LTS release, 20.04, it would be version 12. In the Debian case
stable (bullseye) is at version 13 and unstable (sid) is at 15. In either of
these cases the packaging will allow running multiple clusters/instances of
the distribution version. NOTE: Cluster or instance will be used interchangeably
to refer to the same concept, namely the data directory produced by initdb.

For this article a different repository will be used, the PGDG (PostgreSQL Global Development Group) version, also referred to as the Postgres community repos. The reason is that these repos can install multiple versions of Postgres on a given distribution. The versions available will be those that currently have community support. Since there is a release a year and approximately 5 years of support for each release, there will generally be 5 versions to choose from. At the time of writing (06/2023) that is 11, 12, 13, 14 and 15. For each major version the repo will have the latest minor version. For releases other than the latest supported production releases visit the links under Archive Repo and Development/Beta Repos below.

In order to run multiple instances the packaging system utilizes wrapper code that runs the Postgres utilities on a cluster-specific basis. How that is done is explained in the following sections. There are snippets of information available on the Web, but nothing I could find that ties it all together. Hence, this attempt.

Archive repo

Announcing archive repo.
Archive repo

This repo will have older, no longer supported releases. This includes releases no longer
supported by the community as well as prior point releases of currently supported
releases.

Development/Beta Repos

Beta version

Development snapshots

Install

In order to explore the packaging it needs to be installed. To install from the PGDG
repos use the following commands, per the instructions here (Install instructions):

1) Get sources list and repo key.

sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -

# Above deprecated in Ubuntu 22.04, use below instead.
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo tee /etc/apt/trusted.gpg.d/pg_repo.asc

2) Install PostgreSQL 15 (the latest production version at the time of writing).

sudo apt-get update

sudo apt-get -y install postgresql-15


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libcommon-sense-perl libjson-perl libjson-xs-perl libllvm14 libpq5 
libtypes-serialiser-perl postgresql-15 postgresql-client-15 
postgresql-client-common postgresql-common ssl-cert
Suggested packages:
postgresql-doc postgresql-doc-15
The following NEW packages will be installed:
libcommon-sense-perl libjson-perl libjson-xs-perl libllvm14 libpq5 
libtypes-serialiser-perl postgresql postgresql-15 postgresql-client-15 
postgresql-client-common postgresql-common ssl-cert
0 upgraded, 12 newly installed, 0 to remove and 0 not upgraded.
Need to get 43.4 MB of archives.

...

3) The above will automatically create an active Postgres cluster that can be seen with:

pg_lsclusters 
Ver Cluster Port Status Owner    Data directory              Log file
14  main    5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
15  main    5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

Note the port number 5433 assigned to the new cluster, as a pre-existing cluster already has the default port of 5432. Cluster designation follows the pattern PostgreSQL version number and cluster name. The default name is main. More on this in the commands section.

4) File locations:

a)  Binaries

    ls -al /usr/lib/postgresql/15/bin/
    total 12920
    drwxr-xr-x 2 root root    4096 Jun 27 09:13 .
    drwxr-xr-x 4 root root    4096 Jun 27 09:13 ..
    -rwxr-xr-x 1 root root   56048 May  9 10:05 clusterdb
    -rwxr-xr-x 1 root root   60376 May  9 10:05 createdb
    -rwxr-xr-x 1 root root   56400 May  9 10:05 createuser
    -rwxr-xr-x 1 root root   51888 May  9 10:05 dropdb
    -rwxr-xr-x 1 root root   51824 May  9 10:05 dropuser
    ...


b)  Data

    sudo ls -al /var/lib/postgresql/15/main/

    total 92
    drwx------ 19 postgres postgres 4096 Jun 27 09:13 .
    drwxr-xr-x  3 postgres postgres 4096 Jun 27 09:13 ..
    drwx------  5 postgres postgres 4096 Jun 27 09:13 base
    drwx------  2 postgres postgres 4096 Jun 27 09:14 global
    drwx------  2 postgres postgres 4096 Jun 27 09:13 pg_commit_ts
    drwx------  2 postgres postgres 4096 Jun 27 09:13 pg_dynshmem
    drwx------  4 postgres postgres 4096 Jun 27 09:13 pg_logical

    ...

c)  Log 

    ls -al /var/log/postgresql/postgresql-15-main.log 
    -rw-r----- 1 postgres adm 660 Jun 27 09:13 /var/log/postgresql/postgresql-15-main.log


d)  Conf

    ls -al /etc/postgresql/15/main/
    total 68
    drwxr-xr-x 3 postgres postgres  4096 Jun 27 09:13 .
    drwxr-xr-x 3 postgres postgres  4096 Jun 27 09:13 ..
    drwxr-xr-x 2 postgres postgres  4096 Jun 27 09:13 conf.d
    -rw-r--r-- 1 postgres postgres   315 Jun 27 09:13 environment
    -rw-r--r-- 1 postgres postgres   143 Jun 27 09:13 pg_ctl.conf
    -rw-r----- 1 postgres postgres  5002 Jun 27 09:13 pg_hba.conf
    -rw-r----- 1 postgres postgres  1636 Jun 27 09:13 pg_ident.conf
    -rw-r--r-- 1 postgres postgres 29735 Jun 27 09:13 postgresql.conf
    -rw-r--r-- 1 postgres postgres   317 Jun 27 09:13 start.conf

Connecting

Initial connection

At this point the only user that is present in the database cluster is postgres, as set up by the package install. To connect as that user there needs to be a side trip to the pg_hba.conf file, where hba stands for Host Based Authentication. It is the configuration file that controls who can access a database, from what host, with what authentication method. More information can be found at pg_hba.conf.

The pg_hba.conf file is set up by the package install in the location below:

sudo vi /etc/postgresql/15/main/pg_hba.conf
…

# DO NOT DISABLE!
# If you change this first entry you will need to make sure that the
# database superuser can access the database using some other method.
# Noninteractive access to all databases is required during automatic
# maintenance (custom daily cronjobs, replication, and similar tasks).
#
# Database administrative login by Unix domain socket
local   all             postgres                                peer

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            scram-sha-256
host    replication     all             ::1/128                 scram-sha-256

…

Note the lines:

Database administrative login by Unix domain socket

local all postgres peer

This sets up the postgres user to log in via the local socket using peer authentication,
where peer authentication uses the logged-in OS user name as the database user
name to log in as. The package install creates the OS user postgres as part of the process. This OS user has no password set, so you cannot directly log in as that user; to operate as that OS user you indirectly log in using sudo. This means the initial log in can be done as:

aklaver@arkansas:~$ sudo -i -u postgres psql -p 5433
psql (15.3 (Ubuntu 15.3-1.pgdg22.04+1))
Type "help" for help.

NOTE: I already have a Postgres instance running on port 5432 so the new 15 instance
was assigned port 5433 on install.

The above uses the OS postgres user to log in to the cluster using the psql
client. psql without any further arguments defaults to using the OS user name as
the database name to connect to and the database user name to connect as. Also
since no host was specified the connection will be made to the local socket.
Therefore the pg_hba.conf line shown above will be used as it is the first one
that matches the connection parameters.

The command below will show you the current databases in the cluster.

postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype |
-----------+----------+----------+-------------+-------------+ ...
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8  ...
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8  ...
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8  ...
(3 rows)

These were created by the initdb done by the install. More information on the template databases can be found here Template database. For now it is sufficient to know template0 is read only and template1 is the default template used to create new databases. The postgres database is there to operate as a default connection database to do further operations on the cluster. These databases should not be used as end user databases. For that, new application-specific databases should be created.

It is a good idea to set up the postgres user to work with the lines below from pg_hba.conf.

# IPv4 local connections:
host    all             all             127.0.0.1/32            scram-sha-256
# IPv6 local connections:
host    all             all             ::1/128                 scram-sha-256

To that end run the command below with the strong password of your choice:

postgres=# alter role postgres with password 'pg_test_SimplePWD';
ALTER ROLE

This then allows logging in from an OS user account without going through sudo.

psql -d postgres -h localhost -U postgres -p 5433
Password for user postgres:
psql (15.3 (Ubuntu 15.3-1.pgdg22.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=#

In the above the -d postgres specifies connecting to the postgres database, -h
localhost makes the connection a host (TCP/IP) one, and -U postgres means connect as
the database user postgres.

It is also a good idea to create a non-superuser database user you can log in as. An example is shown below.

postgres=# create role aklaver with login createdb createrole password 'TEST_pwd_Cat';
CREATE ROLE
psql -d postgres -h localhost -U aklaver -p 5433
Password for user aklaver:
psql (15.3 (Ubuntu 15.3-1.pgdg22.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=>

It would also not hurt to go over the server configuration file:

sudo vi /etc/postgresql/15/main/postgresql.conf

The above holds the configuration settings for the operation of the cluster. It is out of the scope of this post to go through what they all mean. That information can be found here:

Server configuration

There is another file set up by the Postgres package install that deals with how the
server is started. It can be viewed here:

sudo vi /etc/postgresql/15/main/start.conf

# Automatic startup configuration
#   auto: automatically start the cluster
#   manual: manual startup with pg_ctlcluster/postgresql@.service only
#   disabled: refuse to start cluster
# See pg_createcluster(1) for details. When running from systemd,
# invoke 'systemctl daemon-reload' after editing this file.

auto

The commented (#) section explains what settings are available and what they do.
By default the server startup will be set to auto, which means the server will be
started on boot/reboot.

The below deals with some basic commands to manage the server via the command
line. There will be more on this in the next post.

Managing server state.

Using pg-common (installed by package) commands

aklaver@arkansas:~$ sudo pg_ctlcluster stop 15 main

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
15 main 5433 down postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

aklaver@arkansas:~$ sudo pg_ctlcluster 15 main start

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

NOTE: The action (stop/start) can be before or after the cluster designation.
Other actions are restart, reload, status and promote. More on this in the commands
section.

Using systemd.

aklaver@arkansas:~$ sudo systemctl stop postgresql@15-main.service

aklaver@arkansas:~$ sudo systemctl status postgresql@15-main.service
postgresql@15-main.service - PostgreSQL Cluster 15-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: inactive (dead) since Tue 2023-06-27 09:36:51 PDT; 1min 10s ago
Process: 3669737 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 15-main start (code=exited, status=0/SUCCESS)
Process: 3669769 ExecStop=/usr/bin/pg_ctlcluster --skip-systemctl-redirect -m fast 15-main stop (code=exited, status=0/SUCCESS)
Main PID: 3669742 (code=exited, status=0/SUCCESS)
CPU: 215ms

Jun 27 09:35:56 arkansas systemd[1]: Starting PostgreSQL Cluster 15-main…
Jun 27 09:35:58 arkansas systemd[1]: Started PostgreSQL Cluster 15-main.
Jun 27 09:36:51 arkansas systemd[1]: Stopping PostgreSQL Cluster 15-main…
Jun 27 09:36:51 arkansas systemd[1]: postgresql@15-main.service: Deactivated successfully.
Jun 27 09:36:51 arkansas systemd[1]: Stopped PostgreSQL Cluster 15-main.

aklaver@arkansas:~$ sudo systemctl start postgresql@15-main.service

aklaver@arkansas:~$ sudo systemctl status postgresql@15-main.service
postgresql@15-main.service - PostgreSQL Cluster 15-main
Loaded: loaded (/lib/systemd/system/postgresql@.service; enabled-runtime; vendor preset: enabled)
Active: active (running) since Tue 2023-06-27 09:38:42 PDT; 5s ago
Process: 3669790 ExecStart=/usr/bin/pg_ctlcluster --skip-systemctl-redirect 15-main start (code=exited, status=0/SUCCESS)
Main PID: 3669795 (postgres)
Tasks: 6 (limit: 4676)
Memory: 18.6M
CPU: 139ms
CGroup: /system.slice/system-postgresql.slice/postgresql@15-main.service
├─3669795 /usr/lib/postgresql/15/bin/postgres -D /var/lib/postgresql/15/main -c config_file=/etc/postgresql/15/main/postgresql.conf
├─3669796 "postgres: 15/main: checkpointer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" """"" "" "" "" "" ">
├─3669797 "postgres: 15/main: background writer " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "">
├─3669799 "postgres: 15/main: walwriter " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">
├─3669800 "postgres: 15/main: autovacuum launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" >
└─3669801 "postgres: 15/main: logical replication launcher " "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" >

Jun 27 09:38:39 arkansas systemd[1]: Starting PostgreSQL Cluster 15-main…
Jun 27 09:38:42 arkansas systemd[1]: Started PostgreSQL Cluster 15-main.

In either case pg_lsclusters will also show you the correct state.

aklaver@arkansas:~$ pg_lsclusters
Ver Cluster Port Status Owner Data directory Log file
14 main 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
15 main 5433 online postgres /var/lib/postgresql/15/main /var/log/postgresql/postgresql-15-main.log

Using iCalendar RRULE in Postgres

RRULE is an iCalendar specification for computing recurring calendar events. Something like: on the first Tuesday of each month, pay this bill. This is done with a rule that generates the occurrences of the event as needed, which solves the issue of a continually recurring event having to be physically stored as a set of occurrences. The link above has some examples and there is this site, RRULE generator, where you can explore the options. This post will be a light introduction on how to store the rules to, and retrieve them from, a Postgres database using Python and Javascript, and then use that information to populate a Javascript calendar in a Flask application. For Python the rrule module of the dateutil package will be used; in Javascript, the rrule.js library, which is a port of dateutil.rrule.

Setting up Python dateutil:

from dateutil.parser import parse
from dateutil.rrule import *

The names brought in by the * import (rrule's __all__) are:

["rrule", "rruleset", "rrulestr",
 "YEARLY", "MONTHLY", "WEEKLY", "DAILY",
 "HOURLY", "MINUTELY", "SECONDLY",
 "MO", "TU", "WE", "TH", "FR", "SA", "SU"]

Examples.

Note the use of count. This is a good habit to get into until you are
sure of what the rule is going to produce, unless you want to produce an
infinite list of occurrences and bring your computer to its knees :). Don't ask
me how I know.

Start at dtstart and recur every month on the same day of the month for five occurrences.

list(rrule(freq=MONTHLY, count=5, dtstart=parse("06/22/23")))

[datetime.datetime(2023, 6, 22, 0, 0),
 datetime.datetime(2023, 7, 22, 0, 0),
 datetime.datetime(2023, 8, 22, 0, 0),
 datetime.datetime(2023, 9, 22, 0, 0),
 datetime.datetime(2023, 10, 22, 0, 0)]

Same as above, but specify occurrences to be on the 31st of the month. This skips months with < 31 days, as the RRULE specification requires incorrect dates and/or times to be skipped, not 'rounded' down.

list(rrule(freq=MONTHLY, bymonthday=31, count=5, dtstart=parse("06/22/23")))

[datetime.datetime(2023, 7, 31, 0, 0),
 datetime.datetime(2023, 8, 31, 0, 0),
 datetime.datetime(2023, 10, 31, 0, 0),
 datetime.datetime(2023, 12, 31, 0, 0),
 datetime.datetime(2024, 1, 31, 0, 0)]

bymonthday supports negative indexing, so to get the last day of the month regardless of its day number use -1.

list(rrule(freq=MONTHLY, bymonthday=-1, count=5, dtstart=parse("06/22/23")))

[datetime.datetime(2023, 6, 30, 0, 0),
 datetime.datetime(2023, 7, 31, 0, 0),
 datetime.datetime(2023, 8, 31, 0, 0),
 datetime.datetime(2023, 9, 30, 0, 0),
 datetime.datetime(2023, 10, 31, 0, 0)]

To get a better idea of what is possible I recommend looking at the examples
here: rrule examples.

Incorporating RRULE into Postgres.

Create database table to hold rules and associated information.

CREATE TABLE public.rrule_example(
    task_id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    task_title varchar NOT NULL,
    task_desc varchar NOT NULL,
    task_rrule varchar NOT NULL,
    start_date date NOT NULL,
    until_date date
);

Underlying RRULE is a string format that is fully explained in the RFC. The quick and dirty way to derive it in dateutil.rrule is to use the str() method on an rrule.

r = rrule(freq=WEEKLY, interval=2,   dtstart=parse("06/22/2023"))

r.__str__()
'DTSTART:20230622T000000\nRRULE:FREQ=WEEKLY;INTERVAL=2'


Insert the string form of the rrule into the database.

INSERT INTO public.rrule_example OVERRIDING SYSTEM VALUE VALUES (1, 'Every two weeks', 'Task occurrs every two weeks on Thursday', E'DTSTART:20230622T000000\nRRULE:FREQ=WEEKLY;INTERVAL=2', '2023-06-22', NULL);

select * from rrule_example;
-[ RECORD 1 ]----------------------------------------
task_id    | 1
task_title | Every two weeks
task_desc  | Task occurrs every two weeks on Thursday
task_rrule | DTSTART:20230622T000000                 +
           | RRULE:FREQ=WEEKLY;INTERVAL=2
start_date | 06/22/2023
until_date | NULL

Create a function to find the next rule occurrence using the plpython3u procedural language.

CREATE OR REPLACE FUNCTION public.rrule_next_occurrence(t_rrule character
varying, start_dt timestamp with time zone)
RETURNS timestamp with time zone
LANGUAGE plpython3u
SECURITY DEFINER
AS $function$
from datetime import datetime
from dateutil.parser import parse
from dateutil.rrule import rrulestr

rule = rrulestr(t_rrule, ignoretz=True)
next_occ = rule.after(parse(start_dt, ignoretz=True), inc=True)

return next_occ

$function$
;

The function uses dateutil's rrulestr to parse the string version of the rrule, then the after() method to find the first occurrence of the rule on or after the specified date.
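
The same logic can be tried outside the database with plain dateutil, which is handy for testing a rule string before storing it. A minimal sketch, reusing the rrule string inserted above:

from dateutil.parser import parse
from dateutil.rrule import rrulestr

# Same rrule string that was stored for task_id 1 above.
rule = rrulestr('DTSTART:20230622T000000\nRRULE:FREQ=WEEKLY;INTERVAL=2', ignoretz=True)
# First occurrence on or after 2023-06-21 (inc=True makes the bound inclusive).
print(rule.after(parse('2023-06-21'), inc=True))
# datetime.datetime(2023, 6, 22, 0, 0)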

select rrule_next_occurrence(task_rrule, '2023-06-21') from rrule_example where task_id =1;

rrule_next_occurrence  
-------------------------
 06/22/2023 00:00:00 PDT

Create a function to find the previous rule occurrence.

CREATE OR REPLACE FUNCTION public.rrule_prior_occurrence(t_rrule character
varying, start_dt timestamp with time zone)
RETURNS timestamp with time zone
LANGUAGE plpython3u
SECURITY DEFINER
AS $function$
from datetime import datetime
from dateutil.parser import parse
from dateutil.rrule import rrulestr

rule = rrulestr(t_rrule, ignoretz=True)
prior_occ = rule.before(parse(start_dt, ignoretz=True), inc=True)

return prior_occ

$function$
;

Use rrulestr to parse the string rrule, then before() to find the last occurrence of
the rule on or before the specified date.
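
As with after(), the prior occurrence can be checked locally with plain dateutil, continuing the sketch above:

# Last occurrence on or before 2023-06-23, using the rule object from the previous sketch.
print(rule.before(parse('2023-06-23'), inc=True))
# datetime.datetime(2023, 6, 22, 0, 0)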

select rrule_prior_occurrence(task_rrule, '2023-06-23') from rrule_example where task_id =1;

 rrule_prior_occurrence  
-------------------------
 06/22/2023 00:00:00 PDT

Using this information in a Web page.

Using Flask, set up a FullCalendar (https://fullcalendar.io/) calendar to display recurring
events using rrule.js (https://github.com/jakubroztocil/rrule).

rrule-tz.js needs to be included first, followed by FullCalendar and then the FullCalendar rrule plugin.

<!--rrule.js with timezone support-->
<script type="text/javascript" src="{{ url_for('static',
filename='js/external/rrule/rrule-tz.js') }}"></script>
<script type="text/javascript" src="{{ url_for('static',
filename='js/external/full_calendar/main.js') }}"></script>
<!--FullCalendar rrule plugin-->
<script type="text/javascript" src="{{ url_for('static',
filename='js/external/rrule/main.global.js') }}"></script>

In the calendar constructor, eventSources is where the calendar gets the information
to fill in the calendar.

<script>

      document.addEventListener('DOMContentLoaded', function() {
        var calendarEl = document.getElementById('calendar');
        var calendar = new FullCalendar.Calendar(calendarEl, {
            timeZone: "US/Pacific",
            slotMinTime: "07:00",
            slotMaxTime: "19:00",
            slotDuration: "00:15:00",
            forceEventDuration: true,
            defaultTimedEventDuration: "00:15",
            initialView: "dayGridMonth",
            headerToolbar: {
                left: "prev,next today, prevYear,nextYear",
                center: "title",
                right: "dayGridMonth,timeGridWeek,timeGridDay"
                },
            stickyHeaderDates: true,
            eventSources: [
                {
                   url: "/task_calendar_data",
                },
                {events: 
                    [{
                        title: 'Weekly Mon/Fri',
                        rrule: {
                            freq: 'weekly',
                            interval: 1,
                            byweekday: [ 'mo', 'fr' ],
                            dtstart: '2023-06-01T10:30:00', 
                            until: '2023-10-31'
                        }
                    }],
                    id: "fixed_event"

                
                }
            ]
        });
        calendar.render();
      });

    </script>

In this case there are two sources: url, which fetches from a view in Flask, and
events, which is a fixed event that uses the rrule.js syntax to build an event.

The view is:

# Imports assumed for this view; db is the application's database helper module
# and calendar_bp is the blueprint this route is registered on (both app specific).
import json
from datetime import date, timedelta

from flask import current_app, request


@calendar_bp.route("/task_calendar_data")
def taskCalendarData():
    today_dt = date.today()
    start_dt = request.args.get("start", today_dt.strftime("%m/%d/%Y"))
    end_dt = request.args.get("end",
                              (today_dt
                               + timedelta(days=1)).strftime("%m/%d/%Y"))
    # The connection(con) returned from get_db() uses cursor_factory=RealDictCursor
    # so results are returned as dictionaries.
    con = db.get_db()
    cur = con.cursor()
    cur.execute("select * from rrule_example")
    rs = cur.fetchall()
    tasks = []
    if rs:
        for task in rs:
            tasks.append({"id": task["task_id"], "title": task["task_title"], 
                          "rrule": task["task_rrule"], "allDay": True})
    response = current_app.response_class(
                response=json.dumps(tasks),
                mimetype='application/json'
            )
    return response

allDay is set True to pin the task to 00:00.

Insert an rrule that shows an occurrence on the last day of the month.

INSERT INTO
    public.rrule_example OVERRIDING SYSTEM VALUE
VALUES 
(2, 'Last day of month', 'Task occurrs last day of each month',
E'DTSTART:20230622T000000\nRRULE:FREQ=MONTHLY;BYMONTHDAY=-1', 
'2023-06-22', NULL);

The calendar then displays the rrules inserted into the database along with the fixed event from the eventSources in the calendar constructor, shown for the current month and for October 2023, when the rrule in the calendar constructor ends.

Compiling plpython(3)u

Time marches on and Python 2 is now past EOL, with the last release, 2.7, no longer
supported as of 2020-01-01. In the Postgres world Python 2 lives on, though, as the
default version when referring to the procedural language plpythonu, per the
documentation plpython 2/3. In order to use a Python 3 version of the procedural language the plpython3u variant needs to be present. For users getting their Postgres through a packaging system (DEB, RPM, etc.) this is taken care of, with packages available for both versions of Python. If you are compiling Postgres from source, either from necessity or habit, the process is a bit more involved.

At this point plpythonu will only be built for Python 2 (>=2.6). When you run
configure it will probe for Python versions in the order python, python3,
python2. If python -V resolves to a Python 2 version then you will get
plpythonu; if it resolves to Python 3 you will get plpython3u. The question is how to build for the other version of Python. The answer is that you can put your thumb on the scale by setting the environment variable PYTHON to the Python version you want built:

export PYTHON=python3

What follows is the procedure I use where the system python is Python 2.

Initial configure. For this example I am using a minimal setup to just show the Python building. Normally there would be additional arguments, say --with-openssl, --with-libxml, etc.

Verify what python is pointing to.

python -V
2.7.x

The configure finds the Python 2 libraries.

./configure --with-python --prefix=/usr/local/pgsql14
checking for python... /usr/bin/python
configure: using python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC]
checking for Python distutils module... yes
checking Python configuration directory... /usr/lib64/python2.7/config
checking Python include directories... -I/usr/include/python2.7
checking how to link an embedded Python application... -L/usr/lib64 -lpython2.7
-lpthread -ldl -lutil -lm

Build the Python 2 version of the procedural language and install it. At this point the make is done at the top level of the build so that the entire source tree is built.

make
sudo make install
/usr/bin/mkdir -p '/usr/local/pgsql14/lib64'
/usr/bin/install -c -m 755 plpython2.so '/usr/local/pgsql14/lib64/plpython2.so'
/usr/bin/mkdir -p '/usr/local/pgsql14/share/extension' '/usr/local/pgsql14/include/server' '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'
/usr/bin/install -c -m 644 ./plpython2u.control ./plpython2u--1.0.sql ./plpythonu.control ./plpythonu--1.0.sql '/usr/local/pgsql14/share/extension/'
/usr/bin/install -c -m 644 ./plpython.h ./plpy_cursorobject.h ./plpy_elog.h ./plpy_exec.h ./plpy_main.h ./plpy_planobject.h ./plpy_plpymodule.h ./plpy_procedure.h ./plpy_resultobject.h ./plpy_spi.h ./plpy_subxactobject.h ./plpy_typeio.h ./plpy_util.h '/usr/local/pgsql14/include/server'
/usr/bin/install -c -m 644 ./regress-python3-mangle.mk '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'

The procedural language plpython2u is essentially symlinked to plpythonu in the above.

Use the PYTHON environment variable to point at python3. Verify that the variable was set.

export PYTHON=python3
env | grep -i PYTHON
PYTHON=python3

Run configure again to pick up the new version (3) of Python, which it does.

./configure --with-python --prefix=/usr/local/pgsql14
checking for PYTHON... python3
configure: using python 3.6.12 (default, Dec 02 2020, 09:44:23) [GCC]
checking for Python distutils module... yes
checking Python configuration directory... /usr/lib64/python3.6/config-3.6m-x86_64-linux-gnu
checking Python include directories... -I/usr/include/python3.6m
checking how to link an embedded Python application... -L/usr/lib64 -lpython3.6m -lpthread -ldl -lutil -lm

Change directories to the plpython subdirectory. Run make clean to get rid of the previous Python 2 build. Then do the make/make install to build and install plpython3u.

cd src/pl/plpython/
make clean
make
sudo make install

/usr/bin/mkdir -p '/usr/local/pgsql14/lib64'
/usr/bin/install -c -m 755 plpython3.so '/usr/local/pgsql14/lib64/plpython3.so'
/usr/bin/mkdir -p '/usr/local/pgsql14/share/extension' '/usr/local/pgsql14/include/server' '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'
/usr/bin/install -c -m 644 ./plpython3u.control ./plpython3u--1.0.sql '/usr/local/pgsql14/share/extension/'
/usr/bin/install -c -m 644 ./plpython.h ./plpy_cursorobject.h ./plpy_elog.h ./plpy_exec.h ./plpy_main.h ./plpy_planobject.h ./plpy_plpymodule.h ./plpy_procedure.h ./plpy_resultobject.h ./plpy_spi.h ./plpy_subxactobject.h ./plpy_typeio.h ./plpy_util.h '/usr/local/pgsql14/include/server'
/usr/bin/install -c -m 644 ./regress-python3-mangle.mk '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'

Now that they have been built it is just a matter of installing them, as extensions, into the database or databases of your choice.

create extension plpythonu;
create extension plpython3u;

One heads-up about having both versions installed in the same database:

DO $$
    plpy.notice('Python 2')
$$ LANGUAGE plpythonu;
NOTICE:  Python 2
DO
DO $$
    plpy.notice('Python 3')
$$ LANGUAGE plpython3u;
FATAL:  multiple Python libraries are present in session
DETAIL:  Only one Python major version can be used in one session.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

As the message says, you can only use one plpython version at a time in a given session.
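
Reconnecting starts a fresh session in which the other version can be used; a minimal psql sketch:

\c test
DO $$
    plpy.notice('Python 3')
$$ LANGUAGE plpython3u;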

Postgres and JSON

Time for a renovation of an old post that dealt with Postgres and hstore. This post will move from the hstore data type to json(b). At the time the original post was written the introduction of the json type was about three months away and jsonb two and a half years out. It also took several years for the operator and function choices to be fleshed out. Now that the support has been mature for some time it seems a good time to make the change. To recap, the original post dealt with using hstore to capture deleted records into an audit table without having to modify the audit table when the source table structure changed. On to doing this with JSON instead.

Setup section.

This is being done using Postgres version 14.1. In the previous post a table from an actual application was used. Unfortunately, that relied on information not available to the reader. This post uses a test setup that allows you to run the code locally should you desire to.

The setup DDL:

DROP TABLE IF EXISTS archive_test;

CREATE TABLE archive_test (
    line_id smallint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    varchar_fld varchar(10) NOT NULL,
    integer_fld integer,
    boolean_fld boolean NOT NULL DEFAULT 'f',
    ts_fld  timestamptz NOT NULL DEFAULT now()
);

DROP TABLE IF EXISTS archive_test_delete;

CREATE TABLE archive_test_delete (
    del_id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    record_fld jsonb,
    del_ts timestamp with time zone DEFAULT now(),
    del_user character varying
);

DROP FUNCTION IF EXISTS archive_record();

CREATE OR REPLACE FUNCTION archive_record()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
DECLARE
    tbl_name text := TG_TABLE_NAME || '_delete' ;
    archive_row jsonb := row_to_json(OLD.*);
    usr_name text := session_user;
BEGIN
    EXECUTE 'INSERT INTO ' ||quote_ident(tbl_name)
    ||'(record_fld,del_ts,del_user)' ||
    ' VALUES('||quote_literal(archive_row)||',now(),'
    ||quote_literal(usr_name)||')';
RETURN OLD;
END;
$BODY$;
 
CREATE TRIGGER 
    p1_delete 
AFTER DELETE ON 
    archive_test 
FOR EACH ROW EXECUTE PROCEDURE archive_record();

The archive_record() trigger function uses the row_to_json() function from JSON Functions to convert the OLD record, which holds the values of the deleted row, into a JSON object. This is then combined with now() and session_user and inserted into the audit table. The audit table name to insert into is derived from the source table name with the addition of the suffix _delete.

Initial data values:

INSERT INTO 
    archive_test 
VALUES 
    (DEFAULT, 'test', 1, 't', '12/06/2021 10:12:30'),
    (DEFAULT, 'cat', 2, 'f', '12/06/2021 11:00:15'),
    (DEFAULT, 'dog', 3, 't', '12/06/2021 11:15:32');

SELECT * FROM archive_test;

-[ RECORD 1 ]-----------------------
line_id     | 1
varchar_fld | test
integer_fld | 1
boolean_fld | t
ts_fld      | 2021-12-06 10:12:30-08
-[ RECORD 2 ]-----------------------
line_id     | 2
varchar_fld | cat
integer_fld | 2
boolean_fld | f
ts_fld      | 2021-12-06 11:00:15-08
-[ RECORD 3 ]-----------------------
line_id     | 3
varchar_fld | dog
integer_fld | 3
boolean_fld | t
ts_fld      | 2021-12-06 11:15:32-08

What happens when you DELETE a record:

DELETE FROM archive_test WHERE line_id = 2;

SELECT * FROM archive_test_delete;
-[ RECORD 1 ]----------------------------------------------------
del_id     | 1
record_fld | {"ts_fld": "2021-12-06T11:00:15-08:00", "line_id": 2, "boolean_fld": false, "integer_fld": 2, "varchar_fld": "cat"}
del_ts     | 2021-12-06 13:52:29.266929-08
del_user   | postgres


Per the trigger function, the deleted row is added as a JSON object to the audit table along with the time of deletion and the user that deleted it.

Change table by adding a column:

ALTER TABLE 
    archive_test
ADD COLUMN
    json_fld json;
    
--INSERT values into new form of table:

INSERT INTO 
    archive_test 
VALUES 
    (DEFAULT, 'fish', 4, 't', '11/26/2021 10:12:30', '{"one": 1, "two": 2}'),
    (DEFAULT, 'rabbit', 5, 'f', '11/26/2021 11:00:15', '{"three": 3, "four":4}'),
    (DEFAULT, 'cow', 6, 't', '11/26/2021 11:15:32', '{"five": 5, "six": 6}');
    
--DELETE from changed table:
    
DELETE FROM archive_test WHERE line_id = 6;

SELECT * FROM archive_test_delete WHERE del_id = 2;

-[ RECORD 1 ]----------------------------------------------------
del_id     | 2
record_fld | {"ts_fld": "2021-11-26T11:15:32-08:00", "line_id": 6, "json_fld": {"six": 6, "five": 5}, "boolean_fld": true, "integer_fld": 6, "varchar_fld": "cow"}
del_ts     | 2021-12-06 13:56:22.47116-08
del_user   | postgres

The function picks up the new column and includes it in the delete audit table.

Retrieving data from the audit table:

SELECT 
    t.* 
FROM 
    archive_test_delete, 
LATERAL 
    jsonb_populate_record(null::archive_test, record_fld) t;
    
-[ RECORD 1 ]-----------------------
line_id     | 2
varchar_fld | cat
integer_fld | 2
boolean_fld | f
ts_fld      | 2021-12-06 11:00:15-08
json_fld    | 
-[ RECORD 2 ]-----------------------
line_id     | 6
varchar_fld | cow
integer_fld | 6
boolean_fld | t
ts_fld      | 2021-11-26 11:15:32-08
json_fld    | {"six": 6, "five": 5}

Again using a function from JSON Functions, pull the data out of the audit table and expand the JSON object into a record. jsonb_populate_record takes as its first argument a composite type, in this case archive_test. In Postgres every table has a composite type defined for its structure, so using archive_test means that the second argument, record_fld (the JSON record), will have its field names matched to the column names and types of the archive_test table. The null:: just says to use NULL for any columns in the archive_test type/table that do not have corresponding fields in the JSON object. The LATERAL comes from SELECT and in this particular case is not really needed; it is there to emphasize that jsonb_populate_record is working against each row in the archive_test_delete table.
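
A minimal standalone sketch, using made-up values, of how the composite type drives the expansion; keys missing from the JSON object simply come back as NULL:

SELECT
    *
FROM
    jsonb_populate_record(
        null::archive_test,
        '{"line_id": 99, "varchar_fld": "demo", "boolean_fld": true}'
    );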

What happens if you drop a column:

BEGIN;

ALTER TABLE archive_test DROP COLUMN json_fld ;

SELECT 
    t.* 
FROM 
    archive_test_delete, LATERAL jsonb_populate_record(null::archive_test, record_fld) t;


-[ RECORD 1 ]-----------------------
line_id     | 2
varchar_fld | cat
integer_fld | 2
boolean_fld | f
ts_fld      | 2021-12-06 11:00:15-08
-[ RECORD 2 ]-----------------------
line_id     | 6
varchar_fld | cow
integer_fld | 6
boolean_fld | t
ts_fld      | 2021-11-26 11:15:32-08

ROLLBACK;

Because jsonb_populate_record is using the archive_test composite type and json_fld no longer exists, it also ‘disappears’ from the query. This also applies if the column is renamed. The upshot is that if you only ever add columns to a table and do not rename them, this system works without too much fuss on the retrieval end. Otherwise you will just have to pull record_fld out as a JSON object and view the data from there.
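
For that last case, one option (a sketch) is to expand each audit row key by key with jsonb_each_text(), which does not depend on the table's current shape:

SELECT
    del_id, (jsonb_each_text(record_fld)).*
FROM
    archive_test_delete;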

Building dynamic SQL using psycopg2

It has been a while since I posted anything. I thought I would start back with a software
package I use in many different situations, psycopg2. Psycopg2 is a Python database adapter for Postgres that follows the Python DB API. It is feature rich, and today I will introduce a feature new to the latest major release (2.7), namely the sql module for building complete SQL statements dynamically. I will give a quick rundown of the basics below. For the official docs on the module see:
http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql.
For example data I will be using the Pagila database (derived from the MySQL
Sakila sample db) obtained from here.

Setting up imports:

from psycopg2 import connect, sql
from psycopg2.extras import RealDictCursor

Create connection and cursor:

con = connect("dbname=pagila host=localhost user=postgres", 
              cursor_factory=RealDictCursor)
cur = con.cursor()

Build simple query string:

qry_str = sql.SQL("SELECT {}, {} FROM {}").format(
sql.Identifier('customer_id'),
sql.Identifier("first_name"),
sql.Identifier("customer")
)
print(qry_str.as_string(con))
select "customer_id", "first_name" from "customer"
cur.execute(qry_str)
rs = cur.fetchone()
rs
{'customer_id': 1, 'first_name': 'MARY'}

Breaking the above down: SQL is a class used to build the string and as such has
a format() method that follows the Python str.format() method (see the psycopg2
docs above for exceptions). Identifier is a class that handles strings to be used
as query identifiers, e.g. column/field names, as opposed to data values. Therefore,
in the above, the ‘{}’ placeholders are replaced in order by the Identifier values in format().
To verify the string you can use its as_string() method, provided you supply a connection
or cursor object to it. The query string is used in cur.execute() to fetch records
from the database. You can also build the query string directly in the execute()
to save a step, as shown below.
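
For example, the same query built directly in the execute() call:

cur.execute(sql.SQL("SELECT {}, {} FROM {}").format(
    sql.Identifier("customer_id"),
    sql.Identifier("first_name"),
    sql.Identifier("customer")
))
rs = cur.fetchone()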

More compact/flexible string construction:

qry_str = sql.SQL("SELECT {} FROM {}").format(
sql.SQL(",").join([sql.Identifier('customer_id'), 
                   sql.Identifier("first_name")]), 
sql.Identifier("customer")
)

The above eliminates the need to provide a ‘{}’ for each field in the SELECT
field list. Instead the join() method of SQL is used to build a comma-separated
list of field names. This is helpful if you may be receiving a variable number of
field names for different iterations of the query. As an example, where the
contents of fld_list may change:

fld_list = ["customer_id", "first_name", "last_name", "email"]
qry_str = sql.SQL("SELECT {} FROM {}").format(
sql.SQL(",").join(map(sql.Identifier, fld_list)),
sql.Identifier("customer")
)
print(qry_str.as_string(con))
select 
"customer_id","first_name","last_name","email" 
from "customer"

Generating placeholders in the query. The sql module supports two types of placeholders,
%s and %(name)s. For %s placeholders a sequence of values needs to be provided,
e.g. a list or tuple. For named placeholders a dictionary is used to supply the values.

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder()
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %s
cur.execute(placeholder_str, ["MARY"])
rs = cur.fetchone()
rs
{'customer_id': 1,
 'email': 'MARY.SMITH@sakilacustomer.org',
 'first_name': 'MARY',
 'last_name': 'SMITH'}

In the above %s is created as the placeholder for the first_name field value in the
WHERE clause using the Placeholder class. To create a named placeholder the procedure is:

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder(name='first_name')
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %(first_name)s
cur.execute(placeholder_str, {"first_name": "MARY"})

Where this leads is building a query string from data sourced from a Python dictionary:

new_customer_list = [
    {"store_id": 1, "first_name": "ADRIAN", "last_name": "KLAVER",
     "email": "ADRIAN.KLAVER@sakilacustomer.org", "address_id": 67},
    {"store_id": 2, "first_name": "JANE", "last_name": "DOE",
     "email": "JANE.DOE@sakilacustomer.org", "address_id": 7},
    {"store_id": 1, "first_name": "WHO", "last_name": "KNOWS",
     "email": "WHO.KNOWS@sakilacustomer.org", "address_id": 189}
]

insert_flds = [fld_name for fld_name in new_customer_list[0].keys()]

insert_str = sql.SQL("INSERT INTO customer ({}) VALUES ({})").format(
    sql.SQL(",").join(map(sql.Identifier, insert_flds)),
    sql.SQL(",").join(map(sql.Placeholder, insert_flds))
)

from psycopg2.extras import execute_batch

execute_batch(cur, insert_str, new_customer_list)

I snuck in another new feature from psycopg2 2.7, execute_batch. This is a way of batching statements for fewer round trips to the server, by default 100 at a time. For this example it is not really a win, but it is nice to know it is out there. For cases where I have a lot of data to transfer to a table I use psycopg2’s COPY features whenever possible. That is a subject for another day, though a minimal sketch follows.
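
A hedged sketch of that COPY approach using copy_expert(); the CSV file name and its layout here are hypothetical:

# Hypothetical CSV whose header row matches the listed columns.
with open("new_customers.csv") as f:
    cur.copy_expert(
        "COPY customer (store_id, first_name, last_name, email, address_id) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)",
        f
    )
con.commit()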

Postgres and hstore

I recently started working with the hstore contrib module in PostgreSQL. My impetus was the desire to create a generic way of redirecting deletes on key tables to archive tables. I do that now using a rule-based mechanism. The problem is it is fairly fragile and prone to breakage when the parent table changes. I had seen references to hstore and had a vague idea of what it could do. This was enough to make me dig deeper and see if I could use it for the delete process. This is still a work in progress, though I am pleased with what I found so far, which is sketched out below.
1) The parent table:

CREATE TABLE plant1 (
    p_item_no smallint DEFAULT nextval(('plant1_p_item_no_seq'::text)::regclass) NOT NULL,
    common character varying(40) NOT NULL,
    genus character varying(20),
    species character varying(30),
    variety character varying(40),
    plant_type character varying(6) NOT NULL,
    series character varying(20),
    season character varying(9),
    ts_insert timestamp(0) without time zone DEFAULT now(),
    ts_update timestamp(0) without time zone,
    user_update character varying(20),
    user_insert character varying(20) DEFAULT "session_user"(),
    color character varying(30),
    category_type character varying(5) DEFAULT 'plant'::character varying NOT NULL,
    category_sub_type character varying(15) DEFAULT 'normal'::character varying NOT NULL,
    plant_note text
);

2) The delete table:

CREATE TABLE plant1_delete (
    del_id integer NOT NULL,
    record_fld public.hstore,
    del_ts timestamp with time zone DEFAULT now(),
    del_user character varying
);

3) The delete function and trigger:

CREATE OR REPLACE FUNCTION utility.archive_record()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
DECLARE
    tbl_name text := TG_TABLE_NAME || '_delete' ;
    archive_row hstore := hstore(OLD.*);
    usr_name text := session_user;
BEGIN
    EXECUTE 'INSERT INTO ' ||quote_ident(tbl_name)
    ||'(record_fld,del_ts,del_user)' ||
    ' VALUES('||quote_literal(archive_row)||',now(),'
    ||quote_literal(usr_name)||')';
RETURN OLD;
END;
$BODY$;

CREATE TRIGGER p1_delete AFTER DELETE ON plant1 FOR EACH ROW EXECUTE PROCEDURE utility.archive_record();

4) The end result:

DELETE from plant1 where p_item_no=578;
DELETE 1
The raw record in the delete table:
SELECT * from plant1_delete;
-[ RECORD 1 ]------------------------------------------------------------------------------------------
del_id     | 6
record_fld | "color"=>NULL, "genus"=>"selaginella", "common"=>"moss little club", "season"=>NULL, "series"=>"", "species"=>"brownii", "variety"=>"", "p_item_no"=>"578", "ts_insert"=>"2004-01-01 00:00:00", "ts_update"=>"2005-11-21 10:45:30", "plant_note"=>NULL, "plant_type"=>"herb", "user_insert"=>"aklaver", "user_update"=>"aklaver", "category_type"=>"plant", "category_sub_type"=>"normal"
del_ts     | 2012-06-23 16:19:00.193426-07
del_user   | aklaver

The hstore field parsed out, record selected using value in hstore:
SELECT (each(record_fld)).* from plant1_delete where record_fld ->'p_item_no'='578';

        key        |        value
-------------------+----------------------
 color             |
 genus             | selaginella
 common            | moss little club
 season            |
 series            |
 species           | brownii
 variety           |
 p_item_no         | 578
 ts_insert         | 2004-01-01 00:00:00
 ts_update         | 2005-11-21 10:45:30
 plant_note        |
 plant_type        | herb
 user_insert       | aklaver
 user_update       | aklaver
 category_type     | plant
 category_sub_type | normal

I can see this making life a lot easier. An added bonus is that psycopg2 (a Python database adapter) supports hstore and will adapt it to a Python dictionary and back again. That is a subject for another blog, but a minimal sketch is below.
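
The adaptation amounts to registering hstore support on the connection; a minimal sketch (the connection parameters below are placeholders):

from psycopg2 import connect
from psycopg2.extras import register_hstore

# Placeholder connection string; point it at the database holding plant1_delete.
con = connect("dbname=plants host=localhost user=aklaver")
register_hstore(con)

cur = con.cursor()
cur.execute("SELECT record_fld FROM plant1_delete WHERE del_id = 6")
record = cur.fetchone()[0]
# record is now a plain Python dict, e.g. record['p_item_no'] == '578'.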

Thinking

When presented with a problem I tend to see the solution as a snapshot encompassing the depth and breadth of whatever is presented. What is usually wanted is an answer that is some subset of that snapshot. Naked bear just walked by. Unfortunately, I tend to try spewing out the entire thing and make the answer worse than the original problem. The solution would be to filter my responses down to the immediate subject at hand. This entails taking what is a full picture, editing it down to the relevant bits and then pushing those out in a coherent linear sequence. When writing, this is more easily done than when talking off the cuff. There I have varying degrees of success. Often the result is my stuttering out the answer as too much information heads for the door at the same time. The other issue is that even if I do limit the information stream and it comes out in a somewhat smooth sequence, the sequence is wrong. The result is confusion, not enlightenment. There are times though when I am ‘on’, often when I am tired, and everything just flows. The key would seem to be that when I am tired my mind is less active and more pliable to linear thinking. In other words I tend to be more focused, because I have less energy to cast a wide net. How to capture that focus? Bear being chased by rabbit now. I like letting my mind loose; it is my natural instinct. The act of writing tends to force focus, this piece notwithstanding, so one way would be to only talk off a written script. Churchill was quoted as saying many of his ‘extemporaneous’ speeches came from written preparation; if I remember correctly, something on the order of three hours of prep for each hour of speech. While that works for quite a few situations, it is not a complete solution. A more concerted effort of pre-editing the information stream would seem to be the general solution. Leave my kitchen sink thoughts to myself. Bear up a tree, rabbit barking. I can see a lot of biting my tongue in the future.

Social Networking

A very brief look at social networking now and then and what it means.

Recently I gave a short presentation on the Python package Brewery. At the conclusion of said talk I was so bold as to mention I was thinking of subclassing some of the code to make it work with DBF files. To that end I went to the GitHub repository to fork it and start my work. At the conclusion of the process I went into contemplative mode. The realization struck that the code started with a guy in Bratislava, Slovakia, was uploaded to a site hosted somewhere, and was then downloaded to my computer here in Bellingham, WA, with GitHub being the social hub that facilitated the exchange. Now, since I have a restless mind and need to feed it a varied diet, I also read histories/biographies on a regular basis, Theodore Roosevelt being the current topic. Over the past year the subjects covered have included the elucidation of DNA, Einstein, Darwin and the Desert War in WWII. Much is made of social networking and its impact on the current world (though it would not seem the stock market :)). While I would admit the ease with which people can interact has been significantly improved, I am not sure there is anything truly new or useful going on. In all the books mentioned there were active social networks present. Needless to say, the means of interacting were tailored to the technological level of the day. Though the pace would seem glacial by today’s standards, information managed to traverse great distances and human thought progressed. In fact my feeling is it progressed at a higher plane than currently. The speed and relentless persistence of present information flow seems to preclude the deep and contemplative thought found in the not too distant past. So the question is: is the social networking of today really an improvement or just another problem to overcome?