Compiling plpython(3)u

Time marches on and Python 2 is now past EOL; the last release, 2.7, has not been supported since 2020-01-01. In the Postgres world Python 2 lives on, though, as the default version referred to by the procedural language plpythonu (see the documentation on PL/Python and Python 2 vs. Python 3). To use a Python 3 version of the procedural language, the plpython3u variant needs to be present. For users getting their Postgres through a packaging system (DEB, RPM, etc.) this is taken care of, with packages available for both versions of Python. If you are compiling Postgres from source, either from necessity or habit, the process is a bit more involved.

At this point plpythonu will only be built for Python 2 (>= 2.6). When you run configure it will probe for Python versions in the order python, python3, python2. If python -V resolves to a Python 2 version you will get plpythonu; if it resolves to Python 3 you will get plpython3u. The question is how to build for the other version of Python. The answer is that you can put your thumb on the scale by setting the environment variable PYTHON to the Python version you want built against:

export PYTHON=python3

What follows is the procedure I use where the system python is Python 2.

Initial configure. For this example I am using a minimal setup to show just the Python build. Normally there would be additional arguments, say --with-openssl, --with-libxml, etc.

Verify what python is pointing to.

python -V
2.7.x

The configure finds the Python 2 libraries.

./configure --with-python --prefix=/usr/local/pgsql14
checking for python... /usr/bin/python
configure: using python 2.7.18 (default, Apr 23 2020, 09:27:04) [GCC]
checking for Python distutils module... yes
checking Python configuration directory... /usr/lib64/python2.7/config
checking Python include directories... -I/usr/include/python2.7
checking how to link an embedded Python application... -L/usr/lib64 -lpython2.7 -lpthread -ldl -lutil -lm

Build the Python 2 version of the procedural language and install it. At this point make is done at the top level of the build so that the entire source tree is built.

make
sudo make install
/usr/bin/mkdir -p '/usr/local/pgsql14/lib64'
/usr/bin/install -c -m 755 plpython2.so '/usr/local/pgsql14/lib64/plpython2.so'
/usr/bin/mkdir -p '/usr/local/pgsql14/share/extension' '/usr/local/pgsql14/include/server' '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'
/usr/bin/install -c -m 644 ./plpython2u.control ./plpython2u--1.0.sql ./plpythonu.control ./plpythonu--1.0.sql '/usr/local/pgsql14/share/extension/'
/usr/bin/install -c -m 644 ./plpython.h ./plpy_cursorobject.h ./plpy_elog.h ./plpy_exec.h ./plpy_main.h ./plpy_planobject.h ./plpy_plpymodule.h ./plpy_procedure.h ./plpy_resultobject.h ./plpy_spi.h ./plpy_subxactobject.h ./plpy_typeio.h ./plpy_util.h '/usr/local/pgsql14/include/server'
/usr/bin/install -c -m 644 ./regress-python3-mangle.mk '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'

In the above, the procedural language plpython2u is essentially an alias for plpythonu; both extensions point at the same plpython2 library.

Use the PYTHON environment variable to point at python3. Verify that variable was set.

export PYTHON=python3
env | grep -i PYTHON
PYTHON=python3

Run configure again to pick up the new Python version (3), which it does.

./configure --with-python --prefix=/usr/local/pgsql14
checking for PYTHON... python3
configure: using python 3.6.12 (default, Dec 02 2020, 09:44:23) [GCC]
checking for Python distutils module... yes
checking Python configuration directory... /usr/lib64/python3.6/config-3.6m-x86_64-linux-gnu
checking Python include directories... -I/usr/include/python3.6m
checking how to link an embedded Python application... -L/usr/lib64 -lpython3.6m -lpthread -ldl -lutil -lm

Change directories to the plpython subdirectory. Run make clean to get rid of the previous Python 2 build. Then do the make/make install to build and install plpython3u.

cd src/pl/plpython/
make clean
make
sudo make install

/usr/bin/mkdir -p '/usr/local/pgsql14/lib64'
/usr/bin/install -c -m 755 plpython3.so '/usr/local/pgsql14/lib64/plpython3.so'
/usr/bin/mkdir -p '/usr/local/pgsql14/share/extension' '/usr/local/pgsql14/include/server' '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'
/usr/bin/install -c -m 644 ./plpython3u.control ./plpython3u--1.0.sql '/usr/local/pgsql14/share/extension/'
/usr/bin/install -c -m 644 ./plpython.h ./plpy_cursorobject.h ./plpy_elog.h ./plpy_exec.h ./plpy_main.h ./plpy_planobject.h ./plpy_plpymodule.h ./plpy_procedure.h ./plpy_resultobject.h ./plpy_spi.h ./plpy_subxactobject.h ./plpy_typeio.h ./plpy_util.h '/usr/local/pgsql14/include/server'
/usr/bin/install -c -m 644 ./regress-python3-mangle.mk '/usr/local/pgsql14/lib64/pgxs/src/pl/plpython'

Now that they have been built it is just a matter of installing them, as extensions, into the database or databases of your choice.

create extension plpythonu;
create extension plpython3u;
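To confirm what ended up installed you can use psql's \dx or query the system catalog; a quick check (the exact list will depend on what else you have installed):

SELECT extname, extversion FROM pg_extension WHERE extname LIKE 'plpython%';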

One heads-up about having both versions installed in the same database:

DO $$
    plpy.notice('Python 2')
$$ LANGUAGE plpythonu;
NOTICE:  Python 2
DO
DO $$
    plpy.notice('Python 3')
$$ LANGUAGE plpython3u;
FATAL:  multiple Python libraries are present in session
DETAIL:  Only one Python major version can be used in one session.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

As the message says, you can only use one Python major version, and therefore only one of plpythonu/plpython3u, in a given session.

Postgres and JSON

Time for a renovation of an old post that dealt with Postgres and hstore. This post will move from the hstore data type to json(b). At the time the original post was written, the introduction of the json type was about three months away and jsonb two and a half years out. It also took several years for the operator and function choices to be fleshed out. Now that the support has been mature for some time it seems a good time to make the change. To recap, the original post dealt with using hstore to capture deleted records into an audit table without having to modify the audit table when the source table structure changed. On to doing this with JSON instead.

Setup section.

This is being done using Postgres version 14.1. In the previous post a table from an actual application was used. Unfortunately, that relied on information not available to the reader. This post uses a test setup that allows you to run the code locally should you desire to.

The setup DDL:

DROP TABLE IF EXISTS archive_test;

CREATE TABLE archive_test (
    line_id smallint PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    varchar_fld varchar(10) NOT NULL,
    integer_fld integer,
    boolean_fld boolean NOT NULL DEFAULT 'f',
    ts_fld  timestamptz NOT NULL DEFAULT now()
);

DROP TABLE IF EXISTS archive_test_delete;

CREATE TABLE archive_test_delete (
    del_id integer PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    record_fld jsonb,
    del_ts timestamp with time zone DEFAULT now(),
    del_user character varying
);

DROP FUNCTION IF EXISTS archive_record();

CREATE OR REPLACE FUNCTION archive_record()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
DECLARE
    tbl_name text := TG_TABLE_NAME || '_delete' ;
    archive_row jsonb := row_to_json(OLD.*);
    usr_name text := session_user;
BEGIN
    EXECUTE 'INSERT INTO ' ||quote_ident(tbl_name)
    ||'(record_fld,del_ts,del_user)' ||
    ' VALUES('||quote_literal(archive_row)||',now(),'
    ||quote_literal(usr_name)||')';
RETURN OLD;
END;
$BODY$;
 
CREATE TRIGGER 
    p1_delete 
AFTER DELETE ON 
    archive_test 
FOR EACH ROW EXECUTE PROCEDURE archive_record();

The archive_record() trigger function uses the row_to_json() function (see JSON Functions in the documentation) to convert the OLD record, which holds the deleted row's values, to a JSON object. This is then combined with now() and session_user and inserted into the audit table. The name of the audit table to insert into is derived from the source table name with the suffix _delete added.
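As a quick, standalone illustration of what row_to_json() produces (the one-row subselect here is just for demonstration):

SELECT row_to_json(t.*) FROM (SELECT 1 AS line_id, 'test'::varchar AS varchar_fld) t;
-- {"line_id":1,"varchar_fld":"test"}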

Initial data values:

INSERT INTO 
    archive_test 
VALUES 
    (DEFAULT, 'test', 1, 't', '12/06/2021 10:12:30'),
    (DEFAULT, 'cat', 2, 'f', '12/06/2021 11:00:15'),
    (DEFAULT, 'dog', 3, 't', '12/06/2021 11:15:32');

SELECT * FROM archive_test;

-[ RECORD 1 ]-----------------------
line_id     | 1
varchar_fld | test
integer_fld | 1
boolean_fld | t
ts_fld      | 2021-12-06 10:12:30-08
-[ RECORD 2 ]-----------------------
line_id     | 2
varchar_fld | cat
integer_fld | 2
boolean_fld | f
ts_fld      | 2021-12-06 11:00:15-08
-[ RECORD 3 ]-----------------------
line_id     | 3
varchar_fld | dog
integer_fld | 3
boolean_fld | t
ts_fld      | 2021-12-06 11:15:32-08

What happens when you DELETE a record:

DELETE FROM archive_test WHERE line_id = 2;

SELECT * FROM archive_test_delete;
-[ RECORD 1 ]----------------------------------------------------
del_id     | 1
record_fld | {"ts_fld": "2021-12-06T11:00:15-08:00", "line_id": 2, "boolean_fld": false, "integer_fld": 2, "varchar_fld": "cat"}
del_ts     | 2021-12-06 13:52:29.266929-08
del_user   | postgres


Per the trigger function, the deleted row is added as a JSON object to the audit table along with the time of deletion and the user that deleted it.

Change table by adding a column:

ALTER TABLE 
    archive_test
ADD COLUMN
    json_fld json;
    
--INSERT values into new form of table:

INSERT INTO 
    archive_test 
VALUES 
    (DEFAULT, 'fish', 4, 't', '11/26/2021 10:12:30', '{"one": 1, "two": 2}'),
    (DEFAULT, 'rabbit', 5, 'f', '11/26/2021 11:00:15', '{"three": 3, "four":4}'),
    (DEFAULT, 'cow', 6, 't', '11/26/2021 11:15:32', '{"five": 5, "six": 6}');
    
--DELETE from changed table:
    
DELETE FROM archive_test WHERE line_id = 6;

SELECT * FROM archive_test_delete WHERE del_id = 2;

-[ RECORD 1 ]----------------------------------------------------
del_id     | 2
record_fld | {"ts_fld": "2021-11-26T11:15:32-08:00", "line_id": 6, "json_fld": {"six": 6, "five": 5}, "boolean_fld": true, "integer_fld": 6, "varchar_fld": "cow"}
del_ts     | 2021-12-06 13:56:22.47116-08
del_user   | postgres

The function picks up the new column and includes it in the delete audit table.

Retrieving data from the audit table:

SELECT 
    t.* 
FROM 
    archive_test_delete, 
LATERAL 
    jsonb_populate_record(null::archive_test, record_fld) t;
    
-[ RECORD 1 ]-----------------------
line_id     | 2
varchar_fld | cat
integer_fld | 2
boolean_fld | f
ts_fld      | 2021-12-06 11:00:15-08
json_fld    | 
-[ RECORD 2 ]-----------------------
line_id     | 6
varchar_fld | cow
integer_fld | 6
boolean_fld | t
ts_fld      | 2021-11-26 11:15:32-08
json_fld    | {"six": 6, "five": 5}

Again a function from JSON Functions is used to pull the data out of the audit table and expand the JSON object into a record. jsonb_populate_record takes as its first argument a composite type, in this case archive_test. In Postgres every table has a composite type defined for its structure, so using archive_test means that the second argument, record_fld (the JSON record), will have its field names matched against the column names and types of that table. The null:: just says to use NULL for any columns in the archive_test type/table that do not have corresponding fields in the JSON object. The LATERAL comes from SELECT and in this particular case is not strictly needed; it is there to emphasize that jsonb_populate_record is working against each row in the archive_test_delete table, as the equivalent query below shows.
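The same query without the keyword, relying on the fact that a function call in the FROM list can reference columns of earlier FROM items anyway:

SELECT 
    t.* 
FROM 
    archive_test_delete, 
    jsonb_populate_record(null::archive_test, record_fld) t;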

What happens if you drop a column:

BEGIN;

ALTER TABLE archive_test DROP COLUMN json_fld ;

SELECT 
    t.* 
FROM 
    archive_test_delete, LATERAL jsonb_populate_record(null::archive_test, record_fld) t;


-[ RECORD 1 ]-----------------------
line_id     | 2
varchar_fld | cat
integer_fld | 2
boolean_fld | f
ts_fld      | 2021-12-06 11:00:15-08
-[ RECORD 2 ]-----------------------
line_id     | 6
varchar_fld | cow
integer_fld | 6
boolean_fld | t
ts_fld      | 2021-11-26 11:15:32-08

ROLLBACK;

Because jsonb_populate_record is using the archive_test composite type and json_fld no longer exists, it also ‘disappears’ from the query. This also applies if the column is renamed. The upshot is that if you only ever add columns to a table and do not rename them, then this system works without too much fuss on the retrieval end. Otherwise you will just have to pull out record_fld as a JSON object and view the data from there.
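If you do need to see everything that was captured, including fields for columns that have since been dropped or renamed, you can always display the JSON object itself, for example:

SELECT del_id, jsonb_pretty(record_fld) FROM archive_test_delete;

-- or expand it into key/value pairs
SELECT del_id, (jsonb_each_text(record_fld)).* FROM archive_test_delete WHERE del_id = 2;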

Building dynamic SQL using psycopg2

It has been a while since I posted anything. I thought I would start back with a software
package I use in many different situations, psycopg2. Psycopg2 is a Python database adapter for Postgres that follows the Python DB API. It is feature rich, and today I will introduce a feature new to the latest major release (2.7), namely the sql module for building complete SQL statements dynamically. I will give a quick rundown of the basics below. For the official docs on the module see:
http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql.
For example data I will be using the Pagila database (derived from the MySQL
Sakila sample db) obtained from here.

Setting up imports:

from psycopg2 import connect, sql
from psycopg2.extras import RealDictCursor

Create connection and cursor:

con = connect("dbname=pagila host=localhost user=postgres", 
              cursor_factory=RealDictCursor)
cur = con.cursor()

Build simple query string:

qry_str = sql.SQL("SELECT {}, {} FROM {}").format(
sql.Identifier('customer_id'),
sql.Identifier("first_name"),
sql.Identifier("customer")
)
print(qry_str.as_string(con))
select "customer_id", "first_name" from "customer"
cur.execute(qry_str)
rs = cur.fetchone()
rs
{'customer_id': 1, 'first_name': 'MARY'}

Breaking the above down: SQL is a class used to build the string and as such has
a format method that follows the Python str.format() method (see the psycopg2
docs above for the exceptions). Identifier is a class that handles strings to be used
as query identifiers, e.g. column/field names, as opposed to data values. Therefore in the
above the ‘{}’ are replaced in order by the Identifier values in format().
To verify the string you can use its as_string method, provided you supply a connection
or cursor object to it. The query string is then used in cur.execute() to fetch records
from the database. You can also build the query string directly in execute()
to save a step.

More compact/flexible string construction:

qry_str = sql.SQL("SELECT {} FROM {}").format(
sql.SQL(",").join([sql.Identifier('customer_id'), 
                   sql.Identifier("first_name")]), 
sql.Identifier("customer")
)

The above eliminates the need to provide a ‘{}’ for each field in the SELECT
field list. Instead the join() method of SQL is used to build a comma-separated
list of field names. This is helpful if you may be receiving a variable number of field names
for different iterations of the query. As an example, where the contents of fld_list
may change:

fld_list = ["customer_id", "first_name", "last_name", "email"]
qry_str = sql.SQL("SELECT {} FROM {}").format(
sql.SQL(",").join(map(sql.Identifier, fld_list)),
sql.Identifier("customer")
)
print(qry_str.as_string(con))
select 
"customer_id","first_name","last_name","email" 
from "customer"

Generating placeholders in the query: the sql module supports two types of placeholders,
%s and %(name)s. For %s placeholders a sequence of values needs to be provided,
e.g. a list or tuple. For named placeholders a dictionary is used to supply the values.

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
sql.SQL(",").join(map(sql.Identifier, fld_list)),
sql.Identifier("customer"),
sql.Placeholder()
)
print(placeholder_str.as_string(con))
select 
"customer_id","first_name","last_name","email" 
from "customer" where first_name = %s
cur.execute(placeholder_str, ["MARY"])
rs = cur.fetchone()
rs
{'customer_id': 1,
'email': 'MARY.SMITH@sakilacustomer.org',
'first_name': 'MARY',
'last_name': 'SMITH'}

In the above %s is created as the placeholder for the first_name field value in the
WHERE clause using the Placeholder class. To create a named placeholder the procedure is:

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
sql.SQL(",").join(map(sql.Identifier, fld_list)),
sql.Identifier("customer"),
sql.Placeholder(name='first_name')
)
print(placeholder_str.as_string(con))
select 
"customer_id","first_name","last_name","email" 
from "customer" where first_name = %(first_name)s
cur.execute(placeholder_str, {"first_name": "MARY"})

Where this leads to is building a query string from data sourced from a Python dictionary:

new_customer_list = [
{"store_id": 1, "first_name": "ADRIAN" , "last_name": "KLAVER", 
"email": "ADRIAN.KLAVER@sakilacustomer.org", "address_id": 67},
{"store_id": 2, "first_name": "JANE" , "last_name": "DOE", 
"email": "JANE.DOE@sakilacustomer.org", "address_id": 7},
{"store_id": 1, "first_name": "WHO" , "last_name": "KNOWS", 
"email": "WHO.KNOWS@sakilacustomer.org", "address_id": 189 }
]

insert_flds = [fld_name for fld_name in new_customer_list[0].keys()]

insert_str = sql.SQL("INSERT INTO customer ({}) VALUES ({})").format(
sql.SQL(",").join(map(sql.Identifier, insert_flds)),
sql.SQL(",").join(map(sql.Placeholder, insert_flds))
)

from psycopg2.extras import execute_batch

execute_batch(cur, insert_str, new_customer_list)

I snuck in another new feature from psycopg2 2.7, execute_batch. This is a way of batching statements for fewer round trips to the server, by default 100 statements per trip. For this example it is not really a win, but it is nice to know it is out there. For cases where I have a lot of data to transfer to a table I use psycopg2’s COPY features whenever possible. That is a subject for another day though.
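As a rough sketch of the kind of call I mean, copy_expert() streams a file straight into a table. The file name below is made up for illustration and the column list simply matches insert_flds above:

with open("new_customers.csv") as f:  # hypothetical CSV file with the columns listed below
    cur.copy_expert(
        "COPY customer (store_id, first_name, last_name, email, address_id) FROM STDIN WITH CSV",
        f
    )
con.commit()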

Postgres and hstore

I recently started working with the hstore contrib module in PostgreSQL. My impetus was the desire to create a generic way of redirecting deletes on key tables to archive tables. I do that now using a rule-based mechanism. The problem is that it is fairly fragile and prone to breakage when the parent table changes. I had seen references to hstore and had a vague idea of what it could do. This was enough to make me dig deeper and see if I could use it for the delete process. This is still a work in progress, though I am pleased with what I have found so far, which is sketched out below.
1) The parent table:

CREATE TABLE plant1 (
p_item_no smallint DEFAULT nextval(('plant1_p_item_no_seq'::text)::regclass) NOT NULL,
common character varying(40) NOT NULL,
genus character varying(20),
species character varying(30),
variety character varying(40),
plant_type character varying(6) NOT NULL,
series character varying(20),
season character varying(9),
ts_insert timestamp(0) without time zone DEFAULT now(),
ts_update timestamp(0) without time zone,
user_update character varying(20),
user_insert character varying(20) DEFAULT "session_user"(),
color character varying(30),
category_type character varying(5) DEFAULT 'plant'::character varying NOT NULL,
category_sub_type character varying(15) DEFAULT 'normal'::character varying NOT NULL,
plant_note text
);

2) The delete table:

CREATE TABLE plant1_delete (
del_id integer NOT NULL,
record_fld public.hstore,
del_ts timestamp with time zone DEFAULT now(),
del_user character varying
);

3) The delete function and trigger:

CREATE OR REPLACE FUNCTION utility.archive_record()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
DECLARE
tbl_name text := TG_TABLE_NAME || '_delete' ;
archive_row hstore := hstore(OLD.*);
usr_name text := session_user;
BEGIN
EXECUTE 'INSERT INTO ' ||quote_ident(tbl_name)
||'(record_fld,del_ts,del_user)' ||
' VALUES('||quote_literal(archive_row)||',now(),'
||quote_literal(usr_name)||')';
RETURN OLD;
END;
$BODY$;

CREATE TRIGGER p1_delete AFTER DELETE ON plant1 FOR EACH ROW EXECUTE PROCEDURE utility.archive_record();

4) The end result:

DELETE from plant1 where p_item_no=578;
DELETE 1
The raw record in the delete table:
SELECT * from plant1_delete;
-[ RECORD 1 ]------------------------------------------------------------------------------------------

del_id | 6
record_fld | "color"=>NULL, "genus"=>"selaginella", "common"=>"moss little club", "season"=>NULL, "series"=>"", "species"=>"brownii", "variety"=>"", "p_item_no"=>"578", "ts_insert"=>"2004-01-01 00:00:00", "ts_update"=>"2005-11-21 10:45:30", "plant_note"=>NULL, "plant_type"=>"herb", "user_insert"=>"aklaver", "user_update"=>"aklaver", "category_type"=>"plant", "category_sub_type"=>"normal"
del_ts | 2012-06-23 16:19:00.193426-07
del_user | aklaver

The hstore field parsed out, record selected using value in hstore:
SELECT (each(record_fld)).* from plant1_delete where record_fld ->'p_item_no'='578';

key | value
-------------------+---------------------
color |
genus | selaginella
common | moss little club
season |
series |
species | brownii
variety |
p_item_no | 578
ts_insert | 2004-01-01 00:00:00
ts_update | 2005-11-21 10:45:30
plant_note |
plant_type | herb
user_insert | aklaver
user_update | aklaver
category_type | plant
category_sub_type | normal

I can see this making life a lot easier. An added bonus is that psycopg2 (a Python database adapter) supports hstore and will adapt it to a Python dictionary and back again. That is a subject for another blog.
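The heart of that is register_hstore() from psycopg2.extras. A minimal sketch, with a placeholder connection string and reusing the delete table above:

from psycopg2 import connect
from psycopg2.extras import register_hstore

con = connect("dbname=mydb host=localhost user=postgres")  # placeholder connection string
register_hstore(con)  # hstore values now come back as Python dicts

cur = con.cursor()
cur.execute("SELECT record_fld FROM plant1_delete WHERE del_id = %s", [6])
record = cur.fetchone()[0]
# record is a plain Python dict, e.g. record['genus'] == 'selaginella'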

Thinking

When presented with a problem I tend to see the solution as a snapshot encompassing the depth and breadth of whatever is presented. What is usually wanted is an answer that is some subset of that snapshot. Naked bear just walked by. Unfortunately, I tend to try spewing out the entire thing and make the answer worse than the original problem. The solution would be to filter my responses down to the immediate subject at hand. This entails taking what is a full picture, editing out the relevant bits and then pushing those out in a coherent linear sequence. When writing, this is more easily done than when talking off the cuff. There I have varying degrees of success. Often the result is my stuttering out the answer as too much information heads for the door at the same time. The other issue is that even if I do limit the information stream and it comes out in a somewhat smooth sequence, the sequence is wrong. The result is confusion, not enlightenment. There are times though when I am ‘on’, often when I am tired, and everything just flows. The key would seem to be that when I am tired my mind is less active and more pliable to linear thinking. In other words I tend to be more focused, because I have less energy to cast a wide net. How to capture that focus? Bear being chased by rabbit now. I like letting my mind loose, it is my natural instinct. The act of writing tends to force focus, this piece notwithstanding, so one way would be to talk only off a written script. Churchill was quoted as saying many of his ‘extemporaneous’ speeches came from written preparation; if I remember correctly, something on the order of three hours of prep for each hour of speech. While that works for quite a few situations, it is not a complete solution. A more concerted effort of pre-editing the information stream would seem to be the general solution. Leave my kitchen sink thoughts to myself. Bear up a tree, rabbit barking. I can see a lot of biting my tongue in the future.

Social Networking

A very brief look at social networking now and then and what it means.

Recently I gave a short presentation on the Python package Brewery. At the conclusion of said talk I was so bold as to mention I was thinking of subclassing some of the code to make it work with DBF files. To that end I went to the GitHub repository to fork it and start my work. At the conclusion of the process I went into contemplative mode. The realization struck that the code started with a guy in Bratislava, Slovakia, was uploaded to a site hosted somewhere, and was then downloaded to my computer here in Bellingham, WA, GitHub being the social hub that facilitated the exchange. Now, since I have a restless mind and need to feed it a varied diet, I also read histories/biographies on a regular basis, Theodore Roosevelt being the current topic. Over the past year the subjects covered have included the elucidation of DNA, Einstein, Darwin and the Desert War in WWII. Much is made of social networking and its impact on the current world (though it would not seem the stock market :)). While I would admit the ease with which people can interact has been significantly improved, I am not sure there is anything truly new or useful going on. In all the books mentioned there were active social networks present. Needless to say, the means of interacting were tailored to the technological level of the day. Though the pace would seem glacial by today’s standards, information managed to traverse great distances and human thought progressed. In fact my feeling is it progressed on a higher plane than currently. The speed and relentless persistence of present information flow seem to preclude the deep and contemplative thought found in the not too distant past. So the question is: is the social networking of today really an improvement or just another problem to overcome?

Making things easier

I have not trolled the source enough to know who specifically to thank, so a general thank you to whoever is responsible for \e (and \ef) in psql and the command DO. I have been working on a fairly complex plpythonu function in a 9.0 database and the combination of the aforementioned tools made the development much easier. The ability to test Python code snippets by writing DO blocks inside an editor inside psql is priceless.
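For example, a throwaway block along these lines can be pulled up with \e, tweaked in the editor, and rerun without ever creating a function (the body is just a stand-in):

DO $$
    plpy.notice('testing a snippet')
$$ LANGUAGE plpythonu;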

PostgreSQL 9.1, first beta around the corner

Joshua Berkus recently gave a heads up on the Advocacy list about the forthcoming release of the first beta for PostgreSQL 9.1, tentatively scheduled for May 3. I have been following the development fairly closely, though given the number of changes, not in depth. It is a good time to point out the Waiting for… blog series from Hubert ‘depesz’ Lubaczewski found here. It covers many of the changes and uses examples to illustrate them, a very handy resource. So what is 9.1 offering?

First some of the headliners.

Synchronous Replication:
Extends the built-in replication so that a commit does not return until the standby has confirmed receipt of the transaction. For those who absolutely need a 1:1 correspondence with their standbys.

Per column collation:
A different collation (sort order/language) for each column, the Tower of Babel realized. :)

Unlogged tables:
Permission is granted to run with scissors. You can now create tables that do not write to the Write-Ahead Log (WAL). This makes data operations on the table faster, at the cost of durability: data will not be recovered in the event of a crash, nor are unlogged tables replicated.
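The syntax is just an extra keyword on CREATE TABLE, something like:

CREATE UNLOGGED TABLE scratch_data (id int, payload text);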

SE-Postgres:
Speaking of security, Postgres will now have access to the SELinux security framework.

The above are great and I will be exploring them at some point, but what follows are the new features I am really excited about.

PL/Python changes:
Thanks to Jan Urbanski and others there have been a host of changes to the procedural language support for Python. To mention a few: table function support, better error handling, explicit subtransactions.

pg_basebackup:
Thanks, Magnus. A way of doing a base backup for replication over a regular Postgres connection. Very handy. Along with this, the replication permission for non-superusers.

SQL/MED:
Basically pulling external data sources into Postgres. The implementation for  9.1 is basic, but the potential is too great to ignore.

Extensions:
An easier way to work with database plugins. Combined with the PGXN site, a definite plus.

I have just scratched the surface of what is new and improved in PostgreSQL 9.1; for a more in-depth look I suggest the latest alpha Release Notes. I mentioned specific people above because I have associated their names with particular features. Lest I forget, a big thank you to all those who have participated. There is much to be proud of in this release, as in the previous ones.

Troubleshooting

Some thoughts on the art and science of troubleshooting, or how not to resort to the bigger hammer. What I offer up is based on my experience trying to solve problems across an array of disciplines, from truck repair to software debugging. There are common elements to all of them, listed below.

1) Work from the known to the unknown not the other way.
This is best illustrated by working with electrical circuits. There are any number of reasons why electricity is not reaching the end of a circuit, and working backwards through the circuit often leads down false branches. It is a lot easier to start at the closest panel box and follow the circuit out until voltage is lost.

2) Keep notes.
I am not talking about an encyclopedic compendium, something more along the lines of a bread crumb trail to keep from going in circles or treading down old paths. To extend the electrical example, was that the green wire with white trace or the white wire with green trace?

3) All information is important.
I will not recount the number of times I ignored an error message because I ‘knew’ it was not important, much to my later chagrin.

4) KISS(Keep It Simple Stupid).
The poor man’s version of Occam’s Razor. If the choice is between an alien gamma ray device flipping bits in the computer or poor quality memory chips, go with the latter.

5) First guess.
This is the you-know-more-than-you-think principle, also known as listening to the little voice. In the absence of a definite plan, go with the first thought that strikes you. I used to spend a lot of time fighting the little voice, until I realized it was the part of me that was actually paying attention.

6) Walk away.
My dog has helped me solve many problems. I don’t know how many times I took a break to walk the dog and had the answer present itself while the dog was leaving a message. Disconnecting from the problem is often a good way to make the connection.

SQL inside Jedit

I use Jedit as my primary text editing tool. I recently ran across a plugin (SQL) for the editor that allows for running SQL from within an editor buffer. What follows is an explanation of how to set it up and run it. Some background: Jedit is a Java application, so if you plan to use it you will need a JVM installed. I have found that to have it run well you need to use the Sun (Oracle) Java code.

Set up:
Java runtime 1.6+
Jedit 4.3+
Installing the SQL plugin from the Plugin Manager will install its dependencies, most notably Project Viewer.
The setup for SQL has global and per-project parts.
The global portion is set up through the Plugin Manager plugin options menu. By default SQL only returns 10 rows and that can be changed here; I have yet to find a way to get it to return with no limit. I would also suggest checking the Show Toolbar and Popup messages options. The most important global setting is the JDBC page, where you set the path to the JDBC drivers you wish to use. SQL supports quite a few databases (Oracle, MySQL, PostgreSQL, DB2, Sybase, Microsoft SQL Server; see the help file), though notably not SQLite.

Project Viewer (PV) is installed as a dependency because it is used to create per-project settings for SQL. It is possible to use a PV project for just the SQL settings and not use the rest of the package, though I would suggest checking PV out. One note: PV seems to like to throw non-fatal errors on a regular basis. They seem to be related to redraw operations and so far have not caused me any problems. Once you have created a project you can go to the project's SQL settings to enter the necessary information. See screenshot:

Once you have the server set up the fun starts. As an example I wrote a script to CREATE a table and another to populate it with data. If you look at the screenshot you will see the SQL toolbar I alluded to earlier; it starts with Database:. The second button to the right of the drop-down box is the Execute buffer command. Clicking on that causes the entire SQL script in the selected buffer to execute. The result is shown in the message box, in this case for the CREATE TABLE statement. I did this again on the buffer holding the INSERT commands.

So now we have a table with data, now what? See below. In the right buffer I have written a SELECT statement against the previously created table. In this case I could have done Execute buffer, but for illustration I used another method, Execute selection. This is the first button to the right of the drop-down list. It runs only whatever text is selected.

So what about variable substitution? In the first screenshot below I have used the Preprocessors drop-down list to select variable substitution. In the left buffer is another SELECT query, this time using the substitution variable ‘?’. The second screenshot shows the dialog box that pops up for data entry. For more than one variable the plugin will cycle through them and present a dialog for each.

One more interesting feature is SqlVFS. This allows one to browse the database through the Jedit File Browser. When you open the File Browser, select the Plugins dropdown and then Show databases. Double clicking on the database allows you to start drilling down through the database structure.

In the short time I have been using this tool I have found it quite helpful, so I thought I would pass along my experience. It saves a round trip between the text editor and a terminal window.