Building dynamic SQL using psycopg2

It has been a while since I posted anything. Thought I would start back with a software
package I use in many different situations, psycopg2. Psycopg2 is a Python database adapter for Postgres that follows the Python DB API. It is feature rich, and today I will introduce a feature new to the latest major release (2.7), namely the sql module for building complete SQL statements dynamically. I will give a quick rundown of the basics below. For the official docs on the module see the sql page in the psycopg2 documentation.
For example data I will be using the Pagila database (derived from the MySQL
Sakila sample db) obtained from here.

Setting up imports:

from psycopg2 import connect, sql
from psycopg2.extras import RealDictCursor

Create connection and cursor:

con = connect("dbname=pagila host=localhost user=postgres",
              cursor_factory=RealDictCursor)
cur = con.cursor()

Build simple query string:

qry_str = sql.SQL("SELECT {}, {} FROM {}").format(
    sql.Identifier("customer_id"),
    sql.Identifier("first_name"),
    sql.Identifier("customer")
)
print(qry_str.as_string(con))
SELECT "customer_id", "first_name" FROM "customer"
cur.execute(qry_str)
rs = cur.fetchone()
print(rs)
{'customer_id': 1, 'first_name': 'MARY'}

Breaking the above down: SQL is a class used to build the string and as such has
a format method that follows the Python str.format() method (see the psycopg2
docs above for exceptions). Identifier is a class that handles strings to be used
as query identifiers, e.g. column/field names, as opposed to data values. So in the
above, the ‘{}’ are replaced in order by the Identifier values supplied to format().
To verify the string you can use its as_string method, provided you supply a connection
or cursor object to it. The query string is then used in cur.execute() to fetch records
from the database. You can also build the query string directly in the execute() call
to save a step.
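To make the quoting concrete, here is a rough sketch of what Identifier amounts to. This is a hypothetical illustration, not psycopg2's actual implementation: SQL identifiers are wrapped in double quotes, with any embedded double quote doubled.

```python
def quote_ident(name):
    # Mimic SQL identifier quoting: double any embedded double quotes,
    # then wrap the whole name in double quotes.
    # (Illustrative sketch only, not the real psycopg2 code path.)
    return '"' + name.replace('"', '""') + '"'

qry = "SELECT {}, {} FROM {}".format(
    quote_ident("customer_id"),
    quote_ident("first_name"),
    quote_ident("customer"),
)
# qry: SELECT "customer_id", "first_name" FROM "customer"
```

The point of routing names through Identifier rather than plain string interpolation is exactly this quoting step, which keeps a field name from being interpreted as anything else.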

More compact/flexible string construction:

qry_str = sql.SQL("SELECT {} FROM {}").format(
    sql.SQL(", ").join([sql.Identifier("customer_id"), sql.Identifier("first_name")]),
    sql.Identifier("customer")
)

The above eliminates the need to provide a ‘{}’ for each field in the SELECT
field list. Instead the join() method of SQL is used to build a comma-separated
list of field names. This is helpful if you may be receiving a variable number of field
names for different iterations of the query. As an example, where the contents of
fld_list may change:

fld_list = ["customer_id", "first_name", "last_name", "email"]
qry_str = sql.SQL("SELECT {} FROM {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer")
)
print(qry_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer"
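The joining can be pictured in plain Python. This is a sketch of the string that results, not the sql module itself:

```python
fld_list = ["customer_id", "first_name", "last_name", "email"]

# Quote each field name and join with commas, the way
# sql.SQL(",").join(map(sql.Identifier, fld_list)) renders it.
# (Illustrative only; real quoting also escapes embedded quotes.)
select_list = ",".join('"{}"'.format(fld) for fld in fld_list)
query = 'SELECT {} FROM "customer"'.format(select_list)
```

Because fld_list is ordinary Python data, the same code handles four fields or forty without touching the query template.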

Generating placeholders in the query: the sql module supports two types of placeholders,
%s and %(name)s. For %s placeholders a sequence of values, e.g. a list or tuple, needs
to be provided. For named placeholders a dictionary is used to supply the values.
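As a sketch of how the two placeholder styles render (this imitates sql.Placeholder in plain Python rather than calling it):

```python
def placeholder(name=None):
    # sql.Placeholder() renders as %s, sql.Placeholder("x") as %(x)s.
    # This is a plain-Python imitation for illustration only.
    return "%s" if name is None else "%({})s".format(name)

# placeholder()             -> "%s"        (use with a list/tuple of values)
# placeholder("first_name") -> "%(first_name)s"  (use with a dict of values)
```

Either way, the actual value binding is done by the driver at execute() time, which is what keeps the query safe from injection.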

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder()
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %s
cur.execute(placeholder_str, ["MARY"])
rs = cur.fetchone()
print(rs)
{'customer_id': 1,
 'email': '',
 'first_name': 'MARY',
 'last_name': 'SMITH'}

In the above %s is created as the placeholder for the first_name field value in the
WHERE clause using the Placeholder class. To create a named placeholder the procedure is:

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder("first_name")
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %(first_name)s
cur.execute(placeholder_str, {"first_name": "MARY"})

Where this leads is to building a query string from data sourced from a Python dictionary:

new_customer_list = [
    {"store_id": 1, "first_name": "ADRIAN", "last_name": "KLAVER",
     "email": "", "address_id": 67},
    {"store_id": 2, "first_name": "JANE", "last_name": "DOE",
     "email": "", "address_id": 7},
    {"store_id": 1, "first_name": "WHO", "last_name": "KNOWS",
     "email": "", "address_id": 189}
]

insert_flds = [fld_name for fld_name in new_customer_list[0].keys()]

insert_str = sql.SQL("INSERT INTO customer ({}) VALUES ({})").format(
    sql.SQL(",").join(map(sql.Identifier, insert_flds)),
    sql.SQL(",").join(map(sql.Placeholder, insert_flds))
)
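For reference, the composition above renders to a template like the following. This plain-Python sketch reproduces the string, assuming the field order shown (it mimics the sql module rather than using it):

```python
insert_flds = ["store_id", "first_name", "last_name", "email", "address_id"]

# Column list: each name double-quoted, comma separated.
cols = ",".join('"{}"'.format(f) for f in insert_flds)
# Values list: one named placeholder per column.
vals = ",".join("%({})s".format(f) for f in insert_flds)
insert_tmpl = "INSERT INTO customer ({}) VALUES ({})".format(cols, vals)
# INSERT INTO customer ("store_id",...) VALUES (%(store_id)s,...)
```

Since every placeholder is named after its dictionary key, each dict in new_customer_list can be handed to execute() (or execute_batch) as-is.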

from psycopg2.extras import execute_batch

execute_batch(cur, insert_str, new_customer_list)
con.commit()  # make the inserts permanent

I snuck in another new feature from psycopg2 2.7, execute_batch. This is a way of batching statements for fewer round trips to the server, by default 100 statements at a time. For this example it is not really a win, but it is nice to know it is out there. For cases where I have a lot of data to transfer to a table I use psycopg2’s COPY features whenever possible. That is a subject for another day though.

Postgres and hstore

I recently started working with the hstore contrib module in Postgresql.  My impetus was  the desire to create a generic way of redirecting deletes on key tables to archive tables.  I do that now using a rule based mechanism. The problem is it is fairly fragile and prone to breakage when the parent table changes.  I had seen references to hstore and had a vague idea of what it could do. This was enough to make me dig deeper and see if I could use it for the delete process.  This is still a work in progress, though I am pleased with what I found so far, which is sketched out below.
1) The parent table:

CREATE TABLE plant1 (
    p_item_no smallint DEFAULT nextval(('plant1_p_item_no_seq'::text)::regclass) NOT NULL,
    common character varying(40) NOT NULL,
    genus character varying(20),
    species character varying(30),
    variety character varying(40),
    plant_type character varying(6) NOT NULL,
    series character varying(20),
    season character varying(9),
    ts_insert timestamp(0) without time zone DEFAULT now(),
    ts_update timestamp(0) without time zone,
    user_update character varying(20),
    user_insert character varying(20) DEFAULT "session_user"(),
    color character varying(30),
    category_type character varying(5) DEFAULT 'plant'::character varying NOT NULL,
    category_sub_type character varying(15) DEFAULT 'normal'::character varying NOT NULL,
    plant_note text
);

2) The delete table:

CREATE TABLE plant1_delete (
    del_id integer NOT NULL,
    record_fld public.hstore,
    del_ts timestamp with time zone DEFAULT now(),
    del_user character varying
);

3) The delete function and trigger:

CREATE OR REPLACE FUNCTION utility.archive_record()
RETURNS trigger
LANGUAGE plpgsql
AS $$
DECLARE
    tbl_name text := TG_TABLE_NAME || '_delete';
    archive_row hstore := hstore(OLD.*);
    usr_name text := session_user;
BEGIN
    EXECUTE 'INSERT INTO ' || quote_ident(tbl_name)
        || '(record_fld,del_ts,del_user)'
        || ' VALUES(' || quote_literal(archive_row) || ',now(),'
        || quote_literal(usr_name) || ')';
    RETURN OLD;
END;
$$;

CREATE TRIGGER p1_delete AFTER DELETE ON plant1 FOR EACH ROW EXECUTE PROCEDURE utility.archive_record();

4) The end result:

DELETE from plant1 where p_item_no=578;
The raw record in the delete table:
test=> SELECT * from plant1_delete;
-[ RECORD 1 ]------------------------------------------------------------------------------------------

del_id | 6
record_fld | "color"=>NULL, "genus"=>"selaginella", "common"=>"moss little club", "season"=>NULL, "series"=>"", "species"=>"brownii", "variety"=>"", "p_item_no"=>"578", "ts_insert"=>"2004-01-01 00:00:00", "ts_update"=>"2005-11-21 10:45:30", "plant_note"=>NULL, "plant_type"=>"herb", "user_insert"=>"aklaver", "user_update"=>"aklaver", "category_type"=>"plant", "category_sub_type"=>"normal"
del_ts | 2012-06-23 16:19:00.193426-07
del_user | aklaver

The hstore field parsed out, record selected using value in hstore:
test=> SELECT (each(record_fld)).* from plant1_delete where record_fld -> 'p_item_no'='578';

key | value
color |
genus | selaginella
common | moss little club
season |
series |
species | brownii
variety |
p_item_no | 578
ts_insert | 2004-01-01 00:00:00
ts_update | 2005-11-21 10:45:30
plant_note |
plant_type | herb
user_insert | aklaver
user_update | aklaver
category_type | plant
category_sub_type | normal

I can see this making life a lot easier. An added bonus is that psycopg2 (a Python database adapter) supports hstore and will adapt from it to a Python dictionary and back again. That is a subject for another blog.
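As a taste of that adaptation, here is a much-simplified sketch of turning an hstore literal into a Python dictionary. psycopg2's register_hstore does this properly; this toy version handles only the simple pairs shown in the output above and ignores escaping:

```python
import re

def parse_hstore(text):
    # Toy parser: handles "key"=>"value" and "key"=>NULL pairs only.
    # Real hstore values can contain escaped quotes, which this skips.
    result = {}
    for key, val in re.findall(r'"([^"]+)"=>(NULL|"[^"]*")', text):
        result[key] = None if val == "NULL" else val.strip('"')
    return result

row = parse_hstore('"genus"=>"selaginella", "color"=>NULL')
# row: {'genus': 'selaginella', 'color': None}
```

With the round trip in place, a deleted row archived as hstore can come back into Python as a plain dict, keyed by the original column names.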





When presented with a problem I tend to see the solution as a snapshot encompassing the depth and breadth of whatever is presented. What is usually wanted is an answer that is some subset of that snapshot. Naked bear just walked by. Unfortunately, I tend to try spewing out the entire thing and make the answer worse than the original problem. The solution would be to filter my responses down to the immediate subject at hand. This entails taking what is a full picture, editing out the relevant bits, and then pushing those out in a coherent linear sequence. When writing this is more easily done than when talking off the cuff; there I have varying degrees of success. Often the result is my stuttering out the answer as too much information heads for the door at the same time. The other issue is that even if I do limit the information stream and it comes out in a somewhat smooth sequence, the sequence is wrong. The result is confusion, not enlightenment. There are times though when I am ‘on’, often when I am tired, and everything just flows. The key would seem to be that when I am tired my mind is less active and more pliable to linear thinking. In other words I tend to be more focused, because I have less energy to cast a wide net. How to capture that focus? Bear being chased by rabbit now. I like letting my mind loose, it is my natural instinct. The act of writing tends to force focus, this piece notwithstanding, so one way would be to only talk off a written script. Churchill was quoted as saying many of his ‘extemporaneous’ speeches came from written preparation; if I remember correctly, something on the order of three hours of prep for each hour of speech. While that works for quite a few situations, it is not a complete solution. A more concerted effort of pre-editing the information stream would seem to be the general solution. Leave my kitchen sink thoughts to myself. Bear up a tree, rabbit barking. I can see a lot of biting my tongue in the future.

Social Networking

A very brief look at social networking now and then and what it means.

Recently I gave a short presentation on the Python package Brewery. At the conclusion of said talk I was so bold as to mention I was thinking of subclassing some of the code to make it work with DBF files. To that end I went to the GitHub repository to fork it and start my work. At the conclusion of the process I went into contemplative mode. The realization struck that the code started with a guy in Bratislava, Slovakia, was uploaded to a site hosted somewhere, and then downloaded to my computer here in Bellingham, WA, GitHub being the social hub that facilitated the exchange. Now, since I have a restless mind and need to feed it a varied diet, I also read histories/biographies on a regular basis, Theodore Roosevelt being the current topic. Over the past year the subjects covered have included the elucidation of DNA, Einstein, Darwin, and the Desert War in WWII. Much is made of social networking and its impact on the current world (though it would not seem the stock market:)). While I would admit the ease with which people can interact has been significantly improved, I am not sure there is anything truly new or useful going on. In all the books mentioned there were active social networks present. Needless to say, the means of interacting were tailored to the technological level of the day. Though the pace would seem glacial by today’s standards, information managed to traverse great distances and human thought progressed. In fact my feeling is it progressed at a higher plane than currently. The speed and relentless persistence of present information flow seems to preclude the deep and contemplative thought found in the not too distant past. So the question is: is the social networking of today really an improvement, or just another problem to overcome?

Making things easier

Have not trolled the source enough to know who specifically to thank, so a general thank you to whoever is responsible for \e(f) in psql and the DO command. I have been working on a fairly complex plpythonu function in a 9.0 database and the combination of the aforementioned tools made the development much easier. The ability to test Python code snippets by writing DO functions inside an editor inside psql is priceless.

PostgreSQL 9.1, first beta around the corner

Joshua Berkus recently gave a heads up on the Advocacy list of the forthcoming release of the first beta for PostgreSQL 9.1, tentatively scheduled for May 3.  I have been following the development fairly closely, though given the number of changes, not in depth.  Good time to point out the Waiting for.. blog series from Hubert ‘depesz’ Lubaczewski found here. It covers many of the changes and uses examples to illustrate, a very handy resource. So what is 9.1 offering?

First some of the headliners.

Synchronous Replication:
Extends the built in replication to eliminate any lag between the master and standby   databases. For those who absolutely need a 1:1 correspondence with their standbys.

Per column collation:
A different language for each column, the Tower of Babel realized:)

Unlogged tables:
Permission is granted to run with scissors. You can now create tables that do not log to the Write Ahead Log (WAL). This makes data operations on the table faster, at the cost of durability: data will not be recovered in the event of a crash, nor are unlogged tables replicated.

Speaking of security, Postgres will now have access to the SE-Linux framework.

The above are great and I will be exploring them at some point, but what follows are the new features I am really excited about.

PL/Python changes:
Thanks to Jan Urbanski and others there have been a host of changes to the procedural language support for Python.  To mention a few; table function support, better error handling, explicit subtransactions.

pg_basebackup:
Thanks, Magnus. A way of doing a base backup for replication over a regular Postgres connection. Very handy. Along with this comes the replication permission for non-superusers.

Foreign data wrappers:
Basically pulling external data sources into Postgres. The implementation for 9.1 is basic, but the potential is too great to ignore.

Extensions:
An easier way to work with database plugins. Combined with the PGXN site, a definite plus.

I have just scratched the surface on what is new and improved in PostgreSQL 9.1, for a more in depth look I suggest the latest alpha Release Notes. I mentioned specific people above because I have associated their names with particular features.  Lest I forget, a big thank you to all those who have participated.  There is much to be proud of in this release as in the previous ones.


Some thoughts on the art and science of troubleshooting, or how not to resort to the bigger hammer. What I offer up is based on my experience trying to solve problems across an array of disciplines, from truck repair to software debugging. There are common elements to all that are listed below.

1) Work from the known to the unknown not the other way.
This is best illustrated by working with electrical circuits. There are any number of reasons why electricity is not reaching the end of a circuit, and working backwards through the circuit often leads down false branches. It is a lot easier to start at the closest panel box and follow the circuit out until voltage is lost.

2) Keep notes.
I am not talking about an encyclopedic compendium, something more along the lines of a bread crumb trail to keep from going in circles or treading down old paths. To extend the electrical example: was that the green wire with white trace or the white wire with green trace?

3) All information is important.
I will not recount the number of times I ignored an error message because I ‘knew’ it was not important, much to my later chagrin.

4) KISS(Keep It Simple Stupid).
The poor man's version of Occam's Razor. If the choice is between an alien gamma ray device flipping bits in the computer or poor quality memory chips, go with the latter.

5) First guess.
This is the you-know-more-than-you-think principle, also known as listening to the little voice. In the absence of a definite plan, go with the first thought that strikes you. I used to spend a lot of time fighting the little voice, until I realized it was the part of me that was actually paying attention.

6) Walk away.
My dog has helped me solve many problems. I don’t know how many times I took a break to walk the dog and had the answer present itself while the dog was leaving a message. Disconnecting from the problem is often a good way to make the connection.

SQL inside Jedit

I use Jedit as my primary text editing tool. I recently ran across a plugin (SQL) to the editor that allows for running SQL from within an editor buffer. What follows is an explanation of how to set it up and run it. Some background: Jedit is a Java application, so if you plan to use it you will need a JVM installed. I have found that to have it run well you need to use the Sun (Oracle) Java code.

Set up:
Java runtime 1.6+
Jedit 4.3+
Installing the SQL plugin from the Plugin Manager will install its dependencies, most notably Project Viewer.
The set up for SQL has global and per project parts.
The global portion is set up through the Plugin Manager plugin options menu. By default SQL only returns 10 rows and that can be changed here; I have yet to find a way to get it to return with no limits. I would also suggest checking the Show Toolbar and Popup messages options. The most important global setting is the JDBC page, where you set up the path to the JDBC drivers you wish to use. SQL supports quite a few databases (Oracle, MySQL, PostgreSQL, DB2, Sybase, Microsoft SQL Server; see the help file), though notably not SQLite.

Project Viewer (PV) is installed as a dependency because it is used to create per project settings for SQL. It is possible to use a PV project for just the SQL settings and not use the rest of the package, though I would suggest checking PV out. One note: PV seems to like to throw non-fatal errors on a regular basis. They seem to be related to redraw operations and so far have not caused me any problems. Once you have created a project you can go to the project SQL setting to enter the necessary information. See screenshot:

Once you have the server set up the fun starts. As an example I wrote a script to CREATE a table and another to populate it with data. If you look at the screenshot you will see the SQL toolbar I alluded to earlier; it starts with Database:. The second button to the right of the drop down box is the Execute buffer command. Clicking on that causes the entire SQL script in the selected buffer to execute. The result is shown in the message box, in this case for the CREATE TABLE statement. I did this again on the buffer holding the INSERT commands.

So now we have a table with data; now what? See below. In the right buffer I have written a SELECT statement against the previously created table. In this case I could have done Execute buffer, but for illustration I used another method, Execute selection. This is the first button to the right of the drop down list. It only runs whatever text is selected.

So what about variable substitution? In the first screenshot below I have used the drop down list Preprocessors to select variable substitution. In the left buffer is another SELECT query this time using the substitution variable ‘?’. The second screenshot shows the dialog box that pops up for data entry. For more than one variable the plugin will cycle through them and present a dialog for each.

One more interesting feature: SqlVFS. This allows one to browse the database through the Jedit File Browser. When you open the File Browser, select the Plugins dropdown and then Show databases. Double clicking on the database allows you to start drilling down through the database structure.

In the short time I have been using this tool I have found it quite helpful, so I thought I would pass along my experience. It saves a round trip between the text editor and a terminal window.

PostgreSQL Roles

Today I got involved in a discussion on the General mailing list about using ROLE in Postgres. I realized this is a topic that can get confusing in a hurry and I thought I might try to clear things up, in my mind if nothing else. Part of the confusion arises from a historical carryover. In versions of Postgres prior to 8.1 there were USER and GROUP. (An aside: I found this out by using the Feature Matrix, a very handy tool.) These were distinct and operated somewhat like their OS equivalents, in that users logged in and then could be part of a group. In 8.1+, in other words all current community supported versions, USER and GROUP have been replaced by ROLE. The confusion arises because while the current underlying mechanism is based solely on ROLE, the USER and GROUP commands have been maintained for backwards compatibility. So when you use the SQL command CREATE USER in 8.1+ you are just creating a ROLE with the clause LOGIN, whereas CREATE GROUP is just CREATE ROLE, which by default has the clause NOLOGIN. For my own personal sanity I just use the CREATE ROLE command and pretend USER and GROUP do not exist. Since we are already down the rabbit hole, I might as well mention the psql \d commands that apply as well. Again for historical reasons these are \du and \dg, which at one time listed users and groups respectively. Now they both return the same thing, a list of the roles. The thing to remember is that unless you are working with a pre 8.1 database you are working with some variation of a ROLE.

So what does a ROLE get you? As was implied above, a ROLE can have the characteristics of a user or a group. For my own use I follow the pattern used by the current CREATE USER/GROUP commands, where a ROLE with LOGIN set is taken to be a user and one with NOLOGIN is equivalent to a group. Note: this is just a convenient way of viewing things and you will not see a ROLE defined that way when you list them. Assuming that pattern, it is possible to create group-like ROLEs and then assign user-like ROLEs membership in them. For instance:

CREATE ROLE accounting;
CREATE ROLE production;
CREATE ROLE sales;

CREATE ROLE aklaver LOGIN IN ROLE accounting,sales;
CREATE ROLE daffy_duck LOGIN IN ROLE production;
If you want to change role membership after CREATE ROLE then you will need to use the GRANT or REVOKE commands, for example GRANT sales TO daffy_duck or REVOKE accounting FROM aklaver.

Where this pays off is in object permissions. Give the ‘group’ roles permissions to their respective objects and then add or drop ‘users’ to the ‘groups’ as needed. This way you can maintain access flexibility without continually  having to change permissions on objects.

I have touched on just the bare basics, this is a subject that covers a lot of ground. To really learn it requires multiple passes through the documentation and plenty of hands on testing.

Qantas gripe sheet

Ran across this years back, it still gives me a chuckle:

After every flight, Qantas pilots fill out a form, called a “gripe sheet,” which tells mechanics about the problems with the aircraft.

The mechanics correct the problems, document their repairs on the form, and then pilots review the gripe sheets before the next flight.  Never let it be said that ground crews lack a sense of humor.

Here are some actual maintenance complaints submitted by Qantas’ pilots (marked with a “P”) and the solutions recorded (marked with a “S”) by maintenance engineers.

By the way, Qantas is the only major airline that has never had an accident.

P: Left inside main tire almost needs replacement.

S: Almost replaced left inside main tire.

P: Test flight OK, except auto-land very rough.

S: Auto-land not installed on this aircraft.

P: Something loose in cockpit.

S: Something tightened in cockpit.

P: Dead bugs on windshield.

S: Live bugs on back-order.

P: Autopilot in altitude-hold mode produces a 200 feet per minute descent.

S: Cannot reproduce problem on ground.

P: Evidence of leak on right main landing gear.

S: Evidence removed.

P: DME volume unbelievably loud.

S: DME volume set to more believable level.

P: Friction locks cause throttle levers to stick.

S: That’s what they’re for.

P: IFF inoperative.

S: IFF always inoperative in OFF mode.

P: Suspected crack in windshield.

S: Suspect you’re right.

P: Number 3 engine missing.

S: Engine found on right wing after brief search.

P: Aircraft handles funny.

S: Aircraft warned to straighten up, fly right, and be serious.

P: Target radar hums.

S: Reprogramming target radar with lyrics.

P: Mouse in cockpit.

S: Cat installed.

And the best one for last…

P: Noise coming from under instrument panel.  Sounds like a midget pounding on something with a hammer.

S: Took hammer away from midget.