Building dynamic SQL using psycopg2

It has been a while since I posted anything. I thought I would start back with a software package I use in many different situations, psycopg2. Psycopg2 is a Python database adapter for Postgres that follows the Python DB API. It is feature rich, and today I will introduce a feature new to the latest major release (2.7), namely the sql module for building complete SQL statements dynamically. I will give a quick rundown of the basics below. For the official docs on the module see: http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql. For example data I will be using the Pagila database (derived from the MySQL Sakila sample db) obtained from here. Setting up the imports:
from psycopg2 import connect, sql
from psycopg2.extras import RealDictCursor
Create connection and cursor:
con = connect("dbname=pagila host=localhost user=postgres", 
              cursor_factory=RealDictCursor)
cur = con.cursor()
Build simple query string:
qry_str = sql.SQL("SELECT {}, {} FROM {}").format(
    sql.Identifier('customer_id'),
    sql.Identifier("first_name"),
    sql.Identifier("customer")
)
print(qry_str.as_string(con))
SELECT "customer_id", "first_name" FROM "customer"
cur.execute(qry_str)
rs = cur.fetchone()
rs
{'customer_id': 1, 'first_name': 'MARY'}
Breaking the above down: SQL is a class used to build the statement and as such has a format() method that follows the Python str.format() method (see the psycopg2 docs above for exceptions). Identifier is a class that handles strings to be used as query identifiers, e.g. column/field names, as opposed to data values. So in the above, the '{}' placeholders are replaced in order by the Identifier values passed to format(). To verify the resulting string you can use its as_string() method, provided you supply a connection or cursor object to it. The query string is then used in cur.execute() to fetch records from the database. You can also build the query string directly in the execute() call to save a step. More compact/flexible string construction:
qry_str = sql.SQL("SELECT {} FROM {}").format(
    sql.SQL(",").join([sql.Identifier('customer_id'),
                       sql.Identifier("first_name")]),
    sql.Identifier("customer")
)
The above eliminates the need to provide a '{}' for each field in the SELECT field list. Instead the join() method on SQL is used to build a comma-separated list of field names. This is helpful if you may be receiving a variable number of field names for different iterations of the query. As an example, where the contents of fld_list may change:
fld_list = ["customer_id", "first_name", "last_name", "email"]
qry_str = sql.SQL("SELECT {} FROM {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer")
)
print(qry_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer"
Generating placeholders in the query: the sql module supports two types of placeholders, %s and %(name)s. For %s placeholders a sequence of values needs to be provided, e.g. a list or tuple. For named placeholders a dictionary is used to supply the values.
placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder()
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %s
cur.execute(placeholder_str, ["MARY"])
rs = cur.fetchone()
rs
{'customer_id': 1,
 'email': 'MARY.SMITH@sakilacustomer.org',
 'first_name': 'MARY',
 'last_name': 'SMITH'}
In the above, %s is created as the placeholder for the first_name field value in the WHERE clause using the Placeholder class. To create a named placeholder the procedure is:
placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder(name='first_name')
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %(first_name)s
cur.execute(placeholder_str, {"first_name": "MARY"})
Where this leads is to building a query string from data sourced from a Python dictionary:
new_customer_list = [
    {"store_id": 1, "first_name": "ADRIAN", "last_name": "KLAVER",
     "email": "ADRIAN.KLAVER@sakilacustomer.org", "address_id": 67},
    {"store_id": 2, "first_name": "JANE", "last_name": "DOE",
     "email": "JANE.DOE@sakilacustomer.org", "address_id": 7},
    {"store_id": 1, "first_name": "WHO", "last_name": "KNOWS",
     "email": "WHO.KNOWS@sakilacustomer.org", "address_id": 189}
]

insert_flds = list(new_customer_list[0].keys())

insert_str = sql.SQL("INSERT INTO customer ({}) VALUES ({})").format(
    sql.SQL(",").join(map(sql.Identifier, insert_flds)),
    sql.SQL(",").join(map(sql.Placeholder, insert_flds))
)

from psycopg2.extras import execute_batch

execute_batch(cur, insert_str, new_customer_list)
I snuck in another new feature from psycopg2 2.7, execute_batch. This is a way of batching statements for fewer round trips to the server; by default the statements are grouped 100 at a time. For this example it is not really a win, but it is nice to know it is out there. For cases where I have a lot of data to transfer to a table I use psycopg2's COPY features whenever possible. That is a subject for another day though.
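Going back to execute_batch for a moment: it also takes a page_size argument that controls how many statements are sent per round trip. A minimal sketch re-using the cursor, query and data from above (the page_size of 50 and the explicit commit are just illustrative, not part of the original example):

from psycopg2.extras import execute_batch

# Group the INSERTs 50 statements per round trip instead of the default 100.
execute_batch(cur, insert_str, new_customer_list, page_size=50)

# psycopg2 opens a transaction implicitly, so commit to make the new rows
# visible to other sessions.
con.commit()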

Postgres and hstore

I recently started working with the hstore contrib module in PostgreSQL. My impetus was the desire to create a generic way of redirecting deletes on key tables to archive tables. I do that now using a rule-based mechanism. The problem is that it is fairly fragile and prone to breakage when the parent table changes. I had seen references to hstore and had a vague idea of what it could do. That was enough to make me dig deeper and see if I could use it for the delete process. This is still a work in progress, though I am pleased with what I have found so far, which is sketched out below.
1) The parent table:
CREATE TABLE plant1 (
p_item_no smallint DEFAULT nextval(('plant1_p_item_no_seq'::text)::regclass) NOT NULL,
common character varying(40) NOT NULL,
genus character varying(20),
species character varying(30),
variety character varying(40),
plant_type character varying(6) NOT NULL,
series character varying(20),
season character varying(9),
ts_insert timestamp(0) without time zone DEFAULT now(),
ts_update timestamp(0) without time zone,
user_update character varying(20),
user_insert character varying(20) DEFAULT "session_user"(),
color character varying(30),
category_type character varying(5) DEFAULT 'plant'::character varying NOT NULL,
category_sub_type character varying(15) DEFAULT 'normal'::character varying NOT NULL,
plant_note text
);
2) The delete table:
CREATE TABLE plant1_delete (
del_id integer NOT NULL,
record_fld public.hstore,
del_ts timestamp with time zone DEFAULT now(),
del_user character varying
);
3) The delete function and trigger:
CREATE OR REPLACE FUNCTION utility.archive_record()
RETURNS trigger
LANGUAGE plpgsql
SECURITY DEFINER
AS $BODY$
DECLARE
    -- Archive table name is the parent table name plus '_delete'.
    tbl_name text := TG_TABLE_NAME || '_delete';
    -- The deleted row converted to an hstore value.
    archive_row hstore := hstore(OLD.*);
    usr_name text := session_user;
BEGIN
    EXECUTE 'INSERT INTO ' || quote_ident(tbl_name)
        || '(record_fld,del_ts,del_user)'
        || ' VALUES(' || quote_literal(archive_row) || ',now(),'
        || quote_literal(usr_name) || ')';
    RETURN OLD;
END;
$BODY$;

CREATE TRIGGER p1_delete AFTER DELETE ON plant1 FOR EACH ROW EXECUTE PROCEDURE utility.archive_record();
4) The end result:
DELETE from plant1 where p_item_no=578;
DELETE 1
The raw record in the delete table:
test=> SELECT * from plant1_delete;
-[ RECORD 1 ]------------------------------------------------------------------------------------------
del_id     | 6
record_fld | "color"=>NULL, "genus"=>"selaginella", "common"=>"moss little club", "season"=>NULL, "series"=>"", "species"=>"brownii", "variety"=>"", "p_item_no"=>"578", "ts_insert"=>"2004-01-01 00:00:00", "ts_update"=>"2005-11-21 10:45:30", "plant_note"=>NULL, "plant_type"=>"herb", "user_insert"=>"aklaver", "user_update"=>"aklaver", "category_type"=>"plant", "category_sub_type"=>"normal"
del_ts     | 2012-06-23 16:19:00.193426-07
del_user   | aklaver

The hstore field parsed out, with the record selected using a value in the hstore:
test=> SELECT (each(record_fld)).* from plant1_delete where record_fld -> 'p_item_no' = '578';

        key        |        value
-------------------+---------------------
 color             |
 genus             | selaginella
 common            | moss little club
 season            |
 series            |
 species           | brownii
 variety           |
 p_item_no         | 578
 ts_insert         | 2004-01-01 00:00:00
 ts_update         | 2005-11-21 10:45:30
 plant_note        |
 plant_type        | herb
 user_insert       | aklaver
 user_update       | aklaver
 category_type     | plant
 category_sub_type | normal
I can see this making life a lot easier. An added bonus is that psycopg2 (a Python database adapter) supports hstore and will adapt it to a Python dictionary and back again. That is a subject for another blog.
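As a quick sketch of that adaptation (the connection string is only an example for my test setup, and it assumes the hstore extension is installed in the database), psycopg2.extras.register_hstore makes hstore values come back as Python dicts:

from psycopg2 import connect
from psycopg2.extras import register_hstore

# Example connection; adjust dbname/host/user for your environment.
con = connect("dbname=test host=localhost user=aklaver")
cur = con.cursor()

# After registration, hstore columns are returned as Python dicts and
# dicts passed as query parameters are adapted to hstore values.
register_hstore(con)

cur.execute("SELECT record_fld FROM plant1_delete WHERE record_fld -> 'p_item_no' = '578'")
row = cur.fetchone()
print(row[0]['common'])   # 'moss little club'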

Making things easier

I have not trolled the source enough to know who specifically to thank, so a general thank you to whoever is responsible for \e (and \ef) in psql and the DO command. I have been working on a fairly complex plpythonu function in a 9.0 database and the combination of the aforementioned tools made the development much easier. The ability to test Python code snippets by writing DO blocks inside an editor inside psql is priceless.

PostgreSQL 9.1, first beta around the corner

Joshua Berkus recently gave a heads up on the Advocacy list about the forthcoming release of the first beta for PostgreSQL 9.1, tentatively scheduled for May 3. I have been following the development fairly closely, though given the number of changes, not in depth. This is a good time to point out the Waiting for... blog series from Hubert 'depesz' Lubaczewski found here. It covers many of the changes and uses examples to illustrate them, a very handy resource. So what is 9.1 offering? First, some of the headliners.

Synchronous replication: Extends the built-in replication to eliminate any lag between the master and standby databases. For those who absolutely need a 1:1 correspondence with their standbys.

Per-column collation: A different language for each column, the Tower of Babel realized:)

Unlogged tables: Permission is granted to run with scissors. You can now create tables that do not log to the Write Ahead Log (WAL). This makes data operations on the table faster, at the cost of data security. Data will not be recovered in the event of a crash, nor are unlogged tables replicated.

SE-Postgres: Speaking of security, Postgres will now have access to the SE-Linux framework.

The above are great and I will be exploring them at some point, but what follows are the new features I am really excited about.

PL/Python changes: Thanks to Jan Urbanski and others there have been a host of changes to the procedural language support for Python. To mention a few: table function support, better error handling, explicit subtransactions.

pg_basebackup: Thanks Magnus. A way of doing a base backup for replication using a Postgres connection. Very handy. Along with this comes the replication permission for non-superusers.

SQL/MED: Basically pulling external data sources into Postgres. The implementation for 9.1 is basic, but the potential is too great to ignore.

Extensions: An easier way to work with database plugins. Combined with the PGXN site, a definite plus.

I have just scratched the surface of what is new and improved in PostgreSQL 9.1; for a more in-depth look I suggest the latest alpha Release Notes. I mentioned specific people above because I have associated their names with particular features. Lest I forget, a big thank you to all those who have participated. There is much to be proud of in this release as in the previous ones.

Troubleshooting

Some thoughts on the art and science of troubleshooting, or how not to resort to the bigger hammer. What I offer up is based on my experience trying to solve problems across an array of disciplines, from truck repair to software debugging. There are common elements to all of them, listed below.

1) Work from the known to the unknown, not the other way around. This is best illustrated by working with electrical circuits. There are any number of reasons why electricity is not reaching the end of a circuit, and working backwards through the circuit often leads down false branches. It is a lot easier to start at the closest panel box and follow the circuit out until voltage is lost.

2) Keep notes. I am not talking about an encyclopedic compendium, something more along the lines of a bread crumb trail to keep from going in circles or treading down old paths. To extend the electrical example, was that the green wire with white trace or the white wire with green trace?

3) All information is important. I will not recount the number of times I ignored an error message because I 'knew' it was not important, much to my later chagrin.

4) KISS (Keep It Simple Stupid). The poor man's version of Occam's Razor. If the choice is between an alien gamma ray device flipping bits in the computer or poor quality memory chips, go with the latter.

5) First guess. This is the "you know more than you think" principle, also known as listening to the little voice. In the absence of a definite plan, go with the first thought that strikes you. I used to spend a lot of time fighting the little voice, until I realized it was the part of me that was actually paying attention.

6) Walk away. My dog has helped me solve many problems. I don't know how many times I took a break to walk the dog and had the answer present itself while the dog was leaving a message. Disconnecting from the problem is often a good way to make the connection.

SQL inside Jedit

I use Jedit as my primary text editing tool. I recently ran across a plugin (SQL) for the editor that allows for running SQL from within an editor buffer. What follows is an explanation of how to set it up and run it. Some background: Jedit is a Java application, so if you plan to use it you will need a JVM installed. I have found that to have it run well you need to use the Sun (Oracle) Java code.

Set up: Java runtime 1.6+, Jedit 4.3+.

Installing the SQL plugin from the Plugin Manager will install its dependencies, most notably Project Viewer. The set up for SQL has global and per-project parts. The global portion is set up through the Plugin Manager plugin options menu. By default SQL only returns 10 rows and that can be changed here. I have yet to find a way to get it to return with no limits. I would also suggest checking the Show Toolbar and Popup messages options. The most important global setting is the JDBC page, where you set up the path to the JDBC drivers you wish to use. SQL supports quite a few databases (Oracle, MySQL, PostgreSQL, DB2, Sybase, Microsoft SQL Server; see the help file), though notably not SQLite.

Project Viewer (PV) is installed as a dependency because it is used to create per-project settings for SQL. It is possible to use a PV project for just the SQL settings and not use the rest of the package, though I would suggest checking PV out. One note: PV seems to like to throw non-fatal errors on a regular basis. They seem to be related to redraw operations and so far have not caused me any problems. Once you have created a project you can go to the project SQL settings to enter the necessary information. See screenshot:

Once you have the server set up the fun starts. As an example I wrote a script to CREATE a table and another to populate it with data. If you look at the screenshot you will see the SQL toolbar I alluded to earlier; it starts with Database:. The second button to the right of the drop-down box is the Execute buffer command. Clicking on that causes the entire SQL script in the selected buffer to execute. The result is shown in the message box, in this case for the CREATE TABLE statement. I did this again on the buffer holding the INSERT commands.

So now we have a table with data, now what? See below. In the right buffer I have written a SELECT statement against the previously created table. In this case I could have done Execute buffer, but for illustration I used another method, Execute selection. This is the first button to the right of the drop-down list. It only runs whatever text is selected.

So what about variable substitution? In the first screenshot below I have used the Preprocessors drop-down list to select variable substitution. In the left buffer is another SELECT query, this time using the substitution variable '?'. The second screenshot shows the dialog box that pops up for data entry. For more than one variable the plugin will cycle through them and present a dialog for each.

One more interesting feature: SqlVFS. This allows one to browse the database through the Jedit File Browser. When you open the File Browser, select the Plugins dropdown and then Show databases. Double-clicking on the database allows you to start drilling down through the database structure.

In the short time I have been using this tool I have found it quite helpful, so I thought I would pass along my experience. It saves a round trip between the text editor and a terminal window.

PostgreSQL Roles

Today I got involved in a discussion on the General mailing list about using ROLE in Postgres. I realized this is a topic that can get confusing in a hurry and I thought I might try to clear things up, in my mind if nothing else. Part of the confusion arises from a historical carry over. In versions of Postgres prior to 8.1 there were USER and GROUP. (An aside: I found this out by using the Feature Matrix, a very handy tool.) These were distinct and operated somewhat like their OS equivalents, in that users logged in and then could be part of a group. In 8.1+, in other words all current community supported versions, USER and GROUP have been replaced by ROLE. The confusion arises because while the current underlying mechanism is based solely on ROLE, the USER and GROUP commands have been maintained for backwards compatibility. So when you use the SQL command CREATE USER in 8.1+ you are just creating a ROLE with the clause LOGIN, whereas CREATE GROUP is just CREATE ROLE, which by default has the clause NOLOGIN. For my own personal sanity I just use the CREATE ROLE command and pretend USER and GROUP do not exist.

Since we are already down the rabbit hole I might as well mention the psql \d commands that apply as well. Again for historical reasons these are \du and \dg, which at one time listed users and groups respectively. Now they both return the same thing, a list of the roles. The thing to remember is that unless you are working with a pre-8.1 database you are working with some variation of a ROLE.

So what does a ROLE get you? As was implied above, a ROLE can have the characteristics of a user or a group. For my own use I follow the pattern used by the current CREATE USER/GROUP commands, where a ROLE with LOGIN set is taken to be a user and one with NOLOGIN is equivalent to a group. Note: this is just a convenient way of viewing things and you will not see a ROLE defined that way when you list them. Assuming that pattern, it is possible to create group-like ROLEs and then assign user-like ROLEs membership in them. For instance:

'Groups':
CREATE ROLE accounting;
CREATE ROLE sales;
CREATE ROLE production;

'Users':
CREATE ROLE aklaver LOGIN IN ROLE accounting, sales;
CREATE ROLE daffy_duck LOGIN IN ROLE production;

If you want to change role membership after CREATE ROLE you will need to use the GRANT or REVOKE commands. Where this pays off is in object permissions. Give the 'group' roles permissions to their respective objects and then add or drop 'users' to the 'groups' as needed. This way you can maintain access flexibility without continually having to change permissions on objects. I have touched on just the bare basics; this is a subject that covers a lot of ground. To really learn it requires multiple passes through the documentation and plenty of hands-on testing.
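If you want to look at role membership from a script rather than psql, the same information \du and \dg read is available in the pg_roles and pg_auth_members system catalogs. A minimal psycopg2 sketch (the connection string is only an example; adjust it for your environment):

from psycopg2 import connect

# Example connection; point it at your own database.
con = connect("dbname=test host=localhost user=postgres")
cur = con.cursor()

# pg_auth_members links member roles to the roles they belong to;
# joining pg_roles twice resolves the oids to role names.
cur.execute("""
    SELECT g.rolname AS group_role,
           m.rolname AS member_role,
           m.rolcanlogin
    FROM pg_auth_members am
    JOIN pg_roles g ON g.oid = am.roleid
    JOIN pg_roles m ON m.oid = am.member
    ORDER BY g.rolname, m.rolname
""")
for group_role, member_role, can_login in cur.fetchall():
    print(group_role, member_role, can_login)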

First Postgres post

This is more of a test post than anything, a way of checking that I have my feed to Planet PostgreSQL set up right. The point being to have only Postgres-related posts come through, not my thoughts on cat juggling (thanks Steve Martin) or other such topics. Should the system work, I will follow with a post on streaming replication/Hot Standby/WAL archiving. As Bruce Momjian is aware, I have been experimenting with SR. That has led me to trying to figure out the relationships between the previously mentioned components. More thoughts on that later.