Building dynamic SQL using psycopg2

It has been a while since I posted anything. I thought I would start back with a software
package I use in many different situations, psycopg2. Psycopg2 is a Python database adapter for Postgres that follows the Python DB API. It is feature rich, and today I will introduce a feature new to the latest major release (2.7), namely the sql module for building complete SQL statements dynamically. I will give a quick rundown of the basics below. For the official docs on the module see:
http://initd.org/psycopg/docs/sql.html#module-psycopg2.sql.
For example data I will be using the Pagila database (derived from the MySQL
Sakila sample db) obtained from here.

Setting up imports:

from psycopg2 import connect, sql
from psycopg2.extras import RealDictCursor

Create connection and cursor:

# RealDictCursor returns each row as a dict keyed by column name.
con = connect("dbname=pagila host=localhost user=postgres",
              cursor_factory=RealDictCursor)
cur = con.cursor()

Build simple query string:

qry_str = sql.SQL("SELECT {}, {} FROM {}").format(
    sql.Identifier("customer_id"),
    sql.Identifier("first_name"),
    sql.Identifier("customer")
)
print(qry_str.as_string(con))
SELECT "customer_id", "first_name" FROM "customer"
cur.execute(qry_str)
rs = cur.fetchone()
rs
{'customer_id': 1, 'first_name': 'MARY'}

Breaking the above down: SQL is the class used to build the statement and, as such, has
a format() method that follows the Python str.format() method (see the psycopg2
docs linked above for the exceptions). Identifier is a class that handles strings to be used
as query identifiers, e.g. column/field names, as opposed to data values. In the
above, the '{}' placeholders are therefore replaced in order by the Identifier values passed to format().
To verify the resulting string you can use its as_string() method, provided you supply it a connection
or cursor object. The query string is then used in cur.execute() to fetch records
from the database. You can also build the query string directly in the execute()
to save a step, as shown below.
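
As a minimal sketch of that shortcut, using the same table and columns as above, the composed object can be passed straight to execute() without naming an intermediate variable:

cur.execute(
    sql.SQL("SELECT {}, {} FROM {}").format(
        sql.Identifier("customer_id"),
        sql.Identifier("first_name"),
        sql.Identifier("customer")
    )
)
rs = cur.fetchone()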

More compact/flexible string construction:

qry_str = sql.SQL("SELECT {} FROM {}").format(
    sql.SQL(",").join([sql.Identifier("customer_id"),
                       sql.Identifier("first_name")]),
    sql.Identifier("customer")
)

The above eliminates the need to provide a '{}' for each field in the SELECT
field list. Instead the .join() method of SQL is used to build a comma-separated
list of field names. This is helpful if you may be receiving a variable number of field names
for different iterations of the query. As an example where the contents of fld_list
may change:

fld_list = ["customer_id", "first_name", "last_name", "email"]
qry_str = sql.SQL("SELECT {} FROM {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer")
)
print(qry_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer"
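
To make the variable field list pattern reusable, here is a small hypothetical helper (the build_select name and signature are mine, not part of psycopg2) that composes a SELECT for any table and field list:

def build_select(fld_list, table_name):
    # Compose a SELECT statement for an arbitrary list of field names.
    return sql.SQL("SELECT {} FROM {}").format(
        sql.SQL(",").join(map(sql.Identifier, fld_list)),
        sql.Identifier(table_name)
    )

cur.execute(build_select(["customer_id", "email"], "customer"))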

Generating placeholders in the query: the sql module supports two types of placeholders,
%s and %(name)s. For %s placeholders a sequence of values needs to be provided,
e.g. a list or tuple. For named placeholders a dictionary is used to supply the values.

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder()
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %s
cur.execute(placeholder_str, ["MARY"])
rs = cur.fetchone()
rs
{'customer_id': 1,
 'email': 'MARY.SMITH@sakilacustomer.org',
 'first_name': 'MARY',
 'last_name': 'SMITH'}

In the above, %s is created as the placeholder for the first_name field value in the
WHERE clause using the Placeholder class. To create a named placeholder instead, the procedure is:

placeholder_str = sql.SQL("SELECT {} FROM {} WHERE first_name = {}").format(
    sql.SQL(",").join(map(sql.Identifier, fld_list)),
    sql.Identifier("customer"),
    sql.Placeholder(name="first_name")
)
print(placeholder_str.as_string(con))
SELECT "customer_id","first_name","last_name","email" FROM "customer" WHERE first_name = %(first_name)s
cur.execute(placeholder_str, {"first_name": "MARY"})

Where this leads to is building a query string from data sourced from a Python dictionary:

new_customer_list = [
    {"store_id": 1, "first_name": "ADRIAN", "last_name": "KLAVER",
     "email": "ADRIAN.KLAVER@sakilacustomer.org", "address_id": 67},
    {"store_id": 2, "first_name": "JANE", "last_name": "DOE",
     "email": "JANE.DOE@sakilacustomer.org", "address_id": 7},
    {"store_id": 1, "first_name": "WHO", "last_name": "KNOWS",
     "email": "WHO.KNOWS@sakilacustomer.org", "address_id": 189}
]

# Pull the insert field names from the first dict; all the dicts share the same keys.
insert_flds = list(new_customer_list[0].keys())

insert_str = sql.SQL("INSERT INTO customer ({}) VALUES ({})").format(
    sql.SQL(",").join(map(sql.Identifier, insert_flds)),
    sql.SQL(",").join(map(sql.Placeholder, insert_flds))
)
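
For reference, printing insert_str the same way as before shows the generated statement; on Python 3.7+, where dicts preserve insertion order, it comes out roughly as:

print(insert_str.as_string(con))
INSERT INTO customer ("store_id","first_name","last_name","email","address_id") VALUES (%(store_id)s,%(first_name)s,%(last_name)s,%(email)s,%(address_id)s)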

from psycopg2.extras import execute_batch

execute_batch(cur, insert_str, new_customer_list)

I snuck in another new feature from psycopg2 2.7, execute_batch(). This is a way of batching statements for fewer round trips to the server; by default it groups 100 statements per round trip (the page_size argument). For this example it is not really a win, but it is nice to know it is out there. For cases where I have a lot of data to transfer to a table I use psycopg2's COPY features whenever possible. That is a subject for another day though.
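
One detail the snippet above leaves implicit: psycopg2 does not autocommit by default, so the batched inserts only become permanent once the transaction is committed. A short sketch, continuing from the objects above:

# Commit the transaction so the batched inserts persist, then clean up.
con.commit()
cur.close()
con.close()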