DBIx::Connector and Serializable Snapshot Isolation

I was at Postgres Open week before last. This was a great conference, very welcoming atmosphere and lots of great talks. One of the more significant, for me, was the session on serializable transactions by Kevin Grittner, who developed SSI for PostgreSQL 9.1. I hadn’t paid much attention to this feature before now, but it became clear to me, during the talk, that it’s time.

So what is SSI? Well, serializable transactions are almost certainly how you think of transactions already. Here’s how Kevin describes them:

True serializable transactions can simplify software development. Because any transaction which will do the right thing if it is the only transaction running will also do the right thing in any mix of serializable transactions, the programmer need not understand and guard against all possible conflicts. If this feature is used consistently, there is no need to ever take an explicit lock or SELECT FOR UPDATE/SHARE.

This is, in fact, generally how I’ve thought about transactions. But I’ve certainly run into cases where it wasn’t true. Back in 2006, I wrote an article on managing many-to-many relationships with PL/pgSQL which demonstrated a race condition one might commonly find when using an ORM. The solution I offered was to always use a PL/pgSQL function that does the work, and that function executes a SELECT...FOR UPDATE statement to overcome the race condition. This creates a lock that forces conflicting transactions to be performed serially.

Naturally, this is something one would rather not have to think about. Hence SSI. When you identify a transaction as serializable, it will be executed in a truly serializable fashion. So I could actually do away with the SELECT...FOR UPDATE workaround — not to mention any other race conditions I might have missed — simply by telling PostgreSQL to enforce transaction isolation. This essentially eliminates the possibility of unexpected side-effects.

This comes at a cost, however. Not in terms of performance so much, since the SSI implementation uses some fancy, recently-developed algorithms to keep things efficient. (Kevin tells me via IRC: “Usually the rollback and retry work is the bulk of the additional cost in an SSI load, in my testing so far. A synthetic load to really stress the LW locking, with a fully-cached database doing short read-only transactions will have no serialization failures, but can run up some CPU time in LW lock contention.”) No, the cost is actually in increased chance of transaction rollback. Because SSI will catch more transaction conflicts than the traditional “read committed” isolation level, frameworks that expect to work with SSI need to be prepared to handle more transaction failures. From the fine manual:

The Serializable isolation level provides the strictest transaction isolation. This level emulates serial transaction execution, as if transactions had been executed one after another, serially, rather than concurrently. However, like the Repeatable Read level, applications using this level must be prepared to retry transactions due to serialization failures.

And that brings me to DBIx::Connector, my Perl module for safe connection and transaction management. It currently has no such retry smarts built into it. The feature closest to that is the “fixup” connection mode, wherein if a execution of a code block fails due to a connection failure, DBIx::Connector will re-connect to the database and execute the code reference again.

I think I should extend DBIx::Connector to take isolation failures and deadlocks into account. That is, fixup mode would retry a code block not only on connection failure but also on serialization failure (SQLSTATE 40001) and deadlocks (SQLSTATE 40P01). I would also add a new attribute, retries, to specify the number of times to retry such execution, with a default of three (which likely will cover the vast majority of cases). This has actually been an oft-requested feature, and I’m glad to have a new reason to add it.

There are a few design issues to overcome, however:

  • Fixup mode is supported not just by txn(), which scopes the execution of a code reference to a single transaction, but also run(), which does no transaction handling. Should the new retry support be added there, too? I could see it either way (a single SQL statement executed in run() is implicitly transaction-scoped).
  • Fixup mode is also supported by svp(), which scopes the execution of a code reference to a savepoint (a.k.a. a subtransaction). Should the rollback and retry be supported there, too, or would the whole transaction have to be retried? I’m thinking the latter, since that’s currently the behavior for connection failures.
  • Given these issues, will it make more sense to perhaps create a new mode? Maybe it would be supported only by txn().

This is do-able, will likely just take some experimentation to figure it out and settle on the appropriate API. I’ll need to find the tuits for that soon.

In the meantime, given currently in-progress changes, I’ve just released a new version of DBIx::Connector with a single change: All uses of the deprecated catch syntax now throw warnings. The previous version threw warnings only the first time the syntax was used in a particular context, to keep error logs from getting clogged up. Hopefully most folks have changed their code in the two months since the previous release and switched to Try::Tiny or some other model for exception handling. The catch syntax will be completely removed in the next release of DBIx::Connector, likely around the end of the year. Hopefully the new SSI-aware retry functionality will have been integrated by then, too.

In a future post I’ll likely chew over whether or not to add an API to set the transaction isolation level within a call to txn() and friends.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Fixing Foreign Key Deadlocks in PostgreSQL

PGX had a client come to us recently with a rather nasty deadlock issue. It took far longer than we would have liked to figure out the issue, and once we did, they were able to clear it up by dropping an unnecessary index. Still, it shouldn’t have been happening to begin with. Joel Jacobson admirably explained the issue on pgsql-hackers (and don’t miss the screencast).

Some might consider it a bug in PostgreSQL, but the truth is that PostgreSQL can obtain stronger than necessary locks. Such locks cause some operations to block unnecessarily and some other operations to deadlock, especially when foreign keys are used in a busy database. And really, who doesn’t use FKs in their busy database?

Fortunately, Simon Riggs proposed a solution. And it’s a good one. So good that PGX is partnering with Glue Finance and Command Prompt as founding sponsors on a new FOSSExperts project to actually get it done. Álvaro Herrera is doing the actual hacking on the project, and has already blogged about it here and here.

If you use foreign key constraints (and you should!) and you have a high transaction load on your database (or expect to soon!), this matters to you. In fact, if you use ActiveRecord with Rails, there might even be a special place in your heart for this issue, says Mina Naguib. We’d really like to get this done in time for the PostgreSQL 9.1 release. But it will only happen if the project can be funded.

Yes, that’s right, as with PGXN, this is community project for which we’re raising funds from the community to get it done. I think that more and more work could be done this way, as various interested parties contribute small amounts to collectively fund improvements to the benefit of us all. So can you help out? Hit the FOSSExperts project page for all the project details, and to make your contribution.

Help us help the community to make PostgreSQL better than ever!

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Managing Key/Value Pairs in PostgreSQL

Let’s say that you’ve been following the latest research in key/value data storage and are interested in managing such data in a PostgreSQL database. You want to have functions to store and retrieve pairs, but there is no natural way to represent pairs in SQL. Many languages have hashes or or data dictionaries to fulfill this role, and you can pass them to functional interfaces. SQL’s got nothin’. In PostgreSQL, have two options: use nested arrays (simple, fast) or use a custom composite data type (sugary, legible).

Let’s assume you have this table for storing your pairs:

CREATE TEMPORARY TABLE kvstore (
    key        TEXT PRIMARY KEY,
    value      TEXT,
    expires_at TIMESTAMPTZ DEFAULT NOW() + '12 hours'::interval
);

To store pairs, you can use nested arrays like so:

SELECT store(ARRAY[ ['foo', 'bar'], ['baz', 'yow'] ]);

Not too bad, and since SQL arrays are a core feature of PostgreSQL, there’s nothing special to do. Here’s the store() function:

CREATE OR REPLACE FUNCTION store(
    params text[][]
) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1 .. array_upper(params, 1) LOOP
        UPDATE kvstore
           SET value      = params[i][2],
               expires_at = NOW() + '12 hours'::interval
         WHERE key        = param[i][1];
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value)
        VALUES (params[i][1], params[i][2]);
    END LOOP;
END;
$$;

I’ve seen worse. The trick is to iterate over each nested array, try an update for each, and insert when no row is updated. Alas, you have no control over how many elements a user might include in a nested array. One might call it as:

SELECT store(ARRAY[ ['foo', 'bar', 'baz'] ]);

Or:

SELECT store(ARRAY[ ['foo'] ]);

No errors will be thrown in either case. In the first the "baz" will be ignored, and in the second the value will default to NULL. If you really didn’t like these behaviors, you could add some code to throw an exception if array_upper(params, 2) returns anything other than 2.

Let’s look at fetching values for keys. PostgreSQL 8.4 added variadic function arguments, so it’s easy to provide a nice interface for retrieving one or more values. The obvious one fetches a single value:

CREATE OR REPLACE FUNCTION getval(
    text
) RETURNS TEXT LANGUAGE SQL AS $$
    SELECT value FROM kvstore WHERE key = $1;
$$;

Nice and simple:

SELECT getval('baz');

 getval 
--------’
 yow

The variadic version looks like this:

CREATE OR REPLACE FUNCTION getvals(
    variadic text[]
) RETURNS SETOF text LANGUAGE SQL AS $$
    SELECT value
      FROM kvstore
      JOIN (SELECT generate_subscripts($1, 1)) AS f(i)
        ON kvstore.key = $1[i]
     ORDER BY i;
$$;

Note the use of ORDER BY i to ensure that the values are returned in the same order as the keys are passed to the function. So if I’ve got the key/value pairs ’foo’ => ‘bar’ and ’baz’ => ‘yow’, the output is:

SELECT * FROM getvals('foo', 'baz');

 getvals 
---------
 bar
 yow

If we want to the rows to have the keys and values together, we can return them as arrays, like so:

CREATE OR REPLACE FUNCTION getpairs(
    variadic text[]
) RETURNS SETOF text[] LANGUAGE SQL AS $$
    SELECT ARRAY[key, value]
      FROM kvstore
      JOIN unnest($1) AS k ON kvstore.key = k
$$;

Here I’m assuming that order isn’t important, which means we can use unnest to "flatten" the array, instead of the slightly more baroque generate_subscripts() with array access. The output:

SELECT * FROM getpairs('foo', 'baz');

  getpairs   
-------------
 {baz,yow}
 {foo,bar}

Now, this is good as far as it goes, but the use of nested arrays to represent key/value pairs is not exactly ideal: just looking at the use of a function, there’s nothing to indicate that you’re using key/value pairs. What would be ideal is to use row constructors to pass arbitrary pairs:

SELECT store( ROW('foo', 'bar'), ROW('baz', 42) );

Alas, one cannot pass RECORD values (the data type returned by ROW()) to non-C functions in PostgreSQL.1 But if you don’t mind your keys and values always being TEXT, we can get almost all the way there by creating an "ordered pair" data type as a composite type like so:

CREATE TYPE pair AS ( k text, v text );

Then we can create store() with a signature of VARIADIC pair[] and pass in any number of these suckers:

CREATE OR REPLACE FUNCTION store(
    params variadic pair[]
) RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
    param pair;
BEGIN
    FOR param IN SELECT * FROM unnest(params) LOOP
        UPDATE kvstore
           SET value = param.v,
               expires_at = NOW() + '12 hours'::interval
         WHERE key = param.k;
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value) VALUES (param.k, param.v);
    END LOOP;
END;
$$;

Isn’t it nice how we can access keys and values as param.k and param.v? Call the function like this:

SELECT store( ROW('foo', 'bar')::pair, ROW('baz', 'yow')::pair );

Of course, that can get a bit old, casting to pair all the time, so let’s create some pair constructor functions to simplify things:

CREATE OR REPLACE FUNCTION pair(anyelement, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(anyelement, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';

I’ve created four variants here to allow for the most common combinations of types. So any of the following will work:

SELECT pair('foo', 'bar');
SELECT pair('foo', 1);
SELECT pair(12.3, 'foo');
SELECT pair(1, 43);

Alas, you can’t mix any other types, so this will fail:

SELECT pair(1, 12.3);

ERROR:  function pair(integer, numeric) does not exist
LINE 1: SELECT pair(1, 12.3);

We could create a whole slew of additional constructors, but since we’re using a key/value store, it’s likely that the keys will usually be text anyway. So now we can call store() like so:

SELECT store( pair('foo', 'bar'), pair('baz', 'yow') );

Better, eh? Hell, we can go all the way and create a nice binary operator to make it still more sugary. Just map each of the pair functions to the operator like so:

CREATE OPERATOR -> (
    LEFTARG   = text,
    RIGHTARG  = anyelement,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = anyelement,
    RIGHTARG  = text,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = anyelement,
    RIGHTARG  = anyelement,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = text,
    RIGHTARG  = text,
    PROCEDURE = pair
);

Looks like a lot of repetition, I know, but checkout the new syntax:

SELECT store( 'foo' -> 'bar', 'baz' -> 1 );

Cute, eh? I chose to use -> because => is deprecated as an operator in PostgreSQL 9.0: SQL 2011 reserves that operator for named parameter assignment.2

As a last twist, let’s rewrite getpairs() to return pairs instead of arrays:

CREATE OR REPLACE FUNCTION getpairs(
    variadic text[]
) RETURNS SETOF pair LANGUAGE SQL AS $$
    SELECT key -> value
      FROM kvstore
      JOIN unnest($1) AS k ON kvstore.key = k
$$;

Cute, eh? Its use is just like before, only now the output is more table-like:

SELECT * FROM getpairs('foo', 'baz');

  k  |   v   
-----+-------
 baz | yow
 foo | bar

You can also get them back as composites by omitting * FROM:

SELECT getpairs('foo', 'baz');

  getpairs   
-------------
 (foo,bar)
 (baz,yow)

Anyway, just something to consider the next time you need a function that allows any number of key/value pairs to be passed. It’s not perfect, but it’s pretty sweet.


  1. In the recent pgsql-hackers discussion that inspired this post, Pavel Stehule suggested adding something like Oracle COLLECTIONs to address this shortcoming. I don’t know how far this idea will get, but it sure would be nice to be able to pass objects with varying kinds of data, rather than be limited to data all of one type (values in an SQL array must all be of the same type).

  2. No, you won’t be able to use named parameters for this application because named parameters are inherently non-variadic. That is, you can only pre-declare so many named parameters: you can’t anticipate every parameter that’s likely to be wanted as a key in our key/value store.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

PGXN Blog and Twitterstream

I crated the PGXN Blog yesterday. Tune in there for news and announcements. I’ll also be posting status reports once development gets underway, so that all you fans out there can follow my progress. Once the site is done (or at 1.0 anyway), the blog will be used for announcements, discussion of support issues, etc. So tune in!

Oh, and I created a PGXN Twitterstream, too. You should follow it! New blog posts will be tweeted, and once the site gets going, new uploads will be tweeted, too. Check it out!

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

PGXN Development Project

I’m pleased to announce the launch of the PGXN development project. I’ve written a detailed specification and pushed it through general approval on pgsql-hackers. I’ve written up a detailed project plan and estimated things at a highly reduced PostgreSQL Experts rate to come up with a fundraising goal: $25,000. And now, thanks to founding contributions from myYearbook.com, and PostgreSQL Experts, we have started the fundraising phase of the project.

So what’s this all about? PGXN, the PostgreSQL Extension Network, is modeled on CPAN, the Perl community’s archive of “all things Perl.” PGXN will provide four major pieces of infrastructure to the PostgreSQL community:

I’ve been wanting to start this project for a long time, but given my need to pay the bills, it didn’t seem like I’d ever be able to find the time for it. Then Josh Berkus suggested that we try to get community interest and raise money for me to have the time to work on it. So I jumped on that, putting in the hours needed to get general approval from the core PostgreSQL developers and to create a reasonable project plan and web site. And thanks to MyYearook’s and PGX’s backing, I’m really excited about it. I hope to start on it in August.

If you’d like to contribute, first: Thank You!. The PGXN site has a Google Checkout widget that makes it easy to make a donation. If you’d rather pay by some other means (checks are great for us!), drop me a line and we’ll work something out. We have a few levels of contribution as well, including permanent linkage on the PGXN site for your organization, as well as the usual t-shirts launch party invitations.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

PGAN Bikeshedding

I’ve put together a description of PGAN, the PostgreSQL extension distribution system I plan to develop later this year based on the Comprehensive Archive Perl Network or CPAN. Its primary features will be:

  • Extension distribution
  • Search site with extension documentation
  • Client for downloading, building, testing, and installing extensions.

I’ve never been thrilled with the name, though, so I’m asking for suggestions for a better one. I’ve used the term "extension" here because it seems to be the term that the PostgreSQL community has settled on, but other terms might work, since things other than extensions might be distributed.

What I’ve come up with so far is:

Name Long Name Pronounciation Advantages Disadvantages
PGAN PostgreSQL Add-on Network pee-gan Short, similar to CPAN Ugly
PGEX PostgreSQL Extensions pee-gee-ex or pee-gex Short, easier to pronounce Too similar to PGX)
PGCAN PostgreSQL Comprehensive Archive Network pee-gee-can Similar to CPAN Similar to CPAN
PGDAN PostgreSQL Distribution Archive Network pee-gee-dan Short, easy to pronounce Who’s “Dan”? Doesn’t distribute PostgreSQL itself.
PGEDAN PostgreSQL Extension Distribution Archive Network pee-gee-ee-dan References extensions Long, sounds stupid

Of these, I think I like “PGEX” best, but none are really great. So I’m opening up the bike shed to all. What’s a better name? Or if you can’t think of one, which of the above do you like best? Just leave a comment on this post. The only requirements for suggestions are that a .org domain be available and that it suck less than the alternatives.

Comments close in 2 weeks. Thanks!

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Execute SQL Code on Connect

I’ve been writing a fair bit of PL/Perl for a client, and one of the things I’ve been doing is eliminating a ton of duplicate code by creating utility functions in the %_SHARED hash. This is great, as long as the code that creates those functions gets executed at the beginning of every database connection. So I put the utility generation code into a single function, called prepare_perl_utils(). It looks something like this:

CREATE OR REPLACE FUNCTION prepare_perl_utils(
) RETURNS bool LANGUAGE plperl IMMUTABLE AS $$
    # Don't bother if we've already loaded.
    return 1 if $_SHARED{escape_literal};

    $_SHARED{escape_literal} = sub {
        $_[0] =~ s/'/''/g; $_[0] =~ s/\\/\\\\/g; $_[0];
    };

    # Create other code refs in %_SHARED…
$$;

So now all I have to do is make sure that all the client’s apps execute this function as soon as they connect, so that the utilities will all be loaded up and ready to go. Here’s how I did it.

First, for the Perl app, I just took advantage of the DBI’s callbacks to execute the SQL I need when the DBI connects to the database. That link might not work just yet, as the DBI’s callbacks have only just been documented and that documentation appears only in dev releases so far. Once 1.611 drops, the link should work. At any rate, the use of callbacks I’m exploiting here has been in the DBI since 1.49, which was released in November 2005.

The approach is the same as I’ve described before: Just specify the Callbacks parameter to DBI->connect, like so:

my $dbh = DBI->connect_cached($dsn, $user, $pass, {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    Callbacks      => {
        connected => sub { shift->do('SELECT prepare_perl_utils()' },
    },
});

That’s it. The connected method is a no-op in the DBI that gets called to alert subclasses that they can do any post-connection initialization. Even without a subclass, we can take advantage of it to do our own initialization.

It was a bit trickier to make the same thing happen for the client’s Rails app. Rails, alas, provides no on-connection callbacks. So we instead have to monkey-patch Rails to do what we want. With some help from “dfr|mac” on #rubyonrails (I haven’t touched Rails in 3 years!), I got it worked down to this:

class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
  def initialize_with_perl_utils(*args)
    returning(initialize_without_perl_utils(*args)) do
      execute('SELECT prepare_perl_utils()')
    end
  end
  alias_method_chain :initialize, :perl_utils
end

Basically, we overpower the PostgreSQL adapter’s initialize method and have it call initialize_with_perl_utils before it returns. It’s a neat trick; if you’re going to practice fuck typing, alias_method_chain makes it about as clean as can be, albeit a little too magical for my tastes.

Anyway, recorded here for posterity (my blog is my other brain!).

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Powered by KinoSearch