Managing Key/Value Pairs in PostgreSQL

Let’s say that you’ve been following the latest research in key/value data storage and are interested in managing such data in a PostgreSQL database. You want to have functions to store and retrieve pairs, but there is no natural way to represent pairs in SQL. Many languages have hashes or dictionaries to fulfill this role, and you can pass them to functional interfaces. SQL’s got nothin’. In PostgreSQL, you have two options: use nested arrays (simple, fast) or use a custom composite data type (sugary, legible).

Let’s assume you have this table for storing your pairs:

CREATE TEMPORARY TABLE kvstore (
    key        TEXT PRIMARY KEY,
    value      TEXT,
    expires_at TIMESTAMPTZ DEFAULT NOW() + '12 hours'::interval
);

To store pairs, you can use nested arrays like so:

SELECT store(ARRAY[ ['foo', 'bar'], ['baz', 'yow'] ]);

Not too bad, and since SQL arrays are a core feature of PostgreSQL, there’s nothing special to do. Here’s the store() function:

CREATE OR REPLACE FUNCTION store(
    params text[][]
) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
    FOR i IN 1 .. array_upper(params, 1) LOOP
        UPDATE kvstore
           SET value      = params[i][2],
               expires_at = NOW() + '12 hours'::interval
         WHERE key        = params[i][1];
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value)
        VALUES (params[i][1], params[i][2]);
    END LOOP;
END;
$$;

I’ve seen worse. The trick is to iterate over each nested array, try an update for each, and insert when no row is updated. Alas, you have no control over how many elements a user might include in a nested array. One might call it as:

SELECT store(ARRAY[ ['foo', 'bar', 'baz'] ]);

Or:

SELECT store(ARRAY[ ['foo'] ]);

No errors will be thrown in either case. In the first the "baz" will be ignored, and in the second the value will default to NULL. If you really didn’t like these behaviors, you could add some code to throw an exception if array_upper(params, 2) returns anything other than 2.
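
Here’s a rough sketch of what such a check might look like. The name store_strict() is made up for illustration, it redeclares the kvstore table only so the example stands on its own, and it assumes the arrays use the default lower bound of 1 (which is what the ARRAY[] constructor produces):

```sql
-- Same table as above, recreated only if it is not already there.
CREATE TEMPORARY TABLE IF NOT EXISTS kvstore (
    key        TEXT PRIMARY KEY,
    value      TEXT,
    expires_at TIMESTAMPTZ DEFAULT NOW() + '12 hours'::interval
);

-- Hypothetical stricter variant of store(): rejects anything that is not
-- a two-dimensional array whose inner arrays have exactly two elements.
CREATE OR REPLACE FUNCTION store_strict(
    params text[][]
) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
    -- array_ndims() returns NULL for NULL or empty arrays, so those are
    -- rejected here as well.
    IF array_ndims(params) IS DISTINCT FROM 2
       OR array_upper(params, 2) <> 2
    THEN
        RAISE EXCEPTION 'store_strict() expects an array of [key, value] pairs';
    END IF;

    -- From here on, the same update-then-insert loop as store().
    FOR i IN 1 .. array_upper(params, 1) LOOP
        UPDATE kvstore
           SET value      = params[i][2],
               expires_at = NOW() + '12 hours'::interval
         WHERE key        = params[i][1];
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value)
        VALUES (params[i][1], params[i][2]);
    END LOOP;
END;
$$;
```

With that in place, the two-element call works as before, while the three-element and one-element calls shown above should raise an exception instead of silently dropping or defaulting values.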

Let’s look at fetching values for keys. PostgreSQL 8.4 added variadic function arguments, so it’s easy to provide a nice interface for retrieving one or more values. The obvious one fetches a single value:

CREATE OR REPLACE FUNCTION getval(
    text
) RETURNS TEXT LANGUAGE SQL AS $$
    SELECT value FROM kvstore WHERE key = $1;
$$;

Nice and simple:

SELECT getval('baz');

 getval 
--------
 yow

The variadic version looks like this:

CREATE OR REPLACE FUNCTION getvals(
    variadic text[]
) RETURNS SETOF text LANGUAGE SQL AS $$
    SELECT value
      FROM kvstore
      JOIN (SELECT generate_subscripts($1, 1)) AS f(i)
        ON kvstore.key = $1[i]
     ORDER BY i;
$$;

Note the use of ORDER BY i to ensure that the values are returned in the same order as the keys are passed to the function. So if I’ve got the key/value pairs 'foo' => 'bar' and 'baz' => 'yow', the output is:

SELECT * FROM getvals('foo', 'baz');

 getvals 
---------
 bar
 yow

If we want the rows to have the keys and values together, we can return them as arrays, like so:

CREATE OR REPLACE FUNCTION getpairs(
    variadic text[]
) RETURNS SETOF text[] LANGUAGE SQL AS $$
    SELECT ARRAY[key, value]
      FROM kvstore
      JOIN unnest($1) AS k ON kvstore.key = k
$$;

Here I’m assuming that order isn’t important, which means we can use unnest to "flatten" the array, instead of the slightly more baroque generate_subscripts() with array access. The output:

SELECT * FROM getpairs('foo', 'baz');

  getpairs   
-------------
 {baz,yow}
 {foo,bar}
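
If order did matter here, PostgreSQL 9.4 and later let unnest() carry each element’s position along via WITH ORDINALITY, so the join can stay nearly this simple. A minimal sketch, assuming such a release; getpairs_ordered() and the key_name and ord aliases are names made up for illustration, and the table is redeclared only so the example stands on its own:

```sql
-- Same table as above, recreated only if it is not already there.
CREATE TEMPORARY TABLE IF NOT EXISTS kvstore (
    key        TEXT PRIMARY KEY,
    value      TEXT,
    expires_at TIMESTAMPTZ DEFAULT NOW() + '12 hours'::interval
);

-- Hypothetical ordered variant of getpairs(): WITH ORDINALITY adds a
-- second column holding each key's position in the argument list.
CREATE OR REPLACE FUNCTION getpairs_ordered(
    variadic text[]
) RETURNS SETOF text[] LANGUAGE SQL AS $$
    SELECT ARRAY[key, value]
      FROM kvstore
      JOIN unnest($1) WITH ORDINALITY AS k(key_name, ord)
        ON kvstore.key = k.key_name
     ORDER BY k.ord;
$$;
```

Calling SELECT * FROM getpairs_ordered('foo', 'baz') should then return the foo pair before the baz pair, matching the order of the arguments.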

Now, this is good as far as it goes, but the use of nested arrays to represent key/value pairs is not exactly ideal: just looking at the use of a function, there’s nothing to indicate that you’re using key/value pairs. What would be ideal is to use row constructors to pass arbitrary pairs:

SELECT store( ROW('foo', 'bar'), ROW('baz', 42) );

Alas, one cannot pass RECORD values (the data type returned by ROW()) to non-C functions in PostgreSQL.[1] But if you don’t mind your keys and values always being TEXT, we can get almost all the way there by creating an "ordered pair" data type as a composite type like so:

CREATE TYPE pair AS ( k text, v text );

Then we can create store() with a signature of VARIADIC pair[] and pass in any number of these suckers:

CREATE OR REPLACE FUNCTION store(
    params variadic pair[]
) RETURNS VOID LANGUAGE plpgsql AS $$
DECLARE
    param pair;
BEGIN
    FOR param IN SELECT * FROM unnest(params) LOOP
        UPDATE kvstore
           SET value = param.v,
               expires_at = NOW() + '12 hours'::interval
         WHERE key = param.k;
        CONTINUE WHEN FOUND;
        INSERT INTO kvstore (key, value) VALUES (param.k, param.v);
    END LOOP;
END;
$$;

Isn’t it nice how we can access keys and values as param.k and param.v? Call the function like this:

SELECT store( ROW('foo', 'bar')::pair, ROW('baz', 'yow')::pair );

Of course, that can get a bit old, casting to pair all the time, so let’s create some pair constructor functions to simplify things:

CREATE OR REPLACE FUNCTION pair(anyelement, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(anyelement, anyelement)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair';

CREATE OR REPLACE FUNCTION pair(text, text)
RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;';

I’ve created four variants here to allow for the most common combinations of types. So any of the following will work:

SELECT pair('foo', 'bar');
SELECT pair('foo', 1);
SELECT pair(12.3, 'foo');
SELECT pair(1, 43);

Alas, you can’t mix any other types, so this will fail:

SELECT pair(1, 12.3);

ERROR:  function pair(integer, numeric) does not exist
LINE 1: SELECT pair(1, 12.3);

We could create a whole slew of additional constructors, but since we’re using a key/value store, it’s likely that the keys will usually be text anyway. So now we can call store() like so:

SELECT store( pair('foo', 'bar'), pair('baz', 'yow') );

Better, eh? Hell, we can go all the way and create a nice binary operator to make it still more sugary. Just map each of the pair functions to the operator like so:

CREATE OPERATOR -> (
    LEFTARG   = text,
    RIGHTARG  = anyelement,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = anyelement,
    RIGHTARG  = text,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = anyelement,
    RIGHTARG  = anyelement,
    PROCEDURE = pair
);

CREATE OPERATOR -> (
    LEFTARG   = text,
    RIGHTARG  = text,
    PROCEDURE = pair
);

Looks like a lot of repetition, I know, but check out the new syntax:

SELECT store( 'foo' -> 'bar', 'baz' -> 1 );

Cute, eh? I chose to use -> because => is deprecated as an operator in PostgreSQL 9.0: SQL 2011 reserves that operator for named parameter assignment.[2]

As a last twist, let’s rewrite getpairs() to return pairs instead of arrays:

CREATE OR REPLACE FUNCTION getpairs(
    variadic text[]
) RETURNS SETOF pair LANGUAGE SQL AS $$
    SELECT key -> value
      FROM kvstore
      JOIN unnest($1) AS k ON kvstore.key = k
$$;

Cute, eh? Its use is just like before, only now the output is more table-like:

SELECT * FROM getpairs('foo', 'baz');

  k  |   v   
-----+-------
 baz | yow
 foo | bar

You can also get them back as composites by omitting * FROM:

SELECT getpairs('foo', 'baz');

  getpairs   
-------------
 (foo,bar)
 (baz,yow)

Anyway, just something to consider the next time you need a function that allows any number of key/value pairs to be passed. It’s not perfect, but it’s pretty sweet.
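
One more thing to consider: on PostgreSQL 9.5 or later, the update-then-insert loop in store() could probably be collapsed into a single INSERT ... ON CONFLICT statement. What follows is only a sketch under that assumption, using the nested-array signature so it stands on its own; store_upsert() is a made-up name, and the table is redeclared just for the example:

```sql
-- Same table as above, recreated only if it is not already there.
CREATE TEMPORARY TABLE IF NOT EXISTS kvstore (
    key        TEXT PRIMARY KEY,
    value      TEXT,
    expires_at TIMESTAMPTZ DEFAULT NOW() + '12 hours'::interval
);

-- Hypothetical single-statement variant of store(). Requires 9.5+.
CREATE OR REPLACE FUNCTION store_upsert(
    params text[][]
) RETURNS VOID LANGUAGE SQL AS $$
    -- generate_subscripts() yields one row per inner [key, value] array.
    INSERT INTO kvstore (key, value)
    SELECT params[i][1], params[i][2]
      FROM generate_subscripts(params, 1) AS i
    ON CONFLICT (key) DO UPDATE
       SET value      = EXCLUDED.value,
           expires_at = NOW() + '12 hours'::interval;
$$;
```

Unlike the loop, this version raises an error if the same key appears twice in one call, because ON CONFLICT DO UPDATE refuses to touch the same row twice within a single statement.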


  1. In the recent pgsql-hackers discussion that inspired this post, Pavel Stehule suggested adding something like Oracle COLLECTIONs to address this shortcoming. I don’t know how far this idea will get, but it sure would be nice to be able to pass objects with varying kinds of data, rather than be limited to data all of one type (values in an SQL array must all be of the same type).

  2. No, you won’t be able to use named parameters for this application because named parameters are inherently non-variadic. That is, you can only pre-declare so many named parameters: you can’t anticipate every parameter that’s likely to be wanted as a key in our key/value store.

PGXN Blog and Twitterstream

I created the PGXN Blog yesterday. Tune in there for news and announcements. I’ll also be posting status reports once development gets underway, so that all you fans out there can follow my progress. Once the site is done (or at 1.0 anyway), the blog will be used for announcements, discussion of support issues, etc. So tune in!

Oh, and I created a PGXN Twitterstream, too. You should follow it! New blog posts will be tweeted, and once the site gets going, new uploads will be tweeted, too. Check it out!

PGXN Development Project

I’m pleased to announce the launch of the PGXN development project. I’ve written a detailed specification and pushed it through general approval on pgsql-hackers. I’ve written up a detailed project plan and estimated things at a highly reduced PostgreSQL Experts rate to come up with a fundraising goal: $25,000. And now, thanks to founding contributions from myYearbook.com and PostgreSQL Experts, we have started the fundraising phase of the project.

So what’s this all about? PGXN, the PostgreSQL Extension Network, is modeled on CPAN, the Perl community’s archive of “all things Perl.” PGXN will provide four major pieces of infrastructure to the PostgreSQL community:

I’ve been wanting to start this project for a long time, but given my need to pay the bills, it didn’t seem like I’d ever be able to find the time for it. Then Josh Berkus suggested that we try to get community interest and raise money for me to have the time to work on it. So I jumped on that, putting in the hours needed to get general approval from the core PostgreSQL developers and to create a reasonable project plan and web site. And thanks to myYearbook’s and PGX’s backing, I’m really excited about it. I hope to start on it in August.

If you’d like to contribute, first: Thank you! The PGXN site has a Google Checkout widget that makes it easy to make a donation. If you’d rather pay by some other means (checks are great for us!), drop me a line and we’ll work something out. We have a few levels of contribution as well, including permanent linkage on the PGXN site for your organization, as well as the usual t-shirts and launch party invitations.

PGAN Bikeshedding

I’ve put together a description of PGAN, the PostgreSQL extension distribution system I plan to develop later this year based on the Comprehensive Perl Archive Network, or CPAN. Its primary features will be:

  • Extension distribution
  • Search site with extension documentation
  • Client for downloading, building, testing, and installing extensions.

I’ve never been thrilled with the name, though, so I’m asking for suggestions for a better one. I’ve used the term "extension" here because it seems to be the term that the PostgreSQL community has settled on, but other terms might work, since things other than extensions might be distributed.

What I’ve come up with so far is:

 Name   | Long Name                                         | Pronunciation         | Advantages                 | Disadvantages
--------+---------------------------------------------------+-----------------------+----------------------------+----------------------------------------------------
 PGAN   | PostgreSQL Add-on Network                         | pee-gan               | Short, similar to CPAN     | Ugly
 PGEX   | PostgreSQL Extensions                             | pee-gee-ex or pee-gex | Short, easier to pronounce | Too similar to PGX
 PGCAN  | PostgreSQL Comprehensive Archive Network          | pee-gee-can           | Similar to CPAN            | Similar to CPAN
 PGDAN  | PostgreSQL Distribution Archive Network           | pee-gee-dan           | Short, easy to pronounce   | Who’s “Dan”? Doesn’t distribute PostgreSQL itself.
 PGEDAN | PostgreSQL Extension Distribution Archive Network | pee-gee-ee-dan        | References extensions      | Long, sounds stupid

Of these, I think I like “PGEX” best, but none are really great. So I’m opening up the bike shed to all. What’s a better name? Or if you can’t think of one, which of the above do you like best? Just leave a comment on this post. The only requirements for suggestions are that a .org domain be available and that it suck less than the alternatives.

Comments close in 2 weeks. Thanks!

Execute SQL Code on Connect

I’ve been writing a fair bit of PL/Perl for a client, and one of the things I’ve been doing is eliminating a ton of duplicate code by creating utility functions in the %_SHARED hash. This is great, as long as the code that creates those functions gets executed at the beginning of every database connection. So I put the utility generation code into a single function, called prepare_perl_utils(). It looks something like this:

CREATE OR REPLACE FUNCTION prepare_perl_utils(
) RETURNS bool LANGUAGE plperl IMMUTABLE AS $$
    # Don't bother if we've already loaded.
    return 1 if $_SHARED{escape_literal};

    $_SHARED{escape_literal} = sub {
        $_[0] =~ s/'/''/g; $_[0] =~ s/\\/\\\\/g; $_[0];
    };

    # Create other code refs in %_SHARED…
$$;

So now all I have to do is make sure that all the client’s apps execute this function as soon as they connect, so that the utilities will all be loaded up and ready to go. Here’s how I did it.

First, for the Perl app, I just took advantage of the DBI’s callbacks to execute the SQL I need when the DBI connects to the database. That link might not work just yet, as the DBI’s callbacks have only just been documented and that documentation appears only in dev releases so far. Once 1.611 drops, the link should work. At any rate, the use of callbacks I’m exploiting here has been in the DBI since 1.49, which was released in November 2005.

The approach is the same as I’ve described before: Just specify the Callbacks parameter to DBI->connect, like so:

my $dbh = DBI->connect_cached($dsn, $user, $pass, {
    PrintError     => 0,
    RaiseError     => 1,
    AutoCommit     => 1,
    Callbacks      => {
        connected => sub { shift->do('SELECT prepare_perl_utils()') },
    },
});

That’s it. The connected method is a no-op in the DBI that gets called to alert subclasses that they can do any post-connection initialization. Even without a subclass, we can take advantage of it to do our own initialization.

It was a bit trickier to make the same thing happen for the client’s Rails app. Rails, alas, provides no on-connection callbacks. So we instead have to monkey-patch Rails to do what we want. With some help from “dfr|mac” on #rubyonrails (I haven’t touched Rails in 3 years!), I got it down to this:

class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
  def initialize_with_perl_utils(*args)
    returning(initialize_without_perl_utils(*args)) do
      execute('SELECT prepare_perl_utils()')
    end
  end
  alias_method_chain :initialize, :perl_utils
end

Basically, we wrap the PostgreSQL adapter’s initialize method so that initialize_with_perl_utils runs in its place: it calls the original initialize, executes prepare_perl_utils(), and then returns the original result. It’s a neat trick; if you’re going to practice fuck typing, alias_method_chain makes it about as clean as can be, albeit a little too magical for my tastes.

Anyway, recorded here for posterity (my blog is my other brain!).

No more USE_PGXS=1?

I’ve become very tired of having to set USE_PGXS=1 every time I build pgTAP outside the contrib directory of a PostgreSQL distribution:

make USE_PGXS=1
make USE_PGXS=1 install
make USE_PGXS=1 installcheck

I am forever forgetting to set it, and it’s just not how one normally expects a build incantation to work. It was required because that’s how the core contrib extensions work: They all have this code in their Makefiles, which those of us who develop third-party modules have borrowed:

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/citext
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

They generally expect ../../src/Makefile.global to exist, and if it doesn’t, you have to tell it so. I find this annoying, because third-party extensions are almost never built from the contrib directory, so one must always remember to specify USE_PGXS=1.

I’d like to propose, instead, that those of us who maintain third-party extensions like pgTAP, PL/Parrot, and Temporal PostgreSQL not force our users to have to remember this special variable by instead checking to see if it’s needed ourselves. As such, I’ve just added this code to pgTAP’s Makefile:

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
else
ifeq (exists, $(shell [ -e ../../src/bin/pg_config/pg_config ] && echo exists) ) 
top_builddir = ../..
PG_CONFIG := $(top_builddir)/src/bin/pg_config/pg_config
else
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
endif
endif

So it still respects USE_PGXS=1, but if it’s not set, it looks to see if it can find pg_config where it would expect it to be if built from the contrib directory. If it’s not there, it simply uses pg_config as if USE_PGXS=1 was set. This makes building from the contrib directory or from anywhere else the same process:

make
make install
make installcheck

Much better, much easier to remember.

Is there any reason why third-party PostgreSQL extensions should not adopt this pattern? I don’t think it makes sense for contrib extensions in core to do it, but for those that will never be in core, I think it makes a lot of sense.

Comments?

SQL Hack: The Something-est From Each Entity

This is a pattern that I have dealt with many times, but never figured out how to adequately handle. Say that you have imported a mailbox into your database, and you want a list of the latest messages between each pair of recipients (sender and receiver — I’m ignoring multiple receivers for the moment). The data might look like this:

BEGIN;

CREATE TABLE messages (
   sender   TEXT        NOT NULL,
   receiver TEXT        NOT NULL,
   sent_at  TIMESTAMPTZ NOT NULL DEFAULT clock_timestamp(),
   body     TEXT        NOT NULL DEFAULT ''
);

INSERT INTO messages ( sender, receiver, body )
VALUES ('Theory', 'Strongrrl', 'Hi There.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Strongrrl', 'Theory', 'Hi yourself.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Anna', 'Theory', 'What''s for dinner?' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Theory', 'Anna', 'Brussels Sprouts.' );

INSERT INTO messages ( sender, receiver, body )
VALUES ('Anna', 'Theory', 'Oh man!' );

COMMIT;

So the goal is to show the most recent message between Theory and Strongrrl and the most recent message between Theory and Anna, without regard to who is the sender and who is the receiver. After running into this many times, today I consulted my colleagues, showing them this dead simple (and wrong!) query to demonstrate what I wanted:

SELECT sender, receiver, sent_at, body
  FROM messages
 GROUP BY sender, receiver
HAVING sent_at = max(sent_at);

That’s wrong because one can’t have columns in the SELECT list that are neither aggregate expressions nor included in the GROUP BY clause. It’s a violation of the standard (and prone to errors, I suspect). Andrew immediately said, “Classic case for DISTINCT ON”. This lovely little expression is a PostgreSQL extension not included in the SQL standard. Its use looks like this:

SELECT DISTINCT ON (
          CASE WHEN receiver > sender
              THEN receiver || sender
              ELSE sender   || receiver
          END
       ) sender, receiver, sent_at, body
  FROM messages
 ORDER BY CASE WHEN receiver > sender
              THEN receiver || sender
              ELSE sender   || receiver
          END, sent_at DESC;

This query is saying, “for each distinct combination of sender and receiver, fetch the first row when ordered by sent_at DESC.” The CASE expression to get a uniform value for the combination of sender and receiver is a bit unfortunate, but it does the trick:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.

Great, exactly the data I wanted. And the CASE statement can actually be indexed to speed up filtering. But I wondered if it would be possible to get the same results without the DISTINCT ON. In other words, can this be done with standard SQL? If you’re using PostgreSQL 8.4, the answer is “yes.” All you have to do is exploit window functions and a subquery. It looks like this:

SELECT sender, receiver, sent_at, body
  FROM (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER ( PARTITION BY 
               CASE WHEN receiver > sender
                   THEN receiver || sender
                   ELSE sender   || receiver
               END
               ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
  ) AS t
 WHERE rnum = 1;

Same nasty CASE statement as before (no way around it with this database design, alas), but this is fully conforming SQL. It’s also the first time I’ve ever used window functions. If you just focus on the row_number() OVER () expression, it’s simply partitioning the table according to the same value as in the DISTINCT ON value, but it’s ordering it by sent_at directly. The result is a row number, where the first is 1 for the most recent message for each combination of recipients. Then we just filter for that in the WHERE clause.

Not exactly intuitive (I’m really only understanding it now as I write it out), but quite straightforward once you accept the expressivity in this particular OVER expression. It might be easier to understand if we remove some of the cruft. If instead we wanted the most recent message from each sender (regardless of the recipient), we’d write:

SELECT sender, receiver, sent_at, body
  FROM (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER (
               PARTITION BY sender ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
  ) AS t
 WHERE rnum = 1;

And that yields:

  sender   | receiver |            sent_at            |       body        
-----------+----------+-------------------------------+-------------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.
 Theory    | Anna     | 2010-01-12 05:00:07.24982+00  | Brussels Sprouts.

Furthermore, we can use a common table expression to eliminate the subquery. This query is functionally identical to the subquery example (returning to uniqueness for sender and receiver), just with the WITH clause coming before the SELECT clause, setting things up for it:

WITH t AS (
    SELECT sender, receiver, sent_at, body,
           row_number() OVER (PARTITION BY CASE
               WHEN receiver > sender
                   THEN receiver || sender
                   ELSE sender   || receiver
                   END
               ORDER BY sent_at DESC
           ) AS rnum
      FROM messages
) SELECT sender, receiver, sent_at, body
    FROM t
   WHERE rnum = 1;

So it’s kind of like putting the subquery first, only it’s not a subquery, it’s more like a temporary view. Nice, eh? Either way, the results are the same as before:

  sender   | receiver |            sent_at            |     body     
-----------+----------+-------------------------------+--------------
 Anna      | Theory   | 2010-01-12 05:00:07.026711+00 | Oh man!
 Strongrrl | Theory   | 2010-01-12 05:00:07.02589+00  | Hi yourself.

I hereby dub this “The Entity’s Something-est” pattern (I’m certain someone else has already come up with a good name for it, but this will do). I can see it working any place requiring the highest, lowest, latest, earliest, or something else-est item from each of a list of entities. Perhaps the latest headline from every news source:

WITH t AS (
    SELECT source, headline, dateline, row_number() OVER (
               PARTITION BY source ORDER BY dateline DESC
           ) AS rnum
      FROM news
) SELECT source, headline, dateline
    FROM t
   WHERE rnum = 1;

Or perhaps the lowest score for each basketball team over the course of a season:

WITH t AS (
    SELECT team, date, score, row_number() OVER (
               PARTITION BY team ORDER BY score
           ) AS rnum
      FROM games
) SELECT team, date, score
    FROM t
   WHERE rnum = 1;

Easy! How have you handled a situation like this in your database hacking?
