Committed: pgTAP Result Set Assertion Functions

Regular readers will know that I’ve been thinking a lot about testing SQL result sets and how to how to name result testing functions, and various implementation issues. I am very happy to say that I’ve now committed the first three such test functions to the Git repository. They’ve been tested on 8.4 and 8.3. Here’s what I came up with.

I had a pretty good idea how to compare sets and how to compare ordered bags, but ordered sets and unordered bags of results escaped me. During two days of intense hacking and experimentation, I quickly wrote set_eq(), which performs a set comparison of the results of two queries, and obag_eq(), which performs a row-by-row comparison of the results of two queries. I then set to work on bag_eq(), which would do a set comparison but require the same number of duplicate rows between the two queries. set_eq() was easy because I just needed to create temporary tables of the two queries and then execute two EXCEPT queries against them to see where they differ, if at all. bag_eq() was getting kind of hairy, though, so I asked about it on the Freenode #postgresql channel, where depesz looked at my example and pointed out that EXCEPT ALL would do just want I needed.

Hot damn, all it took was the addition a single extra word to the same queries used by set_eq() and I was set. This made me very happy, and such well-thought-out features are the reason I love PostgreSQL. My main man depesz made my day.

But oset_eq(), which was to compare ordered sets of results was proving much harder. The relational operators that operate on sets don’t care about order, so I would have to write the code to care myself. And because dupes needed to be ignored, it got even harder. In fact, it proved just not worth the effort. The main reason I abandoned this test function, though, was not difficulties of implementation (which were significant), but ambiguity of interpretation. After all, if duplicates are allowed but ignored, how does one deal with their effect on order? For example, say that I have two queries that order people based on name. One query might order them like so:

select * from people order by name;
  name  | age 
--------+-----
 Damian |  19
 Larry  |  53
 Tom    |  35
 Tom    |  44
 Tom    |  35

Another run of the same query could give me a different order:

select * from people order by name;
  name  | age 
--------+-----
 Damian |  19
 Larry  |  53
 Tom    |  35
 Tom    |  35
 Tom    |  44

Because I ordered only on “name,” the database was free to sort records with the same name in an undefined way. Meaning that the rows could be in different orders. This is known, if I understand correctly, as a “Partially ordered set,” or “poset.” Which is all well and good, but from my point of view makes it damn near impossible to be able to do a row-by-row comparison and ignore dupes, because they could be in different orders!

So once I gave up on that, I was down to three functions instead of four, and only one depends on ordering. So I also dropped the idea of having the “o” in the function names. Instead, I changed obag_eq() to results_eq(), and now I think I have three much more descriptive names. To summarize, the functions are:

results_eq
Compares two result sets row by row, meaning that they must be in the same order and have the same number of duplicate rows in the same places.
set_eq
Compares result sets to ensure they have the same rows, without regard to order or duplicate rows.
bag_eq
Compares result sets without regard to order, but each must have the same duplicate rows.

I’m very happy with this, because I was able to give up on the stupid function names with the word “order” included or implicit in them. Plus, I have different names for functions that are similar, which is nicely in adherence to the principle of distinction. They all provide nice diagnostics on failure, as well, like this from results_eq():

# Failed test 146
#     Results differ beginning at row 3:
#         have: (1,Anna)
#         want: (22,Betty)

Or this from set_eq() or bag_eq()

# Failed test 146
#     Extra records:
#         (87,Jackson)
#         (1,Jacob)
#     Missing records:
#         (44,Anna)
#         (86,Angelina)

set_eq() and bag_eq() also offer up useful diagnostics when the data types of the rows vary:

# Failed test 147
#     Columns differ between queries:
#         have: (integer,text)
#         want: (text,integer)

results_eq() doesn’t have access to such data, though if I can find some tuits (got any to give me?), I’ll write a quick C function that can return an array of the data types in a record object.

Now, as for the issue of arguments, what I settled on is, like Epic, passing strings of SQL to these functions. However, unlike Epic, if you pass in a simple string with no spaces, or a double-quoted string, pgTAP assumes that it’s the name of a prepared statement. The documentation now recommends prepared statements, which you can use like this:

PREPARE my_test AS SELECT * FROM active_users() WHERE name LIKE ‘A%’;
PREPARE expect AS SELECT * FROM users WHERE active = $1 AND name LIKE $2;
SELECT results_eq('my_test', 'expect');

This allows you to keep your SQL written as SQL, keeping your test, um, SQLish. But in those cases where you have some really simple SQL, you can just use that, too:

SELECT set_eq(
    'SELECT * FROM active_users()',
    'SELECT * FROM users ORDER BY id'
);

This feels like a good compromise to me, allowing the best of both worlds: keeping things in pure SQL to avoid quoting ugliness in SQL strings, while letting users pass in SQL strings if they really want to.

It turns out that I wasn’t able to support cursors for set_eq() or bag_eq(), because they use the statements passed to them to create temporary tables and then compare the records in those temporary tables. But results_eq() uses cursors internally. And it turns out that there’s a data type for cursors, refcursor. So it was easy to add cursor support to results_eq() for those who want to use it:

DECLARE cwant CURSOR FOR SELECT * FROM active_users();
DECLARE chave CURSOR FOR SELECT * FROM users WHERE active ORDER BY name;
SELECT results_eq('cwant'::refcursor, 'chave'::refcursor );

Neat, huh? As I said, I’m very pleased with this approach overall. There are a few caveats, such as less strict comparisons in results_eq() on 8.3 and lower, and less useful diagnostics for data type differences in results_eq(), but overall, I think that the implementation is pretty good, and that these functions will be really useful.

So what do you think? Please clone the Git repository and take the functions for a test drive on 8.3 or 8.4. Let me know what you think!

In the meantime, before releasing a new version, I still plan to add:

  • set_includes() - Set includes records in another set.
  • set_excludes() - Set excludes records in another set.
  • bag_includes() - Bag includes records in another bag.
  • bag_excludes() - Bag excludes records in another bag.
  • col_eq() - Single column result set equivalent to an array of values.
  • row_eq() - Single row form a query equivalent to a record.
  • rowtype_is() - The data type of the rows in a query is equivalent to an array of types.

Hopefully I can find some time to work on those next week. The only challenging one is row_eq(), so I may skip that one for now.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

pgTAP Set-Testing Update

I’ve been thinking more about testing SQL result sets and how to name functions that do such testing, and I’ve started to come to some conclusions. Some of the constraints I’m looking at:

  • Cursors are required for tests where the order of the results returned is important. It might be best for such functions to create the cursors themselves.

  • For comparisons where order isn’t important, the results of each SQL statement must be inserted into a temporary table and then the table used for the comparisons. Otherwise, each statement would be executed twice, as is required to calculate symmetric difference. By executing each once and storing the results in a temporary table, we get around this issue (and indeed, this one of the cases that Epic’s global() function addresses).

  • When order is not important, the most efficient way to compare result sets is with symmetric difference. However, said comparison is a set comparison, meaning that duplicate rows are ignored. So if set A has 3 rows and set B has four, but two of those four are identical, then sets A and B can still be equivalent.

I’m starting to think that I would have two basic result set testing functions, set_eq() and bag_eq(). The former would do a set comparison, while the latter would require that duplicate rows be present in both result sets. Unfortunately, that would mean that it would be difficult for set_eq to have a variation that tests ordered sets, as symmetric difference ignores relational ordering. And a simple bag_eq() function would require that the relations be ordered, as it would iterate over each row in each relation in turn and compare row to row. But as I pointed out to commenter “@dave0,” bags are not inherently ordered, so it would be imposing a requirement that’s not necessarily appropriate.

This is starting to drive me a bit nuts.

I think that there are ways to enforce an ordered comparison on a set and an unordered comparison on a bag, but both would be pretty inefficient. Maybe I should do it anyway, include the appropriate caveats in the documentation, and then improve when feasible in the future. In that case, what I’d be looking at is something like this:

set_eq( sql, sql )
Test for set equivalence of two SQL statements.
oset_eq( sql, sql )
Test for ordered set equivalence of two SQL statements.
bag_eq( sql, sql )
Test for row equivalence of two SQL statements.
obag_eq( sql, sql )
Test for ordered row equivalence of two SQL statements.

The preferred tests would be set_eq() and obag_eq(). If a single word is passed to any of these functions, it’s assumed to be a prepared statement. Cursors would be created internally for the functions that require ordered comparison. The non-ordered versions would create temporary tables to hold the values and then use those tables for the comparisons. bag_eq() would also construct cursors on the temporary tables to ensure that rows could be compared in the same order in which they were generated by the SQL statement.

Interface-wise, perhaps a boolean would be preferred to indicate whether or not to compare the rows in an ordered fashion? That would be:

set_eq( sql, sql, bool )
Test for set equivalence of two SQL statements. The sets must be in the same order if the boolean argument is true.
bag_eq( sql, sql, bool )
Test for row equivalence of two SQL statements. The bags must have their rows in the same order if the boolean argument is true.

I like that there are fewer functions this way, but is it harder to remember what the boolean is for? (It would not be required, and would default to false for both functions). Thoughts?

By the way, I would likely throw in a couple of other resultset-comparing functions:

set_includes( sql, sql )
Test that the set returned by the first statement includes the rows returned by the second statement.
set_excludes( sql, sql )
Test that the set returned by the first statement excludes the rows returned by the second statement.
bag_includes( sql, sql )
Test that the bag returned by the first statement includes the rows returned by the second statement, including duplicates.
bag_excludes( sql, sql )
Test that the bag returned by the first statement excludes the rows returned by the second statement, including duplicates.

Seem useful? Please leave a comment with your thoughts.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Need Help Naming Result Set Testing Functions

I’ve been thinking more since I posted about testing SQL result sets, and I think I’ve settled on two sets of functions for pgTAP: one that tests two SQL queries (though you will be encouraged to use a prepared statement), and one to test two cursors. I’m thinking of naming them:

  • query_gets()
  • cursor_gets()

I had been planning on *_returns() or *_yields(), but they didn’t feel right. “Returns” implies that I would be passing a query and a data structure (to me at least), and while I want to support that, too, it’s not what I was looking for right now. “Yield,” on the other hand, is more related to set-returning functions in my mind (even if PL/pgSQL doesn’t use that term). Anyway, I like the use of “gets” because it’s short and pretty unambiguous.

These function will compare query results as unordered sets, but I want variants that test ordered sets, as well. I’ve been struggling to come up with a decent name for these variants, but not liking any very well. The obvious ones are:

  • ordered_query_gets()
  • ordered_cursor_gets()

And:

  • sorted_query_gets()
  • sorted_cursor_gets()

But these are kind of long for functions that will be, I believe, used frequently. I could just add a character to get the same idea, in the spirit of sprintf:

  • oquery_gets()
  • ocursor_gets()

Or:

  • squery_gets()
  • scursor_gets()

I think that these are okay, but might be somewhat confusing. I think that the “s” variant probably won’t fly, since for sprintf and friends, the “s” stands for “string.” So I’m leaning towards the “o” variants.

But I’m throwing it out there for the masses to make suggestions: Got any ideas for better function names? Are there some relational terms for ordered sets, for example, that might make more sense? What do you think?

As a side note, I’m also considering:

  • col_is() to compare the result of a single column query to an array or other query. This would need an ordered variant, as well.
  • row_is(), although I have no idea how I’d be able to support passing a row expression to a function, since PostgreSQL doesn’t allow RECORDs to be passed to functions.
  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Thoughts on Testing SQL Result Sets

pgTAP: The Critique

I’ve been continuing hacking on pgTAP in order to add a lot more schema-testing functionality and a few other niceties. But back when I started the project, I using it to write tests for CITEXT, which was great for my sanity as I developed it, but proved a bit controversial. In a pgsql-hackers post, Tom Lane wrote:

There was some discussion earlier about whether the proposed regression tests for citext are suitable for use in contrib or not. After playing with them for awhile, I have to come down very firmly on the side of "not". I have these gripes:

  1. The style is gratuitously different from every other regression test in the system. This is not a good thing. If it were an amazingly better way to do things, then maybe, but as far as I can tell the style pgTAP forces on you is really pretty darn poorly suited for SQL tests. You have to contort what could naturally be expressed in SQL as a table result into a scalar. Plus it’s redundant with the expected-output file.

  2. It’s ridiculously slow; at least a factor of ten slower than doing equivalent tests directly in SQL. This is a very bad thing. Speed of regression tests matters a lot to those of us who run them a dozen times per day —– and I do not wish to discourage any developers who don’t work that way from learning better habits ;–)

Because of #1 and #2 I find the use of pgTAP to be a nonstarter.

These are legitimate criticisms, of course. To take the second item first, I would eventually like to figure out a way to make pgTAP a lot faster (in my own benchmarks, I found it to be about 4 times slower than pure SQL, not ten times, but still). A number of functions can likely be rewritten in C, and maybe data can be stored in memory rather than in a temporary table. Overall, though, the speed of the tests doesn’t really concern me much. I’m quite used to large test suites, such as that for Bricolage, that take 5 or 10 minutes or more. This is because, compared to the time it would take me to maintain the code without tests, it’s nothing. I find and fix bugs much more quickly thanks to regression tests. And really, one should just run a subset of the tests for whatever one is working on, and then run the full suite before checking in. One could even have a larger, more comprehensive (read: slower) test suite that’s run via a cron job, so that it identifies bugs in checked in code but developers don’t have to spend a lot of time waiting for tests to finish running.

As a result, I wouldn’t advocate for converting the existing PostgreSQL regression test suite to pgTAP. I could see writing a new suite of tests on pgTAP that run on the build farm. This would be great, as they would complement the existing test suite, and be able to test stuff that can’t be tested with pg_regress.

So really, the performance issue can be addressed in a few ways, some technical, some social, some structural. Like I said, I’m not overly concerned about it, and I wouldn’t make Tom suffer unduly from it, either (I converted all of the CITEXT tests to plain SQL).

Coercing Composite Values

The first issue is tougher, however. Tom was responding to a test like this:

SELECT is(
    ARRAY( SELECT name FROM srt ORDER BY name )::text,
    ARRAY['AAA', 'aardvark', 'aba', 'ABC', 'abc']::text,
    'The words should be case-insensitively sorted'
);

Now, I agree that it’s redundant with the expected-output file, but the assumption with TAP is that there is no expected output file: you just analyze its output using a harness. The need for an expected output file is driven by the legacy of pg_regress.

A bigger issue, and the one I’ll focus on for the remainder of this post, is the requirement currently inherent in pgTAP to “contort what could naturally be expressed in SQL as a table result into a scalar.” The issue is apparent in the above example: even though I’m selecting a number of rows from a table, I use the ARRAY() constructor function to force them into a scalar value—an array—in order to easily do the comparison. It also results in a useful diagnostic message in case the test fails:

# Failed test 40: "The words should be case-insensitively sorted"
#         have: {AAA,aardvark,ABC,abc,aba}
#         want: {AAA,aardvark,aba,ABC,abc}

So for simple cases like this, it doesn’t bother me much personally. But I’ve also had to write tests for functions that return composite types—that is, rows—and again I had to fall back on coercing them into scalar values to do the comparison. For example, say that the fooey() function returns a dude value, which is a composite type with an integer and a text string. Here’s how to test it with pgTAP:

SELECT is(
    fooey()::text,
    ROW( 42, 'Bob' )::text,
    'Should get what we expect from fooey()'
);

So I’m again coercing a value into something else (of course, if I could pass records to functions, that issue goes away). And it does yield nice diagnostics on failure:

# Failed test 96: "Should get what we expect from fooey()"
#         have: (42,Fred)
#         want: (42,Bob)

It gets much worse with set returning functions—Tom’s “table result:” it requires both type and row coercion (or “contortion” if you’d prefer). Here’s an example of a fooies() function that returns a set of dudes:

SELECT is(
    ARRAY( SELECT ROW(f.*)::text FROM fooies() f ),
    ARRAY[
        ROW( 42, 'Fred' )::text,
        ROW( 99, 'Bob' )::text
    ],
    'Should get what we expect from fooies()'
);

As you can see, it’s do-able, but clumsy and error prone. We really are taking a table result and turning into a scalar value. And thanks to the casts to text, the test can actually incorrectly pass if, for example, the integer was actually stored as text (although, to be fair, the same is true of a pg_regress test, where everything is converted to text before comparing results).

What we really need is a way to write two queries and compare their result sets, preferably without any nasty casts or coercion into scalar values, and with decent diagnostics when a test fails.

As an aside, another approach is to use EXCEPT queries to make sure that two data sets are the same:

SELECT ok(
    NOT EXISTS (
        (SELECT 42, 'Fred' UNION SELECT 99, 'Bob')
        EXCEPT
        SELECT * from fooies()
    ),
    'Should get what we expect from fooies()'
);

SELECT ok(
    NOT EXISTS (
        SELECT * from fooies()
        EXCEPT
        (SELECT 42, 'Fred' UNION SELECT 99, 'Bob')
    ),
    'Should have no unexpected rows from fooies()'
);

Here I’ve created two separate tests. The first makes sure that fooies() returns all the expected rows, and the second makes sure that it doesn’t return any unexpected rows. But since this is just a boolean test (yes, we’ve coerced the results into booleans!), there are no diagnostics if the test fails: you’d have to go ahead and run the query yourself to see what’s unexpected. Again, this is do-able, and probably a more correct comparison than using the casts of rows to text, but makes it harder to diagnose failures. And besides, EXCEPT compares sets, which are inherently unordered. That means that if you need to test that results come back in a specific order, you can’t use this approach.

That said, if someone knows of a way to do this in one query—somehow make some sort of NOT EXCEPT operator work—I’d be very glad to hear it!

Prior Art

pgTAP isn’t the only game in town. There is also Dmitry Koterov’s PGUnit framework and Bob Brewer’s Epic. PGUnit seems to have one main assertion function, assert_same(), which works much like pgTAP’s is(). Epic’s assert_equal() does, too, but Epic also offers a few functions for testing result sets that neither pgTAP nor PGUnit support. One such function is assert_rows(), to which you pass strings that contain SQL to be evaluated. For example:

CREATE OR REPLACE FUNCTION test.test_fooies() RETURNS VOID AS $_$
BEGIN
    PERFORM test.assert_rows(
        $$ VALUES(42, 'Fred'), (99, 'Bob') $$,
        $$ SELECT * FROM fooies()          $$
    );
  RAISE EXCEPTION '[OK]';
END;
$_$ LANGUAGE plpgsql;

This works reasonably well. Internally, Epic runs each query twice, using EXCEPT to compare result sets, just as in my boolean example above. This yields a proper comparison, and because assert_rows() iterates over returned rows, it emits a useful message upon test failure:

psql:try_epic.sql:21: ERROR:  Record: (99,Bob) from: VALUES(42, 'Fred'), (99, 'Bob') not found in: SELECT * FROM fooies()
CONTEXT:  SQL statement "SELECT  test.assert_rows( $$ VALUES(42, 'Fred'), (99, 'Bob') $$, $$ SELECT * FROM fooies() $$ )"
PL/pgSQL function "test_fooies" line 2 at PERFORM

A bit hard to read with all of the SQL exception information, but at least the information is there. At PGCon, Bob told me that passing strings of SQL code made things a lot easier to implement in Epic, and I can certainly see how that could be (pgTAP uses SQL code strings too, with its throws_ok(), lives_ok(), and performs_ok() assertions). But it just doesn’t feel SQLish to me. I mean, if you needed to write a really complicated query, it might be harder to maintain: even using dollar quoting, it’s harder to track stuff. Furthermore, it’s slow, as PL/pgSQL’s EXECUTE must be called twice and thus plan twice. And don’t even try to test a query with side-effects—such as a function that inserts a row and returns an ID—as the second run will likely lead to test failure just might blow something up.

SQL Blocks?

One approach is to use blocks. I’m thinking here of something like Ruby blocks or Perl code references: a way to dynamically create some code that is compiled and planned when it loads, but its execution can be deferred. In Perl it works like this:

my $code = sub { say "woof!" };
$code->(); # prints "woof!"

In Ruby (and to a lesser extent in Perl), you can pass a block to a method:

foo.bar { puts "woof!" }

The bar method can then run that code at its leisure. We can sort of do this in PostgreSQL using PREPARE. To take advantage of it for Epic’s assert_rows() function, one can do something like this:

CREATE OR REPLACE FUNCTION test.test_fooies() RETURNS VOID AS $_$
BEGIN
    PREPARE want AS VALUES(42, 'Fred'), (99, 'Bob');
    PREPARE have AS SELECT * FROM public.fooies();
    PERFORM test.assert_rows(
        test.global($$ EXECUTE want $$),
        test.global($$ EXECUTE have $$)
    );
    RAISE EXCEPTION '[OK]';
END;
$_$ LANGUAGE plpgsql;

The nice thing about using a prepared statement is that you can actually write all of your SQL in SQL, rather than in an SQL string, and then pass the simple EXECUTE statement to assert_rows(). Also note the calls to test.global() in this example. This is a tricky function in Epic that takes an SQL statement, turns its results into a temporary table, and then returns the table name. This is required for the EXECUTE statements to work properly, but a nice side-effect is that the actual queries are executed only once each, to create the temporary tables. Thereafter, those temporary tables are used to fetch results for the test.

Another benefit of prepared statements is that you can write a query once and use it over and over again in your tests. Say that you had a few set returning functions that return different results from the users table. You could then test them all like so:

CREATE OR REPLACE FUNCTION test.test_user_funcs() RETURNS VOID AS $_$
BEGIN
    PREPARE want(bool) AS SELECT * FROM users WHERE active = $1;
    PREPARE active     AS SELECT * FROM get_active_users();
    PREPARE inactive   AS SELECT * FROM get_inactive_users();
    PERFORM test.assert_rows(
        test.global($$ EXECUTE want(true) $$),
        test.global($$ EXECUTE active     $$)
    );
    PERFORM test.assert_rows(
        test.global($$ EXECUTE want(false) $$),
        test.global($$ EXECUTE inactive    $$)
    );
    RAISE EXCEPTION '[OK]';
END;
$_$ LANGUAGE plpgsql;

Note how I’ve tested both the get_active_users() and the get_inactive_users() function by passing different values when executing the want prepared statement. Not bad. I think that this is pretty SQLish, aside from the necessity for test.global().

Still, the use of prepared statements with Epic’s assert_rows() is not without issues. There is still a lot of execution here (to create the temporary tables and to select from them a number of times). Hell, this last example reveals an inefficiency in the creation of the temporary tables, as the two different executions of have create two separate temporary tables for data that’s already in the users table. If you have a lot of rows to compare, a lot more memory will be used. And you still can’t check the ordering of your results, either.

So for small result sets and no need to check the ordering of results, this is a pretty good approach. But there’s another.

Result Set Handles

Rather than passing blocks to be executed by the tests, in many dynamic testing frameworks you can pass data structures be compared. For example, Test::More’s is_deeply() assertion allows you to test that two data structures contain the same values in the same structures:

is_deeply \@got_data, \@want_data, 'We should have the right stuff';

This does a deep comparison between the contents of the @got_data array and @want_data. Similarly, I could imagine a test to check the contents of a DBIx::Class result set object:

results_are( $got_resultset, $want_resultset );

In this case, the is_results() function would iterate over the two result sets, comparing each result to make sure that they were identical. So if prepared statements in SQL are akin to blocks in dynamic languages, what is akin to a result set?

The answer, if you’re still with me, is cursors.

Now, cursors don’t work with Epic’s SQL-statement style tests, but I could certainly see how a pgTAP function like this would be useful:

DECLARE want CURSOR FOR SELECT * FROM users WHERE active;
DECLARE have CURSOR FOR SELECT * FROM get_active_users();
SELECT results_are( 'want', 'have' );

The nice thing about this approach is that, even more than with prepared statements, everything is written in SQL. The results_are() function would simply iterate over each row returned from the two cursors to make sure that they were the same. In the event that there was a difference, the diagnostic output would be something like:

#   Failed test 42:
#     Results begin differing at row 3:
#          have: (3,Larry,t)
#          want: (3,Larry,f)

So there’s a useful diagnostic, ordering is preserved, no temporary tables are created, and the data is fetched directly from its sources (tables or functions or whatever) just as it would be in a straight SQL statement. You still have the overhead of PL/pgSQL’s EXECUTE, and iterating over the results, but, outside of some sort of NOT INTERSECT operator, I don’t see any other way around it.

The Plan

So I think I’ll actually look at adding support for doing this in two ways: one with prepared statements (or query strings, if that’s what floats your boat) like Epic does, though I’m going to look at avoiding the necessity for something like Epic’s global() function. But I’ll also add functions to test cursors. And maybe a few combinations of these things.

So, does an approach like this, especially the cursor solution, address Tom’s criticism? Does it feel more relational? Just to rewrite the kind of test Tom originally objected to, it would now look something like this:

DECLARE have CURSOR FOR SELECT name FROM srt ORDER BY name;
DECLARE want CURSOR FOR VALUES ('AAA'), ('aardvark'), ('aba'), ('ABC'), ('abc');
SELECT results_are(
    'have', 'want',
    'The words should be case-insensitively sorted'
);

Thoughts? I’m not going to get to it this week, so feedback would be greatly appreciated.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

pgTAP 0.21 Drops

Mmmmm…pgTAP

I just dropped a new version of pgTAP, following a few marathon hack sessions since my talk at PGCon (movie here, BTW). Actually, the new performs_ok() function came about as I wrote the presentation, but all the rest came on the flight home and in the few days since. Not sure when I’ll hack on it this much again (it’s getting a bit big at 5,758 lines of PL/pgSQL and SQL).

Overall I’m very happy with this release, as it adds a lot of new assertion functions. In particular, I added a slew of functions that test that the objects in a given schema (or visible in the search path, if you prefer) are exactly the objects that should be there. This is useful for a couple of things. For one, Norman Yamada, in his PGCon talk, mentioned that his team was using pgTAP to compare database objects between replicated databases. I like this because it’s a great example of using pgTAP for system testing, rather than just unit testing as I’ve been advocating. See, pgTAP can be used for any kind of testing!

Another use for these functions is in a large organization where many different people might be making changes to a schema. In this scenario, you might have application developers adding new objects to the database (or dropping objects) without necessarily informing the DBAs. Using, for example, tables_are() and functions_are() and continuous testing, the DBAs can see when objects have been modified by the developers. Even better, if the developers are running the pgTAP tests themselves (as they should be!), they will be reminded to add new tests for their changes when the existing tests notice that things have been added or dropped and thus fail.

Beyond that, I added a bunch of new functions for testing functions and a number of other goodies. Check out the release notes for all the details.

With these changes, I’ve finished nearly everything I’ve thought of for pgTAP. There are only a few sequence-testing functions left on the To Do list, as well as a call to add a throws_like() function, which I’ll throw in soon if no one else steps up. Beyond these changes, I have a few ideas of where to take it next, but so far I’m kind of stumped. Mainly what I think should be done is to add an interface that makes it easier to compare relations (or result sets, if you prefer). Epic does this by allowing query strings to be passed to a function, but I’d really like to keep queries in SQL rather than in SQL strings. I’ll be giving it some more thought and will post about it soon.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Unit Test Your Database!

Gave my talk at PGCon today. I felt that it went well, and was well-received. So here it is for everyone else, for posterity, thanks to SlideShare. Enjoy!

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

PostgreSQL + OSSP UUID on Mac OS X

Wanted to get this down since I just dealt with it for the second time in the last year. The issue is this: If you have OSSP uuid installed on Mac OS X, and you want to build PostgreSQL with OSSP uuid support, you pass —with-ossp-uuid to PostgreSQL’s configure script. However, you might notice this in the output:

checking ossp/uuid.h usability... no
checking ossp/uuid.h presence... yes
configure: WARNING: ossp/uuid.h: present but cannot be compiled
configure: WARNING: ossp/uuid.h:     check for missing prerequisite headers?
configure: WARNING: ossp/uuid.h: see the Autoconf documentation
configure: WARNING: ossp/uuid.h:     section "Present But Cannot Be Compiled"
configure: WARNING: ossp/uuid.h: proceeding with the preprocessor's result
configure: WARNING: ossp/uuid.h: in the future, the compiler will take precedence
configure: WARNING:     ## ———————————————————— ##
configure: WARNING:     ## Report this to pgsql-bugs@postgresql.org ##
configure: WARNING:     ## ———————————————————— ##
checking for ossp/uuid.h... yes

The reason for this message is that OSSP uuid has symbols that conflicts those included with Mac OS X. If you look in config.log, you’ll see something like this:

configure:13224: checking ossp/uuid.h usability
configure:13241: gcc -no-cpp-precomp -c -O2 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv  -I/usr/local/include/libxml2  -I/usr/local/include conftest.c >&5
In file included from conftest.c:98:
/usr/local/include/ossp/uuid.h:94: error: conflicting types for 'uuid_t'
/usr/include/unistd.h:133: error: previous declaration of 'uuid_t' was here

It turns out that I reported this issue to OSSP a while ago. For PostgreSQL, at least it doesn’t seem to be much of a problem: the build continues and I’m able to install the ossp-uuid contrib module without a problem. So the upshot is: you can ignore the above warning!

One recommendation I do have, however, is to install the OSSP uuid header file in a non-default location. Why? Because if you build Apache and APR from source, like I do, you’ll get the same failure because of conflicting uuid_t symbols, and APR will fail to actually build! So I pass —includedir=/usr/local/include/ossp to OSSP uuid’s configure. This has no effect on how OSSP uuid itself behaves, and the PostgreSQL is smart enough to look there without having to be told. Meanwhile, it will then be out of the way of your APR build (assuming you delete /usr/local/include/uuid.h or /usr/include/uuid.h).

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Powered by KinoSearch