Today on the Perl Advent Calendar

Hey look everybody, I wrote today’s Perl Advent Calendar post, Less Tedium, More Transactions. Go read it!

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Up for Adoption: SVN::Notify

I’ve kept my various Perl modules in a Subversion server run by my Bricolage support company, Kineticode, for many years. However, I’m having to shut down the server I’ve used for all my services, including Subversion, so I’ve moved them all to GitHub. As such, I no longer use Subversion in my day-to-day work.

It no longer seems appropriate that I maintain SVN::Notify. This has probably been my most popular modules, and I know that it’s used a lot. It’s also relatively stable, with few bug reports or complaints. Nevertheless, there certainly could be some things that folks want to add, like TLS support, I18N, and inline CSS.

Therefore, SVN::Notify is formally up for adoption. If you’re a Subversion users, it’s a great tool. Just look at this sample output. If you’d like to take over maintenance, make it even better, please get in touch. Leave a comment on this post, or @theory me on Twitter, or send an email.

PS: Would love it if someone also could take over activitymail, the CVS notification script from which SVN::Notify was derived — and which I have even less right to maintain, given that I haven’t used CVS in years.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

DBIx::Connector Exception Handling Design

In response to a bug report, I removed the documentation suggesting that one use the catch function exported by Try::Tiny to specify an exception-handling function to the DBIx::Connector execution methods. When I wrote those docs, Try::Tiny’s catch method just returned the subroutine. It was later changed to return an object, and that didn’t work very well. It seemed a much better idea not to depend on an external function that could change its behavior when there is no direct dependency on Try::Tiny in DBIx::Connector. I removed that documentation in 0.43. So instead of this:

$conn->run(fixup => sub {
    ...
}, catch {
    ...
});

It now recommends this:

$conn->run(fixup => sub {
    ...
}, catch => sub {
    ...
});

Which frankly is better balanced anyway.

But in discussion with Mark Lawrence in the ticket, it has become clear that there’s a bit of a design problem with this approach. And that problem is that there is no try keyword, only catch. The fixup in the above example does not try, but the inclusion of the catch implicitly makes it behave like try. That also means that if you use the default mode (which can be set via the mode method), then there will usually be no leading keyword, either. So we get something like this:

$conn->run(sub {
    ...
}, catch => sub {
    ...
});

So it starts with a sub {} and no fixup keyword, but there is a catch keyword, which implicitly wraps that first sub {} in a try-like context. And aesthetically, it’s unbalanced.

So I’m trying to decide what to do about these facts:

  • The catch implicitly makes the first sub be wrapped in a try-type context but without a try-like keyword.
  • If one specifies no mode for the first sub but has a catch, then it looks unbalanced.

At one level, I’m beginning to think that it was a mistake to add the exception-handling code at all. Really, that should be the domain of another module like Try::Tiny or, better, the language. In that case, the example would become:

use Try::Tiny;
try {
    $conn->run(sub {
        ...
    });
} catch {
  ....
}

And maybe that really should be the recommended approach. It seems silly to have replicated most of Try::Tiny inside DBIx::Connector just to cut down on the number of anonymous subs and indentation levels. The latter can be got round with some semi-hinky nesting:

try { $conn->run(sub {
    ...
}) } catch {
    ...
}

Kind of ugly, though. The whole reason the catch stuff was added to DBIx::Connector was to make it all nice and integrated (as discussed here). But perhaps it was not a valid tradeoff. I’m not sure.

So I’m trying to decide how to solve these problems. The options as I see them are:

  1. Add another keyword to use before the first sub that means "the default mode". I’m not keen on the word "default", but it would look something like this:

    $conn->run(default => sub {
        ...
    }, catch => sub {
        ...
    });
    

    This would provide the needed balance, but the catch would still implicitly execute the first sub in a try context. Which isn’t a great idea.

  2. Add a try keyword. So then one could do this:

    $conn->run(try => sub {
        ...
    }, catch => sub {
        ...
    });
    

    This makes it explicit that the first sub executes in a try context. I’d also have to add companion try_fixup, try_ping, and try_no_ping keywords. Which are ugly. And furthermore, if there was no try keyword, would a catch be ignored? That’s what would be expected, but it changes the current behavior.

  3. Deprecate the try/catch stuff in DBIx::Connector and eventually remove it. This would simplify the code and leave the responsibility for exception handling to other modules where it’s more appropriate. But it would also be at the expense of formatting; it’s just a little less aesthetically pleasing to have the try/catch stuff outside the method calls. But maybe it’s just more appropriate.

I’m leaning toward #3, but perhaps might do #1 anyway, as it’d be nice to be more explicit and one would get the benefit of the balance with catch blocks for as long as they’re retained. But I’m not sure yet. I want your feedback on this. How do you want to use exception-handling with DBIx::Connector? Leave me a comment here or on the ticket.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Encoding is a Headache

I have to spend way too much of my programming time worrying about character encodings. Take my latest module, Text::Markup for example. The purpose of the module is very simple: give in the name of a file, and it will figure out the markup it uses (HTML, Markdown, Textile, whatever) and return a string containing the HTML generated from the file. Simple, right?

But, hang on. Should the HTML it returns be decoded to Perl’s internal form? I’m thinking not, because the HTML itself might declare the encoding, either in a XML declaration or via something like

<meta http-equiv="Content-type" content="text/html;charset=Big5" />

And as you can see, it’s not UTF-8. So decoded it would be lying. So it should be encoded, right? Parsers like XML::LibXML::Parser are smart enough to see such declarations and decode as appropriate.

But wait a minute! Some markup languages, like Markdown, don’t have XML declarations or headers. They’re HTML fragments. So there’s no wait to tell the encoding of the resulting HTML unless it’s decoded. So maybe it should be decoded. Or perhaps it should be decoded, and then given an XML declaration that declares the encoding as UTF-8 and encoded it as UTF-8 before returning it.

But, hold the phone! When reading in a markup file, should it be decoded before it’s passed to the parser? Does Text::Markdown know or care about encodings? And if it should be decoded, what encoding should one assume the source file uses? Unless it uses a BOM, how do you know what its encoding is?

Text::Markup is a dead simple idea, but virtually all of my time is going into thinking about this stuff. It drives me nuts. When will the world cease to be this way?

Oh, and you have answers to any of these questions, please do feel free to leave a comment. I hate having to spend so much time on this, but I’d much rather do so and get things right (or close to right) than wrong.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Serious Exception-Handling Bug Fixed in DBIx::Connector 0.42

I’ve just released DBIx::Connector 0.42 to CPAN. This release fixes a serious bug with catch blocks. In short, if you threw an exception from inside a catch block, it would not be detectable from outside. For example, given this code:

eval {
    $conn->run(sub { die ‘WTF’ }, catch => sub { die ‘OMG!’ });
};
if (my $err = $@) {
    say "We got an error: $@\n";
}

With DBIx::Connector 0.41 and lower, the if block would never be called, because even though the catch block threw an exception, $@ was not set. In other words, the exception would not be propagated to its caller. This could be terribly annoying, as you can imagine. I was being a bit too clever about localizing $@, with the scope much too broad. 0.42 uses a much tighter scope to localize $@, so now it should propagate properly everywhere.

So if you’re using DBIx::Connector catch blocks, please upgrade ASAP. Sorry for the hassle.

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Handling Multiple Exceptions

I ran into an issue with DBIx::Connector tonight: SQLite started throwing an exception from within a call to rollback(): “DBD::SQLite::db rollback failed: cannot rollback transaction – SQL statements in progress”. This is rather annoying, as it ate the underlying exception that led to the rollback.

So I’ve added a test to DBIx::Connector that looks like this:

my $dmock = Test::MockModule->new($conn->driver);
$dmock->mock(rollback => sub { die ‘Rollback WTF’ });

eval { $conn->txn(sub {
    my $sth = shift->prepare("select * from t");
    die ‘Transaction WTF’;
}) };

ok my $err = $@, ‘We should have died’;
like $err, qr/Transaction WTF/, ‘Should have the transaction error’;

It fails as expected: the error is “Rollback WTF”. So far so good. Now the question is, how should I go about fixing it? Ideally I’d be able to access both exceptions in whatever exception handling I do. How to go about that?

I see three options. The first is that taken by Bricolage and DBIx::Class: create a new exception that combines both the transaction exception and the rollback exception into one. DBIx::Class does it like this:

$self->throw_exception(
  "Transaction aborted: ${exception}. "
  . "Rollback failed: ${rollback_exception}"
);

That’s okay as far as it goes. But what if $exception is an Exception::Class::DBI object, or some other exception object? It would get stringified and the exception handler would lose the advantages of the object. But maybe that doesn’t matter so much, since the rollback exception is kind of important to address first?

The second option is to throw a new exception object with the original exceptions as attributes. Something like (pseudo-code):

DBIx::Connector::RollbackException->new(
    txn_exception      => $exception,
    rollback_exception => $rollback_exception,
);

This has the advantage of keeping the original exception as an object, although the exception handler would have to expect this exception and go digging for it. So far in DBIx::Connector, I’ve left DBI exception construction up to the DBI and to the consumer, so I’m hesitant to add a one-off special-case exception object like this.

The third option is to use a special variable, @@, and put both exceptions into it. Something like:

@@ = ($exception, $rollback_exception);
die $rollback_exception;

This approach doesn’t require a dependency like the previous approach, but the user would still have to know to dig into @@ if they caught the rollback exception. But then I might as well have thrown a custom exception object that’s easier to interrogate than an exception string. Oh, and is it appropriate to use @@? I seem to recall seeing some discussion of this variable on the perl5-porters mail list, but it’s not documented or supported. Or something. Right?

What would you do?

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Fuck Typing LWP

I’m working on a project that fetches various files from the Internet via LWP. I wanted to make sure that I was a polite user, such that my app would pay attention to Last-Modified/If-Modified-Since and ETag/If-None-Match headers. And in most contexts I also want to respect the robots.txt file on the hosts to which I’m sending requests. So I was very interested to read chromatic’s hack for this very issue. I happily implemented two classes for my app, MyApp::UA, which inherits from LWP::UserAgent::WithCache, and MyApp::UA::Robot, which inherits from MyApp::UA but changes LWP::UserAgent::WithCache to inherit from LWP::UARobot:

@LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');

So far so good, right? Well, no. What I didn’t think about, stupidly, is that by changing LWP::UserAgent::WithCache’s base class, I was doing so globally. So now both MyApp::UA and MyApp::UA::Robot were getting the LWP::RobotUA behavior. Urk.

So my work around is to use a little fuck typing to ensure that MyApp::UA::Robot has the robot behavior but MyApp::UA does not. Here’s what it looks like (BEWARE: black magic ahead!):

package MYApp::UA::Robot;

use 5.12.0;
use utf8;
use parent ‘MyApp::UA’;
use LWP::RobotUA;

do {
    # Import the RobotUA interface. This way we get its behavior without
    # having to change LWP::UserAgent::WithCache's inheritance.
    no strict 'refs';
    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k ne 'new';
    }
};

sub new {
    my ($class, $app) = (shift, shift);
    # Force RobotUA configuration.
    local @LWP::UserAgent::WithCache::ISA = ('LWP::RobotUA');
    return $class->SUPER::new(
        $app,
        delay => 1, # be very nice — max one hit per minute.
    );
}

The do block is where I do the fuck typing. It iterates over all the symbols in LWP::RobotUA, inserts a reference to all subroutines into the current package. Except for new, which I implement myself. This is so that I can keep my inheritance from MyApp::UA intact. But in order for it to properly configure the LWP::RobotUA interface, new must temporarily fool Perl into thinking that LWP::UserAgent::WithCache inherits from LWP::RobotUA.

Pure evil, right? Wait, it gets worse. I’ve also overridden LWP::RoboUA’s host_wait method, because if it’s the second request to a given host, I don’t want it to sleep (the first request is for the robots.txt, and I see no reason to sleep after that). So I had to modify the do block to skip both new and host_wait:

    while ( my ($k, $v) = each %{'LWP::RobotUA::'} ) {
        *{$k} = *{$v}{CODE} if *{$v}{CODE} && $k !~ /^(?:new|host_wait)$/;
    }

If I “override” any other LWP::RobotUA methods, I’ll need to remember to add them to that regex. Of course, since I’m not actually inheriting from LWP::RobotUA, in order to dispatch to its host_wait method, I can’t use SUPER, but must dispatch directly:

sub host_wait {
    my ($self, $netloc) = @_;
    # First visit is for robots.txt, so let it be free.
    return if !$netloc || $self->no_visits($netloc) < 2;
    $self->LWP::RobotUA::host_wait($netloc);
}

Ugly, right? Yes, I am an evil bastard. “Fuck typing” is right, yo! At least it’s all encapsulated.

This just reinforces chromatic’s message in my mind. I’d sure love to see LWP reworked to use roles!

  • E-mail this story to a friend!
  • Sphinn
  • StumbleUpon
  • Facebook
  • del.icio.us
  • LinkedIn
  • TwitThis
  • Digg
  • Google
  • MySpace
  • Reddit
  • StumbleUpon
  • Technorati
  • Yahoo! Buzz

Powered by KinoSearch