<?xml version="1.0" encoding="utf-8"?>

<rdf:RDF 
  xmlns="http://purl.org/rss/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:cc="http://web.resource.org/cc/"
  xmlns:admin="http://webns.net/mvcb/"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
> 

  <channel rdf:about="http://www.justatheory.com">
    <title>Just a Theory</title>
    <link>http://www.justatheory.com</link>
    <description>Theory waxes practical. By David Wheeler.</description>
    <language>en-us</language>
    <dc:creator>David Wheeler (david@justatheory.com)</dc:creator>
    <dc:rights>Copyright David Wheeler</dc:rights>
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <admin:generatorAgent rdf:resource="http://www.raelity.org/apps/blosxom/?v=2.0" />
    <admin:errorReportsTo rdf:resource="mailto:david@justatheory.com"/>

    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.70.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/regex_named_captures.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/odd_test_failures.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/fsa_rules_graph_improved.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/stepped_series.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/dbi/subclass_in_shell.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.57.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/modules/svnnotify_2.56.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/port_svn_notify_to_windows.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/fsa_rules_annotated.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/perltidy_method_blocks.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/which_digest.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/perltidy_in_emacs.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/modules/svnnotify_2.50.html" />
        <rdf:li rdf:resource="http://www.justatheory.com/computers/programming/perl/split_words.html" />

      </rdf:Seq>
    </items>


    <image rdf:resource="http://meerkat.oreillynet.com/icons/meerkat-powered.jpg" />

  </channel>

  <image rdf:about="http://www.justatheory.com/logo.gif">
    <title>Just a Theory</title>
    <url>http://www.justatheory.com/logo.gif</url>
    <link>http://www.justatheory.com</link>
  </image>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.70.html">
    <title>SVN::Notify 2.70: Output Filtering and Character Encoding</title>
    <link>http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.70.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl/modules</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2008-02-29T09:36-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>I&#x2019;m very pleased to announce the release of <a href="http://search.cpan.org/dist/SVN-Notify/" title="SVN::Notify on CPAN">SVN::Notify</a> 2.70. You can see an example of its colordiff output <a href="/computers/programming/perl/modules/svnnotify-2.70_colordiff_example.html" title="Example output from SVN::Notify::HTML::ColorDiff 2.70">here</a>. This is a major release that I&#x2019;ve spent the last several weeks polishing and tweaking to get just right. There are quite a few <a href="http://search.cpan.org/src/DWHEELER/SVN-Notify-2.70/Changes" title="SVN::Notify Changes">changes</a>, but the two most important are improved character encoding support and output filtering.</p>

<h3>Improved Character Encoding Support</h3>

<p>I&#x2019;ve had a number of bug reports regarding issues with character encodings. Particularly for folks working in Europe and Asia, but really for <em>anyone</em> using multibyte characters in their source code and log messages (and we all do nowadays, don&#x2019;t we?), it has been difficult to find the proper incantation to get SVN::Notify to convert data from and to their proper encodings. Using a patch from Toshikazu Kinkoh as a starting-point, and with a lot of reading and experimentation, as well as regular and patient tests on Toshikazu&#x2019;s and Martin Lindhe&#x2019;s production systems, I think I&#x2019;ve finally got it nailed down.</p>

<p>Now you can use the <code>&#x002d;&#x002d;encoding</code> (formerly <code>&#x002d;&#x002d;charset</code>), <code>&#x002d;&#x002d;svn-encoding</code>, and <code>&#x002d;&#x002d;diff-encoding</code> options—as well as <code>&#x002d;&#x002d;language</code>—to get SVN::Notify to do the right thing. As long as your Subversion server&#x2019;s OS supports an appropriate locale, you should be golden (mine is old, with no UTF-8 locales :\). And if all else fails, you can still set the <code>$LANG</code> environment variable before executing <code>svnnotify</code>.</p>

<p>There is actually a fair bit to know about encodings to get it to work properly, but if you use UTF-8 throughout and your OS supports UTF-8 locales, you shouldn&#x2019;t have to do anything. You might have to set <code>&#x002d;&#x002d;language</code> in order to get it to use the proper locale. See the new <a href="http://search.cpan.org/dist/SVN-Notify/lib/SVN/Notify.pm#Character_Encoding_Support" title="Character Encoding Support in SVN::Notify">documentation of the encoding support</a> for all the details. And if you still have problems, please do <a href="https://rt.cpan.org/Ticket/Create.html?Queue=SVN-Notify" title="Open a Ticket for SVN::Notify">let me know</a>.</p>

<h3>Output Filtering</h3>

<p>Much sexier is the addition of output filtering in SVN::Notify 2.70. I got pretty tired of getting feature requests for what are essentially formatting modifications, such as <a href="https://rt.cpan.org/Ticket/Display.html?id=26944" title="SVN::Notify feature request for KDE keywords support">this one</a> requesting KDE-style <a href="http://techbase.kde.org/Policies/SVN_Commit_Policy#Special_keywords_in_SVN_log_messages" title="KDE TechBase: Special keywords in SVN log messages">keyword support</a>. I myself was using <a href="http://trac.edgewall.org/wiki/WikiFormatting" title="Trac Wiki Formatting Syntax">Trac wiki syntax</a> in commit messages on a <a href="http://iwantsandy.com/" title="Sandy: Your virtual personal assistant">recent project</a> and wanted to see them converted to HTML for messages output by SVN::Notify::HTML::ColorDiff.</p>

<p>So I finally sat down and gave some thought to how to implement a simple plugin architecture for SVN::Notify. When I realized that it was generally just formatting that people wanted, it became simpler: I just needed a way to allow folks to write simple output filters. The solution I came up with was to just use Perl. Output filters are simply subroutines named for the kind of output they filter. They live in Perl packages. That&#x2019;s it.</p>

<p>For example, say that your developers write their commit log messages in <a href="http://www.textism.com/tools/textile/" title="Textile">Textile</a>, and rather than receive them stuck inside <code>&lt;pre&gt;</code> tags, you&#x2019;d like them converted to HTML. It&#x2019;s simple. Just put this code in a Perl module file:</p>

<pre>
package SVN::Notify::Filter::Textile;
use Text::Textile ();

sub log_message {
    my ($notifier, $lines) = @_;
    # Leave the message alone unless HTML is being generated.
    return $lines unless $notifier->content_type eq &#x0027;text/html&#x0027;;
    return [ Text::Textile->new->process( join $/, @$lines ) ];
}

1;
</pre>

<p>Put the file, <em>SVN/Notify/Filter/Textile.pm</em>, somewhere in a Perl library directory. Then use the new <code>&#x002d;&#x002d;filter</code> option to <code>svnnotify</code> to put it to work:</p>

<pre>
svnnotify -p "$1" -r "$2" &#x002d;&#x002d;handler HTML::ColorDiff &#x002d;&#x002d;filter Textile
</pre>

<p>Yep, that&#x2019;s it! SVN::Notify will find the filter module, load it, register its filtering subroutine, and then call it at the appropriate time. Of course, there are a lot of things you can filter; consult the <a href="http://search.cpan.org/dist/SVN-Notify/lib/SVN/Notify/Filter.pm" title="SVN::Notify Output Filtering Documentation">complete documentation</a> for all of the details. But hopefully this gives you a flavor for how easy it is to write new filters for SVN::Notify. I&#x2019;m hoping that all those folks who want features can now stop bugging me and start writing their own filters to do the job, uploading them to CPAN for all to share!</p>
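
<p>As a further illustration, here is a minimal sketch of a second filter that needs no extra modules. It assumes only the convention shown in the Textile example: a <code>log_message</code> subroutine receives the notifier and an array reference of lines and returns an array reference. The package name <code>TrimBlank</code> is hypothetical, made up for this example, and the last lines call the subroutine directly so the behavior can be checked without SVN::Notify installed.</p>

<pre>
use strict;
use warnings;

# Hypothetical filter: the name "TrimBlank" is invented for illustration.
package SVN::Notify::Filter::TrimBlank;

sub log_message {
    my ($notifier, $lines) = @_;
    my @out;
    for my $line (@$lines) {
        # Work on a copy and strip trailing whitespace.
        (my $copy = $line) =~ s/\s+\z//;
        # Collapse runs of blank lines into a single blank line.
        next if $copy eq "" and @out and $out[-1] eq "";
        push @out, $copy;
    }
    return \@out;
}

package main;

# Call the filter directly; the notifier argument is unused here.
my $filtered = SVN::Notify::Filter::TrimBlank::log_message(
    undef, [ "Fix bug  ", "", "", "Details" ],
);
print join( "\n", @$filtered ), "\n";
</pre>

<p>Saved as <em>SVN/Notify/Filter/TrimBlank.pm</em> (without the <code>package main</code> part and with a trailing <code>1;</code>), a filter like this would presumably be enabled the same way as the Textile one, with <code>&#x002d;&#x002d;filter TrimBlank</code>.</p>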

<p>To get things started, I scratched my own itch, writing a <a href="http://search.cpan.org/dist/SVN-Notify/lib/SVN/Notify/Filter/Trac.pm" title="SVN::Notify::Filter::Trac Documentation">Trac filter</a> myself. The filter is almost as simple as the Textile example above, but I also spent quite a bit of time tweaking the CSS so that most of the Trac-generated HTML looks good. You can see an example <a href="/computers/programming/perl/modules/svnnotify-2.70_trac_example.html" title="Example output from SVN::Notify 2.70 and modified by the Trac filter">right here</a>. Thanks to a number of bug fixes in <a href="http://search.cpan.org/dist/Text-Trac/">Text::Trac</a>, as well as Trac-specific CSS added via a filter on CSS output, it works beautifully. If I&#x2019;m feeling motivated in the next week or so, I&#x2019;ll create a separate CPAN distribution with just a Markdown filter and upload it. That will create a nice distribution example for folks to copy to create their own. Or maybe someone on the Lazy Web will do it for me! Maybe <em>you?</em></p>

<p>I wish I&#x2019;d thought to do this from the beginning; it would have saved me from having to add so many features/cruft to SVN::Notify over the years. Here&#x2019;s a quick list of the features that likely could have been implemented via filters instead of added to the core:</p>

<ul>
  <li><code>&#x002d;&#x002d;user-domain</code>: Combine the SVN username with a domain for the <q>From</q> header.</li>
  <li><code>&#x002d;&#x002d;add-header</code>: Add a header to the message.</li>
  <li><code>&#x002d;&#x002d;reply-to</code>: Add a specific header to the message.</li>
  <li>SVN::Notify::HTML::ColorDiff: Frankly, looking back on it, I don&#x2019;t know why I didn&#x2019;t just put this support right into SVN::Notify::HTML. But even if I hadn&#x2019;t, it could have been implemented via filters.</li>
  <li><code>&#x002d;&#x002d;subject-prefix</code>: Modify the message subject.</li>
  <li><code>&#x002d;&#x002d;subject-cx</code>: Add the commit context to the subject.</li>
  <li><code>&#x002d;&#x002d;strip-cx-regex</code>: More subject context modification.</li>
  <li><code>&#x002d;&#x002d;no-first-line</code>: Another subject filter.</li>
  <li><code>&#x002d;&#x002d;max-sub-length</code>: Yet another!</li>
  <li><code>&#x002d;&#x002d;max-diff-length</code>: A filter could truncate the diff, although this might be tricky with the HTML formatting.</li>
  <li><code>&#x002d;&#x002d;author-url</code>: Modify the metadata section to add a link to the author URL.</li>
  <li><code>&#x002d;&#x002d;revision-url</code>: Ditto for the revision URL.</li>
  <li><code>&#x002d;&#x002d;ticket-map</code>: Filter the log message for various ticketing system strings to convert to URLs. This also encompasses the old <code>&#x002d;&#x002d;rt-url</code>, <code>&#x002d;&#x002d;bugzilla-url</code>, <code>&#x002d;&#x002d;gnats-url</code>, and <code>&#x002d;&#x002d;jira-url</code> options.</li>
  <li><code>&#x002d;&#x002d;header</code>: Filter the beginning of the message.</li>
  <li><code>&#x002d;&#x002d;footer</code>: Filter the end of the message.</li>
  <li><code>&#x002d;&#x002d;linkize</code>: Filter the log message to convert URLs to links for HTML messages.</li>
  <li><code>&#x002d;&#x002d;css-url</code>: Filter the CSS to modify it, or filter the start of the HTML to add a link to an external CSS URL.</li>
  <li><code>&#x002d;&#x002d;wrap-log</code>: Reformat the log message for HTML.</li>
</ul>

<p>Yes, <em>really!</em> That&#x2019;s about half the functionality right there. I&#x2019;m glad that I won&#x2019;t have to add any more like that; filters are a <em>much</em> better way to go.</p>

<p>So download it, install it, write some filters, get your multibyte characters output properly, and enjoy! And as usual, send me your <a href="https://rt.cpan.org/Ticket/Create.html?Queue=SVN-Notify" title="Open a Ticket for SVN::Notify">bug reports</a>, but implement your own improvements using filters!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/regex_named_captures.html">
    <title>How to Use Regex Named Captures in Perl 5</title>
    <link>http://www.justatheory.com/computers/programming/perl/regex_named_captures.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-10-16T19:17-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>I ran across some Perl 5 regular expression syntax the other
day that I&#x2019;d never seen before. It used two features that were new to me:</p>

<ul>
  <li><code>(?{ })</code>, a zero-width, non-capturing assertion that executes
    arbitrary Perl code.</li>
  <li><code>$^N</code>, a variable for getting the contents of the most recent
    capture in a regular expression.</li>
</ul>

<p>The cool thing is that, used in combination, these two features can be used
to hack named captures into Perl regular expressions. Here&#x2019;s an example:</p>

<pre>
use warnings;
use strict;
use Data::Dumper;

my $string = &#x0027;The quick brown fox jumps over the lazy dog&#x0027;;

my %found;

my @captures = $string =~ /
    (?: (quick|slow) \s+    (?{ $found{speed}  = $^N  }) )
    (?: (brown|blue) \s+    (?{ $found{color}  = $^N  }) )
    (?: (sloth|fox)  \s+    (?{ $found{animal} = $^N  }) )
    (?: (eats|jumps)        (?{ $found{action} = $^N  }) )
/xms;

print Dumper \@captures;
print Dumper \%found;
</pre>

<p>The output of running this program is:</p>

<pre>
$VAR1 = [
          &#x0027;quick&#x0027;,
          &#x0027;brown&#x0027;,
          &#x0027;fox&#x0027;,
          &#x0027;jumps&#x0027;
        ];
$VAR1 = {
          &#x0027;color&#x0027; =&gt; &#x0027;brown&#x0027;,
          &#x0027;speed&#x0027; =&gt; &#x0027;quick&#x0027;,
          &#x0027;action&#x0027; =&gt; &#x0027;jumps&#x0027;,
          &#x0027;animal&#x0027; =&gt; &#x0027;fox&#x0027;
        };
</pre>

<p>So the positional captures are still returned, <em>and</em> we&#x2019;ve assigned
them to keys in a hash. This can be very convenient for complex regular
expressions.</p>

<p>This is a cool feature, but there are a few caveats. First, according to
the Perl regular expression
<a href="http://search.cpan.org/perldoc/perlre#(?{_code_})" title="Read about
(?{ }) on CPAN">documentation</a>, <code>(?{ })</code> is a highly
experimental feature that could go away at any time. But more importantly, if
you&#x2019;re relying on this feature you should be aware of the side effects. What I
mean by that is that, if a regular expression match fails, but there are some
successful matches during execution, then the code in the <code>(?{ })</code>
assertions could still execute. For example, if you changed the
word <q>jumps</q> to <q>poops</q> in the above example, the output becomes:</p>

<pre>
$VAR1 = [];
$VAR1 = {
          &#x0027;color&#x0027; =&gt; &#x0027;brown&#x0027;,
          &#x0027;speed&#x0027; =&gt; &#x0027;quick&#x0027;,
          &#x0027;animal&#x0027; =&gt; &#x0027;fox&#x0027;
        };
</pre>

<p>Which means that the match failed, but there were still assignments to our
hash, because some of the captures succeeded before the overall match failed.
The upshot is that you should always check the return value from the match
before relying on whatever the code inside the <code>(?{ })</code> assertions
did.</p>
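
<p>A minimal sketch of that check, using the failing <q>poops</q> variant of the example above: it keeps the result of the match in a variable (<code>$matched</code> is a name chosen for this sketch) and empties the hash when the overall match fails, so no partial assignments leak out.</p>

<pre>
use strict;
use warnings;

my $string = "The quick brown fox poops over the lazy dog";
my %found;

# In scalar context the match returns true or false.
my $matched = $string =~ /
    (?: (quick|slow) \s+    (?{ $found{speed}  = $^N  }) )
    (?: (brown|blue) \s+    (?{ $found{color}  = $^N  }) )
    (?: (sloth|fox)  \s+    (?{ $found{animal} = $^N  }) )
    (?: (eats|jumps)        (?{ $found{action} = $^N  }) )
/xms;

# Discard whatever the code blocks assigned if the match as a whole failed.
%found = () unless $matched;

print $matched ? "matched\n" : "no match\n";
print scalar( keys %found ), " keys in %found\n";
</pre>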

<p>The problem becomes even more subtle if your regular expressions trigger
backtracking. In that case, you might have an optional group match and its
value assigned to the hash, and then the next required group fail. Perl will
then backtrack to throw out the successful group match and then see if the
next required match succeeds. If so, you can have a successful match and
potentially invalid data in your hash. Here&#x2019;s an example:</p>

<pre>
my @captures = $string =~ /
    (?: (quick|slow) \s+    (?{ $found{speed}  = $^N  }) )
    (?: (brown|blue) \s+    (?{ $found{color}  = $^N  }) )?
    (?: (brown\s+fox)       (?{ $found{animal} = $^N  }) )
/xms;

print Dumper \@captures;
print Dumper \%found;
</pre>

<p>And the output is:</p>

<pre>
$VAR1 = [
          &#x0027;quick&#x0027;,
          undef,
          &#x0027;brown fox&#x0027;
        ];
$VAR1 = {
          &#x0027;color&#x0027; =&gt; &#x0027;brown&#x0027;,
          &#x0027;speed&#x0027; =&gt; &#x0027;quick&#x0027;,
          &#x0027;animal&#x0027; =&gt; &#x0027;brown fox&#x0027;
        };
</pre>

<p>So while the second group returned <code>undef</code> for the color
capture, the <code>%found</code> hash still had the color key in it. This may
or may not be what you want.</p>

<p>Of course, all this seems cool, but since it&#x2019;s a truly evil hack, you have
to be careful. If you can wait, though, perhaps we&#x2019;ll
see <a
href="http://www.nntp.perl.org/group/perl.perl5.porters/;msgid=9b18b3110610051158h43c58810ted1017129929a539[at]mail.gmail.com"
title="Perl 5 Porters: &#x201c;[PATCH] Initial attempt at named captures for
perls regexp engine&#x201d;">named captures in Perl 5.10</a>.</p>
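
<p>For comparison, here is a sketch of the same match written with the named-capture syntax that Perl 5.10 did go on to add, <code>(?&lt;name&gt;...)</code>, with results read from the built-in <code>%+</code> hash. It assumes Perl 5.10 or later. Because <code>%+</code> reflects only the last successful match, the partial-assignment problem described above does not arise.</p>

<pre>
use strict;
use warnings;
use 5.010;    # named captures need Perl 5.10 or later

my $string = "The quick brown fox jumps over the lazy dog";
my %found;

if ( $string =~ /
    (?&lt;speed&gt;  quick|slow ) \s+
    (?&lt;color&gt;  brown|blue ) \s+
    (?&lt;animal&gt; sloth|fox  ) \s+
    (?&lt;action&gt; eats|jumps )
/xms ) {
    # Copy the named captures out of the special hash.
    %found = %+;
}

print "$_ = $found{$_}\n" for sort keys %found;
</pre>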
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/odd_test_failures.html">
    <title>What's With These CPAN-Testers Failures?</title>
    <link>http://www.justatheory.com/computers/programming/perl/odd_test_failures.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-10-02T20:58-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>So I just learned about and subscribed to
the <a href="http://testers.cpan.org/author/DWHEELER.rss" title="My
CPAN-Testers Feed">CPAN-Testers feed for my modules</a>. There appear to be a
number of odd failures.
Take <a href="http://nntp.x.perl.org/group/perl.cpan.testers/249132"
title="FAIL Text-Diff-HTML-0.04 5.8.5 on freebsd 5.4-stable
(i386-freebsd)">this one</a>. It says, <q>Can&#x2019;t locate Algorithm/Diff.pm,</q>
despite the fact that I have properly specified the requirement
for <code>Text::Diff</code>, which itself properly
requires <code>Algorithm::Diff</code>. Is this an instance
of <code>CPAN.pm</code> or <code>CPANPLUS</code> not following all
prerequisites, or what?</p>

<p>Or take <a href="http://www.nntp.perl.org/group/perl.cpan.testers/240189"
title="FAIL Apache-Dir-0.04 5.8.5 on solaris 2.9
(sun4-solaris-thread-multi)">this failure</a>. It says, <q>[CP_ERROR] [Mon Sep
5 09:32:08 2005] No such module &#x2018;mod_perl&#x2019; found on CPAN</q>.
Yet <a href="http://search.cpan.org/~gozer/mod_perl-1.29/mod_perl.pod"
title="mod_perl on CPAN">here it is</a>. Maybe the <code>CPANPLUS</code>
indexer has a bug? Or are people&#x2019;s configurations just horked? Or am I just
doing something braindead?</p>

<p>Opinions welcomed.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/fsa_rules_graph_improved.html">
    <title>FSA::Rules Graphing Features Improved</title>
    <link>http://www.justatheory.com/computers/programming/perl/fsa_rules_graph_improved.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-07-14T20:02-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<img src="/computers/programming/perl/fsa_rules_sample.png" alt="FSA::Rules sample graph output" />

<p>I just released <a href="http://search.cpan.org/dist/FSA-Rules/"
title="FSA::Rules on CPAN">FSA::Rules</a> 0.25. This version came about as I
returned to the module to handle setting up a PostgreSQL database and found
the graphics that it churned out, well, wanting. I wanted a decision tree, but
the graphics just had the names of the states for the nodes, and then long
question-like labels on the edges. What I wanted instead was for each node to
be a question (or a statement about what the node was doing), and for the
edges to be simple answers to those questions (or indicators as to the success
of the code run in a state).</p>

<p>So I added a new attribute to the state class, <code>label</code>. You can
use this attribute to say something more about the state. In my case, I used
it to store the question the state asks, or the description of the state&#x2019;s
activities. I then changed the code that creates the graph to use this
attribute in preference to the state name when creating node labels. The
result is a much more natural decision graph, as you can see here.</p>

<p>The release features a number of other goodies, including the elimination
of a dependence on the <code>Clone</code> module, and thus also a big memory
savings. There is now a lot more control over the format of graphs, too.
Enjoy!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/stepped_series.html">
    <title>Stepped Series of Numbers in Perl</title>
    <link>http://www.justatheory.com/computers/programming/perl/stepped_series.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-07-04T00:33-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>In working on a Perl validation function for GTINs
(recipe <a
href="http://www.gs1.org/productssolutions/idkeys/support/check_digit_calculator.html#how"
title="GTIN/EAN/UPC validation tables">here</a>), I found a need to generate a
series of numbers with a step of two. For example, in the series 1-10, I
first want 1, 3, 5, 7, and 9. And then later I want 2, 4, 6, 8, 10. Here&#x2019;s how
I went about creating those series in my GTIN function to create array
slices:</p>

<pre>
use List::Util qw(sum);

sub isa_gtin {
    my @nums = reverse split q{}, shift;
    (
        sum( @nums[ grep {   $_ % 2  } 0..$#nums ] ) * 3
      + sum( @nums[ grep { !($_ % 2) } 0..$#nums ] )
    ) % 10 == 0;
}
</pre>

<p>But it seems wasteful to generate the series of numbers twice and to
calculate whether they&#x2019;re odd or even twice. Surely there&#x2019;s a more efficient
way to do this in Perl, perhaps even more expressive? Python seems to have a
useful syntax for creating array slices that step. In Python, I&#x2019;d do something
like this:</p>

<pre>
  sum( nums[1::2] ) * 3 + sum( nums[0::2] )
</pre>

<p>But barring such a slice feature in Perl, is there some cleaner way than the
ugly <code>grep</code> approach I created to generate a stepped series in
Perl?</p>
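
<p>One possible alternative, offered only as a sketch: skip the slices entirely and walk the digits once, choosing the multiplier from the parity of the index. It computes the same checksum as the function above. The sample number <code>4006381333931</code> is simply an EAN-13 used here for illustration.</p>

<pre>
use strict;
use warnings;

sub isa_gtin {
    # Reverse so that index 0 is the check digit.
    my @nums  = reverse split //, shift;
    my $total = 0;
    for my $i ( 0 .. $#nums ) {
        # Odd positions count three times, even positions once.
        $total += $nums[$i] * ( $i % 2 ? 3 : 1 );
    }
    return $total % 10 == 0;
}

print isa_gtin("4006381333931") ? "valid\n" : "invalid\n";
</pre>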
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/dbi/subclass_in_shell.html">
    <title>Hack: Force DBI::Shell to use a DBI Subclass</title>
    <link>http://www.justatheory.com/computers/programming/perl/dbi/subclass_in_shell.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl/dbi</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-04-18T18:32-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>So I just had a need to use DBI::Shell with a subclass of the DBI. It doesn&#x2019;t support subclasses directly (it&#x2019;d be nice to be able to specify one on the command-line or something), but I was able to hack it into using one anyway by doing this:</p>

<pre>
use My::DBI;
BEGIN {
    sub DBI::Shell::Base::DBI () { &#x0027;My::DBI&#x0027; };
}
use DBI::Shell;
</pre>

<p>Yes, it&#x2019;s extremely sneaky. DBI::Shell::Base uses the bareword <code>DBI</code>, as in <code>DBI-&gt;connect(...)</code>, so by shoving a constant named <code>DBI</code> into DBI::Shell::Base before loading DBI::Shell, I convince it to use my subclass instead.</p>
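
<p>The trick can be seen in isolation in the following sketch, which uses made-up packages (<code>Animal</code>, <code>Dog</code>, and <code>Consumer</code>) rather than the DBI. When a subroutine with the same name as a bareword class already exists in the current package, Perl parses <code>Animal-&gt;speak</code> as <code>Animal()-&gt;speak</code>, so the constant decides which class receives the call.</p>

<pre>
use strict;
use warnings;

# All package names below are hypothetical stand-ins for DBI and friends.
package Animal;
sub speak { my $class = shift; return "$class speaks" }

package Dog;
our @ISA = ("Animal");

package Consumer;

# This constant must be compiled before the method call below.
sub Animal () { "Dog" }

# The bareword Animal is now a call to the constant above.
sub run { return Animal->speak }

package main;
print Consumer::run(), "\n";
</pre>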
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.57.html">
    <title>SVN::Notify 2.57 Supports Windows</title>
    <link>http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.57.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl/modules</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-04-06T23:08-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>So I finally got &#x2018;round to porting <a href="http://search.cpan.org/dist/SVN-Notify/" title="SVN::Notify on CPAN">SVN::Notify</a> to Windows. Version 2.57 is making its way to CPAN right now. The solution turned out to be dead simple: I just had to use a different form of piping <code>open()</code> on Windows, i.e., <code>open FH, &quot;$cmd|&quot;</code> instead of <code>open FH, &quot;-|&quot;; exec($cmd);</code>. It&#x2019;s silly, really, but it works. It really makes me wonder why <code>-|</code> and <code>|-</code> haven&#x2019;t been emulated on Windows. Whatever.</p>
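
<p>A rough sketch of the two forms side by side, choosing between them by platform. The helper name <code>read_pipe</code> and the <code>echo</code> command are assumptions made for this example; this is not the code that SVN::Notify itself uses.</p>

<pre>
use strict;
use warnings;

# Hypothetical helper: returns a file handle reading a command's output.
sub read_pipe {
    my ($cmd) = @_;
    my $fh;
    if ( $^O eq "MSWin32" ) {
        # String form: the command is run with a trailing pipe.
        open $fh, "$cmd |" or die "Cannot run $cmd: $!";
    }
    else {
        # Forking form: open() forks, then the child execs the command.
        my $pid = open $fh, "-|";
        die "Cannot fork: $!" unless defined $pid;
        unless ($pid) {
            exec $cmd or die "Cannot exec $cmd: $!";
        }
    }
    return $fh;
}

my $fh = read_pipe("echo hello");
while ( defined( my $line = readline $fh ) ) {
    print $line;
}
close $fh;
</pre>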

<p>&#x2019;Course the other thing I realized, after I made this change and all the tests passed, was that there is no equivalent of <em>sendmail</em> on Windows. So I added the <code>&#x002d;&#x002d;smtp</code> option, so that now email can be sent to an SMTP server rather than to a local <em>sendmail</em>. I tested it out, and it seems to work, but I&#x2019;d be especially interested to hear from folks using wide characters in their repositories: do they get printed properly to Net::SMTP&#x2019;s connection?</p>

<p>The whole list of changes in 2.57 (the output remains the same as in <a href="http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.56_colordiff_example.html" title="Example output from SVN::Notify 2.56">2.56</a>):</p>

<ul>
      <li>Finally ported to Win32. It was actually a simple matter of changing
        how command pipes are created.</li>
      <li>Added <code>&#x002d;&#x002d;smtp</code> option to enable sending messages to an SMTP server
        rather than to the local <em>sendmail</em> application. This is essential for
        Windows support.</li>
      <li>Added <code>&#x002d;&#x002d;io-layer</code> to the usage statement in <em>svnnotify</em>.</li>
      <li>Fixed single-dash arguments in documentation so that they&#x2019;re all
        documented with a single dash in SVN::Notify.</li>
</ul>

<p>Enjoy!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/modules/svnnotify_2.56.html">
    <title>SVN::Notify 2.56 Adds Alternative Formats</title>
    <link>http://www.justatheory.com/computers/programming/perl/modules/svnnotify_2.56.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl/modules</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-04-05T00:14-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>I&#x2019;ve just uploaded <a href="http://search.cpan.org/dist/SVN-Notify/" title="SVN::Notify on CPAN">SVN::Notify</a> 2.56 to CPAN. Check a mirror near you! There have been a lot of changes since I last posted about SVN::Notify (for the <a href="/computers/programming/perl/modules/svnnotify_2.50.html" title="SVN::Notify 2.50 Announcement">2.50 release</a>), not least of which is that SourceForge has <a href="http://sourceforge.net/docs/E09#svn_notify" title="SourceForge: Commit Notifications via Email (SVN::Notify)">standardized on it</a> for their Subversion roll out. W00t! The result was a couple of patches from SourceForge&#x2019;s David Burley to add headers and footers and to truncate diffs over a certain size. See the <a href="http://www.justatheory.com/computers/programming/perl/modules/svnnotify-2.56_colordiff_example.html" title="Example output from SVN::Notify 2.56">sample output</a> for how it looks. Thanks, David!</p>

<p>The change I&#x2019;m most pleased with in 2.56 is the addition of SVN::Notify::Alternative, based on a submission from Jukka Zitting. This new subclass allows you to actually combine a number of other subclasses into a single activity notification message. Why? Well, mainly because, though you might like to get HTML messages with colorized diffs, some mail clients might not care for the HTML. They would much prefer the plain text version.</p>

<p>SVN::Notify::Alternative allows you to have your cake and eat it too: send a single message with <code>multipart/alternative</code> sections for both HTML output and plain text. Plain text will always be used; to use HTML::ColorDiff with it, just do this:</p>

<pre>
svnnotify &#x002d;&#x002d;repos-path &quot;$1&quot; &#x002d;&#x002d;revision &quot;$2&quot; \
  &#x002d;&#x002d;to developers@example.com &#x002d;&#x002d;handler Alternative \
  &#x002d;&#x002d;alternative HTML::ColorDiff &#x002d;&#x002d;with-diff
</pre>

<p>This incantation will send an email with both the plain text and HTML::ColorDiff formats. If you look at it in Mail.app, you&#x2019;ll see the nice colorized format, and if you look at it in <code>pine</code>, you&#x2019;ll see the plain text.</p>

<p>For the curious, here are all of the changes since 2.50:</p>

<dl>
  <dt>2.56  2006-04-04T23:16:37</dt>
  <dd>
    <ul>
      <li>Abstracted creation of the diff file handle into the new <code>diff_handle()</code>
        method.</li>
      <li>Documented use of <code>diff_handle()</code> in the <code>output()</code> method.</li>
      <li>Added optional second argument to <code>output()</code> to optionally suppress the
        output of the email headers. This argument is used by the new
        Alternative subclass.</li>
      <li>Added SVN::Notify::Alternative, which allows multiple versions of a
        commit email to be sent, such as text/plain plus HTML. The multiple
        versions are assembled into a single email message using the
        multipart/alternative media type. For those who want HTML messages but
        must support users that can only read plain text or rely on archives
        that ignore HTML messages, this can be very useful. Based on an
        implementation by Jukka Zitting.</li>
      <li>Fixed <code>use_ok()</code> tests that weren&#x2019;t running at all.</li>
      <li>Added an extra newline to separate the file list from an inline diff
        in the plain text format where <code>&#x002d;&#x002d;with-diff</code> has been specified.</li>
      <li>Moved the <code>multipart/mixed</code> content-type header generation from
        <code>output_headers()</code> to <code>output_content_type()</code>, not only because this makes
        more sense, but also because it makes attachments behave better when
        using SVN::Notify::Alternative.</li>
      <li>Documented accessors in SVN::Notify::HTML.</li>
    </ul>
  </dd>

  <dt>2.55  2006-04-03T23:11:11</dt>
  <dd>
    <ul>
      <li>Added the <code>io-layer</code> option to specify an alternate IO layer. Will be
        most useful for those with repositories containing text in multiple
        encodings, where it should be set to <q>raw</q>.</li>
      <li>Fixed the context output in the subject for the <code>--subject-cx</code> option
        so that it&#x2019;s smarter about determining the longest common path.
        Reported by Max Horn.</li>
      <li>No longer modifying the values of the <code>to_regex_map</code> hash, so as not
        to mess with folks who might be passing it as a hash to more than one
        call to <code>new()</code>. Reported by Darby Felton.</li>
      <li>Added a <code>meta http-equiv=&quot;content-type&quot;</code> tag to HTML output that
        includes the character set to help some clients in the proper display
        of the characters in an HTML email. I&#x2019;m not sure if any clients
        actually need this help, but it certainly can&#x2019;t hurt!</li>
      <li>Added the <code>--css-url</code> option to specify an alternate style sheet for
        HTML emails. SVN::Notify::HTML&#x2019;s own CSS is left in the email, as
        well, so the specified style sheet can just override the default,
        rather than have to style everything itself. Yes, it takes advantage
        of the <q>cascading</q> feature of cascading style sheets! Based on a
        suggestion by Steve James.</li>
    </ul>
  </dd>

  <dt>2.54  2006-03-06T00:33:42</dt>
  <dd>
    <ul>
      <li>Added <em>/usr/bin</em> to the list of paths searched for executables.
        Suggested by Nacho Barrientos.</li>
      <li>Added <code>--max-diff-length</code> option. Patch from David Burley/SourceForge.</li>
    </ul>
  </dd>

  <dt>2.53  2006-02-24T21:30:48</dt>
  <dd>
    <ul>
      <li>Added <code>header</code> and <code>footer</code> attributes and command-line options to
        specify text to be put at the head and foot of each message. For HTML
        messages, the text will be escaped, unless it starts with <q>&lt;</q>, in
        which case it will be assumed to be valid HTML and will therefore not
        be escaped. Either way, it will be output between <code>&lt;div&gt;</code> tags with the
        IDs <q>header</q> or <q>footer</q> as appropriate. Based on a patch from David
        Burley/SourceForge.</li>
      <li>Fixed the executable-searching algorithm added in 2.52 to add <q>.exe</q>
        to the name of the executable being searched for if <code>$^O eq &#x0027;MSWin32&#x0027;</code>.</li>
      <li>Fixed encoding issues so that, under Perl 5.8 and later, the IO layer
        is set on file handles so as to encode input and decode output in the
        character set specified by the <code>charset</code> attribute. CPAN # 16050,
        reported by Michael Zehrer.</li>
      <li>Added a second argument to all calls to <code>encode_entities()</code> in
        SVN::Notify::HTML and SVN::Notify::HTML::ColorDiff so that only &#x0027;&gt;&#x0027;,
        &#x0027;&lt;&#x0027;, &#x0027;&amp;&#x0027;, and &#x0027;&quot;&#x0027; are escaped.</li>
      <li>Fixed a bug in the <code>_find_exe()</code> function that was attempting to modify
        a constant variable. Patch from John Peacock.</li>
      <li>Turned the <code>_find_exe()</code> function into the <code>find_exe()</code> class method,
        since subclasses (such as SVN::Notify::Mirror) might want to use it.</li>
    </ul>
  </dd>

  <dt>2.52  2006-02-19T18:50:24</dt>
  <dd>
    <ul>
      <li>Now uses <code>File::Spec-&gt;path</code> to search for a valid <em>sendmail</em> or <em>svnlook</em>
        when they&#x2019;re not specified via their respective command-line options or
        environment variables. Suggested by Andreas Koenig. Note that they
        should probably be explicitly set anyway, as the <code>$PATH</code> environment
        variable tends to be non-existent when running under Apache.</li>
    </ul>
  </dd>

  <dt>2.51  2006-01-02T23:28:11</dt>
  <dd>
    <ul>
      <li>Fixed ColorDiff HTML to once again be valid XHTML 1.1.</li>
    </ul>
  </dd>
</dl>

<p>Enjoy!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/port_svn_notify_to_windows.html">
    <title>Port SVN::Notify to Windows</title>
    <link>http://www.justatheory.com/computers/programming/perl/port_svn_notify_to_windows.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-02-23T19:04-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>So <a href="http://search.cpan.org/dist/SVN-Notify/" title="SVN::Notify on CPAN">SVN::Notify</a> doesn&#x2019;t currently run on Windows. Why not? Well, because I wanted to do things as <q>rightly</q> as possible. In terms of efficiency, what that meant was, rather than slurping in whole chunks of data, such as diffs, from <em>svnlook</em>, I instead followed the guidance in <a href="http://search.cpan.org/dist/perl/pod/perlipc.pod#Safe_Pipe_Opens" title="Read about Safe Pipe Opens in the perlipc documentation">perlipc</a> to open a file handle pipe to <em>svnlook</em> and then read from it line-by-line. The method I wrote to create the pipe looks like this:</p>

<pre>
sub _pipe {
    my ($self, $mode) = (shift, shift);
    # Safer version of backtick (see perlipc(1)).
    local *PIPE;
    my $pid = open(PIPE, $mode);
    die &quot;Cannot fork: $!\n&quot; unless defined $pid;

    if ($pid) {
        # Parent process. Return the file handle.
        return *PIPE;
    } else {
        # Child process. Execute the commands.
        exec(@_) or die &quot;Cannot exec $_[0]: $!\n&quot;;
        # Not reached.
    }
}
</pre>

<p>The problem is that it doesn&#x2019;t work on Windows. perlipc says:</p>

<blockquote>
  <p>Note that these operations are full Unix forks, which means they may not be correctly implemented on alien systems. Additionally, these are not true multithreading. If you&#x2019;d like to learn more about threading, see the modules file mentioned below in the SEE ALSO section.</p>
</blockquote>

<p>&#x2019;Course, the SEE ALSO section doesn&#x2019;t have much for <q>alien systems,</q> but I have a comment in my code that suggests that <a href="http://search.cpan.org/dist/libwin32/Process/Process.pm" title="Win32::Process on CPAN">Win32::Process</a> might do for Windows compatibility. But I honestly don&#x2019;t know.</p>

<p>So what&#x2019;s the best approach for me to port SVN::Notify to Windows while keeping file handle pipes around for efficiency? Anyone care to take a stab at it, with tests for Windows, and send me a patch?</p>
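
<p>One direction that might work, sketched here under assumptions rather than tested on Windows: keep the safe fork-and-exec open wherever it is supported, and on Windows fall back to an ordinary piped open of a quoted command string. The helper name <code>pipe_from</code>, the <code>$^O</code> check, and the naive double-quoting are all hypothetical; arguments that themselves contain double quotes would need real escaping.</p>

<pre>
use strict;
use warnings;

# Hypothetical helper (not part of SVN::Notify): returns a read handle
# on the output of @cmd.
sub pipe_from {
    my @cmd = @_;
    my $fh;
    if ($^O eq &#x0027;MSWin32&#x0027;) {
        # Assumption: no forking open here, so build one quoted command
        # string. Naive quoting; embedded double quotes are not handled.
        my $cmd = join &#x0027; &#x0027;, map { qq{&quot;$_&quot;} } @cmd;
        open $fh, &quot;$cmd |&quot; or die &quot;Cannot run $cmd[0]: $!\n&quot;;
    } else {
        # Safe pipe open from perlipc: fork, then exec in the child.
        my $pid = open $fh, &#x0027;-|&#x0027;;
        die &quot;Cannot fork: $!\n&quot; unless defined $pid;
        unless ($pid) {
            exec(@cmd) or die &quot;Cannot exec $cmd[0]: $!\n&quot;;
        }
    }
    return $fh;
}

# Read the child&#x0027;s output line by line, as one would with svnlook.
my $fh = pipe_from($^X, &#x0027;-e&#x0027;, &#x0027;print qq{one\ntwo\n}&#x0027;);
while (my $line = &lt;$fh&gt;) {
    print &quot;got: $line&quot;;
}
close $fh or warn &quot;Child exited with status $?\n&quot;;
</pre>

<p>Either branch hands back a file handle, so the callers that read diffs line by line would not need to change.</p>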
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/fsa_rules_annotated.html">
    <title>FSA::Rules Annotated</title>
    <link>http://www.justatheory.com/computers/programming/perl/fsa_rules_annotated.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-02-14T18:02-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[
<p><a href="http://annocpan.org/~DWHEELER/FSA-Rules-0.23/lib/FSA/Rules.pm" title="FSA::Rules on AnnoCPAN">This is pretty cool</a>. Chris Dolan added a comment about the synopsis, pointing out that it is overly complicated (yes, Chris, it&#x2019;s that way to show off the features). But I love that a user can take the time to comment on my docs and therefore make them even better!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/perltidy_method_blocks.html">
    <title>How Do I Tweak Perltidy Method/Function-call blocks?</title>
    <link>http://www.justatheory.com/computers/programming/perl/perltidy_method_blocks.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2006-01-12T19:29-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<pre>
my $process = Background->new($^X, &quot;-I$lib&quot;,
                              &quot;-MMyLong:Namespace::Bar::Bat&quot;,
                              &quot;-e 1&quot;, &quot;other&quot;, &quot;arguments&quot;, &quot;here&quot;);
</pre>

<p>Perltidy will turn it into this:</p>

<pre>
my $process = Background->new( $^X, &quot;-I$lib&quot;, &quot;-MMyLong:Namespace::Bar::Bat&quot;,
    &quot;-e 1&quot;, &quot;other&quot;, &quot;arguments&quot;, &quot;here&quot; );
</pre>

<p>That&#x2019;s a little better, but I&#x2019;d much rather that it made it look like this:</p>

<pre>
my $process = Background->new(
    $^X,    &quot;-I$lib&quot;, &quot;-MMyLong:Namespace::Bar::Bat&quot;,
    &quot;-e 1&quot;, &quot;other&quot;,  &quot;arguments&quot;, &quot;here&quot;,
);
</pre>

<p>Or even this:</p>

<pre>
my $process = Background->new(
    $^X,
    &quot;-I$lib&quot;,
    &quot;-MMyLong:Namespace::Bar::Bat&quot;,
    &quot;-e 1&quot;,
    &quot;other&quot;,
    &quot;arguments&quot;,
    &quot;here&quot;,
);
</pre>

<p>Anyone know how to get it to do that? If so, please leave a comment!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/which_digest.html">
    <title>Which Digest Should I Use?</title>
    <link>http://www.justatheory.com/computers/programming/perl/which_digest.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2005-12-24T01:35-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[
<p>With the recent <a href="http://it.slashdot.org/article.pl?sid=05/11/15/2037232">release of MD5 collision code</a>, I&#x2019;m reading that it&#x2019;s long since time that MD5 was dropped from applications. But it seems that SHA-1 isn&#x2019;t well thought of anymore, either. So what should Perl programmers use now, instead? <a href="http://search.cpan.org/dist/Digest-Whirlpool/">Digest::Whirlpool</a>? <a href="http://search.cpan.org/dist/Digest-SHA1/">Digest::SHA2</a>? <a href="http://search.cpan.org/dist/Digest-Tiger/">Digest::Tiger</a>? <a href="http://search.cpan.org/dist/Digest-Haval256/">Digest::Haval256</a>? A combination of these? Something else? I mainly used MD5 for hashing passwords. What&#x2019;s the best choice for that use? For other uses?</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/perltidy_in_emacs.html">
    <title>Use Perltidy in Emacs</title>
    <link>http://www.justatheory.com/computers/programming/perl/perltidy_in_emacs.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2005-12-22T20:38-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<pre>
(defun perltidy ()
  &quot;Run perltidy on the current region or, if none is active, the current function.&quot;
  (interactive)
  (save-excursion
    (unless mark-active (mark-defun))
    (shell-command-on-region (point) (mark) &quot;perltidy -q&quot; nil t)))

(global-set-key &quot;\C-ct&quot; &#x0027;perltidy)
</pre>

<p>With Perltidy installed and this function thrown into your <em>~/.emacs</em> file, you can run <code>perltidy</code> on a region by just hitting <code>C-c t</code>. If no region is selected, <code>mark-defun</code> selects the function around point, so <code>perltidy</code> runs on just that function rather than on the whole buffer.</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/modules/svnnotify_2.50.html">
    <title>SVN::Notify 2.50</title>
    <link>http://www.justatheory.com/computers/programming/perl/modules/svnnotify_2.50.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl/modules</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2005-11-11T00:23-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>Here are all of the changes since the last version:</p>

<dl>

  <dt>2.50  2005-11-10T23:27:22</dt>
  <dd>
    <ul>
      <li>Added <code>--ticket-url</code> and <code>--ticket-regex</code>
        options to be used by those who want to match ticket identifiers for
        systems other than RT, Bugzilla, GNATS, and JIRA. Based on a patch
        from Andrew O&#x2019;Brien.</li>
      <li>Removed bogus <code>use lib</code> line put
        into <em>Makefile.PL</em> by a prerelease version of Module::Build.</li>
      <li>Fixed HTML tests to match either <q>&#x0027;</q>
        or <q>&amp;#39;</q>, since HTML::Entities can be configured
        differently on different systems.</li>
    </ul>
  </dd>

  <dt>2.49  2005-09-29T17:26:14</dt>
  <dd>
    <ul>
      <li>Now require Getopt::Long 2.34 so that
        the <code>--to-regex-map</code> option works correctly when it is used
        only once on the command-line.</li>
    </ul>
  </dd>

  <dt>2.48  2005-09-06T19:14:35</dt>
  <dd>
    <ul>
      <li>Switched from <code>&lt;span class=&quot;add&quot;&gt;</code> and
        <code>&lt;span class=&quot;rem&quot;&gt;</code>
        to <code>&lt;ins&gt;</code> and <code>&lt;del&gt;</code> elements in
        SVN::Notify::HTML::ColorDiff in order to make the markup more
        semantic.</li>
    </ul>
  </dd>

  <dt>2.47  2005-09-03T18:54:43</dt>
  <dd>
    <ul>
      <li>Fixed options tests to work correctly with older versions of
        Getopt::Long. Reported by Craig McElroy.</li>
      <li>Slick new CSS treatment used for the HTML and HTML::ColorDiff emails.
        Based on a patch from Bill Lynch.</li>
      <li>Added <code>--svnweb-url</code> option. Based on a patch from
        Ricardo Signes.</li>
    </ul>
  </dd>

  <dt>2.46  2005-05-05T05:22:54</dt>
  <dd>
    <ul>
      <li>Added support for <q>Copied</q> files to HTML::ColorDiff so that
        they display properly.</li>
    </ul>
  </dd>

  <dt>2.45  2005-05-04T20:38:18</dt>
  <dd>
    <ul>
      <li>Added support for links to
        the <a href="http://www.gnu.org/software/gnats/" title="GNATS: The GNU
        Bug Tracking System">GNATS</a> bug tracking system. Patch from Nathan
        Walp.</li>
    </ul>
  </dd>

  <dt>2.44  2005-03-18T06:10:01</dt>
  <dd>
    <ul>
      <li>Fixed Name in POD so that SVN::Notify&#x2019;s POD gets indexed by
        <a href="http://search.cpan.org/" title="CPAN
        Search">search.cpan.org</a>. Reported by Ricardo Signes.</li>
    </ul>
  </dd>

  <dt>2.43  2004-11-24T18:49:40</dt>
  <dd>
    <ul>
      <li>Added <code>--strip-cx-regex</code> option to strip out parts of the
        context from the subject. Useful for removing parts of the file names
        you might not be interested in seeing in every commit message.</li>
      <li>Added <code>--no-first-line</code> option to omit the first sentence
        or line of the log message from the subject. Useful in combination
        with the <code>--subject-cx</code> option.</li>
    </ul>
  </dd>

  <dt>2.42  2004-11-19T18:47:20</dt>
  <dd>
    <ul>
      <li>Changed <q>Files</q> to <q>Paths</q> in hash returned by
        <code>file_label_map()</code> since directories can be listed as well
        as files.</li>
      <li>Fixed SVN::Notify::HTML so that directories listed among the
        changed paths are not links.</li>
      <li>Requiring Module::Build 0.26 to make sure that the installation
        works properly. Reported by Robert Spier.</li>
    </ul>
  </dd>
</dl>

<p>Enjoy!</p>
]]></content:encoded>
  </item>

  <item rdf:about="http://www.justatheory.com/computers/programming/perl/split_words.html">
    <title>Splitting Words in Perl</title>
    <link>http://www.justatheory.com/computers/programming/perl/split_words.html</link>
    <description></description>
    <dc:subject>/computers/programming/perl</dc:subject>
    <dc:creator>David Wheeler</dc:creator>
    <dc:date>2005-09-07T19:40-08:00</dc:date>
    
    <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc/2.0" />
    <content:encoded><![CDATA[<p>After looking at discussions in <a href="http://www.amazon.com/exec/obidos/ASIN/0596003137/justatheory-20" title="Buy &#x201c;The Perl Cookbook&#x201d; on Amazon.com"><cite>The Perl Cookbook</cite></a> and <a href="http://www.amazon.com/exec/obidos/ASIN/0596002890/justatheory-20" title="Buy &#x201c;Mastering Regular Expressions&#x201d; on Amazon.com"><cite>Mastering Regular Expressions</cite></a>, I settled on using Friedl&#x2019;s pattern for identifying the starting boundary of words, which is <code>qr/(?&lt;!\w)(?=\w)/msx</code>. This pattern will turn the string, <q>this is O&#x0027;Reilly&#x0027;s string</q> into the following tokens:</p>

<pre>
[
    q{this },
    q{is },
    q{O&#x0027;},
    q{Reilly&#x0027;},
    q{s },
    q{string},
];
</pre>
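
<p>A minimal sketch of one way to produce that list, assuming the pattern is simply handed to <code>split</code> (the variable names are illustrative only):</p>

<pre>
use strict;
use warnings;

# Friedl&#x0027;s start-of-word boundary: a position not preceded by a word
# character but followed by one.
my $word_start = qr/(?&lt;!\w)(?=\w)/msx;

my @tokens = split $word_start, q{this is O&#x0027;Reilly&#x0027;s string};

# Each token keeps whatever non-word characters trail the word.
print &quot;[$_]\n&quot; for @tokens;
</pre>

<p>Because the pattern is zero-width, <code>split</code> discards nothing: joining the tokens back together reproduces the original string exactly.</p>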

<p>So it&#x2019;s imperfect, but it works well enough for me. I&#x2019;m thinking of using the Unicode character class for words, instead, at least for more recent versions of Perl that understand them (5.8.0 and later?). That would be <code>/(?&lt;!\p{IsWord})(?=\p{IsWord})/msx</code>. The results using that regular expression are the same.</p>

<p>But otherwise, I&#x2019;m not sure whether or not this is the best approach. I think that it&#x2019;s good enough for the general cases I have, and the matching of words in and of themselves is not that important. What I mean is that, as long as most tokens are words, it&#x2019;s okay with me if some, such as <q>O&#x0027;</q>, <q>Reilly&#x0027;</q>, and <q>s </q> in the above example, are not words. What I don&#x2019;t know is how well it&#x2019;ll work for non-Roman glyphs, such as in Japanese or Korean text. I tried a test on a Korean string I have lying around (borrowed from the Encode.pm test suite), but it didn&#x2019;t split it up at all (with <code>use utf8;</code>).</p>

<p>So what do you think? Does <a href="http://search.cpan.org/dist/Text-WordDiff/" title="Text::WordDiff on CPAN">Text::WordDiff</a> work for your text? Is there a better and more general solution for tokenizing the words in a string?</p>
]]></content:encoded>
  </item>

  <cc:License rdf:about="http://creativecommons.org/licenses/by-nc/2.0/">
    <cc:permits rdf:resource="http://web.resource.org/cc/Reproduction" />
    <cc:permits rdf:resource="http://web.resource.org/cc/Distribution" />
    <cc:requires rdf:resource="http://web.resource.org/cc/Notice" />
    <cc:requires rdf:resource="http://web.resource.org/cc/Attribution" />
    <cc:prohibits rdf:resource="http://web.resource.org/cc/CommercialUse" />
    <cc:permits rdf:resource="http://web.resource.org/cc/DerivativeWorks" />
  </cc:License>
</rdf:RDF>
