Just a Theory

By David E. Wheeler

Posts about Pod

Please Test Pod::Simple 3.29_3

I’ve just pushed Pod-Simple 3.29_3 to CPAN. Karl Williamson did a lot of hacking on this release, finally adding support for EBCDIC. But as part of that work, and in coordination with Pod::Simple’s original author, Sean Burke, as well as pod-people, we have switched the default encoding from Latin-1 to CP-1252.

On the surface, that might sound like a big change, but in truth, it’s pretty straightforward. CP-1252 is effectively a superset of Latin-1, repurposing 30 or so unused control characters from Latin-1. Those characters are pretty common on Windows (the home of the CP family of encodings), especially in pastes from Word. It’s nice to be able to pick those up essentially for free.

Still, Karl’s done more than that. He also updated the encoding detection to do a better job at detecting UTF-8. This is the real default. Pod::Simple only falls back on CP1252 if there are no obvious UTF-8 byte sequences in your Pod.
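To make the fallback order concrete, here is a minimal sketch in Perl. It is not Pod::Simple’s actual detection code, which may well inspect less of the file; guess_decode is a hypothetical helper that simply tries a strict UTF-8 decode of the whole buffer and falls back to CP-1252 when that fails.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Pod::Simple): guess an encoding for raw
# bytes and return the encoding name along with the decoded text.
sub guess_decode {
    my ($bytes) = @_;

    # Try strict UTF-8 first. With FB_CROAK, decode() dies on a malformed
    # sequence; LEAVE_SRC keeps $bytes intact for the fallback.
    # Note that pure ASCII input is also reported as UTF-8 here.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return ('UTF-8', $text) if defined $text;

    # Not valid UTF-8, so treat the bytes as CP-1252.
    return ('CP1252', decode('cp1252', $bytes));
}

# "\x93" and "\x94" are curly quotes in CP-1252 but invalid on their own in
# UTF-8, so this input takes the fallback path.
my ($encoding, $text) = guess_decode("\x93quoted\x94");
print "Guessed $encoding\n";
```

Trying UTF-8 first works because text in a single-byte encoding rarely happens to form valid multi-byte UTF-8 sequences, so a successful strict decode is a strong hint.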

Overall these changes should be a great improvement. Better encoding support is always a good idea. But it is a pretty significant change, including a change to the Pod spec. Hence the test release. Please make sure it works well with your code by installing it today:

cpan D/DW/DWHEELER/Pod-Simple-3.29_3.tar.gz
cpanm DWHEELER/Pod-Simple-3.29_3.tar.gz

Oh, and one last thing: If Pod::Simple fails to properly recognize the encoding in your Pod file, you can always use the =encoding command early in your Pod file to make it explicit:

=encoding CP1254

Pod: Now with Sane Web Links

A couple of months ago, RJBS and I collaborated on adding a new feature to Pod: sane URL links. For, well, ever, linking to a URL (or any other scheme: link) in Pod meant you had to do something like this:

For more information, consult the pgTAP documentation:
L<http://pgtap.projects.postgresql.org/documentation.html>

The reasons why you couldn’t include text in the link to serve as the link text have never been really well spelled out. Sean Burke, the most recent author of the Pod spec, had only said that the support wasn’t there “for various reasons.”

Meanwhile, I accidentally discovered that Pod::Simple has in fact supported such formats for a long time. At some point Sean added it, but didn’t update the spec. Maybe he thought it was fragile. I have no idea. But since the support was already there, and most of the other Pod tools already support it or want to, it was a simple change to make to the spec, and it was released in Perl 5.11.3 and Pod::Simple 3.11. It’s now officially a part of the spec. The above Pod can now be written as:

For more information, consult the
L<pgTAP documentation|http://pgtap.projects.postgresql.org/documentation.html>.

So much better! And to show it off, I’ve just updated all the links in SVN::Notify and released a new version. Check it out on CPAN Search. See how the links such as to “HookStart.exe” and “Windows Subversion + Apache + TortoiseSVN + SVN::Notify HOWTO” are nice links? They no longer use the URL for the link text. Contrast with the previous version.

And as of yesterday, the last piece to allow this went into place. Andy gave me maintenance of Test::Pod, and I immediately released a new version to allow the new syntax. So update your t/pod.t file to require Test::Pod 1.41, update your links, and celebrate the arrival of sane links in Pod documentation.
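For reference, a t/pod.t along the following lines should do it. This is the conventional skeleton rather than anything specific to a particular distribution; it assumes your Pod lives under lib/ (or blib/ once built) and skips the test when Test::Pod is missing or too old.

```perl
# t/pod.t
use strict;
use warnings;
use Test::More;

# Skip cleanly if Test::Pod is missing or older than 1.41.
eval "use Test::Pod 1.41";
plan skip_all => 'Test::Pod 1.41 required for testing POD' if $@;

# Check every Pod file found under blib/ (or lib/ if there is no blib/).
all_pod_files_ok();
```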

Pod::Simple 3.09 Hits the CPAN

I spent some time over the last few days helping Allison fix bugs and close tickets for a new version of Pod::Simple. I’m not sure how I convinced Allison to suddenly dedicate her day to fixing Pod::Simple bugs and putting out a new release. She must’ve had some studies or Parrot spec work she wanted to get out of or something.

Either way, it’s got some useful fixes and improvements:

  • The XHTML formatter now supports tables of contents (via the poorly-named-but-consistent-with-the-HTML-formatter index parameter).

  • You can now reformat verbatim blocks via the strip_verbatim_indent parameter/method. Because you have to indent verbatim blocks (code examples) with one or more spaces, you end up with those spaces remaining in output. Just have a look at an example on search.cpan.org. See how the code in the Synopsis is indented? That’s because it’s indented in the POD. But maybe you don’t want it to be indented in your final output. If not, you can strip out leading spaces via strip_verbatim_indent. Pass in the text to strip out:

    $parser->strip_verbatim_indent('  ');

    Or pass a code reference that figures out what to strip out. I’m fond of stripping based on the indentation of the first line, like so:

    $parser->strip_verbatim_indent(sub {
        my $lines = shift;
        (my $indent = $lines->[0]) =~ s/\S.*//;
        return $indent;
    });

  • You can now use the nocase parameter to Pod::Simple::PullParser to tell the parser to ignore the case of POD blocks when searching for author, title, version, and description information. This is a hack that Graham has used for a while on search.cpan.org, in part because I nagged him about my modules, which don’t use uppercase =head1 text. Thanks Graham!

  • Fixed entity encoding in the XHTML formatter. It was failing to encode entities everywhere except code spans and verbatim blocks. Oops. It also now properly encodes E<sol> and E<verbar>, as well as numeric entities.

  • Multiparagraph items now work properly in the XHTML formatter, as do text items (definition lists).

  • A POD tag found inside a complex POD tag (e.g., C<<< C<foo> >>>) is now properly parsed as text and entities instead of a tag embedded in a tag (e.g., <foo>). This is in compliance with perlpod.

This last item is the only change I think might lead to problems. I fixed it in response to a bug report from Schwern. The relevant bit from the perlpod spec is:

A more readable, and perhaps more “plain” way is to use an alternate set of delimiters that doesn’t require a single “>” to be escaped. With the Pod formatters that are standard starting with perl5.5.660, doubled angle brackets (“<<” and “>>”) may be used if and only if there is whitespace right after the opening delimiter and whitespace right before the closing delimiter! For example, the following will do the trick:

C<< $a <=> $b >>

In fact, you can use as many repeated angle‐brackets as you like so long as you have the same number of them in the opening and closing delimiters, and make sure that whitespace immediately follows the last ’<’ of the opening delimiter, and immediately precedes the first “>” of the closing delimiter. (The whitespace is ignored.) So the following will also work:

C<<< $a <=> $b >>>
C<<<<  $a <=> $b     >>>>

And they all mean exactly the same as this:

C<$a E<lt>=E<gt> $b>

Although all of the examples use C<< >>, it seems pretty clear that it applies to all of the span tags (B<< >>, I<< >>, F<< >>, etc.). So I made the change so that tags embedded in these “complex” tags, as comments in Pod::Simple call them, are not treated as tags. That is, all < and > characters are encoded.

Unfortunately, despite what the perlpod spec says (at least in my reading), Sean had quite a few pathological examples in the tests that expected POD tags embedded in complex POD tags to work. Here’s an example:

L<<< Perl B<Error E<77>essages>|perldiag >>>

Before I fixed the bug, that was expected to be output as this XML:

<L to="perldiag" type="pod">Perl <B>Error Messages</B></L>

After the bug fix, it’s:

<L content-implicit="yes" section="Perl B&#60;&#60;&#60; Error E&#60;77&#62;essages" type="pod">&#34;Perl B&#60;&#60;&#60; Error E&#60;77&#62;essages&#34;</L>

Well, there’s a lot more crap that Pod::Simple puts in there, but the important thing to note is that neither the B<> nor the E<> is evaluated as a POD tag inside the L<<< >>> tag. If that seems inconsistent at all, just remember that POD tags still work inside non-complex POD tags (that is, when there is just one set of angle brackets):

L<Perl B<Error E<77>essages>|perldiag>

I’m pretty sure that few users were relying on POD tags working inside complex POD tags anyway. At least I hope so. I’m currently working up a patch for blead that updates Pod::Simple in core, so it will be interesting to see if it breaks anyone’s POD. Here’s to hoping it doesn’t!
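If you want to see how your installed Pod::Simple treats tags nested inside complex tags, a quick check along these lines may help. It is only a sketch: it feeds the link example from above through Pod::Simple::DumpAsXML and prints whatever comes out, which will depend on the version you have installed.

```perl
use strict;
use warnings;
use Pod::Simple::DumpAsXML;

# Sample Pod containing the nested-tag link from the example above.
my $pod = "=pod\n\nL<<< Perl B<Error E<77>essages>|perldiag >>>\n\n=cut\n";

my $parser = Pod::Simple::DumpAsXML->new;
$parser->output_string(\my $xml);    # collect the dump in $xml, not STDOUT
$parser->parse_string_document($pod);

# Inspect the <L> element to see whether B<> and E<> were parsed as tags.
print $xml;
```

Swap in a paragraph from your own documentation to find out whether it would be affected.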

JSDoc Doesn't Quite do the Trick for Me

After my request for JavaScript documentation standards, I investigated the one I found myself: JSDoc. I went ahead and used its syntax to document a JavaScript class I’d written, and it seemed to work pretty well. Initially, my main complaint was that there was no easy way to include arbitrary documentation. Everything has to be associated with a constructor, attribute, or method. Bleh.

But then I started documenting two purely functional JavaScript files I’d written. These just create functions in the Global scope for general use. And here’s where JSDoc started to really become a PITA. First, functions with the same names in the two files were declared to be pre-declared! The two files are part of the same project, but users will generally use one or the other, not both. But JSDoc has taken it upon itself to refuse to document functions that are in two different files in the same project. Surely that’s the JavaScript interpreter’s responsibility!

The next issue I ran into (after I commented out the code in JSDoc.pm that refused to document functions with the same names) was that it didn’t recognize one of the files as having documentation, because there was no constructor. Well duh! A purely functional implementation doesn’t have a constructor! It seems that Java’s bias for OO-only implementations has unduly influenced JSDoc, even though JavaScript applications often define no classes at all!

The clincher in my decision to ditch JSDoc, however, came when I realized that, for most projects, I won’t want the documentation in the same file as the code. While I generally prefer that they be in the same file, I will often have 4-10 times more documentation than actual code, and the bandwidth overhead seems unnecessary. JavaDoc and JSDoc of course require that any documentation be in the same files, since that’s where they parse method signatures and such.

So I think I’ll follow Chris Dolan’s advice from my original post and fall back on good ol’ POD. POD allows me to write as much or as little documentation as I like, with methods and functions documented in an order that makes sense to me, with headings even! I can write long descriptions, synopses, and even documentation completely unrelated to specifics of the interface. And all in a separate file, even!

This will do until someone formalizes a standard for JavaScript. Maybe it’ll be KwiD?

Is there a JavaScript Library Documentation Standard?

Is there a JavaScript documentation standard? I’ve been working on a test framework for JavaScript and I’d like to integrate documentation so that others can use it.

If there isn’t a documentation standard, I can see three possible options that I’d like to suggest:

Use XHTML.

Since JavaScript is mainly used for XHTML, it makes some sense to just use XHTML for its documentation. The downside to this is that there is currently no way to parse out the documentation, AFAIK. The format for putting the docs into comments would have to be standardized. I don’t really see that happening.

Use POD.

JavaScript is a dynamic language; it’d make some sense to use the documentation format of an existing dynamic language. And POD is a proven format. The downside, of course, is that there is not a parser for pulling POD out of a .js file. Same problem as for XHTML, essentially.

Use JavaDoc.

Since the syntax of JavaScript is roughly based on Java, and JavaScript supports the same comment syntax, one could simply use the JavaDoc format. The javadoc application probably couldn’t parse it out too well, since it parses the Java code (or byte code?) to automatically document method names, signatures, etc.

But a quick Googling yields JSDoc as a possible solution. The only downside to the JavaDoc/JSDoc solution is that it tends to allow authors to be too lazy. Since the application automatically documents the existence of functions and their signatures, often little else is documented. But that’s mainly a personal issue; I don’t have to be so lazy in my own documentation! I think I’ll give that a shot.

Meanwhile, if anyone knows of something better/more widely used, let me know!
