parse(3)
NAME
Tk::Parse - Parse perl's pod files (code deprecated. Use
Pod::Parser instead).
SYNOPSIS
THIS TK SNAPSHOT SHOULD BE REPLACED BY CPAN MODULE: Pod::Parser
DESCRIPTION
A module designed to simplify the job of parsing and
formatting ``pods'', the documentation format used by perl5.
It consists of several different functions to present
and modify predigested pod files.
GUESSES
This is a work in progress, so I may have some stuff
wrong, perhaps badly. Some of my more reaching guesses:
- An =index paragraph should be split into lines, and
  each line placed inside an `X' formatting command
  which is then prepended to the next paragraph, like
  this:

      =index foo
      foo2
      foo3
      foo2!subfoo

      Foo!

  Will become:

      X<foo>X<foo2>X<foo3>X<foo2!subfoo>Foo!

- A related change: that an `X' command is to be used
  for indexing data. This implies that all formatters
  need to at least ignore the `X' command.
- Inside an =command, no special significance is to be
  placed on the first line of the argument. Thus the
  following two lines should be parsed identically:

      =item 1. ABC

      =item 1.
      ABC

  Note that neither of these is identical to this:

      =item 1.

      ABC

  which puts the "ABC" in a separate paragraph.
- I actually violate this rule twice: in parsing =index
  commands, and in passing through the =pragma commands.
  I hope this makes sense.
- I added the =comment command, which simply ignores the
  next paragraph.
- I also added =pragma, which also ignores the next
  paragraph, but this time it gives the formatter a
  chance at doing something sinister with it.
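The =index guess above can be illustrated with a short sketch. The helper name index_to_x is hypothetical and is not part of this module; it only shows the described transformation of one =index argument into a run of X<> codes that would then be prepended to the next paragraph.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not the module's API):
# turn the argument of an =index paragraph into X<> codes, one per line.
sub index_to_x {
    my ($argument) = @_;
    my @codes;
    for my $line (split /\n/, $argument) {
        $line =~ s/^\s+//;                        # trim leading whitespace
        $line =~ s/\s+$//;                        # trim trailing whitespace
        push @codes, "X<$line>" if length $line;  # skip empty lines
    }
    return join '', @codes;
}

# The argument is everything after "=index ", up to the end of the paragraph.
my $prefix = index_to_x("foo\nfoo2\nfoo3\nfoo2!subfoo");

# Prepend the codes to the paragraph that follows the =index paragraph.
print $prefix . "Foo!\n";   # X<foo>X<foo2>X<foo3>X<foo2!subfoo>Foo!
```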
POD CONVENTIONS
This module has two goals: first, to simplify the usage of
the pod format, and secondly the codification of the pod
format. While perlpod contains some information, it hardly
gives the entire story. Here I present "the rules", or at
least the rules as far as I've managed to work them out.
- Paragraphs: The basic element
- The fundamental "atom" of a pod file is the paragraph,
where a paragraph is defined as the text up to the
next completely blank line ("\n\n"). Any pod parser
will read in paragraphs sequentially, deciding what to
do with each based solely on the current state and on
the text at the _beginning_ of the paragraph.
- Commands: The method of communication
- A paragraph that starts with the `=' symbol is assumed
to be a special command. All of the alphanumeric
characters directly after the `=' are assumed to be
part of the name of the command, up to the first
whitespace. Anything past that whitespace is considered
"the argument", and the argument continues up
till the end of the paragraph, regardless of newlines
or other whitespace.
- Text: Commands that aren't Commands
- A paragraph that doesn't start with `=' is treated as
either of two types of text. If it starts with a space
or tab, it is considered a verbatim paragraph, which
will be printed out... verbatim. No formatting changes
whatsoever may be done. (Actually, this isn't quite
true, but I'll get back to that at a later date.)
- A paragraph that doesn't start with whitespace or `='
is assumed to consist of formatted text that can be
molded as the formatter sees fit. Reformatting to fit
margins, whatever, it's fair game. These paragraphs
also can contain a number of different formatting
codes, which verbatim paragraphs can't. These formatting
codes are covered later.
- =cut: The uncommand
- There is one command that needs special mention: =cut.
Anything after a paragraph starting with =cut is simply
ignored by the formatter. In addition, any text
before a valid command is equally ignored. Any valid
`=' command will reenable formatting. This fact is used
to great benefit by Perl, which is glad to ignore anything
between an `=' command and `=cut', so you can
embed a pod document right inside a perl program, and
neither will bother the other.
- Reference to paragraph commands
- =cut
Ignore anything till the next paragraph starting
with `='.
- =head1
A top-level heading. Anything after the command
(either on the same line or on further lines) is
included in the heading, up until the end of the
paragraph.
- =head2
Secondary heading. Same as =head1, but different.
No, there isn't a head3, head4, etc.
- =over [N]
Start a list. The "N" is the number of characters
to indent by. Not all formatters will listen to
this, though. A good number to use is 4.
While =over sounds like it should just be indentation,
it's more complex than that. It actually
starts a nested environment, specifically for the
use of =item's. As this command recurses properly,
you can use more than one, you just have to make
sure they are closed off properly by =back commands.
- =back
Ends the last =over block. Resets the indentation
to whatever it was previously. Closes off the list
of =item's.
- =item
The point behind =over and =back. This command
should only be used between them. The argument
supplied should be consistent (within a list) to
one of three types: enumeration, itemization, or
description. To exemplify:

An itemized list

    =over 4

    =item *

    A bulleted item

    =item *

    Another bulleted item

    =back

An enumerated list

    =over 4

    =item 1.

    First item.

    =item 2.

    Second item.

    =back

A described list

    =over 4

    =item Item #1

    First item

    =item Item #2 (which isn't really like #1, but
    is the second).

    Second item

    =back

If you aren't consistent about the arguments to
=item, Pod::Parse will complain.
- =comment
Ignore this paragraph
- =pragma
Ignore this paragraph, as well, unless you know
what you are doing.
- =index
Undecided at this time, but probably magic involving
X<>.
- Reference to formatting directives
- B<...>
Format text inside the brackets as bold.
- I<...>
Format text inside the brackets as italics.
- Z<>
Replace with a zero-width character. You'll probably
figure out some uses for this.
- And yet more that I haven't described yet...
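The paragraph rules in this section (split on blank lines, classify by the first character, skip everything outside a command ... =cut region) can be sketched in a few lines. This is a minimal illustration under the rules as stated here, not the module's implementation; the function names classify_paragraph and pod_paragraphs are hypothetical.

```perl
use strict;
use warnings;

# Classify one paragraph by how it begins.
# Returns a (kind, command name, text) list.
sub classify_paragraph {
    my ($para) = @_;
    # `=' followed by alphanumerics is a command; the rest is its argument.
    if ($para =~ /\A=([A-Za-z0-9]+)\s*(.*)\z/s) {
        return ('command', $1, $2);
    }
    # Leading space or tab marks a verbatim paragraph.
    return ('verbatim', '', $para) if $para =~ /\A[ \t]/;
    # Anything else is formatted text.
    return ('text', '', $para);
}

# Split a source string into paragraphs and keep only those that fall
# between a valid command and the next =cut.
sub pod_paragraphs {
    my ($source) = @_;
    my $in_pod = 0;
    my @kept;
    for my $para (split /\n{2,}/, $source) {
        $para =~ s/\s+\z//;               # drop trailing newlines
        next unless length $para;
        my ($kind, $name, $text) = classify_paragraph($para);
        if ($kind eq 'command') {
            $in_pod = ($name ne 'cut') ? 1 : 0;   # =cut switches pod off
            push @kept, [$kind, $name, $text] if $in_pod;
        }
        elsif ($in_pod) {
            push @kept, [$kind, $name, $text];
        }
    }
    return @kept;
}

# A perl program with an embedded pod section.
my $source = join "\n",
    'print "hello\n";',
    '',
    '=head1 NAME',
    '',
    'demo - a small example',
    '',
    '    verbatim line',
    '',
    '=cut',
    '',
    'print "bye\n";',
    '';

printf "%-8s %-6s %s\n", @$_ for pod_paragraphs($source);
```

Only the =head1 paragraph and the two paragraphs after it are kept; the perl code before the first command and after =cut is skipped.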
USAGE
Parse
This function takes a list of files as an argument. If no
argument is given, it defaults to the contents of @ARGV.
Parse then reads through each file and returns the data as
a list. Each element of this list will be a nested list
containing data from a paragraph of the pod file. Elements
pertaining to "=over" paragraphs will themselves contain
the nested entries for all of the paragraphs within that
list. Thus, it's easier to parse the output of Parse using
a recursive parser. (Um, did that parse?)
It is highly recommended that you use the output of
Simplify, not Parse, as it's simpler.
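As a rough illustration of the recursive approach, the sketch below walks a hand-built list shaped like the element layouts documented in this section (in particular the nested [8,...] list element). The sample data and the function name walk are assumptions made for the example; a real run would take its list from Parse.

```perl
use strict;
use warnings;

# Hand-built sample data following the documented element layouts.
my @parsed = (
    [0, 0, 0, 0, 'sample.pod'],
    [2, 1, 0, 1, 'NAME'],
    [8, 3, 12, 4, 2,                    # =over 4, enumerated list
        [3, 5, 20, 0, '1.'],
        [6, 7, 29, 0, 'First item.'],
    ],
    [-1, 0, 0, 0, 'sample.pod'],
);

# Recursively describe each element, indenting nested list contents.
sub walk {
    my ($depth, @elements) = @_;
    my @lines;
    for my $element (@elements) {
        my ($type, $line, $pos, $extra, @rest) = @$element;
        my $pad = '  ' x $depth;
        if ($type == 8) {
            # List element: $extra is the indentation, then the list type,
            # then any number of nested elements in the same format.
            my ($list_type, @children) = @rest;
            push @lines, sprintf('%slist (type %d, indent %d)', $pad, $list_type, $extra);
            push @lines, walk($depth + 1, @children);
        }
        else {
            push @lines, sprintf('%s%d: %s', $pad, $type, $rest[0]);
        }
    }
    return @lines;
}

print "$_\n" for walk(0, @parsed);
```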
The output will consist of a list, where each element in
the list matches one of these prototypes:
- [0,0,0,0,$filename]
- This is produced at the beginning of each file parsed,
where $filename is the name of that file.
- [-1,0,0,0,$filename]
- End of same.
- [1,$line,$pos,0,$verbatim]
- This is produced for each paragraph of verbatim text.
$verbatim is the text, $line is the line offset of the
paragraph within the file, and $pos is the byte offset.
(In all of the following elements, $pos and $line
have identical meanings, so I'll skip explaining them
each time.)
- [2,$line,$pos,$level,$heading]
- Produced by a =head1 or =head2 command. $level is
either 1 or 2, and $heading is the argument.
- [3,$line,$pos,0,$item]
- $item is the argument from an =item paragraph.
- [4,$line,$pos,0,$index]
- $index is the argument from an =index paragraph.
- [6,$line,$pos,0,$text]
- Normal formatted text paragraph. $text is the text.
- [7,$line,$pos,0,$pragma]
- $pragma is the argument from a =pragma paragraph.
- [8,$line,$pos,$indentation,$type,...]
- This item is produced for each matching =over/=back
pair. $indentation is the argument to =over, $type is
1 if the embedded =item's are bulleted, 2 if they are
enumerated, 3 if they are text, and 0 if there are no
items.
- The "..." indicates an unlimited number of further
elements which are themselves nested arrays in exactly
the format being described. In other words, a list
item includes all the paragraphs inside the list
inside itself. (Clear? No? Nevermind.)
- [9,$line,$pos,0,$cut]
- $cut contains the text from a =cut paragraph. You
shouldn't need to use this, but I _suppose_ it might
be necessary to do special breaks on a cut. I doubt it
though. This one is "depreciated", as Larry put it. Or
perhaps disappreciated.
Simplify
- This procedure takes as its input the convoluted output
from Parse(), and outputs a much simpler array consisting
of pairs of commands and arguments, designed to be easy
(easier?) to parse in your pod formatting code.
- It is used very simply by saying something like:

    my @Pod = Simplify(Parse());
    while (my $cmd = shift @Pod) {
        my $arg = shift @Pod;   # each command is followed by its argument
        #...
    }

- Where #... is the code that responds to any of the commands
from the following list. Note that you are welcome
to ignore any of the commands that you want to. Many
contain duplicate information, or at least information that
will go unused. A formatter based on this data can be
quite simple indeed. (See pod2text for entirely too simple
an example.)
- Reference to Simplify commands
- "filename"
- The argument contains the name of the pod file that is
being parsed. These will be present at the start of
each file. You should open an output file, output
headers, etc., based on this, and not when you start
parsing.
- "endfile"
- The end of the file. Each file will be ended before
the next one begins, and after all files are done
with. You can do end processing here. The argument is
the same name as in "filename".
- "setline"
- This gives you a chance to record the "current" input
line, probably for debugging purposes. In this case,
"current" means that the next command you see that was
derived from an input paragraph will start at the
argument's line in the file.
- "setloc"
- Same as setline, but the byte offset in the input,
instead of the line offset.
- "pragma"
- The argument contains the text of a pragma command.
- "text"
- The argument contains a paragraph of formatted text.
- "verbatim"
- The argument contains a paragraph of verbatim text.
- "cut"
- A =cut command was hit. You shouldn't really need to
listen for this one.
- "index"
- The argument contains an =index paragraph. (Note: Current
=index commands are not fed through, but turned
into X<> commands.)
- "head1"
- "head2"
- The argument contains the argument from a header command.
- "setindent"
- If you are tracking indentation, use the argument to
set the indentation level.
- "listbegin"
- Start a list environment. The argument is the type of
list (1,2,3 or 0).
- "listend"
- Ends a list environment. Same argument as listbegin.
- "listtype"
- The argument is the type of list. You can just record
the argument when you see one of these, instead of
paying attention to listbegin & listend.
- "over"
- The argument is the indentation. It's probably better
to listen to the "list..." commands.
- "back"
- Ends an "over" list. The argument is the original
indentation.
- "item"
- The argument is the text of the =item command.
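One way to respond to the commands listed above is a dispatch table keyed on the command name, with unknown commands silently ignored. The sketch below uses a hand-written stand-in for the flat command/argument list rather than calling the module; the function name render, the sample stream, and the two-space indentation per list level are assumptions made for illustration.

```perl
use strict;
use warnings;

# Render a flat (command, argument, command, argument, ...) stream as text.
sub render {
    my @stream = @_;
    my $out    = '';
    my $depth  = 0;    # current list nesting level

    # One handler per command we care about; everything else is skipped.
    my %handler = (
        head1     => sub { $out .= "$_[0]\n" },
        head2     => sub { $out .= "$_[0]\n" },
        text      => sub { $out .= ('  ' x $depth) . "$_[0]\n" },
        verbatim  => sub { $out .= "$_[0]\n" },
        item      => sub { $out .= ('  ' x $depth) . "$_[0]\n" },
        listbegin => sub { $depth++ },
        listend   => sub { $depth-- if $depth > 0 },
    );

    while (@stream) {
        my ($cmd, $arg) = splice @stream, 0, 2;   # take one pair
        my $code = $handler{$cmd} or next;        # ignore unhandled commands
        $code->($arg);
    }
    return $out;
}

# Hand-written stand-in for a simplified stream.
my @pod = (
    filename  => 'sample.pod',
    head1     => 'NAME',
    text      => 'sample - a sample',
    listbegin => 2,
    item      => '1.',
    text      => 'First item.',
    listend   => 2,
    endfile   => 'sample.pod',
);

print render(@pod);
```

Because only "listbegin" and "listend" are tracked here, the duplicate "over", "back", "setindent" and "listtype" commands can be left out of the table entirely.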
- Note that all of these various commands you've seen are
synchronized properly so you don't have to pay attention to
all of them at once, but they are all output for your benefit.
Consider the following example:
listtype 2
listbegin 2
setindent 4
over 4
item 1.
text Item #1
item 2.
text Item #2
setindent 0
listend 2
back 0
listtype 0
Normalize
- This command is normally invoked by Parse, so you
shouldn't need to deal with it. It just cleans up text a
little, turning spare '<', '>', and '&' characters into
HTML escapes (&lt;, etc.) as well as generating warnings for
some pod formatting mistakes.
Normalize2
- A little more aggressive formatting based on heuristics. Not
applied by default, as it might confuse your own heuristics.
%Escapes
- This hash is exported from Pod::Parse, and contains
default ASCII translations for some common HTML escape
sequences. You might like to use this as a basis for an
%HTML_Escapes hash in your own formatter.
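A formatter that produces plain text might keep its own table along these lines and use it to turn HTML escapes back into ASCII. The entries below are assumptions chosen for the example and are not the actual contents of %Escapes, which this page does not list; the function name unescape is hypothetical as well.

```perl
use strict;
use warnings;

# Hypothetical formatter-side table (entries are assumptions for
# illustration, not the exported %Escapes hash).
my %HTML_Escapes = (
    amp  => '&',
    lt   => '<',
    gt   => '>',
    quot => '"',
);

# Replace each known &name; escape with its ASCII translation and
# leave unknown escapes untouched.
sub unescape {
    my ($text) = @_;
    $text =~ s/&(\w+);/exists $HTML_Escapes{$1} ? $HTML_Escapes{$1} : "&$1;"/ge;
    return $text;
}

print unescape('if ($a &lt; $b &amp;&amp; $b &gt; 0)'), "\n";
```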
AUTHOR
Brad Appleton <bradapp@enteract.com>
Code currently maintained (but deprecated) by Achim Bohnet
<ach@mpe.mpg.de>. Use Pod::Parser instead. Send bug
reports to <ptk@lists.stanford.edu>.
Copyright (c) 1997-1998 Brad Appleton. All rights
reserved. This program is free software; you can
redistribute it and/or modify it under the same terms as Perl
itself.