html::element::traverse(3)

NAME

HTML::Element traverse - discussion of HTML::Element's
traverse method

SYNOPSIS

# $element->traverse is unnecessary and obscure.
#   Don't use it in new code.

DESCRIPTION

"HTML::Element" provides a method "traverse" that tra
verses the tree and calls user-specified callbacks for
each node, in pre- or post-order. However, use of the
method is quite superfluous: if you want to recursively
visit every node in the tree, it's almost always simpler
to write a subroutine does just that, than it is to bundle
up the pre- and/or post-order code in callbacks for the
"traverse" method.

EXAMPLES

Suppose you want to traverse at/under a node $tree and
give elements an 'id' attribute unless they already have
one.

You can use the "traverse" method:
{
my $counter = 'x0000';
$start_node->traverse(
[ # Callbacks;
# pre-order callback:
sub {
my $x = $_[0];
$x->attr('id', $counter++) unless defined
$x->attr('id');
return HTML::Element::OK; # keep traversing
},
# post-order callback:
undef
],
1, # don't call the callbacks for text nodes
);
}
or you can just be simple and clear (and not have to
understand the calling format for "traverse") by writing a
sub that traverses the tree by just calling itself:

{
my $counter = 'x0000';
sub give_id {
my $x = $_[0];
$x->attr('id', $counter++) unless defined $x->at
tr('id');
foreach my $c ($x->content_list) {
give_id($c) if ref $c; # ignore text nodes
}
};
give_id($start_node);
}
See, isn't that nice and clear?
But, if you really need to know:

THE TRAVERSE METHOD

The "traverse()" method is a general object-method for
traversing a tree or subtree and calling user-specified
callbacks. It accepts the following syntaxes:

$h->traverse(callback)
or $h->traverse(callback, $ignore_text)
or $h->traverse( [pre_callback,post_callback] ,
$ignore_text)

These all mean to traverse the element and all of its
children. That is, this method starts at node $h,
"pre-order visits" $h, traverses its children, and then
will "post-order visit" $h. "Visiting" means that the
callback routine is called, with these arguments:
$_[0] : the node (element or text segment),
$_[1] : a startflag, and
$_[2] : the depth
If the $ignore_text parameter is given and true, then the
pre-order call will not be happen for text content.
The startflag is 1 when we enter a node (i.e., in preorder calls) and 0 when we leave the node (in post-order
calls).
Note, however, that post-order calls don't happen for
nodes that are text segments or are elements that are pro
totypically empty (like "br", "hr", etc.).
If we visit text nodes (i.e., unless $ignore_text is given
and true), then when text nodes are visited, we will also
pass two extra arguments to the callback:

$_[3] : the element that's the parent
of this text node
$_[4] : the index of this text node
in its parent's content list
Note that you can specify that the pre-order routine can
be a different routine from the post-order one:

$h->traverse( [pre_callback,post_callback], ...);
You can also specify that no post-order calls are to be
made, by providing a false value as the post-order rou
tine:

$h->traverse([ pre_callback,0 ], ...);
And similarly for suppressing pre-order callbacks:

$h->traverse([ 0,post_callback ], ...);
Note that these two syntaxes specify the same operation:

$h->traverse([foo,foo], ...);
$h->traverse( foo , ...);
The return values from calls to your pre- or post-order
routines are significant, and are used to control recur
sion into the tree.
These are the values you can return, listed in descending
order of my estimation of their usefulness:
HTML::Element::OK, 1, or any other true value
...to keep on traversing.
Note that "HTML::Element::OK" et al are constants. So
if you're running under "use strict" (as I hope you
are), and you say: "return HTML::Element::PRUEN" the
compiler will flag this as an error (an unallowable
bareword, specifically), whereas if you spell PRUNE
correctly, the compiler will not complain.
undef, 0, '0', '', or HTML::Element::PRUNE
...to block traversing under the current element's
content. (This is ignored if received from a postorder callback, since by then the recursion has
already happened.) If this is returned by a pre-order
callback, no post-order callback for the current node
will happen. (Recall that if your callback exits with
just "return;", it is returning undef -- at least in
scalar context, and "traverse" always calls your call
backs in scalar context.)
HTML::Element::ABORT
...to abort the whole traversal immediately. This is
often useful when you're looking for just the first
node in the tree that meets some criterion of yours.
HTML::Element::PRUNE_UP
...to abort continued traversal into this node and its
parent node. No post-order callback for the current
or parent node will happen.
HTML::Element::PRUNE_SOFTLY
Like PRUNE, except that the post-order call for the
current node is not blocked.
Almost every task to do with extracting information from a
tree can be expressed in terms of traverse operations
(usually in only one pass, and usually paying attention to
only pre-order, or to only post-order), or operations
based on traversing. (In fact, many of the other methods
in this class are basically calls to traverse() with par ticular arguments.)
The source code for HTML::Element and HTML::TreeBuilder
contain several examples of the use of the "traverse"
method to gather information about the content of trees
and subtrees.
(Note: you should not change the structure of a tree while
you are traversing it.)
[End of documentation for the "traverse()" method]
Traversing with Recursive Anonymous Routines
Now, if you've been reading Structure and Interpretation of Computer Programs too much, maybe you even want a recursive lambda. Go ahead:

{
my $counter = 'x0000';
my $give_id;
$give_id = sub {
my $x = $_[0];
$x->attr('id', $counter++) unless defined $x->at
tr('id');
foreach my $c ($x->content_list) {
$give_id->($c) if ref $c; # ignore text nodes
}
};
$give_id->($start_node);
undef $give_id;
}
It's a bit nutty, and it's still more concise than a call
to the "traverse" method!
It is left as an exercise to the reader to figure out how
to do the same thing without using a $give_id symbol at
all.
It is also left as an exercise to the reader to figure out
why I undefine $give_id, above; and why I could achieved
the same effect with any of:

$give_id = 'I like pie!';
# or...
$give_id = [];
# or even;
$give_id = sub { print "Mmmm pie!0 };
But not:

$give_id = sub { print "I'm $give_id and I like pie!0
};
# nor...
$give_id = ive_id;
# nor...
$give_id = { 'pie' => ive_id, 'mode' => 'a la' };
Doing Recursive Things Iteratively
Note that you may at times see an iterative implementation
of pre-order traversal, like so:

{
my @to_do = ($tree); # start-node
while(@to_do) {
my $this = shift @to_do;
# "Visit" the node:
$this->attr('id', $counter++)
unless defined $this->attr('id');
unshift @to_do, grep ref $_, $this->content_list;
# Put children on the stack -- they'll be visited
next
}
}
This can under certain circumstances be more efficient than just a normal recursive routine, but at the cost of
being rather obscure. It gains efficiency by avoiding the
overhead of function-calling, but since there are several
method dispatches however you do it (to "attr" and "con
tent_list"), the overhead for a simple function call is
insignificant.
Pruning and Whatnot
The "traverse" method does have the fairly neat features
of the "ABORT", "PRUNE_UP" and "PRUNE_SOFTLY" signals.
None of these can be implemented totally straightforwardly with recursive routines, but it is quite possible.
"ABORT"-like behavior can be implemented either with using
non-local returning with "eval"/"die":

my $died_on; # if you need to know where...
sub thing {
... visits $_[0]...
... maybe set $died_on to $_[0] and die "ABORT_TRAV"
...
... else call thing($child) for each child...
...any post-order visiting $_[0]...
}
eval { thing($node) };
if($@) {
if($@ =~ m<^ABORT_TRAV>) {
...it died (aborted) on $died_on...
} else {
die $@; # some REAL error happened
}
}
or you can just do it with flags:

my($abort_flag, $died_on);
sub thing {
... visits $_[0]...
... maybe set $abort_flag = 1; $died_on = $_[0]; re
turn;
foreach my $c ($_[0]->content_list) {
thing($c);
return if $abort_flag;
}
...any post-order visiting $_[0]...
return;
}
$abort_flag = $died_on = undef;
thing($node);
...if defined $abort_flag, it died on $died_on

SEE ALSO

HTML::Element

COPYRIGHT

Copyright 2000,2001 Sean M. Burke

AUTHOR

Sean M. Burke, <sburke@cpan.org>
Copyright © 2010-2025 Platon Technologies, s.r.o.           Index | Man stránky | tLDP | Dokumenty | Utilitky | O projekte
Design by styleshout