domparse(3)

NAME

XML::Xerces::DOMParse - A Perl module for parsing DOMs.

SYNOPSIS

    # Here's an example that reads in an XML file named on the
    # command line, removes all formatting, re-adds formatting,
    # and then prints the DOM back to a file.
    use XML::Xerces;
    use XML::Xerces::DOMParse;
    my $parser = new XML::Xerces::DOMParser ();
    $parser->parse ($ARGV[0]);
    my $doc = $parser->getDocument ();
    XML::Xerces::DOMParse::unformat ($doc);
    XML::Xerces::DOMParse::format ($doc);
    XML::Xerces::DOMParse::print (STDOUT, $doc);

DESCRIPTION

Use this module in conjunction with XML::Xerces. Once you
have read an XML file into a DOM tree in memory, this
module provides routines for recursive descent parsing of
the DOM tree. It also provides three concrete and useful
functions to format, unformat and print DOM trees, all of
which are built on the more general parsing functions.

FUNCTIONS

DOMParse::unformat ($node)

Processes $node and its children recursively and removes
all white space text nodes. It is often difficult to
modify a formatted DOM tree and still end up with
reasonable formatting. Use unformat to remove formatting,
then process the unformatted DOM, then use format to add
back formatting that is reasonable for the new tree.
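
The effect of removing white space text nodes can be
illustrated on a plain Perl data structure. This is only a
model, not the module's implementation: the hash-based tree
and the name strip_whitespace_nodes are assumptions made for
the example, and no Xerces objects are involved.

```perl
use strict;
use warnings;

# Model tree: each node is a hash with a type, plus text or children.
# strip_whitespace_nodes is a hypothetical name, not part of the module.
sub strip_whitespace_nodes {
    my ($node) = @_;
    my $children = $node->{children} or return $node;
    # Keep every child that is not a white-space-only text node.
    @$children = grep {
        !($_->{type} eq 'text' && $_->{text} =~ /^\s*$/)
    } @$children;
    # Recurse into the children that remain.
    strip_whitespace_nodes($_) for @$children;
    return $node;
}

my $tree = {
    type => 'element', name => 'book',
    children => [
        { type => 'text', text => "\n  " },
        { type => 'element', name => 'title',
          children => [ { type => 'text', text => 'Dune' } ] },
        { type => 'text', text => "\n" },
    ],
};

strip_whitespace_nodes($tree);
```

After the call, the book element in the model has a single
child, the title element, and the text inside title is
untouched because it is not white space only.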

DOMParse::format ($node)

Processes $node and its children recursively and
introduces white space text nodes to create a DOM tree that
will print with reasonable indents and newlines. Only
call format on a DOM tree that has no formatting white
space in it; otherwise the results will be incorrect.
Call unformat first to remove formatting white space.

You can optionally set the string variable $INDENT to the
indent characters you want to use. By default it is a
single tab.

DOMParse::print ($file_handle, $node)

Processes $node and its children recursively and prints
the DOM tree to $file_handle as a standard XML file. You
can override printing behavior by supplying any of several
"printer" functions:

    $NODE_PRINTER
    $DOCUMENT_NODE_PRINTER
    $DOCUMENT_TYPE_NODE_PRINTER
    $COMMENT_NODE_PRINTER
    $TEXT_NODE_PRINTER
    $CDATA_SECTION_NODE_PRINTER
    $ELEMENT_NODE_PRINTER
    $ENTITY_REFERENCE_NODE_PRINTER
    $PROCESSING_INSTRUCTION_NODE_PRINTER
    $ATTRIBUTE_PRINTER

Some of these printers call other printers. For example,
$NODE_PRINTER determines the node type and calls the
corresponding printer for that type, e.g.
$ELEMENT_NODE_PRINTER. So if you replace a printer for a
node which has children, you are responsible for calling
the child node printers.

All printers take two parameters, a file handle and the
node. See DOMParse::parse_nodes and
DOMParse::parse_child_nodes for details.

It is very easy to write a replacement printer that adds
value and then calls the default processing, as follows:

    my $original_text_node_printer = $TEXT_NODE_PRINTER;
    $TEXT_NODE_PRINTER = \&my_text_node_printer;
    sub my_text_node_printer {
        my ($fh, $node) = @_;
        # look at the text node and do something extra
        return &$original_text_node_printer ($fh, $node);
    }

The $ESCAPE variable (integer) controls whether special
XML characters like ampersand "&" are escaped, e.g. to
"&amp;". Set $ESCAPE to 1 (default) to escape special
characters, or to 0 to print characters literally.

print_string ($file_handle, $node)

Call print_string whenever you need to expand special
characters (& < > ") to their escape sequence equivalents.
print_string is used extensively by the default
implementation of DOMParse::print. When you replace various
node printers, you should also be careful to use it to
print node and attribute names and values (but probably
not anything else).

print_string respects the global $ESCAPE flag. By default
it is set to true (1) and escape conversion is performed.
Set it to false (0) when you don't want escape conversion.
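
The escape conversion described above can be sketched in
plain Perl. This is an illustration under assumptions, not
the module's code: escape_xml is a hypothetical helper, and
the $ESCAPE declared here is a local stand-in for the
module's flag.

```perl
use strict;
use warnings;

# Local stand-in for the module's $ESCAPE flag (assumption).
our $ESCAPE = 1;

# The four special characters named above and their entities.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# Hypothetical helper: returns $text with special characters
# escaped when $ESCAPE is true, or unchanged when it is false.
sub escape_xml {
    my ($text) = @_;
    return $text unless $ESCAPE;
    # One pass over the string, so the "&" that starts an
    # entity produced here is never escaped a second time.
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

my $escaped = escape_xml('a & b < "c"');
my $literal = do { local $ESCAPE = 0; escape_xml('a & b') };
```

Doing the substitution in a single pass matters: replacing
"&" in a separate later step would also rewrite the
ampersands of entities that had just been inserted.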

parse_nodes ($node, $process_node, $data)

Call parse_nodes to parse $node and all of its children
recursively. Each node will be visited and your parsing
function, $process_node, will be called. Optional data
$data will be passed through if provided.

Your parsing function must have the following signature:

    process_node ($node, $data)

If it returns 1 then children of $node will also be
parsed. If it returns 0 then they won't. It is common to
use one parsing function to get to a certain level in the
DOM tree, then to return 0 and to call parse_child_nodes
to parse nodes under that level with a different
processing function.
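
The return-value contract can be illustrated with a small
model of the traversal. The walk function and the
hash-based tree below are assumptions made for the example;
they mimic the documented behavior and are not the module's
implementation.

```perl
use strict;
use warnings;

# Model of the traversal contract (hypothetical, not the module's
# code): call $process_node on $node, then descend into the
# children only when the callback returned a true value.
sub walk {
    my ($node, $process_node, $data) = @_;
    return unless $process_node->($node, $data);
    walk($_, $process_node, $data) for @{ $node->{children} || [] };
}

my $tree = {
    name     => 'catalog',
    children => [
        { name => 'book',  children => [ { name => 'title' } ] },
        { name => 'notes', children => [ { name => 'private' } ] },
    ],
};

# $data is passed through untouched; here it collects visited names.
my @visited;
walk($tree, sub {
    my ($node, $seen) = @_;
    push @$seen, $node->{name};
    # Returning 0 for 'notes' keeps the walk out of its children.
    return $node->{name} eq 'notes' ? 0 : 1;
}, \@visited);
```

In this model the callback sees catalog, book, title and
notes, in that order; the node named private is never
visited because the callback returned 0 for its parent.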

parse_child_nodes ($node, $process_node, $data)

Call to parse the children of $node recursively. This is
just like parse_nodes except that $node itself is not
parsed.

doc ($node)

Looks up the DOM tree until it finds the document node
associated with the given $node, then returns the document
node.

depth ($node)

Returns the depth of the specified $node in the DOM
document. The document has depth 0, the root node has
depth 1, and so on.
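
The depth convention, and the upward lookup that doc
performs, can be modeled with parent links. The hashes and
the names doc_of and depth_of are hypothetical and stand in
for real DOM nodes and the module's functions.

```perl
use strict;
use warnings;

# Model nodes: each hash has a name and a parent link; the
# document is the only node whose parent is undef.
my $document = { name => '#document', parent => undef };
my $root     = { name => 'catalog',   parent => $document };
my $book     = { name => 'book',      parent => $root };

# Follow parent links upward; the node without a parent is
# the document (hypothetical counterpart of doc).
sub doc_of {
    my ($node) = @_;
    $node = $node->{parent} while defined $node->{parent};
    return $node;
}

# Count the parent links between $node and the document
# (hypothetical counterpart of depth).
sub depth_of {
    my ($node) = @_;
    my $depth = 0;
    while (defined $node->{parent}) {
        $node = $node->{parent};
        $depth++;
    }
    return $depth;
}
```

With these definitions the document has depth 0, the root
element depth 1, and the book element depth 2, matching
the convention stated above.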

element_text ($node)

It is common practice to have an element node that
encloses a single text node. If you know you have such a
node, you can call element_text to directly access the
enclosed text as a string. This is faster than accessing
the enclosed text node and then getting the value of it.

insert_before ($ref_node, $new_node)

Inserts $new_node in the DOM tree immediately before and
as a sibling of $ref_node. It is safe to call
insert_before while in the middle of parsing a DOM tree if
$ref_node is the current node being parsed. The newly
inserted node will not be parsed.

insert_after ($ref_node, $new_node)

Inserts $new_node in the DOM tree immediately after and as
a sibling of $ref_node. It is safe to call insert_after
while in the middle of parsing a DOM tree if $ref_node is
the current node being parsed. The newly inserted node
will not be parsed.

remove ($node)

Removes $node from the DOM tree. It is safe to call remove
while in the middle of parsing a DOM tree if $node is the
current node being parsed. The next node to be parsed will
be the same one that would have been parsed had $node not
been removed, e.g. $node's next sibling.

AUTHORS

Tom Watson <rtwatson@us.ibm.com> wrote version 1.0 and
submitted it to the XML Apache project
<http://xml.apache.org>, where you can contribute to future
versions and where the corresponding C++ and Java
compilers are also developed as open source projects.

Jason Stewart <jason@openinformatics.com> adapted it to the Xerces-1.3 API.

BUGS

Any comments or questions about this module can be
addressed to the Xerces.pm development list
<xerces-p-dev@xml.apache.org>.
