xml::dom(3)

NAME

XML::DOM - A Perl module for building DOM Level 1 compliant document structures

SYNOPSIS

use XML::DOM;

my $parser = XML::DOM::Parser->new;
my $doc = $parser->parsefile ("file.xml");

# print all HREF attributes of all CODEBASE elements
my $nodes = $doc->getElementsByTagName ("CODEBASE");
my $n = $nodes->getLength;
for (my $i = 0; $i < $n; $i++)
{
    my $node = $nodes->item ($i);
    my $href = $node->getAttributeNode ("HREF");
    # getAttributeNode returns undef if the element has no HREF
    print $href->getValue . "\n" if defined $href;
}

# Print doc file
$doc->printToFile ("out.xml");

# Print to string
print $doc->toString;

# Avoid memory leaks - cleanup circular references for garbage collection
$doc->dispose;

DESCRIPTION

This module extends the XML::Parser module by Clark
Cooper. The XML::Parser module is built on top of
XML::Parser::Expat, which is a lower level interface to
James Clark's expat library.

XML::DOM::Parser is derived from XML::Parser. It parses
XML strings or files and builds a data structure that conforms
to the API of the Document Object Model as described
at http://www.w3.org/TR/REC-DOM-Level-1. See the
XML::Parser manpage for other available features of the
XML::DOM::Parser class. Note that the 'Style' property
should not be used (it is set internally.)

The XML::Parser NoExpand option is more or less supported, in that it will generate EntityReference objects whenever
an entity reference is encountered in character data. I'm
not sure how useful this is. Any comments are welcome.

As described in the synopsis, when you create an
XML::DOM::Parser object, the parse and parsefile methods
create an XML::DOM::Document object from the specified input. This Document object can then be examined, modified
and written back out to a file or converted to a string.
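A minimal sketch of that round trip, assuming a small inline document (the element names "config" and "item" are made up for illustration): parse a string with parse, change an attribute, add a child element, and convert the result back to a string. Only DOM Level 1 methods plus the toString and dispose methods shown in the synopsis are used.

```perl
use strict;
use warnings;
use XML::DOM;

# Parse an in-memory string instead of a file
my $parser = XML::DOM::Parser->new;
my $doc    = $parser->parse ('<config><item name="a"/></config>');

# Examine: fetch the root element and its first <item> child
my $root = $doc->getDocumentElement;
my $item = $root->getElementsByTagName ("item")->item (0);

# Modify: change one attribute and append a new element
$item->setAttribute ("name", "b");
my $new = $doc->createElement ("item");
$new->setAttribute ("name", "c");
$root->appendChild ($new);

# Write back out as a string
my $xml = $doc->toString;
print $xml, "\n";

# Break circular references when done
$doc->dispose;
```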

When using XML::DOM with XML::Parser version 2.19 and up,
setting the XML::DOM::Parser option KeepCDATA to 1 will
store CDATA sections in CDATASection nodes, instead of
converting them to Text nodes. Adjacent CDATASection nodes
will be merged into one. Let me know if this is a problem.
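A sketch of the difference, assuming XML::Parser 2.19 or later and a made-up <script> element: the same input is parsed once with the default settings and once with KeepCDATA => 1 passed to the XML::DOM::Parser constructor, and the type of the element's first child is compared against the exported node type constants.

```perl
use strict;
use warnings;
use XML::DOM;

my $xml = '<script><![CDATA[if (a < b) { go(); }]]></script>';

# Default: the CDATA section becomes an ordinary Text node
my $plain        = XML::DOM::Parser->new->parse ($xml);
my $type_default = $plain->getDocumentElement->getFirstChild->getNodeType;

# KeepCDATA => 1: the section is kept as a CDATASection node
my $keep      = XML::DOM::Parser->new (KeepCDATA => 1)->parse ($xml);
my $type_keep = $keep->getDocumentElement->getFirstChild->getNodeType;

print "default:   ", ($type_default == TEXT_NODE ? "Text" : "other"), "\n";
print "KeepCDATA: ", ($type_keep == CDATA_SECTION_NODE ? "CDATASection" : "other"), "\n";

$_->dispose for $plain, $keep;
```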

When using XML::Parser 2.27 and above, you can suppress
expansion of parameter entity references (e.g. %pent;) in
the DTD, by setting ParseParamEnt to 1 and ExpandParamEnt to 0. See Hidden Nodes for details.

A Document has a tree structure consisting of Node
objects. A Node may contain other nodes, depending on its
type. A Document may have Element, Text, Comment, and
CDATASection nodes. Element nodes may have Attr, Element,
Text, Comment, and CDATASection nodes. The other nodes
may not have any child nodes.
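To see that tree structure for a concrete input, the sketch below walks a parsed Document recursively and records each node's name, indented by depth. The sample document and the helper name walk are made up for illustration; getChildNodes, getLength, item and getNodeName are the DOM Level 1 accessors.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse (
    '<doc><!-- note --><para>Hello <b>world</b></para></doc>');

# Hypothetical helper: push "indent + node name" for a node and its subtree
sub walk
{
    my ($node, $depth, $out) = @_;
    push @$out, ("  " x $depth) . $node->getNodeName;

    # getChildNodes returns a NodeList (empty for leaf nodes)
    my $kids = $node->getChildNodes;
    for my $i (0 .. $kids->getLength - 1)
    {
        walk ($kids->item ($i), $depth + 1, $out);
    }
}

my @lines;
walk ($doc, 0, \@lines);
print "$_\n" for @lines;
$doc->dispose;
```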

This module adds several node types that are not part of
the DOM spec (yet.) These are: ElementDecl (for <!ELEMENT
...> declarations), AttlistDecl (for <!ATTLIST ...> declarations),
XMLDecl (for <?xml ...?> declarations) and
AttDef (for attribute definitions in an AttlistDecl.)

XML::DOM Classes

The XML::DOM module stores XML documents in a tree structure
with a root node of type XML::DOM::Document. Different
nodes in the tree represent different parts of the XML
file. The DOM Level 1 Specification defines the following
node types:

· XML::DOM::Node - Super class of all node types
· XML::DOM::Document - The root of the XML document
· XML::DOM::DocumentType - Describes the document structure: <!DOCTYPE root [ ... ]>
· XML::DOM::Element - An XML element: <elem attr="val"> ... </elem>
· XML::DOM::Attr - An XML element attribute: name="value"
· XML::DOM::CharacterData - Super class of Text, Comment and CDATASection
· XML::DOM::Text - Text in an XML element
· XML::DOM::CDATASection - Escaped block of text: <![CDATA[ text ]]>
· XML::DOM::Comment - An XML comment: <!-- comment -->
· XML::DOM::EntityReference - Refers to an ENTITY: &ent; or %ent;
· XML::DOM::Entity - An ENTITY definition: <!ENTITY ...>
· XML::DOM::ProcessingInstruction - A processing instruction: <?target data?>
· XML::DOM::DocumentFragment - Lightweight node for cut & paste
· XML::DOM::Notation - A NOTATION definition: <!NOTATION ...>

In addition, the XML::DOM module contains the following
nodes that are not part of the DOM Level 1 Specification:

· XML::DOM::ElementDecl - Defines an element: <!ELEMENT ...>
· XML::DOM::AttlistDecl - Defines one or more attributes in an <!ATTLIST ...>
· XML::DOM::AttDef - Defines one attribute in an <!ATTLIST ...>
· XML::DOM::XMLDecl - An XML declaration: <?xml version="1.0" ...?>

Other classes that are part of the DOM Level 1 Spec:

· XML::DOM::Implementation - Provides information about
this implementation. Currently it doesn't do much.
· XML::DOM::NodeList - Used internally to store a node's
child nodes. Also returned by getElementsByTagName.
· XML::DOM::NamedNodeMap - Used internally to store an
element's attributes.
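As a sketch of how a NamedNodeMap is used, assuming a made-up <img> element: getAttributes returns the element's NamedNodeMap, which can be walked by index with getLength and item (like a NodeList) or queried by name with the DOM Level 1 method getNamedItem. Attribute order is not relied on here.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse ('<img src="a.png" alt="A" width="10"/>');
my $elem = $doc->getDocumentElement;

# The element's attributes, as a NamedNodeMap of Attr nodes
my $attrs = $elem->getAttributes;

# Walk the map by index
my %seen;
for my $i (0 .. $attrs->getLength - 1)
{
    my $attr = $attrs->item ($i);
    $seen{ $attr->getName } = $attr->getValue;
}

# Or look one attribute up by name
my $alt = $attrs->getNamedItem ("alt")->getValue;

print "$_=$seen{$_}\n" for sort keys %seen;
$doc->dispose;
```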

Other classes that are not part of the DOM Level 1 Spec:

· XML::DOM::Parser - A non-validating XML parser that creates XML::DOM::Documents
· XML::DOM::ValParser - A validating XML parser that creates XML::DOM::Documents.
It uses XML::Checker to check against the DocumentType (DTD)
· XML::Handler::BuildDOM - A PerlSAX handler that creates XML::DOM::Documents.

XML::DOM package

Constant definitions
The following predefined constants indicate which type
of node it is.

UNKNOWN_NODE                 (0)   The node type is unknown (not part of DOM)
ELEMENT_NODE                 (1)   The node is an Element.
ATTRIBUTE_NODE               (2)   The node is an Attr.
TEXT_NODE                    (3)   The node is a Text node.
CDATA_SECTION_NODE           (4)   The node is a CDATASection.
ENTITY_REFERENCE_NODE        (5)   The node is an EntityReference.
ENTITY_NODE                  (6)   The node is an Entity.
PROCESSING_INSTRUCTION_NODE  (7)   The node is a ProcessingInstruction.
COMMENT_NODE                 (8)   The node is a Comment.
DOCUMENT_NODE                (9)   The node is a Document.
DOCUMENT_TYPE_NODE           (10)  The node is a DocumentType.
DOCUMENT_FRAGMENT_NODE       (11)  The node is a DocumentFragment.
NOTATION_NODE                (12)  The node is a Notation.
ELEMENT_DECL_NODE            (13)  The node is an ElementDecl (not part of DOM)
ATT_DEF_NODE                 (14)  The node is an AttDef (not part of DOM)
XML_DECL_NODE                (15)  The node is an XMLDecl (not part of DOM)
ATTLIST_DECL_NODE            (16)  The node is an AttlistDecl (not part of DOM)
Usage:

if ($node->getNodeType == ELEMENT_NODE)
{
print "It's an Element";
}
Not In DOM Spec: The DOM Spec does not mention UNKNOWN_NODE and, quite frankly, you should never
encounter it. The last 4 node types were added to support
the 4 added node classes.
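A small sketch of dispatching on these constants, assuming (as in the usage example above) that use XML::DOM makes them available as bareword functions. It counts the direct children of a made-up root element by type; the labels in %label are illustrative only.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse ('<r>text<!-- c --><e/><e>more</e></r>');

# Map constant values to readable labels; the () stops "=>" from
# treating the constant names as plain strings
my %label = (
    ELEMENT_NODE() => "Element",
    TEXT_NODE()    => "Text",
    COMMENT_NODE() => "Comment",
);

# Count the root element's direct children by node type
my %count;
my $kids = $doc->getDocumentElement->getChildNodes;
for my $i (0 .. $kids->getLength - 1)
{
    my $type = $kids->item ($i)->getNodeType;
    $count{ $label{$type} // "other" }++;
}

print "$_: $count{$_}\n" for sort keys %count;
$doc->dispose;
```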
Global Variables
$VERSION
The variable $XML::DOM::VERSION contains the version
number of this implementation, e.g. "1.39".
METHODS
These methods are not part of the DOM Level 1 Specification.
getIgnoreReadOnly and ignoreReadOnly (readOnly)
The DOM Level 1 Spec does not allow you to edit certain
sections of the document, e.g. the DocumentType,
so by default this implementation throws DOMExceptions
(i.e. NO_MODIFICATION_ALLOWED_ERR) when you try to
edit a readonly node. These readonly checks can be
disabled by (temporarily) setting the global
IgnoreReadOnly flag.
The ignoreReadOnly method sets the global IgnoreReadOnly
flag and returns its previous value. The getIgnoreReadOnly
method simply returns its current value.

my $oldIgnore = XML::DOM::ignoreReadOnly (1);
eval {
    # ... do whatever you want, catching any other exceptions ...
};
XML::DOM::ignoreReadOnly ($oldIgnore); # restore previous value
Another way to do it, using a local variable:

{ # start new scope
    local $XML::DOM::IgnoreReadOnly = 1;
    # ... do whatever you want, don't worry about exceptions ...
} # end of scope ($IgnoreReadOnly is set back to its previous value)
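The eval pattern can be wrapped in a small helper so the flag is always restored, even when the code inside dies. with_ignore_read_only below is a hypothetical convenience function, not part of XML::DOM; it only relies on the documented ignoreReadOnly and getIgnoreReadOnly functions, and re-throws any error after restoring the previous value.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper (not part of XML::DOM): run $code with the global
# IgnoreReadOnly flag set, restore the previous value even if $code dies,
# then re-throw the error.
sub with_ignore_read_only
{
    my ($code) = @_;
    my $old    = XML::DOM::ignoreReadOnly (1);
    my @result = eval { $code->() };
    my $err    = $@;
    XML::DOM::ignoreReadOnly ($old);    # always restore the old value
    die $err if $err;
    return wantarray ? @result : $result[-1];
}

my $inside = with_ignore_read_only (sub { XML::DOM::getIgnoreReadOnly () });
print "inside:  ", ($inside ? 1 : 0), "\n";
print "outside: ", (XML::DOM::getIgnoreReadOnly () ? 1 : 0), "\n";
```

Unlike the local variant, the helper also works when the read-only code lives in another scope, e.g. a callback passed in from elsewhere.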
isValidName (name)
Returns whether the specified name is a valid "Name" as specified
in the XML spec. Characters with Unicode values
> 127 are now also supported.
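A quick check of a few candidate names. The names are made up; per the XML Name production, a name may start with a letter, "_" or ":", but not with a digit, and may not contain spaces.

```perl
use strict;
use warnings;
use XML::DOM;

# Print whether each candidate is a valid XML "Name"
for my $name ("para", "_id", "xml:lang", "1st", "two words")
{
    printf "%-10s %s\n", $name,
        (XML::DOM::isValidName ($name) ? "valid" : "invalid");
}
```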
getAllowReservedNames and allowReservedNames (boolean)
The first method returns whether reserved names are
allowed. The second takes a boolean argument and sets
whether reserved names are allowed. The initial value
is 1 (i.e. allow reserved names.)
The XML spec states that "Names" starting with
(X|x)(M|m)(L|l) are reserved for future use. (Amusingly
enough, the XML version of the XML spec
(REC-xml-19980210.xml) breaks that very rule by defining
an ENTITY with the name 'xmlpio'.) A "Name" in
this context means the Name token as found in the BNF
rules in the XML spec.
XML::DOM only checks for errors when you modify the
DOM tree, not when the DOM tree is built by the
XML::DOM::Parser.
setTagCompression (funcref)
There are 3 possible styles for printing empty Element
tags:
Style 0
<empty/> or <empty attr="val"/>
XML::DOM uses this style by default for all Elements.
Style 1
<empty></empty> or <empty attr="val"></empty>
Style 2
<empty /> or <empty attr="val" />
This style is sometimes desired when using XHTML.
(Note the extra space before the slash "/".) See
<http://www.w3.org/TR/xhtml1> Appendix C for more
details.
By default XML::DOM compresses all empty Element tags
(style 0.) You can control which style is used for a
particular Element by calling XML::DOM::setTagCompression
with a reference to a function that takes 2 arguments.
The first is the tag name of the Element, the
second is the XML::DOM::Element that is being printed.
The function should return 0, 1 or 2 to indicate which
style should be used to print the empty tag. E.g.

XML::DOM::setTagCompression (\&my_tag_compression);

sub my_tag_compression
{
    my ($tag, $elem) = @_;

    # Print empty br, hr and img tags like this: <br />
    return 2 if $tag =~ /^(br|hr|img)$/;

    # Print other empty tags like this: <empty></empty>
    return 1;
}
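A self-contained sketch of the effect, using the same policy as above on a made-up paragraph: the document is printed once with the custom function installed and once after installing a function that always returns 0, which should give back the default compressed style.

```perl
use strict;
use warnings;
use XML::DOM;

# Style 2 for br/hr/img, style 1 for every other empty element
sub my_tag_compression
{
    my ($tag, $elem) = @_;
    return 2 if $tag =~ /^(br|hr|img)$/;
    return 1;
}

my $doc = XML::DOM::Parser->new->parse ('<p>line<br/><span class="x"/></p>');

XML::DOM::setTagCompression (\&my_tag_compression);
my $custom = $doc->toString;

# Go back to style 0 for all elements
XML::DOM::setTagCompression (sub { 0 });
my $default = $doc->toString;

print "$custom\n$default\n";
$doc->dispose;
```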

IMPLEMENTATION DETAILS

· Perl Mappings
The value undef was used when the DOM Spec said null.
The DOM Spec says: Applications must encode DOMString
using UTF-16 (defined in Appendix C.3 of [UNICODE] and
Amendment 1 of [ISO-10646]). In this implementation
we use plain old Perl strings encoded in UTF-8 instead
of UTF-16.
· Text and CDATASection nodes
The Expat parser expands entity references and CDATA
sections to raw strings and does not indicate
where they were found. This implementation therefore
converts both to Text nodes at parse time (unless the
KeepCDATA option is set). CDATASection and EntityReference
nodes that are added to an existing Document (by the user)
will be preserved.
Also, adjacent Text nodes are always merged at parse
time. Text nodes that are added later can be merged
with the normalize method. Consider using the addText
method when adding Text nodes.
· Printing and toString
When printing (and converting an XML Document to a
string) the strings have to be encoded differently
depending on where they occur. E.g. in a CDATASection
all substrings are allowed except for "]]>". In regular
text, certain characters are not allowed, e.g. "<"
has to be converted to "&lt;" and "&" to "&amp;". These
routines should be verified by someone who knows the details.
· Quotes
Certain sections in XML are quoted, like attribute
values in an Element. XML::Parser strips these quotes
and the print methods in this implementation always
use double quotes, so when parsing and printing a
document, single quotes may be converted to double
quotes. The default value of an attribute definition
(AttDef) in an AttlistDecl, however, will maintain its
quotes.
· AttlistDecl
Attribute declarations for a certain Element are
always merged into a single AttlistDecl object.
· Comments
Comments in the DOCTYPE section are not kept in the
right place. They will become child nodes of the Document.
· Hidden Nodes
Previous versions of XML::DOM would expand parameter
entity references (like %pent;), so when printing the
DTD, it would print the contents of the external
entity, instead of the parameter entity reference.
With this release (1.27), you can prevent this by setting
the XML::DOM::Parser options ParseParamEnt => 1
and ExpandParamEnt => 0.
When the parser is parsing the contents of the external
entities, it *DOES* still add the nodes to the DocumentType,
but it marks these nodes by setting the 'Hidden'
property. In addition, it adds an EntityReference node
to the DocumentType node.
When printing the DocumentType node (or when using
to_expat() or to_sax()), the 'Hidden' nodes are suppressed,
so you will see the parameter entity reference instead
of the contents of the external entities.
See test case t/dom_extent.t for an example.
The reason for adding the 'Hidden' nodes to the DocumentType
node is that the nodes may contain <!ENTITY>
definitions that are referenced further in the document.
(Simply not adding the nodes to the DocumentType
could cause such entity references to be expanded
incorrectly.)
Note that you need XML::Parser 2.27 or higher for this
to work correctly.
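The sketch below exercises three of the points above on a tiny made-up document: Text nodes added by the user stay separate until normalize merges them, addText appends to an existing trailing Text child, and printing escapes "<" and "&" in text while writing a single-quoted attribute value with double quotes.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse (q{<msg lang='en'/>});
my $elem = $doc->getDocumentElement;

# Text nodes added later are not merged automatically...
$elem->appendChild ($doc->createTextNode ("a < b"));
$elem->appendChild ($doc->createTextNode (" & c"));
my $before = $elem->getChildNodes->getLength;

# ...until normalize merges adjacent Text nodes into one
$elem->normalize;
my $after = $elem->getChildNodes->getLength;

# addText appends to the last child when it is already a Text node
$elem->addText ("!");
my $after_add = $elem->getChildNodes->getLength;
my $text      = $elem->getFirstChild->getData;

# Printing escapes "<" and "&" and uses double quotes for attributes
my $out = $doc->toString;
print "children: $before -> $after -> $after_add\n$out\n";

$doc->dispose;
```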

SEE ALSO

The Japanese version of this document by Takanori Kawai
(Hippo2000) at <http://member.nifty.ne.jp/hippo2000/perltips/xml/dom.htm>

The DOM Level 1 specification at
<http://www.w3.org/TR/REC-DOM-Level-1>

The XML spec (Extensible Markup Language 1.0) at
<http://www.w3.org/TR/REC-xml>

The XML::Parser and XML::Parser::Expat manual pages.

XML::LibXML also provides a DOM parser; it is significantly
faster than XML::DOM and is under active development.
It requires that you download the Gnome libxml
library.

XML::GDOME will provide the DOM Level 2 Core API, and
should be as fast as XML::LibXML, but more robust, since
it uses the memory management functions of libgdome. For
more details see http://tjmather.com/xml-gdome/

CAVEATS

The method getElementsByTagName() does not return a "live" NodeList. Whether this is an actual caveat is debatable,
but a few people on the www-dom mailing list seemed to
think so. I haven't decided yet. It's a pain to implement,
it slows things down and the benefits seem marginal. Let
me know what you think.

(To subscribe to the www-dom mailing list send an email
with the subject "subscribe" to www-dom-request@w3.org. I
only look here occasionally, so don't send bug reports or
suggestions about XML::DOM to this list, send them to
tjmather@tjmather.com instead.)
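A short sketch of what "not live" means in practice, on a made-up <list> document: a NodeList obtained from getElementsByTagName is a snapshot taken at call time, so an element appended afterwards only shows up when getElementsByTagName is called again.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse ('<list><item/></list>');
my $root = $doc->getDocumentElement;

# Snapshot of the matching elements at this moment
my $items = $root->getElementsByTagName ("item");

# Add another <item> after the NodeList was created
$root->appendChild ($doc->createElement ("item"));

my $stale = $items->getLength;                                # old snapshot
my $fresh = $root->getElementsByTagName ("item")->getLength;  # new call

print "stale: $stale, fresh: $fresh\n";
$doc->dispose;
```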

AUTHOR

Enno Derksen is the original author.

Send bug reports, hints, tips, suggestions to T.J. Mather
at <tjmather@tjmather.com>.

Thanks to Clark Cooper for his help with the initial version.