pdf::core(3)
NAME
PDF::Core - Core Library for PDF library
SYNOPSIS
use PDF::Core;
$pdf = PDF::Core->new;
$pdf = PDF->new($filename);
$res = $pdf->GetObject($ref);
$name = UnQuoteName($pdfname);
$string = UnQuoteString($pdfstring);
$pdfname = QuoteName($name);
$pdfhexstring = QuoteHexString($string);
$pdfstring = QuoteString($string);
$obj = PDFGetPrimitive($filehandle, $offset);
$line = PDFGetLine($filehandle, $offset);
DESCRIPTION
The main purpose of the PDF::Core library is to provide
the data structure and the constructor for the more
general PDF library.
Helper functions
These functions are not part of the class, but perform
useful services.
UnQuoteName ( string )
This function processes quoted characters in a PDF-name.
PDF-names returned by GetObject are already processed by
this function.
Returns a string.
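The sketch below shows the kind of processing involved. It assumes the #xx hexadecimal escapes that PDF 1.2 and later use inside names. sketch_unquote_name is a hypothetical helper, not the library's code, and the exact escapes this version of UnQuoteName recognizes may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not PDF::Core's UnQuoteName): decode the #xx
# hexadecimal escapes that PDF 1.2+ allows inside name objects.
sub sketch_unquote_name {
    my ($name) = @_;
    $name =~ s/#([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $name;
}

print sketch_unquote_name('/Adobe#20Green'), "\n";    # prints "/Adobe Green"
```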
UnQuoteString ( string )
This function extracts the text from PDF-strings and
PDF-hexstrings. It processes all quoted characters and
removes the enclosing delimiters (parentheses or angle
brackets).
WARNING: The current version doesn't handle Unicode
strings properly.
Returns a string.
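The sketch below undoes the quoting of both string forms. It follows the escape rules of the PDF specification:

- In literal strings: backslash escapes, \ddd octal codes, and backslash-newline continuations.
- In hex strings: whitespace between the digits is ignored.

sketch_unquote_string is hypothetical, not the library's implementation. Like the library, it leaves Unicode text undecoded.

```perl
use strict;
use warnings;

# Hypothetical helper (not PDF::Core's UnQuoteString). Escape letters for
# literal strings; any other escaped character stands for itself.
my %ESC = (n => "\n", r => "\r", t => "\t", b => "\b", f => "\f");

sub sketch_unquote_string {
    my ($s) = @_;
    if ($s =~ /^<(.*)>$/s) {                    # hex string
        (my $hex = $1) =~ s/\s+//g;             # whitespace between digits is ignored
        return pack('H*', $hex);                # an odd final digit is padded with 0
    }
    return $s unless $s =~ /^\((.*)\)$/s;       # neither form: return unchanged
    my $body = $1;
    # Replace each escape: an octal code of 1-3 digits, a backslash-newline
    # (line continuation, dropped), a letter from %ESC, or any other
    # character, which then stands for itself.
    $body =~ s{\\(?:([0-7]{1,3})|(\r\n|\r|\n)|(.))}
              {defined $1 ? chr(oct $1) : defined $2 ? '' : exists $ESC{$3} ? $ESC{$3} : $3}gse;
    return $body;
}

print sketch_unquote_string('(This is\na string with two lines.)'), "\n";   # prints two lines
```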
QuoteName ( string )
This function quotes problematic characters in a PDF-name.
This function should be used before writing a PDF-name
back to a PDF-file.
Returns a string.
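The sketch below shows the reverse direction, again assuming the #xx convention. Bytes outside the printable ASCII range, "#" and the PDF delimiter characters are replaced by "#" and two hex digits. sketch_quote_name is hypothetical, and the set of characters the real QuoteName treats as problematic is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper (not PDF::Core's QuoteName). Assumes $name holds
# bytes (characters 0-255); a leading slash is kept as-is.
sub sketch_quote_name {
    my ($name) = @_;
    my $slash = ($name =~ s/^\///) ? '/' : '';
    # Escape bytes outside 0x21-0x7E (this includes space), '#' and the
    # PDF delimiters ( ) < > [ ] { } / %.
    $name =~ s/([^\x21-\x7E]|[#()<>\[\]{}\/%])/sprintf('#%02X', ord($1))/ge;
    return $slash . $name;
}

print sketch_quote_name('/Adobe Green'), "\n";    # prints "/Adobe#20Green"
```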
QuoteHexString ( string )
This function translates a string into a PDF-hexstring.
Returns a string.
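A hex string simply writes every byte as two hexadecimal digits between angle brackets. The helper below is a hypothetical sketch of that encoding, not the library's QuoteHexString. The choice of upper-case digits is arbitrary; PDF readers accept either case.

```perl
use strict;
use warnings;

# Hypothetical helper (not PDF::Core's QuoteHexString): each byte of
# $string becomes two hex digits, wrapped in < and >.
sub sketch_quote_hex_string {
    my ($string) = @_;
    return '<' . uc(unpack('H*', $string)) . '>';
}

print sketch_quote_hex_string('Hi'), "\n";    # prints "<4869>"
```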
QuoteString ( string )
This function translates a string into a PDF-string.
Problematic characters will be quoted.
WARNING: The current version doesn't handle Unicode
strings properly.
Returns a string.
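Inside a PDF literal string, the backslash and the parentheses are special, and control characters are commonly written as escapes. The sketch below quotes them under that assumption. sketch_quote_string is hypothetical, and which characters the real QuoteString escapes is not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper (not PDF::Core's QuoteString).
my %CTRL = ("\n" => '\n', "\r" => '\r', "\t" => '\t', "\b" => '\b', "\f" => '\f');

sub sketch_quote_string {
    my ($string) = @_;
    # Escape \, ( and ) in one pass. Control characters are handled
    # afterwards, so the backslashes they introduce are not doubled.
    $string =~ s/([\\()])/\\$1/g;
    $string =~ s/([\n\r\t\b\f])/$CTRL{$1}/g;
    return "($string)";
}

print sketch_quote_string("f(x)\n"), "\n";    # prints "(f\(x\)\n)"
```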
PDFGetPrimitive ( filehandle, offset )
This internal function is used while parsing a PDF-file.
Unless you are writing extensions for this library or
parsing some special parts of the PDF-file, stay away from
it and use GetObject instead.
This function has many quirks and limitations. Check the
source for details.
PDFGetline ( filehandle, offset )
This internal function was used to read a line from a
PDF-file. It has many limitations, and you should stay
away from it unless you know what you are doing. Use
GetObject or PDFGetPrimitive instead.
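For illustration only, the sketch below shows what reading a line at a byte offset involves. PDF files may end lines with CR, LF or CR LF, so a plain readline is not enough. sketch_get_line is hypothetical and does not reproduce PDFGetline's behaviour or limitations.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper (not PDF::Core's PDFGetline): read one line starting
# at byte $offset. The line ends at the first CR or LF; an LF directly
# after a CR is consumed as part of the same end-of-line marker.
sub sketch_get_line {
    my ($fh, $offset) = @_;
    seek($fh, $offset, SEEK_SET) or return undef;
    my $line = '';
    my $c;
    while (read($fh, $c, 1)) {
        if ($c eq "\r") {
            my $next;
            # Not CR LF: step back so the byte after the CR is not lost.
            seek($fh, -1, SEEK_CUR) if read($fh, $next, 1) && $next ne "\n";
            return $line;
        }
        return $line if $c eq "\n";
        $line .= $c;
    }
    return length($line) ? $line : undef;    # end of file
}

my $data = "%PDF-1.3\r\n1 0 obj\rxyz";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
print sketch_get_line($fh, 10), "\n";    # prints "1 0 obj"
```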
Constructor
new ( [ filename ] )
This is the constructor of a new PDF object. If the
filename is omitted, it returns an empty PDF descriptor
(which can be filled with $pdf->TargetFile). Otherwise, it
acts like the PDF::Parse::TargetFile method.
Methods
The available methods are:
GetObject (reference)
This method returns the PDF-object for reference. The
string reference must match the regular expression
/^\d+ \d+ R$/, where the first number is the object number
and the second number is the generation number.
The return value is a PDF-primitive; its type depends on
the content of the object:
- undef
- The object could not be found, or an error occurred. Not
all referenced objects need to be present in a PDF-file.
This value can be ignored.
- Hash reference
- If UNIVERSAL::isa($retval, "HASH") is true, the object
is a PDF-dictionary. The keys of the hash should be either
a PDF name (e.g. /MediaBox) or a generated value such as
Stream_Offset. Anything else is an error.
- The values of the hash can be any PDF-primitive,
including PDF-arrays and other dictionaries.
- This is the most common value returned by GetObject. If
the key Stream_Offset exists, the dictionary is followed by
stream data, starting at the file offset indicated by this
value.
- Array reference
- If UNIVERSAL::isa($retval, "ARRAY") is true, the object
is a PDF-array. Each element may be of a different type,
and may contain further references to arrays or any other
PDF-primitive.
- String matching /^\d+ \d+ R$/
- This is a reference to another PDF-object. This value
can be passed to GetObject. This kind of value may appear
instead of most other types. Some PDF-writing programs seem
to have special fun writing references where a simple
number is expected. If the final number is needed, use
code like this to resolve references:
while ($len =~ m/^\d+ \d+ R$/) { $len = $self->GetObject($len); }
- Example: 22 0 R
- String matching /^\//
- This is a name in a PDF dictionary. This string is
already processed by UnQuoteName and may differ from the
value in the PDF-file. In some very old and strange
non-standard PDF-files, this may lead to confusion.
- Example: /MediaBox
- String matching /^\(.*\)$/
- This is a string. It may contain newlines, quoted
characters and other strange stuff. Use
PDF::UnQuoteString to extract the text.
- Example: (This is\na string with two lines.)
- String matching /^<.*>$/
- This is a hex-encoded string. Use PDF::UnQuoteString
to extract the text.
- Example: <48 45 4c4C4 F1c>
- String matching /^[\d.+-]+$/
- This is probably a number.
- Example: 611
- String matching none of the above
- This is either a PDF bareword (e.g. true, false, ...)
or a value generated by this method, such as
Stream_Offset.
- Example: true
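The type rules above can be checked in order with a small dispatcher. The reference-resolving loop can also be wrapped with a guard against reference cycles. Both subroutines are hypothetical sketches, and the cycle guard is an addition not present in the loop above. %objects is made-up sample data standing in for a parsed file, and the code reference passed to sketch_resolve plays the role of $self->GetObject.

```perl
use strict;
use warnings;

# Hypothetical: classify a value returned by GetObject, testing the rules
# in the order they are listed above.
sub sketch_pdf_type {
    my ($v) = @_;
    return 'undef'      unless defined $v;
    return 'dictionary' if UNIVERSAL::isa($v, 'HASH');
    return 'array'      if UNIVERSAL::isa($v, 'ARRAY');
    return 'reference'  if $v =~ /^\d+ \d+ R$/;
    return 'name'       if $v =~ /^\//;
    return 'string'     if $v =~ /^\(.*\)$/s;
    return 'hexstring'  if $v =~ /^<.*>$/s;
    return 'number'     if $v =~ /^[\d.+-]+$/;
    return 'other';     # bareword (true, false, ...) or a generated value
}

# Hypothetical: follow references until a non-reference value is reached.
# $get stands in for $self->GetObject; %seen stops on reference cycles.
sub sketch_resolve {
    my ($get, $v) = @_;
    my %seen;
    while (defined $v && !ref $v && $v =~ /^\d+ \d+ R$/) {
        return undef if $seen{$v}++;
        $v = $get->($v);
    }
    return $v;
}

# Made-up sample data: a length stored indirectly, through two references.
my %objects = ('22 0 R' => '23 0 R', '23 0 R' => '611');
my $len = sketch_resolve(sub { $objects{ $_[0] } }, '22 0 R');
print "$len is a ", sketch_pdf_type($len), "\n";    # prints "611 is a number"
```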
To improve performance, GetObject uses an internal cache
for objects. Repeated requests for the same object are not
read from the file but satisfied from the cache. Setting
the variable $PDF::Core::UseObjectCache to a false value
turns the caching mechanism off.
WARNING
Special care must be taken when returned objects are
modified. If the object contains sub-objects, the
sub-objects are not duplicated, and all changes affect all
other copies of this object. Use your own copy if you need
to modify those values.
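One way to take your own copy is a deep copy with dclone from Perl's core Storable module. A shallow copy of the top-level hash is not enough, because nested arrays and hashes stay shared. The dictionary below is made-up sample data standing in for a GetObject result.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Made-up stand-in for a dictionary returned (and cached) by GetObject.
my $page = { '/Type' => '/Page', '/MediaBox' => [0, 0, 612, 792] };

my $shallow = { %$page };      # copies the top level only
my $deep    = dclone($page);   # copies nested arrays and hashes too

$deep->{'/MediaBox'}[2]    = 595;   # leaves $page untouched
$shallow->{'/MediaBox'}[3] = 842;   # also changes $page: the array is shared

print "@{ $page->{'/MediaBox'} }\n";    # prints "0 0 612 842"
```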
Variables
Available variables are:
- $PDF::Core::VERSION
- Contains the version of the library installed
- $PDF::Core::UseObjectCache
- If this variable is true, all processed objects will
be added to the object cache. If only the header
information of a PDF file is read, or very big PDF files
are processed, turning off the cache reduces memory usage.
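The effect of the cache switch can be modelled with a small memoizing lookup. Sketch::Cache, its get_object and $Sketch::Cache::UseObjectCache are hypothetical stand-ins for GetObject and $PDF::Core::UseObjectCache. The loader callback represents reading the object from the file.

```perl
use strict;
use warnings;

package Sketch::Cache;

our $UseObjectCache = 1;    # plays the role of $PDF::Core::UseObjectCache
my %cache;

# Return the object for $ref, calling $loader only on a cache miss.
sub get_object {
    my ($ref, $loader) = @_;
    return $cache{$ref} if $UseObjectCache && exists $cache{$ref};
    my $obj = $loader->($ref);
    $cache{$ref} = $obj if $UseObjectCache;    # stored only while caching is on
    return $obj;
}

package main;

my $reads  = 0;
my $loader = sub { $reads++; return "object $_[0]" };

Sketch::Cache::get_object('1 0 R', $loader) for 1 .. 2;       # one read
{
    local $Sketch::Cache::UseObjectCache = 0;                  # cache off in this block
    Sketch::Cache::get_object('2 0 R', $loader) for 1 .. 2;    # two reads
}
print "reads: $reads\n";    # prints "reads: 3"
```

With the real library, setting $PDF::Core::UseObjectCache to a false value (for example with local, as above) should have the corresponding effect.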
Copyright
- Copyright (c) 1998 - 2000 Antonio Rosella Italy
- antro@tiscalinet.it, Johannes Blach dw235@yahoo.com
- This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.
Availability
The latest version of this library is likely to be
available from:
- http://www.geocities.com/CapeCanaveral/Hangar/4794/