MARC::Charset(3pm)
NAME
MARC::Charset - convert MARC-8 encoded strings to UTF-8
SYNOPSIS
# import the marc8_to_utf8 function
use MARC::Charset 'marc8_to_utf8';
# prepare STDOUT for utf8
binmode(STDOUT, ':utf8');
# print out some marc8 as utf8
print marc8_to_utf8($marc8_string);
DESCRIPTION
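- MARC::Charset allows you to turn MARC-8 encoded strings into UTF-8
strings, and UTF-8 strings back into MARC-8. MARC-8 is a character
encoding that predates Unicode and allows you to put non-Roman scripts
in MARC bibliographic records. It uses escape sequences to switch
between character sets. Most of these sets are single-byte, but the
East Asian (CJK) set uses three bytes per character.
- http://www.loc.gov/marc/specifications/spechome.html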
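One visible difference between the two encodings is where diacritics go. In MARC-8 a non-spacing mark such as the acute accent (byte 0xE2 in the default extended Latin G1 set) comes before the letter it modifies, while in Unicode the combining mark follows the letter. The sketch below is illustrative, not taken from the module's own documentation. It converts "café" written in MARC-8 and checks the result in NFC (using the core Unicode::Normalize module), so the check holds whether the accent comes back as a separate combining character or not.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use Unicode::Normalize 'NFC';

# "caf" + 0xE2 (MARC-8 combining acute, placed BEFORE its letter) + "e"
my $marc8 = "caf\xE2e";

# In the converted string the accent follows the "e", as Unicode requires.
my $utf8 = marc8_to_utf8($marc8);

# Normalize to NFC so "e" + U+0301 and precomposed U+00E9 compare equal.
my $composed = NFC($utf8);

binmode(STDOUT, ':utf8');
print "$composed\n";
```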
EXPORTS
- ignore_errors()
- Tells MARC::Charset whether or not to ignore all encoding errors, and
returns the current setting. This is helpful if you have records
that contain both MARC-8 and Unicode characters.
my $ignore = MARC::Charset->ignore_errors();
MARC::Charset->ignore_errors(1); # ignore errors
MARC::Charset->ignore_errors(0); # DO NOT ignore errors
- assume_unicode()
- Tells MARC::Charset whether or not to assume Unicode (UTF-8) when an
error is encountered in ignore_errors mode, and returns the current
setting. This is helpful if you have records that contain both MARC-8
and Unicode characters.
my $setting = MARC::Charset->assume_unicode();
MARC::Charset->assume_unicode(1); # assume characters are unicode (utf-8)
MARC::Charset->assume_unicode(0); # DO NOT assume characters are unicode
- assume_encoding()
- Tells MARC::Charset whether or not to assume a specific encoding when
an error is encountered in ignore_errors mode, and returns the current
setting. This is helpful if you have records that contain both MARC-8
characters and characters in some other encoding.
my $setting = MARC::Charset->assume_encoding();
MARC::Charset->assume_encoding('cp850'); # assume characters are cp850
MARC::Charset->assume_encoding(''); # DO NOT assume any encoding
- marc8_to_utf8()
- Converts a MARC-8 encoded string to UTF-8.
my $utf8 = marc8_to_utf8($marc8);
- If you'd like to ignore errors, pass in a true value as the second
parameter or call MARC::Charset->ignore_errors() with a true value:
my $utf8 = marc8_to_utf8($marc8, 'ignore-errors');
- or
MARC::Charset->ignore_errors(1);
my $utf8 = marc8_to_utf8($marc8);
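Because ignore_errors(), assume_unicode() and assume_encoding() change module-wide settings, code that only wants them for one conversion may prefer to save and restore them. The following is a minimal sketch under that assumption; convert_mixed() is a hypothetical helper, not part of MARC::Charset. It relies only on the calls shown above (each method returns the current setting when called without an argument) and restores explicit 1/0 values rather than passing an undefined value back. Calling assume_encoding('cp850') instead of assume_unicode(1) would follow the same pattern.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';

# Hypothetical helper (not part of MARC::Charset): convert a string that
# may mix MARC-8 and UTF-8, then put the global settings back.
sub convert_mixed {
    my ($string) = @_;

    # Called with no argument, each method returns the current setting.
    my $old_ignore  = MARC::Charset->ignore_errors();
    my $old_unicode = MARC::Charset->assume_unicode();

    # Ignore errors, and assume characters that fail to convert are UTF-8.
    MARC::Charset->ignore_errors(1);
    MARC::Charset->assume_unicode(1);

    # Trap a die so the settings are restored before it is rethrown.
    my $utf8  = eval { marc8_to_utf8($string) };
    my $error = $@;

    # Restore the caller's settings as explicit true/false values.
    MARC::Charset->ignore_errors($old_ignore ? 1 : 0);
    MARC::Charset->assume_unicode($old_unicode ? 1 : 0);

    die $error if $error;
    return $utf8;
}

my $title = convert_mixed('Plain ASCII title');
print "$title\n";
```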
- utf8_to_marc8()
- Attempts to translate a UTF-8 string into MARC-8.
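my $marc8 = utf8_to_marc8($utf8);
- If you'd like to ignore errors, including characters that can't be
converted to MARC-8, pass in a true value as the second parameter or
call MARC::Charset->ignore_errors() with a true value:
my $marc8 = utf8_to_marc8($utf8, 'ignore-errors');
- or
MARC::Charset->ignore_errors(1);
my $marc8 = utf8_to_marc8($utf8);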
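Since MARC-8 cannot represent every Unicode character, it can be useful to check whether a given string survives a trip to MARC-8 and back. The sketch below assumes a hypothetical helper, survives_round_trip(), that is not part of MARC::Charset. It compares both sides in NFC (via the core Unicode::Normalize module) because MARC-8 stores diacritics as separate combining marks, so converted text may come back decomposed even when nothing was lost.

```perl
use strict;
use warnings;
use MARC::Charset qw(marc8_to_utf8 utf8_to_marc8);
use Unicode::Normalize 'NFC';

# Hypothetical helper: true if $utf8 converts to MARC-8 and back without
# changing, ignoring differences between composed and decomposed accents.
sub survives_round_trip {
    my ($utf8) = @_;
    my $marc8 = utf8_to_marc8($utf8);
    my $back  = marc8_to_utf8($marc8);
    return NFC($back) eq NFC($utf8);
}

my $ok = survives_round_trip('Moby Dick');
print $ok ? "ok\n" : "lossy\n";
```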
DEFAULT CHARACTER SETS
- If you need to alter the default character sets (the G0 and G1 sets
in effect at the start of a string, before any escape sequence
switches them), set the $MARC::Charset::DEFAULT_G0 and
$MARC::Charset::DEFAULT_G1 variables to the appropriate character set
code:
use MARC::Charset::Constants qw(:all);
$MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
$MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
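Assigning to these package variables changes the defaults for every later conversion in the program. If only some records use different default sets, Perl's built-in local can scope the change to a single call. This is a sketch under that assumption: arabic_marc8_to_utf8() is a hypothetical helper, and the empty string below stands in for a real Arabic MARC-8 field value.

```perl
use strict;
use warnings;
use MARC::Charset 'marc8_to_utf8';
use MARC::Charset::Constants qw(:all);

# Hypothetical helper: convert one string whose default character sets
# are Arabic. "local" gives the package variables temporary values that
# Perl restores automatically when the sub returns.
sub arabic_marc8_to_utf8 {
    my ($marc8) = @_;
    local $MARC::Charset::DEFAULT_G0 = BASIC_ARABIC;
    local $MARC::Charset::DEFAULT_G1 = EXTENDED_ARABIC;
    return marc8_to_utf8($marc8);
}

my $g0_before = $MARC::Charset::DEFAULT_G0;
my $g1_before = $MARC::Charset::DEFAULT_G1;

# Placeholder input: an empty string, just to exercise the code path.
arabic_marc8_to_utf8('');
```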
SEE ALSO
o MARC::Charset::Constants
o MARC::Charset::Table
o MARC::Charset::Code
o MARC::Charset::Compiler
o MARC::Record
o MARC::XML
AUTHOR
- Ed Summers (ehs@pobox.com)