Jcode(3pm)
NAME
Jcode - Japanese Charset Handler
SYNOPSIS
use Jcode;
#
# traditional
Jcode::convert(\$str, $ocode, $icode, "z");
# or OOP!
print Jcode->new($str)->h2z->tr($from, $to)->utf8;
DESCRIPTION
<Japanese document is now available as Jcode::Nihongo. >
- Jcode.pm supports both the object-oriented and the traditional approach. With the object
approach, you can write:
$iso_2022_jp = Jcode->new($str)->h2z->jis;
- Which is more elegant than:
$iso_2022_jp = $str;
&jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");
- For those unfamiliar with objects, Jcode.pm still supports "getcode()" and "convert()".
- If the perl version is 5.8.1 or later, Jcode acts as a wrapper around Encode, the
standard charset handler module for Perl 5.8 or later.
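- To make the wrapper relationship concrete, the following is a minimal sketch of the kind of conversion Encode performs on its own. It uses only the core Encode module, not Jcode. The helper name convert_octets is hypothetical, and the sketch does not claim to mirror how Jcode calls Encode internally.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Jcode): convert octets from one
# encoding to another by decoding to Perl characters and re-encoding.
sub convert_octets {
    my ($octets, $from, $to) = @_;
    my $chars = decode($from, $octets);   # octets -> characters
    return encode($to, $chars);           # characters -> octets
}

# HIRAGANA LETTER A (U+3042) is the two octets A4 A2 in EUC-JP.
my $sjis = convert_octets("\xA4\xA2", 'euc-jp', 'shiftjis');
print unpack("H*", $sjis), "\n";          # hex dump of the Shift_JIS octets
```

- The decode-then-encode round trip through Perl's internal character string is the general Encode pattern; any encoding name that Encode understands can be passed as $from or $to.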
Methods
Unless otherwise noted, the methods described here all return a Jcode object.
Constructors
- $j = Jcode->new($str [, $icode])
- Creates the Jcode object $j from $str. The input encoding is detected automatically
unless you set $icode explicitly. For the available charsets, see getcode below.
- For perl 5.8.1 or better, $icode can be any encoding name that Encode
understands.
$j = Jcode->new($european, 'iso-latin1');
- When the object is stringified, it returns the EUC-converted string,
so you can write <print $j> instead of <print $j->euc>.
- Passing Reference
- Instead of a scalar value, you can pass a reference:
Jcode->new(\$str);
- This saves a little time, at the cost of the value of $str
being converted in place. (In a way, $str is now "tied" to the Jcode object.)
- $j->set($str [, $icode])
- Sets $j's internal string to $str. Handy when you use a Jcode object
repeatedly (it saves the time and memory needed to create a new object).
# converts mailbox to SJIS format
my $jconv = Jcode->new;
$/ = 00;
while (<>) {
    print $jconv->set(\$_)->mime_decode->sjis;
}
- $j->append($str [, $icode]);
- Appends $str to $j's internal string.
- $j = jcode($str [, $icode]);
- Shortcut for Jcode->new(), so you can write calls such as jcode($str)->sjis.
Encoded Strings
- In general, you can retrieve the encoded string as $j->encoded, where "encoded" stands for the name of the target encoding.
- $sjis = jcode($str)->sjis
- $euc = $j->euc
- $jis = $j->jis
- $sjis = $j->sjis
- $ucs2 = $j->ucs2
- $utf8 = $j->utf8
- What you code is what you get :)
- $iso_2022_jp = $j->iso_2022_jp
- Same as "$j->h2z->jis". Hankaku kana are forcibly converted to
Zenkaku.
- For perl 5.8.1 and better, you can also use any encoding names and
aliases that Encode supports. For example:
$european = $j->iso_latin1; # replace '-' with '_' for names.
- FYI: Encode::Encoder uses a similar trick.
- $j->fallback($fallback)
- For perl 5.8.1 or better, Jcode stores the internal string in
UTF-8. Any character that does not map to ->encoding is replaced with a '?', which is the Encode standard.
my $unistr = "\x{262f}"; # YIN YANG
my $j = jcode($unistr); # $j->euc is '?'
- You can change this behavior by specifying a fallback, as in Encode.
The values are the same as Encode's. "Jcode::FB_PERLQQ", "Jcode::FB_XMLCREF", and "Jcode::FB_HTMLCREF" are aliased to those of Encode for convenience.
print $j->fallback(Jcode::FB_PERLQQ)->euc; # '\x{262f}'
print $j->fallback(Jcode::FB_XMLCREF)->euc; # '&#x262f;'
print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'
- The global variable $Jcode::FALLBACK stores the default fallback, so you can override it by assigning a new value.
$Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
- [@lines =] $jcode->jfold([$width, $newline_str, $kref])
- Folds lines in the jcode string every $width (default: 72), where $width
is the number of "halfwidth" characters. Fullwidth characters are
counted as two. The fold uses the newline string specified by $newline_str (default: "\n").
- Rudimentary kinsoku support is now available for Perl 5.8.1 and better.
- $length = $jcode->jlength();
- Returns the character length properly, rather than the byte length.
Methods that use MIME::Base64
- To use the methods below, you need MIME::Base64. To install it, simply run
perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'
- If your perl is 5.6 or better, this is unnecessary since MIME::Base64 is
bundled.
- $mime_header = $j->mime_encode([$lf, $bpl])
- Converts $str to a MIME header as documented in RFC1522. When $lf is
specified, it is used to fold lines (default: \n). When $bpl is
specified, it is used as the number of bytes per line (default: 76; this
number must be smaller than 76).
- For Perl 5.8.1 or better, you can also encode a MIME header as:
$mime_header = $j->MIME_Header;
- In which case the resulting $mime_header is MIME-B-encoded UTF-8,
whereas "$j->mime_encode()" returns MIME-B-encoded ISO-2022-JP.
Most modern MUAs support both.
- $j->mime_decode;
- Decodes the MIME header in the Jcode object. For perl 5.8.1 or better, you
can also do the same with:
Jcode->new($str, 'MIME-Header')
Hankaku vs. Zenkaku
- $j->h2z([$keep_dakuten])
- Converts X201 kana (Hankaku) to X208 kana (Zenkaku). When
$keep_dakuten is set, it leaves dakuten as is (that is, "ka +
dakuten" is left as is instead of being converted to "ga").
- You can retrieve the number of matches via $j->nmatch;
- $j->z2h
- Converts X208 kana (Zenkaku) to X201 kana (Hankaku).
- You can retrieve the number of matches via $j->nmatch;
Regexp emulators
- To use "->m()" and "->s()", you need perl 5.8.1 or better.
- $j->tr($from, $to, $opt);
- Applies "tr/$from/$to/" on the Jcode object, where $from and $to are EUC-JP strings. On perl 5.8.1 or better, $from and $to can also be
flagged UTF-8 strings.
- If $opt is set, "tr/$from/$to/$opt" is applied. $opt must be 'c',
'd' or a combination thereof.
- You can retrieve the number of matches via $j->nmatch;
- The following methods are available only for perl 5.8.1 or better.
- $j->s($pattern, $replace, $opt);
- Applies "s/$pattern/$replace/$opt". $pattern and $replace must be in EUC-JP or flagged UTF-8. $opt takes the same values as regexp options. See
perlre for regexp options.
- Like "$j->tr()", "$j->s()" returns the object itself, so you can chain the operations as follows:
$j->tr("a-z", "A-Z")->s("foo", "bar");
- [@match = ] $j->m($pattern, $opt);
- Applies "m/$pattern/$opt". Note that this method DOES NOT RETURN AN
OBJECT, so you can't chain it the way you can "$j->s()".
Instance Variables
- If you need to access the instance variables of a Jcode object, use the accessor
methods below instead of accessing them directly (that's what OOP is
all about).
- FYI, Jcode uses a ref to an array instead of a ref to a hash (the common way) to
optimize speed. (You don't actually have to know this as long as you use
the accessor methods; once again, that's OOP.)
- $j->r_str
- Reference to the EUC-coded string.
- $j->icode
- Input charcode in the most recent operation.
- $j->nmatch
- Number of matches (used in $j->tr, etc.)
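- As a sketch of the MIME-B-encoded UTF-8 form mentioned under "Methods that use MIME::Base64", the following uses the core Encode module directly with its 'MIME-Header' encoding. It does not call Jcode at all; that $j->MIME_Header ends up in the same Encode encoding is an assumption based on the '-' to '_' naming rule described above, and the sample subject string is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Sample subject (made up for this sketch): U+65E5 U+672C U+8A9E,
# held as a Perl character string.
my $subject = "\x{65E5}\x{672C}\x{8A9E}";

# Encode's 'MIME-Header' encoding produces a B-encoded UTF-8 header word.
my $header = encode('MIME-Header', $subject);

# Decoding the header gives back the original characters.
my $roundtrip = decode('MIME-Header', $header);

print "$header\n";
print $roundtrip eq $subject ? "round trip ok\n" : "round trip failed\n";
```

- The encoded header is plain ASCII, so it can be printed or placed in a mail header without further conversion.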
Subroutines
- ($code, [$nmatch]) = getcode($str)
- Returns the character code of $str. The return codes are as follows:
ascii   Ascii (Contains no Japanese Code)
binary  Binary (Not Text File)
euc     EUC-JP
sjis    SHIFT_JIS
jis     JIS (ISO-2022-JP)
ucs2    UCS2 (Raw Unicode)
utf8    UTF8
- When array context is used instead of scalar, it also returns how
many character codes were found. As mentioned above, $str can be
\$str instead.
- jcode.pl Users: This function is 100% upward-compatible with
jcode::getcode() -- well, almost;
* When its return value is an array, the order is the opposite;
jcode::getcode() returns $nmatch first.
* jcode::getcode() returns 'undef' when the number of EUC characters
is equal to that of SJIS. Jcode::getcode() returns EUC. For
Jcode.pm there are no in-betweens.
- Jcode::convert($str, [$ocode, $icode, $opt])
- Converts $str to the character code specified by $ocode. When $icode is also specified, it is taken as the encoding of the input string instead of the one
detected by getcode(). As mentioned above, $str can be \$str instead.
- jcode.pl Users: This function is 100% upward-compatible with jcode::convert()!
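- The calling conventions described for getcode() can be illustrated with a toy function. This is a sketch only: toy_getcode is a hypothetical name, its byte ranges are a rough simplification, and it is not Jcode's actual detection code. It shows just three points from the text: the ($code, $nmatch) order in list context, acceptance of \$str, and the tie going to EUC.

```perl
use strict;
use warnings;

# Hypothetical toy detector, NOT Jcode's real algorithm.
sub toy_getcode {
    my ($str) = @_;
    my $octets = ref $str ? $$str : $str;    # accept \$str as well as $str

    # No high-bit octets at all: treat as ASCII with zero matches.
    return wantarray ? ('ascii', 0) : 'ascii'
        unless $octets =~ /[\x80-\xFF]/;

    # Count candidate two-byte characters using simplified byte ranges.
    my $euc  = () = $octets =~ /[\xA1-\xFE][\xA1-\xFE]/g;
    my $sjis = () = $octets =~ /[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]/g;

    # On a tie, prefer EUC rather than returning undef.
    my ($code, $nmatch) = $euc >= $sjis ? ('euc', $euc) : ('sjis', $sjis);

    # List context: code first, then the match count. Scalar: code only.
    return wantarray ? ($code, $nmatch) : $code;
}

my ($code, $nmatch) = toy_getcode("\xA4\xA2");   # list context
my $code_only       = toy_getcode(\"\x82\xA0");  # scalar context, by reference
print "$code $nmatch $code_only\n";
```

- Code ported from jcode.pl that expects the count first would need to swap the two return values when moving to this order.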
BUGS
For perl 5.8.1 or later, Jcode acts as a wrapper around Encode, meaning that
Jcode is subject to any bugs therein.
ACKNOWLEDGEMENTS
This package owes a lot in motivation, design, and code, to the
jcode.pl for Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.
Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish regexp from
the very first stage of development.
JEncode by makamaka@donzoko.net has inspired me to integrate Encode to
Jcode. He has also contributed Japanese POD.
And folks at Jcode Mailing list <jcode5@ring.gr.jp>. Without them, I
couldn't have coded this far.
SEE ALSO
Encode
Jcode::Nihongo
<http://www.iana.org/assignments/character-sets>
COPYRIGHT
Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>
- This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.