mime::head(3)
NAME
MIME::Head - MIME message header (a subclass of
Mail::Header)
SYNOPSIS
Before reading further, you should see MIME::Tools to make
sure that you understand where this module fits into the
grand scheme of things. Go on, do it now. I'll wait.
Ready? Ok...
Construction
### Create a new, empty header, and populate it manually:
$head = MIME::Head->new;
$head->replace('content-type', 'text/plain;
charset=US-ASCII');
$head->replace('content-length', $len);
### Parse a new header from a filehandle:
$head = MIME::Head->read(TDIN);
### Parse a new header from a file, or a readable
pipe:
$testhead = MIME::Head->from_file("/tmp/test.hdr");
$a_b_head = MIME::Head->from_file("cat a.hdr b.hdr
|");
Output
### Output to filehandle:
$head->print(TDOUT);
### Output as string:
print STDOUT $head->as_string;
print STDOUT $head->stringify;
Getting field contents
### Is this a reply?
$is_reply = 1 if ($head->get('Subject') =~ /^Re: /);
### Get receipt information:
print "Last received from: ", $head->get('Received',
0), "0;
@all_received = $head->get('Received');
### Print the subject, or the empty string if none:
print "Subject: ", $head->get('Subject',0), "0;
### Too many hops? Count 'em and see!
if ($head->count('Received') > 5) { ...
### Test whether a given field exists
warn "missing subject!" if (! $head->count('subject'));
Setting field contents
### Declare this to be an HTML header:
$head->replace('Content-type', 'text/html');
Manipulating field contents
### Get rid of internal newlines in fields:
$head->unfold;
### Decode any Q- or B-encoded-text in fields (DEPRECATED):
$head->decode;
Getting high-level MIME information
### Get/set a given MIME attribute:
unless ($charset = $head->mime_attr('contenttype.charset')) {
$head->mime_attr("content-type.charset" => "USASCII");
}
### The content type (e.g., "text/html"):
$mime_type = $head->mime_type;
### The content transfer encoding (e.g., "quotedprintable"):
$mime_encoding = $head->mime_encoding;
### The recommended name when extracted:
$file_name = $head->recommended_filename;
### The boundary text, for multipart messages:
$boundary = $head->multipart_boundary;
DESCRIPTION
A class for parsing in and manipulating RFC-822 message
headers, with some methods geared towards standard (and
not so standard) MIME fields as specified in RFC-1521,
Multipurpose Internet Mail Extensions.
PUBLIC INTERFACE
Creation, input, and output
- new [ARG],[OPTIONS]
- Class method, inherited. Creates a new header object. Arguments are the same as those in the superclass.
- from_file EXPR,OPTIONS
- Class or instance method. For convenience, you can
use this to parse a header object in from EXPR, which
may actually be any expression that can be sent to
open() so as to return a readable filehandle. The
"file" will be opened, read, and then closed:
### Create a new header by parsing in a file:
my $head = MIME::Head->from_file("/tmp/test.hdr"); - Since this method can function as either a class con
structor or an instance initializer, the above is
exactly equivalent to:
### Create a new header by parsing in a file:
my $head =MIME::Head->new->from_file("/tmp/test.hdr"); - On success, the object will be returned; on failure,
the undefined value. - The OPTIONS are the same as in new(), and are passed
into new() if this is invoked as a class method. - Note: This is really just a convenience front-end onto
"read()", provided mostly for backwards-compatibility
with MIME-parser 1.0. - read FILEHANDLE
- Instance (or class) method. This initiallizes a
header object by reading it in from a FILEHANDLE,
until the terminating blank line is encountered. A
syntax error or end-of-stream will also halt process
ing. - Supply this routine with a reference to a filehandle
glob; e.g., "TDIN":
### Create a new header by parsing in STDIN:
$head->read(TDIN); - On success, the self object will be returned; on fail
ure, a false value. - Note: in the MIME world, it is perfectly legal for a
header to be empty, consisting of nothing but the ter
minating blank line. Thus, we can't just use the for
mula that "no tags equals error". - Warning: as of the time of this writing,
Mail::Header::read did not flag either syntax errors
or unexpected end-of-file conditions (an EOF before
the terminating blank line). MIME::ParserBase takes
this into account. - Getting/setting fields
- The following are methods related to retrieving and modi
fying the header fields. Some are inherited from
Mail::Header, but I've kept the documentation around for
convenience. - add TAG,TEXT,[INDEX]
- Instance method, inherited. Add a new occurence of
the field named TAG, given by TEXT:
### Add the trace information:
$head->add('Received','from eryq.pr.mcs.net by gonzo.net withsmtp'); - Normally, the new occurence will be appended to the
existing occurences. However, if the optional INDEX
argument is 0, then the new occurence will be
prepended. If you want to be explicit about append ing, specify an INDEX of -1. - Warning: this method always adds new occurences; it
doesn't overwrite any existing occurences... so if you
just want to change the value of a field (creating it
if necessary), then you probably don't want to use
this method: consider using "replace()" instead. - count TAG
- Instance method, inherited. Returns the number of
occurences of a field; in a boolean context, this
tells you whether a given field exists:
### Was a "Subject:" field given?
$subject_was_given = $head->count('subject'); - The TAG is treated in a case-insensitive manner. This
method returns some false value if the field doesn't
exist, and some true value if it does. - decode [FORCE]
- Instance method, DEPRECATED. Go through all the
header fields, looking for RFC-1522-style "Q"
(quoted-printable, sort of) or "B" (base64) encoding,
and decode them in-place. Fellow Americans, you prob
ably don't know what the hell I'm talking about.
Europeans, Russians, et al, you probably do. ":-)". - This method has been deprecated. See "decode_headers"
in MIME::Parser for the full reasons. If you abso
lutely must use it and don't like the warning, then
provide a FORCE:
"I_NEED_TO_FIX_THIS"Just shut up and do it. Not recommended.
Provided only for those who need to keep oldscripts functioning."I_KNOW_WHAT_I_AM_DOING"Just shut up and do it. Not recommended.
Provided for those who REALLY know what theyare doing. - What this method does. For an example, let's consider a valid email header you might get:
From: =?US-ASCII?Q?Keith_Moore?=- <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= - <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PI - RARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFk - IHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
=?US-ASCII?Q?.._cool!?= - That basically decodes to (sorry, I can only approxi
mate the Latin characters with 7 bit sequences /o and
'e):
From: Keith Moore <moore@cs.utk.edu>
To: Keld J/orn Simonsen <keld@dkuug.dk>
CC: Andr'e Pirard <PIRARD@vm1.ulg.ac.be>
Subject: If you can read this you understand theexample... cool! - Note: currently, the decodings are done without regard
to the character set: thus, the Q-encoding "=F8" is
simply translated to the octet (hexadecimal "F8"),
period. For piece-by-piece decoding of a given field,
you want the array context of
"MIME::Word::decode_mimewords()". - Warning: the CRLF+SPACE separator that splits up long
encoded words into shorter sequences (see the Subject:
example above) gets lost when the field is unfolded,
and so decoding after unfolding causes a spurious
space to be left in the field. THEREFORE: if you're going to decode, do so BEFORE unfolding! - This method returns the self object.
- Thanks to Kent Boortz for providing the idea, and the baseline RFC-1522-decoding code.
- delete TAG,[INDEX]
- Instance method, inherited. Delete all occurences of
the field named TAG.
### Remove some MIME information:
$head->delete('MIME-Version');
$head->delete('Content-type'); - get TAG,[INDEX]
- Instance method, inherited. Get the contents of field TAG.
- If a numeric INDEX is given, returns the occurence at
that index, or undef if not present:
### Print the first and last 'Received:' entries(explicitly):
print "First, or most recent: ", $head->get('received', 0), "0;
print "Last, or least recent: ", $head->get('received',-1), "0; - If no INDEX is given, but invoked in a scalar context,
then INDEX simply defaults to 0:
### Get the first 'Received:' entry (implicitly):
my $most_recent = $head->get('received'); - If no INDEX is given, and invoked in an array context,
then all occurences of the field are returned:
### Get all 'Received:' entries:
my @all_received = $head->get('received'); - get_all FIELD
- Instance method. Returns the list of all occurences
of the field, or the empty list if the field is not
present:
### How did it get here?
@history = $head->get_all('Received'); - Note: I had originally experimented with having
"get()" return all occurences when invoked in an array
context... but that causes a lot of accidents when you
get careless and do stuff like this:
print "$field: ", $head->get($field), "0; - It also made the intuitive behaviour unclear if the
INDEX argument was given in an array context. So I
opted for an explicit approach to asking for all
occurences. - print [OUTSTREAM]
- Instance method, override. Print the header out to
the given OUTSTREAM, or the currently-selected file
handle if none. The OUTSTREAM may be a filehandle, or
any object that responds to a print() message. - The override actually lets you print to any object
that responds to a print() method. This is vital for outputting MIME entities to scalars. - Also, it defaults to the currently-selected filehandle
if none is given (not STDOUT!), so please supply a
filehandle to prevent confusion. - stringify
- Instance method. Return the header as a string. You can also invoke it as "as_string".
- unfold [FIELD]
- Instance method, inherited. Unfold (remove newlines
in) the text of all occurences of the given FIELD. If
the FIELD is omitted, all fields are unfolded.
Returns the "self" object. - MIME-specific methods
- All of the following methods extract information from the
following fields:
Content-type
Content-transfer-encoding
Content-disposition- Be aware that they do not just return the raw contents of
those fields, and in some cases they will fill in sensible
(I hope) default values. Use "get()" or "mime_attr()" if
you need to grab and process the raw field text. - Note: some of these methods are provided both as a conve
nience and for backwards-compatibility only, while others
(like recommended_filename()) really do have to be in MIME::Head to work properly, since they look for their value in more than one field. However, if you know that a
value is restricted to a single field, you should really
use the Mail::Field interface to get it. - mime_attr ATTR,[VALUE]
- A quick-and-easy interface to set/get the attributes
in structured MIME fields:
$head->mime_attr("content-type" =>"text/html");
$head->mime_attr("content-type.charset" => "USASCII");
$head->mime_attr("content-type.name" => "homepage.html"); - This would cause the final output to look something
like this:
Content-type: text/html; charset=US-ASCII;name="homepage.html" - Note that the special empty sub-field tag indicates
the anonymous first sub-field. - Giving VALUE as undefined will cause the contents of
the named subfield to be deleted:
$head->mime_attr("content-type.charset" => undef); - Supplying no VALUE argument just returns the
attribute's value, or undefined if it isn't there:
$type = $head->mime_attr("content-type"); ###text/html
$name = $head->mime_attr("content-type.name"); ###homepage.html - In all cases, the new/current value is returned.
- mime_encoding
- Instance method. Try real hard to determine the con
tent transfer encoding (e.g., "base64", "binary"),
which is returned in all-lowercase. - If no encoding could be found, the default of "7bit"
is returned. I quote from RFC-1521 section 5:
This is the default value -- that is, "ContentTransfer-Encoding: 7BIT"
is assumed if the Content-Transfer-Encoding headerfield is not present. - mime_type [DEFAULT]
- Instance method. Try "real hard" to determine the
content type (e.g., "text/plain", "image/gif",
"x-weird-type", which is returned in all-lowercase.
"Real hard" means that if no content type could be
found, the default (usually "text/plain") is returned.
From RFC-1521 section 7.1:
The default Content-Type for Internet mail is
"text/plain; charset=us-ascii". - Unless this is a part of a "multipart/digest", in
which case "message/rfc822" is the default. Note that
you can also set the default, but you shouldn't: nor
mally only the MIME parser uses this feature. - multipart_boundary
- Instance method. If this is a header for a multipart
message, return the "encapsulation boundary" used to
separate the parts. The boundary is returned exactly
as given in the "Content-type:" field; that is, the
leading double-hyphen ("--") is not prepended. - Well, almost exactly... this passage from RFC-1521
dictates that we remove any trailing spaces:
If a boundary appears to end with white space, thewhite space
must be presumed to have been added by a gateway,and must be deleted. - Returns undef (not the empty string) if either the
message is not multipart, if there is no specified
boundary, or if the boundary is illegal (e.g., if it
is empty after all trailing whitespace has been
removed). - recommended_filename
- Instance method. Return the recommended external
filename. This is used when extracting the data from
the MIME stream. - Returns undef if no filename could be suggested.
NOTES
- Why have separate objects for the entity, head, and body?
- See the documentation for the MIME-tools distribution
for the rationale behind this decision. - Why assume that MIME headers are email headers?
- I quote from Achim Bohnet, who gave feedback on v.1.9
(I think he's using the word "header" where I would
use "field"; e.g., to refer to "Subject:", "Con
tent-type:", etc.):
There is also IMHO no requirement [for]MIME::Heads to look
like [email] headers; so to speak, the MIME::Head[simply stores]
the attributes of a complex object, e.g.:
new MIME::Head type => "text/plain",charset => ...,
disposition => ..., ... ; - I agree in principle, but (alas and dammit) RFC-1521
says otherwise. RFC-1521 [MIME] headers are a syntac
tic subset of RFC-822 [email] headers. Perhaps a bet
ter name for these modules would be RFC1521:: instead
of MIME::, but we're a little beyond that stage now. - In my mind's eye, I see an abstract class, call it
MIME::Attrs, which does what Achim suggests... so you
could say:
my $attrs = new MIME::Attrs type => "text/plain",charset => ...,
disposition => ...,... ;- We could even make it a superclass of MIME::Head: that
way, MIME::Head would have to implement its interface,
and allow itself to be initiallized from a MIME::Attrs
object. - However, when you read RFC-1521, you begin to see how
much MIME information is organized by its presence in
particular fields. I imagine that we'd begin to mir
ror the structure of RFC-1521 fields and subfields to
such a degree that this might not give us a tremendous
gain over just having MIME::Head. - Why all this "occurence" and "index" jazz? Isn't every
field unique? - Aaaaaaaaaahh....no.
- Looking at a typical mail message header, it is
sooooooo tempting to just store the fields as a hash
of strings, one string per hash entry. Unfortunately,
there's the little matter of the "Received:" field,
which (unlike "From:", "To:", etc.) will often have
multiple occurences; e.g.:
Received: from gsfc.nasa.gov by eryq.pr.mcs.netwith smtp(Linux Smail3.1.28.1 #5) id m0tStZ7-0007X4C;Thu, 21 Dec 95 16:34 CSTReceived: from rhine.gsfc.nasa.gov by gsfc.nasa.gov(5.65/Ultrix3.0-C) id AA13596;
Thu, 21 Dec 95 17:20:38 -0500Received: (from eryq@localhost) by rhine.gsfc.nasa.gov(8.6.12/8.6.12) id RAA28069;
Thu, 21 Dec 1995 17:27:54 -0500Date: Thu, 21 Dec 1995 17:27:54 -0500
From: Eryq <eryq@rhine.gsfc.nasa.gov>
Message-Id: <199512212227.RAA28069@rhine.gsfc.nasa.gov>
To: eryq@eryq.pr.mcs.net
Subject: Stuff and thingsThe "Received:" field is used for tracing message
routes, and although it's not generally used for any
thing other than human debugging, I didn't want to
inconvenience anyone who actually wanted to get at
that information.I also didn't want to make this a special case; after
all, who knows what other fields could have multiple
occurences in the future? So, clearly, multiple
entries had to somehow be stored multiple times... and
the different occurences had to be retrievable.
AUTHOR
Eryq (eryq@zeegee.com), ZeeGee Software Inc
(http://www.zeegee.com).
All rights reserved. This program is free software; you
can redistribute it and/or modify it under the same terms
as Perl itself.
The more-comprehensive filename extraction is courtesy of
Lee E. Brotzman, Advanced Data Solutions.
VERSION
- $Revision: 5.403 $ $Date: 2000/11/04 19:54:46 $