lwp::useragent(3)
NAME
LWP::UserAgent - A WWW UserAgent class
SYNOPSIS
require LWP::UserAgent;
my $ua = LWP::UserAgent->new(env_proxy  => 1,
                             keep_alive => 1,
                             timeout    => 30,
                            );
my $response = $ua->get('http://search.cpan.org/');
# or:
my $request = HTTP::Request->new('GET',
                                 'http://search.cpan.org/');
# and then one of these:
$response = $ua->request($request); # or
$response = $ua->request($request, '/tmp/sss'); # or
$response = $ua->request($request, \&callback, 4096);
sub callback { my($data, $response, $protocol) = @_;
    # ... process this chunk of $data ...
}
DESCRIPTION
"LWP::UserAgent" is a class implementing a World-Wide
Web user agent in Perl. It brings together the
HTTP::Request, HTTP::Response and the LWP::Protocol
classes that form the rest of the core of the libwww-perl
library. For simple uses this class can be used directly
to dispatch WWW requests; alternatively it can be
subclassed for application-specific behaviour.
In normal use the application creates an "LWP::UserAgent"
object, and then configures it with values for timeouts,
proxies, name, etc. It then creates an instance of
"HTTP::Request" for the request that needs to be
performed. This request is then passed to one of the
UserAgent's request() methods, which dispatches it using the
relevant protocol, and returns an "HTTP::Response" object.
There are convenience methods for sending the most common
request types: get(), head() and post().
The basic approach of the library is to use HTTP-style
communication for all protocol schemes, i.e. you even
receive an "HTTP::Response" object for gopher or ftp
requests. In order to achieve even more similarity to
HTTP-style communications, gopher menus and file
directories are converted to HTML documents.
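A minimal sketch of that workflow, assuming libwww-perl (with its URI and HTTP::Request dependencies) is installed. To avoid needing a network connection it fetches a temporary local file through a file: URL built with URI::file; the file name and the 'MyApp/0.1 ' agent token are made up for the example. The point to notice is that the file: request still comes back as an "HTTP::Response" carrying an HTTP status code.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use HTTP::Request;
use LWP::UserAgent;
use URI::file;

# Write a small local file so the example needs no network access.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'hello.txt');
open(my $fh, '>', $path) or die "Can't write $path: $!";
print {$fh} "Hello, world\n";
close($fh) or die "Can't close $path: $!";

# 1. Create and configure the user agent.
my $ua = LWP::UserAgent->new(timeout => 10);
$ua->agent('MyApp/0.1 ');    # hypothetical token; the libwww-perl token is appended

# 2. Describe the request; URI::file turns the path into a file: URL.
my $request = HTTP::Request->new(GET => URI::file->new_abs($path));

# 3. Dispatch it. Even for a file: URL the result is an HTTP::Response.
my $response = $ua->request($request);

print ref($response), ' ', $response->status_line, "\n";
print $response->content;
```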
The send_request(), simple_request() and request() methods
can process the content of the response in one of three
ways: in core, into a file, or into repeated calls to a
subroutine. You choose which one by the kind of value
passed as the second argument.
The in core variant simply stores the content in a scalar
'content' attribute of the response object and is suitable
for small HTML replies that might need further parsing.
This variant is used if the second argument is missing (or
is undef).
The filename variant requires a scalar containing a
filename as the second argument to the request method and
is suitable for large WWW objects which need to be written
directly to the file without requiring large amounts of
memory. In this case the response object returned from the
request method will have an empty content attribute. If
the request fails, then the content might not be empty,
and the file will be untouched.
The subroutine variant requires a reference to a callback
routine as the second argument to the request method and
it can also take an optional chunk size as the third
argument. This variant can be used to construct "pipelined"
processing, where processing of received chunks can begin
before the complete data has arrived. The callback
function is called with 3 arguments: the data received this
time, a reference to the response object and a reference
to the protocol object. The response object returned from
the request method will have empty content. If the
request fails, then the callback routine is not
called, and the response->content might not be empty.
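The three content-handling variants can be compared side by side in a sketch like the following. It assumes libwww-perl is installed and again uses a temporary local file reached through a file: URL so that no network access is needed; the file names and the 4096-byte size hint are illustrative choices, not requirements. The last request anticipates the die() behaviour described in the next paragraph; its 1000-byte threshold is arbitrary.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use HTTP::Request;
use LWP::UserAgent;
use URI::file;

# A 10,000-byte local file stands in for a remote document.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'source.txt');
open(my $fh, '>', $src) or die "Can't write $src: $!";
print {$fh} 'x' x 10_000;
close($fh) or die "Can't close $src: $!";

my $ua  = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => URI::file->new_abs($src));

# 1. In core: no second argument, the body ends up in $in_core->content.
my $in_core = $ua->request($req);

# 2. Filename: the body is written to $dst; the content attribute stays empty.
my $dst     = File::Spec->catfile($dir, 'copy.txt');
my $to_file = $ua->request($req, $dst);

# 3. Callback: the body arrives in pieces; 4096 is only a chunk-size hint.
my ($received, $chunks) = ('', 0);
my $via_cb = $ua->request($req, sub {
    my ($data, $response, $protocol) = @_;
    $received .= $data;
    $chunks++;
}, 4096);

# Aborting: die() inside the callback stops the transfer. The trailing
# "\n" keeps Perl from appending "at FILE line N." to the message.
my $seen    = 0;
my $aborted = $ua->request($req, sub {
    $seen += length $_[0];
    die "enough data\n" if $seen >= 1000;
}, 1000);

printf "in core:  %d bytes in content\n", length($in_core->content);
printf "to file:  %d bytes on disk, %d in content\n", (-s $dst), length($to_file->content);
printf "callback: %d bytes in %d chunk(s), %d in content\n",
    length($received), $chunks, length($via_cb->content);
printf "aborted after %d bytes, X-Died: %s\n", $seen, $aborted->header('X-Died');
```

Reusing one request object for all four calls is only a convenience here; nothing in the file: case depends on it.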
The request can be aborted by calling die() in the
callback routine. The die message will be available as the
"X-Died" special response header field.
The library also allows you to use a subroutine reference
as content in the request object. This subroutine should
return the content (possibly in pieces) when called. It
should return an empty string when there is no more
content.
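The producer contract can be sketched without a server. Here the pieces are held in a hypothetical in-memory list, the URL is a placeholder that is never contacted, and the loop at the end merely imitates how the library pulls the body, calling the subroutine until it returns an empty string. Setting Content-Length explicitly is our own choice, made because the total size happens to be known in advance.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical payload split into pieces; a real producer might read
# from a file or a database cursor instead.
my @pieces = ("first chunk\n", "second chunk\n", "last chunk\n");
my $total  = 0;
$total += length $_ for @pieces;

my @queue    = @pieces;
my $producer = sub {
    # Hand out the next piece, or '' once everything has been returned.
    return @queue ? shift @queue : '';
};

# http://example.com/upload is a placeholder; nothing is sent here.
my $req = HTTP::Request->new(
    PUT => 'http://example.com/upload',
    [ 'Content-Length' => $total ],
    $producer,
);

# Imitate the library: call the sub until it returns an empty string.
my $body    = '';
my $content = $req->content;    # the code reference itself
while (length(my $piece = $content->())) {
    $body .= $piece;
}
print length($body), " bytes, Content-Length ", $req->header('Content-Length'), "\n";
```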
METHODS
The following methods are available:
- $ua = LWP::UserAgent->new( %options );
- This class method constructs a new "LWP::UserAgent"
object and returns a reference to it.
- Key/value pair arguments may be provided to set up the
initial state of the user agent. The following
options correspond to attribute methods described
below:
KEY                     DEFAULT
---------------------   -------------------
agent                   "libwww-perl/#.##"
from                    undef
timeout                 180
use_eval                1
parse_head              1
max_size                undef
cookie_jar              undef
conn_cache              undef
protocols_allowed       undef
protocols_forbidden     undef
requests_redirectable   ['GET', 'HEAD']
- The following options are also accepted: If the
"env_proxy" option is passed in and has a TRUE value,
then proxy settings are read from environment
variables. If the "keep_alive" option is passed in, then
a "LWP::ConnCache" is set up (see the conn_cache() method
below). The keep_alive value is a number and is
passed on as the total_capacity for the connection
cache. The "keep_alive" option also has the effect of
loading and enabling the new experimental HTTP/1.1
protocol module.
- $ua->send_request($request, $arg [, $size])
- This method dispatches a single WWW request on behalf
of a user, and returns the response received. The
request is sent off unmodified, without passing it
through "prepare_request()".
- The $request should be a reference to an
"HTTP::Request" object with values defined for at
least the method() and uri() attributes.
- If $arg is a scalar it is taken as a filename where
the content of the response is stored.
- If $arg is a reference to a subroutine, then this
routine is called as chunks of the content are received.
An optional $size argument is taken as a hint for an
appropriate chunk size.
- If $arg is omitted, then the content is stored in the
response object itself.
- $ua->prepare_request($request)
- This method modifies the given "HTTP::Request" object by
setting up various headers based on the attributes of
the $ua. The headers affected are: "User-Agent",
"From", "Range" and "Cookie".
- The return value is the $request object passed in.
- $ua->simple_request($request, [$arg [, $size]])
- This method dispatches a single WWW request on behalf
of a user, and returns the response received. It
differs from "send_request()" by automatically calling
the "prepare_request()" method before the request is
sent.
- The arguments are the same as for "send_request()".
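The effect of prepare_request(), and the difference it makes between send_request() and simple_request(), can be seen on the request objects themselves. This sketch assumes libwww-perl is installed, uses a temporary file reached through a file: URL so nothing goes over the network, and uses a made-up 'Demo/1.0' agent token and placeholder e-mail address. Only the request that went through prepare_request(), directly or via simple_request(), picks up the "User-Agent" header.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use HTTP::Request;
use LWP::UserAgent;
use URI::file;

my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'page.txt');
open(my $fh, '>', $path) or die "Can't write $path: $!";
print {$fh} "some text\n";
close($fh) or die "Can't close $path: $!";
my $url = URI::file->new_abs($path);

my $ua = LWP::UserAgent->new;
$ua->agent('Demo/1.0');                 # hypothetical product token
$ua->from('webmaster@example.com');     # placeholder address

# prepare_request() fills in headers from the agent's attributes and
# returns the very object it was given.
my $req      = HTTP::Request->new(GET => $url);
my $returned = $ua->prepare_request($req);

# send_request() leaves its request exactly as it was built ...
my $raw = HTTP::Request->new(GET => $url);
my $r1  = $ua->send_request($raw);

# ... while simple_request() runs prepare_request() on it first.
my $prepared = HTTP::Request->new(GET => $url);
my $r2       = $ua->simple_request($prepared);

printf "prepare_request: User-Agent=%s From=%s\n",
    $req->header('User-Agent'), $req->header('From');
printf "send_request:    User-Agent=%s\n", $raw->header('User-Agent') // '(none)';
printf "simple_request:  User-Agent=%s\n", $prepared->header('User-Agent') // '(none)';
```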
- $ua->request($request, $arg [, $size])
- Process a request, including redirects and
authentication. This method may actually send several
different simple requests.
- The arguments are the same as for "send_request()" and
"simple_request()".
- $ua->get($url, Header => Value,...);
- This is a shortcut for
"$ua->request(HTTP::Request::Common::GET( $url, Header
=> Value,... ))". See HTTP::Request::Common.
- $ua->post($url, formref, Header => Value,...);
- This is a shortcut for "$ua->request(
HTTP::Request::Common::POST( $url, formref, Header
=> Value,... ))". Note that the form reference is
optional, and can be either a hashref ("\%formdata" or
"{ 'key1' => 'val1', 'key2' => 'val2', ... }") or an
arrayref ("\@formdata" or "['key1' => 'val1', 'key2'
=> 'val2', ...]"). See HTTP::Request::Common.
- $ua->head($url, Header => Value,...);
- This is a shortcut for "$ua->request(
HTTP::Request::Common::HEAD( $url, Header => Value,...
))". See HTTP::Request::Common.
- $ua->put($url, Header => Value,...);
- This is a shortcut for "$ua->request(
HTTP::Request::Common::PUT( $url, Header => Value,...
))". See HTTP::Request::Common.
- $ua->protocols_allowed( ); # to read
$ua->protocols_allowed( \@protocols ); # to set
- This reads (or sets) this user agent's list of
protocols that "$ua->request" and "$ua->simple_request"
will exclusively allow.
- For example: "$ua->protocols_allowed( [ 'http',
'https'] );" means that this user agent will allow
only those protocols, and attempts to use this user
agent to access URLs with any other schemes (like
"ftp://...") will result in a 500 error.
- To delete the list, call:
"$ua->protocols_allowed(undef)"
- By default, an object has neither a protocols_allowed
list, nor a protocols_forbidden list.
- Note that having a protocols_allowed list causes any
protocols_forbidden list to be ignored.
- $ua->protocols_forbidden( ); # to read
$ua->protocols_forbidden( \@protocols ); # to set
- This reads (or sets) this user agent's list of
protocols that "$ua->request" and "$ua->simple_request"
will not allow.
- For example: "$ua->protocols_forbidden( [ 'file',
'mailto'] );" means that this user agent will not
allow those protocols, and attempts to use this user
agent to access URLs with those schemes will result in
a 500 error.
- To delete the list, call:
"$ua->protocols_forbidden(undef)"
- $ua->is_protocol_supported($scheme)
- You can use this method to test whether this user
agent object supports the specified "scheme". (The
"scheme" might be a string (like 'http' or 'ftp') or
it might be a URI object reference.)
- Whether a scheme is supported is determined by $ua's
protocols_allowed or protocols_forbidden lists (if
any), and by the capabilities of LWP. I.e., this will
return TRUE only if LWP supports this protocol and
it's permitted for this particular object.
- $ua->requests_redirectable( ); # to read
$ua->requests_redirectable( \@requests ); # to set
- This reads or sets the object's list of request names
that "$ua->redirect_ok(...)" will allow redirection
for. By default, this is "['GET', 'HEAD']", as per
RFC 2068. To change to include 'POST', consider:
push @{ $ua->requests_redirectable }, 'POST';
- $ua->redirect_ok($prospective_request)
- This method is called by request() before it tries to
follow a redirection to the request in
$prospective_request. This should return a true value
if this redirection is permissible.
- The default implementation will return FALSE unless
the method is in the object's "requests_redirectable"
list, FALSE if the proposed redirection is to a
"file://..." URL, and TRUE otherwise.
- Subclasses might want to override this.
- (This method's behavior in previous versions was
simply to return TRUE for anything except POST requests.)
- $ua->credentials($netloc, $realm, $uname, $pass)
- Set the user name and password to be used for a realm.
It is often more useful to specialize the
get_basic_credentials() method instead.
- $ua->get_basic_credentials($realm, $uri, [$proxy])
- This is called by request() to retrieve credentials for
a realm protected by Basic Authentication or
Digest Authentication.
- Should return username and password in a list. Return
undef to abort the authentication resolution attempts.
- This implementation simply checks a set of pre-stored
member variables. Subclasses can override this method
to e.g. ask the user for a username/password. An
example of this can be found in the "lwp-request" program
distributed with this library.
- $ua->agent([$product_id])
- Get/set the product token that is used to identify the
user agent on the network. The agent value is sent as
the "User-Agent" header in the requests. The default
is the string returned by the _agent() method (see below).
- If the $product_id ends with a space then the "_agent"
string is appended to it.
- The user agent string should be one or more simple
product identifiers with an optional version number
separated by the "/" character. Examples are:
$ua->agent('Checkbot/0.4 ' . $ua->_agent);
$ua->agent('Checkbot/0.4 '); # same as above
$ua->agent('Mozilla/5.0');
$ua->agent(""); # don't identify
- $ua->_agent
- Returns the default agent identifier. This is a
string of the form "libwww-perl/#.##", where "#.##" is
substituted with the version number of this library.
- $ua->from([$email_address])
- Get/set the Internet e-mail address for the human user
who controls the requesting user agent. The address
should be machine-usable, as defined in RFC 822. The
from value is sent as the "From" header in the
requests. Example:
$ua->from('gaas@cpan.org');
- The default is to not send a "From" header.
- $ua->timeout([$secs])
- Get/set the timeout value in seconds. The default
timeout() value is 180 seconds, i.e. 3 minutes.
- $ua->cookie_jar([$cookie_jar_obj])
- Get/set the cookie jar object to use. The only
requirement is that the cookie jar object must
implement the extract_cookies($response) and
add_cookie_header($request) methods. These methods
will then be invoked by the user agent as requests are
sent and responses are received. Normally this will
be a "HTTP::Cookies" object or some subclass.
- The default is to have no cookie_jar, i.e. never
automatically add "Cookie" headers to the requests.
- Shortcut: If a reference to a plain hash is passed in
as the $cookie_jar_obj, then it is replaced with an
instance of "HTTP::Cookies" that is initialized based
on the hash. This form also automatically loads the
"HTTP::Cookies" module. It means that:
$ua->cookie_jar({ file => "$ENV{HOME}/.cookies.txt" });
is really just a shortcut for:
require HTTP::Cookies;
$ua->cookie_jar(HTTP::Cookies->new(file => "$ENV{HOME}/.cookies.txt"));
- $ua->conn_cache([$cache_obj])
- Get/set the LWP::ConnCache object to use.
- $ua->parse_head([$boolean])
- Get/set a value indicating whether we should initialize
response headers from the <head> section of HTML
documents. The default is TRUE. Do not turn this off
unless you know what you are doing.
- $ua->max_size([$bytes])
- Get/set the size limit for response content. The
default is "undef", which means that there is no
limit. If the returned response content is only
partial, because the size limit was exceeded, then a
"Client-Aborted" header will be added to the response.
- $ua->clone;
- Returns a copy of the LWP::UserAgent object.
- $ua->mirror($url, $file)
- Get and store a document identified by a URL, using
If-Modified-Since, and checking the Content-Length.
Returns a reference to the response object.
- $ua->proxy(...)
- Set/retrieve proxy URL for a scheme:
$ua->proxy(['http', 'ftp'], 'http://proxy.sn.no:8001/');
$ua->proxy('gopher', 'http://proxy.sn.no:8001/');
- The first form specifies that the URL is to be used
for proxying of access methods listed in the list in
the first method argument, i.e. 'http' and 'ftp'.
- The second form shows a shorthand form for specifying
a proxy URL for a single access scheme.
- $ua->env_proxy()
- Load proxy settings from *_proxy environment
variables. You might specify proxies like this (sh
syntax):
gopher_proxy=http://proxy.my.place/
wais_proxy=http://proxy.my.place/
no_proxy="localhost,my.domain"
export gopher_proxy wais_proxy no_proxy
- Csh or tcsh users should use the "setenv" command to
define these environment variables.
- On systems with case-insensitive environment variables
there exists a name clash between the CGI environment
variables and the "HTTP_PROXY" environment variable
normally picked up by env_proxy(). Because of this,
"HTTP_PROXY" is not honored for CGI scripts. The
"CGI_HTTP_PROXY" environment variable can be used
instead.
- $ua->no_proxy($domain,...)
- Do not proxy requests to the given domains. Calling
no_proxy without any domains clears the list of
domains. E.g.:
$ua->no_proxy('localhost', 'no', ...);
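The constructor options and a few of the attribute methods above can be checked without any network traffic. This sketch assumes libwww-perl is installed; 'MyCrawler/0.1 ' and the numeric limits are illustrative values. Because the agent token ends with a space, the default libwww-perl token is appended, as described for agent().

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new(
    timeout    => 30,
    max_size   => 100_000,
    keep_alive => 4,    # sets up an LWP::ConnCache with total_capacity 4
);

# Trailing space: the default "libwww-perl/#.##" token is appended.
$ua->agent('MyCrawler/0.1 ');

print 'agent:    ', $ua->agent, "\n";
print 'timeout:  ', $ua->timeout, "\n";
print 'max_size: ', $ua->max_size, "\n";
print 'cache:    ', ref($ua->conn_cache),
      ' capacity ', $ua->conn_cache->total_capacity, "\n";
```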
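A sketch of the protocol restrictions, assuming libwww-perl is installed. It first installs an allow-list, then replaces it with a forbid-list, and finally shows that a refused scheme produces an error response (the 500 error mentioned above) rather than an exception. The file: path is deliberately nonexistent; it is never opened because the scheme is refused first.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Allow-list: only http and https may be used.
$ua->protocols_allowed(['http', 'https']);
my $http_ok = $ua->is_protocol_supported('http');
my $ftp_ok  = $ua->is_protocol_supported('ftp');
printf "allow-list:  http=%s ftp=%s\n",
    ($http_ok ? 'yes' : 'no'), ($ftp_ok ? 'yes' : 'no');

# Remove the allow-list and forbid two schemes instead.
$ua->protocols_allowed(undef);
$ua->protocols_forbidden(['file', 'mailto']);
printf "forbid-list: file=%s http=%s\n",
    ($ua->is_protocol_supported('file') ? 'yes' : 'no'),
    ($ua->is_protocol_supported('http') ? 'yes' : 'no');

# A refused scheme does not throw; request() returns an error response.
my $res = $ua->get('file:///no/such/file');
print $res->status_line, "\n";
```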
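Specializing get_basic_credentials() in a subclass might look like the following. GuestAgent and its guest/guest fallback are hypothetical, invented for the illustration. The credentials are registered with credentials() under a "host:port" netloc, on the assumption that the default implementation looks them up via $uri->host_port; that is worth checking against the installed version. No request is sent: the method is called directly to show what request() would receive.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Hypothetical subclass: fall back to a fixed "guest" account when no
# credentials were registered for the host/realm pair.
package GuestAgent;
use parent -norequire, 'LWP::UserAgent';

sub get_basic_credentials {
    my ($self, $realm, $uri, $proxy) = @_;
    my @creds = $self->SUPER::get_basic_credentials($realm, $uri, $proxy);
    return @creds if @creds;
    return ('guest', 'guest');
}

package main;

my $ua = GuestAgent->new;

# Register credentials for one realm; the netloc is "host:port".
$ua->credentials('intranet.example.com:80', 'Staff only', 'alice', 's3cret');

my @known   = $ua->get_basic_credentials('Staff only',
                  URI->new('http://intranet.example.com/docs/'));
my @unknown = $ua->get_basic_credentials('Other realm',
                  URI->new('http://intranet.example.com/'));
print "known realm: @known\nunknown realm: @unknown\n";
```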
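Several of the attribute methods above are typically combined when setting up an agent. In this sketch (libwww-perl and HTTP::Cookies assumed installed) the proxy host is a placeholder that is never contacted, and an empty hash is passed to cookie_jar() to get an in-memory jar through the hash-reference shortcut.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Let redirect_ok() also follow redirects of POST requests.
push @{ $ua->requests_redirectable }, 'POST';
print "redirectable: @{ $ua->requests_redirectable }\n";

# Hash-ref shortcut: an empty hash gives an in-memory HTTP::Cookies jar.
$ua->cookie_jar({});
print 'cookie jar:   ', ref($ua->cookie_jar), "\n";

# One proxy URL for two schemes (placeholder host), plus exceptions.
$ua->proxy(['http', 'ftp'], 'http://proxy.example.com:8001/');
$ua->no_proxy('localhost', 'example.org');
print 'http proxy:   ', $ua->proxy('http'), "\n";
```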
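mirror() and max_size() can both be tried against local files. This sketch assumes libwww-perl is installed and treats a temporary file, reached through a file: URL, as the "remote" document; the sizes and the one-hour backdating are arbitrary. The second mirror() call is expected to report 304 (Not Modified), because mirror() sends If-Modified-Since based on the local copy and the file: handler is assumed to honour that header; the latter is an assumption about the file: protocol module rather than something stated above.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use LWP::UserAgent;
use URI::file;

my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'remote.txt');   # stands in for a remote document
my $dst = File::Spec->catfile($dir, 'local.txt');
open(my $fh, '>', $src) or die "Can't write $src: $!";
print {$fh} 'z' x 5_000;
close($fh) or die "Can't close $src: $!";

# Backdate the source so its Last-Modified time is clearly in the past.
my $then = time - 3600;
utime($then, $then, $src) or die "Can't utime $src: $!";

my $ua  = LWP::UserAgent->new;
my $url = URI::file->new_abs($src);

# The first call copies the file; the second should find it unchanged.
my $first  = $ua->mirror($url, $dst);
my $second = $ua->mirror($url, $dst);
print 'mirror: ', $first->code, ' then ', $second->code, "\n";

# max_size: reading stops once more than 1000 bytes have arrived.
$ua->max_size(1000);
my $partial = $ua->get($url);
print 'Client-Aborted: ', $partial->header('Client-Aborted') // '(not set)', "\n";
```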
SEE ALSO
See LWP for a complete overview of libwww-perl5. See lwp-request and lwp-mirror for examples of usage.
COPYRIGHT
Copyright 1995-2001 Gisle Aas.
- This library is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.