Perl script using LWP module

Library FOR WWW in Perl (LWP)

In Linux you can istall all the perl modules about the web (LWP, URI, URL, HTTP…) at once:

:~$ sudo apt-get install libwww-perl

LWP is the most used Perl module for accessing data on the web.

LWP::Simple – module to get document by http
its functions don’t support cookies or authorization, setting header lines in the HTTP request; and generally, they don’t support reading header lines in the HTTP response (most notably the full HTTP error message, in case of an error). To get at all those features, you’ll have to use the LWP::UserAgent;

LWP::UserAgent is a class for “virtual browsers,” which you use for performing requests, and HTTP::Response is a class for the responses (or error messages) that you get back from those requests.

There are two objects involved: $browser, which holds an object of the class LWP::UserAgent, and then the $response object, which is of the class HTTP::Response. You really need only one browser object per program; but every time you make a request, you get back ar esponse object, which will have some interesting attributes:

$response->is_success : A HTTP status line, indicating success or failure  (like “404 Not Found”).

$response->content_type A MIME content-type like “text/html”, “image/gif”, “application/xml”, and so on, which you can see with

$response->content : the actual content of the response. If the response is HTML, that’s where the HTML source will be; if it’s a GIF, then $response->content will be the binary GIF data.

Enabling Cookies

A default LWP::UserAgent object acts like a browser with its cookies support turned off.
You can even activate cookies, with the following function:


with “cookie_jar” you can get and save the cookies from Browsers.

The following script gets a url from the shell and print the content of the corresponding web page both to screen and a new file called “code.html” (created by running the script).


#!/usr/bin/perl -w
use LWP::UserAgent;

#browser = instance of the UserAgent class
my $browser = LWP::UserAgent->new;
my $url =$ARGV[0]; # passing the url by command line
my $response = $browser->get($url);

die "Can’t get $url \n", $response->status_line
unless $response->is_success;

# check if the content is html
die "Hey, I was expecting HTML, not ", $response->content_type
unless $response->content_type eq 'text/html';

print "Page content: \n";

#print content to console
print $response->decoded_content;

#print content to a NEW file
open (MYPAGE, '>>code.html');
print MYPAGE $response->decoded_content;
close (MYPAGE);

#REGULAR EXPRESSION: search for a string in the content
if($response->content =~ m/perl/i) {
print " \n \n This page is about Perl!\n \n";
} else {print "\n \n No content about Perl! \n \n"; }