Perl crawl example

This is a small example of how to crawl a web page with Perl. In this example I crawl my own page and use XPath to get all the titles.

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# fetch the page and build the tree (new_from_url needs LWP::UserAgent)
my $tree = HTML::TreeBuilder::XPath->new_from_url('/');
# find all matching title links, serialized as one string
my $node = $tree->findnodes_as_string('//h1/a[@rel="bookmark"]');
# strip the HTML tags
$node =~ s/<.+?>//g;
# split into an array, one title per line
my @nodes = split /\n/, $node;
# print the result
for my $i (0 .. $#nodes) {
	print "[$i] => $nodes[$i]\n";
}
# free the tree
$tree->delete;

This is our result; these are the matches from the XPath:

[0] => Perl simple template funktion
[1] => Decode Funktion in Perl
[2] => Perl simple code formater
[3] => create multiple file in shell with touch
[4] => Format current Date in Perl and Bash
[5] => Perl Print Benchmark
[6] => Perl simple Data Dumper for references
[7] => Arduino C Program LED Blink
[8] => Perl Constant Benchmark
[9] => Benchmark Debug perl
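
A possible variation, not the method used above: instead of serializing the nodes and stripping the tags with a regex, findvalues should return the text of each matching node as a list directly. The sketch below runs offline with new_from_content; the HTML snippet and its titles are made up for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# hypothetical sample markup standing in for a fetched page
my $html = <<'HTML';
<html><body>
<h1><a rel="bookmark" href="/a">First post</a></h1>
<h1><a rel="bookmark" href="/b">Second post</a></h1>
<h1><a href="/c">Not a bookmark</a></h1>
</body></html>
HTML

# build the tree from a string instead of a URL
my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
# one text value per matching node, no tag stripping needed
my @titles = $tree->findvalues('//h1/a[@rel="bookmark"]');
# print the result in the same format as above
for my $i (0 .. $#titles) {
    print "[$i] => $titles[$i]\n";
}
# free the tree
$tree->delete;
```

The third link has no rel="bookmark" attribute, so the XPath should skip it.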

To see this in PHP click: PHP crawl example.
