JavaScript: generate a result array from XPATH

This JavaScript function takes an XPATH string and returns an Array of the matching nodes. This is very useful if you have to manipulate or use some items on a page that have no id (for example in bots).

This is the function:

function matches2array(XPATH) {
  // collect every node matched by the XPATH expression
  var links = [];
  var xPathRes = document.evaluate(XPATH, document, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
  // iterateNext() returns null once all matches are consumed
  var actualNode = xPathRes.iterateNext();
  while (actualNode) {
    links.push(actualNode);
    actualNode = xPathRes.iterateNext();
  }
  return links;
}
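
The iterator result used above becomes invalid if the page is modified while you loop over it. As a minimal sketch, assuming a snapshot of the matches is good enough for your use case, the same idea can be written with a snapshot result, which stays stable. The name matches2arraySnapshot and the optional doc parameter are my own illustration and not part of the function above.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical variant of matches2array that reads from a snapshot.
// doc is optional; in a browser it falls back to the global document.
function matches2arraySnapshot(XPATH, doc) {
  doc = doc || document;
  var snapshot = doc.evaluate(XPATH, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

In a browser, matches2arraySnapshot("//a") should return the same nodes as matches2array("//a"), in document order.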

You can call it in the browser console:

matches2array("//a");

And we get all a elements (links) on this page in the console:

[a.skip-link.screen-reader-text #content, a scheinast.eu, a /category/c-sharp/, a /category/c/, a /category/fortran/, a /category/bash/, a /category...-css-js/, a /category/java/, a /category/perl/, a /category/php/, a /category/projects/, a /category/regex/, a /category...-tricks/, a /category/windows/, a /perl-str...nchmark/, a /regex-fi...strings/, a /how-to-o...desktop/, a /c-calculator/, a /javascri...ocation/, a /image-uploader/, a /wp-admin/, a wp-login....68d8990f, a /feed/, a /comments/feed/, a wordpress.org, a.a2a_button_facebook /, a.a2a_button_twitter /, a.a2a_button_google_plus /, a.a2a_dd.addtoany_share_save share_sav...2F%5D..., a /calculator-in-c/, a.url.fn.n /author/p...heinast/, a /category/c/, a /tag/c/, a /tag/calculator/, a.post-edit-link post.php?...ion=edit, a#cancel-comment-reply-link /calculat...#respond, a profile.php, a wp-login....68d8990f, a /new-hompage-design/, a /backtrack-how-to/, a.a2a_button_facebook /, a.a2a_button_twitter /, a.a2a_button_google_plus /, a.a2a_dd.addtoany_share_save share_sav...2F%5D..., a.screen-reader-shortcut #wp-toolbar, a.ab-item about.php, a.ab-item about.php, a.ab-item wordpress.org, a.ab-item codex.wordpress.org, a.ab-item /support/, a.ab-item requests-...feedback, a.ab-item admin.php...tics.php, a.ab-item admin.php...ine_menu, a.ab-item admin.php...tics.php, a.ab-item /wp-admin/, a.ab-item /wp-admin/, a.ab-item themes.php, a.ab-item customize...D=themes, a.ab-item customize...-in-c%2F, a.ab-item widgets.php, a.ab-item customize...=widgets, a.ab-item nav-menus.php, a.ab-item themes.ph...ckground, a.ab-item customize...nd_image, a.ab-item themes.ph...m-header, a.ab-item customize...er_image, a.ab-item edit-comments.php, a.ab-item post-new.php, a.ab-item post-new.php, a.ab-item media-new.php, a.ab-item post-new....ype=page, a.ab-item user-new.php, a.ab-item post.php?...ion=edit, a.ab-item profile.php, a.ab-item profile.php, a.ab-item profile.php, a.ab-item wp-login....68d8990f, a.screen-reader-shortcut 
wp-login....68d8990f, a.a2a_i.a2a_sss /, a.a2a_i.a2a_sss /, a.a2a_i.a2a_sss /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a, a.a2a_i.a2a_sss /, a.a2a_i.a2a_sss /, a.a2a_i.a2a_sss /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a.a2a_i /, a#a2apage_any_email.a2a_i.a2a_emailer /, a#a2apage_email_client.a2a_i.a2a_emailer.a2a_email_client /, a#a2apage_show_more_less.a2a_menu_show_more_less.a2a_more]

Store the result in an Array and then write it to the page:

var links = matches2array("//a");

// an a element converts to its href when concatenated with a string
for (var i = 0; i < links.length; i++) {
  document.write(links[i] + "\n");
}

And we get this result:

http://scheinast.eu/calculator-in-c/#content http://scheinast.eu/ http://scheinast.eu/category/c-sharp/ http://scheinast.eu/category/c/ http://scheinast.eu/category/fortran/ http://scheinast.eu/category/bash/ http://scheinast.eu/category/html-css-js/ http://scheinast.eu/category/java/ http://scheinast.eu/category/perl/ http://scheinast.eu/category/php/ http://scheinast.eu/category/projects/ http://scheinast.eu/category/regex/ http://scheinast.eu/category/tips-and-tricks/ http://scheinast.eu/category/windows/ http://scheinast.eu/perl-strict-benchmark/ http://scheinast.eu/regex-find-interpolate-strings/ http://scheinast.eu/how-to-open-remote-desktop/ http://scheinast.eu/c-calculator/ http://scheinast.eu/javascript-location/ http://scheinast.eu/image-uploader/ http://scheinast.eu/wp-admin/ http://scheinast.eu/wp-login.php?action=logout&_wpnonce=a768d8990f http://scheinast.eu/feed/ http://scheinast.eu/comments/feed/ https://wordpress.org/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ https://www.addtoany.com/share_save#url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&title=Calculator%20in%20C%2B%2B&description=This%20is%20my%20calculator%20in%20C%2B%2B%20with%20in-line%20assembler%3A%20%5Bcrayon-556e97ccb93bc417229824%2F%5D%20Compile%20the%20code%3A%20%5Bcrayon-556e97ccb93e1826855153%2F%5D%20output%3A%20%5Bcrayon-556e97ccb93f8566053665%2F%5D... 
http://scheinast.eu/calculator-in-c/ http://scheinast.eu/author/paul-scheinast/ http://scheinast.eu/category/c/ http://scheinast.eu/tag/c/ http://scheinast.eu/tag/calculator/ http://scheinast.eu/wp-admin/post.php?post=1542&action=edit http://scheinast.eu/calculator-in-c/#respond http://scheinast.eu/wp-admin/profile.php http://scheinast.eu/wp-login.php?action=logout&redirect_to=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&_wpnonce=a768d8990f http://scheinast.eu/new-hompage-design/ http://scheinast.eu/backtrack-how-to/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ https://www.addtoany.com/share_save#url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&title=Calculator%20in%20C%2B%2B%20%7C%20Paul%20Scheinast&description=This%20is%20my%20calculator%20in%20C%2B%2B%20with%20in-line%20assembler%3A%20%5Bcrayon-556e97ccb93bc417229824%2F%5D%20Compile%20the%20code%3A%20%5Bcrayon-556e97ccb93e1826855153%2F%5D%20output%3A%20%5Bcrayon-556e97ccb93f8566053665%2F%5D... http://scheinast.eu/calculator-in-c/#wp-toolbar http://scheinast.eu/wp-admin/about.php http://scheinast.eu/wp-admin/about.php https://wordpress.org/ https://codex.wordpress.org/ https://wordpress.org/support/ https://wordpress.org/support/forum/requests-and-feedback http://scheinast.eu/wp-admin/admin.php?page=wp-statistics/wp-statistics.php http://scheinast.eu/wp-admin/admin.php?page=wps_online_menu http://scheinast.eu/wp-admin/admin.php?page=wp-statistics/wp-statistics.php http://scheinast.eu/wp-admin/ http://scheinast.eu/wp-admin/ http://scheinast.eu/wp-admin/themes.php http://scheinast.eu/wp-admin/customize.php?url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&autofocus%5Bsection%5D=themes http://scheinast.eu/wp-admin/customize.php?url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F http://scheinast.eu/wp-admin/widgets.php http://scheinast.eu/wp-admin/customize.php?url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&autofocus%5Bpanel%5D=widgets http://scheinast.eu/wp-admin/nav-menus.php 
http://scheinast.eu/wp-admin/themes.php?page=custom-background http://scheinast.eu/wp-admin/customize.php?url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&autofocus%5Bcontrol%5D=background_image http://scheinast.eu/wp-admin/themes.php?page=custom-header http://scheinast.eu/wp-admin/customize.php?url=http%3A%2F%2Fscheinast.eu%2Fcalculator-in-c%2F&autofocus%5Bcontrol%5D=header_image http://scheinast.eu/wp-admin/edit-comments.php http://scheinast.eu/wp-admin/post-new.php http://scheinast.eu/wp-admin/post-new.php http://scheinast.eu/wp-admin/media-new.php http://scheinast.eu/wp-admin/post-new.php?post_type=page http://scheinast.eu/wp-admin/user-new.php http://scheinast.eu/wp-admin/post.php?post=1542&action=edit http://scheinast.eu/wp-admin/profile.php http://scheinast.eu/wp-admin/profile.php http://scheinast.eu/wp-admin/profile.php http://scheinast.eu/wp-login.php?action=logout&_wpnonce=a768d8990f http://scheinast.eu/wp-login.php?action=logout&_wpnonce=a768d8990f http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ 
http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ https://www.addtoany.com/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/ http://scheinast.eu/calculator-in-c/
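
The result above contains many duplicates, because several links point to the same URL. Here is a small sketch of how the Array could be reduced to unique URLs before writing it out. uniqueHrefs is a hypothetical helper name, and it only assumes that each item has an href property, as a elements do.

```javascript
// Hypothetical helper: collect each href once, keeping first-seen order.
function uniqueHrefs(nodes) {
  var seen = new Set();
  var result = [];
  for (var i = 0; i < nodes.length; i++) {
    var href = nodes[i].href;
    // skip items without an href and URLs we already have
    if (href && !seen.has(href)) {
      seen.add(href);
      result.push(href);
    }
  }
  return result;
}
```

In the browser you could then call uniqueHrefs(matches2array("//a")) and write that shorter Array to the page instead.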

XPATH

XPATH (XML Path Language) is a query language from the W3C. It is used to select nodes in an XML document.

Full Path

/html/body/div

In this example I start with “/”, which means we start at the root, and then I navigate from element to element.

 

Anywhere

//a

In this example I use “//” to search the whole document for all a nodes. (You can also use “//” after a “/” path, like “/html/body/div//a”, to find all a elements anywhere inside html body div.)

 

Attributes

//div[@class="secondary"]

You can use “@” to test an attribute: this finds all div elements whose class attribute is exactly the string “secondary”.

 

For example, you can select all links whose href attribute contains a string:

//a[contains(@href,"scheinast")]

Or select all links whose link text contains a string:

//a[contains(text(),"Scheinast")]
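
XPATH 1.0 string literals have no escape character, so a search string that itself contains quotes needs some care when you build the expression in JavaScript. The following is only a sketch of one way to handle that; xpathLiteral and linkTextContains are hypothetical helper names of mine.

```javascript
// Hypothetical helper: wrap text as an XPATH 1.0 string literal.
function xpathLiteral(text) {
  // no double quote inside: plain "..." works
  if (text.indexOf('"') === -1) {
    return '"' + text + '"';
  }
  // no single quote inside: plain '...' works
  if (text.indexOf("'") === -1) {
    return "'" + text + "'";
  }
  // both kinds of quotes: glue the pieces together with concat()
  var parts = text.split('"').map(function (part) {
    return '"' + part + '"';
  });
  return "concat(" + parts.join(", '\"', ") + ")";
}

// Hypothetical helper: build the link-text query shown above.
function linkTextContains(text) {
  return "//a[contains(text()," + xpathLiteral(text) + ")]";
}
```

For a plain word like Scheinast this produces exactly the expression shown above; the concat() branch is only needed when the text holds both kinds of quotes.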

 

Array

//a[1]

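An XPATH can match many elements, and a number in square brackets picks one by position, counting from 1. Note that “//a[1]” selects every a element that is the first a child of its parent; to get only the first match in the whole document, write “(//a)[1]”. You should avoid positional selection where you can. It is better to navigate over the class or ID, because if the page changes a bit the XPATH doesn’t work any more.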

 

Those are some basics. If you need something more specific, search the web or look on my page, for example at web crawling in PHP or Perl.

Greasemonkey: open a URL with JavaScript

I use this JavaScript and XPATH with Greasemonkey to open URLs on Piratebay or other pages. If you open a torrent in a new tab, the script searches for URLs containing “imagecurl.org”, “imgcrl.org” or “pixoverflow”, then waits 10 seconds and opens the URL. This saves time: just open everything that’s interesting in a new tab, then just look at the images:

var XPATH = ('//div[@class="nfo"]/pre/a[' +
             'contains(@href,"imagecurl.org") or ' +
             'contains(@href,"imgcrl.org") or ' +
             'contains(@href,"pixoverflow") ]');
//alert(XPATH);
// 6 is XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE; snapshotItem(0) is null if nothing matched
var link = document.evaluate(XPATH, document.body, null, 6, null).snapshotItem(0);
if (link) {
    // wait 10 seconds, then open the link in the same tab
    setTimeout(function () { window.open(link.href, "_self"); }, 10000);
}
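
The XPATH string above grows by one contains(...) line per host. As a sketch, the same expression could be generated from an Array of URL fragments. buildHrefXPath is a hypothetical helper of mine, and it assumes that the fragments contain no double quotes.

```javascript
// Hypothetical helper: append an "href contains any of these" filter to a path.
function buildHrefXPath(basePath, fragments) {
  // without fragments there is nothing to filter on
  if (fragments.length === 0) {
    return basePath;
  }
  var tests = fragments.map(function (fragment) {
    return 'contains(@href,"' + fragment + '")';
  });
  return basePath + "[" + tests.join(" or ") + "]";
}

var XPATH = buildHrefXPath('//div[@class="nfo"]/pre/a',
                           ["imagecurl.org", "imgcrl.org", "pixoverflow"]);
```

Adding another image host then only means adding one more string to the Array.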

Perl crawl example

This is a little example of how to crawl a web page with Perl. In this example I crawl my own page with XPATH to get all titles.

#!/usr/bin/perl
use strict;
use HTML::TreeBuilder::XPath;

#get page
my $tree = HTML::TreeBuilder::XPath->new_from_url('/');
#parse
my $node = $tree->findnodes_as_string('//h1/a[@rel="bookmark"]');
#remove html
$node =~ s/<.+?>//go;
#split into array
my @nodes = split("\n",$node);
#print result
for(my $i = 0;$i<=$#nodes;++$i){
	print "[$i] => $nodes[$i]$/";
}
#delete obj
$tree->delete;

This is our result; these are the matches from the XPATH:

[0] => Perl simple template funktion
[1] => Decode Funktion in Perl
[2] => Perl simple code formater
[3] => create multiple file in shell with touch
[4] => Format current Date in Perl and Bash
[5] => Perl Print Benchmark
[6] => Perl simple Data Dumper for references
[7] => Arduino C Program LED Blink
[8] => Perl Constant Benchmark
[9] => Benchmark Debug perl

To see this in PHP click: PHP crawl example.

PHP crawl example

This is a little example of how to crawl a web page with PHP. In this example I crawl my own page with XPATH to get all titles.

<?php
# turn libxml errors off
libxml_use_internal_errors(true);
# create DOMDocument object
$DOM = new DOMDocument;
# load the HTML
$DOM->loadHTMLFile("/");
# create DOMXPath object
$XPATH = new DOMXPath($DOM);
# get all titles with XPATH
$mynodes = $XPATH->query("//article//header//h1");
# print result
foreach ($mynodes as $i => $node) {
    echo "[$i] => '", $node->nodeValue, "'\n";
}
?>

This is our result; these are the matches from the XPATH:

[0] => 'create multiple file in shell with touch'
[1] => 'Format current Date in Perl and Bash'
[2] => 'Perl Print Benchmark'
[3] => 'Perl simple Data Dumper for references'
[4] => 'Arduino C Program LED Blink'
[5] => 'Perl Constant Benchmark'
[6] => 'Benchmark Debug perl'
[7] => 'Programmier-Einleitung in C/C++'
[8] => 'perl debug output'
[9] => 'crontab via script'

To see this in Perl click: Perl crawl example.

JavaScript XPATH and Sleep

If you need to select an element via JavaScript and XPATH you can use this code. In line 4 I wait 10 seconds with setTimeout and then open the link in the same window.

var XPATH = ('//div[@class="nfo"]/pre/a[contains(@href,"imagecurl.org")]');
//alert(XPATH);
var link = document.evaluate(XPATH, document.body, null, 6, null).snapshotItem(0);
setTimeout(function () { if (link) { window.open(link.href, "_self"); } }, 10000);
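
JavaScript has no blocking sleep; setTimeout only schedules a function for later. If you prefer code that reads from top to bottom, a Promise-based helper is one option. This is a sketch: sleep and openLater are hypothetical names, and the open callback is passed in so the example does not depend on window.

```javascript
// Resolve after ms milliseconds without blocking the page.
function sleep(ms) {
  return new Promise(function (resolve) {
    setTimeout(resolve, ms);
  });
}

// Wait for ms milliseconds, then hand the URL to the given callback.
async function openLater(url, ms, open) {
  await sleep(ms);
  open(url);
}
```

In a userscript this could be called as openLater(link.href, 10000, function (url) { window.open(url, "_self"); }), which waits the same 10 seconds as the code above.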