Benchmark in Perl – replace

if you want to check whats the fastest way to replace a string in Perl you could use the Benchmark-Module.I compared 4 different ways  :

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all) ;
use Inline 'C';


my $teststring;

#check result :
$teststring = "teststring1"; $teststring =~ s/1/2/g; 				print $teststring.$/;
$teststring = "teststring1"; $teststring =~ s/1/2/go; 				print $teststring.$/;
$teststring = "teststring1"; $teststring =~ tr/1/2/; 				print $teststring.$/;
$teststring = "teststring1"; $teststring = replace($teststring,"1","2"); 	print $teststring.$/;
$teststring = "teststring1"; $teststring = repl_str($teststring,"1","2"); 	print $teststring.$/;

#test one
 cmpthese(-4, {
'regex_normal' 		=> sub {$teststring = "teststring1"; $teststring =~ s/1/2/g; },
'regex_optimised' 	=> sub {$teststring = "teststring1"; $teststring =~ s/1/2/go; },
'translate' 		=> sub {$teststring = "teststring1"; $teststring =~ tr/1/2/; },
'perl_sub' 		=> sub {$teststring = "teststring1"; $teststring = replace($teststring,"1","2"); },
'C_replace_string' 	=> sub {$teststring = "teststring1"; $teststring = repl_str($teststring,"1","2"); },
});


#test two
 cmpthese(-4, {
'regex_normal' 		=> sub {$teststring = "teststring1stringxstring"; $teststring =~ s/string/test/g; },
'regex_optimised' 	=> sub {$teststring = "teststring1stringxstring"; $teststring =~ s/string/test/go; },
'translate' 		=> sub {$teststring = "teststring1stringxstring"; $teststring =~ tr/string/test/; },
'perl_sub' 		=> sub {$teststring = "teststring1stringxstring"; $teststring = replace($teststring,"string","test"); },
'C_replace_string' 	=> sub {$teststring = "teststring1stringxstring"; $teststring = repl_str($teststring,"string","test"); },
});


sub replace {
my $string  = shift;
my $old     = shift;
my $new     = shift;
my $pos     = index($string, $old);
while ( $pos > -1 ) {
 substr( $string, $pos, length( $old ), $new );
 $pos = index( $string, $old, $pos + length( $new ));
}
return($string);
}

__END__
__C__
char* repl_str(const char *str, const char *old, const char *new){
	char *ret, *r;
	const char *p, *q;
	size_t oldlen = strlen(old);
	size_t count, retlen, newlen = strlen(new);

	if (oldlen != newlen) {
		for (count = 0, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen)
			count++;
		retlen = p - str + strlen(p) + count * (newlen - oldlen);
	} else
		retlen = strlen(str);

	if ((ret = malloc(retlen + 1)) == NULL)
		return NULL;

	for (r = ret, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen) {
		ptrdiff_t l = q - p;
		memcpy(r, p, l);
		r += l;
		memcpy(r, new, newlen);
		r += newlen;
	}
	strcpy(r, p);

	return ret;
}

the result looks like:

teststring2
teststring2
teststring2
teststring2
teststring2
                      Rate perl_sub C_replace_string regex_normal regex_optimised translate
perl_sub          632214/s       --             -68%         -68%            -70%      -91%
C_replace_string 1961177/s     210%               --          -2%             -8%      -71%
regex_normal     1999403/s     216%               2%           --             -7%      -71%
regex_optimised  2142180/s     239%               9%           7%              --      -69%
translate        6843359/s     982%             249%         242%            219%        --
                      Rate perl_sub regex_normal regex_optimised C_replace_string translate
perl_sub          430150/s       --         -50%            -50%             -57%      -92%
regex_normal      861549/s     100%           --             -0%             -14%      -85%
regex_optimised   862213/s     100%           0%              --             -14%      -85%
C_replace_string  998062/s     132%          16%             16%               --      -82%
translate        5603343/s    1203%         550%            550%             461%        --

Always check you sub results (Line 1-5) and make more test cases,

at the first case (Line 6) we see that for single character the best solution is to use regex.If you want to replace a longer string better use a C function.But in both cases its better to use the /o flag for regex to optimize, but then you cant interpolate a string in to the regex.And if you only want to replaxe use tr its the fastest.

Brute Force with Perl

This script is a brute force Perl-Script for sha256 password with a length of 4.You can edit the characters and the length.

#!/usr/bin/perlp
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
°define 'GET_SHA256' 'sha256_hex($cach)'

# charactter array
my @list = qw(a b c d e f g h i j k l m n o p q r s t u v w x y z);
#search hash
my $searchhash="f0c929a9e723bc62724e30c7e396e576019dfcb8cfd0a3f264ee5d72e64e49d1";

my $cach;
my $res;
#1 dimension
for my $L1 (@list){
	$cach = $L1;
	$res = GET_SHA256;$res =~ s/n//o;check($res,$cach);
}
#2 dimension
for my $L1 (@list){
for my $L2 (@list){
	$cach = $L1.$L2;
	$res = GET_SHA256;$res =~ s/n//o;check($res,$cach);
}
}
#3 dimension
for my $L1 (@list){
for my $L2 (@list){
for my $L3 (@list){
	$cach = $L1.$L2.$L3;
	$res = GET_SHA256;$res =~ s/n//o;check($res,$cach);
}
}
}
#4 dimension
for my $L1 (@list){
for my $L2 (@list){
for my $L3 (@list){
for my $L4 (@list){
	$cach = $L1.$L2.$L3.$L4;
	$res = GET_SHA256;$res =~ s/n//o;check($res,$cach);
}
}
}
}

exit(0);
sub check{
	if(shift eq $searchhash){
		print $/.$/.$searchhash."->".shift.$/;
		exit(1);
	}else{
		#debug print
		#print $searchhash."->".shift.$/;
	}
}

Result looks like this :

f0c929a9e723bc62724e30c7e396e576019dfcb8cfd0a3f264ee5d72e64e49d1->perl

real	0m2.111s
user	0m2.096s
sys	0m0.011s

 

Don’t try to use  this :

`echo -n "$cach" | sha256sum | cut -d " " -f1`

its much slower …..

 

Perl Preprocessor

I want to modify Perl for 3 things.

  1. Preprocessor
  2. if,for,while without brackets
  3. Useful things like current ram or average load

For this three things I wrote this script :

#!/usr/bin/perl
use Getopt::Long;
use Inline 'C';

#option vars
my $o_print,$o_help,$o_file;
#vars
my $data,$search0,$search1,$search2,$replace;

#get options
 GetOptions(
 "p" => $o_print,
 "h" => $o_help,
 "f" => $o_file
 );

#print help
if($o_help){
print "Perl++ Interpreter V0.01".$/;
print "-p     print Perl++ converted to Perl".$/;
print "-f     print Perl++ file".$/;
print "-h     print help".$/;
exit;
}

#print file
if($o_file){print $ARGV[0].$/;exit;}

#load file
open (FILE, $ARGV[0]) or die $!;
   while(<FILE>){
     $data = $data.$_;
   }
close (FILE);

#define special vars
my $vars = q#
°define @°free_ram°@	@`free | awk "NR == 2" | awk '{print $4}'`@
°define @°load_1°@	@`uptime | awk '{print $10}' | cut -d "," -f 1,2 | tr "," "."`@
°define @°load_5°@	@`uptime | awk '{print $11}' | cut -d "," -f 1,2 | tr "," "."`@
°define @°load_15°@	@`uptime | awk '{print $12}' | cut -d "," -f 1,2 | tr "," "."`@
#;



#run preprocessor for user defined vars
prep();
#add them to data
$data =$vars.$data;
#run preprocessor for default defined vars
prep();

#prepare if|for|while without brackets
while($data =~ /(.{2,})((.+))([ nr]{0,}[^;{]+;)/om){
$search0 = $1;
$search1 = $2;
$search2 = $3;
$data = repl_str($data,"$search0($search1)$search2","$search0($search1){$search2}");
}

#remove leading n
$data =~ s/^n//o;

#print or run
if($o_print){
print $data.$/;
}else{
eval $data;
warn $@ if $@;
}

sub prep {
while($data =~ /^°define (.)(.+)n/om){
$search0 = $1.$2;
(undef,$search1,undef,$search2) = split(/$1/, $search0);
$data = repl_str($data,"°define $search0n","");
$data = repl_str($data,$search1,$search2);
}
}

__END__
__C__
char* repl_str(const char *str, const char *old, const char *new){
	char *ret, *r;
	const char *p, *q;
	size_t oldlen = strlen(old);
	size_t count, retlen, newlen = strlen(new);

	if (oldlen != newlen) {
		for (count = 0, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen)
			count++;
		retlen = p - str + strlen(p) + count * (newlen - oldlen);
	} else
		retlen = strlen(str);

	if ((ret = malloc(retlen + 1)) == NULL)
		return NULL;

	for (r = ret, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen) {
		ptrdiff_t l = q - p;
		memcpy(r, p, l);
		r += l;
		memcpy(r, new, newlen);
		r += newlen;
	}
	strcpy(r, p);

	return ret;
}

if you want to test it you have to save it at “/usr/bin/perlp”.

#!/usr/bin/perlp
use strict;
use warnings;
°define '1' '5'
°define "2" "6"
°define /3/ /7/
°define '4' '8'


print °free_ram°.$/;


if(1) print "true".$/;


for(1..10)print $_.$/;

print "1".$/;
print "2".$/;
print "3".$/;
print "4".$/;

In line 4-7 I define some values for replace.

Line 10 you cant print the current free ram, for this the script replace the token via bash script Free Ram or Load Average .

Line 13 and 16 you see that you can use if without brackets.

Line 18-21 it prints the changed values.

Generate Preview Jpeg/gif from Video

I use this Perl-Function to create a Thumbnail Gif and a Preview JPEG for my videos.If you want to switch the image with mouse over in HTML see this script.

I use the standard variable $$  (Prozess-ID)  to create a own folder for each process then convert, optimize and then delete.

sub makegif { 
my $in = shift; 
system("rm -rf ./Folder/gif_$$"); 
system("mkdir ./Folder/gif_$$"); 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 5% -o './Folder/$in.jpg'`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 5% -o  ./Folder/gif_$$/test0.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 10% -o ./Folder/gif_$$/test1.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 20% -o ./Folder/gif_$$/test2.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 30% -o ./Folder/gif_$$/test3.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 40% -o ./Folder/gif_$$/test4.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 50% -o ./Folder/gif_$$/test5.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 60% -o ./Folder/gif_$$/test6.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 70% -o ./Folder/gif_$$/test7.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 80% -o ./Folder/gif_$$/test8.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 90% -o ./Folder/gif_$$/test9.jpg`; 
`ffmpegthumbnailer -i './Folder/$in.mp4' -q 10 -s 512 -t 95% -o ./Folder/gif_$$/test10.jpg`; 
system("convert -layers Optimize -delay 40 -loop 0 './Folder/gif_$$/*.jpg' './Folder/".$in.".gif'"); 
system('mogrify -resize 166x125 "./Folder/'.$in.'.gif"'); 
system('mogrify -resize 166x125 "./Folder/'.$in.'.jpg"'); 

system("rm -rf ./Folder/gif_$$"); 
}

 

Perl Sitemap Generator

I think everyone has the problem to make a sitemap if you have more than 1M videos …this script counts the packets (40k) in line 19 and then make the packets to generate a sitemap.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# MYSQL CONFIG VARIABLES
my $host = "127.0.0.1";
my $database = "db";
my $tablename = "table";
my $user = "root";
my $pw = "pwd1234";


my $dbh = DBI->connect('DBI:mysql:'.$database , $user, $pw
	           ) || die "Could not connect to database: $DBI::errstr";


my $sitemap_counter = 0;
my $th = $dbh->prepare('SELECT round(count(*)/40000+0.5) FROM `video` WHERE `bw` <=80');
$th->execute();
while (my @row = $th->fetchrow_array()) {
   $sitemap_counter = $row[0];
}



for my $SMP (0..$sitemap_counter){
	print "build midd".$SMP.".xml ...".$/;
	open (DATEI, "> /var/www/MAP/midd".$SMP.".xml") or die $!;
	print DATEI '<?xml version="1.0" encoding="UTF-8"?>'.$/;
	print DATEI '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'.$/;
	$th = $dbh->prepare('SELECT CONCAT( "<url><loc>http://www.example.com/view.php?ID=", `id` , "</loc><priority>0.8</priority></url>" ) FROM `video` WHERE `bw` <=80 ORDER BY `id` LIMIT '.(40000*$SMP).' , 40000;');
	$th->execute();
	while (my @row = $th->fetchrow_array()) {
	   print DATEI $row[0].$/;
	}
	print DATEI '</urlset>';
	close (DATEI);
	print "[DONE]".$/;
}


$dbh->disconnect;
exit;

Perl FTP

If you want to Upload files via Perl and FTP. you could use this script:

#!/usr/bin/perl
use File::Basename;
use Net::FTP;
$ordner =time();
my $directory =  "backupordnerpfad";
my @parts = split(/\/, $directory);
my $length = $parts;
my $ordnerDir = $parts[$length-1];
$ftp = Net::FTP->new("www.myftpserver.at", Debug => 1)
    or die "Cannot connect to hostname: $@";
$ftp->login("username", "passwort")
    or die "Cannot login ", $ftp->message;
$ftp->cwd("/www")
    or die "Cannot change working directory ", $ftp->message;
$ftp->mkdir($ordner);
$ftp->cwd($ordner);
# set binary mode which is needed for image upload
$ftp->binary();
opendir(DIR,$directory);
my @files = readdir(DIR);
foreach my $file (@files)
    {
    if (not -d $file)
        {
        $ftp->put("$directory/$file");
        }
    }
$ftp->quit();
sleep(1000);

main.pl

the main script gets triggered from the WATCHDOG.pl script and is able to use the keywords on the page.

Line 6 start the phantom js

Line 11 Keywords

Line 15 Main Page crawl

Line 22 Keyword crawl

#!/usr/bin/perl
use strict;
use warnings;
$|=0;

system('./phantomjs-1.9.7-linux-x86_64/bin/phantomjs --webdriver='.$ARGV[0].' >> /dev/null 2>&1 &');sleep(4);
print 'working '.$$.(time()-$^T).$/;



my @words = ("keyword1","keyword2");

my $run ="";

                for(1..6){
                        $run ="./download.pl 'http://www.example.com/new/$_' '$ARGV[0]'";
                        print $run.$/;
                        system($run." && sleep 1");sleep(10);
                }
for (0..$#words){
my $word = $words[$_];
                for(1..1){
                        $run ="./download.pl 'http://www.example.com/search.php?what=".$word."&page=$_' '$ARGV[0]'";
                        print $run.$/;
                        system($run." && sleep 1");sleep(10);
                }
}

system("ps -e -o pid,args -dd | egrep '--webdriver=$ARGV[0]' | grep -v egrep | awk '{print $1}' | xargs kill -s 9");
print "normal exit !".$/;

 

WATCHDOG

This is my Watchdog script to regulate the crawling scripts …..

Line 5-6 is to reset the scripts

Line 8 is to start the selenium server

Line 11 is my proxy port pointer … i user several ports ant to set the port I let the pointer iterate ….

Line 13-12 i count the phantomjs and main.pl

Line 15-16 is to remove the line break ….

Line 18-21 check the maximum values the start the script and let the pointer iterate ….

then sleep and redo the work ….

#!/usr/bin/perl
use strict;
use warnings;

`killall phantomjs`;
`killall main.pl`;

system('java -jar ./selenium-server-standalone-2.40.0.jar >> /dev/null 2>&1 &');
#`killall -s 9 phantomjs && sleep 1`;
#system('./phantomjs-1.9.7-linux-x86_64/bin/phantomjs --webdriver=8888 >> /dev/$
my $pp=10;
do{
my $count = `ps aux | grep phantomjs | grep -v grep | wc -l`//0;
my $skript = `ps aux | grep main.pl | grep -v grep | wc -l`//0;
$count  =~ s/n//og;
$skript =~ s/n//og;
print "Main : $count - Gesamt :$skript".$/;
if($count < 4 && $skript < 4){
system("./main.pl '88$pp' > /dev/null 2>&1 &");
++$pp;$pp =10 if($pp>=20);
}
sleep(20);
}while(1);