Perl Regex named capture variables

A simple example that extracts the protocol, server and domain from a given URL:

my $test = "http:www.test.com";

$test =~ /^(?<protocol>.+)\:(?<server>.+)\.(?<domain>.+)$/;

print "protocol : ".$+{protocol}."\n";
print "Server   : ".$+{server}."\n";
print "Domain   : ".$+{domain}."\n";

Our Result:

sh-4.3$ perl main.pl 
protocol : http      
Server   : www.test 
Domain   : com
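
The %+ hash is only meaningful after a successful match, so it is worth checking the match result before reading it. The following is a minimal sketch, assuming a hypothetical helper named parse_url and a slightly stricter pattern than the one above; neither is part of the original example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference with the named captures,
# or nothing (undef in scalar context) if the string does not match.
sub parse_url {
    my ($url) = @_;
    # [^:]+ stops at the first colon; the greedy (.+) backtracks so that
    # "domain" is the part after the last dot.
    if ($url =~ /^(?<protocol>[^:]+):(?<server>.+)\.(?<domain>[^.]+)$/) {
        return +{ %+ };    # copy %+ before the next match overwrites it
    }
    return;
}

my $parts = parse_url("http:www.test.com");
if ($parts) {
    print "$_ : $parts->{$_}\n" for sort keys %$parts;
}
```

Copying %+ into a plain hash keeps the values usable even after another regex has run.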

Regular expression

A regular expression (regex) is a search pattern. It is used very often in Perl, but other programming languages and many text editors support it too.

 

Meta characters

char meaning
^ anchors the match at the start of the string
$ anchors the match at the end of the string
. matches any character except newline
* matches the preceding element 0 or more times
+ matches the preceding element 1 or more times
? matches the preceding element 0 or 1 times
{} matches the preceding element the given number of times or range, e.g. {3} or {3,4}
| alternation (logical OR)
() groups a sub-pattern and captures what it matched
[] defines a character class (matches one character from the set)

 

Matches

\t tab
\n newline (LF)
\r carriage return (CR)
\w matches a word character: a-z, A-Z, 0-9 and “_”
\W matches any character that is not a word character
\s matches whitespace such as space, tab and newline
\S matches any character that is not whitespace
\d matches a digit, 0-9
\D matches any character that is not a digit
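
To see the character classes in action, the following sketch counts how many characters of a sample string fall into each class. The sample string and the variable names are only illustrations.

```perl
use strict;
use warnings;

my $text = "id_42\tok\n";

# A match with /g in list context returns all matches; assigning them
# to an array makes them easy to count.
my @digits     = $text =~ /\d/g;    # the digits 4 and 2
my @word_chars = $text =~ /\w/g;    # letters, digits and the underscore
my @spaces     = $text =~ /\s/g;    # the tab and the newline
my @non_words  = $text =~ /\W/g;    # here the same two characters as \s

printf "digits=%d word=%d space=%d non-word=%d\n",
    scalar @digits, scalar @word_chars, scalar @spaces, scalar @non_words;
```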

 

Examples

^a

This matches every string that starts with an “a”.

a$

This pattern matches every string that ends with an “a”.

^schools?$

This regex would match “schools” and “school”.

^.{3,4}$

This would match every string that is 3 or 4 characters long.

^school$|^schools$

This would also match “schools” and “school”.

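"(\w+)"

On the string ‘this is a "test".’ the group would store “test”. The quantifier has to be inside the parentheses: with "(\w)+" the group is overwritten on every repetition and ends up holding only the last character, “t”.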

[aco]+

This regex matches every combination of “a”, “c” and “o”, like the word “coca”.
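
The example patterns can be tried out directly in Perl. This is a small sketch, assuming a hypothetical matches helper and a few test strings chosen for illustration; the last pattern is anchored with ^ and $ here so that it has to cover the whole word.

```perl
use strict;
use warnings;

# [pattern, test string] pairs based on the examples above
my @cases = (
    [ qr/^a/,                 "apple"   ],
    [ qr/a$/,                 "coca"    ],
    [ qr/^schools?$/,         "school"  ],
    [ qr/^schools?$/,         "schools" ],
    [ qr/^.{3,4}$/,           "abcd"    ],
    [ qr/^.{3,4}$/,           "abcde"   ],    # too long, should not match
    [ qr/^school$|^schools$/, "schools" ],
    [ qr/^[aco]+$/,           "coca"    ],
);

# Hypothetical helper: returns 1 if the string matches the pattern, else 0
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

for my $case (@cases) {
    my ($re, $str) = @$case;
    printf "%-28s %-8s %s\n", $re, $str,
        matches($re, $str) ? "match" : "no match";
}
```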

 

To test a regex, you can use the JavaScript regex tester.

JavaScript Regex tester

This is a little JavaScript regex tester. Enter a regex such as "\d" or "\w" and a test string like "123abc". It is meant as an example rather than a tool for real use, but it shows how the matching works and how to handle the matches.

Code:
<html>
<script>
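function myFunction() {
    var result = document.getElementById("result");
    var pattern = document.getElementById("regex").value;
    if (pattern == "") {
        result.textContent = "No Regex";
        return;
    }
    var re;
    try {
        re = new RegExp(pattern, 'g');
    } catch (e) {
        // An incomplete pattern such as "(" is reported instead of throwing
        result.textContent = "Invalid Regex";
        return;
    }
    result.textContent = re.execAll(document.getElementById("text").value);
}
// Collects every match (with its capture groups) as an array of arrays
RegExp.prototype.execAll = function(string) {
    var match = null;
    var matches = [];
    while ((match = this.exec(string)) !== null) {
        // Index 0 is the whole match, the following indexes are the groups
        matches.push(Array.prototype.slice.call(match));
        // A zero-length match does not advance lastIndex; step manually
        // to avoid an endless loop (e.g. for the pattern "a*")
        if (match[0] === "") {
            this.lastIndex++;
        }
    }
    return matches;
};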
</script>
<body>
<textarea id="regex" rows="1" cols="50" onchange="myFunction()" onkeyup="myFunction()" placeholder="Your Regex here..."></textarea>
<br>
<textarea id="text" rows="4" cols="50" onchange="myFunction()" onkeyup="myFunction()" placeholder="Your Text here..."></textarea>
<br>
Matches:
<div id="result"></div>
</body>
</html>

Perl Regex on different lines

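This little example shows how to apply a substitution only to selected elements (for example, lines) of a delimited string:
use strict;
use warnings;

# $/ is the input record separator, "\n" by default
print reg("1n2n3n","2","-","n",2..3).$/;
print reg("1,2,3,4,5,6",".+","-",",",2..3,5..6).$/;

# reg(string, pattern, replacement, separator, element numbers...)
# Splits the string at the separator, applies s/pattern/replacement/g
# to the listed elements (counted from 1) and joins the result again.
sub reg {
    my ($strr,$reg1,$reg2,$split,@count) = @_;
    my @elem = split($split,$strr);
    for(@count){
        next if $_ < 1 || $_ > @elem;   # skip numbers outside the list
        $elem[$_-1] =~ s/$reg1/$reg2/g;
    }
    return join $split,@elem;
}
As a result we get this:
1n-n3
1,-,-,4,-,-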
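
The same idea carries over to real lines when the separator is a newline. The following is a minimal sketch, assuming a hypothetical replace_on_lines helper that is not part of the original script; it takes 1-based line numbers and keeps a trailing newline intact.

```perl
use strict;
use warnings;

# Hypothetical variant of reg(): apply s/$from/$to/g only on the given
# 1-based line numbers of a multi-line string.
sub replace_on_lines {
    my ($text, $from, $to, @lines) = @_;
    my @elem = split /\n/, $text, -1;    # -1 keeps trailing empty fields
    for my $n (@lines) {
        next if $n < 1 || $n > @elem;    # ignore line numbers out of range
        $elem[$n - 1] =~ s/$from/$to/g;
    }
    return join "\n", @elem;
}

# Replace "1" with "2" on lines 2 and 3 only
print replace_on_lines("a1\nb1\nc1\n", qr/1/, "2", 2 .. 3);
```

The limit of -1 passed to split matters here: without it, the empty field after the last newline would be dropped and the result would lose its trailing newline.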

Check password with Perl

This little Perl script checks whether a password has at least 9 characters and contains at least 2 lower-case letters, 2 upper-case letters, 2 digits and 2 special characters.
use strict;
use warnings;

print checkPW("password").$/;
print checkPW("passwordt").$/;
print checkPW("PaSswordt").$/;
print checkPW("2PaSsword5").$/;
print checkPW("2PaSs/word5@").$/;

# Returns "OK" or a message for the first rule the password violates.
# Each pattern requires two characters of a class anywhere in the string.
sub checkPW {
    my ($pw) = @_;
    return("length must be at least 9!") unless(length($pw) >= 9);
    return("no upper-case!") unless($pw =~ /[A-Z].*[A-Z]/);
    return("no lower-case!") unless($pw =~ /[a-z].*[a-z]/);
    return("no numbers!") unless($pw =~ /[0-9].*[0-9]/);
    return("no special chars!") unless($pw =~ /[^A-Za-z0-9].*[^A-Za-z0-9]/);
    return("OK");
}
And our result looks like:
length must be at least 9!
no upper-case!
no numbers!
no special chars!
OK
Now we can check our passwords.
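
Instead of one regex per rule, the characters of each class can also be counted. This is a sketch of that alternative, assuming a hypothetical count_classes helper; it uses tr///, which returns the number of characters it matched and leaves the string unchanged when the replacement list is empty.

```perl
use strict;
use warnings;

# Hypothetical alternative to checkPW(): count the characters per class.
sub count_classes {
    my ($pw) = @_;
    return {
        upper   => ($pw =~ tr/A-Z//),
        lower   => ($pw =~ tr/a-z//),
        digit   => ($pw =~ tr/0-9//),
        special => ($pw =~ tr/A-Za-z0-9//c),    # /c counts everything else
    };
}

my $c = count_classes("2PaSs/word5@");
print join(", ", map { "$_=$c->{$_}" } sort keys %$c), "\n";
```

With the counts available, each rule becomes a simple numeric comparison such as $c->{upper} >= 2, and the minimum per class is easy to change.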

Benchmark in Perl – replace

If you want to find out the fastest way to replace a string in Perl, you can use the Benchmark module. I compared five variants:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all) ;
use Inline 'C';


my $teststring;

#check result :
$teststring = "teststring1"; $teststring =~ s/1/2/g; 				print $teststring.$/;
$teststring = "teststring1"; $teststring =~ s/1/2/go; 				print $teststring.$/;
$teststring = "teststring1"; $teststring =~ tr/1/2/; 				print $teststring.$/;
$teststring = "teststring1"; $teststring = replace($teststring,"1","2"); 	print $teststring.$/;
$teststring = "teststring1"; $teststring = repl_str($teststring,"1","2"); 	print $teststring.$/;

#test one
 cmpthese(-4, {
'regex_normal' 		=> sub {$teststring = "teststring1"; $teststring =~ s/1/2/g; },
'regex_optimised' 	=> sub {$teststring = "teststring1"; $teststring =~ s/1/2/go; },
'translate' 		=> sub {$teststring = "teststring1"; $teststring =~ tr/1/2/; },
'perl_sub' 		=> sub {$teststring = "teststring1"; $teststring = replace($teststring,"1","2"); },
'C_replace_string' 	=> sub {$teststring = "teststring1"; $teststring = repl_str($teststring,"1","2"); },
});


#test two
 cmpthese(-4, {
'regex_normal' 		=> sub {$teststring = "teststring1stringxstring"; $teststring =~ s/string/test/g; },
'regex_optimised' 	=> sub {$teststring = "teststring1stringxstring"; $teststring =~ s/string/test/go; },
'translate' 		=> sub {$teststring = "teststring1stringxstring"; $teststring =~ tr/string/test/; },
'perl_sub' 		=> sub {$teststring = "teststring1stringxstring"; $teststring = replace($teststring,"string","test"); },
'C_replace_string' 	=> sub {$teststring = "teststring1stringxstring"; $teststring = repl_str($teststring,"string","test"); },
});


sub replace {
my $string  = shift;
my $old     = shift;
my $new     = shift;
my $pos     = index($string, $old);
while ( $pos > -1 ) {
 substr( $string, $pos, length( $old ), $new );
 $pos = index( $string, $old, $pos + length( $new ));
}
return($string);
}

__END__
__C__
char* repl_str(const char *str, const char *old, const char *new){
	char *ret, *r;
	const char *p, *q;
	size_t oldlen = strlen(old);
	size_t count, retlen, newlen = strlen(new);

	if (oldlen != newlen) {
		for (count = 0, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen)
			count++;
		retlen = p - str + strlen(p) + count * (newlen - oldlen);
	} else
		retlen = strlen(str);

	if ((ret = malloc(retlen + 1)) == NULL)
		return NULL;

	for (r = ret, p = str; (q = strstr(p, old)) != NULL; p = q + oldlen) {
		ptrdiff_t l = q - p;
		memcpy(r, p, l);
		r += l;
		memcpy(r, new, newlen);
		r += newlen;
	}
	strcpy(r, p);

	return ret;
}
The result looks like:
teststring2
teststring2
teststring2
teststring2
teststring2
                      Rate perl_sub C_replace_string regex_normal regex_optimised translate
perl_sub          632214/s       --             -68%         -68%            -70%      -91%
C_replace_string 1961177/s     210%               --          -2%             -8%      -71%
regex_normal     1999403/s     216%               2%           --             -7%      -71%
regex_optimised  2142180/s     239%               9%           7%              --      -69%
translate        6843359/s     982%             249%         242%            219%        --
                      Rate perl_sub regex_normal regex_optimised C_replace_string translate
perl_sub          430150/s       --         -50%            -50%             -57%      -92%
regex_normal      861549/s     100%           --             -0%             -14%      -85%
regex_optimised   862213/s     100%           0%              --             -14%      -85%
C_replace_string  998062/s     132%          16%             16%               --      -82%
translate        5603343/s    1203%         550%            550%             461%        --
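Always check the results of your subs first (output lines 1-5) and use more than one test case. In the first case (first table), replacing a single character, the regex is slightly faster than the C function (2-9%). For the longer string in the second case (second table), the C function is about 16% faster than the regex. The /o flag makes no real difference here: it only affects patterns that interpolate variables, and the patterns in this test are constant, so the 0-7% gap between regex_normal and regex_optimised is within measurement noise. tr is by far the fastest in both cases, but it transliterates single characters instead of replacing substrings, so tr/string/test/ in the second test does not produce the same result as s/string/test/g. Use tr only when you want to map characters one to one.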
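
The difference between s/// and tr/// is easy to check on the string from the second test. A minimal sketch; the variable names are only illustrations:

```perl
use strict;
use warnings;

my $input = "teststring1stringxstring";

# s/// replaces each occurrence of the substring "string" with "test"
(my $substituted = $input) =~ s/string/test/g;

# tr/// maps single characters: s->t, t->e, r->s, i->t, and so on
(my $transliterated = $input) =~ tr/string/test/;

print "s///: $substituted\n";
print "tr//: $transliterated\n";
```

Because tr/// works character by character, its result always has the same length as the input, while s/// can shorten or lengthen the string.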