Saturday, October 10, 2009

Benchmarking Perl

I had a small piece of code that did exactly what it had to do: process every line in a file and build a list of the unique lines that remain after processing.

The processing I had to run on every line was a simple substitution, so I thought of Algorithm::Loops. I used it and it worked really well, but then I started wondering about performance and tried some other approaches:


Well, I could clearly see that the while version is the winner, and it really makes a big difference. I suppose this is because Filter receives <$fh> in list context, so the whole file is read into a list before any processing starts, and that is slower than the iterator version (just guessing).

Here's the code:
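#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
use Algorithm::Loops qw(Filter);

# Build the test file: 4 x 2000 lines such as "17_i02_c00 foo".
# rm -f keeps the first run quiet when the file does not exist yet.
system('rm -f /tmp/footable.txt');
system('seq 1 2000 >>/tmp/footable.txt') for (1..4);
system(q{sed -i 's/$/_i02_c00 foo/' /tmp/footable.txt});

# Each variant is a string of code that Benchmark evals and runs 50 times.
cmpthese (50, {
    # chomp every line first, then strip the suffix
    'filterChomp' =>
        'open my $fh, "/tmp/footable.txt";
         my %hash = map {
             $_ => 1
         } Filter {s/_i\d+_c\d+ .*$//} Filter{chomp} <$fh>;
         close $fh; ' ,
    # strip the suffix and the trailing newline in a single substitution
    'regexBarraN' =>
        'open my $fh, "/tmp/footable.txt";
         my %hash = map {
             $_ => 1
         } Filter {s/_i\d+_c\d+ .*\n//} <$fh>;
         close $fh;' ,
    # Note: m// leaves $_ untouched and Filter returns $_ (not the capture),
    # so this variant appears to key the hash on whole lines.
    'match' =>
        'open my $fh, "/tmp/footable.txt";
         my %hash = map {
             $_ => 1
         } Filter {m/^([^ ]*)_i\d+_c\d+/} <$fh>;
         close $fh; ' ,
    # read one line at a time and keep the captured prefix
    'while' =>
        'open my $fh, "/tmp/footable.txt";
         my %hash;
         while (<$fh>) {
             m/^([^ ]*)_i\d+_c\d+ .*/;
             $hash{$1}=1;
         }
         close $fh;' ,
});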


And the results:

              Rate filterChomp       match regexBarraN       while
filterChomp 9.49/s          --        -19%        -24%        -64%
match       11.8/s         24%          --         -6%        -55%
regexBarraN 12.5/s         32%          6%          --        -52%
while       26.0/s        174%        121%        108%          --
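
The list-context guess above can be looked at in isolation. What follows is only a minimal sketch, not the benchmark itself: strip_suffix, unique_slurp and unique_stream are hypothetical helper names, and a temporary file stands in for /tmp/footable.txt. It shows the two reading styles side by side and checks that they produce the same keys; it makes no claim about which one is faster on a given machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: drop the newline and the "_iNN_cNN ..." suffix.
sub strip_suffix {
    my ($line) = @_;
    chomp $line;
    $line =~ s/_i\d+_c\d+ .*$//;
    return $line;
}

# List context: <$fh> returns every line at once, so the whole file
# is in memory before the first line is processed.
sub unique_slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %seen;
    $seen{ strip_suffix($_) } = 1 for <$fh>;
    close $fh;
    return \%seen;
}

# Scalar context: <$fh> returns one line per call, so only the
# current line and the hash are held in memory.
sub unique_stream {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my %seen;
    while ( my $line = <$fh> ) {
        $seen{ strip_suffix($line) } = 1;
    }
    close $fh;
    return \%seen;
}

# Small stand-in for the benchmark file: the lines 1..5, written twice.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} $_ . "_i02_c00 foo\n" for ( 1 .. 5, 1 .. 5 );
close $tmp_fh;

my $slurped  = unique_slurp($tmp_path);
my $streamed = unique_stream($tmp_path);

print join( ',', sort { $a <=> $b } keys %$slurped ),  "\n";   # 1,2,3,4,5
print join( ',', sort { $a <=> $b } keys %$streamed ), "\n";   # 1,2,3,4,5
```

Both subs return a hash reference, so either one can be dropped into a benchmark variant without changing the code that consumes the result.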


Well, apart from keeping the while version for my app, I now have a template for benchmarking Perl code. Following the same technique as in one of my other Perl posts, I've written a template file that is easy to paste into a .pl file from vim.
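
The template file itself is not reproduced in this post. As a rough idea of what such a skeleton could look like, here is a minimal sketch under my own assumptions: the variant names substitute and capture, the in-memory @lines data and the sanity check are placeholders, and the variants are code references instead of the strings used in the script above. Benchmark's cmpthese accepts both forms, and a negative count asks it to run each variant for at least that many CPU seconds.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Placeholder input: 4 x 2000 lines shaped like the test file above.
my @lines;
for my $round ( 1 .. 4 ) {
    push @lines, $_ . "_i02_c00 foo\n" for 1 .. 2000;
}

# Placeholder variants: replace the bodies with the code to compare.
# Each one returns the number of unique keys so it can be checked.
my %variants = (
    substitute => sub {
        my %seen;
        for my $line (@lines) {
            ( my $key = $line ) =~ s/_i\d+_c\d+ .*\n//;
            $seen{$key} = 1;
        }
        return scalar keys %seen;
    },
    capture => sub {
        my %seen;
        for my $line (@lines) {
            $seen{$1} = 1 if $line =~ m/^([^ ]*)_i\d+_c\d+ /;
        }
        return scalar keys %seen;
    },
);

# Sanity check before timing: a fast variant that computes the
# wrong result is not worth comparing.
for my $name ( sort keys %variants ) {
    my $count = $variants{$name}->();
    die "$name returned $count unique keys, expected 2000\n"
        unless $count == 2000;
}

# Run every variant for at least 1 CPU second and print the comparison table.
cmpthese( -1, \%variants );
```

Checking the results before timing is a design choice of this sketch: it catches a variant that silently computes something different from the others before its rate ends up in the table.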
