I have a hash of AoAs:
$hash{$key} = [
[0.0,1.0,2.0],
10.0,
[1.5,9.5,5.5],
];
that I need to crunch as follows:
$err += (($hash{$key}[0][$_]-$hash{$key}[2][$_])*$hash{$key}[1])**2 foreach (0 .. 2);
calculating the squared weighted difference between the two arrays. Since my hash is large, I was hoping PDL would help speed up the calculation, but for some reason it doesn't. I'm still new to PDL, so I'm probably messing something up: the PDL script below is ~10 times slower than the plain Perl one. Description: The following two scripts are my attempt to represent, simply, what is going on in my program. I read some reference values into the hash, and then I compare observations (pulled into the hash on the fly) to those values many times with some weight. In the scripts, I set the reference array, weight, and observation array to arbitrary fixed values, but that won't be the case at run time.
Here are two simple scripts, without and with PDL:
use strict;
use warnings;
use Time::HiRes qw(time);
my $t1 = time;
my %hash;
my $error = 0;
foreach (0 .. 10000){
$hash{$_} = [
[0.000, 1.000, 2.0000],
10.0,
[1.5,9.5,5.5],
];
foreach my $i (0 .. 2){
$error += (($hash{$_}[0][$i]-$hash{$_}[2][$i])*$hash{$_}[1])**2;
}
}
my $t2 = time;
printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1,$error);
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);
my $t1 = time;
my %hash;
my $error = 0;
foreach (0 .. 10000){
$hash{$_}[0] = pdl[0.000, 1.000, 2.0000];
$hash{$_}[1] = pdl[10.0];
$hash{$_}[2] = pdl[1.5,9.5,5.5];
my $e = ($hash{$_}[0]-$hash{$_}[2])*$hash{$_}[1];
$error += inner($e,$e);
}
my $t2 = time;
printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);
PDL is optimized for array computations. You are using a hash for your data, but since the keys are numbers, the data can be reformulated as PDL array objects for a big win in performance. The following all-PDL version of the example code runs about 36X faster than the original code without PDL (and 300X faster than the original code with PDL).
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);
my $t1 = time;
my %hash;
my $error = 0;
my $pdl0 = zeros(3,10001); # create a [3,10001] pdl
$pdl0 .= pdl[0.000, 1.000, 2.0000];
my $pdl1 = zeros(1,10001); # create a [1,10001] pdl
$pdl1 .= pdl[10.0];
my $pdl2 = zeros(3,10001); # create a [3,10001] pdl
$pdl2 .= pdl[1.5,9.5,5.5];
my $e = ($pdl0 - $pdl2)*$pdl1;
$error = sum($e*$e);
my $t2 = time;
printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);
See the PDL Book for an in-depth intro to using PDL for computation. The PDL homepage is also a good starting point for all things PDL.
First, PDL is not going to help much unless the arrays are large. So instead of using a hash indexed by 0 to 10000, each with (basically) seven scalar elements, can you instead create seven PDL vectors of 10001 elements each and operate on those using vector operations?
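As a rough sketch of the "seven vectors" layout suggested above: each of the three reference components, the weight, and each of the three observation components becomes its own 10001-element ndarray. The names @ref, @obs and $w are made up for illustration, and the constant fill values are just the arbitrary ones from the question, so treat this as one possible arrangement rather than the only way to do it.

```perl
use strict;
use warnings;
use PDL;

my $n = 10001;    # number of keys in the original hash (0 .. 10000)

# Hypothetical layout: one vector per scalar slot of the original record.
my @ref = map { zeroes($n) + $_ } (0.0, 1.0, 2.0);    # reference components
my $w   = zeroes($n) + 10.0;                          # weights
my @obs = map { zeroes($n) + $_ } (1.5, 9.5, 5.5);    # observation components

my $error = 0;
for my $i (0 .. 2) {
    # Weighted difference for component $i across all keys at once.
    my $d = ($ref[$i] - $obs[$i]) * $w;
    $error += sum($d * $d);
}

printf "error: %10.4f\n", $error;
```

The Perl-level loop now runs three times instead of 10001, so the per-call overhead of PDL is paid only a handful of times while the arithmetic itself happens on whole vectors.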
Second, the expression $hash{$_} is being evaluated every time you name it, so you should factor it out. In your standard Perl code, for instance, you should do this:
my $vec = $hash{$_};
foreach my $i (0 .. 2){
$error += (($vec->[0][$i]-$vec->[2][$i])*$vec->[1])**2;
}
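For completeness, here is a self-contained sketch of that plain-Perl variant with the lookup factored out. It rebuilds the illustrative data from the question, and the names $ref, $weight and $obs are just labels I picked for the three slots of each record.

```perl
use strict;
use warnings;

# Same illustrative data as in the question's first script.
my %hash;
for my $key (0 .. 10000) {
    $hash{$key} = [ [0.0, 1.0, 2.0], 10.0, [1.5, 9.5, 5.5] ];
}

my $error = 0;
for my $key (keys %hash) {
    # One hash lookup per key; unpack the record into named references.
    my ($ref, $weight, $obs) = @{ $hash{$key} };
    for my $i (0 .. $#$ref) {
        $error += (($ref->[$i] - $obs->[$i]) * $weight) ** 2;
    }
}

printf "error: %10.4f\n", $error;
```

Unpacking the record once also makes the inner expression easier to read, which helps avoid typos in the repeated subscripts.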