
operating on AoAs stored in hash. PDL vs no PDL

Tags: perl, pdl

I have a hash of AoAs:

$hash{$key} = [ 
               [0.0,1.0,2.0],
               10.0,
               [1.5,9.5,5.5],
              ];

that I need to crunch as follows:

$err += (($hash{$key}[0][$_]-$hash{$key}[2][$_])*$hash{$key}[1])**2 foreach (0 .. 2);

This calculates the squared weighted difference between the two arrays. Since my hash is large, I was hoping PDL would speed up the calculation, but for some reason it doesn't. I'm still new to PDL, so I'm probably messing something up: the PDL script below is ~10 times slower than the one without PDL.

Description: The following two scripts are my attempt to represent, simply, what is going on in my program. I read some reference values into the hash, and then I compare observations (pulled into the hash on the fly) to those values a bunch of times with some weight. In the scripts, I set the reference array, weight, and observation array to arbitrary fixed values, but that won't be the case at run time.
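To make the quantity concrete, here is a minimal pure-Perl sketch that evaluates it for a single entry, using the sample values shown above. The subroutine name `weighted_sq_err` and the key `a` are made up for illustration and are not part of my program.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original scripts): squared weighted
# difference between the reference and observation arrays of one entry.
sub weighted_sq_err {
    my ($entry) = @_;
    my ($ref, $weight, $obs) = @$entry;   # [ reference, weight, observation ]
    my $err = 0;
    $err += (($ref->[$_] - $obs->[$_]) * $weight)**2 for 0 .. $#$ref;
    return $err;
}

my %hash = (
    a => [ [0.0, 1.0, 2.0], 10.0, [1.5, 9.5, 5.5] ],
);

# (-15)**2 + (-85)**2 + (-35)**2 = 225 + 7225 + 1225 = 8675
my $err = weighted_sq_err($hash{a});
printf "error: %.1f\n", $err;
```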

Here are two simple scripts, one without and one with PDL:

without PDL

use strict;
use warnings;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_} = [
               [0.000, 1.000, 2.0000],
               10.0,
               [1.5,9.5,5.5],
              ];
  foreach my $i (0 .. 2){
    $error += (($hash{$_}[0][$i]-$hash{$_}[2][$i])*$hash{$_}[1])**2;
  }
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1,$error);

with PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_}[0] = pdl[0.000, 1.000, 2.0000];
  $hash{$_}[1] = pdl[10.0];
  $hash{$_}[2] = pdl[1.5,9.5,5.5];
  my $e = ($hash{$_}[0]-$hash{$_}[2])*$hash{$_}[1];
  $error += inner($e,$e);
}

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);
Demian asked Dec 08 '25 02:12


2 Answers

PDL is optimized for computations on whole arrays. You are using a hash for your data, but since the keys are numbers, the data can be reformulated as PDL array objects for a big win in performance. The following all-PDL version of the example code runs about 36 times faster than the original code without PDL (and about 300 times faster than the original code with PDL).

all PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

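my $t1 = time;

# Reference values: one row of 3 per key
my $pdl0 = zeros(3,10001);  # create a [3,10001] pdl
$pdl0 .= pdl[0.000, 1.000, 2.0000];  # the 3-element row is copied into every row

# Weights: one value per key
my $pdl1 = zeros(1,10001);  # create a [1,10001] pdl
$pdl1 .= pdl[10.0];

# Observations: one row of 3 per key
my $pdl2 = zeros(3,10001);  # create a [3,10001] pdl
$pdl2 .= pdl[1.5,9.5,5.5];

# Weighted differences for all keys at once, then the sum of their squares
my $e = ($pdl0 - $pdl2)*$pdl1;
my $error = sum($e*$e);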

my $t2 = time;

printf ( "total time: %10.4f error: %10.4f\n", $t2-$t1, $error);

See the PDL Book for an in-depth intro to using PDL for computation. The PDL homepage is also a good starting point for all things PDL.
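The example above fills the piddles with constants. If the values really do live in a hash laid out as in the question, one possible way to get them into the same [3,N] and [1,N] shapes is sketched below. This is only a sketch under assumptions: every entry is taken to have exactly three reference and three observation values, the 10-key sample data is made up, and the variable names are illustrative.

```perl
use strict;
use warnings;
use PDL;

# Made-up sample data in the question's layout: [ reference, weight, observation ].
my %hash = map { $_ => [ [0.0, 1.0, 2.0], 10.0, [1.5, 9.5, 5.5] ] } 0 .. 9;

# Fix one key order so that row i of every piddle refers to the same key.
my @keys = sort { $a <=> $b } keys %hash;

# Stack the per-key arrays into 2-D piddles with dims [3, N].
my $ref = pdl([ map { $hash{$_}[0] } @keys ]);
my $obs = pdl([ map { $hash{$_}[2] } @keys ]);

# Weights: a 1-D piddle of N values, given a leading dummy dimension so its
# dims are [1, N] and each weight applies to the three values of its own row.
my $w = pdl([ map { $hash{$_}[1] } @keys ])->dummy(0);

my $e     = ($ref - $obs) * $w;
my $error = sum($e * $e);

# With this sample data each key contributes 8675, so 10 keys give 86750.
printf "error: %.1f\n", $error;
```

Building the piddles still walks the hash once in Perl, so the gain comes from doing the arithmetic in a single pass over large arrays instead of creating three tiny piddles per key.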

chm answered Dec 10 '25 15:12


First, PDL is not going to help much unless the arrays are large. So instead of using a hash indexed by 0 to 10000, each with (basically) seven scalar elements, can you instead create seven PDL vectors of 10001 elements each and operate on those using vector operations?

Second, the expression $hash{$_} is being evaluated every time you name it, so you should factor it out. In your standard Perl code, for instance, you should do this:

my $vec = $hash{$_};
foreach my $i (0 .. 2){
    $error += (($vec->[0][$i]-$vec->[2][$i])*$vec->[1])**2;
}
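If you want to check how much the factoring buys you on your own machine, a sketch along these lines using the core Benchmark module may help. The subroutine names and the sample data are invented for the comparison, and the timings will vary from system to system, so no particular speedup is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data in the question's layout: [ reference, weight, observation ].
my %hash = map { $_ => [ [0.0, 1.0, 2.0], 10.0, [1.5, 9.5, 5.5] ] } 0 .. 10000;

# Original style: look up $hash{$k} again on every access.
sub repeated_lookup {
    my $error = 0;
    for my $k (keys %hash) {
        for my $i (0 .. 2) {
            $error += (($hash{$k}[0][$i] - $hash{$k}[2][$i]) * $hash{$k}[1])**2;
        }
    }
    return $error;
}

# Factored style: fetch each entry once, unpack it, then loop.
sub factored_lookup {
    my $error = 0;
    for my $entry (values %hash) {
        my ($ref, $w, $obs) = @$entry;
        $error += (($ref->[$_] - $obs->[$_]) * $w)**2 for 0 .. 2;
    }
    return $error;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    repeated => \&repeated_lookup,
    factored => \&factored_lookup,
});
```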
Nemo answered Dec 10 '25 15:12


