operating on AoAs stored in hash. PDL vs no PDL

Question

I have a hash of AoAs:

$hash{$key} = [ 
               [0.0,1.0,2.0],
               10.0,
               [1.5,9.5,5.5],
              ];

that I need to crunch as follows:

$err += (($hash{$key}[0][$_]-$hash{$key}[2][$_])*$hash{$key}[1])**2 foreach (0 .. 2);

This calculates the squared weighted difference between the two arrays. Since my hash is large, I was hoping PDL would speed up the calculation, but it doesn't: the PDL script below is about 10 times slower. I'm still new to PDL, so I'm probably doing something wrong.

Description: the following two scripts are my attempt to represent, simply, what is going on in my program. I read some reference values into the hash, and then I compare observations (pulled into the hash on the fly) to those values many times with some weight. In the scripts, I set the reference array, weight, and observation array to arbitrary fixed values, but that won't be the case at run time.

Here are two simple scripts, without and with PDL:

without PDL

use strict;
use warnings;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_} = [
               [0.000, 1.000, 2.0000],
               10.0,
               [1.5,9.5,5.5],
              ];
  foreach my $i (0 .. 2){
    $error += (($hash{$_}[0][$i]-$hash{$_}[2][$i])*$hash{$_}[1])**2;
  }
}

my $t2 = time;

printf("total time: %10.4f error: %10.4f\n", $t2 - $t1, $error);

with PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my %hash;
my $error = 0;

foreach (0 .. 10000){
  $hash{$_}[0] = pdl[0.000, 1.000, 2.0000];
  $hash{$_}[1] = pdl[10.0];
  $hash{$_}[2] = pdl[1.5,9.5,5.5];
  my $e = ($hash{$_}[0]-$hash{$_}[2])*$hash{$_}[1];
  $error += inner($e,$e);
}

my $t2 = time;

printf("total time: %10.4f error: %10.4f\n", $t2 - $t1, $error);

chm · Accepted Answer

PDL is optimized for array computations. Your data is stored in a hash, but since the keys are numbers, the problem can be reformulated in terms of PDL array objects for a big win in performance. The following all-PDL version of the example code runs about 36 times faster than the original non-PDL code (and about 300 times faster than the original PDL code).

all PDL

use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

my $t1 = time;
my $error = 0;

my $pdl0 = zeros(3,10001);  # create a [3,10001] pdl
$pdl0 .= pdl[0.000, 1.000, 2.0000];

my $pdl1 = zeros(1,10001);  # create a [1,10001] pdl
$pdl1 .= pdl[10.0];

my $pdl2 = zeros(3,10001);  # create a [3,10001] pdl
$pdl2 .= pdl[1.5,9.5,5.5];

my $e = ($pdl0 - $pdl2)*$pdl1;
$error = sum($e*$e);

my $t2 = time;

printf("total time: %10.4f error: %10.4f\n", $t2 - $t1, $error);

See the PDL Book for an in-depth intro to using PDL for computation. The PDL homepage is also a good starting point for all things PDL.
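The example above fills the piddles with fixed values, whereas the question says the real values arrive in a hash at run time. Here is a minimal sketch of that conversion step, assuming every entry follows the question's [reference, weight, observation] layout. The two-entry %hash and the names $ref, $wgt and $obs are made up for illustration.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical data in the question's layout: [reference, weight, observation].
my %hash = (
    a => [ [0.0, 1.0, 2.0], 10.0, [1.5, 9.5, 5.5] ],
    b => [ [1.0, 2.0, 3.0],  2.0, [1.0, 4.0, 0.0] ],
);

# Fix the key order once so that row i of every piddle refers to the same key.
my @keys = sort keys %hash;

my $ref = pdl([ map { $hash{$_}[0] } @keys ]);            # dims [3, N]
my $wgt = pdl([ map { $hash{$_}[1] } @keys ])->dummy(0);  # dims [1, N]
my $obs = pdl([ map { $hash{$_}[2] } @keys ]);            # dims [3, N]

# The size-1 first dimension of $wgt is broadcast across the three columns.
my $e     = ($ref - $obs) * $wgt;
my $error = sum($e * $e);

printf("error: %.4f\n", $error);
```

Building the piddles costs one pass over the hash, so this should pay off mainly when the same piddles are reused for many comparisons rather than rebuilt each time.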

Nemo · Answer

First, PDL is not going to help much unless the arrays are large. So instead of using a hash indexed by 0 to 10000, each with (basically) seven scalar elements, can you instead create seven PDL vectors of 10001 elements each and operate on those using vector operations?
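As a rough sketch of what the seven-vector layout could look like, assuming the same fixed values as in the question (the names $r0 .. $r2, $o0 .. $o2 and $w are only illustrative):

```perl
use strict;
use warnings;
use PDL;

my $n = 10001;

# Seven 1-D piddles of $n elements each, filled with the question's fixed values.
my ($r0, $r1, $r2) = map { zeros($n) + $_ } (0.0, 1.0, 2.0);  # reference columns
my ($o0, $o1, $o2) = map { zeros($n) + $_ } (1.5, 9.5, 5.5);  # observation columns
my $w              = zeros($n) + 10.0;                        # weights

# Each term is a whole-vector operation; sum() then reduces to a single number.
my $error = sum(  (($r0 - $o0) * $w)**2
                + (($r1 - $o1) * $w)**2
                + (($r2 - $o2) * $w)**2 );

printf("error: %.4f\n", $error);
```

With real data, each vector would be filled from your reference values and observations instead of a constant.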

Second, the expression $hash{$_} is being evaluated every time you name it, so you should factor it out. In your standard Perl code, for instance, you should do this:

my $vec = $hash{$_};
foreach my $i (0 .. 2){
    $error += (($vec->[0][$i]-$vec->[2][$i])*$vec->[1])**2;
}
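If you want to check how much the factoring buys you on your own machine, something along these lines should work. It is only a sketch using the core Benchmark module; the sub names are made up, and the data is the fixed example from the question.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Same fixed data as in the question.
my %hash = map { ( $_ => [ [0.0, 1.0, 2.0], 10.0, [1.5, 9.5, 5.5] ] ) } 0 .. 10000;

# Original form: $hash{$key} is looked up again on every access.
sub repeated_lookup {
    my $error = 0;
    foreach my $key (keys %hash) {
        foreach my $i (0 .. 2) {
            $error += (($hash{$key}[0][$i] - $hash{$key}[2][$i]) * $hash{$key}[1])**2;
        }
    }
    return $error;
}

# Factored form: one hash access per entry, then plain dereferences.
sub factored_lookup {
    my $error = 0;
    foreach my $vec (values %hash) {
        my ($ref, $weight, $obs) = @$vec;
        foreach my $i (0 .. 2) {
            $error += (($ref->[$i] - $obs->[$i]) * $weight)**2;
        }
    }
    return $error;
}

# Both forms must agree before their timings are worth comparing.
die "results differ" unless repeated_lookup() == factored_lookup();

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    repeated => \&repeated_lookup,
    factored => \&factored_lookup,
});
```

The timings will vary with the Perl build and the machine, so treat the table as a relative comparison only.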
