
Fortran performance for complex vs real variable

Tags:

fortran

I was wondering whether, for performance, it is preferable to work on the real and imaginary parts of an array separately instead of using a complex variable. For example,

program test
   implicit none
   integer,parameter :: n = 1e8
   real(kind=8),parameter :: pi = 4.0d0*atan(1.0d0)
   complex(kind=8),parameter :: i_ = (0.0d0,1.0d0)
   double complex :: s
   real(kind=8) :: th(n),sz, t1,t2, s1,s2
   integer :: i 
   sz = 2.0d0*pi/n
   do i=1,n 
      th(i) = sz*i
   enddo
   call cpu_time(t1)
   s= sum(exp(th*i_))
   call cpu_time(t2)
   print *, t2-t1 

   call cpu_time(t1)
   s1 = sum(cos(th))
   s2 =  sum(sin(th))
   call cpu_time(t2)
   print *, t2-t1 
end program test

And the times it takes (complex version first, then the split version):

   3.7041089999999999     
   2.6299830000000002     

So the split calculation does take less time. This was a very simple calculation, but I have some long calculations where using complex variables improves readability and takes fewer lines of code. Will it sacrifice the performance of my code? Or is it always advisable to work on the real and imaginary parts separately?

Eular asked Mar 23 '26 19:03



1 Answer

It is better to understand what kind of tricks the compiler can do for you. Generally, it is not worth the effort to split the real and imaginary parts by hand nowadays. Create a little script to study the CPU time of your code with different compilers and optimization flags:

#!/bin/bash
src=a.f90
for fcc in gfortran ifort; do
    $fcc --version
    for flag in "-O0" "-O1" "-O2" "-O3"; do
        fexe=$fcc$flag
        echo $fcc $src -o "$fcc$flag" $flag
        $fcc $src -o $fexe $flag
        echo "run $fexe ..."
        ./$fexe
    done
done

You will notice that some of the CPU times may come out very close to 0, because the compiler is clever enough to discard computations whose results are never used. Change the print statements so that the compiler cannot optimize out your computation:

print *, t2-t1, s
print *, t2-t1, s1, s2
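For context, here is a minimal sketch of how the whole test program might look with those two print statements in place. It follows the program from the question, with a few assumptions of mine: the program name `timing_sketch`, the `dp` kind from `iso_fortran_env`, an allocatable `th`, and a smaller `n` so it runs on machines with less memory. The timings it prints will depend on your compiler and flags.

```fortran
program timing_sketch
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   ! Assumption: n is smaller than the 1e8 used in the question.
   integer, parameter :: n = 10000000
   real(dp), parameter :: pi = 4.0_dp*atan(1.0_dp)
   complex(dp), parameter :: i_ = (0.0_dp, 1.0_dp)
   real(dp), allocatable :: th(:)
   complex(dp) :: s
   real(dp) :: sz, t1, t2, s1, s2
   integer :: i

   ! Fill th with n equally spaced angles in (0, 2*pi].
   allocate(th(n))
   sz = 2.0_dp*pi/n
   do i = 1, n
      th(i) = sz*i
   end do

   ! Complex version: printing s keeps the sum from being discarded.
   call cpu_time(t1)
   s = sum(exp(th*i_))
   call cpu_time(t2)
   print *, t2 - t1, s

   ! Split version: printing s1 and s2 serves the same purpose.
   call cpu_time(t1)
   s1 = sum(cos(th))
   s2 = sum(sin(th))
   call cpu_time(t2)
   print *, t2 - t1, s1, s2
end program timing_sketch
```

Printing the sums next to the timings also makes it easy to compare the numerical results of the two versions across optimization levels.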

The results with ifort are below. For each run, the first line is the complex version and the second line is the split version. Besides the speed, notice the ACCURACY: speed comes at a price.

ifort (IFORT) 14.0.2

ifort a.f90 -o ifort-O0 -O0
run ifort-O0 ...
   3.57999900000000      (-2.319317404797516E-009,7.034712528404704E-009)
   4.07666600000000      -2.319317404797516E-009  7.034712528404704E-009
ifort a.f90 -o ifort-O1 -O1
run ifort-O1 ...
   3.30333300000000      (-2.319317404797516E-009,7.034712528404704E-009)
   3.54666700000000      -2.319317404797516E-009  7.034712528404704E-009
ifort a.f90 -o ifort-O2 -O2
run ifort-O2 ...
   3.08000000000000      (-2.319317404797516E-009,7.034712528404704E-009)
   1.13666600000000      -6.304215927066537E-009  1.737099880017717E-009
ifort a.f90 -o ifort-O3 -O3
run ifort-O3 ...
   3.08333400000000      (-2.319317404797516E-009,7.034712528404704E-009)
   1.13666600000000      -6.304215927066537E-009  1.737099880017717E-009
sum 31.999 3.496 0:35.82 99.0% 0

You may wonder what happens between the -O1 and -O2 flags. If you check the undefined symbols in the compiled object file (for example with nm), the math functions it links against have changed from:

         U cexp
         U cos
         U sin

to:

         U __svml_cos2
         U __svml_sin2
         U cexp

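SVML stands for Short Vector Math Library. At -O2 the split version calls the vectorized sin and cos routines, while the complex version still calls the scalar cexp, which is consistent with the timings and the changed sums above. Some of the trade-offs between speed and accuracy are described in the Intel IPP Library documentation under "Fixed-Accuracy Arithmetic Functions".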
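If the cexp call turns out to be the bottleneck, one possible middle ground is to keep a complex result but build it from the real intrinsics. The sketch below is an illustration under my own assumptions (the program name `complex_from_real`, the variable names, and the value of `n` are not from the question), and whether it is actually faster has to be measured with your compiler and flags. It computes the sum both ways and prints the difference so the accuracy can be checked as well.

```fortran
program complex_from_real
   use, intrinsic :: iso_fortran_env, only: dp => real64
   implicit none
   ! Assumption: a modest n, chosen only to keep the example light.
   integer, parameter :: n = 1000000
   real(dp), parameter :: pi = 4.0_dp*atan(1.0_dp)
   complex(dp), parameter :: i_ = (0.0_dp, 1.0_dp)
   real(dp), allocatable :: th(:)
   complex(dp) :: s_exp, s_trig
   integer :: i

   allocate(th(n))
   do i = 1, n
      th(i) = 2.0_dp*pi*i/n
   end do

   ! Complex exponential, as in the question.
   s_exp = sum(exp(th*i_))

   ! Same quantity assembled from cos and sin; cmplx is elemental,
   ! and kind=dp keeps the result in double precision.
   s_trig = sum(cmplx(cos(th), sin(th), kind=dp))

   print *, s_exp
   print *, s_trig
   print *, abs(s_exp - s_trig)   ! difference between the two sums
end program complex_from_real
```

The rest of a long calculation can then keep using complex variables, with only the expensive elementwise function written in terms of its real and imaginary parts.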

KL-Yang answered Mar 26 '26 14:03
