Fortran performance for complex vs real variable

Question

I was wondering whether, for performance, it is preferable to work on the real and imaginary parts of an array separately instead of using a complex variable. For example:

program test
   implicit none
   integer,parameter :: n = 100000000   ! 1e8 written as an integer literal
   real(kind=8),parameter :: pi = 4.0d0*atan(1.0d0)
   complex(kind=8),parameter :: i_ = (0.0d0,1.0d0)   ! imaginary unit
   complex(kind=8) :: s
   real(kind=8) :: th(n),sz, t1,t2, s1,s2
   integer :: i
   ! angles spread uniformly over (0, 2*pi]
   sz = 2.0d0*pi/n
   do i=1,n
      th(i) = sz*i
   enddo
   ! version 1: one complex exponential
   call cpu_time(t1)
   s = sum(exp(th*i_))
   call cpu_time(t2)
   print *, t2-t1

   ! version 2: real and imaginary parts computed separately
   call cpu_time(t1)
   s1 = sum(cos(th))
   s2 = sum(sin(th))
   call cpu_time(t2)
   print *, t2-t1
end program test

The times it takes (CPU seconds, complex version first):

   3.7041089999999999     
   2.6299830000000002

So the split calculation does take less time. This was a very simple calculation, but I have some long calculations where using complex variables improves readability and takes fewer lines of code. Will that sacrifice the performance of my code? Or is it always advisable to work on the real and imaginary parts separately?

KL-Yang · Accepted Answer

It is better to understand what kind of tricks the compiler can do for you. Generally, splitting the real and imaginary parts by hand is not worth the effort nowadays. Write a little script to study the CPU time of your code with different compilers and optimization levels:

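#!/bin/bash
# Compile and run the test program with each compiler and optimization level.
src=a.f90
for fcc in gfortran ifort; do
    "$fcc" --version
    for flag in -O0 -O1 -O2 -O3; do
        fexe="$fcc$flag"                    # e.g. ifort-O2
        echo "$fcc $src -o $fexe $flag"     # show the compile command
        "$fcc" "$src" -o "$fexe" "$flag"
        echo "run $fexe ..."
        "./$fexe"
    done
done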

You will notice that some of the CPU times may come out very close to 0, because the compiler is clever enough to discard a computation whose result is never used. Print the results as well, so that the compiler cannot optimize the computation away:

print *, t2-t1, s
print *, t2-t1, s1, s2
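Put together, the modified benchmark might look like the following. This is only a sketch: the program name, the dp kind parameter, the smaller n, and the allocatable array are my assumptions, not part of the original program.

```fortran
! Sketch of the benchmark with the sums printed, so the optimizer
! cannot discard them. Names and the value of n are illustrative.
program timing_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 10000000            ! assumption: smaller than the original 1e8
   real(dp), parameter :: pi = 4.0_dp*atan(1.0_dp)
   complex(dp), parameter :: i_ = (0.0_dp, 1.0_dp)
   real(dp), allocatable :: th(:)                ! assumption: heap allocation for the large array
   complex(dp) :: s
   real(dp) :: s1, s2, t1, t2
   integer :: i

   allocate(th(n))
   do i = 1, n
      th(i) = 2.0_dp*pi*real(i, dp)/real(n, dp)
   end do

   ! complex version
   call cpu_time(t1)
   s = sum(exp(th*i_))
   call cpu_time(t2)
   print *, 'complex:', t2 - t1, s               ! printing s keeps the computation alive

   ! split version
   call cpu_time(t1)
   s1 = sum(cos(th))
   s2 = sum(sin(th))
   call cpu_time(t2)
   print *, 'split:  ', t2 - t1, s1, s2          ! same for s1 and s2
end program timing_sketch
```

Save it as a.f90 and the script above will compile and run it unchanged.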

Here are the results with ifort. In each pair of lines, the first is the complex version and the second is the split version. Besides the speed, notice the ACCURACY: speed comes at a price.

ifort (IFORT) 14.0.2

ifort a.f90 -o ifort-O0 -O0
run ifort-O0 ...
   3.57999900000000      (-2.319317404797516E-009,7.034712528404704E-009)
   4.07666600000000      -2.319317404797516E-009  7.034712528404704E-009
ifort a.f90 -o ifort-O1 -O1
run ifort-O1 ...
   3.30333300000000      (-2.319317404797516E-009,7.034712528404704E-009)
   3.54666700000000      -2.319317404797516E-009  7.034712528404704E-009
ifort a.f90 -o ifort-O2 -O2
run ifort-O2 ...
   3.08000000000000      (-2.319317404797516E-009,7.034712528404704E-009)
   1.13666600000000      -6.304215927066537E-009  1.737099880017717E-009
ifort a.f90 -o ifort-O3 -O3
run ifort-O3 ...
   3.08333400000000      (-2.319317404797516E-009,7.034712528404704E-009)
   1.13666600000000      -6.304215927066537E-009  1.737099880017717E-009
sum 31.999 3.496 0:35.82 99.0% 0

You may wonder what happens between the -O1 and -O2 flags. If you check the undefined symbols (marked U) in the compiled object file, the math functions it links against have changed from:

         U cexp
         U cos
         U sin

to:

         U __svml_cos2
         U __svml_sin2
         U cexp

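SVML stands for Short Vector Math Library. Note that cexp is unchanged while cos and sin are replaced by SVML versions, which is consistent with the timings above: the complex version barely changes between -O1 and -O2, while the split version becomes about three times faster and returns slightly different sums. Some of the trade-offs between speed and accuracy are described in the Intel IPP Library documentation on Fixed-Accuracy Arithmetic Functions.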
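To judge whether the accuracy difference matters for your own code, it helps to compare against a known answer. In this test the angles are spread uniformly around the unit circle, so the exact sums for the ideal angles are zero, and the values printed above (all of order 1e-9) are themselves accumulated rounding error. A minimal sketch of such a check, with the program name, the dp kind parameter and the smaller n as my own assumptions:

```fortran
! Sketch: measure the rounding error of both versions against the
! exact result (zero). Names and the value of n are illustrative.
program accuracy_sketch
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   integer, parameter :: n = 1000000             ! assumption: small enough to run quickly
   real(dp), parameter :: pi = 4.0_dp*atan(1.0_dp)
   real(dp), allocatable :: th(:)
   complex(dp) :: s
   real(dp) :: s1, s2
   integer :: i

   allocate(th(n))
   do i = 1, n
      th(i) = 2.0_dp*pi*real(i, dp)/real(n, dp)
   end do

   s  = sum(exp(cmplx(0.0_dp, th, kind=dp)))     ! complex version
   s1 = sum(cos(th))                             ! split version, real part
   s2 = sum(sin(th))                             ! split version, imaginary part

   ! The exact value is zero, so each magnitude is the error itself.
   print *, 'error of complex sum:     ', abs(s)
   print *, 'error of split sum:       ', sqrt(s1**2 + s2**2)
   print *, 'difference between them:  ', abs(s - cmplx(s1, s2, kind=dp))
end program accuracy_sketch
```

Compiling this at each optimization level with the script above shows how much the result moves when the math library changes. The numbers depend on compiler, flags and hardware, so none are quoted here.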

Tags:

fortran

Eular
