So, I was wondering whether, for a performance gain, it is preferable to work on the real and imaginary parts of an array separately instead of using a complex variable. For example,
program test
  implicit none
  integer, parameter :: n = 100000000
  real(kind=8), parameter :: pi = 4.0d0*atan(1.0d0)
  complex(kind=8), parameter :: i_ = (0.0d0,1.0d0)   ! imaginary unit
  complex(kind=8) :: s
  real(kind=8) :: th(n), sz, t1, t2, s1, s2
  integer :: i

  ! n equally spaced angles in (0, 2*pi]
  sz = 2.0d0*pi/n
  do i = 1, n
    th(i) = sz*i
  enddo

  ! complex version: sum of exp(i*th)
  call cpu_time(t1)
  s = sum(exp(th*i_))
  call cpu_time(t2)
  print *, t2-t1

  ! split version: real and imaginary parts summed separately
  call cpu_time(t1)
  s1 = sum(cos(th))
  s2 = sum(sin(th))
  call cpu_time(t2)
  print *, t2-t1
end program test
And the times it takes (in seconds, complex version first):
3.7041089999999999
2.6299830000000002
So the split calculation does take less time. This was a very simple calculation, but I have some long calculations where using complex variables improves readability and takes fewer lines of code. Will that sacrifice the performance of my code? Or is it always advisable to work on the real and imaginary parts separately?
It is better to understand what kinds of tricks the compiler can do for you. Generally, splitting the real and imaginary parts by hand is not worth the effort nowadays. Create a little script to study the CPU time of your code with different compilers and optimization levels:
#!/bin/bash
# Build and run the test program with each compiler and optimization level.
src=a.f90
for fcc in gfortran ifort; do
    "$fcc" --version
    for flag in -O0 -O1 -O2 -O3; do
        fexe="$fcc$flag"
        echo "$fcc $src -o $fexe $flag"
        "$fcc" "$src" -o "$fexe" "$flag"
        echo "run $fexe ..."
        "./$fexe"
    done
done
You will notice that some of the CPU times may come out very close to 0, because the compiler is clever enough to discard computations whose results are never used. Print the results as well, so that the compiler cannot optimize the computation away:
print *, t2-t1, s
print *, t2-t1, s1, s2
Here are the results with ifort. Besides the speed, notice the ACCURACY: speed comes at a price.
ifort (IFORT) 14.0.2
ifort a.f90 -o ifort-O0 -O0
run ifort-O0 ...
3.57999900000000 (-2.319317404797516E-009,7.034712528404704E-009)
4.07666600000000 -2.319317404797516E-009 7.034712528404704E-009
ifort a.f90 -o ifort-O1 -O1
run ifort-O1 ...
3.30333300000000 (-2.319317404797516E-009,7.034712528404704E-009)
3.54666700000000 -2.319317404797516E-009 7.034712528404704E-009
ifort a.f90 -o ifort-O2 -O2
run ifort-O2 ...
3.08000000000000 (-2.319317404797516E-009,7.034712528404704E-009)
1.13666600000000 -6.304215927066537E-009 1.737099880017717E-009
ifort a.f90 -o ifort-O3 -O3
run ifort-O3 ...
3.08333400000000 (-2.319317404797516E-009,7.034712528404704E-009)
1.13666600000000 -6.304215927066537E-009 1.737099880017717E-009
sum 31.999 3.496 0:35.82 99.0% 0
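As a reference point for judging the accuracy of the printed sums, the exact value can be worked out by hand. This is a short derivation based only on the definitions in the test program (th(i) = 2*pi*i/n for i = 1..n); the symbol r is introduced here for illustration.

```latex
\[
  r = e^{2\pi i/n}, \qquad
  \sum_{k=1}^{n} e^{i\theta_k}
  = \sum_{k=1}^{n} r^{k}
  = r\,\frac{1 - r^{n}}{1 - r}
  = 0
  \quad (n > 1,\ \text{since } r^{n} = 1 \text{ and } r \neq 1),
\]
\[
  \text{so}\qquad
  \sum_{k=1}^{n} \cos\theta_k = 0
  \quad\text{and}\quad
  \sum_{k=1}^{n} \sin\theta_k = 0 .
\]
```

Every nonzero digit printed above is therefore accumulated rounding error over 10^8 terms, and the values at all optimization levels are of order 1e-9. What changes from -O2 onward is which digits of that error you get for the split version.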
You may wonder what happens between the -O1 and -O2 flags. If you check the symbols in the compiled object file, the internal functions it links against have changed from:
U cexp
U cos
U sin
to:
U __svml_cos2
U __svml_sin2
U cexp
SVML stands for Short Vector Math Library. Some of the trade-offs between speed and accuracy are described in the Intel IPP library documentation, under Fixed-Accuracy Arithmetic Functions.
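The symbol lists above have the shape of nm output, where "U" marks an undefined symbol that is resolved at link time. Here is a minimal sketch of how such a comparison could be scripted, assuming nm (from binutils) is installed and the executables built by the script above exist. The helper name math_symbols is made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: read nm-style output on stdin and print the names of
# undefined ("U") symbols that mention exp, cos, or sin.
# Defined symbols are skipped because their first field is an address, not "U".
math_symbols() {
    awk '$1 == "U" && $2 ~ /(exp|cos|sin)/ { print $2 }'
}
```

Typical use would be `nm ifort-O1 | math_symbols` followed by `nm ifort-O2 | math_symbols`, then comparing the two lists. The exact symbol names you see depend on the compiler version and the math library it links against.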