I observed a very curious effect in this trivial program:
module Moo
contains
    subroutine main()
        integer :: res
        real :: start, finish
        integer :: i
        call cpu_time(start)
        do i = 1, 1000000000
            call Squared(5, res)
        enddo
        call cpu_time(finish)
        print '("Time = ",f6.3," seconds.")',finish-start
    end subroutine
    subroutine Squared(v, res)
        integer, intent(in) :: v
        integer, intent(out) :: res
        res = v*v
    end subroutine
!    subroutine main2()
!        integer :: res
!        real :: start, finish
!        integer :: i
!
!        call cpu_time(start)
!
!        do i = 1, 1000000000
!            res = v*v
!        enddo
!        call cpu_time(finish)
!
!        print '("Time = ",f6.3," seconds.")',finish-start
!    end subroutine
end module
program foo
    use Moo
    call main()
!    call main2()
end program
The compiler is gfortran 4.6.2 on a Mac. If I compile with -O0 and run the program, the timing is 4.36 seconds. If I uncomment the subroutine main2(), but not its call, the timing becomes 4.15 seconds on average. If I also uncomment the call to main2(), the first timing becomes 3.80 seconds and the second 1.86 seconds (understandable, since main2 has no function call).
I compared the assembly produced in the second and third cases (routine uncommented; call commented and uncommented) and they are exactly the same, save for the actual invocation of the main2 routine.
How can the code get this performance increase from a call to a routine that is going to happen in the future, and basically no difference in the resulting code?
The first thing I noticed is that your program runs for too short a time for proper benchmarking. How many runs did you average over? What is the standard deviation? I added a nested do loop to your code to make it run longer:
do i = 1, 1000000000
    do j=1,10
        call Squared(5, res)
    enddo
enddo
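To answer the "how many runs, what standard deviation" question directly, the timed loop can be repeated inside one executable and summarized. The following is only a sketch under stated assumptions: the names bench, time_once, niter, ninner and nruns are hypothetical, nruns = 3 simply mirrors the three runs used here, and it assumes compilation without optimization (as with -O0 in the question), since an optimizing compiler may inline or remove the loop entirely.

```fortran
module bench
    implicit none
    ! Hypothetical loop bounds; they mirror the nested loop above.
    ! Lower niter for a quicker trial run.
    integer, parameter :: niter = 1000000000
    integer, parameter :: ninner = 10
contains
    subroutine Squared(v, res)
        integer, intent(in) :: v
        integer, intent(out) :: res
        res = v*v
    end subroutine Squared

    ! Time one pass over the nested loop and return the CPU seconds used.
    function time_once() result(elapsed)
        real :: elapsed
        real :: start, finish
        integer :: i, j, res
        call cpu_time(start)
        do i = 1, niter
            do j = 1, ninner
                call Squared(5, res)
            enddo
        enddo
        call cpu_time(finish)
        elapsed = finish - start
    end function time_once
end module bench

program repeat_timing
    use bench
    implicit none
    integer, parameter :: nruns = 3   ! assumed number of repetitions
    real :: t(nruns), mean, sdev
    integer :: k
    do k = 1, nruns
        t(k) = time_once()
        print '("Run ",i0,": ",f8.3," seconds.")', k, t(k)
    enddo
    ! Sample mean and sample standard deviation (n-1 in the denominator).
    mean = sum(t) / nruns
    sdev = sqrt(sum((t - mean)**2) / (nruns - 1))
    print '("Mean = ",f8.3," s, std dev = ",f8.3," s")', mean, sdev
end program repeat_timing
```

Keeping all repetitions in one process means every run executes the same binary at the same addresses, so this measures run-to-run noise only; comparing case 1 against case 2 still requires building and running two separate executables.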
I looked at only case 1 and case 2 (main2 commented and uncommented) because case 3 is different and irrelevant for this comparison. I would expect a slight increase in runtime in case 2, because of needing to load a larger executable into memory, even though that part is not used in the program.
So I did timing (3 runs each) for cases 1 and 2, for three compilers:
pgf90 10.6-0 64-bit target on x86-64 Linux -tp istanbul-64
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.2.137 Build 20110112
GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)
on AMD Opteron(tm) Processor 6134
The output of my script is:
exp 1 with pgf90:
Time = 30.619 seconds.
Time = 30.620 seconds.
Time = 30.686 seconds.
exp 2 with pgf90:
Time = 30.606 seconds.
Time = 30.693 seconds.
Time = 30.635 seconds.
exp 1 with ifort:
Time = 77.412 seconds.
Time = 77.381 seconds.
Time = 77.395 seconds.
exp 2 with ifort:
Time = 77.834 seconds.
Time = 77.853 seconds.
Time = 77.825 seconds.
exp 1 with gfortran:
Time = 68.713 seconds.
Time = 68.659 seconds.
Time = 68.650 seconds.
exp 2 with gfortran:
Time = 71.923 seconds.
Time = 74.857 seconds.
Time = 72.126 seconds.
Notice the time difference between case 1 and case 2 is largest for gfortran, and smallest for pgf90.
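To put a number on that comparison, the means of the three runs can be compared per compiler. This is a small sketch that only re-uses the timings printed above; the program name, array layout and labels are my own choices. It should come out at roughly 6% for gfortran and well under 1% for the other two compilers.

```fortran
program timing_diff
    implicit none
    integer, parameter :: nruns = 3, nexp = 2, ncomp = 3
    character(len=8), parameter :: label(ncomp) = &
        (/ 'pgf90   ', 'ifort   ', 'gfortran' /)
    ! Timings copied from the output above, ordered as
    ! (run, experiment, compiler); reshape fills the first index fastest.
    real, parameter :: t(nruns, nexp, ncomp) = reshape((/ &
        30.619, 30.620, 30.686, &
        30.606, 30.693, 30.635, &
        77.412, 77.381, 77.395, &
        77.834, 77.853, 77.825, &
        68.713, 68.659, 68.650, &
        71.923, 74.857, 72.126 /), (/ nruns, nexp, ncomp /))
    real :: mean1, mean2
    integer :: c
    do c = 1, ncomp
        mean1 = sum(t(:, 1, c)) / nruns   ! exp 1: main2 commented out
        mean2 = sum(t(:, 2, c)) / nruns   ! exp 2: main2 compiled in, not called
        print '(a,": exp 1 = ",f7.3," s, exp 2 = ",f7.3," s, change = ",f6.2," %")', &
            label(c), mean1, mean2, 100.0 * (mean2 - mean1) / mean1
    enddo
end program timing_diff
```

With only three runs per case these percentages are indicative rather than statistically solid, but they make it easy to see whether a difference exceeds the scatter between repeated runs.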
EDIT: Stefano Borini pointed out that I overlooked the fact that only the loop is being timed by the calls to cpu_time, so executable load time is out of the equation. The answer by AShelley suggests a possible reason for the difference. For longer runtimes, the difference between the two cases becomes minimal. Still, I observe a significant difference in the case of gfortran (see above).