
What causes the runtime difference in this trivial fortran code?

I observed a very curious effect in this trivial program:

module Moo 
contains
   subroutine main()
      integer :: res 
      real :: start, finish
      integer :: i

      call cpu_time(start)

      do i = 1, 1000000000
         call Squared(5, res) 
      enddo
      call cpu_time(finish)

      print '("Time = ",f6.3," seconds.")',finish-start
   end subroutine

   subroutine Squared(v, res)
      integer, intent(in) :: v
      integer, intent(out) :: res 

      res = v*v 
   end subroutine 

!   subroutine main2()
!      integer :: res
!      real :: start, finish
!      integer :: i
!
!      call cpu_time(start)
!      
!      do i = 1, 1000000000
!         res = v*v
!      enddo
!      call cpu_time(finish)
!
!      print '("Time = ",f6.3," seconds.")',finish-start
!   end subroutine

end module
program foo 
   use Moo 
   call main()
!   call main2()
end program

The compiler is gfortran 4.6.2 on a Mac. If I compile with -O0 and run the program, the timing is 4.36 seconds. If I uncomment the subroutine main2() but not its call, the timing becomes 4.15 seconds on average. If I also uncomment the call to main2(), the first timing becomes 3.80 seconds and the second 1.86 seconds (understandable, since the second loop has no subroutine call).

I compared the assembly produced in the second and third cases (routine uncommented; call commented and uncommented), and the two listings are exactly the same, save for the actual invocation of the main2 routine.

How can the first loop get this performance increase from a call to a routine that only runs after the loop has finished, with basically no difference in the generated code?

Stefano Borini asked Jan 18 '26 09:01



1 Answer

The first thing I noticed is that your program runs far too briefly for proper benchmarking. How many runs did you average over? What is the standard deviation? I added a nested do loop to your code to make it run longer:

do i = 1, 1000000000
  do j=1,10
    call Squared(5, res) 
  enddo
enddo
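
The questions about the number of runs and the standard deviation could be answered with a small harness along the following lines. This is only a sketch, not the code used for the timings in this answer: the module name bench, the function time_once, and the values of nruns and n are assumptions chosen for illustration. It repeats the timed loop several times and reports the mean and the sample standard deviation of the CPU times.

```fortran
module bench
   implicit none
contains
   ! Same routine as in the question.
   subroutine Squared(v, res)
      integer, intent(in) :: v
      integer, intent(out) :: res

      res = v*v
   end subroutine

   ! Hypothetical helper: time one pass of the loop, in CPU seconds.
   function time_once(n) result(elapsed)
      integer, intent(in) :: n
      real :: elapsed
      real :: start, finish
      integer :: i, res

      call cpu_time(start)
      do i = 1, n
         call Squared(5, res)
      enddo
      call cpu_time(finish)

      elapsed = finish - start
   end function
end module

program repeat_timing
   use bench
   implicit none
   integer, parameter :: nruns = 10        ! assumed number of repetitions
   integer, parameter :: n = 100000000     ! assumed loop length
   real :: t(nruns), mean, sd
   integer :: k

   do k = 1, nruns
      t(k) = time_once(n)
   enddo

   mean = sum(t)/nruns
   ! Sample standard deviation (divides by nruns - 1).
   sd = sqrt(sum((t - mean)**2)/(nruns - 1))

   print '("mean = ",f8.3," s, std dev = ",f8.3," s over ",i0," runs")', mean, sd, nruns
end program
```

As in the question, this is meant to be compiled with -O0; with optimization enabled a compiler may remove the loop entirely, since res is never used. A difference between two variants is only meaningful if it is clearly larger than the reported standard deviation.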

I looked only at case 1 and case 2 (main2 commented and uncommented), because case 3 is different and irrelevant for this comparison. I would expect a slight increase in runtime in case 2, because a larger executable has to be loaded into memory, even though the extra part is never used by the program.

So I timed cases 1 and 2 (3 runs each) with three compilers:

pgf90 10.6-0 64-bit target on x86-64 Linux -tp istanbul-64

Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 12.0.2.137 Build 20110112

GNU Fortran (GCC) 4.1.2 20080704 (Red Hat 4.1.2-51)

on AMD Opteron(tm) Processor 6134

The output of my script is:

exp 1 with pgf90:
Time = 30.619 seconds.
Time = 30.620 seconds.
Time = 30.686 seconds.
exp 2 with pgf90:
Time = 30.606 seconds.
Time = 30.693 seconds.
Time = 30.635 seconds.
exp 1 with ifort:
Time = 77.412 seconds.
Time = 77.381 seconds.
Time = 77.395 seconds.
exp 2 with ifort:
Time = 77.834 seconds.
Time = 77.853 seconds.
Time = 77.825 seconds.
exp 1 with gfortran:
Time = 68.713 seconds.
Time = 68.659 seconds.
Time = 68.650 seconds.
exp 2 with gfortran:
Time = 71.923 seconds.
Time = 74.857 seconds.
Time = 72.126 seconds.

Notice the time difference between case 1 and case 2 is largest for gfortran, and smallest for pgf90.
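
To put numbers on that comparison, the per-case means and the relative change from case 1 to case 2 can be computed directly from the timings listed above. The following is a small sketch; the program name compare_cases and the array names are made up for illustration, while the data values are copied from the script output.

```fortran
program compare_cases
   implicit none
   character(len=8), parameter :: name(3) = (/ 'pgf90   ', 'ifort   ', 'gfortran' /)
   ! Timings in seconds, copied from the output above.
   ! Each column holds the three runs of one compiler.
   real, parameter :: exp1(3,3) = reshape( (/ 30.619, 30.620, 30.686, &
                                              77.412, 77.381, 77.395, &
                                              68.713, 68.659, 68.650 /), (/ 3, 3 /) )
   real, parameter :: exp2(3,3) = reshape( (/ 30.606, 30.693, 30.635, &
                                              77.834, 77.853, 77.825, &
                                              71.923, 74.857, 72.126 /), (/ 3, 3 /) )
   real :: m1, m2
   integer :: c

   do c = 1, 3
      m1 = sum(exp1(:,c))/3.0     ! mean of case 1 (main2 commented)
      m2 = sum(exp2(:,c))/3.0     ! mean of case 2 (main2 uncommented)
      print '(a8,": case 1 = ",f7.3,"  case 2 = ",f7.3,"  change = ",f6.2," %")', &
            name(c), m1, m2, 100.0*(m2 - m1)/m1
   enddo
end program
```

For the numbers above, the change works out to roughly 0.01% for pgf90, 0.6% for ifort, and 6% for gfortran. With only three runs per case these figures are indicative rather than statistically solid.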

EDIT: Stefano Borini pointed out that I overlooked the fact that only the loop is timed, since it sits between the two calls to cpu_time, so executable load time is out of the equation. The answer by AShelley suggests a possible reason for the difference. For longer runtimes, the difference between the two cases becomes minimal. Still, I observe a significant difference in the case of gfortran (see above).

milancurcic answered Jan 21 '26 08:01
