hello stackoverflow users, this is my first question asked, so if there are any errors in my way of expressing it, please point it out, thank you
I wrote this simple calculation function in both Java and C++
Java:
long start = System.nanoTime();
long total = 0;
for (int i = 0; i < 2147483647; i++) {
    total += i;
}
System.out.println(total);
System.out.println(System.nanoTime() - start);
C++:
auto start = chrono::high_resolution_clock::now();
register long long total = 0;
for (register int i = 0; i < 2147483647; i++)
{
    total += i;
}
cout << total << endl;
auto finish = chrono::high_resolution_clock::now();
cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count() << endl;
software: - JDK8u11 - Microsoft Visual C++ Compiler (2013)
results:
Java: 2305843005992468481 1096361110
C++: 2305843005992468481 6544374300
The calculation results are the same, which is good however, the nano time printed shows the Java program takes 1 second while in C++ it takes 6 seconds to execute
I've been doing Java for quite some time, but I am new to C++, is there any problem in my code? or is it a fact that C++ is slower than Java with simple calculations?
also, i used the "register" keyword in my C++ code, hoping it will bring performance improvements, but the execution time doesn't differ at all, could someone explain this?
EDIT: My mistake here is the C++ compiler settings are not optimized, and output is set to x32, after applying /O2 WIN64 and removing DEBUG, the program only took 0.7 seconds to execute
The JDK by default applies optimization to output, however this is not the case for VC++, which favors compilation speed by default, different C++ compilers also vary in result, some will calculate the loop's result in compile time, leading to extremely short execution times (around 5 microseconds)
NOTE: Given the right conditions, the C++ program will perform better than Java in this simple test, however I noticed many runtime safety checks are skipped, violating it's debug intention as a "safe language", I believe C++ will even more outperform Java in a large array test, as it does not have bound checking
On Linux/Debian/Sid/x86-64, using OpenJDK 7 with
// file test.java
class Test {
    public static void main(String[] args) {
    long start = System.nanoTime();
    long total = 0;
    for (int i = 0; i < 2147483647; i++) {
        total += i;
    }
    System.out.println(total);
    System.out.println(System.nanoTime() - start);
    }
}   
and GCC 4.9 with
   // file test.cc
#include <iostream>
#include <chrono>
int main (int argc, char**argv) {
 using namespace std;
 auto start = chrono::high_resolution_clock::now();
 long long total = 0;
 for (int i = 0; i < 2147483647; i++)
   {
     total += i;
   }
 cout << total << endl;
 auto finish = chrono::high_resolution_clock::now();
 cout << chrono::duration_cast<chrono::nanoseconds>(finish - start).count()
      << endl;
}    
Then compiling and running test.java with
javac test.java
java Test
I'm getting the output
2305843005992468481
774937152
when compiling test.cc  with optimizations
g++ -O2 -std=c++11 test.cc -o test-gcc
and running ./test-gcc it goes much faster
2305843005992468481
40291
Of course without optimizations  g++  -std=c++11 test.cc -o test-gcc   the run is slower
2305843005992468481
5208949116
By looking at the assembler code using g++ -O2 -fverbose-asm -S -std=c++11 test.cc  I see that the compiler computed the result at compile time:
    .globl  main
    .type   main, @function
  main:
  .LFB1530:
    .cfi_startproc
    pushq   %rbx    #
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    movabsq $2305843005992468481, %rsi  #,
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rbx  #, start
    call    _ZNSo9_M_insertIxEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    call    _ZNSt6chrono3_V212system_clock3nowEv    #
    subq    %rbx, %rax  # start, D.35008
    movl    $_ZSt4cout, %edi    #,
    movq    %rax, %rsi  # D.35008, D.35008
    call    _ZNSo9_M_insertIlEERSoT_    #
    movq    %rax, %rdi  # D.35007,
    call    _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_  #
    xorl    %eax, %eax  #
    popq    %rbx    #
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
  .LFE1530:
            .size   main, .-main
So you just need to enable optimizations in your compiler (or switch to a better compiler, like GCC 4.9)
BTW on Java low level optimizations happen in the JIT of the JVM.  I don't know JAVA well but I don't think I need to switch them on. I do know that on GCC you need to enable optimizations which of course are ahead of time (e.g. with -O2)
PS: I never used any Microsoft compiler in this 21st century, so I cannot help you on how to enable optimizations in it.
At last, I dont believe that such microbenchmarks are significant. Benchmark then optimize your real applications.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With