I am trying to understand native code generated from a Java loop. The native code should be optimized by the C2 compiler, but on my simple example it seems some optimizations are missing.
This is the Java method I wrote, based on the minimal example at https://wiki.openjdk.java.net/display/HotSpot/LoopPredication:
104 public static byte[] myLoop(int init, int limit, int stride, int scale, int offset, byte value, byte[] array) {
105 for (int i = init; i < limit; i += stride) {
106 array [ scale * i + offset] = value;
107 }
108 return array;
109 }
These are the arguments given to the Java 8 HotSpot VM to force C2 compilation:
-server
-XX:-TieredCompilation
-XX:CompileThreshold=5
-XX:+UnlockDiagnosticVMOptions
-XX:+PrintAssembly
-XX:-UseCompressedOops
-XX:+LogCompilation
-XX:+TraceClassLoading
-XX:+UseLoopPredicate
-XX:+RangeCheckElimination
This is the amd64 native code generated by C2 ('myLoop' is called at least 10000 times):
# {method} {0x00007fcb5088ef38} 'myLoop' '(IIIIIB[B)[B' in 'MyClass'
# parm0: rsi = int
# parm1: rdx = int
# parm2: rcx = int
# parm3: r8 = int
# parm4: r9 = int
# parm5: rdi = byte
# parm6: [sp+0x40] = '[B' (sp of caller)
0x00007fcd44ee9fe0: mov %eax,0xfffffffffffec000(%rsp)
0x00007fcd44ee9fe7: push %rbp
0x00007fcd44ee9fe8: sub $0x30,%rsp ;*synchronization entry
; - MyClass::myLoop@-1 (line 105)
0x00007fcd44ee9fec: cmp %edx,%esi
0x00007fcd44ee9fee: jnl 0x7fcd44eea04a ;*if_icmplt
; - MyClass::myLoop@27 (line 105)
0x00007fcd44ee9ff0: mov 0x40(%rsp),%rax
0x00007fcd44ee9ff5: mov 0x10(%rax),%r10d ;*bastore
; - MyClass::myLoop@17 (line 106)
; implicit exception: dispatches to 0x00007fcd44eea051
0x00007fcd44ee9ff9: nopl 0x0(%rax) ;*aload
; - MyClass::myLoop@6 (line 106)
0x00007fcd44eea000: mov %esi,%ebx
0x00007fcd44eea002: imull %r8d,%ebx
0x00007fcd44eea006: add %r9d,%ebx ;*iadd
; - MyClass::myLoop@14 (line 106)
0x00007fcd44eea009: cmp %r10d,%ebx
0x00007fcd44eea00c: jnb 0x7fcd44eea02e ;*bastore
; - MyClass::myLoop@17 (line 106)
0x00007fcd44eea00e: add %ecx,%esi ;*iadd
; - MyClass::myLoop@21 (line 105)
0x00007fcd44eea010: movsxd %ebx,%r11
0x00007fcd44eea013: mov %dil,0x18(%rax,%r11) ; OopMap{rax=Oop off=56}
;*if_icmplt
; - MyClass::myLoop@27 (line 105)
0x00007fcd44eea018: test %eax,0xa025fe2(%rip) ; {poll}
0x00007fcd44eea01e: cmp %edx,%esi
0x00007fcd44eea020: jl 0x7fcd44eea000 ;*synchronization entry
; - MyClass::myLoop@-1 (line 105)
0x00007fcd44eea022: add $0x30,%rsp
0x00007fcd44eea026: pop %rbp
0x00007fcd44eea027: test %eax,0xa025fd3(%rip) ; {poll_return}
0x00007fcd44eea02d: retq
0x00007fcd44eea02e: movabs $0x7fcca3c810a8,%rsi ; {oop(a 'java/lang/ArrayIndexOutOfBoundsException')}
0x00007fcd44eea038: movq $0x0,0x18(%rsi) ;*bastore
; - MyClass::myLoop@17 (line 106)
0x00007fcd44eea040: add $0x30,%rsp
0x00007fcd44eea044: pop %rbp
0x00007fcd44eea045: jmpq 0x7fcd44e529a0 ; {runtime_call}
0x00007fcd44eea04a: mov 0x40(%rsp),%rax
0x00007fcd44eea04f: jmp 0x7fcd44eea022
0x00007fcd44eea051: mov %edx,%ebp
0x00007fcd44eea053: mov %ecx,0x40(%rsp)
0x00007fcd44eea057: mov %r8d,0x44(%rsp)
0x00007fcd44eea05c: mov %r9d,(%rsp)
0x00007fcd44eea060: mov %edi,0x4(%rsp)
0x00007fcd44eea064: mov %rax,0x8(%rsp)
0x00007fcd44eea069: mov %esi,0x10(%rsp)
0x00007fcd44eea06d: mov $0xffffff86,%esi
0x00007fcd44eea072: nop
0x00007fcd44eea073: callq 0x7fcd44dea1a0 ; OopMap{[8]=Oop off=152}
;*aload
; - MyClass::myLoop@6 (line 106)
; {runtime_call}
0x00007fcd44eea078: callq 0x7fcd4dc47c50 ;*aload
; - MyClass::myLoop@6 (line 106)
; {runtime_call}
0x00007fcd44eea07d: hlt
0x00007fcd44eea07e: hlt
0x00007fcd44eea07f: hlt
According to https://wiki.openjdk.java.net/display/HotSpot/LoopPredication, one optimization, called "array range check elimination", removes the array range checks inside the loop and adds a loop predicate before the loop instead. It seems C2 has not applied this optimization to 'myLoop'. The loop's backward jump is at 0x7fcd44eea020 and jumps back to 0x7fcd44eea000. Within the loop there is still a range check at 0x7fcd44eea009-0x7fcd44eea00c.
The explanation is right there on the same page:
From the above example, the requirements to perform loop predication for array range check elimination are that init, limit, offset and array a are loop invariants, and stride and scale are compile time constants.
In your example scale and stride are not compile time constants, so the optimization fails.
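To see why the constants matter, here is a rough hand-written analogue of what a loop predicate checks. It is only a sketch: the class `PredicateSketch` and the methods `inRange` and `fill` are hypothetical names, and stride = 1 and scale = 2 are assumed values chosen for illustration. With a known positive stride and a known scale, the index `scale * i + offset` changes monotonically, so testing the first and last index once before the loop covers every iteration. In real C2 code a failing predicate would, as far as I understand, deoptimize rather than branch to a second loop.

```java
public class PredicateSketch {
    // Assumed compile-time constants for this sketch: stride = 1, scale = 2.
    // Returns true when every index 2 * i + offset for init <= i < limit
    // lies inside [0, length). Computed in long so the test cannot overflow.
    public static boolean inRange(int init, int limit, int offset, int length) {
        if (init >= limit) {
            return true; // the loop body never runs, nothing to check
        }
        long first = 2L * init + offset;          // index of the first iteration
        long last = 2L * (limit - 1L) + offset;   // index of the last iteration
        return first >= 0 && last < length;
    }

    public static byte[] fill(int init, int limit, int offset, byte value, byte[] array) {
        if (inRange(init, limit, offset, array.length)) {
            // Fast path: the predicate proved all accesses are in bounds,
            // so a compiler could drop the per-iteration range check here.
            for (int i = init; i < limit; i++) {
                array[2 * i + offset] = value;
            }
        } else {
            // Slow path: same loop with per-iteration checks, so the
            // exception is thrown at exactly the same iteration as before.
            for (int i = init; i < limit; i++) {
                array[2 * i + offset] = value;
            }
        }
        return array;
    }
}
```

When stride or scale is only known at run time, the compiler cannot tell in which direction the index moves, which is why the wiki lists constant stride and scale as requirements.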
However, if you call this method with constant arguments, HotSpot will be able to eliminate the range checks thanks to inlining and constant propagation.
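A minimal sketch of such a call site follows. `ConstantArgsDemo` and the wrapper `fillEven` are hypothetical names, and the constants (init 0, stride 1, scale 2, offset 0) are assumptions for illustration. Whether C2 actually inlines `myLoop` into `fillEven` depends on its inlining heuristics, so the result should be confirmed with -XX:+PrintAssembly on the compiled `fillEven`.

```java
public class ConstantArgsDemo {
    // Same method as in the question.
    public static byte[] myLoop(int init, int limit, int stride, int scale,
                                int offset, byte value, byte[] array) {
        for (int i = init; i < limit; i += stride) {
            array[scale * i + offset] = value;
        }
        return array;
    }

    // Hypothetical wrapper: stride and scale are literals here, so after
    // inlining myLoop the JIT can treat them as compile-time constants.
    public static byte[] fillEven(int limit, byte value, byte[] array) {
        return myLoop(0, limit, 1, 2, 0, value, array);
    }

    public static void main(String[] args) {
        byte[] a = new byte[64];
        // Call often enough for the JIT to compile fillEven.
        for (int n = 0; n < 20_000; n++) {
            fillEven(32, (byte) 1, a);
        }
        System.out.println(a[62] + " " + a[63]);
    }
}
```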