
When is Java loop predication optimization triggered by the C2 JIT compiler?

I am trying to understand native code generated from a Java loop. The native code should be optimized by the C2 compiler, but on my simple example it seems some optimizations are missing.

This is the Java method I wrote, based on the minimal example at https://wiki.openjdk.java.net/display/HotSpot/LoopPredication:

104    public static byte[] myLoop(int init, int limit, int stride, int scale, int offset, byte value, byte[] array) {
105     for (int i = init; i < limit; i += stride) {
106         array [ scale * i + offset] = value;
107     }
108     return array;
109    }

These are the arguments passed to the Java 8 HotSpot VM to force C2 compilation:

-server
-XX:-TieredCompilation
-XX:CompileThreshold=5
-XX:+UnlockDiagnosticVMOptions 
-XX:+PrintAssembly
-XX:-UseCompressedOops
-XX:+LogCompilation
-XX:+TraceClassLoading
-XX:+UseLoopPredicate
-XX:+RangeCheckElimination

This is the amd64 native code generated by C2 ('myLoop' is called at least 10000 times):

  # {method} {0x00007fcb5088ef38} 'myLoop' '(IIIIIB[B)[B' in 'MyClass'
  # parm0:    rsi       = int
  # parm1:    rdx       = int
  # parm2:    rcx       = int
  # parm3:    r8        = int
  # parm4:    r9        = int
  # parm5:    rdi       = byte
  # parm6:    [sp+0x40]   = '[B'  (sp of caller)
  0x00007fcd44ee9fe0: mov     %eax,0xfffffffffffec000(%rsp)
  0x00007fcd44ee9fe7: push    %rbp
  0x00007fcd44ee9fe8: sub     $0x30,%rsp        ;*synchronization entry
                                                ; - MyClass::myLoop@-1 (line 105)

  0x00007fcd44ee9fec: cmp     %edx,%esi
  0x00007fcd44ee9fee: jnl     0x7fcd44eea04a    ;*if_icmplt
                                                ; - MyClass::myLoop@27 (line 105)

  0x00007fcd44ee9ff0: mov     0x40(%rsp),%rax
  0x00007fcd44ee9ff5: mov     0x10(%rax),%r10d  ;*bastore
                                                ; - MyClass::myLoop@17 (line 106)
                                                ; implicit exception: dispatches to 0x00007fcd44eea051
  0x00007fcd44ee9ff9: nopl    0x0(%rax)         ;*aload
                                                ; - MyClass::myLoop@6 (line 106)

  0x00007fcd44eea000: mov     %esi,%ebx
  0x00007fcd44eea002: imull   %r8d,%ebx
  0x00007fcd44eea006: add     %r9d,%ebx         ;*iadd
                                                ; - MyClass::myLoop@14 (line 106)

  0x00007fcd44eea009: cmp     %r10d,%ebx
  0x00007fcd44eea00c: jnb     0x7fcd44eea02e    ;*bastore
                                                ; - MyClass::myLoop@17 (line 106)

  0x00007fcd44eea00e: add     %ecx,%esi         ;*iadd
                                                ; - MyClass::myLoop@21 (line 105)

  0x00007fcd44eea010: movsxd  %ebx,%r11
  0x00007fcd44eea013: mov     %dil,0x18(%rax,%r11)  ; OopMap{rax=Oop off=56}
                                                ;*if_icmplt
                                                ; - MyClass::myLoop@27 (line 105)

  0x00007fcd44eea018: test    %eax,0xa025fe2(%rip)  ;   {poll}
  0x00007fcd44eea01e: cmp     %edx,%esi
  0x00007fcd44eea020: jl      0x7fcd44eea000    ;*synchronization entry
                                                ; - MyClass::myLoop@-1 (line 105)

  0x00007fcd44eea022: add     $0x30,%rsp
  0x00007fcd44eea026: pop     %rbp
  0x00007fcd44eea027: test    %eax,0xa025fd3(%rip)  ;   {poll_return}
  0x00007fcd44eea02d: retq
  0x00007fcd44eea02e: movabs  $0x7fcca3c810a8,%rsi  ;   {oop(a 'java/lang/ArrayIndexOutOfBoundsException')}
  0x00007fcd44eea038: movq    $0x0,0x18(%rsi)   ;*bastore
                                                ; - MyClass::myLoop@17 (line 106)

  0x00007fcd44eea040: add     $0x30,%rsp
  0x00007fcd44eea044: pop     %rbp
  0x00007fcd44eea045: jmpq    0x7fcd44e529a0    ;   {runtime_call}
  0x00007fcd44eea04a: mov     0x40(%rsp),%rax
  0x00007fcd44eea04f: jmp     0x7fcd44eea022
  0x00007fcd44eea051: mov     %edx,%ebp
  0x00007fcd44eea053: mov     %ecx,0x40(%rsp)
  0x00007fcd44eea057: mov     %r8d,0x44(%rsp)
  0x00007fcd44eea05c: mov     %r9d,(%rsp)
  0x00007fcd44eea060: mov     %edi,0x4(%rsp)
  0x00007fcd44eea064: mov     %rax,0x8(%rsp)
  0x00007fcd44eea069: mov     %esi,0x10(%rsp)
  0x00007fcd44eea06d: mov     $0xffffff86,%esi
  0x00007fcd44eea072: nop
  0x00007fcd44eea073: callq   0x7fcd44dea1a0    ; OopMap{[8]=Oop off=152}
                                                ;*aload
                                                ; - MyClass::myLoop@6 (line 106)
                                                ;   {runtime_call}
  0x00007fcd44eea078: callq   0x7fcd4dc47c50    ;*aload
                                                ; - MyClass::myLoop@6 (line 106)
                                                ;   {runtime_call}
  0x00007fcd44eea07d: hlt
  0x00007fcd44eea07e: hlt
  0x00007fcd44eea07f: hlt

According to https://wiki.openjdk.java.net/display/HotSpot/LoopPredication, one optimization, called "array range check elimination", removes array range checks from inside the loop and adds a loop predicate before the loop instead. It seems C2 has not applied this optimization to 'myLoop'. The loop's backward jump is at 0x7fcd44eea020 and jumps back to 0x7fcd44eea000. Within the loop there is still a range check at 0x7fcd44eea009-0x7fcd44eea00c.

  1. Why is there still a check in the loop?
  2. Why has the loop predication optimization not been run?
  3. How can I force all optimizations?
Ano Nymous asked Jan 02 '26 02:01



1 Answer

The explanation is right there on the same page:

"From the above example, the requirements to perform loop predication for array range check elimination are that init, limit, offset and array a are loop invariants, and stride and scale are compile time constants."

In your example scale and stride are not compile time constants, so the optimization fails.
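To see why the constants matter, here is a hand-written sketch of what a hoisted predicate amounts to. It is only an illustration, not what C2 emits: the class name `PredicationSketch`, the method `fill`, and the constant `SCALE` are hypothetical, stride is fixed at 1, and the scale is assumed to be a positive constant. Under those assumptions the index grows monotonically with i, so checking the first and last index once before the loop covers every iteration.

```java
final class PredicationSketch {
    // Assumption: scale is a positive compile-time constant and stride is 1.
    static final int SCALE = 2;

    static byte[] fill(int init, int limit, int offset, byte value, byte[] array) {
        if (init < limit) {
            // Compute the first and last index in long arithmetic to avoid int overflow.
            long first = (long) SCALE * init + offset;
            long last = (long) SCALE * (limit - 1) + offset;
            // The "predicate": one check in front of the loop instead of one per iteration.
            // This sketch throws; a JIT compiler would instead fall back to unoptimized
            // code so that the original per-iteration semantics are preserved.
            if (first < 0 || last >= array.length) {
                throw new ArrayIndexOutOfBoundsException();
            }
        }
        for (int i = init; i < limit; i++) {
            // With the predicate passed, every index here is known to be in range.
            array[SCALE * i + offset] = value;
        }
        return array;
    }
}
```

If stride or scale were unknown at compile time, their signs and magnitudes would be unknown too, and the simple first/last check above would no longer be sufficient.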

However, if you call this method with constant arguments, HotSpot will be able to eliminate the range checks thanks to inlining and constant propagation.
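A minimal sketch of such a call site might look like the following. The class `ConstantArgsSketch` and the wrapper `fillEveryOther` are hypothetical names; `myLoop` is copied from the question. Whether C2 really inlines `myLoop` here and drops the checks depends on the JVM version and inlining heuristics, so the result should be confirmed with -XX:+PrintAssembly.

```java
final class ConstantArgsSketch {
    // The method from the question: stride and scale arrive as ordinary parameters.
    static byte[] myLoop(int init, int limit, int stride, int scale, int offset,
                         byte value, byte[] array) {
        for (int i = init; i < limit; i += stride) {
            array[scale * i + offset] = value;
        }
        return array;
    }

    // Hypothetical caller that passes literal values for stride (1) and scale (2).
    // If myLoop is inlined here, constant propagation makes both values
    // compile-time constants inside the loop.
    static byte[] fillEveryOther(int init, int limit, int offset, byte value, byte[] array) {
        return myLoop(init, limit, 1, 2, offset, value, array);
    }
}
```

For the assembly to reflect this, the method that has to become hot and get compiled is the caller (`fillEveryOther` in this sketch), not `myLoop` on its own.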

apangin answered Jan 05 '26 04:01



