Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How are these Java byte offsets calculated?

Tags:

java

jvm

bytecode

I have the following Java code:

public int sign(int a) {
  if(a<0) return -1;
  else if (a>0) return 1;
  else return 0;
}

which when compiled generated the following bytecode:

public int sign(int);
  Code:
     0: iload_1
     1: ifge          6
     4: iconst_m1
     5: ireturn
     6: iload_1
     7: ifle          12
    10: iconst_1
    11: ireturn
    12: iconst_0
    13: ireturn

I want to know how the byte offset count (the first column) is calculated, in particular, why is the byte count for the ifge and ifle instructions 3 bytes when all the other instructions are single byte instructions?

like image 675
jcm Avatar asked Dec 06 '25 04:12

jcm


1 Answers

As already pointed out in the comment: The ifge and ifle instructions have an additional offset.

The Java Virtual Machine Instruction Set specification for ifge and ifle contains the relevant hint here:

Format

if<cond>
branchbyte1
branchbyte2

This indicates that there are two additional bytes associated with this instruction, namely the "branch bytes". These bytes are composed to a single short value to determine the offset - namely, how far the instruction pointer should "jump" when the condition is satisfied.


Edit:

The comments made me curious: The offset is defined to be a signed 16 bit value, limiting the jumps to the range of +/- 32k. This does not cover the whole range of a possible method, which may contain up to 65535 bytes according to the code_length in the class file.

So I created a test class, to see what happens. This class looks like this:

class FarJump
{
    public static void main(String args[])
    {
        call(0, 1);
    }

    public static void call(int x, int y)
    {
        if (x < y)
        {
            y++;
            y++;

            ... (10921 times) ...

            y++;
            y++;
        }
        System.out.println(y);
    }

}

Each of the y++ lines will be translated into a iinc instruction, consisting of 3 bytes. So the resulting byte code is

public static void call(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: if_icmpge     32768
       5: iinc          1, 1
       8: iinc          1, 1

       ...(10921 times) ...

    32762: iinc          1, 1
    32765: iinc          1, 1
    32768: getstatic     #3             // Field java/lang/System.out:Ljava/io/PrintStream;
    32771: iload_1
    32772: invokevirtual #4             // Method java/io/PrintStream.println:(I)V
    32775: return

One can see that it still uses an if_icmpge instruction, with an offset of 32768 (Edit: It is an absolute offset. The relative offset is 32766. Also see this question)

By adding a single more y++ in the original code, the compiled code suddenly changes to

public static void call(int, int);
    Code:
       0: iload_0
       1: iload_1
       2: if_icmplt     10
       5: goto_w        32781
      10: iinc          1, 1
      13: iinc          1, 1
      ....
    32770: iinc          1, 1
    32773: iinc          1, 1
    32776: goto_w        32781
    32781: getstatic     #3             // Field java/lang/System.out:Ljava/io/PrintStream;
    32784: iload_1
    32785: invokevirtual #4             // Method java/io/PrintStream.println:(I)V
    32788: return

So it reverses the condition from if_icmpge to if_icmplt, and handles the far jump with a goto_w instruction, that contains four branch bytes and can thus cover (more than) a full method range.

like image 94
Marco13 Avatar answered Dec 10 '25 11:12

Marco13



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!