Scala 2.11 compiles a match expression over a relatively dense Int range into a lookupswitch:
lookupswitch { // 21
-12: 200
-11: 200
-10: 184
-9: 190
-8: 190
-7: 190
-6: 190
-5: 190
-4: 200
-1: 200
2: 195
3: 195
4: 195
5: 195
6: 184
7: 184
12: 184
13: 184
18: 184
21: 184
25: 184
default: 180
}
Whereas Java 7 compiles the equivalent switch statement into a tableswitch:
tableswitch { // -12 to 25
-12: 168
-11: 168
-10: 177
-9: 174
-8: 174
-7: 174
-6: 174
-5: 174
-4: 168
-3: 185
-2: 185
-1: 168
0: 185
1: 185
2: 171
3: 171
4: 171
5: 171
6: 177
7: 177
8: 185
9: 185
10: 185
11: 185
12: 181
13: 181
14: 185
15: 185
16: 185
17: 185
18: 181
19: 185
20: 185
21: 181
22: 185
23: 185
24: 185
25: 181
default: 185
}
Is there some way to force Scala into generating a tableswitch as well?
You should not care about the bytecode, since modern JVMs are smart enough to compile both lookupswitch and tableswitch in a similarly efficient way.
Intuitively tableswitch should be faster, and this is also suggested by
JVM specification:
Thus, a tableswitch instruction is probably more efficient than a lookupswitch where space considerations permit a choice.
However, the spec was written 20+ years ago, when JVM had no JIT compiler. Is there a performance difference in a modern HotSpot JVM?
package bench;
import org.openjdk.jmh.annotations.*;
@State(Scope.Benchmark)
public class SwitchBench {
@Param({"1", "2", "3", "4", "5", "6", "7", "8"})
int n;
@Benchmark
public long lookupSwitch() {
return Switch.lookupSwitch(n);
}
@Benchmark
public long tableSwitch() {
return Switch.tableSwitch(n);
}
}
To have precise control over the bytecode, I build Switch class with Jasmin.
.class public bench/Switch
.super java/lang/Object
.method public static lookupSwitch(I)I
.limit stack 1
iload_0
lookupswitch
1 : One
2 : Two
3 : Three
4 : Four
5 : Five
6 : Six
7 : Seven
default : Other
One:
bipush 11
ireturn
Two:
bipush 22
ireturn
Three:
bipush 33
ireturn
Four:
bipush 44
ireturn
Five:
bipush 55
ireturn
Six:
bipush 66
ireturn
Seven:
bipush 77
ireturn
Other:
bipush -1
ireturn
.end method
.method public static tableSwitch(I)I
.limit stack 1
iload_0
tableswitch 1
One
Two
Three
Four
Five
Six
Seven
default : Other
One:
bipush 11
ireturn
Two:
bipush 22
ireturn
Three:
bipush 33
ireturn
Four:
bipush 44
ireturn
Five:
bipush 55
ireturn
Six:
bipush 66
ireturn
Seven:
bipush 77
ireturn
Other:
bipush -1
ireturn
.end method
The results show no performance difference between lookupswitch/tableswitch benchmarks, but there is a small variation depending on switch argument:

Let's verify the theory by looking at generated assembly code.
The following JVM option will help: -XX:CompileCommand=print,bench.Switch::*
# {method} {0x0000000017498a48} 'lookupSwitch' '(I)I' in 'bench/Switch'
# parm0: rdx = int
# [sp+0x20] (sp of caller)
0x000000000329b240: sub $0x18,%rsp
0x000000000329b247: mov %rbp,0x10(%rsp) ;*synchronization entry
; - bench.Switch::lookupSwitch@-1
0x000000000329b24c: cmp $0x4,%edx
0x000000000329b24f: je 0x000000000329b2a5
0x000000000329b251: cmp $0x4,%edx
0x000000000329b254: jg 0x000000000329b281
0x000000000329b256: cmp $0x2,%edx
0x000000000329b259: je 0x000000000329b27a
0x000000000329b25b: cmp $0x2,%edx
0x000000000329b25e: jg 0x000000000329b273
0x000000000329b260: cmp $0x1,%edx
0x000000000329b263: jne 0x000000000329b26c ;*lookupswitch
; - bench.Switch::lookupSwitch@1
...
What we see here is a binary search starting with a mid value 4 (this explains why case 4 has the best performance in the graph above).
But the interesting thing is that tableSwitch is compiled exactly the same way!
# {method} {0x0000000017528b18} 'tableSwitch' '(I)I' in 'bench/Switch'
# parm0: rdx = int
# [sp+0x20] (sp of caller)
0x000000000332c280: sub $0x18,%rsp
0x000000000332c287: mov %rbp,0x10(%rsp) ;*synchronization entry
; - bench.Switch::tableSwitch@-1
0x000000000332c28c: cmp $0x4,%edx
0x000000000332c28f: je 0x000000000332c2e5
0x000000000332c291: cmp $0x4,%edx
0x000000000332c294: jg 0x000000000332c2c1
0x000000000332c296: cmp $0x2,%edx
0x000000000332c299: je 0x000000000332c2ba
0x000000000332c29b: cmp $0x2,%edx
0x000000000332c29e: jg 0x000000000332c2b3
0x000000000332c2a0: cmp $0x1,%edx
0x000000000332c2a3: jne 0x000000000332c2ac ;*tableswitch
; - bench.Switch::tableSwitch@1
...
But wait... Why binary search, not a jump table?
HotSpot JVM has an heuristics to generate a jump table only for switches with 10+ cases. This can be altered by the option -XX:MinJumpTableSize=.
OK, let's extend our test case with some more labels and check the generated code once again.
# {method} {0x0000000017288a68} 'lookupSwitch' '(I)I' in 'bench/Switch'
# parm0: rdx = int
# [sp+0x20] (sp of caller)
0x000000000307ecc0: sub $0x18,%rsp ; {no_reloc}
0x000000000307ecc7: mov %rbp,0x10(%rsp) ;*synchronization entry
; - bench.Switch::lookupSwitch@-1
0x000000000307eccc: mov %edx,%r10d
0x000000000307eccf: dec %r10d
0x000000000307ecd2: cmp $0xa,%r10d
0x000000000307ecd6: jb 0x000000000307ece9
0x000000000307ecd8: mov $0xffffffff,%eax
0x000000000307ecdd: add $0x10,%rsp
0x000000000307ece1: pop %rbp
0x000000000307ece2: test %eax,-0x1faece8(%rip) # 0x00000000010d0000
; {poll_return}
0x000000000307ece8: retq
0x000000000307ece9: movslq %edx,%r10
0x000000000307ecec: movabs $0x307ec60,%r11 ; {section_word}
0x000000000307ecf6: jmpq *-0x8(%r11,%r10,8) ;*lookupswitch
; - bench.Switch::lookupSwitch@1
^^^^^^^^^^^^^^^^^^^^^^^^^
Yes! Here is our computed jump instruction. Note, this is generated for lookupswitch. But there will be exactly the same one for tableswitch.
Amazingly, HotSpot JVM can generate jump tables even for switches with gaps and with outliers. -XX:MaxJumpTableSparseness controls how large the gaps can be. E.g. if there are labels from 1 to 10, then from 13 to 20 and the last label with the value 99 - JIT will generate a guard test for the value 99, and for the rest labels it will create a table.
HotSpot source code will finally convince there should be no performance difference between lookupswitch and tableswitch after a method is JIT-compiled with C2. That's basically because the parsing of both instructions ends up with a call to the same jump_switch_ranges function that works for an arbitrary set of labels.
As we saw, HotSpot JVM may compile tableswitch using a binary search and lookupswitch using a jump table, or vice versa. This depends on the number and the density of labels, but not on the bytecode itself.
So, answering your original question - you don't need to!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With