ByteBuffer.putLong ~2x faster with non-native ByteOrder

Here's a result I can't wrap my head around, despite extensive reading of the JDK source and examination of the intrinsic routines.

I'm testing clearing out a ByteBuffer allocated with allocateDirect, using ByteBuffer.putLong(int index, long value). Based on the JDK code, this results in a single 8-byte write if the buffer is in "native byte order", or a byte swap followed by the same write if it isn't.
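To make the two cases concrete, here is a minimal sketch that writes the same long with each byte order and dumps the raw bytes. The class and method names (PutLongByteOrderDemo, bytesOf) are made up for illustration and are not part of the benchmark.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class PutLongByteOrderDemo {

    private static final int LONG_BYTES = 8;

    // Writes `value` at index 0 using the given byte order and returns the 8 raw bytes.
    static byte[] bytesOf(long value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(LONG_BYTES).order(order);
        buffer.putLong(0, value);
        byte[] raw = new byte[LONG_BYTES];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = buffer.get(i); // single-byte reads are unaffected by byte order
        }
        return raw;
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        // prints "big:    [1, 2, 3, 4, 5, 6, 7, 8]"
        System.out.println("big:    " + Arrays.toString(bytesOf(value, ByteOrder.BIG_ENDIAN)));
        // prints "little: [8, 7, 6, 5, 4, 3, 2, 1]"
        System.out.println("little: " + Arrays.toString(bytesOf(value, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

Either way exactly 8 bytes land in memory; only their order differs, which is why the swap is the only extra step in the non-native case.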

So I'd expect native byte order (little endian for me) to be at least as fast as non-native. As it turns out, however, non-native is ~2x faster.

Here's my benchmark in Caliper 0.5x:

...    

public class ByteBufferBench extends SimpleBenchmark {

    private static final int SIZE = 2048;

    enum Endian {
        DEFAULT,
        SMALL,
        BIG
    }

    @Param Endian endian;

    private ByteBuffer bufferMember; 

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        bufferMember = ByteBuffer.allocateDirect(SIZE);
        bufferMember.order(endian == Endian.DEFAULT ? bufferMember.order() :
            (endian == Endian.SMALL ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN));
    }

    public int timeClearLong(int reps) {
        ByteBuffer buffer = bufferMember;
        while (reps-- > 0) {
            for (int i=0; i < SIZE / LONG_BYTES; i+= LONG_BYTES) {
                buffer.putLong(i, reps);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        Runner.main(ByteBufferBench.class,args);
    }

}

The results are:

benchmark       type  endian     ns linear runtime
ClearLong     DIRECT DEFAULT   64.8 =
ClearLong     DIRECT   SMALL  118.6 ==
ClearLong     DIRECT     BIG   64.8 =
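A note on reading the table: a freshly created ByteBuffer always starts out big-endian, whatever the platform's native order is, so on a little-endian machine the DEFAULT row is really the non-native case and is expected to match BIG. A quick way to check this (DefaultOrderDemo and initialOrder are illustrative names only):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class DefaultOrderDemo {

    // Returns the byte order a direct buffer has before order(...) is ever called.
    static ByteOrder initialOrder() {
        return ByteBuffer.allocateDirect(8).order();
    }

    public static void main(String[] args) {
        System.out.println("initial buffer order: " + initialOrder());
        // The native order depends on the machine this runs on.
        System.out.println("native order:         " + ByteOrder.nativeOrder());
    }
}
```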

That's consistent. If I swap putLong for putFloat, it's about 4x faster for native order. If you look at how putLong works, it's doing strictly more work in the non-native case:

private ByteBuffer putLong(long a, long x) {
    if (unaligned) {
        long y = (x);
        unsafe.putLong(a, (nativeByteOrder ? y : Bits.swap(y)));
    } else {
        Bits.putLong(a, x, bigEndian);
    }
    return this;
}

Note that unaligned is true in either case. The only difference between native and non-native byte order is Bits.swap, which favors the native case (little-endian).
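Bits is JDK-internal, so the sketch below uses the public Long.reverseBytes as a stand-in, on the assumption that Bits.swap(long) behaves the same way. It only shows that the sole observable difference between the two orders is a byte reversal of the value; the names SwapEquivalenceDemo and readBack are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SwapEquivalenceDemo {

    // Writes `value` in one byte order, then reads the same 8 bytes back in another.
    static long readBack(long value, ByteOrder writeOrder, ByteOrder readOrder) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(8).order(writeOrder);
        buffer.putLong(0, value);
        buffer.order(readOrder);
        return buffer.getLong(0);
    }

    public static void main(String[] args) {
        long value = 0x0102030405060708L;
        long crossed = readBack(value, ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN);
        // Reading with the opposite order yields the byte-reversed value.
        System.out.println(crossed == Long.reverseBytes(value)); // prints "true"
        // Reading with the same order yields the original value.
        System.out.println(readBack(value, ByteOrder.BIG_ENDIAN, ByteOrder.BIG_ENDIAN) == value); // prints "true"
    }
}
```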

BeeOnRope asked Oct 24 '25 12:10



1 Answer

To summarize the discussion from the Mechanical Sympathy mailing list:

1. The anomaly described by the OP was not reproducible on my setup (JDK7u40/Ubuntu13.04/i7), which showed consistent performance for both heap and direct buffers in all cases, with direct buffers offering a massive performance advantage:

BYTE_ARRAY DEFAULT 211.1 ==============================
BYTE_ARRAY   SMALL 199.8 ============================
BYTE_ARRAY     BIG 210.5 =============================
DIRECT DEFAULT  33.8 ====
DIRECT   SMALL  33.5 ====
DIRECT     BIG  33.7 ==== 

The Bits.swap(y) method gets intrinsified into a single instruction, so it can't/shouldn't really account for much of a difference/overhead.

2. The above result (i.e. contradictory to the OP's experience) was independently confirmed by a naive hand-rolled benchmark and a JMH benchmark written by another participant.

This leads me to believe you are experiencing either some local issue or some sort of benchmarking framework issue. It would be valuable if others could run the experiment and see if they can reproduce your result.
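For anyone who wants to try it without Caliper or JMH, a naive hand-rolled check could look roughly like the sketch below. This is not the benchmark from the mailing list; the class name, method names and repetition counts are assumptions, and a timing loop this simple is exposed to the usual JIT warm-up and dead-code pitfalls, so treat its numbers as a sanity check only. Unlike the loop in the question, which stops at index SIZE / LONG_BYTES, this one covers the whole buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class NaivePutLongBench {

    private static final int SIZE = 2048;
    private static final int LONG_BYTES = 8;

    // Fills the whole buffer with `value`, one long at a time.
    static void fill(ByteBuffer buffer, long value) {
        for (int i = 0; i + LONG_BYTES <= buffer.capacity(); i += LONG_BYTES) {
            buffer.putLong(i, value);
        }
    }

    // Returns the average nanoseconds per fill over `reps` repetitions.
    static double nanosPerFill(ByteOrder order, int reps) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(SIZE).order(order);
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            fill(buffer, r);
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / reps;
    }

    public static void main(String[] args) {
        ByteOrder[] orders = { ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN };
        // Warm-up pass so the JIT has compiled the hot loop before timing.
        for (ByteOrder order : orders) {
            nanosPerFill(order, 200_000);
        }
        for (ByteOrder order : orders) {
            System.out.printf("%-13s %.1f ns/fill%n", order, nanosPerFill(order, 1_000_000));
        }
    }
}
```

If the two orders come out close to each other here but not under the framework, that would point toward a harness or environment effect rather than putLong itself.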

Nitsan Wakart answered Oct 27 '25 00:10
