Java 8 - Most effective way to merge List<byte[]> to byte[]

I have a library that returns binary data as a list of byte arrays. Those byte[] elements need to be merged into a single InputStream.

This is my current implementation:

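public static InputStream foo(List<byte[]> binary) {
    byte[] streamArray = new byte[0];
    for (byte[] bin : binary) {
        // addAll returns a new array each time, so the result must be reassigned;
        // every call copies everything accumulated so far.
        streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
    }
    return new ByteArrayInputStream(streamArray);
}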

However, this is quite CPU-intensive. Is there a better way?

Thanks for all the answers. I ran a performance test. These are my results:

  • Function: 'NicolasFilotto' => 68,04 ms average on 100 calls
  • Function: 'NicolasFilottoEstSize' => 65,24 ms average on 100 calls
  • Function: 'NicolasFilottoSequenceInputStream' => 63,09 ms average on 100 calls
  • Function: 'Saka1029_1' => 63,06 ms average on 100 calls
  • Function: 'Saka1029_2' => 0,79 ms average on 100 calls
  • Function: 'Coco' => 541,60 ms average on 10 calls

I'm not sure if 'Saka1029_2' is measured correctly...

This is the execute function:

private static double execute(Callable<InputStream> funct, int times) throws Exception {
    List<Long> executions = new ArrayList<>(times);

    for (int idx = 0; idx < times; idx++) {
        BufferedReader br = null;
        long startTime = System.currentTimeMillis();
        InputStream is = funct.call();
        br = new BufferedReader(new InputStreamReader(is));
        String line = null;
        while ((line = br.readLine()) != null) {}
        executions.add(System.currentTimeMillis() - startTime);
    }

    return calculateAverage(executions);
}
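
The execute function relies on calculateAverage, which is not shown in the post. A minimal sketch of what such a helper might look like, assuming it simply returns the arithmetic mean of the recorded durations in milliseconds (the class name BenchmarkMath and the 0.0 result for an empty list are assumptions made for illustration):

```java
import java.util.List;

final class BenchmarkMath {
    private BenchmarkMath() {}

    // Hypothetical stand-in for the calculateAverage helper that the post does not show.
    // Returns the arithmetic mean of the recorded durations, or 0.0 for an empty list.
    static double calculateAverage(List<Long> executions) {
        return executions.stream()
                .mapToLong(Long::longValue)
                .average()
                .orElse(0.0);
    }
}
```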

Note that I read every input stream to the end.

These are the implementations used:

public static class NicolasFilotto implements Callable<InputStream> {

    private final List<byte[]> binary;

    public NicolasFilotto(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return new ByteArrayInputStream(baos.toByteArray());
    }

}

public static class NicolasFilottoSequenceInputStream implements Callable<InputStream> {

    private final List<byte[]> binary;

    public NicolasFilottoSequenceInputStream(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        return new SequenceInputStream(
                Collections.enumeration(
                        binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())));
    }

}

public static class NicolasFilottoEstSize implements Callable<InputStream> {

    private final List<byte[]> binary;
    private final int lineSize;

    public NicolasFilottoEstSize(List<byte[]> binary, int lineSize) {
        this.binary = binary;
        this.lineSize = lineSize;
    }

    @Override
    public InputStream call() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(binary.size() * lineSize);
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        return new ByteArrayInputStream(baos.toByteArray());
    }

}

public static class Saka1029_1 implements Callable<InputStream> {

    private final List<byte[]> binary;

    public Saka1029_1(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
        int pos = 0;
        for (byte[] bin : binary) {
            int length = bin.length;
            System.arraycopy(bin, 0, all, pos, length);
            pos += length;
        }
        return new ByteArrayInputStream(all);
    }

}

public static class Saka1029_2 implements Callable<InputStream> {

    private final List<byte[]> binary;

    public Saka1029_2(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        int size = binary.size();
        return new InputStream() {
            int i = 0, j = 0;

            @Override
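            public int read() throws IOException {
                // Skip exhausted (and empty) arrays.
                while (i < size && j >= binary.get(i).length) {
                    ++i;
                    j = 0;
                }
                if (i >= size) return -1;
                // Mask to 0..255: a raw negative byte would break the read() contract,
                // and 0xFF would be returned as -1 (end of stream).
                return binary.get(i)[j++] & 0xFF;
            }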
        };
    }

}

public static class Coco implements Callable<InputStream> {

    private final List<byte[]> binary;

    public Coco(List<byte[]> binary) {
        this.binary = binary;
    }

    @Override
    public InputStream call() throws Exception {
        byte[] streamArray = new byte[0];
        for (byte[] bin : binary) {
            streamArray = org.apache.commons.lang.ArrayUtils.addAll(streamArray, bin);
        }
        return new ByteArrayInputStream(streamArray);
    }

}
asked Oct 23 '25 13:10 by Coco


2 Answers

You could use a ByteArrayOutputStream to collect the contents of every byte array in your list. To make this efficient, create the ByteArrayOutputStream with an initial capacity as close as possible to the final size, so that its internal buffer does not have to grow repeatedly. If you know the size, or at least the average size, of the byte arrays, use it (ARRAY_SIZE below). The code would be:

public static InputStream foo(List<byte[]> binary) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream(ARRAY_SIZE * binary.size());
    for (byte[] bytes : binary) {
        baos.write(bytes, 0, bytes.length);
    }
    return new ByteArrayInputStream(baos.toByteArray());
}
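
When no size estimate such as ARRAY_SIZE is available, the exact capacity can be computed by summing the array lengths first. The following is only a sketch under that assumption; the class name ExactSizeMerge is made up for illustration, and the sum is assumed to fit in an int:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.List;

final class ExactSizeMerge {
    private ExactSizeMerge() {}

    static InputStream merge(List<byte[]> binary) {
        // Sum the lengths first so the internal buffer never has to grow.
        // Assumption: the total fits in an int (less than 2 GiB).
        int total = binary.stream().mapToInt(a -> a.length).sum();
        ByteArrayOutputStream baos = new ByteArrayOutputStream(total);
        for (byte[] bytes : binary) {
            baos.write(bytes, 0, bytes.length);
        }
        // toByteArray() still makes one final copy of the buffer.
        return new ByteArrayInputStream(baos.toByteArray());
    }
}
```

Note that toByteArray() copies the buffer once more, so this variant still does about twice the copying of a single preallocated array filled with System.arraycopy.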

Another approach is to use SequenceInputStream to logically concatenate ByteArrayInputStream instances, each wrapping one element of your list:

public static InputStream foo(List<byte[]> binary) {
    return new SequenceInputStream(
        Collections.enumeration(
            binary.stream().map(ByteArrayInputStream::new).collect(Collectors.toList())
        )
    );
}

The interesting aspect of this approach is that nothing is copied: you only create ByteArrayInputStream instances that use each byte array as is.

Collecting the result into a List has a cost, especially if your initial List is big. To avoid it, you can call iterator() directly on the stream, as proposed by @Holger, and convert the iterator into an enumeration with IteratorUtils.asEnumeration(iterator) from Apache Commons Collections. The final code would then be:

public static InputStream foo(List<byte[]> binary) {
    return new SequenceInputStream(
        IteratorUtils.asEnumeration(
            binary.stream().map(ByteArrayInputStream::new).iterator()
        )
    );
}
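
If pulling in Apache Commons Collections is not desirable, the iterator can be adapted to an Enumeration with plain JDK classes. This is a sketch rather than a drop-in replacement; the class name SequenceMerge is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

final class SequenceMerge {
    private SequenceMerge() {}

    static InputStream merge(List<byte[]> binary) {
        Iterator<byte[]> it = binary.iterator();
        // Adapt the iterator lazily: each ByteArrayInputStream is created only
        // when SequenceInputStream asks for the next element.
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                return new ByteArrayInputStream(it.next());
            }
        };
        return new SequenceInputStream(streams);
    }
}
```

As with the variants above, no byte array is copied; the list must not be modified while the returned stream is being read.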
answered Oct 26 '25 04:10 by Nicolas Filotto


Try this.

public static InputStream foo(List<byte[]> binary) {
    byte[] all = new byte[binary.stream().mapToInt(a -> a.length).sum()];
    int pos = 0;
    for (byte[] bin : binary) {
        int length = bin.length;
        System.arraycopy(bin, 0, all, pos, length);
        pos += length;
    }
    return new ByteArrayInputStream(all);
}

Or

public static InputStream foo(List<byte[]> binary) {
    int size = binary.size();
    return new InputStream() {
        int i = 0, j = 0;
        @Override
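        public int read() throws IOException {
            // Skip exhausted (and empty) arrays.
            while (i < size && j >= binary.get(i).length) {
                ++i;
                j = 0;
            }
            if (i >= size) return -1;
            // Mask to 0..255: a raw negative byte would break the read() contract,
            // and 0xFF would be returned as -1 (end of stream).
            return binary.get(i)[j++] & 0xFF;
        }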
    };
}
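
A stream that only overrides the single-byte read() is consumed one byte at a time, because the default InputStream.read(byte[], int, int) calls read() in a loop. A possible refinement, sketched here with a hypothetical class name ChunkedInputStream, also overrides the bulk read so that whole ranges are copied with System.arraycopy, and masks each byte with 0xFF so that values above 127 are returned as 128..255 instead of negative numbers:

```java
import java.io.InputStream;
import java.util.List;

final class ChunkedInputStream extends InputStream {
    private final List<byte[]> chunks;
    private int chunk = 0;   // index of the current array
    private int offset = 0;  // read position inside the current array

    ChunkedInputStream(List<byte[]> chunks) {
        this.chunks = chunks;
    }

    // Skips exhausted (or empty) arrays; returns false once all data is consumed.
    private boolean advance() {
        while (chunk < chunks.size() && offset >= chunks.get(chunk).length) {
            chunk++;
            offset = 0;
        }
        return chunk < chunks.size();
    }

    @Override
    public int read() {
        if (!advance()) return -1;
        // Mask to 0..255 so that a 0xFF byte is not mistaken for end of stream.
        return chunks.get(chunk)[offset++] & 0xFF;
    }

    @Override
    public int read(byte[] dest, int off, int len) {
        if (off < 0 || len < 0 || len > dest.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) return 0;
        if (!advance()) return -1;
        // Copy as much as possible from the current array in one call.
        byte[] current = chunks.get(chunk);
        int n = Math.min(len, current.length - offset);
        System.arraycopy(current, offset, dest, off, n);
        offset += n;
        return n;
    }
}
```

A caller would use it as new ChunkedInputStream(binary). Each bulk read returns at most the remainder of the current array, which the InputStream contract permits.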
