
Fastest way to GZIP and UDP a large amount of Strings in Java

I'm implementing a logging system that needs to encode the log messages with GZIP and send them off by UDP.

What I've got so far is:

Initialization:

DatagramSocket sock = new DatagramSocket(); 
baos = new ByteArrayOutputStream();
printStream = new PrintStream(new GZIPOutputStream(baos));

This printStream is then handed out by the logger; messages will arrive through it.

Then every time a message arrives:

byte[] d = baos.toByteArray();
DatagramPacket dp = new DatagramPacket(d,d.length,host,port);
sock.send(dp);

What stumps me currently is that I can't find a way to remove the data from the ByteArrayOutputStream (toByteArray() only takes a copy) and I'm afraid that recreating all three stream objects every time will be inefficient.

Is there some way to remove sent data from the stream? Or should I look in another direction entirely?

Kristaps Baumanis asked Jan 25 '26 20:01


2 Answers

You must create a new stream for each message; otherwise, every call to toByteArray() will send all previous messages again.
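As a rough sketch of what "a new stream for each message" could look like: the ByteArrayOutputStream is reused through reset(), and only the GZIPOutputStream is recreated, so every payload is a complete GZIP stream of its own. The class name PerMessageGzip and both method names are made up for illustration, and decompress() is only there to show what a receiver might do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the original question or answer.
class PerMessageGzip {
    // Reused across messages; reset() discards the bytes of the previous message.
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);

    /** Compresses one message into a self-contained GZIP payload. */
    byte[] compress(String message) throws IOException {
        baos.reset();
        // A fresh GZIPOutputStream per message gives each payload its own header and trailer.
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(message.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }

    /** Inverse of compress(), roughly what a receiver would run on each datagram. */
    static String decompress(byte[] payload) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(payload))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[512];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The array returned by compress() can go straight into a DatagramPacket as in the question's code.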

A better approach is probably to wrap the OutputStream of a TCP socket with a GZIPOutputStream:

printStream = new PrintStream(new GZIPOutputStream(sock.getOutputStream()));

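Also don't forget to flush the PrintStream after every message, or the data will sit in the buffers instead of being sent. Note that GZIPOutputStream only pushes out pending compressed data on flush() when it is constructed with syncFlush set to true (available since Java 7).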
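To see what flushing does in the streaming setup, here is a small experiment. It assumes Java 7 or later for the two-argument GZIPOutputStream constructor, and a ByteArrayOutputStream stands in for the socket's output stream so that the number of bytes that actually came out can be counted. SyncFlushDemo and bytesAfterFlush are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; the sink stands in for sock.getOutputStream().
class SyncFlushDemo {
    /** Returns how many compressed bytes reached the sink after println() + flush(). */
    static int bytesAfterFlush(boolean syncFlush) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(new GZIPOutputStream(sink, syncFlush), false, "UTF-8");
        int headerBytes = sink.size(); // the GZIP header is written by the constructor
        out.println("log message");
        out.flush();
        return sink.size() - headerBytes;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("syncFlush=false: " + bytesAfterFlush(false));
        System.out.println("syncFlush=true:  " + bytesAfterFlush(true));
    }
}
```

With syncFlush enabled, each flush() forces the deflater to emit what it has buffered, which costs some compression ratio in exchange for the message actually leaving the process.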

If speed is really that important, you should consider using a DatagramChannel instead of the old (slow) stream API. This should get you started:

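ByteBuffer buffer = ByteBuffer.allocate( 1000 );
// ByteBufferOutputStream is not a JDK class; it is an adapter that writes into the buffer
ByteBufferOutputStream bufferOutput = new ByteBufferOutputStream( buffer );
GZIPOutputStream output = new GZIPOutputStream( bufferOutput );
OutputStreamWriter writer = new OutputStreamWriter( output, "UTF-8" );
writer.write( "log message\n" );
writer.close(); // finishes the GZIP stream
buffer.flip(); // switch the buffer from writing to reading

DatagramChannel channel = DatagramChannel.open(); // do this once
channel.connect( new InetSocketAddress( host, port ) ); // do this once
channel.write( buffer ); // Send compressed data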

Note: You can reuse the buffer by calling clear() on it after each send, but all the streams must be created anew for each message.
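For a version that runs with JDK classes only, here is a sketch along the same lines. It swaps the ByteBufferOutputStream for a plain ByteArrayOutputStream plus ByteBuffer.wrap(), so it is a variation on the idea rather than the exact code above. GzipUdpSender and roundTrip are made-up names, and roundTrip only exists to try the sender against a loopback receiver.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sender: one channel for the lifetime of the logger, one GZIP stream per message.
class GzipUdpSender implements AutoCloseable {
    private final DatagramChannel channel;
    private final ByteArrayOutputStream baos = new ByteArrayOutputStream(1024);

    GzipUdpSender(InetSocketAddress target) throws IOException {
        channel = DatagramChannel.open(); // opened once
        channel.connect(target);          // fixes the destination for write()
    }

    void send(String message) throws IOException {
        baos.reset(); // drop the previous message
        try (GZIPOutputStream gzip = new GZIPOutputStream(baos)) {
            gzip.write(message.getBytes(StandardCharsets.UTF_8));
        }
        channel.write(ByteBuffer.wrap(baos.toByteArray())); // one datagram per message
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    /** Sends one message to a loopback receiver and returns what the receiver decodes. */
    static String roundTrip(String message) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback)) {
            receiver.setSoTimeout(2000); // do not block forever if the datagram is lost
            try (GzipUdpSender sender =
                     new GzipUdpSender(new InetSocketAddress(loopback, receiver.getLocalPort()))) {
                sender.send(message);
            }
            DatagramPacket packet = new DatagramPacket(new byte[1500], 1500);
            receiver.receive(packet);
            try (GZIPInputStream in = new GZIPInputStream(
                     new ByteArrayInputStream(packet.getData(), 0, packet.getLength()))) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] chunk = new byte[512];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

Whether the channel is measurably faster than DatagramSocket for this workload is something to benchmark rather than assume.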

Aaron Digulla answered Jan 27 '26 09:01


If speed is important, it is worth checking whether GZIP actually helps. It adds some latency, and for short messages the compressed form can be larger than the original:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

public class Main {
    public static void main(String... args) throws IOException {
        test("Hello World");
        test("Nov 20, 2012 4:55:11 PM Main main\n" +
                "INFO: Hello World log message");
    }

    // Prints the raw and the GZIP-compressed length of s
    private static void test(String s) throws IOException {
        byte[] bytes = s.getBytes("UTF-8");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream outputStream = new GZIPOutputStream(baos);
        outputStream.write(bytes);
        outputStream.close(); // writes the GZIP trailer
        byte[] bytes2 = baos.toByteArray();
        System.out.println("'" + s + "' raw.length=" + bytes.length + " gzip.length=" + bytes2.length);
    }
}

prints

'Hello World' raw.length=11 gzip.length=31
'Nov 20, 2012 4:55:11 PM Main main
INFO: Hello World log message' raw.length=63 gzip.length=80
Peter Lawrey answered Jan 27 '26 09:01


