
How do I make an async call to Hive in Java?

I would like to execute a Hive query on the server asynchronously. The query will likely take a long time to complete, so I would prefer not to block on the call. I am currently using Thrift to make a blocking call (it blocks on client.execute()), but I have not seen an example of how to make a non-blocking call. Here is the blocking code:

        TSocket transport = new TSocket("hive.example.com", 10000);
        transport.setTimeout(999999999);
        TBinaryProtocol protocol = new TBinaryProtocol(transport);
        Client client = new ThriftHive.Client(protocol);
        transport.open();
        client.execute(hql);  // Omitted HQL

        List<String> rows;
        while ((rows = client.fetchN(1000)) != null) {
            for (String row : rows) {
                // Do stuff with row
            }
        }

        transport.close();

The code above is missing try/catch blocks to keep it short.

Does anyone have any ideas how to do an async call? Can Hive/Thrift support it? Is there a better way?

Thanks!

Peter Sankauskas asked Sep 06 '25 03:09



2 Answers

AFAIK, at the time of writing, Thrift does not generate asynchronous clients. The reason, as explained in this link (search the text for "asynchronous"), is that Thrift was designed for the data centre, where latency is assumed to be low.

Unfortunately, as you know, the latency between call and result is not always caused by the network; it can also come from the logic being performed. We have this problem calling into the Cassandra database from a Java application server where we want to limit the total number of threads.

Summary: for now, all you can do is make sure you have sufficient resources to handle the required number of blocked concurrent threads, and wait for a more efficient implementation.
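
One way to keep the number of blocked threads under control is to push the blocking call onto a small fixed-size thread pool and hand the caller a future. The following is only a minimal sketch using the standard library: runBlockingQuery is a hypothetical stand-in for the blocking client.execute()/fetchN() code from the question, and the pool size of 4 is an arbitrary assumption.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedBlockingCalls {
    // Assumed limit: at most 4 queries block at once; further submissions wait in the pool's queue.
    static final int MAX_BLOCKED_THREADS = 4;

    // Daemon threads so an idle pool does not keep the JVM alive.
    static final ExecutorService POOL = Executors.newFixedThreadPool(MAX_BLOCKED_THREADS, r -> {
        Thread t = new Thread(r, "hive-query");
        t.setDaemon(true);
        return t;
    });

    // Hypothetical stand-in for the blocking Thrift code (client.execute + fetchN loop).
    static List<String> runBlockingQuery(String hql) {
        try {
            Thread.sleep(100); // simulate a slow query
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Collections.singletonList("row for: " + hql);
    }

    // Returns immediately; the blocking work runs on one of the pool threads.
    static CompletableFuture<List<String>> submit(String hql) {
        return CompletableFuture.supplyAsync(() -> runBlockingQuery(hql), POOL);
    }

    public static void main(String[] args) {
        CompletableFuture<List<String>> future = submit("SELECT 1");
        System.out.println("caller is not blocked");
        // join() here only keeps the demo alive until the callback has run.
        future.thenAccept(rows -> System.out.println("got " + rows.size() + " row(s)")).join();
        POOL.shutdown();
    }
}
```

This does not make the Thrift call itself non-blocking; it only moves the blocking onto a bounded set of worker threads, so the threads are still consumed while the queries run.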

The D Williams answered Sep 07 '25 16:09



It is now possible to make an asynchronous call from a Java Thrift client, since this patch went in: https://issues.apache.org/jira/browse/THRIFT-768

Generate the async Java client using the new Thrift and initialize your client as follows:

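// Host and port taken from the question; adjust to your Hive server.
// Both the socket and the client manager constructors can throw IOException.
TNonblockingTransport transport = new TNonblockingSocket("hive.example.com", 10000);
TAsyncClientManager clientManager = new TAsyncClientManager();
TProtocolFactory protocolFactory = new TBinaryProtocol.Factory();
// Assumes the regenerated service class is ThriftHive, as in the question.
ThriftHive.AsyncClient client = new ThriftHive.AsyncClient(protocolFactory, clientManager, transport);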

Now you can call methods on this client as you would on the synchronous interface. The only change is that every method takes an additional callback parameter: the call returns immediately and the result is delivered to the callback.
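
To show the shape of the callback style without depending on the Thrift jars, here is a rough, self-contained sketch. Everything in it is hypothetical: Callback is modeled loosely on Thrift's AsyncMethodCallback (onComplete/onError), and executeAsync stands in for a generated async method. The exact type passed to onComplete in the real generated code depends on your Thrift version, so check the code Thrift generates for you.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class CallbackSketch {
    // Hypothetical callback interface, modeled loosely on Thrift's AsyncMethodCallback.
    interface Callback<T> {
        void onComplete(T response);
        void onError(Exception e);
    }

    // Hypothetical stand-in for a generated async method: it returns immediately
    // and reports the outcome to the callback from a background thread.
    static void executeAsync(String hql, Callback<String> callback) {
        Thread worker = new Thread(() -> {
            if (hql.isEmpty()) {
                callback.onError(new IllegalArgumentException("empty query"));
            } else {
                callback.onComplete("done: " + hql);
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Demo helper only: waits for the callback so the result can be printed.
    static String runAndWait(String hql) {
        CountDownLatch finished = new CountDownLatch(1);
        AtomicReference<String> outcome = new AtomicReference<>();
        executeAsync(hql, new Callback<String>() {
            @Override public void onComplete(String response) {
                outcome.set(response);
                finished.countDown();
            }
            @Override public void onError(Exception e) {
                outcome.set("error: " + e.getMessage());
                finished.countDown();
            }
        });
        // The caller is free to do other work here; the query runs elsewhere.
        try {
            finished.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndWait("SELECT 1"));
        System.out.println(runAndWait(""));
    }
}
```

In real code you would normally not wait on a latch at all; you would do the follow-up work (for example, fetching rows) inside onComplete and handle failures in onError.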

Ishan answered Sep 07 '25 17:09
