ZeroMQ blocked in a context.term() call. Why? How to prevent?

Tags:

java

zeromq

I have a Java program that uses ZeroMQ.

I found that the program blocks in context.term() whenever receiving a message with recvMsg() times out.

ZMQ.Context context = ZMQ.context(1);  
ZMQ.Socket socket = context.socket(ZMQ.REQ);  
socket.connect(mAddress);         

ZMsg ZM = new ZMsg();
ZM.add(qString);
ZM.send(socket, true);

socket.setReceiveTimeOut(mTimeout);     
ZMsg receivedZM = ZMsg.recvMsg(socket);

if(receivedZM != null) {
    System.out.println(receivedZM.getFirst().toString());   
}      
socket.close();  
context.term(); 

What causes it to block?

And how can I solve this problem?

jones321 asked Dec 04 '25 07:12


2 Answers

ZeroMQ does a lot of work behind the scenes of the Context() factory.

I always advocate calling .setsockopt( ZMQ_LINGER, 0 ) right after a socket is instantiated, precisely because of this kind of behaviour, which otherwise stays outside your local code's control. Hanging IO-thread(s) of a Context instance are one of those things you never want to see again. They can appear after a .term() was issued before every socket created under that Context instance had been successfully .close()-d (the .term() is supposed to dismantle the Context and release all of its system resources), or after an unhandled exception, when things simply went wrong.

Feel free to follow textbook and online snippet examples, but a serious distributed-system designer ought to take all reasonable steps to prevent her/his code from falling into any deadlock state (least of all an unsalvageable one).


What is the reason?

As the documentation states, it is a designed-in feature of ZeroMQ:

attempting to terminate the socket's context with zmq_ctx_term() shall block until all pending messages have been sent to a peer.

Whenever a message dispatched with .send() (dispatched only, which by no means implies it has already gone onto the wire) still sits in the local queue for any known peer (which may be disconnected, busy, or ...), a .term() under the default configuration cannot proceed and will block.


What is the solution?

Newer API documentation has started to state that the default LINGER value is no longer -1 (infinite), but since you never know which version your code will run against, an explicit call to .setsockopt( ZMQ_LINGER, 0 ) is a self-disciplining step, and it raises your team's awareness of how to build reliable distributed-systems code.
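As a minimal sketch of that advice, assuming the JeroMQ (or jzmq) Java binding that the question's code appears to use, where the LINGER option is exposed as setLinger(): this is the question's code with the linger period set to 0 right after the socket is created. The class name, endpoint, payload, and timeout are placeholders, not taken from the original post.

```java
import org.zeromq.ZMQ;
import org.zeromq.ZMsg;

public class LingerZeroExample {
    public static void main(String[] args) {
        String address = "tcp://127.0.0.1:5555"; // placeholder endpoint (assumption)
        int timeoutMs = 1000;                    // placeholder receive timeout (assumption)

        ZMQ.Context context = ZMQ.context(1);
        ZMQ.Socket socket = context.socket(ZMQ.REQ);

        // Linger 0: unsent messages are discarded when the socket is closed,
        // so context.term() has nothing left to wait for.
        socket.setLinger(0);
        socket.setReceiveTimeOut(timeoutMs);
        socket.connect(address);

        ZMsg request = new ZMsg();
        request.add("hello");
        request.send(socket, true);

        // recvMsg() returns null when the receive timeout expires.
        ZMsg reply = ZMsg.recvMsg(socket);
        if (reply != null) {
            System.out.println(reply.getFirst().toString());
        } else {
            System.out.println("no reply within " + timeoutMs + " ms");
        }

        socket.close();
        context.term(); // returns promptly even if the request was never delivered
    }
}
```

The trade-off is that a request still sitting in the local queue is silently dropped on close(). If losing it is not acceptable, a small positive linger value (in milliseconds) bounds the wait instead of removing it.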

The need for try / catch / finally handlers (try: / except: / finally: in Python) hardly has to be raised here. You simply always have to design with failures and collisions in mind, don't you?
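One way this could look in Java, again assuming the JeroMQ (or jzmq) binding from the question: the socket and the context are released in finally blocks, so close() and term() still run when something in between throws. The class name SafeRequest, the request() helper, and the endpoint are hypothetical and only serve the illustration.

```java
import org.zeromq.ZMQ;
import org.zeromq.ZMsg;

public class SafeRequest {
    /**
     * Sends one request and returns the first reply frame as text,
     * or null if no reply arrives within timeoutMs.
     */
    public static String request(String address, String payload, int timeoutMs) {
        ZMQ.Context context = ZMQ.context(1);
        try {
            ZMQ.Socket socket = context.socket(ZMQ.REQ);
            try {
                socket.setLinger(0);                 // never wait for unsent messages on close
                socket.setReceiveTimeOut(timeoutMs); // bound the wait for a reply
                socket.connect(address);

                ZMsg request = new ZMsg();
                request.add(payload);
                request.send(socket, true);

                ZMsg reply = ZMsg.recvMsg(socket);   // null on timeout
                if (reply == null) {
                    return null;
                }
                String text = reply.getFirst().toString();
                reply.destroy();
                return text;
            } finally {
                socket.close();  // runs on success, timeout, and exception alike
            }
        } finally {
            context.term();      // all sockets are closed by now, so this can complete
        }
    }

    public static void main(String[] args) {
        // Placeholder endpoint; with no REP peer listening this prints "timed out".
        String reply = request("tcp://127.0.0.1:5555", "hello", 1000);
        System.out.println(reply == null ? "timed out" : reply);
    }
}
```

Creating a context per request mirrors the question's code. A longer-lived application would normally create one context at startup and share it across sockets.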

user3666197 answered Dec 05 '25 20:12


According to the API documentation, http://api.zeromq.org/4-2:zmq-term, the call will block while there are still messages to transmit. This suggests that your other machine or process, the one that opens the REP socket, isn't running.

bazza answered Dec 05 '25 21:12