I'm currently redesigning a system and have run into a bit of a pickle.
I have a queue of messages (A RabbitMQ queue) and a number of services consuming from that queue. Each message has an id that denotes it is part of a group, and a timestamp to help denote order.
Each service consuming from the queue is responsible for processing at least one group, but can be responsible for many. Services can run across multiple servers.
The system must allow for a variable numbers of consuming services, and for ids to be automatically reassigned on service initialization/destruction.
Messages must be processed in realtime, no buffering is allowed. Messages must be processed in order within their ID group.
Rabbit does not allow traversal of the queue before fetching, or fetching from a single queue based as a pattern.
Order is important in the messages. While we could simply get the top messages and reject them if they do not match our id, this introduces an issue
In order to enforce in-order processing, we need to set the prefetch count value to 1. However this introduces the possibility of a livelock, where the services are constantly getting the head of the queue, but no service gets a message which it can process, or so rarely as to heavily impact performance.
Conversely if we set the prefetch count to be high enough to reasonably mitigate this, it allows for out of order processing.
In either case setting the prefetch count to a low value seems to heavily impact performance.
Automatically generating a rabbit queue per consumer and using topic exchanges to automatically route the messages seems like a possible solution.
However this creates an issue where messages will be lost if we need to decrease the number of consuming services before the queue for that service can be processed. It will also result in lost messages if a service dies and no replacement can be restarted.
My current idea is to have a coordinator service which reads from the queue and distributes messages to workers based on the id.
While this would introduce a single point of failure, it would be possible to create several coordinator processes, which would communicate which worker is responsible for a given id via zookeeper.
This would necessitate one of the following patterns;
Any help is much appreciated
This may not be the right Stack Exchange for this question but I'll take a stab at it by correcting you on a couple of things.
In order to enforce in-order processing, we need to set the prefetch count value to 1. However this introduces the possibility of a livelock, where the services are constantly getting the head of the queue, but no service gets a message which it can process, or so rarely as to heavily impact performance.
I don't understand this issue... only have services subscribe to a queue if they can handle all messages. Also only bind routing keys the queue can handle.
In either case setting the prefetch count to a low value seems to heavily impact performance.
Basically the prefetch count should equal the number of workers or you get very little benefit. The problem is that you cannot do two multiple things (ie multiple works running in parallel) with out loosing order.
Thus order is lost with multiple parallel workers not the prefetch count.
It will also result in lost messages if a service dies and no replacement can be restarted.
No this should not happen. If your service does not ack the message, the message will not be lost. Just don't delete the queues and make them durable. However if your dynamically adding queues and bindings which I think is what you were trying to say then you could loose message if the binding (routing key) is not declared in time. You can deal with this issue with a dead letter exchange.
Your general problem is I think you would like to be able to add workers to queues and still preserve order. This is a very difficult problem to solve at scale. You should look into how Twitter does this.
A couple of ideas are to figure out what you can 'hash' on and what you can make queues on. For example you could make a queue for all the users in a certain geographic region thus preserving order for the people in those region. It really depends on your business problem.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With