 

Using Replicated Cache vs LB sticky session

I need to keep some data in a cache on the server. The servers are in a cluster and a call can go to any of them. In such a scenario, is it better to use a replicated/distributed cache like EhCache, or to use the session stickiness of the LB?

If the data size (in the cache) is big, won't serialization and deserialization across all servers have a performance impact?

Also, in the case of a distributed cache, what's the optimal number of servers up to which such a cache is effective? Since data is replicated to all nodes, with, say, 20 nodes, it's like master-to-master replication across all of them: each node will get notifications from the other 19 and will push its own modifications to the other 19. Does such a setup scale?
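A back-of-envelope sketch of that fan-out concern (the node and write counts here are purely illustrative, not measurements):

```java
// In a fully replicated cache, every write on one node must be propagated
// to the other n-1 nodes, so cluster-wide replication traffic grows
// roughly with n^2 as the cluster gets bigger.
public class ReplicationFanOut {

    // Replication messages generated by a single write on one node.
    static long messagesPerWrite(int nodes) {
        return nodes - 1L;
    }

    // Total replication messages per second across the whole cluster,
    // assuming each of the n nodes performs `writesPerNode` writes/sec.
    static long clusterMessagesPerSecond(int nodes, long writesPerNode) {
        return (long) nodes * writesPerNode * (nodes - 1L);
    }

    public static void main(String[] args) {
        System.out.println(messagesPerWrite(20));              // 19
        System.out.println(clusterMessagesPerSecond(20, 100)); // 38000
    }
}
```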

John Jay asked Nov 05 '25 19:11



1 Answer

As always in distributed systems, the answer depends on several things:

  1. A load balancer with sticky sessions is certainly the simpler option for the developer, since it makes no difference whether the application runs on 1, 2, or 100 servers. If that is all you care about, stick with it and you can stop reading right here.

  2. I’m not sure how session-aware load balancers are implemented or what their general limit in requests per second would be, but they have at least one big disadvantage over a distributed cache: what do you do if the machine holding the sessions goes down? If you distribute your cache, any machine can serve the request, and it doesn’t matter if one of them fails. The serialisation/deserialisation part is not a big problem; the network is more likely to be the bottleneck if you don't run it in at least a 1 Gbit network environment, but it should be ok.

    • For a distributed cache you could go with Hazelcast, Infinispan, or a similar solution, which would simplify access from your own application. (Update: these implementations use a DHT to distribute the cache.)
    • For a fully replicated cache you could use EhCache, which you mentioned, or Infinispan. The advantage over a distributed cache is much faster access, since all the data is replicated on every machine and you only need to access it locally. The disadvantages are slower writes (so use it rather for read-very-often, write-very-seldom scenarios) and the fact that your cache is limited by the amount a single machine is able to store. If you are running your applications on servers with 64 GB of RAM, this is ok. If you want to distribute them over small Amazon instances, it is probably a bad idea. I think before you hit any problems with updating too many nodes, you will run out of memory, and that one is at least very easy to calculate: AVG_CACHE_NEEDED_PER_CLIENT * NUMBER_OF_CLIENTS < MEMORY_FOR_CACHE_AVAILABLE (on one server). If you need more cache than any single node in your EhCache cluster has available, full replication won't be possible any more.
    • Or you could use a Redis cluster or something similar, independent of your application and the servers your application is running on. This would allow you to scale the cache at a different speed than the rest of your application; however, access to the data wouldn’t be quite as trivial.
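The sizing inequality for full replication can be turned into a trivial check. A minimal sketch (the byte figures in `main` are made-up example numbers, not recommendations):

```java
// Sketch of the sizing rule for a fully replicated cache:
// AVG_CACHE_NEEDED_PER_CLIENT * NUMBER_OF_CLIENTS < MEMORY_FOR_CACHE_AVAILABLE
// Every node holds the full data set, so the whole cache must fit on one node.
public class FullReplicationSizing {

    // Returns true if the fully replicated cache still fits on a single node.
    static boolean fitsOnOneNode(long avgBytesPerClient, long clients,
                                 long bytesAvailablePerNode) {
        return avgBytesPerClient * clients < bytesAvailablePerNode;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long gb = 1024L * mb;
        // e.g. 2 MB per client and 10,000 clients -> roughly 20 GB needed
        System.out.println(fitsOnOneNode(2 * mb, 10_000, 64 * gb)); // true
        System.out.println(fitsOnOneNode(2 * mb, 10_000, 8 * gb));  // false
    }
}
```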

Of course the actual decision depends on your very specific use-case and the demands you are putting on your application.

Personally, I was very happy when I found out today that Azure WebPages have a load balancer with sticky session support, so I don’t need to reconfigure my application to use Redis as a session object store and can just keep everything as it is.

But for a huge workload with hundreds of servers, a simple load balancer will probably be overwhelmed, and a distributed cache or a centralized replicated cache (Redis) will be the way to go.
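To illustrate the DHT idea mentioned above: distributed caches typically place keys on nodes via something like a consistent-hash ring, so when a node fails only the keys it owned move and every surviving machine can keep serving requests. This is my own simplified illustration, not how Hazelcast or Infinispan is actually implemented (real rings use many virtual nodes per member and replicate each partition):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of DHT-style key placement: a consistent-hash ring.
// Each key is owned by the first node clockwise from the key's hash.
public class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    // One ring position per node for brevity; real implementations use
    // many virtual nodes per member for a more even key distribution.
    void addNode(String node) {
        ring.put(node.hashCode(), node);
    }

    void removeNode(String node) {
        ring.remove(node.hashCode());
    }

    // First node at or after the key's hash; wrap around to the start.
    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
}
```

When a node is removed from the ring, only the keys it owned are re-assigned to the next node clockwise; all other keys keep their owner, which is what makes node failure survivable.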

peter answered Nov 07 '25 11:11



