What are the differences between the following implementations of SolrServer:
ConcurrentUpdateSolrServer
HttpSolrServer
CommonsHttpSolrServer (Note: Is this now deprecated?)
As mentioned in the documentation:
It is only recommended to use ConcurrentUpdateSolrServer with /update requests. The class HttpSolrServer is better suited for the query interface.
The documentation for ConcurrentUpdateSolrServer suggests using it for updates and HttpSolrServer for queries. Why is this?
At the moment I am using HttpSolrServer for everything. Will using ConcurrentUpdateSolrServer for updates result in significant performance improvements?
As of 2017, the Solr community has renamed SolrServer to SolrClient, and there are currently four implementations:
CloudSolrClient
ConcurrentUpdateSolrClient
HttpSolrClient
LBHttpSolrClient
The documentation suggests using ConcurrentUpdateSolrClient because it buffers all update requests in a queue declared as final BlockingQueue<Update> queue;, so updates take less time than with HttpSolrClient, which fires each update request as soon as it receives it. Of course we can trust the documentation, but this is easy enough to check, so I did some performance testing.
First, though, I will describe how the clients' operations differ. If you use the add operation of SolrClient, it makes no difference whether you create an HttpSolrClient or a ConcurrentUpdateSolrClient, because both methods do the same thing. ConcurrentUpdateSolrClient only shines if you explicitly send an UpdateRequest.
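To make the buffering idea concrete, here is a minimal sketch that uses only JDK classes. It is not SolrJ code: BufferedUpdateSketch, its sender callback (a stand-in for the HTTP call to /update), and the String documents are all hypothetical names chosen for illustration. It only shows the general pattern of a bounded queue drained by several runner threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration of "queue + runner threads"; not the SolrJ implementation.
public class BufferedUpdateSketch {
    private final BlockingQueue<String> queue;
    private final List<Thread> runners = new ArrayList<>();
    private volatile boolean closed = false;

    // sender is an assumed stand-in for whatever actually ships a batch to the server.
    public BufferedUpdateSketch(int queueSize, int threadCount, Consumer<List<String>> sender) {
        this.queue = new LinkedBlockingQueue<>(queueSize);
        for (int i = 0; i < threadCount; i++) {
            Thread runner = new Thread(() -> drain(sender));
            runner.start();
            runners.add(runner);
        }
    }

    // Returns as soon as the document is queued; blocks only while the queue is full.
    public void add(String doc) {
        try {
            queue.put(doc);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }

    // Each runner takes whatever is currently queued and sends it as one batch.
    private void drain(Consumer<List<String>> sender) {
        try {
            while (!closed || !queue.isEmpty()) {
                String first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) {
                    continue; // nothing queued yet; re-check the exit condition
                }
                List<String> batch = new ArrayList<>();
                batch.add(first);
                queue.drainTo(batch); // grab everything else that is waiting
                sender.accept(batch);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Signals the runners to finish and waits until the queue has been flushed.
    public void close() {
        closed = true;
        for (Thread runner : runners) {
            try {
                runner.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while closing", e);
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger docs = new AtomicInteger();
        AtomicInteger batches = new AtomicInteger();
        BufferedUpdateSketch client = new BufferedUpdateSketch(1000, 5, batch -> {
            docs.addAndGet(batch.size());
            batches.incrementAndGet();
        });
        for (int i = 0; i < 10000; i++) {
            client.add("title-" + i);
        }
        client.close();
        System.out.println(docs.get() + " docs sent in " + batches.get() + " batches");
    }
}
```

The number of batches varies from run to run because it depends on thread timing; the point is only that the caller's add returns quickly while the runner threads do the sending in groups.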
Test results for indexing Wikipedia titles (code):
My machine: Intel i5-4670S 3.1 GHz, 16 GB RAM
ConcurrentUpdateSolrClient (5 threads, 1000 queue size) - 200 seconds
ConcurrentUpdateSolrClient (5 threads, 10000 queue size) - 150 seconds
ConcurrentUpdateSolrClient (10 threads, 1000 queue size) - 100 seconds
ConcurrentUpdateSolrClient (10 threads, 10000 queue size) - 30 seconds
HttpSolrClient (no bulk) - 7000 seconds
HttpSolrClient (bulk 1000 docs) - 150 seconds
HttpSolrClient (bulk 10000 docs) - 80 seconds
Summary:
If you use the clients in the same fashion, e.g. client.add(doc);, then ConcurrentUpdateSolrClient performs at least 10-20 times faster, because it uses a thread pool and a queue (in effect, a bulk operation).
If you're using HttpSolrClient, you can still mimic this behaviour by manually creating several clients, running additional threads, and using some intermediate storage such as a List. It will certainly improve performance, but it requires additional code.
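As a rough idea of what that additional code might look like, here is a sketch using only JDK classes. ManualBulkSketch, indexInBulk, and the sender callback are hypothetical names; sender stands in for a bulk call on your own client (something along the lines of adding a whole list of documents in one request), and the documents are plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical illustration of manual batching with a List and a thread pool.
public class ManualBulkSketch {

    // Splits docs into lists of at most batchSize and sends each list on a worker thread.
    // Returns the number of batches that were submitted.
    public static int indexInBulk(List<String> docs, int batchSize, int threads,
                                  Consumer<List<String>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int batches = 0;
        for (int from = 0; from < docs.size(); from += batchSize) {
            int to = Math.min(from + batchSize, docs.size());
            // Copy the slice so each task owns its batch (the intermediate List).
            List<String> batch = new ArrayList<>(docs.subList(from, to));
            pool.submit(() -> sender.accept(batch));
            batches++;
        }
        pool.shutdown();
        try {
            // Wait for all submitted batches to finish before returning.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("timed out waiting for batches");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("title-" + i);
        }
        AtomicInteger sent = new AtomicInteger();
        int batches = indexInBulk(docs, 1000, 4, batch -> sent.addAndGet(batch.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs"); // prints "3 batches, 2500 docs"
    }
}
```

Note that pool.submit swallows exceptions thrown by sender unless you keep the returned Future and check it, so real code would want some error handling there.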
These numbers most likely mean very little in absolute terms, but I hope they give a rough comparison.