I have a Vespa.ai cluster with multiple container/content nodes. After Vespa is loaded with data, my app sends queries and gets the data back from Vespa. I want to be sure that I utilize all the nodes well and get the data as fast as possible. My app builds an HTTP request and sends it to one of the nodes.
Which node/nodes should I direct my request to?
How can I be sure that all instances participate in answering queries?
What should I do to utilize all the cluster nodes?
Does Vespa know how to load balance these requests across the other instances for better performance?
Vespa is a 2-tier system:

- Stateless container nodes, which receive queries, process them, and dispatch them to the content nodes.
- Content nodes, which hold the data and do the matching and ranking.
The containers will load balance over the content nodes (across groups, if you have multiple groups), but since you are sending the requests to the containers, you need to load balance over the containers yourself.
This can be done by code you write in your client, by a VIP, by another tier of nodes you host yourself such as Nginx, or by a hosted load balancer such as AWS ELB.
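For the client-code option, a minimal sketch of round-robin balancing over the container nodes could look like this (Python; the hostnames are placeholders, and port 8080 with the /search/ endpoint are the Vespa container defaults — adjust to your deployment):

    import itertools
    import requests

    # Query (container) endpoints of the cluster -- replace with your own nodes.
    CONTAINER_NODES = [
        "http://container-node-1:8080",
        "http://container-node-2:8080",
        "http://container-node-3:8080",
    ]
    _next_node = itertools.cycle(CONTAINER_NODES)

    def query_vespa(yql, hits=10):
        """Send the query to the next container node in round-robin order."""
        node = next(_next_node)
        response = requests.get(
            f"{node}/search/",
            params={"yql": yql, "hits": hits},
            timeout=5,
        )
        response.raise_for_status()
        return response.json()

    # Usage:
    # result = query_vespa("select * from sources * where true")

A dedicated load balancer (VIP, Nginx, ELB) additionally gives you health checking and lets you add or remove container nodes without changing the client.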
You can debug the distributed query execution by adding &presentation.timing=true&trace.timestamps&tracelevel=5
to the search request. You will then get a trace in the response showing how the query was dispatched and how long each content node spent matching it. See also Scaling Vespa: https://docs.vespa.ai/en/performance/sizing-search.html
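As a sketch, a traced request could be sent like this (same endpoint assumptions as above; the timing and trace sections appear in the JSON result when these parameters are set):

    import requests

    def traced_query(node, yql):
        """Run one query with tracing enabled and print timing and trace info."""
        params = {
            "yql": yql,
            "presentation.timing": "true",
            "trace.timestamps": "true",
            "tracelevel": "5",
        }
        response = requests.get(f"{node}/search/", params=params, timeout=10)
        response.raise_for_status()
        body = response.json()
        # 'timing' holds query/search/summary-fetch times,
        # 'trace' shows how the query was dispatched to the content nodes.
        print(body.get("timing"))
        print(body.get("trace"))
        return body

    # traced_query("http://container-node-1:8080", "select * from sources * where true")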