Vespa.ai: exploiting multiple instances to answer queries

Tags: vespa

I have a Vespa.ai cluster with multiple container/content nodes. After Vespa is loaded with data, my app sends queries and gets the data back from Vespa. I want to make sure I utilize all the nodes well and get the data as fast as possible. My app builds an HTTP request and sends it to one of the nodes.
Which node or nodes should I direct my requests to? How can I be sure that all instances participate in answering queries?
What should I do to utilize all the cluster nodes?
Does Vespa load balance these requests across the other instances for better performance?

asked Oct 28 '25 by Oded


2 Answers

Vespa is a 2-tier system:

[architecture diagram: the container tier in front of the content tier]

The containers will load balance over the content nodes (if you have multiple groups), but since you are sending the requests to the containers, you need to load balance over those.

This can be done by code you write in your client, by a VIP, by another tier of nodes you host yourself such as Nginx, or by a hosted load balancer such as AWS ELB.
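
A minimal sketch of the "code you write in your client" option, assuming three container nodes reachable on Vespa's default query port 8080; the host names and the mydoc schema below are placeholders for illustration:

    import itertools
    import requests

    # Placeholder container-node endpoints; replace with your own hosts.
    # 8080 is Vespa's default query port, but verify it for your deployment.
    CONTAINER_NODES = [
        "http://container0.example.com:8080",
        "http://container1.example.com:8080",
        "http://container2.example.com:8080",
    ]

    _round_robin = itertools.cycle(CONTAINER_NODES)

    def query_vespa(yql: str, hits: int = 10) -> dict:
        """Send a query to the next container node in round-robin order."""
        node = next(_round_robin)
        response = requests.get(
            f"{node}/search/",                 # Vespa's query HTTP API
            params={"yql": yql, "hits": hits},
            timeout=5,
        )
        response.raise_for_status()
        return response.json()

    # Example usage ('mydoc' is a hypothetical schema name):
    result = query_vespa("select * from sources mydoc where true")
    print(result["root"]["fields"]["totalCount"])

A production client would typically add health checks and retries on top of this, or delegate the whole job to one of the other options above (VIP, Nginx, ELB).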

answered Oct 31 '25 by Jon


You can debug the distributed query execution by adding &presentation.timing=true&trace.timestamps&tracelevel=5 to the search request. You'll then get a trace in the response showing how the query was dispatched and how long each content node spent matching it. See also Scaling Vespa: https://docs.vespa.ai/en/performance/sizing-search.html
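
For example, assuming a container node at container0.example.com:8080 and the same placeholder mydoc schema as above, the timing summary and dispatch trace can be pulled out of the response like this:

    import json
    import requests

    # Placeholder endpoint; any container node in the cluster will do.
    VESPA_ENDPOINT = "http://container0.example.com:8080/search/"

    params = {
        "yql": "select * from sources mydoc where true",  # placeholder schema
        "presentation.timing": "true",  # adds a timing summary to the response
        "trace.timestamps": "true",
        "tracelevel": "5",              # verbose trace of the query dispatch
    }

    response = requests.get(VESPA_ENDPOINT, params=params, timeout=10)
    response.raise_for_status()
    body = response.json()

    # Query/search timing summary and the per-node dispatch trace.
    print(json.dumps(body.get("timing", {}), indent=2))
    print(json.dumps(body.get("trace", {}), indent=2))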

answered Oct 31 '25 by Jo Kristian Bergum