 

How to configure shards in Vespa?

Tags:

vespa

We want to set up a cluster of 4 nodes to host data. The cluster hosts only one index, so all 4 nodes hold the same document type.

Our goal is to have the data sharded across the nodes: say two shards with two replicas each, so that 4 nodes in total host the 4 data partitions.

The document mode is "index" and global is "true".

    <redundancy>2</redundancy>

    <nodes>
      <node hostalias="node1" distribution-key="0"/>
      <node hostalias="node2" distribution-key="1"/>
      <node hostalias="node3" distribution-key="2"/>
      <node hostalias="node4" distribution-key="3"/>
    </nodes>

    <engine>
      <proton>
        <searchable-copies>2</searchable-copies>
        <flush-on-shutdown>true</flush-on-shutdown>
      </proton>
    </engine>

The above config in services.xml is not accepted. It requires redundancy to be at least the number of nodes, so we have to configure

<redundancy>4</redundancy>

and

<searchable-copies>4</searchable-copies>

for the config to be accepted as valid.

But that configures all 4 nodes to hold all the data, so each node contains a full copy. According to http://docs.vespa.ai/documentation/content/data-placement.html we need global=true. We also noticed this note there:

Note: The global documents feature is under development. It is currently only available for setups where all documents are already inherently on all nodes, i.e. N groups each containing a single node.

How do we distribute the data in shards? Can node1 and node2 hold the distributed data while node3 and node4 hold the copies, with redundancy 2?

enator asked Oct 11 '25 22:10


1 Answer

Thanks for asking - I see the documentation of global=true is a bit confusing.

In your case, you want to shard, i.e. distribute 2 replicas of each document over 4 nodes (correct me if I am wrong).

global is normally used for parent documents, as in http://docs.vespa.ai/documentation/search-definitions.html#document-references. In your case you have only one document type (I assume), hence no parents, so do not use global.

The global feature will distribute 4 replicas over 4 nodes. If that is what you want, set redundancy=4, but there is no need to use global in that case either.
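
To make the suggestion concrete, here is a minimal sketch of how the content cluster in services.xml could look without global. The cluster id `mycluster` and the document type name `mydoc` are hypothetical placeholders, not taken from the question. The rest mirrors the config posted above. With global dropped, redundancy 2 should no longer have to match the node count, so each document gets 2 replicas spread over the 4 nodes.

```xml
<!-- Sketch only: "mycluster" and "mydoc" are assumed names, replace with your own. -->
<content id="mycluster" version="1.0">
  <!-- Keep 2 replicas of every document, distributed over the nodes below. -->
  <redundancy>2</redundancy>

  <documents>
    <!-- No global="true" here: there is a single document type with no parent documents. -->
    <document type="mydoc" mode="index"/>
  </documents>

  <nodes>
    <node hostalias="node1" distribution-key="0"/>
    <node hostalias="node2" distribution-key="1"/>
    <node hostalias="node3" distribution-key="2"/>
    <node hostalias="node4" distribution-key="3"/>
  </nodes>

  <engine>
    <proton>
      <!-- Both replicas are indexed and searchable. -->
      <searchable-copies>2</searchable-copies>
      <flush-on-shutdown>true</flush-on-shutdown>
    </proton>
  </engine>
</content>
```

Note that this spreads replicas over all 4 nodes rather than pinning the "copies" to node3 and node4. Which two nodes hold a given document is decided by the distribution algorithm.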

Kristian Aune answered Oct 16 '25 05:10