Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parallel algorithm for connected components

I have to find the best algorithm from already known for parallel computing of connected components of graph.

Here is a brief outline of my data and computer architecture:

  • I have access to a computational cluster with several thousands of processors (memory is not shared, but I expect that there should be enough memory in a single node to assess my needs for the whole data).
  • my graph has rather small ratio of edges number to vertices number (about 5)
  • I expect the most of connected components to be very small (2-3 vertices)
  • there will be, however, very big components with millions of vertices (constituting even up to 10% of total vertices number).

I have read about parallel algorithms for computing connected components of graphs. As I have noticed, some of them base on the classical BFS approach for the serialized case. To be honest, I got a bit lost in the number of these algorithms. Could anyone give me some advice, which algorithm would be the best for my purposes?

like image 538
piotrmizerka Avatar asked Oct 22 '25 01:10

piotrmizerka


1 Answers

Ligra is either the state of the art or close to it for single-machine multicore implementations. It should be able to handle your graph no problem.

Connected Components at Scale via Local Contractions, by my colleagues Jakub Łącki, Vahab Mirrokni, and Michał Włodarczyk, is the state of the art (at least, that I know about) for MapReduce algorithms. We've used it on graphs a thousand times bigger than yours.

like image 107
David Eisenstat Avatar answered Oct 24 '25 14:10

David Eisenstat



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!