Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Connected components in a graph with 100 million nodes

Tags:

python

graph

I am trying to get the list of connected components in a graph with 100 million nodes. For smaller graphs, I usually use the connected_components function of the Networkx module in Python which does exactly that. However, loading a graph with 100 million nodes (and their edges) into memory with this module would require ca. 110GB of memory, which I don't have. An alternative would be to use a graph database which has a connected components function but I haven't found any in Python. It would seem that Dex (API: Java, .NET, C++) has this functionality but I'm not 100% sure. Ideally I'm looking for a solution in Python. Many thanks.

like image 971
David M. Avatar asked Oct 29 '25 08:10

David M.


1 Answers

SciPy has a connected components algorithm. It expects as input the adjacency matrix of your graph in one of its sparse matrix formats and handles both the directed and undirected cases.

Building a sparse adjacency matrix from a sequence of (i, j) pairs adj_list where i and j are (zero-based) indices of nodes can be done with

i_indices, j_indices = zip(*adj_list)
adj_matrix = scipy.sparse.coo_matrix((np.ones(number_of_nodes),
                                     (i_indices, j_indices)))

You'll have to do some extra work for the undirected case.

This approach should be efficient if your graph is sparse enough.

like image 98
Fred Foo Avatar answered Oct 31 '25 13:10

Fred Foo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!