Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Out of memory error when projecting a bipartite network in igraph

Tags:

r

igraph

I have a directed, bipartite graph g with 215473 vertices and 2326714 edges. When creating a bipartite.projection of g, I keep running out of memory (it uses ~35 gig of RAM before crashing).

I tried to calculate how much memory I need, by following a previous thread on nongnu.org.

From the information provided in this thread, to store a graph in memory costs (in bytes):

(4*|E|+2*|V|) * 8 + 4*|V|

To calculate the projection requires the following memory (in bytes):

16*|V| + (2*|V|+2*|E|) * 8

Thus, for my graph g, it would cost:

((4*2326714+2*215473) * 8 + 4*215473) + (16*215473 + (2*215473+2*2326714) * 8)

= 78764308 + 44122560
= 122886868 (bytes)
= 122.886868 (mb)

Clearly, this isn't correct, and I must be doing something wrong.

Can anyone please help figure out how to create a bipartite projection of my graph?

like image 773
timothyjgraham Avatar asked Oct 18 '25 20:10

timothyjgraham


1 Answers

Working with sparse matrices could possibly solve your problem.

# Load tiny toy data as edgelist
df <- data.frame( person =
    c('Sam','Sam','Sam','Greg','Tom','Tom','Tom','Mary','Mary'), group =
    c('a','b','c','a','b','c','d','b','d'), stringsAsFactors = F)

# Transform data to a sparse matrix
library(Matrix)
A <- Matrix::sparseMatrix(nrow=length(unique(df$person)),
        ncol=length(unique(df$group)),
        i = as.numeric(factor(df$person)),
        j = as.numeric(factor(df$group)),
        x = rep(1, length(as.numeric(df$person))) )
row.names(A) <- levels(factor(df$person))
colnames(A) <- levels(factor(df$group))

To do the projection you have acutally multiple possiblities, here are two:

# Use base r
Arow <- tcrossprod(A)
# Alternatively, if you want to project on the other mode:
Acol <- tcrossprod(t(A))

# Use the igraph package, which works with sparse matrices
library(igraph)
g <- graph.incidence(A)

# The command bipartite.projection does both possible projections at once
proj <- bipartite.projection(g)

#proj[[1]]
#proj[[2]]

You can also read-in the data and do the transformation within the spMatrix command by using data.table, which will speed up those operations as well.

UPDATE:

Here is an example with a larger graph and some memory benchmarks:

# Load packages
library(data.table)
library(igraph)

# Scientific collaboration dataset
# Descriptives as reported on https://toreopsahl.com/datasets/#newman2001 
# mode 1 elements: 16726    
# mode 2 elements: 22016    
# two mode ties: 58595
# one mode ties: 47594  
d <- fread("http://opsahl.co.uk/tnet/datasets/Newman-Cond_mat_95-99-two_mode.txt", 
           stringsAsFactors=TRUE, colClasses = "factor", header=FALSE)

# Transform data to a sparse matrix
A <- Matrix::sparseMatrix(nrow=length(unique(d[, V1])),
              ncol=length(unique(d[, V2])),
              i = as.numeric(d[, V1]),
              j = as.numeric(d[, V2]),
              x = rep(1, length(as.numeric(d[, V1]))) )
row.names(A) <- levels(d[, V1])
colnames(A) <- levels(d[, V2])

#To do the projection you have acutally multiple possiblities, here are two:
  
# Use base r
Arow <- tcrossprod(A)
# Alternatively, if you want to project on the other mode:
Acol <- tcrossprod(t(A))

Here is a overview how much memory got used, i.e. the sparse matrix approach worked on my laptop for this network, but the approach using regular matrices did give a memory allocation error (even after removing the Bcol object from memory with rm(Brow) and then calling the garbage collector gc())

object.size(A) # Spare matrix: 3108520 bytes
object.size(Arow) # 2713768 bytes
object.size(Acol) # 5542104 bytes

# For comparison
object.size(B <- as.matrix(A)) # Regular matrix: 2945783320 bytes
object.size(Brow <- tcrossprod(B)) # 2239946368 bytes
object.size(Bcol <- tcrossprod(t(B))) # Memory allocation error on my laptop
like image 104
majom Avatar answered Oct 20 '25 11:10

majom