Counting the number of non-zeros in a data frame or matrix

Question

I have a large data frame of integers, 190,000 rows × 13 columns, and I want to count all of the non-zero values in the whole thing.

I know I can write a nested for loop over each row of each column, but is there a single function, or a one-liner, that does the same thing?
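For reference, the nested loop described above might look like the sketch below. The helper name `count_nonzero_loop` and the small `toy` data frame are hypothetical, for illustration only. The last line shows that the vectorised `sum(df != 0L)` from the answer gives the same count.

```r
# Hypothetical helper: the nested-loop approach described in the question.
count_nonzero_loop <- function(df) {
  count <- 0L
  for (j in seq_along(df)) {        # loop over the columns
    col <- df[[j]]
    for (i in seq_along(col)) {     # loop over the rows of that column
      if (col[i] != 0L) count <- count + 1L
    }
  }
  count
}

# Small illustrative data frame: 5 of its 9 cells are non-zero.
toy <- data.frame(a = c(0L, 1L, 2L), b = c(3L, 0L, 0L), c = c(5L, 6L, 0L))
count_nonzero_loop(toy)  # 5
sum(toy != 0L)           # 5
```

The loop touches every cell one at a time in interpreted R code, which is why the vectorised comparison is so much faster on large data.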

flodel · Accepted Answer

The consensus is that sum(df != 0) is much shorter and more efficient than the solution that was accepted when this answer was written. I will add that if you have integers, as you say, you should compare with 0L (integer) rather than 0 (numeric) to avoid unnecessary type conversions. Converting your data.frame to a matrix first is faster still. Here are some benchmarks:
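A minimal sketch of why the one-liner works, using a small hypothetical data frame `toy`: comparing a data frame with a scalar returns a logical matrix with one `TRUE`/`FALSE` per cell, and `sum()` counts each `TRUE` as 1. The last two lines show the literal types behind the 0 versus 0L advice.

```r
toy <- data.frame(a = c(0L, 1L, 2L), b = c(3L, 0L, 0L))

# Comparing a data frame with a scalar gives a logical matrix.
is_nonzero <- toy != 0L
is.matrix(is_nonzero)   # TRUE

# sum() treats TRUE as 1 and FALSE as 0, so this counts the non-zero cells.
sum(is_nonzero)         # 3

# 0 is a double literal, 0L is an integer literal; comparing integer
# columns with 0 forces them to be converted to double first.
typeof(0)               # "double"
typeof(0L)              # "integer"
```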

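# 190,000 x 13 data frame of random integers from 0 to 9
# (0:9 is already an integer vector; matrix() with nrow gives 13 columns)
df <- as.data.frame(matrix(sample(0:9, 190000 * 13, replace = TRUE), nrow = 190000))

library(microbenchmark)
microbenchmark(
  sum(df != 0),             # compare with a numeric (double) 0
  sum(df != 0L),            # compare with an integer 0
  sum(as.matrix(df) != 0L)  # convert to an integer matrix first
)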
# Unit: milliseconds
#                      expr      min       lq   median       uq       max neval
#              sum(df != 0) 57.44615 61.40066 62.83314 76.93262 116.42085   100
#             sum(df != 0L) 46.01104 48.76516 53.00026 55.91232  74.20851   100
#  sum(as.matrix(df) != 0L) 20.25708 25.22730 27.43667 30.36676  48.23750   100
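Before trusting the timings, it is worth checking that the three benchmarked expressions return the same count. This sketch uses a smaller random data frame (the seed and the name `df_small` are arbitrary choices) and also confirms that `as.matrix()` on all-integer columns keeps the integer type, so the comparison with 0L needs no conversion.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
df_small <- as.data.frame(matrix(sample(0:9, 1000 * 13, replace = TRUE), nrow = 1000))

# All columns are integer, so as.matrix() returns an integer matrix.
m <- as.matrix(df_small)
typeof(m)  # "integer"

# The three expressions from the benchmark, side by side.
counts <- c(
  double_zero  = sum(df_small != 0),
  integer_zero = sum(df_small != 0L),
  via_matrix   = sum(m != 0L)
)
counts  # all three entries should be equal
```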

For comparison, @Codoremifa's solution takes around 2.5 seconds on this data, i.e. it is close to 100x slower than the matrix version.

Tags: dataframe, r, count, matrix

Tom Anonymous
