
How to keep NA values with dcast() function?

Tags:

r

df <- data.frame(x = c(1,1,1,2,2,3,3,3,4,5,5),
                 y = c("A","B","C","A","B","A","B","D","B","C","D"),
                 z = c(3,2,1,4,2,3,2,1,2,3,4))

df_new <- dcast(df, x ~ y, value.var = "z")

With the sample data above, dcast() keeps NA values for combinations that do not occur. With my dataset, however, the function converts those NA values to zero. Why?

How can I keep the NA values?

ml-latest-small.zip

library(reshape2)

# read both files into the names that merge() uses below
ratings <- read.csv("ratings.csv")
movies <- read.csv("movies.csv")
rm <- merge(ratings, movies, by = "movieId")
umr <- dcast(rm, userId ~ title, value.var = "rating", fun.aggregate = sum)

Thanks in advance.

youraz asked Sep 05 '25 20:09


1 Answer

In the first example, fun.aggregate is not supplied; in the second case it is, and that is what changes the result. According to ?dcast:

library(reshape2)

fill - value with which to fill in structural missings, defaults to value from applying fun.aggregate to 0 length vector

dcast(df, x ~ y, value.var = "z", fun.aggregate = NULL)
#  x  A  B  C  D
#1 1  3  2  1 NA
#2 2  4  2 NA NA
#3 3  3  2 NA  1
#4 4 NA  2 NA NA
#5 5 NA NA  3  4

dcast(df, x ~ y, value.var = "z", fun.aggregate = sum)
#  x A B C D
#1 1 3 2 1 0
#2 2 4 2 0 0
#3 3 3 2 0 1
#4 4 0 2 0 0
#5 5 0 0 3 4

Note that here there is only one element per combination, so sum returns that same value, except that when a particular combination is not present it returns 0. This follows from how sum behaves on a zero-length vector:

length(integer(0))
#[1] 0
sum(integer(0))
#[1] 0

sum(NULL)
#[1] 0

The same happens when all the elements are NA and we use na.rm = TRUE: nothing is left to sum, so it is equivalent to the integer(0) case

sum(c(NA, NA), na.rm = TRUE)
#[1] 0

If we use sum_ from hablar, this behavior is changed to return NA

library(hablar)
sum_(c(NA, NA))
#[1] NA
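The same idea can be sketched in base R without hablar. The helper name sum_or_na below is hypothetical (it is not part of base R, reshape2, or hablar); it is only meant to show the check that turns "nothing to sum" into NA instead of 0.

```r
# Hypothetical helper: return NA when there is no non-missing value to add up,
# otherwise sum the non-missing values.
sum_or_na <- function(x) {
  # all(is.na(x)) is also TRUE for a zero-length x, since all(logical(0)) is TRUE
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

sum_or_na(c(NA, NA))    # NA: every element is missing
sum_or_na(numeric(0))   # NA: zero-length input
sum_or_na(c(1, NA, 2))  # 3: missing values are dropped, the rest are summed
```

Because the zero-length case is covered by the same check, a function like this could also be passed as fun.aggregate.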

An option is to create a condition in the fun.aggregate to return NA

dcast(df, x ~ y, value.var = "z", 
   fun.aggregate = function(x) if(length(x) == 0) NA_real_ else sum(x, na.rm = TRUE))
#  x  A  B  C  D
#1 1  3  2  1 NA
#2 2  4  2 NA NA
#3 3  3  2 NA  1
#4 4 NA  2 NA NA
#5 5 NA NA  3  4
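Since the zeros come from the default value of fill, another option may be to pass fill explicitly and leave fun.aggregate as plain sum. This is a minimal sketch, assuming reshape2's dcast and the sample df from the question; NA_real_ is used rather than NA so that the fill value has the same (double) type as the sums.

```r
library(reshape2)

df <- data.frame(x = c(1,1,1,2,2,3,3,3,4,5,5),
                 y = c("A","B","C","A","B","A","B","D","B","C","D"),
                 z = c(3,2,1,4,2,3,2,1,2,3,4))

# An explicit fill replaces the default, which would otherwise be
# sum(numeric(0)), i.e. 0, for combinations that do not occur in the data.
wide <- dcast(df, x ~ y, value.var = "z", fun.aggregate = sum, fill = NA_real_)
wide
```

Note that fill only covers combinations that are absent from the data. A combination that is present but holds only NA values would still be handled by fun.aggregate itself.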

For more information on how sum (a primitive function) is implemented, check the source code here

akrun answered Sep 08 '25 12:09