
sd function returns NA when using group_by() and summarise() in dplyr (no NA values in df)

I've got a data frame with a binary numeric response variable (0 or 1) and several other variables. I am trying to create a table that groups by type (a 3-level variable) and step (7 levels). I want the mean response and standard deviation for each type at each step. The output table should have 21 rows with 4 variables: type, step, mean and sd.

My code looks like this:

data <- data %>% group_by(step, type) %>% summarise(Response = mean(Response), dev = sd(Response))  

The output table correctly generates the mean values, but returns NA for all sd values. I tried using 'na.rm=TRUE' to remove NA values but there aren't any in the original df for response. Any ideas?

MatthewQMLing asked Jan 25 '26 13:01


1 Answer

The following should work as you expect:

data <- data %>% group_by(step, type) %>% summarise(Response_mean = mean(Response), dev = sd(Response))  

You are getting NA because sd() is being given a single value, and the standard deviation of a length-one vector is NA.
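As a quick check of that claim, here is a minimal base R sketch. The variable names are made up for illustration. It also suggests why na.rm = TRUE did not help: the input contains no NA to remove, there is simply only one value.

```r
# sd() divides by (n - 1), so a length-one input has no defined spread
single <- sd(5)

# na.rm only drops missing inputs; one value is still just one value
single_na_rm <- sd(5, na.rm = TRUE)

# With several values the result is an ordinary number
several <- sd(c(0, 1, 1, 0))

print(single)
print(single_na_rm)
print(several)
```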

That happens because summarise() evaluates its arguments in order, and each expression can refer to columns created by the ones before it. This part of your code:

Response = mean(Response)

creates a column named 'Response' in the summarised table, holding a single value per group: the mean of the 'Response' vector from your original data. The next part:

dev = sd(Response)

then tries to calculate the standard deviation of that single value, not of the original column.
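To make this reproducible without the original data, here is a small sketch on a made-up data frame. The name toy and its values are assumptions for illustration only, and .groups = "drop" assumes dplyr 1.0.0 or later. It runs the same summarise() call twice, once reusing the name Response and once with a distinct name.

```r
library(dplyr)

# Hypothetical toy data: 2 steps x 2 types, 4 binary responses per group
toy <- data.frame(
  step     = rep(c(1, 2), each = 8),
  type     = rep(rep(c("a", "b"), each = 4), times = 2),
  Response = c(0, 1, 1, 0,  1, 1, 1, 0,  0, 0, 1, 0,  1, 0, 1, 1)
)

# Reusing the name: sd() sees the freshly computed one-value mean
shadowed <- toy %>%
  group_by(step, type) %>%
  summarise(Response = mean(Response), dev = sd(Response), .groups = "drop")

# Distinct name: sd() still sees the original column
fixed <- toy %>%
  group_by(step, type) %>%
  summarise(Response_mean = mean(Response), dev = sd(Response), .groups = "drop")

print(shadowed)
print(fixed)
```

In the first table every dev should be NA; in the second, dev should be a number for each of the four groups.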

To illustrate, you can try this as well. Here Response_plus_10 is computed from the new per-group mean, not from the original column:

data <- data %>% group_by(step, type) %>% summarise(Response = mean(Response), Response_plus_10 = Response + 10)  
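If the summary column has to keep the name Response, one possible alternative is to reverse the order so that sd() runs before the name is overwritten. This is only a sketch on made-up data (toy is a hypothetical data frame, and .groups = "drop" assumes dplyr 1.0.0 or later); using a distinct name is usually easier to read.

```r
library(dplyr)

# Hypothetical toy data: one step, two types, four binary responses each
toy <- data.frame(
  step     = 1,
  type     = rep(c("a", "b"), each = 4),
  Response = c(0, 1, 1, 0, 1, 1, 1, 0)
)

# sd() is evaluated first, while Response is still the original column;
# only afterwards is Response replaced by its per-group mean
reordered <- toy %>%
  group_by(step, type) %>%
  summarise(dev = sd(Response), Response = mean(Response), .groups = "drop")

print(reordered)
```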

Hope this clarifies the issue.

Amit Gal answered Jan 27 '26 03:01


