R: How to calculate mean for each row with missing values using dplyr

Question

I want to calculate means over several columns for each row in my dataframe containing missing values, and place results in a new column called 'means.' Here's my dataframe:

df <- data.frame(A=c(3,4,5),B=c(0,6,8),C=c(9,NA,1))
  A B  C
1 3 0  9
2 4 6 NA
3 5 8  1

The code below successfully accomplishes the task if columns have no missing values, such as columns A and B.

 library(dplyr)
 df %>%
 rowwise() %>%
 mutate(means=mean(A:B, na.rm=T))

     A     B     C   means
  <dbl> <dbl> <dbl> <dbl>
1     3     0     9   1.5
2     4     6    NA   5.0
3     5     8     1   6.5

However, if a column has missing values, such as C, then I get an error:

> df %>% rowwise() %>% mutate(means=mean(A:C, na.rm=T))
Error: NA/NaN argument

Ideally, I'd like to implement it with dplyr.

eipi10 · Accepted Answer

df %>% 
  mutate(means=rowMeans(., na.rm=TRUE))

The . is a "pronoun" that references the data frame df that was piped into mutate.

  A B  C    means
1 3 0  9 4.000000
2 4 6 NA 5.000000
3 5 8  1 4.666667
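
To make the role of the pronoun concrete, here is a minimal sketch (assuming the magrittr pipe `%>%` that dplyr re-exports, and the df from the question) showing that the piped call is the same as passing df explicitly in both places. The names `piped` and `direct` are just illustrative.

```r
library(dplyr)

df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

# With the magrittr pipe, `.` stands for the left-hand side (here, df).
piped <- df %>%
  mutate(means = rowMeans(., na.rm = TRUE))

# The same computation written without the pipe or the pronoun.
direct <- mutate(df, means = rowMeans(df, na.rm = TRUE))

# Compare the two results.
all.equal(piped, direct)
```

Note that rowMeans expects numeric columns, so if the data frame also holds character or factor columns, you would need to select the numeric ones first.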

You can also select only specific columns to include, using all the usual methods (column names, indices, grep, etc.).

df %>% 
  mutate(means=rowMeans(.[ , c("A","C")], na.rm=TRUE))

  A B  C means
1 3 0  9     6
2 4 6 NA     4
3 5 8  1     3
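
If you would rather stay with dplyr verbs for the column selection, the following sketch shows two options, assuming dplyr 1.0.0 or later for c_across(). It also hints at why the code in the question failed: inside rowwise(), `A:C` is base R's sequence operator applied to the row's values (3:9 in row 1), not a column range, so it errors as soon as one end is NA. The `A:B` results only looked right because the mean of an integer sequence equals the mean of its two endpoints. The names `by_select` and `by_row` are illustrative.

```r
library(dplyr)

df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

# Option 1: pick the columns with select() and keep the vectorised rowMeans().
by_select <- df %>%
  mutate(means = rowMeans(select(., A, C), na.rm = TRUE))

# Option 2: stay row-wise; c_across() gathers the values of columns A through C
# for the current row, so here A:C really is a column range.
by_row <- df %>%
  rowwise() %>%
  mutate(means = mean(c_across(A:C), na.rm = TRUE)) %>%
  ungroup()
```

On larger data frames the rowMeans version is likely to be faster, since the row-wise version calls mean() once per row.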

lmo · Answer

It is simple to accomplish in base R as well:

cbind(df, "means"=rowMeans(df, na.rm=TRUE))
  A B  C    means
1 3 0  9 4.000000
2 4 6 NA 5.000000
3 5 8  1 4.666667

rowMeans performs the calculation and accepts the na.rm argument to skip missing values, while cbind binds the resulting means, under whatever name you choose, to the data.frame df.
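
As a variation on the same base R idea, the sketch below assigns the column directly instead of using cbind, and shows one edge case worth knowing about: a row in which every value is missing. The data frame `all_na` is a made-up example, not part of the question.

```r
df <- data.frame(A = c(3, 4, 5), B = c(0, 6, 8), C = c(9, NA, 1))

# Assign the row means as a new column; naming the columns explicitly keeps
# the result stable if other columns are added to df later.
df$means <- rowMeans(df[, c("A", "B", "C")], na.rm = TRUE)

# Hypothetical edge case: a row with nothing but NA.
all_na <- data.frame(A = NA_real_, B = NA_real_)
edge <- rowMeans(all_na, na.rm = TRUE)  # NaN, because no values remain to average
```

If NaN is not what you want for such rows, you could replace it afterwards, for example with NA.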

Tags: r, dplyr, mean

Irakli
