Problem
Consider two data frames: one containing only 1's and 0's, and a second one with data:
set.seed(20)
df<-data.frame(sample(0:1,5,T),sample(0:1,5,T),sample(0:1,5,T))
#zero_one data frame
sample.0.1..5..T. sample.0.1..5..T..1 sample.0.1..5..T..2
1 0 1 0
2 1 0 0
3 1 1 1
4 0 0 0
5 1 0 1
df1<-data.frame(append(rnorm(4),10),append(runif(4),-5),append(rexp(4),20))
#with data
append.rnorm.4...10. append.runif.4....5. append.rexp.4...20.
1 0.08609139 0.2374272 0.3341095
2 -0.63778176 0.2297862 0.7537732
3 0.22642990 0.9447793 1.3011998
4 -0.05418293 0.8448115 1.2097271
5 10.00000000 -5.0000000 20.0000000
Now I want to replace every value in the second data frame whose corresponding entry in the first data frame is 0 with the mean of the values in the same column whose corresponding entries in the first data frame are 1.
Example
In the first column I want to replace 0.08609139 and -0.05418293 (the values for which the first column of the first data frame is 0) with mean(c(-0.63778176, 0.22642990, 10.00000000)) (the mean of the values for which the first column of the first data frame is 1).
I want to do it using the mutate_all() function from the dplyr package.
My work so far
df1<-df1 %>% mutate_all(
function(x) ifelse(df[x]==0, mean(x[df==1],na.rm=T,x)))
I know that the condition df[x] is meaningless, but I have no idea what I should put there. Could you please help me with that?
You could follow @deschen's suggestion and multiply the two data frames together.
Here is another approach to consider, using mapply. For each pair of columns, identify the positions (indices) in the df column where the value is zero.
Then replace the values at those positions in the corresponding df1 column with the mean of the remaining values in that column. y[-idx] is the df1 column with those positions excluded, that is, the values where df is 1.
Note that my set.seed is different: when I used yours of 20 I got different values, including a column with all zeroes. Please let me know if you are able to reproduce.
set.seed(12)
df<-data.frame(sample(0:1,5,T),sample(0:1,5,T),sample(0:1,5,T))
df1<-data.frame(append(rnorm(4),10),append(runif(4),-5),append(rexp(4),20))
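# x: one 0/1 indicator column of df, y: the matching data column of df1
my_fun <- function(x, y) {
  # positions where the indicator is zero
  idx <- which(x == 0)
  # overwrite those positions with the mean of the remaining values
  y[idx] <- mean(y[-idx])
  return(y)
}
# apply my_fun column by column; mapply simplifies the result to a matrix
mapply(my_fun, df, df1)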
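If a data frame is wanted back rather than a matrix, the same idea can be sketched with Map() in base R. This is only a minimal sketch, not the dplyr solution asked for: the numbers are copied from the data frames printed in the question so the result does not depend on the seed, and the names ind, vals and replace_zeros are hypothetical, chosen for illustration. It uses a logical mask instead of which().

```r
# Indicator and data values copied from the tables printed in the question
ind <- data.frame(
  a = c(0, 1, 1, 0, 1),
  b = c(1, 0, 1, 0, 0),
  c = c(0, 0, 1, 0, 1)
)
vals <- data.frame(
  a = c(0.08609139, -0.63778176, 0.22642990, -0.05418293, 10),
  b = c(0.2374272, 0.2297862, 0.9447793, 0.8448115, -5),
  c = c(0.3341095, 0.7537732, 1.3011998, 1.2097271, 20)
)

# Hypothetical helper: same logic as my_fun, written with a logical mask
replace_zeros <- function(x, y) {
  zero <- x == 0
  # mean of the values whose indicator is not zero
  y[zero] <- mean(y[!zero], na.rm = TRUE)
  y
}

# Assigning into result[] keeps the data.frame class and the column names
result <- vals
result[] <- Map(replace_zeros, ind, vals)
print(result)
```

In the first column, rows 1 and 4 should now hold the mean of the other three values, which matches the example in the question.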