I have following dataset:
A -1
A 10
B 8
D -1
A 0
A 4
B 2
C 6
And I want to add column to it like this:
A -1 4,6
A 10 4,6
B 8 5,0
D -1 5,0
A 0 4,6
A 4 4,6
B 2 5,0
C 6 6,0
What have happened here?
Well I have calculated average of each categorical letter variable but ignoring negative numbers and given this as new columns value.
If there is only negative values for category I have given value of total average (ignoring negative values).
In SQL this would have been done with filtered group funcion followed with join. In excel this would be conditional vlookup. How I do this in R?
Edit:
# Create dataset
category <- c("A","A","B","D","A","A","B","C")
value <- c(-1,10,8,-1,0,4,2,6)
dataset <- data.frame(category, value)
# Calculated means
fdata <- dataset[dataset[,'value']>-1,]
aggregate(fdata[,2], list(fdata$category), mean)
We can use ave from base R grouped by each category, we check if all the value in a particular group is less than 0, if it is then we select the mean of the entire dataset and if it is not then we take only the mean of the group.
dataset$mean_column <- with(dataset, ave(value, category, FUN = function(x)
ifelse(all(x < 0), mean(value[value >= 0]), mean(x[x >= 0]))))
dataset
# category value mean_column
#1 A -1 4.666667
#2 A 10 4.666667
#3 B 8 5.000000
#4 D -1 5.000000
#5 A 0 4.666667
#6 A 4 4.666667
#7 B 2 5.000000
#8 C 6 6.000000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With