Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating categorical average with filtered data

Tags:

r

I have following dataset:

A -1
A 10
B  8
D -1
A  0
A  4
B  2
C  6

And I want to add column to it like this:

A -1 4,6
A 10 4,6
B  8 5,0
D -1 5,0
A  0 4,6
A  4 4,6
B  2 5,0
C  6 6,0

What have happened here?

Well I have calculated average of each categorical letter variable but ignoring negative numbers and given this as new columns value.

If there is only negative values for category I have given value of total average (ignoring negative values).

In SQL this would have been done with filtered group funcion followed with join. In excel this would be conditional vlookup. How I do this in R?

Edit:

# Create dataset
category <- c("A","A","B","D","A","A","B","C")
value <- c(-1,10,8,-1,0,4,2,6)
dataset <- data.frame(category, value)

# Calculated means

fdata <- dataset[dataset[,'value']>-1,]
aggregate(fdata[,2], list(fdata$category), mean)
like image 326
Veli Hemming Avatar asked Dec 10 '25 12:12

Veli Hemming


1 Answers

We can use ave from base R grouped by each category, we check if all the value in a particular group is less than 0, if it is then we select the mean of the entire dataset and if it is not then we take only the mean of the group.

dataset$mean_column <- with(dataset, ave(value, category, FUN = function(x) 
         ifelse(all(x < 0), mean(value[value >= 0]), mean(x[x >= 0]))))

dataset

#  category value  mean_column
#1        A    -1    4.666667
#2        A    10    4.666667
#3        B     8    5.000000
#4        D    -1    5.000000
#5        A     0    4.666667
#6        A     4    4.666667
#7        B     2    5.000000
#8        C     6    6.000000
like image 139
Ronak Shah Avatar answered Dec 12 '25 02:12

Ronak Shah



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!