
subset() a factor by its number of observations

I have a problem with the subset() function. How can I subset my data frame by the number of observations per factor level?

   NAME        CLASS    COLOR     VALUE
   antonio     B        YELLOW        5
   antonio     B        BLUE          8
   antonio     B        BLUE          7
   antonio     B        BLUE         12
   luca        C        YELLOW       99
   luca        B        YELLOW       87
   luca        B        YELLOW       98
   giovanni    A        BLUE         48

I would like to keep only the rows where the combination of the three factors "NAME", "CLASS" and "COLOR" appears at least three times, so that I can compute the mean of VALUE. In this case I would obtain:

   NAME      CLASS         COLOR   VALUE      
   antonio       B          BLUE       mean

because antonio / B / BLUE is the only combination with at least three observations.

Thank you so much,

Nik

Spigonico asked Dec 20 '25 05:12


1 Answer

You can use the table function as follows:

subset(df, table(FACTOR)[FACTOR] >= 3)
#    FACTOR VALUE
# 1 ANTONIO     5
# 2 ANTONIO     8
# 3 ANTONIO     7

To help you understand, see what these return:

table(df$FACTOR)
table(df$FACTOR)[df$FACTOR]
table(df$FACTOR)[df$FACTOR] >= 3
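
As a rough illustration of those three steps, here is a minimal sketch that rebuilds part of the question's data and uses its NAME column in place of FACTOR. The data frame construction and the variable names counts, per_row and keep are assumptions made for this example, not part of the original answer.

```r
# Example data copied from the question (only NAME and VALUE are needed here).
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

counts  <- table(df$NAME)        # one count per distinct name
per_row <- counts[df$NAME]       # the count of each row's own name, row by row
keep    <- as.vector(per_row >= 3)  # plain logical vector usable as a row filter

print(counts)
print(per_row)
print(keep)
```

The first call counts each level once, the second repeats those counts so that there is one per row, and the third turns them into the TRUE/FALSE filter that subset() uses.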

You could also use the ave function to compute the number of observations:

subset(df, ave(VALUE, FACTOR, FUN = length) >= 3)

This last method may be a little more flexible if you have multiple factors, as you asked in your comment and updated question. You can do:

subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)
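
Since the question ultimately asks for the mean of VALUE, one possible follow-up is to aggregate the filtered rows. The sketch below assumes the data frame from the question and uses aggregate() for the averaging step, which is my own choice rather than something the answer prescribes. The names kept and result are hypothetical.

```r
# Example data copied from the question.
df <- data.frame(
  NAME  = c("antonio", "antonio", "antonio", "antonio",
            "luca", "luca", "luca", "giovanni"),
  CLASS = c("B", "B", "B", "B", "C", "B", "B", "A"),
  COLOR = c("YELLOW", "BLUE", "BLUE", "BLUE",
            "YELLOW", "YELLOW", "YELLOW", "BLUE"),
  VALUE = c(5, 8, 7, 12, 99, 87, 98, 48)
)

# Keep rows whose NAME/CLASS/COLOR combination occurs at least three times.
kept <- subset(df, ave(VALUE, NAME, CLASS, COLOR, FUN = length) >= 3)

# Average VALUE within each remaining combination.
result <- aggregate(VALUE ~ NAME + CLASS + COLOR, data = kept, FUN = mean)
print(result)
```

With the question's data this should leave a single row, antonio / B / BLUE, whose mean VALUE is 9.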
flodel answered Dec 22 '25 18:12


