For Loop with Subset in R

Question

I have the following data in a csv file:

Date        Model      Color    Value   Samples
6/19/2017   Gold       Blue     0.5     500
6/19/2017   Gold       Red      0.0     449
6/19/2017   Silver     Blue     0.75    1320
6/19/2017   Silver     Blue     1.5     103
6/19/2017   Gold       Red      0.7     891
6/19/2017   Gold       Blue     0.41    18103
6/19/2017   Copper     Blue     0.83    564
6/19/2017   Silver     Pink     1.17    173
6/19/2017   Platinum   Brown    0.43    793
6/19/2017   Platinum   Red      0.71    1763
6/19/2017   Gold       Orange   1.92    503

I use the fread function to create a data.table:

library(dplyr)
library(data.table)

df <- fread("test_data.csv", 
                 header = TRUE,
                 fill = TRUE,
                 sep = ",")

I then subset the data by Model, as follows:

df_subset <- subset(df, df$Model=='Gold' & df$Value > 0)

Then, I create some percentiles based on the Color variable, as follows:

df_subset[, .(Samples = sum(Samples),
    '50th'    = quantile(AvgValue, probs = c(0.50)),
    '99th'    = quantile(AvgValue, probs = c(0.99)),
    '99.9th'  = quantile(AvgValue, probs = c(0.999)), 
    '99.99th' = quantile(AvgValue, probs = c(0.9999))),
by = Color]

Which gives the following output:

    Color Samples  50th   99th  99.9th  99.99th
1:   Blue   18603 0.455 0.4991 0.49991 0.499991
2:    Red    1340 0.975 1.2445 1.24945 1.249945
3: Orange     503 1.920 1.9200 1.92000 1.920000

I'm trying to iterate through the list of Model values and output the associated percentile values for each Model value.

I've tried the following (which fails):

models <- unique(df$Model)

for (model in models){

  df$model[, .(Samples = sum(Samples),
                '50th'    = quantile(Value, probs = c(0.50)),
                '99th'    = quantile(Value, probs = c(0.99)),
                '99.9th'  = quantile(Value, probs = c(0.999)), 
                '99.99th' = quantile(Value, probs = c(0.9999))),
            by = Color]
}

The error message is:

Error in .(Samples = sum(Samples), `50th` = quantile(Value, probs = c(0.5)),  :  could not find function "."

dww · Accepted Answer

fread creates a data.table object rather than a data frame, so I would recommend sticking with data.table syntax and not mixing it up with dplyr. No need for a for loop either, we can use a list of two variables in the by argument to loop over both models and colors in one line of code:

qs = df[Value > 0, .(Samples = sum(Samples),
              '50th'    = quantile(Value, probs = c(0.50)),
              '99th'    = quantile(Value, probs = c(0.99)),
              '99.9th'  = quantile(Value, probs = c(0.999)), 
              '99.99th' = quantile(Value, probs = c(0.9999))),
          by = .(Model, Color)]
setkey(qs, 'Model')

#       Model  Color Samples  50th   99th  99.9th  99.99th
# 1:   Copper   Blue     564 0.830 0.8300 0.83000 0.830000
# 2:     Gold   Blue   18603 0.455 0.4991 0.49991 0.499991
# 3:     Gold    Red     891 0.700 0.7000 0.70000 0.700000
# 4:     Gold Orange     503 1.920 1.9200 1.92000 1.920000
# 5: Platinum  Brown     793 0.430 0.4300 0.43000 0.430000
# 6: Platinum    Red    1763 0.710 0.7100 0.71000 0.710000
# 7:   Silver   Blue    1423 1.125 1.4925 1.49925 1.499925
# 8:   Silver   Pink     173 1.170 1.1700 1.17000 1.170000

Patrik_P · Answer

Using your definitions you can try the following:

library(data.table)
df<-fread("~/theData.csv")
df$Value<-as.numeric(df$Value)
result<-data.frame()
for (i in seq_along(unique(df$Model))){
  temp <- subset(df, df$Model==unique(df$Model)[i] & df$Value > 0)
  temp<-temp[, .(Samples = sum(Samples),
  '50th'    = quantile(Value, probs = c(0.50)),
  '99th'    = quantile(Value, probs = c(0.99)),
  '99.9th'  = quantile(Value, probs = c(0.999)), 
  '99.99th' = quantile(Value, probs = c(0.9999))),
   by = Color]
  temp$model<-unique(df$Model)[i]
  result<-rbind(result, temp)
}
rm(temp)

For Loop with Subset in R

Tags:

for-loop

dataframe

r

data.table

dplyr

equanimity

2 Answers

dww

Patrik_P

Recent Activity

Donate For Us

For Loop with Subset in R

Tags:

for-loop

dataframe

r

data.table

dplyr

equanimity

2 Answers

dww

Patrik_P

Related questions

Recent Activity

Donate For Us