I have the following data in a csv file:
Date Model Color Value Samples
6/19/2017 Gold Blue 0.5 500
6/19/2017 Gold Red 0.0 449
6/19/2017 Silver Blue 0.75 1320
6/19/2017 Silver Blue 1.5 103
6/19/2017 Gold Red 0.7 891
6/19/2017 Gold Blue 0.41 18103
6/19/2017 Copper Blue 0.83 564
6/19/2017 Silver Pink 1.17 173
6/19/2017 Platinum Brown 0.43 793
6/19/2017 Platinum Red 0.71 1763
6/19/2017 Gold Orange 1.92 503
I use the fread function to create a data.table:
library(dplyr)
library(data.table)
df <- fread("test_data.csv",
header = TRUE,
fill = TRUE,
sep = ",")
I then subset the data by Model, as follows:
df_subset <- subset(df, df$Model=='Gold' & df$Value > 0)
Then, I create some percentiles based on the Color variable, as follows:
df_subset[, .(Samples = sum(Samples),
'50th' = quantile(AvgValue, probs = c(0.50)),
'99th' = quantile(AvgValue, probs = c(0.99)),
'99.9th' = quantile(AvgValue, probs = c(0.999)),
'99.99th' = quantile(AvgValue, probs = c(0.9999))),
by = Color]
Which gives the following output:
Color Samples 50th 99th 99.9th 99.99th
1: Blue 18603 0.455 0.4991 0.49991 0.499991
2: Red 1340 0.975 1.2445 1.24945 1.249945
3: Orange 503 1.920 1.9200 1.92000 1.920000
I'm trying to iterate through the list of Model values and output the associated percentile values for each Model value.
I've tried the following (which fails):
models <- unique(df$Model)
for (model in models){
df$model[, .(Samples = sum(Samples),
'50th' = quantile(Value, probs = c(0.50)),
'99th' = quantile(Value, probs = c(0.99)),
'99.9th' = quantile(Value, probs = c(0.999)),
'99.99th' = quantile(Value, probs = c(0.9999))),
by = Color]
}
The error message is:
Error in .(Samples = sum(Samples), `50th` = quantile(Value, probs = c(0.5)), : could not find function "."
fread creates a data.table object rather than a data frame, so I would recommend sticking with data.table syntax and not mixing it up with dplyr. No need for a for loop either, we can use a list of two variables in the by argument to loop over both models and colors in one line of code:
qs = df[Value > 0, .(Samples = sum(Samples),
'50th' = quantile(Value, probs = c(0.50)),
'99th' = quantile(Value, probs = c(0.99)),
'99.9th' = quantile(Value, probs = c(0.999)),
'99.99th' = quantile(Value, probs = c(0.9999))),
by = .(Model, Color)]
setkey(qs, 'Model')
# Model Color Samples 50th 99th 99.9th 99.99th
# 1: Copper Blue 564 0.830 0.8300 0.83000 0.830000
# 2: Gold Blue 18603 0.455 0.4991 0.49991 0.499991
# 3: Gold Red 891 0.700 0.7000 0.70000 0.700000
# 4: Gold Orange 503 1.920 1.9200 1.92000 1.920000
# 5: Platinum Brown 793 0.430 0.4300 0.43000 0.430000
# 6: Platinum Red 1763 0.710 0.7100 0.71000 0.710000
# 7: Silver Blue 1423 1.125 1.4925 1.49925 1.499925
# 8: Silver Pink 173 1.170 1.1700 1.17000 1.170000
Using your definitions you can try the following:
library(data.table)
df<-fread("~/theData.csv")
df$Value<-as.numeric(df$Value)
result<-data.frame()
for (i in seq_along(unique(df$Model))){
temp <- subset(df, df$Model==unique(df$Model)[i] & df$Value > 0)
temp<-temp[, .(Samples = sum(Samples),
'50th' = quantile(Value, probs = c(0.50)),
'99th' = quantile(Value, probs = c(0.99)),
'99.9th' = quantile(Value, probs = c(0.999)),
'99.99th' = quantile(Value, probs = c(0.9999))),
by = Color]
temp$model<-unique(df$Model)[i]
result<-rbind(result, temp)
}
rm(temp)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With