Suppose that i have following data
ID Category Price Month
1 X 2 1
1 X 2 2
1 X 2 3
1 X 2 4
2 X 3 1
2 X 3 2
2 X 3 3
2 X 3 4
3 X 1 1
3 X 1 2
3 X 1 3
3 X 1 4
4 X 10 1
4 X 10 2
4 X 10 3
4 X 10 4
5 Y 5 1
5 Y 5 2
5 Y 5 3
5 Y 5 4
6 Y 2 1
6 Y 2 2
6 Y 2 3
6 Y 2 4
7 Y 1 1
7 Y 1 2
7 Y 1 3
7 Y 1 4
8 Y 10 1
8 Y 10 2
8 Y 10 3
8 Y 10 4
There are different prices for products in a particular category, some are low priced and some high priced. I want to have a new variable "Price Level" which shows whether the product is low priced product, medium priced product or high priced product.
Levels are defined as follows. It takes the prices of all the products in a particular category and divide into 4 percentiles.
So the table will look like this
ID Category Price Month Price Level
1 X 4 1 Medium
1 X 4 2 Medium
1 X 4 3 Medium
1 X 4 4 Medium
2 X 3 1 Medium
2 X 3 2 Medium
2 X 3 3 Medium
2 X 3 4 Medium
3 X 1 1 Low
3 X 1 2 Low
3 X 1 3 Low
3 X 1 4 Low
4 X 10 1 High
4 X 10 2 High
4 X 10 3 High
4 X 10 4 High
5 Y 5 1 Medium
5 Y 5 2 Medium
5 Y 5 3 Medium
5 Y 5 4 Medium
6 Y 2 1 Low
6 Y 2 2 Low
6 Y 2 3 Low
6 Y 2 4 Low
7 Y 1 1 Low
7 Y 1 2 Low
7 Y 1 3 Low
7 Y 1 4 Low
8 Y 10 1 Low
8 Y 10 2 Low
8 Y 10 3 Low
8 Y 10 4 Low
You can lapply across the data.frame split by Category, and call cut and quantile on each group. data.frame and do.call(rbind, reassemble the data back into a single data.frame:
do.call(rbind, lapply(split(df, df$Category), function(x){
data.frame(x, Price_Level = cut(x$Price,
quantile(x$Price, probs = c(0, .25, .75, 1)),
labels = c('Low', 'Medium', 'High'),
include.lowest = TRUE))
}))
# ID Category Price Month Price_Level
# 1 1 X 2 1 Medium
# 2 1 X 2 2 Medium
# 3 1 X 2 3 Medium
# 4 1 X 2 4 Medium
# 5 2 X 3 1 Medium
# 6 2 X 3 2 Medium
# 7 2 X 3 3 Medium
# 8 2 X 3 4 Medium
# 9 3 X 1 1 Low
# 10 3 X 1 2 Low
# 11 3 X 1 3 Low
# 12 3 X 1 4 Low
# 13 4 X 10 1 High
# 14 4 X 10 2 High
# 15 4 X 10 3 High
# 16 4 X 10 4 High
# 17 5 Y 5 1 Medium
# 18 5 Y 5 2 Medium
# 19 5 Y 5 3 Medium
# 20 5 Y 5 4 Medium
# 21 6 Y 2 1 Medium
# 22 6 Y 2 2 Medium
# 23 6 Y 2 3 Medium
# 24 6 Y 2 4 Medium
# 25 7 Y 1 1 Low
# 26 7 Y 1 2 Low
# 27 7 Y 1 3 Low
# 28 7 Y 1 4 Low
# 29 8 Y 10 1 High
# 30 8 Y 10 2 High
# 31 8 Y 10 3 High
# 32 8 Y 10 4 High
If you just want to return a single column, but don't want to worry about grouping messing up your order, you can use the equivalent
factor(ave(df$Price, df$Category, FUN = function(x){
cut(x,
quantile(x, probs = c(0, .25, .75, 1)),
include.lowest = TRUE)
}), levels = c(1, 2, 3), labels = c('Low', 'Medium', 'High'))
A slightly less ugly version with dplyr:
library(dplyr)
df %>% group_by(Category) %>% mutate(Price_Level = cut(Price,
quantile(Price, c(0, .25, .75, 1)),
labels = c('Low', 'Medium', 'High'),
include.lowest = TRUE))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With