How to divide a table into different levels based on values in R? [duplicate]

Question

Suppose that i have following data

ID  Category    Price   Month
1   X            2       1
1   X            2       2
1   X            2       3
1   X            2       4
2   X            3       1
2   X            3       2
2   X            3       3
2   X            3       4
3   X            1       1
3   X            1       2
3   X            1       3
3   X            1       4
4   X            10      1
4   X            10      2
4   X            10      3
4   X            10      4
5   Y            5       1
5   Y            5       2
5   Y            5       3
5   Y            5       4
6   Y            2       1
6   Y            2       2
6   Y            2       3
6   Y            2       4
7   Y            1       1
7   Y            1       2
7   Y            1       3
7   Y            1       4
8   Y            10      1
8   Y            10      2
8   Y            10      3
8   Y            10      4

There are different prices for products in a particular category, some are low priced and some high priced. I want to have a new variable "Price Level" which shows whether the product is low priced product, medium priced product or high priced product.

Levels are defined as follows. It takes the prices of all the products in a particular category and divide into 4 percentiles.

1st Percentile = Low
2nd and 3rd percentile = Medium
4th percentile = High

So the table will look like this

ID  Category    Price   Month   Price Level
1   X            4       1        Medium
1   X            4       2        Medium
1   X            4       3        Medium
1   X            4       4        Medium
2   X            3       1        Medium
2   X            3       2        Medium
2   X            3       3        Medium
2   X            3       4        Medium
3   X            1       1        Low
3   X            1       2        Low
3   X            1       3        Low
3   X            1       4        Low
4   X            10      1        High
4   X            10      2        High
4   X            10      3        High
4   X            10      4        High
5   Y            5       1        Medium
5   Y            5       2        Medium
5   Y            5       3        Medium
5   Y            5       4        Medium
6   Y            2       1        Low
6   Y            2       2        Low
6   Y            2       3        Low
6   Y            2       4        Low
7   Y            1       1        Low
7   Y            1       2        Low
7   Y            1       3        Low
7   Y            1       4        Low
8   Y            10      1        Low
8   Y            10      2        Low
8   Y            10      3        Low
8   Y            10      4        Low

alistaire · Accepted Answer

You can lapply across the data.frame split by Category, and call cut and quantile on each group. data.frame and do.call(rbind, reassemble the data back into a single data.frame:

do.call(rbind, lapply(split(df, df$Category), function(x){
    data.frame(x, Price_Level =  cut(x$Price, 
                                     quantile(x$Price, probs = c(0, .25, .75, 1)), 
                                     labels = c('Low', 'Medium', 'High'), 
                                     include.lowest = TRUE))
}))

#    ID Category Price Month Price_Level
# 1   1        X     2     1      Medium
# 2   1        X     2     2      Medium
# 3   1        X     2     3      Medium
# 4   1        X     2     4      Medium
# 5   2        X     3     1      Medium
# 6   2        X     3     2      Medium
# 7   2        X     3     3      Medium
# 8   2        X     3     4      Medium
# 9   3        X     1     1         Low
# 10  3        X     1     2         Low
# 11  3        X     1     3         Low
# 12  3        X     1     4         Low
# 13  4        X    10     1        High
# 14  4        X    10     2        High
# 15  4        X    10     3        High
# 16  4        X    10     4        High
# 17  5        Y     5     1      Medium
# 18  5        Y     5     2      Medium
# 19  5        Y     5     3      Medium
# 20  5        Y     5     4      Medium
# 21  6        Y     2     1      Medium
# 22  6        Y     2     2      Medium
# 23  6        Y     2     3      Medium
# 24  6        Y     2     4      Medium
# 25  7        Y     1     1         Low
# 26  7        Y     1     2         Low
# 27  7        Y     1     3         Low
# 28  7        Y     1     4         Low
# 29  8        Y    10     1        High
# 30  8        Y    10     2        High
# 31  8        Y    10     3        High
# 32  8        Y    10     4        High

If you just want to return a single column, but don't want to worry about grouping messing up your order, you can use the equivalent

factor(ave(df$Price, df$Category, FUN = function(x){
    cut(x, 
        quantile(x, probs = c(0, .25, .75, 1)), 
        include.lowest = TRUE)
}), levels = c(1, 2, 3), labels = c('Low', 'Medium', 'High'))

A slightly less ugly version with dplyr:

library(dplyr)
df %>% group_by(Category) %>% mutate(Price_Level = cut(Price, 
                                                       quantile(Price, c(0, .25, .75, 1)), 
                                                       labels = c('Low', 'Medium', 'High'), 
                                                       include.lowest = TRUE))

How to divide a table into different levels based on values in R? [duplicate]

Tags:

r

Jay khan

1 Answers

alistaire

Recent Activity

Donate For Us

How to divide a table into different levels based on values in R? [duplicate]

Tags:

r

Jay khan

1 Answers

alistaire

Related questions

Recent Activity

Donate For Us