
How to calculate share per category within a column?

df = data.frame(week = as.factor(rep(c(1, 2), times = 5)),
                name = as.factor(rep(LETTERS[1:5], times = 2)),
                count = rpois(n = 10, lambda = 20))

> df
   week name count
1     1    A    16
2     2    B    14
3     1    C    23
4     2    D    15
5     1    E    12
6     2    A    15
7     1    B    23
8     2    C    22
9     1    D    22
10    2    E    26

I'd like to calculate each name's count share per week. At first I was going to use the following method:

transform(df, week1_share = ifelse(week == "1", round((df$count / sum(df$count)  * 100),2), NA))
transform(df, week2_share = ifelse(week == "2", round((df$count / sum(df$count)  * 100),2), NA))

but merging these columns afterwards, just to use them as labels on the bar plot, seemed too inefficient. There must be a quicker way to do this that I don't know of yet.

Basically, I would like to produce the plot below, but with each share (in %), calculated as above, shown as a label inside its box.

ggplot(df, aes(reorder(week, -count),count, color = "white", group = name, fill = name))+
        geom_bar(position = "stack", stat = "identity") +
        scale_y_continuous(labels=comma)+
        ggthemes::scale_color_tableau()


I don't know why the reorder function often fails for me. If you have any tips for sorting in descending order, please share.

tmhs asked Oct 15 '25 02:10



1 Answer

I used the data you provided:

# Loading the required data
df = data.frame(week = as.factor(rep(c(1, 2), times = 5)),
                name = as.factor(rep(LETTERS[1:5], times = 2)),
                count = rpois(n = 10, lambda = 20))

The percentages and the label positions were calculated with functions from the plyr package.

# Loading the required packages
library(plyr)
library(ggplot2)

# Calculating the percentages
df = ddply(df, .(week), transform, percent = round(count/sum(count) * 100))

# Calculating the position for plotting
df = ddply(df, .(week), transform, pos = cumsum(percent) - (0.5 * percent))
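
The per-week share can also be computed without plyr. A minimal base R sketch, assuming the counts printed in the question (copied here so the result is reproducible, since `rpois()` draws new values on every run); the column name `share` is only illustrative. Note that the attempt in the question divides by `sum(df$count)`, which is the total over both weeks, whereas `ave()` returns the total of each row's own week:

```r
# Counts copied from the data frame printed in the question
df <- data.frame(
  week  = factor(rep(c(1, 2), times = 5)),
  name  = factor(rep(LETTERS[1:5], times = 2)),
  count = c(16, 14, 23, 15, 12, 15, 23, 22, 22, 26)
)

# ave() returns, for every row, the sum of count within that row's week
week_total <- ave(df$count, df$week, FUN = sum)

# Share of each name within its week, in percent ("share" is a hypothetical name)
df$share <- round(df$count / week_total * 100, 2)
df
```

Within each week the shares add up to roughly 100, up to rounding.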

The plot was then built from the values calculated above.

# Basic graph
p10 <- ggplot() + geom_bar(aes(y = percent, x = week, fill = name), 
                       data = df, stat="identity")

# Adding data labels
p10 <- p10 + geom_text(data=df, aes(x = week, y = pos, 
                                label = paste0(percent,"%")), size=4)
p10
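
One caveat: as far as I know, ggplot2 2.2.0 and later stack bars in reverse group order, so label positions computed by hand with `cumsum()` can land in the wrong segment. A hedged alternative sketch that lets ggplot2 place the labels itself with `position_stack(vjust = 0.5)`; the counts are copied from the question's printed output, and `geom_col()` is swapped in for `geom_bar(stat = "identity")`:

```r
library(ggplot2)

# Counts copied from the data frame printed in the question
df <- data.frame(
  week  = factor(rep(c(1, 2), times = 5)),
  name  = factor(rep(LETTERS[1:5], times = 2)),
  count = c(16, 14, 23, 15, 12, 15, 23, 22, 22, 26)
)

# Share of each name within its week, in percent
df$percent <- round(df$count / ave(df$count, df$week, FUN = sum) * 100)

p <- ggplot(df, aes(x = week, y = percent, fill = name)) +
  geom_col() +
  # position_stack() stacks the labels the same way as the bars;
  # vjust = 0.5 centres each label inside its segment
  geom_text(aes(label = paste0(percent, "%")),
            position = position_stack(vjust = 0.5), size = 4)
p
```

With this approach no separate `pos` column is needed, because the labels follow whatever stacking order the bars use.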

Is this what you were looking for?
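
On the side question about `reorder`: as far as I can tell, `reorder(week, -count)` reorders the levels of `week` by the mean of `-count` per week (the default `FUN` is `mean`), so it only changes the order of the two bars, not the order of the segments inside them. A minimal sketch, assuming the goal is to order the names by descending total count (the counts are again copied from the question's printed output):

```r
# Counts copied from the data frame printed in the question
df <- data.frame(
  week  = factor(rep(c(1, 2), times = 5)),
  name  = factor(rep(LETTERS[1:5], times = 2)),
  count = c(16, 14, 23, 15, 12, 15, 23, 22, 22, 26)
)

# Reorder the levels of name by their total count, largest first:
# negating count turns reorder()'s ascending sort into a descending one
df$name <- reorder(df$name, -df$count, FUN = sum)
levels(df$name)
```

Mapping the reordered factor to `fill` should then determine the order of the segments and of the legend.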


Prradep answered Oct 17 '25 16:10
