
Side-by-side bar chart with columns proportional by group (relative frequency bar chart)

The dataset

library(dplyr)
library(ggplot2)

gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')
df <- data.frame(gender, answer)

is biased towards females:

df %>% ggplot(aes(gender, fill = gender)) + geom_bar()

My task is to build a graph that makes it easy to figure out which of the two genders is more likely to say 'Yes'.

But, given the bias, I cannot just do

df %>% ggplot(aes(x = answer, fill = gender)) + geom_bar(position = 'dodge')

or even

df %>% ggplot(aes(x = answer, y = after_stat(count / sum(count)), fill = gender)) +
  geom_bar(position = 'dodge')

To correct for the imbalance, I need to divide each count by the total number of males or females respectively, so that the 'Female' bars sum to 1 and so do the 'Male' ones, like so:

df.total <- df %>% count(gender)
male.total <- (df.total %>% filter(gender == 'Male'))$n
female.total <- (df.total %>% filter(gender == 'Female'))$n

df %>% count(answer, gender) %>%
  mutate(freq = n / if_else(gender == 'Male', male.total, female.total)) %>%
  ggplot(aes(x = answer, y = freq, fill = gender)) +
  geom_bar(stat = "identity", position = 'dodge')

This paints a completely different picture: half of the males (4 of 8) answered 'Yes', against 5 of the 12 females.

Questions:

  1. Is there a way to simplify the last piece of code using only dplyr and ggplot2?
  2. Are there any other libraries that can do the trick better?
  3. Does the above type of chart have a conventional name?

Thanks.

asked Jan 23 '26 10:01 by ayorgo


1 Answer

Question 1:

df %>%
  count(gender, answer) %>%      # one row per gender/answer pair
  group_by(gender) %>%           # so sum(n) below is the per-gender total
  mutate(freq = n / sum(n)) %>%
  ggplot(aes(x = answer, y = freq, fill = gender)) +
  geom_bar(stat = "identity", position = 'dodge')
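
If the intermediate table is not needed, the counting stat in ggplot2 also exposes a per-group proportion, so dplyr can be skipped entirely. This is only a sketch, assuming ggplot2 3.3.0 or later (for after_stat()) and that grouping by gender is what you want; the data frame is rebuilt here just to keep the snippet self-contained, and the name p is arbitrary.

```r
library(ggplot2)

# Same data as in the question, rebuilt so the snippet runs on its own
gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male',
            'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female',
            'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No',
            'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')
df <- data.frame(gender, answer)

# stat_count() computes `prop`, the share of each bar within its group;
# group = gender makes each gender's bars sum to 1
p <- ggplot(df, aes(x = answer, y = after_stat(prop),
                    group = gender, fill = gender)) +
  geom_bar(position = "dodge")

print(p)
```

The explicit group = gender is what makes the proportions per gender; without it, the grouping would be derived from the discrete variables in the plot and the result would differ.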

Question 2:

You can probably do it in fewer lines with other packages.
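
For comparison, base R alone can produce the same kind of chart without any extra packages. The following is a minimal sketch rather than a recommendation; the object name props is arbitrary, and the axis labels are my own choice.

```r
# Same data as in the question, rebuilt so the snippet runs on its own
gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male',
            'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female',
            'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No',
            'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')

# Cross-tabulate, then turn each row (one per gender) into proportions
props <- prop.table(table(gender, answer), margin = 1)

# beside = TRUE draws the genders side by side within each answer
barplot(props, beside = TRUE, legend.text = TRUE,
        xlab = "answer", ylab = "relative frequency")
```

Here margin = 1 normalises within rows, so each gender's proportions sum to 1, which is the same normalisation as the group_by(gender) step above.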

Question 3:

Relative frequency bar graph.

answered Jan 25 '26 23:01 by Jake Kaupp


