The dataset
library(dplyr)
library(ggplot2)

gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No', 'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')
df <- data.frame(gender, answer)
is biased towards females (12 females versus 8 males):
df %>% ggplot(aes(gender, fill = gender)) + geom_bar()

My task is to build a graph that makes it easy to figure out which of the two genders is more likely to say 'Yes'.
But because of this bias, I cannot simply plot the raw counts:
df %>% ggplot(aes(x = answer, fill = gender)) + geom_bar(position = 'dodge')

or even
df %>% ggplot(aes(x = answer, y = after_stat(count / sum(count)), fill = gender)) +
  geom_bar(position = 'dodge')

To correct for the bias, I need to divide each count by the total number of males or females, respectively, so that the 'Female' bars add up to 1 and so do the 'Male' bars. Like so:
# Total number of respondents per gender
df.total <- df %>% count(gender)
male.total <- (df.total %>% filter(gender == 'Male'))$n
female.total <- (df.total %>% filter(gender == 'Female'))$n

# Divide each (answer, gender) count by the matching gender total
df %>% count(answer, gender) %>%
  mutate(freq = n / if_else(gender == 'Male', male.total, female.total)) %>%
  ggplot(aes(x = answer, y = freq, fill = gender)) +
  geom_bar(stat = "identity", position = 'dodge')

This paints a completely different picture.
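The change can also be checked numerically before plotting. Here is a minimal sketch in base R, assuming the same `gender` and `answer` vectors as above; the names `counts` and `props` are introduced here purely for illustration.

```r
gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female',
            'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No',
            'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')

# Cross-tabulate gender (rows) against answer (columns)
counts <- table(gender, answer)

# margin = 1 divides each cell by its row total,
# so the proportions within each gender sum to 1
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 3))
```

In this sample the raw counts favour females (5 'Yes' answers versus 4), but the within-gender proportions favour males (4/8 = 0.5 versus 5/12, roughly 0.417), which is why the normalized plot looks so different.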
Questions:
dplyr and ggplot2? Thanks.
Question 1:
df %>%
  count(gender, answer) %>%
  group_by(gender) %>%             # work within each gender
  mutate(freq = n / sum(n)) %>%    # share of each answer within its gender
  ggplot(aes(x = answer, y = freq, fill = gender)) +
  geom_bar(stat = "identity", position = 'dodge')
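If a stacked layout is acceptable, ggplot2 can do the within-gender normalization itself. This is a different chart from the dodged one above, not a drop-in replacement: a rough sketch that puts gender on the x axis and uses `position = "fill"`, assuming the same data as in the question (the name `p` is only illustrative).

```r
library(ggplot2)

gender <- c('Male', 'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female',
            'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Female')
answer <- c('Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No', 'No', 'No',
            'No', 'Yes', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes')
df <- data.frame(gender, answer)

# position = "fill" stacks the segments and rescales every bar to height 1,
# so each segment shows that answer's share within the gender on the x axis
p <- ggplot(df, aes(x = gender, fill = answer)) +
  geom_bar(position = "fill") +
  labs(y = "proportion")

print(p)
```

Because every bar has the same height, the 'Yes' segments of the two genders can be compared directly without any manual counting.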
Question 2:
You can probably do it in fewer lines with other packages.
Question 3:
Relative frequency bar graph.