Using density in stat_bin with factor variables

Tags: r, ggplot2

It seems the density computed by stat_bin doesn't work as expected for factor variables: the density is 1 for every category on the y-axis.

For example, using diamonds data:

diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]
ggplot(diamonds_small, aes(x = cut)) +  stat_bin(aes(y=..density.., fill=cut))


I understand I could use

stat_bin(aes(y=..count../sum(..count..), fill=cut))

to make it work. However, according to the stat_bin documentation, it should work with categorical variables.

asked Jan 30 '26 17:01 by sun


1 Answer

You can get what you (might) want by setting the group aesthetic manually.

ggplot(diamonds_small, aes(x = cut)) +  stat_bin(aes(y=..density..,group=1))
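To see why group=1 changes the result, here is a minimal base-R sketch of the arithmetic. It assumes the density for a bar is that bar's count divided by the total count of its group (with a bin width of 1), and that a factor on the x-axis puts each level into its own group by default. The toy vector `cut_levels` is hypothetical and only stands in for diamonds_small$cut.

```r
# Hypothetical toy data standing in for diamonds_small$cut
cut_levels <- factor(c("Fair", "Good", "Good", "Ideal", "Ideal", "Ideal"))
counts <- table(cut_levels)

# One group per level: each count is divided by its own group's total,
# so every bar comes out as exactly 1
per_group <- counts / counts

# A single group (group = 1): each count is divided by the overall total,
# so the bars are proportions that sum to 1
single_group <- counts / sum(counts)

print(per_group)
print(single_group)
```

Under these assumptions, the second calculation is also what y=..count../sum(..count..) from the question produces.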

However, you can't easily fill differently within a group. You can summarize the data yourself:

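library(plyr)
# Proportion of rows in each cut level; store the result so it can be plotted
dd_dens <- ddply(diamonds_small, .(cut),
         function(x) data.frame(dens = nrow(x) / nrow(diamonds_small)))
ggplot(dd_dens, aes(x = cut, y = dens)) + geom_bar(aes(fill = cut), stat = "identity")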

A slightly more compact version of the summarization step:

as.data.frame.table(prop.table(table(diamonds_small$cut)))
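The compact version can feed the same plot. The following is a sketch rather than a definitive recipe: the `set.seed` call, the object names `dd_prop` and `p`, and the renamed columns are assumptions added for illustration. By default as.data.frame.table names the columns Var1 and Freq here, so they are renamed to match the aesthetics used above.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sample is reproducible
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]

# Proportion of rows per cut level, as a two-column data frame
dd_prop <- as.data.frame.table(prop.table(table(diamonds_small$cut)))
names(dd_prop) <- c("cut", "dens")  # default names are Var1 and Freq

# Bars are drawn from the precomputed proportions, filled by cut
p <- ggplot(dd_prop, aes(x = cut, y = dens)) +
  geom_bar(aes(fill = cut), stat = "identity")
print(p)
```

This avoids the plyr dependency, at the cost of one extra line to rename the columns.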

answered Feb 02 '26 09:02 by Ben Bolker