How to create a dot plot with a lot of values in ggplot2

I created a bar chart to show the population distribution of Vietnam. This is my vietnam2015 data:

   Year Age.group Est.pop
1  2015       0-4    7753
2  2015       5-9    7233
3  2015     10-14    6623
4  2015     15-19    6982
5  2015     20-24    8817
6  2015     25-29    8674
7  2015     30-34    7947
8  2015     35-39    7166
9  2015     40-44    6653
10 2015     45-49    6011
11 2015     50-54    5469
12 2015     55-59    4623
13 2015     60-64    3310
14 2015     65-69    1896
15 2015     70-74    1375
16 2015     75-79    1162
17 2015       80+    1878 

This is my bar chart. I was wondering whether I could make a dot plot instead.

library(tidyverse)

vietnam2015 %>%
  filter(Age.group != "5-9") %>% # Somehow this odd value crept into the data frame, so it is filtered out.
  ggplot(aes(x = Age.group, y = Est.pop)) +
  geom_col(colour = "black",
           fill = "#FFEB3B")


I know a dot plot is usually meant for data with relatively few data points. But can I create a dot plot in which one dot represents 1000 people, or a million? I would like to communicate more clearly that the bars consist of people, as in the middle image of FlowingData's example:

Histogram explained

asked Oct 15 '25 18:10 by Tdebeus

1 Answer

We can use geom_dotplot. As you mentioned, dot plots are usually used for small counts, but we can aggregate the data first. In the following code, I use mutate(Est.pop = round(Est.pop, digits = -3)/1000) to round Est.pop to the nearest thousand and then divide by 1000. I then repeat the row for each Age.group that many times, and finally plot the result with geom_dotplot. Each dot represents 1000 units of Est.pop (1000 people if Est.pop is a raw head count). The y-axis is hidden because this visualization is mainly about the number of dots.

# Load package
library(tidyverse)

# Process the data
dt2 <- dt %>%
  mutate(Est.pop = round(Est.pop, digits = -3)/1000) %>%
  split(f = .$Age.group) %>%
  map_df(function(x) x[rep(row.names(x), x$Est.pop[1]), ])

# Plot the data
ggplot(dt2, aes(x = Age.group)) +
  geom_dotplot() +
  scale_y_continuous(NULL, breaks = NULL)
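
The row-repeating step can also be written with tidyr's uncount(), which duplicates each row according to a count column. The following is a minimal sketch rather than the code used for the plot in this answer. It uses only four of the age groups so that it runs on its own, and the n_dots column name is an assumption introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Small subset of the question's data, repeated here so the sketch is self-contained
dt <- data.frame(
  Year = 2015,
  Age.group = c("0-4", "5-9", "10-14", "80+"),
  Est.pop = c(7753, 7233, 6623, 1878),
  stringsAsFactors = FALSE
)

dots <- dt %>%
  # n_dots (hypothetical name): number of dots, one per 1000 units of Est.pop
  mutate(n_dots = as.integer(round(Est.pop / 1000))) %>%
  # Repeat each row n_dots times; uncount() drops the n_dots column afterwards
  uncount(n_dots)

# One row per dot: 8 + 7 + 7 + 2 = 24 rows
nrow(dots)
```

The resulting data frame has one row per dot, so it can be passed to ggplot() in the same way as dt2.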

enter image description here

Data

dt <- read.table(text = " Year Age.group Est.pop
1  2015       0-4    7753
                 2  2015       5-9    7233
                 3  2015     10-14    6623
                 4  2015     15-19    6982
                 5  2015     20-24    8817
                 6  2015     25-29    8674
                 7  2015     30-34    7947
                 8  2015     35-39    7166
                 9  2015     40-44    6653
                 10 2015     45-49    6011
                 11 2015     50-54    5469
                 12 2015     55-59    4623
                 13 2015     60-64    3310
                 14 2015     65-69    1896
                 15 2015     70-74    1375
                 16 2015     75-79    1162
                 17 2015       80+    1878 ",
                 header = TRUE, stringsAsFactors = FALSE)
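
One thing to check when running this code: because stringsAsFactors = FALSE leaves Age.group as a character column, ggplot2 places the groups in alphabetical order on the x-axis, so "10-14" comes before "5-9". A possible fix, sketched here under the assumption that the rows are already in age order, is to convert the column to a factor whose levels follow the order of first appearance. The toy dots data frame and the binwidth value are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Toy data: one row per dot, listed in the age order wanted on the axis
age_levels <- c("0-4", "5-9", "10-14", "80+")
dots <- data.frame(
  Age.group = rep(age_levels, times = c(8, 7, 7, 2)),
  stringsAsFactors = FALSE
)

# Use order of first appearance as the level order instead of alphabetical order
dots$Age.group <- factor(dots$Age.group, levels = unique(dots$Age.group))

# binwidth = 0.1 is an arbitrary illustrative value that sets the dot diameter
p <- ggplot(dots, aes(x = Age.group)) +
  geom_dotplot(binwidth = 0.1) +
  scale_y_continuous(NULL, breaks = NULL)

# The levels now follow age order rather than alphabetical order
levels(dots$Age.group)
```

Printing p draws the plot with the age groups in the order given by the factor levels.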

answered Oct 17 '25 09:10 by www