
ggplot and dplyr and column name as string

I would like to process a data frame through dplyr and ggplot using column names given as strings. Here is my code:

library(ggplot2)
library(dplyr)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list = c('var_1', 'var_2')

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(el) %>% summarize(count = n())
    ggplot(data = test, aes(x = el, y = count)) + geom_bar(stat='identity')
  dev.off()
}

The above code obviously does not work, so I tried different things such as UQ and as.name. UQ creates a column with extra quotes in its name, and ggplot does not understand it with aes_string. Any suggestions?

I can use for (el in names(my_df)) with filtering, but would prefer to work with strings.

UPDATE: Here are the detailed messages/errors that I got:

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(!!el) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

The above code generates empty files.

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(UQ(el)) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

The above code also generates empty files.

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
    test <- my_df %>% group_by(as.name(el)) %>% summarize(count = n())
    ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity')
  dev.off()
}

produces

Error in mutate_impl(.data, dots) : 
  Column `as.name(el)` is of unsupported type symbol
user1700890 asked Sep 06 '25 20:09


2 Answers

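You need to unquote (UQ or !!) a name/symbol rather than the string itself. You also need to wrap the plot in print(), because ggplot objects are not auto-printed inside a for loop, which is why the PDF files came out empty. For example: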

for(el in name_list){
  pdf(paste(el, '.pdf', sep =''))
  test <- my_df %>% group_by(UQ(as.name(el))) %>% summarize(count = n())
  print(ggplot(data = test, aes_string(x = el, y = 'count')) + geom_bar(stat='identity'))
  dev.off()
}
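To make the difference concrete, here is a minimal sketch comparing the two kinds of unquoting. It assumes a reasonably recent dplyr and rlang, where !! is the preferred spelling of UQ; the set.seed call and the names by_string and by_symbol are illustrative additions, not part of the original post.

```r
library(dplyr)

set.seed(1)  # illustrative seed so the sample is reproducible
my_df <- data.frame(var_1 = sample(c("a", "b", "c"), 1000, replace = TRUE),
                    var_2 = sample(c("d", "e", "f"), 1000, replace = TRUE))
el <- "var_1"

# Unquoting the string itself groups by a constant value,
# so every row falls into a single group.
by_string <- my_df %>% group_by(!!el) %>% summarize(count = n())

# Unquoting a symbol built from the string groups by the actual column.
by_symbol <- my_df %>% group_by(!!rlang::sym(el)) %>% summarize(count = n())

print(by_string)
print(by_symbol)
```

The first result has one row holding all 1000 observations, while the second has one row per level of var_1.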
MrFlick answered Sep 08 '25 10:09


I made two changes to your code:

  1. To group by a variable whose name is stored in a string, use group_by_ instead of group_by in dplyr;
  2. To refer to that variable in ggplot2, use aes_string or get(variable).

I also made some minor changes (e.g. using ggsave to save the plots).

library(ggplot2)
library(dplyr)
my_df <- data.frame(var_1 = sample(c('a', 'b', 'c'), 1000, replace = TRUE),
                    var_2 = sample(c('d', 'e', 'f'), 1000, replace = TRUE))

name_list = c('var_1', 'var_2')

for(el in name_list){
    p <- my_df %>% 
         group_by_(el) %>% 
         summarize(count = n()) %>%
         ggplot(aes(x = get(el), y = count)) +
             geom_bar(stat = "identity")
    ggsave(paste0(el, ".pdf"), p)
}
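Note that group_by_ has been deprecated since dplyr 0.7.0, and aes_string is soft-deprecated in newer ggplot2 releases. The same idea can be sketched with current tools, assuming dplyr 1.0.0 or later and ggplot2 3.0.0 or later. The helper plot_counts, the set.seed call, the temporary output directory, and the explicit width and height are illustrative assumptions, not part of the original answer.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # illustrative seed so the sample is reproducible
my_df <- data.frame(var_1 = sample(c("a", "b", "c"), 1000, replace = TRUE),
                    var_2 = sample(c("d", "e", "f"), 1000, replace = TRUE))
name_list <- c("var_1", "var_2")

# Hypothetical helper: count rows per level of the column named by the
# string `el` and return a bar chart of those counts.
plot_counts <- function(df, el) {
  df %>%
    group_by(across(all_of(el))) %>%         # select the grouping column by name
    summarize(count = n()) %>%
    ggplot(aes(x = .data[[el]], y = count)) + # look the column up by name
    geom_col() +                              # same as geom_bar(stat = "identity")
    labs(x = el)
}

out_dir <- tempdir()  # write to a temporary directory for this sketch
for (el in name_list) {
  p <- plot_counts(my_df, el)
  ggsave(file.path(out_dir, paste0(el, ".pdf")), p, width = 6, height = 4)
}
```

Because ggsave receives the plot object directly, no print() call or pdf()/dev.off() pair is needed.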
pogibas answered Sep 08 '25 12:09