I'm using the R package stargazer to create high-quality regression tables, and I would like to use it to create a summary statistics table. I have a factor variable in my data, and I would like the summary table to show me the percent in each category of the factor -- in effect, separate the factor into a set of mutually exclusive logical (dummy) variables, and then display those in the table. Here's an example:
> library(car)
> library(stargazer)
> data(Blackmore)
> stargazer(Blackmore[, c("age", "exercise", "group")], type = "text")
==========================================
Statistic  N   Mean  St. Dev.  Min   Max  
------------------------------------------
age       945 11.442  2.766   8.000 17.920
exercise  945 2.531   3.495   0.000 29.960
------------------------------------------
But I'm trying to get an additional row that shows me the percent in each group (% control and/or % patient, in these data). I'm sure this is just an option somewhere in stargazer, but I can't find it. Does anyone know what it is?
Edit: car::Blackmoor has updated spelling to car::Blackmore.
If you have categorical variables, you can generally still incorporate them into a summary statistics table by turning them into binary “dummy” variables.
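As a minimal base-R sketch of that idea, assuming a small made-up data frame called `toy` (a hypothetical name, not the Blackmore data): each factor level becomes a 0/1 column, and the mean of that column is the share of rows in the category.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  score = c(2.0, 3.5, 1.0, 4.5),
  group = factor(c("control", "patient", "patient", "patient"))
)

# One 0/1 indicator column per factor level
dummies <- sapply(levels(toy$group), function(lv) as.integer(toy$group == lv))

# The mean of a 0/1 column is the share of rows in that category
shares <- colMeans(dummies)
print(round(100 * shares, 1))
```

The resulting indicator columns can be bound onto the numeric columns and summarised like any other variable.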
Probably the most straightforward way to build a summary statistics table in R is the sumtable() function in the vtable package, which has many options for customization. Other options include stargazer() in stargazer, dfSummary() in summarytools, summary_table() in qwraps2, and table1() in table1.
The built-in Stata command summarize (which can be referred to in short as su or summ) easily creates summary statistics tables.
Since stargazer can't do this directly, you can build your own summary table as a data frame and print it with pander, xtable, or any other table package. For example, here's how to build one with dplyr and tidyr:
library(car)    # provides the Blackmore data
library(dplyr)
library(tidyr)
fancy.summary <- Blackmore %>%
  select(-subject) %>%  # Remove the subject column
  group_by(group) %>%  # Group by patient and control
  summarise(across(everything(),
                   list(mean = mean, sd = sd, min = min, max = max, length = length))) %>%  # Calculate summary statistics for each group (needs dplyr >= 1.0.0)
  mutate(prop = age_length / sum(age_length)) %>%  # Calculate proportion
  gather(variable, value, -group, -prop) %>%  # Convert to long
  separate(variable, c("variable", "statistic")) %>%  # Split variable column
  mutate(statistic = ifelse(statistic == "length", "n", statistic)) %>%
  spread(statistic, value) %>%  # Make the statistics be actual columns
  select(group, variable, n, mean, sd, min, max, prop)  # Reorder columns
Which results in this if you use pander:
library(pander)
pandoc.table(fancy.summary)
------------------------------------------------------
 group   variable   n   mean   sd    min   max   prop 
------- ---------- --- ------ ----- ----- ----- ------
control    age     359 11.26  2.698   8   17.92 0.3799
control  exercise  359 1.641  1.813   0   11.54 0.3799
patient    age     586 11.55  2.802   8   17.92 0.6201
patient  exercise  586 3.076  4.113   0   29.96 0.6201
------------------------------------------------------
Another workaround is to use model.matrix to create dummy variables in a separate step, and then use stargazer to create a table from that.  To show this with the example:
> library(car)
> library(stargazer)
> data(Blackmore)
> 
> options(na.action = "na.pass")  # so that we keep missing values in the data
> X <- model.matrix(~ age + exercise + group - 1, data = Blackmore)
> X.df <- data.frame(X)  # stargazer only does summary tables of data.frame objects
> names(X.df) <- colnames(X)  # restore the original column names on the data frame
> stargazer(X.df, type = "text")
=============================================
Statistic     N   Mean  St. Dev.  Min   Max  
---------------------------------------------
age          945 11.442  2.766   8.000 17.920
exercise     945 2.531   3.495   0.000 29.960
groupcontrol 945 0.380   0.486     0     1   
grouppatient 945 0.620   0.486     0     1   
---------------------------------------------
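One caveat with the `- 1` trick: when the formula contains more than one factor, only the first factor gets a column for every level, and later factors lose their reference level. A minimal sketch of one way around this, assuming a made-up data frame `toy` with two factors (the data and the names `is.fac` and `full.coding` are hypothetical, not part of Blackmore), is to pass full indicator coding through the `contrasts.arg` argument of model.matrix:

```r
# Hypothetical toy data with two factors, invented for illustration only
toy <- data.frame(
  age   = c(8, 10, 12, 14),
  group = factor(c("control", "patient", "patient", "patient")),
  site  = factor(c("north", "north", "south", "west"))
)

# contrasts(f, contrasts = FALSE) returns an identity matrix, i.e. one
# indicator column per level with no reference level dropped
is.fac <- sapply(toy, is.factor)
full.coding <- lapply(toy[is.fac], contrasts, contrasts = FALSE)

X <- model.matrix(~ age + group + site - 1, data = toy,
                  contrasts.arg = full.coding)
X.df <- data.frame(X)  # could then be passed to stargazer(X.df, type = "text")

print(colMeans(X.df))
```

The column means of the indicator columns are again the proportions in each category, so multiplying by 100 gives percentages.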
The package tables can be useful for this task. 
library(car)
library(tables)
data(Blackmore)
# percent only:
(x <- tabular((Factor(group, "") ) ~ (Pct=Percent()) * Format(digits=4), 
    data=Blackmore))
##              
##         Pct  
## control 37.99
## patient 62.01
# percent and counts:
(x <- tabular((Factor(group, "") ) ~ ((n=1) + (Pct=Percent())) * Format(digits=4), 
    data=Blackmore))
##                      
##         n      Pct   
## control 359.00  37.99
## patient 586.00  62.01
Then it's straightforward to output this to LaTeX:
> latex(x)
\begin{tabular}{lcc}
\hline
  & n & \multicolumn{1}{c}{Pct} \\ 
\hline
control  & $359.00$ & $\phantom{0}37.99$ \\
patient  & $586.00$ & $\phantom{0}62.01$ \\
\hline 
\end{tabular}