Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reproduce nested Excel Pivot table in r with tabular

Tags:

r

tabular

I am trying to reproduce the following excel pivot table in r: Excel pivot table

Using tabular:

library(vcd)
library(tables)
tabular(Sex*(Treatment+1)+1~(count=ID + Percent("col")), data=Arthritis)

Produces:

              count        
 Sex    Treatment ID    Percent
 Female Placebo   32     38.10 
        Treated   27     32.14 
        All       59     70.24 
 Male   Placebo   11     13.10 
        Treated   14     16.67 
        All       25     29.76 
        All       84    100.00 

Is there a way to get the treatment percents to add up to 100 for each gender like the excel pivot table?

like image 989
user2547973 Avatar asked Sep 01 '25 16:09

user2547973


1 Answers

Everything apart from the final All row can be accomplished with the following.

library(dplyr)
library(tidyr)

   df <- Arthritis %>%
  group_by(Sex, Treatment) %>%
  summarise(cnt = n()) %>%
  ungroup() %>%
  spread(Treatment, cnt) %>%
  mutate(All = Placebo + Treated) %>%
  gather(Treatment, ID , -Sex) %>%
  group_by(Sex) %>%
  mutate(percent = ID / (sum(ID) / 2)) %>%
  arrange(Sex, desc(Treatment)) #forces "Treated" to top of Treatment column for each group

    > df
Source: local data frame [6 x 4]
Groups: Sex [2]

     Sex Treatment    ID   percent
  <fctr>     <chr> <int>     <dbl>
1 Female   Treated    27 0.4576271
2 Female   Placebo    32 0.5423729
3 Female       All    59 1.0000000
4   Male   Treated    14 0.5600000
5   Male   Placebo    11 0.4400000
6   Male       All    25 1.0000000

If you want a total line, you can use the following, but it isn't very pretty.

grand_total <- data.frame(Sex = "Total" , "Treatment" = "All", 
                          ID = nrow(Arthritis), percent = 1, 
                          stringsAsFactors = FALSE)

df_final <- bind_rows(df, grand_total)

Now, if you want to blank out the Sex column except for the first, occurrence, you can do it this way. Since we sorted in decsending order on the Treatment column, we know that Treated it as the top of each group. So we simply replace the Sex column with a blank when the Treatment column is not equal to Treated. We also don't blank out the All we created.

df_final$Sex[df_final$Treatment != "Treated" & 
df_final$Sex %in% c("Female", "Male")] <- ""

Source: local data frame [7 x 4]
Groups: Sex [3]

     Sex Treatment    ID   percent
   <chr>     <chr> <int>     <dbl>
1 Female   Treated    27 0.4576271
2          Placebo    32 0.5423729
3              All    59 1.0000000
4   Male   Treated    14 0.5600000
5          Placebo    11 0.4400000
6              All    25 1.0000000
7  Total       All    84 1.0000000
like image 119
Nick Criswell Avatar answered Sep 04 '25 06:09

Nick Criswell