I have a long dataset with one row per individual grouped with schools. Each row has an ordered factor {1, 2, 3, 4}, "cats". I want to get the percentage of 1's, 2's, 3's and 4's within each school. The dataset looks like this:
  school_number           cats
1          10505             3
2          10505             3
3          10502             1
4          10502             1
5          10502             2
6          10502             1
7          10502             1
8          10502             2
10         10503             3
11         10505             2
I tried something like this:
df_pcts <- df %>%
   group_by(school_number) %>%
   mutate(total=sum(table(cats))) %>%
   summarize(cat_pct = table(cats)/total)
but the total variable produced by the mutate() step puts the grand total number of rows in every row. I can't even get to the final summarize step. I'm confused.
P.S. In some other posts I saw lines like this:
n = n()
when I do that I get a message saying,
Error in n() : This function should not be called directly
Where did this come from?
TIA
Perhaps this helps a little, though I'm not 100% sure of what output you need.
This counts the number of rows of each combination of school_number/cats that exist in your df using tally.  Then calculates the percentage of 'cats' in each school_number by then only grouping by school_number.
df %>%
  group_by(school_number,cats) %>%
  tally  %>%
  group_by(school_number) %>%
  mutate(pct=(100*n)/sum(n))
It gives this:
  #    school_number cats n       pct
  #  1         10502    1 4  66.66667
  #  2         10502    2 2  33.33333
  #  3         10503    3 1 100.00000
  #  4         10505    2 1  33.33333
  #  5         10505    3 2  66.66667
EDIT:
to add in rows with 0% that are missing from your sample data, you could do the following. Bind together the output above with a df that contains 0% for all school_number/cats combinations. Only keep the first instance of this bind (first instances always containing values >0% should they exist). I then arranged it by school_number and cats for ease of reading:
y<-df %>%
  group_by(school_number,cats) %>%
  tally  %>%
  group_by(school_number) %>%
  mutate(pct=(100*n)/sum(n)) %>%
  select(-n) 
x<-data.frame(school_number=rep(unique(df$school_number),each=4), cats=1:4,pct=0)  
rbind(y,x) %>%
  group_by(school_number,cats)%>%
  filter(row_number() == 1) %>%
  arrange(school_number,cats)
which gives:
#   school_number cats       pct
#1          10502    1  66.66667
#2          10502    2  33.33333
#3          10502    3   0.00000
#4          10502    4   0.00000
#5          10503    1   0.00000
#6          10503    2   0.00000
#7          10503    3 100.00000
#8          10503    4   0.00000
#9          10505    1   0.00000
#10         10505    2  33.33333
#11         10505    3  66.66667
#12         10505    4   0.00000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With