Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dividing columns by particular values using dplyr

Tags:

r

dplyr

I have a dataframe like this:

 Setting   q02_id c_school c_home c_work c_transport c_leisure Country
   Rural 11900006        0      5      3           1         1 Vietnam
   Rural 11900031       10      5      0           0         0 China
   Rural 11900033        0      3      0           0         3 Vietnam
   Rural 11900053        0      7      2           0         0 Vietnam
   Rural 11900114        3      6      0           0         0 Malaysia
   Rural 11900446        0      6      0           0         0 Vietnam

and I would like to divide columns 2, 3, 4, 5, 6 by the total for that particular country.

Doing it in base R is a bit clumsy:

df[df$Country=="Vietnam",][c(3, 4, 5, 6)] = df[df$Country=="Vietnam",][c(3, 4, 5, 6)] / sum(df[df$Country=="Vietnam",][c(3, 4, 5, 6)])

(I think that works).

I'm trying to convert as much of my code as possible to use tidyverse functions. Is there a way of doing the same thing more efficiently using, dplyr, for instance?

Thanks.

like image 324
user438383 Avatar asked Dec 14 '25 10:12

user438383


1 Answers

I trust this is what you are after:

Divide each column by the sum of that column - grouped by Country

library(tidyverse)
df1 %>%
  group_by(Country) %>%
  mutate_at(vars(c_school: c_leisure), funs(./ sum(.)))
#output
  Setting   q02_id c_school c_home  c_work c_transport c_leisure Country 
  <fct>      <int>    <dbl>  <dbl>   <dbl>       <dbl>     <dbl> <fct>   
1 Rural   11900006   NaN     0.238   0.600        1.00     0.250 Vietnam 
2 Rural   11900031     1.00  1.00  NaN          NaN      NaN     China   
3 Rural   11900033   NaN     0.143   0            0        0.750 Vietnam 
4 Rural   11900053   NaN     0.333   0.400        0        0     Vietnam 
5 Rural   11900114     1.00  1.00  NaN          NaN      NaN     Malaysia
6 Rural   11900446   NaN     0.286   0            0        0     Vietnam 

or alternatively divide each column by the total sum for each country as in your example (only difference is I used columns 3:7 as I trust you intended.

df1 %>%
  mutate(sum = rowSums(.[,3:7])) %>%
  group_by(Country) %>%
  mutate_at(vars(c_school: c_leisure), funs(./ sum(sum))) %>%
  select(-sum)
#output
  Setting   q02_id c_school c_home c_work c_transport c_leisure Country 
  <fct>      <int>    <dbl>  <dbl>  <dbl>       <dbl>     <dbl> <fct>   
1 Rural   11900006    0     0.161  0.0968      0.0323    0.0323 Vietnam 
2 Rural   11900031    0.667 0.333  0           0         0      China   
3 Rural   11900033    0     0.0968 0           0         0.0968 Vietnam 
4 Rural   11900053    0     0.226  0.0645      0         0      Vietnam 
5 Rural   11900114    0.333 0.667  0           0         0      Malaysia
6 Rural   11900446    0     0.194  0           0         0      Vietnam 

data:

df1 = read.table(text ="Setting   q02_id c_school c_home c_work c_transport c_leisure Country
  Rural 11900006        0      5      3           1         1 Vietnam
  Rural 11900031       10      5      0           0         0 China
  Rural 11900033        0      3      0           0         3 Vietnam
  Rural 11900053        0      7      2           0         0 Vietnam
  Rural 11900114        3      6      0           0         0 Malaysia
  Rural 11900446        0      6      0           0         0 Vietnam", header = T)
like image 69
missuse Avatar answered Dec 16 '25 00:12

missuse



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!