Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Summarise within groups dplyr

I have a dataset like below:

BRAND  MEDIUM      W1   W2   W3   W4   W5

   B1   tv          1    0    1    0    2
   B2   tv          0    0    0    0    0
   B1   radio       0    1    2    5    3
   B1   tv          0    0    0    0    0
   B2   radio       0    4    1    1    1
   B1   newspapers  7    4    2    1    0

What I am trying to do is to group by BRAND and sum the value along each column to form a unique row made up of the summed values. Furthermore I'd also like to show which MEDIUM were in each subgroup.

The final output should be the following:

          W1    W2   W3   W4   W5  tv radio newspaper 

   B1      8    9    5    6    5   1   1       1
   B2      0    4    1    1    1   1   1       0

I'm having hard time to find a way to solve this, especially the second part to put which medium was in the group. Any suggestion or hints? Thanks

like image 432
3nomis Avatar asked Nov 18 '25 04:11

3nomis


2 Answers

Mybe in late, but you can also try to manage it without any external packages, in two steps:

First aggregate the W... by BRAND :

# aggregate by brand, and not using the MEDIUM column:
one  <-   aggregate(. ~ BRAND, data = dats[,-2], sum) 

Then the mediums:

# frequencies
# EDIT HERE
# two <- as.data.frame.matrix(table(dats$BRAND,dats$MEDIUM))  
# replace with ones
# two[(two)>1] <- 1

# thanks to Ronak Shah
two <- data.frame(BRAND = unique(dats$BRAND), +(table(dats$BRAND, dats$MEDIUM) > 0))

Last you can merge the two datasets:

merge(one, two)
  BRAND W1 W2 W3 W4 W5 newspapers radio tv
1    B1  8  5  5  6  5          1     1  1
2    B2  0  4  1  1  1          0     1  1
like image 129
s__ Avatar answered Nov 19 '25 20:11

s__


Here is an idea that creates two different data frames and joins them together to get your expected output. The first is an aggregation on BRAND, and the second one dummifies the MEDIUM i.e.

library(tidyverse)

df1 <- df %>% 
    group_by(BRAND) %>% 
    summarise_at(vars(starts_with('W')), funs(sum))

# A tibble: 2 x 6
#  BRAND    W1    W2    W3    W4    W5
#  <fct> <int> <int> <int> <int> <int>
#1 B1        8     5     5     6     5
#2 B2        0     4     1     1     1

df2 <- df %>% 
    select(BRAND, MEDIUM) %>% 
    group_by(BRAND) %>% 
    mutate(new = 1) %>% 
    distinct() %>% 
    spread(MEDIUM, new, fill = 0)

# A tibble: 2 x 4
# Groups:   BRAND [2]
#  BRAND newspapers radio    tv
#  <fct>      <dbl> <dbl> <dbl>
#1 B1             1     1     1
#2 B2             0     1     1

Join them together to get,

left_join(df1, df2)
#Joining, by = "BRAND"
# A tibble: 2 x 9
#  BRAND    W1    W2    W3    W4    W5 newspapers radio    tv
#  <fct> <int> <int> <int> <int> <int>      <dbl> <dbl> <dbl>
#1 B1        8     5     5     6     5          1     1     1
#2 B2        0     4     1     1     1          0     1     1
like image 41
Sotos Avatar answered Nov 19 '25 21:11

Sotos



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!