Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove row below conditionally in dataframe and add values together in R

Tags:

r

I have a large dataset with 3 columns: Name, Country, and Sales.

I'd like to sum the Sales column by Names that are both identical and occur consecutively. Then I'd like to remove all rows but the first occurrence of a series, replacing the value of Sales with the series sum.

For example:

Name,Country,Sales
A,V,100
A,W,100
B,X,100
B,Y,100
A,Z,100

Would be reduced to:

Name,Country,Sales
A,V,200
B,X,200
A,Z,100

Anyone got any idea how to do this?

like image 384
no nein Avatar asked Dec 07 '25 19:12

no nein


1 Answers

Your data

df <- structure(list(Name = c("A", "A", "B"), Country = c("X", "Y", 
"Z"), Sales = c(100L, 100L, 100L)), .Names = c("Name", "Country", 
"Sales"), row.names = c(NA, -3L), class = c("data.table", "data.frame"
))

dplyr solution

library(dplyr)
library(data.table)
ans <- df %>%
         group_by(rleid(Name)) %>%
         summarise(Name = unique(Name), Sales=sum(Sales)) %>%
         select(-1)

Output

   Name Sales
  <chr> <int>
1     A   200
2     B   100       

Alternative example

newdf <- rbind(df, data.frame(Name=c("A","A","B","B"),
                              Country=c("A","B","C","D"),
                              Sales=c(100,100,100,100)))
ans <- newdf %>%
         group_by(rleid(Name)) %>%
         summarise(Name = unique(Name), Sales=sum(Sales)) %>%
         select(-1)

Output

    Name Sales
  <fctr> <dbl>
1      A   200
2      B   100
3      A   200
4      B   200
like image 186
CPak Avatar answered Dec 10 '25 07:12

CPak



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!