pixel230 pixel231 pixel232 pixel233 pixel234 pixel235...
61 191 254 254 254 254 ...
0 0 0 0 0 0 ...
0 160 207 6 0 0 ...
23 210 253 253 253 248 ...
0 0 0 8 211 254 ...
I want to sum over cols skipping some in between and add a new column. There are actually pixel0 through pixel783 and 600000rows. Need something similar to:
mutate(df,eureka=sum([pixel230:pixel239]+[pixel244:pixel252])
How do I index the df to get this to work? Thanks.
You could use select() to get the columns you want, then use rowSums() to add them up. For example
library(dplyr)
#reproducible example
set.seed(15)
dd <- data.frame(matrix(rpois(7*4,5), ncol=7))
dd %>% select(num_range("X", c(1:2, 5:7))) %>% mutate(eureka=rowSums(.))
This gives
X1 X2 X5 X6 X7 eureka
1 5 4 5 7 5 26
2 3 11 9 7 6 36
3 9 7 3 4 2 25
4 6 3 7 2 3 21
To preserve all the variables, you can make a helper function
sum_num_range <- function(x, val, rng, name) {
select(x,num_range(val, rng)) %>%
transmute_(.dots=setNames(list(~rowSums(.)), name)) %>%
cbind(x,.)
}
dd %>% sum_num_range("X", c(1:2, 5:7), "eureka")
which returns
X1 X2 X3 X4 X5 X6 X7 eureka
1 5 4 6 5 5 7 5 26
2 3 11 7 6 9 7 6 36
3 9 7 2 7 3 4 2 25
4 6 3 6 7 7 2 3 21
or maybe more generally
sum_cols <- function(x, colspec, name) {
select_(x, .dots=list(lazyeval::lazy(colspec))) %>%
transmute_(.dots=setNames(list(~rowSums(.)), name)) %>%
cbind(x,.)
}
dd %>% sum_cols(num_range("X", c(1:2, 5:7)), "eureka")
which should work with the other select() helper functions such as starts_with(), ends_with(), everything(), etc. See the ?select help page for more info.
An option using data.table. Specify the columns (.SDcols) that we need to get the sum ('nm1'), use Reduce to sum the corresponding elements of those columns, assign (:=) the output to new column ('eureka') (should be very fast for big datasets as it add columns by reference)
library(data.table)
nm1 <-paste0('pixel', c(230:231, 234:235))
setDT(df)[, eureka:=Reduce(`+`,.SD), .SDcols=nm1][]
# pixel230 pixel231 pixel232 pixel233 pixel234 pixel235 eureka
#1: 61 191 254 254 254 254 760
#2: 0 0 0 0 0 0 0
#3: 0 160 207 6 0 0 160
#4: 23 210 253 253 253 248 734
#5: 0 0 0 8 211 254 465
Suppose if there are missing values (NA) in some of the columns, can replace it with '0' and do the +
df[3,1] <- NA
setDT(df)[, eureka := Reduce(`+`, lapply(.SD, function(x)
replace(x, which(is.na(x)), 0))), .SDcols=nm1][]
a similar approach using base R is
df['eureka'] <- Reduce(`+`,lapply(df[nm1], function(x)
replace(x, which(is.na(x)), 0)))
df <- structure(list(pixel230 = c(61L, 0L, 0L, 23L, 0L),
pixel231 = c(191L, 0L, 160L, 210L, 0L), pixel232 = c(254L, 0L, 207L,
253L, 0L), pixel233 = c(254L, 0L, 6L, 253L, 8L), pixel234 = c(254L,
0L, 0L, 253L, 211L), pixel235 = c(254L, 0L, 0L, 248L, 254L
)), .Names = c("pixel230", "pixel231", "pixel232", "pixel233",
"pixel234", "pixel235"), class = "data.frame", row.names = c(NA, -5L))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With