
How to subtract R dataframe columns based on information in other dataframes?

Tags: dataframe, r

I have a dataframe that I'd like to add new columns to, but the calculation is dependent on values in another dataframe that holds the instructions.

I have created a reproducible example below (in reality there are quite a few more columns).

input dataframes:

base <- data.frame("A"=c("orange","apple","banana"),
                   "B"=c(5,3,6),
                   "C"=c(7,12,4),
                   "D"=c(5,2,7),
                   "E"=c(1,18,4))
key <- data.frame("cols"=c("A","B","C","D","E"),
                  "include"=c("no","no","yes","no","yes"),
                  "subtract"=c("na","A","B","C","D"),
                  "names"=c("na","G","H","I","J"))

desired output dataframe:

output <- data.frame("A"=c("orange","apple","banana"),
                     "B"=c(5,3,6),
                     "C"=c(7,12,4),
                     "D"=c(5,2,7),
                     "E"=c(1,18,4),
                     "H"=c(2,9,-2),
                     "J"=c(-4,16,-3))

The key dataframe has one row for each column in the base dataframe, plus an "include" column that must be set to "yes" for any calculation to be done. If it is set to "yes", I want to add a new column, named by the "names" column, that holds that column's values minus the values of the column listed under "subtract".

For example, column "C" in the base dataframe is marked as included, so I want to create a new column called "H" that holds the values from column "C" minus the values from column "B".
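Done by hand for the two included rows, the calculation would look roughly like the sketch below. It hard-codes the column names, which is what I am trying to avoid; `manual` is just a scratch copy used for illustration, and the input is rebuilt so the snippet runs on its own.

```r
# Rebuild the example input so the snippet is self-contained
base <- data.frame(A = c("orange", "apple", "banana"),
                   B = c(5, 3, 6), C = c(7, 12, 4),
                   D = c(5, 2, 7), E = c(1, 18, 4))

manual <- base                    # hypothetical scratch copy, not part of the real data
manual$H <- manual$C - manual$B   # key row "C": include = yes, subtract = B, name = H
manual$J <- manual$E - manual$D   # key row "E": include = yes, subtract = D, name = J
```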

I thought I could do this with a loop but my attempts have not been successful and my searches have not found anything that helped (I'm a bit new). Any help would be much appreciated.

sessionInfo():

R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2

MVincent asked Nov 25 '25 10:11



1 Answer

Here is a base R option:

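k <- subset(key, include == "yes")  # keep only the rows flagged for calculation
# as.character() guards against factor columns (the data.frame default before R 4.0)
output <- cbind(base, setNames(base[as.character(k$cols)] - base[as.character(k$subtract)],
                               as.character(k$names)))

which gives: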

> output
       A B  C D  E  H  J
1 orange 5  7 5  1  2 -4
2  apple 3 12 2 18  9 16
3 banana 6  4 7  4 -2 -3
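
Since the question mentions trying a loop, the same result can also be sketched with an explicit `for` loop over the rows of `key`. This is only an illustrative alternative, not a replacement for the vectorised version above. It assumes `key` is built with `stringsAsFactors = FALSE` (added here, not in the original input) so that the column names are plain character strings, and `output_loop` is a name introduced just for this example.

```r
# Rebuild the example inputs so the snippet is self-contained
base <- data.frame(A = c("orange", "apple", "banana"),
                   B = c(5, 3, 6), C = c(7, 12, 4),
                   D = c(5, 2, 7), E = c(1, 18, 4))
key <- data.frame(cols = c("A", "B", "C", "D", "E"),
                  include = c("no", "no", "yes", "no", "yes"),
                  subtract = c("na", "A", "B", "C", "D"),
                  names = c("na", "G", "H", "I", "J"),
                  stringsAsFactors = FALSE)  # assumption: keep names as character

output_loop <- base
for (i in seq_len(nrow(key))) {
  # Skip rows that are not flagged for calculation
  if (key$include[i] == "yes") {
    # New column = column in "cols" minus column in "subtract", named by "names"
    output_loop[[key$names[i]]] <- base[[key$cols[i]]] - base[[key$subtract[i]]]
  }
}
```

The loop is easier to step through when debugging, while the `cbind`/`setNames` version does all the subtractions in one data-frame operation.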
ThomasIsCoding answered Nov 26 '25 23:11



