Compute difference between columns and save results in a new one using dplyr

Tags: r, dplyr

I am working with some data in R. My dataframe DF looks like this (the dput() version is included at the end):

    ID S.2014.01.01 S.2014.01.02 S.2014.01.03 S.2014.01.04
1  001            1           10            5           74
2  002            2           15            6           75
3  003            3           23            7           76
4  004            4           31            8           77
5  005            5           39            9           78
6  006            6           47           10           79
7  007            7           55           11           80
8  008            8           63           12           81
9  009            9           71           13           82
10 010           10           79           14           83

DF contains an ID variable and many columns holding daily values (this example includes only 4 of them; the real dataframe has more than 100 columns in this style). My goal is to compute the difference between each pair of consecutive columns. For example, I would like to subtract S.2014.01.01 from S.2014.01.02 and save the result in a new variable named D.2014.01.02. The same process applies to the following columns: the next case would be S.2014.01.03 minus S.2014.01.02, saved in a new column named D.2014.01.03.

Because of the number of columns in my real dataframe, I have tried several approaches. Computing the differences one by one would work but is not practical. I have also tried the mutate_each() function from the dplyr package, but I don't know how to make it take pairs of columns and create new ones. I have also tried the lag() function from the same package, but it doesn't work. I chose lag() because I may need not only differences between adjacent columns but also differences between columns that are two or three positions apart. I would like to get a dataframe like this:

    ID S.2014.01.01 S.2014.01.02 S.2014.01.03 S.2014.01.04 D.2014.01.02 D.2014.01.03 D.2014.01.04
1  001            1           10            5           74            9           -5           69
2  002            2           15            6           75           13           -9           69
3  003            3           23            7           76           20          -16           69
4  004            4           31            8           77           27          -23           69
5  005            5           39            9           78           34          -30           69
6  006            6           47           10           79           41          -37           69
7  007            7           55           11           80           48          -44           69
8  008            8           63           12           81           55          -51           69
9  009            9           71           13           82           62          -58           69
10 010           10           79           14           83           69          -65           69 

In this dataframe the new variables that start with D are the differences between pairs of consecutive columns. Any advice on the adjacent-column case would be fantastic, and a version that takes the difference between columns two or three positions apart would be marvelous. The dput() version of DF is:

DF<-structure(list(ID = c("001", "002", "003", "004", "005", "006", 
"007", "008", "009", "010"), S.2014.01.01 = c(1, 2, 3, 4, 5, 
6, 7, 8, 9, 10), S.2014.01.02 = c(10, 15, 23, 31, 39, 47, 55, 
63, 71, 79), S.2014.01.03 = c(5, 6, 7, 8, 9, 10, 11, 12, 13, 
14), S.2014.01.04 = c(74, 75, 76, 77, 78, 79, 80, 81, 82, 83)), .Names = c("ID", 
"S.2014.01.01", "S.2014.01.02", "S.2014.01.03", "S.2014.01.04"
), row.names = c(NA, -10L), class = "data.frame")

Thanks for your help!

Duck asked Sep 02 '25 14:09


1 Answer

There is no need to transpose the data or loop with apply-style functions: subtracting one block of columns from another is already element-wise.

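# Subtract each S column from the next one (element-wise, matched by position).
DF <- cbind(DF, DF[, 3:5] - DF[, 2:4])
# Rename only the leading "S" of the new columns to "D".
names(DF)[6:8] <- sub("^S", "D", names(DF)[6:8])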
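
The question also asks about differences between columns two or three positions apart. A minimal base R sketch of how the same column-block subtraction might be generalised is shown here. The helper name `lag_col_diff` and its arguments `k`, `prefix_in` and `prefix_out` are illustrative assumptions, not part of the original answer, and the three-row `DF` is a shortened copy of the question's data. It assumes the S columns are already in date order.

```r
# Sketch: difference between each S column and the one `k` columns before it.
# `lag_col_diff` and its arguments are hypothetical names used for illustration.
lag_col_diff <- function(df, k = 1, prefix_in = "S", prefix_out = "D") {
  # Select the value columns by name instead of hard-coded positions.
  s_cols <- grep(paste0("^", prefix_in, "\\."), names(df), value = TRUE)
  stopifnot(k >= 1, k < length(s_cols))

  later   <- s_cols[(k + 1):length(s_cols)]   # minuend columns
  earlier <- s_cols[1:(length(s_cols) - k)]   # subtrahend columns

  # Subtracting two equally sized data frames works element-wise by position.
  d <- df[later] - df[earlier]
  names(d) <- sub(paste0("^", prefix_in), prefix_out, later)
  cbind(df, d)
}

# First three rows of the question's DF, to keep the example short.
DF <- data.frame(
  ID = c("001", "002", "003"),
  S.2014.01.01 = c(1, 2, 3),
  S.2014.01.02 = c(10, 15, 23),
  S.2014.01.03 = c(5, 6, 7),
  S.2014.01.04 = c(74, 75, 76)
)

res1 <- lag_col_diff(DF, k = 1)  # adjacent columns, as in the answer above
res2 <- lag_col_diff(DF, k = 2)  # columns two positions apart
```

With `k = 1` this should reproduce the D columns requested in the question. With `k = 2` the first new column is D.2014.01.03, computed as S.2014.01.03 minus S.2014.01.01. Selecting columns by name pattern rather than by index is a design choice that avoids editing positions when the real dataframe has more than 100 columns.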
Alex answered Sep 05 '25 03:09