Combining multiple columns into one

Question

I have a dataset as follows:

`      GN      Datum                      Land AG..GKV. MTCR..GKV. NSGI..GKV. NSGII..GKV. WA..GKV. ML..GKV.   Wert
8  11693 2012-01-05                     Kenia                                              5A001f          159700
9  11710 2012-01-05                    Indien 1C350.43                                                      18752
10 11749 2012-01-05 Taiwan, Provinz von China  2B350g2                                                      24000
11 11750 2012-01-05                    Indien                                    2B201a1  2B001b2          370296
12 11751 2012-01-05                   Serbien                                     2B201a                     7200
13 11752 2012-01-05                    Indien                                                           4c   6037
14 11753 2012-01-05                  Rumänien                                              2B001d            7830
15 11755 2012-01-06                     China                                    2B201a1                   215987
16 11756 2012-01-06                     China                                    2B201a1                   234736
17 11757 2012-01-06                 Brasilien                                      2B231                    69900
18 11758 2012-01-06                 Brasilien                                      2B231                   139800
19 11788 2012-01-07                 Kongo                                                                  139800

GN i a unique ID. As you can see, for each row, the variables AG..GKV. until ML..GKV hold one value at most, expect in the case where GN == 11750.

I would like to combine these variables into one, but still know what the original variable name was, like so:

`      GN      Datum                      Land variable value    Wert
8  11693 2012-01-05                     Kenia  WA..GKV. 5A001f   159700    
9  11710 2012-01-05                    Indien  AG..GKV. 1C350.43 18752
...

In the case where there are two values per row as in GN == 11750, it should look like this:

`      GN      Datum       Land   variable                value           Wert
11  11750 2012-01-05     Indien  NSGII..GKV.
WA..GKV.    2B201a1
2B201a 370296

i.e., the "double" variable and value should be separated by a new line delimiter.

UPDATE: There is also a special case: When there is no value at all for one row. I want to preserve these (see GN == 19 above and below). variable and value should just be empty in this case.

`      GN      Datum       Land   variable                value           Wert
19  11788 2012-01-07      Kongo                                           370296

I have no clue how to achieve this and have already tried the reshape package, but probably am too much of a R noob.

Here's a dput (not the same cases as above, but with all three kinds of outcomes):

    structure(list(GN = c(11789L, 11790L, 11791L, 11793L, 11794L, 
11795L, 11796L), Datum = structure(c(15355, 15355, 15355, 15355, 
15355, 15356, 15356), class = "Date"), Land = c("China", "Israel", 
"Pakistan", "Iran, Islamische Republik", "Hong Kong", "Südafrika", 
"Lettland"), AG..GKV. = c("2B350d4", "", "", "", "", "", ""), 
    MTCR..GKV. = c("", "", "", "", "", "", ""), NSGI..GKV. = c("", 
    "", "", "", "", "", ""), NSGII..GKV. = c("", "", "", "", 
    "", "", "2B201a1"), WA..GKV. = c("", "7A002a+b", "", "", 
    "3A001a2c", "6A003b4", "2B001b2"), ML..GKV. = c("", "", "", 
    "", "", "", ""), Wert = c(63720, 25672, 1608000, 10738, 202500, 
    13500, 374873)), .Names = c("GN", "Datum", "Land", "AG..GKV.", 
"MTCR..GKV.", "NSGI..GKV.", "NSGII..GKV.", "WA..GKV.", "ML..GKV.", 
"Wert"), row.names = 49:55, class = "data.frame")

talat · Accepted Answer

Here's an option with tidyr and dplyr:

library(dplyr)
library(tidyr)
DF %>% gather(Variable, Value, AG..GKV.:ML..GKV.) %>% 
    filter(Value != "") %>% group_by(GN, Datum, Land) %>% 
    summarise_each(funs(paste(unique(.), collapse = "
")))

#Source: local data frame [11 x 6]
#Groups: GN, Datum
#
#      GN      Datum                      Land   Wert              Variable            Value
#1  11693 2012-01-05                     Kenia 159700              WA..GKV.           5A001f
#2  11710 2012-01-05                    Indien  18752              AG..GKV.         1C350.43
#3  11749 2012-01-05 Taiwan, Provinz von China  24000              AG..GKV.          2B350g2
#4  11750 2012-01-05                    Indien 370296 NSGII..GKV.
WA..GKV. 2B201a1
2B001b2
#5  11751 2012-01-05                   Serbien   7200           NSGII..GKV.           2B201a
#6  11752 2012-01-05                    Indien   6037              ML..GKV.               4c
#7  11753 2012-01-05                  Rumänien   7830              WA..GKV.           2B001d
#8  11755 2012-01-06                     China 215987           NSGII..GKV.          2B201a1
#9  11756 2012-01-06                     China 234736           NSGII..GKV.          2B201a1
#10 11757 2012-01-06                 Brasilien  69900           NSGII..GKV.            2B231
#11 11758 2012-01-06                 Brasilien 139800           NSGII..GKV.            2B231

Note that this will only create several entries separated by " " if there they are unique, which I assumed is what you want to do.

Edit after OP comment:

For your special case, try the following code:

DF %>% gather(Variable, Value, AG..GKV.:ML..GKV.) %>% 
    group_by(GN, Datum, Land) %>% filter(if(all(Value == "")) row_number() == 1 else Value != "") %>% 
    summarise_each(funs(paste(unique(.), collapse = "
")))

Explanation:

gather reshapes the data from wide to long format (using the columns AG..GKV. until ML..GKV.)
Then we group the data by GN, Datum and Land
The filter condition is: if all Value entries are empty strings, check if the row number is 1 (that means, return only the first row of such groups) else (if at least one or more Value entries are not empty strings) return only those rows which are not empty
Then, on each of the columns - except the grouping columns - summarise all unique values per group into a single string, separated by

Cath · Answer

You can try (df being your data.frame) :

variable name :

df$variable<-apply(df[,4:9],1,function(x){paste(colnames(df[,4:9])[which(x!="")],collapse="
")})

value :

df$value<-apply(df[,4:9],1,function(x){paste(x[which(x!="")],collapse="
")})

Finally, delete the original columns and reorganize :

df<-df[,c(1:3,11:12,10)]

Output :

> df
      GN      Datum                      Land              variable            value   Wert
8  11693 2012-01-05                     Kenia              WA..GKV.           5A001f 159700
9  11710 2012-01-05                    Indien              AG..GKV.         1C350.43  18752
10 11749 2012-01-05 Taiwan, Provinz von China              AG..GKV.          2B350g2  24000
11 11750 2012-01-05                    Indien NSGII..GKV.
WA..GKV. 2B201a1
2B001b2 370296
12 11751 2012-01-05                   Serbien           NSGII..GKV.           2B201a   7200
13 11752 2012-01-05                    Indien              ML..GKV.               4c   6037
14 11753 2012-01-05                  Rumänien              WA..GKV.           2B001d   7830
15 11755 2012-01-06                     China           NSGII..GKV.          2B201a1 215987
16 11756 2012-01-06                     China           NSGII..GKV.          2B201a1 234736
17 11757 2012-01-06                 Brasilien           NSGII..GKV.            2B231  69900
18 11758 2012-01-06                 Brasilien           NSGII..GKV.            2B231 139800

Combining multiple columns into one

Tags:

r

reshape

grssnbchr

2 Answers

Edit after OP comment:

talat

Cath

Recent Activity

Donate For Us

Combining multiple columns into one

Tags:

r

reshape

grssnbchr

2 Answers

Edit after OP comment:

talat

Cath

Related questions

Recent Activity

Donate For Us