Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape dataframe that has years in column names

Tags:

r

I am trying to reshape a wide dataframe in R into a long dataframe. Reading over some of the functions in reshape2 and tidyr they all seem to just handle if you have 1 variable you are splitting whereas I have ~10. Each column has the type variables names and the year and I would like it split so that the years become a factor in each row and then have significantly less columns and an easier data set to work with.

Currently the table looks something like this.

 State  Rank  Name     V1_2016  V1_2017  V1_2018  V2_2016  V2_2017  V2_2018 
 TX     1     Company  1        2        3        4        5        6  

I have tried to melt the data with reshape2 but it came out looking like garbage and being 127k rows when it should only be about 10k.

I am trying to get the data to look something like this.

  State Rank    Name Year V1 V2
1    TX    1 Company 2016  1  4
2    TX    1 Company 2017  2  5
3    TX    1 Company 2018  3  6
like image 743
John Carty Avatar asked Nov 21 '25 10:11

John Carty


2 Answers

An option with melt from data.table that can take multiple measure based on the patterns in the column names

library(data.table)
nm1 <- unique(sub(".*_", "", names(df)[-(1:3)]))
melt(setDT(df), measure = patterns("V1", "V2"),
       value.name = c("V1", "V2"), variable.name = "Year")[,
       Year := nm1[Year]][]
#    State Rank    Name Year V1 V2
#1:    TX    1 Company 2016  1  4
#2:    TX    1 Company 2017  2  5
#3:    TX    1 Company 2018  3  6

data

df <- structure(list(State = "TX", Rank = 1L, Name = "Company", V1_2016 = 1L, 
    V1_2017 = 2L, V1_2018 = 3L, V2_2016 = 4L, V2_2017 = 5L, V2_2018 = 6L),
    class = "data.frame", row.names = c(NA, 
-1L))
like image 65
akrun Avatar answered Nov 24 '25 05:11

akrun


One dplyr and tidyr possibility could be:

df %>%
 gather(var, val, -c(1:3)) %>%
 separate(var, c("var", "Year")) %>%
 spread(var, val)

  State Rank    Name Year V1 V2
1    TX    1 Company 2016  1  4
2    TX    1 Company 2017  2  5
3    TX    1 Company 2018  3  6

It, first, transforms the data from wide to long format, excluding the first three columns. Second, it separates the original variable names into two new variables: one containing the variable prefix, second containing the year. Finally, it spreads the data.

like image 29
tmfmnk Avatar answered Nov 24 '25 06:11

tmfmnk