
Melting an R data.table with a factor column

I have the following R data.table (though the same approach should also work for a data.frame). The goal is to plot this data as a scatterplot in ggplot2, so I need to reshape the data.table to have a single "factor" column that can be used to color the points:

> library(data.table)
> dt
     ID   x_A   y_A   x_B   y_B
1: 05AC  0.81     3  0.92  2.05
2: 01BA  0.41     5  0.63   1.8
3: Z1AC  0.41     5  0.58   1.8
4: B2BA  0.21   6.5  1.00   1.8
....

I believe the correct output needs to be of the form:

ID     type   x      y
05AC   A      0.81   3       
05AC   B      0.92   2.05
01BA   A      0.41   5 
01BA   B      0.63   1.8
Z1AC   A      0.41   5 
Z1AC   B      0.58   1.8
B2BA   A      0.21   6.5 
B2BA   B      1.00   1.8

Is there a standard way to "unfold" data.tables in this fashion? I'm happy to use dplyr in this case, but I suspect there should be a data.table method.

melt() would work, if I could figure out how to create the column type, e.g.

melt(dt, id.vars=c("ID")) 

will only melt on the single ID column; it does not create a type column.

I'm especially confused about how one "scrapes" the A and B type from the names of columns 2-3 and columns 4-5 respectively...

ShanZhengYang asked Oct 26 '25 18:10



2 Answers

Staying within data.table and following your suggested approach: once the table has been melted, you can use tstrsplit to split the variable column on the "_" character.

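## melt to long form first, so that dt has the columns ID, variable and value
dt <- melt(dt, id.vars = "ID")
## use tstrsplit to split the variable column on a regular expression
dt[, c("xy", "type") := tstrsplit(variable, "_")]
dt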
#       ID variable value xy type
#  1: 05AC      x_A  0.81  x    A
#  2: 01BA      x_A  0.41  x    A
#  3: Z1AC      x_A  0.41  x    A
#  4: B2BA      x_A  0.21  x    A
#  5: 05AC      y_A  3.00  y    A
#  6: 01BA      y_A  5.00  y    A
#  7: Z1AC      y_A  5.00  y    A
#  8: B2BA      y_A  6.50  y    A
#  9: 05AC      x_B  0.92  x    B
# 10: 01BA      x_B  0.63  x    B
# 11: Z1AC      x_B  0.58  x    B
# 12: B2BA      x_B  1.00  x    B
# 13: 05AC      y_B  2.05  y    B
# 14: 01BA      y_B  1.80  y    B
# 15: Z1AC      y_B  1.80  y    B
# 16: B2BA      y_B  1.80  y    B

This gives you the long form of the data. You can then use dcast to widen it into the required shape:

dcast(dt, formula = ID + type ~ xy, value.var = "value")

#      ID type    x    y
# 1: 01BA    A 0.41 5.00
# 2: 01BA    B 0.63 1.80
# 3: 05AC    A 0.81 3.00
# 4: 05AC    B 0.92 2.05
# 5: B2BA    A 0.21 6.50
# 6: B2BA    B 1.00 1.80
# 7: Z1AC    A 0.41 5.00
# 8: Z1AC    B 0.58 1.80

The logic of this answer is the same as the suggested dplyr approach of gather %>% separate %>% spread, but using data.table.
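
As an alternative to the melt, tstrsplit and dcast sequence above, melt can reshape several groups of columns in one call through patterns(). The following is only a sketch, not part of the original answer: the data is retyped from the four rows shown in the question, the names long and suffixes are made up for illustration, and it assumes the x_ and y_ columns appear in the same A, B order.

```r
library(data.table)

# Sample data retyped from the four rows shown in the question
dt <- data.table(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8)
)

# Melt the x_* and y_* columns into two value columns at once.
# The "type" column produced here only holds the position (1, 2)
# of each column within its group, not the A/B suffix.
long <- melt(
  dt,
  id.vars       = "ID",
  measure.vars  = patterns("^x_", "^y_"),
  value.name    = c("x", "y"),
  variable.name = "type"
)

# Recover the suffixes from the column names (assumes the x_* and
# y_* columns are listed in the same order) and map positions to them
suffixes <- sub("^x_", "", grep("^x_", names(dt), value = TRUE))
long[, type := suffixes[as.integer(type)]]

long
```

The resulting table has one row per ID and type, so it could be passed to ggplot2 with type mapped to the colour aesthetic, which is the goal stated in the question.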

SymbolixAU answered Oct 29 '25 08:10



A combination of dplyr and tidyr can produce your desired result. This is untested, due to the lack of a reproducible example.

library(tidyr)
library(dplyr)

dt %>% 
  gather(variable, value, -ID, na.rm = TRUE) %>% 
  separate(variable, c("group", "type"), sep = "_") %>% 
  spread(group, value)

What this does:

  1. gathers all columns except the ID column into key-value rows, variable and value, dropping any rows whose value is NA.
  2. separates the variable column into group and type, using _ as a separator.
  3. spreads the contents of the group column into new columns and populates them with the value column.
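
The three steps above can also be collapsed into a single call in more recent versions of tidyr, using pivot_longer with the special ".value" name. This is a different function from the one used in the answer, shown only as a sketch: it assumes tidyr 1.0.0 or later, the data frame df is retyped from the four rows in the question, and the name long is made up for illustration.

```r
library(tidyr)

# Sample data retyped from the four rows shown in the question
df <- data.frame(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8),
  stringsAsFactors = FALSE
)

# Split each column name on "_": the first piece (x or y) becomes
# an output column via ".value", the second piece (A or B) is
# stored in the new "type" column
long <- pivot_longer(
  df,
  cols      = -ID,
  names_to  = c(".value", "type"),
  names_sep = "_"
)

long
```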
Jake Kaupp answered Oct 29 '25 09:10



