
Split multiple columns into rows [duplicate]

I'm working with a very raw set of data and need to shape it up before I can work with it. I am trying to split selected columns on the separator '|'.

d <- data.frame(id = c(022,565,893,415),
     name = c('c|e','m|q','w','w|s|e'), 
     score = c('e','k|e','e|k|e', 'e|o'))

Is it possible to split the data frame all at once so that it looks like this in the end?

df <- data.frame(id = c(22,22,565,565,565,565,893,893,893,415,415,415,415,415,415),
            name = c('c','e','m','m','q','q','w','w','w','w','w','s','s','e','e'),
            score = c('e','e','k','e','k','e','e','k','e','e','o','e','o','e','o'))

So far I've tried various string-splitting functions but haven't had much luck :(

Can anybody help?

Davis asked Oct 16 '25 19:10



2 Answers

Here's a simple base R approach in two steps:

1) split the columns:

x <- lapply(d[-1], strsplit, "|", fixed = TRUE)

2) expand and combine:

d2 <- setNames(do.call(rbind, Map(expand.grid, d$id, x$name, x$score)), names(d)) 

The result is then:

#    id name score
#1   22    c     e
#2   22    e     e
#3  565    m     k
#4  565    q     k
#5  565    m     e
#6  565    q     e
#7  893    w     e
#8  893    w     k
#9  893    w     e
#10 415    w     e
#11 415    s     e
#12 415    e     e
#13 415    w     o
#14 415    s     o
#15 415    e     o
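
To see why the rows come out in that order, here is a minimal sketch of what the `Map(expand.grid, ...)` step does for a single row (id 565). The variable names `name_parts`, `score_parts` and `one_row` are made up for illustration, and `stringsAsFactors = FALSE` is added here only to keep the columns as character vectors; by default `expand.grid` returns factors.

```r
# One row of the question's data: id 565, name "m|q", score "k|e"
id <- 565
name_parts  <- strsplit("m|q", "|", fixed = TRUE)[[1]]  # "m" "q"
score_parts <- strsplit("k|e", "|", fixed = TRUE)[[1]]  # "k" "e"

# expand.grid() builds every combination of its arguments;
# the first argument varies fastest, the last one slowest
one_row <- expand.grid(id = id, name = name_parts, score = score_parts,
                       stringsAsFactors = FALSE)
print(one_row)
```

`Map` repeats this for every row of `d`, and `do.call(rbind, ...)` stacks the pieces into one data frame.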
talat answered Oct 18 '25 16:10



There is also a two-line tidyr/dplyr solution.

The tidyr package has a function called separate_rows that will do what you need. You need to separate the rows in two operations, because the nested elements do not all have the same length.

library(tidyr)
library(dplyr)

df <- separate_rows(d, name, sep = "\\|") %>%
  separate_rows(score, sep = "\\|")
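
For newer tidyr versions, the same two-step idea can be sketched with `separate_longer_delim`, which supersedes `separate_rows`. This assumes tidyr 1.3.0 or later is installed; `by_name` and `long` are illustrative names, not part of the original answer. Here `delim` is treated as a fixed string, so the pipe character needs no escaping.

```r
library(tidyr)

# The question's data, rebuilt so the example is self-contained
d <- data.frame(id = c(22, 565, 893, 415),
                name = c("c|e", "m|q", "w", "w|s|e"),
                score = c("e", "k|e", "e|k|e", "e|o"),
                stringsAsFactors = FALSE)

# Split one column at a time so every name is paired with every score
by_name <- separate_longer_delim(d, name, delim = "|")
long    <- separate_longer_delim(by_name, score, delim = "|")
print(long)
```

Splitting the columns one after the other is what produces every name/score combination per id.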
Jake Kaupp answered Oct 18 '25 14:10



