
Normalizing (reshaping) a data frame by splitting a column

I have a data frame containing:

df 
Date        name             score
12/09/2012  Mahesh\nRahul    120
13/09/2012  abc\nxyz\nrep    110
...........................

I tried this to split the names into atomic values:

name1=str_split(df[,2],"\n")

but I don't know how to associate the pieces with their original rows again. What is the best way to normalize the data frame so that I get:

 df 
Date        name     score
12/09/2012  Mahesh   120
12/09/2012  Rahul    120
13/09/2012  abc      110
13/09/2012  xyz      110
13/09/2012  rep      110
...........................

Any help normalizing this into a long data frame in R would be appreciated.

Edit

Please note that this is just a reproducible example. My name column holds multiple names, and the number of names varies from row to row. Thanks.

dput(df)
structure(list(Date = structure(1:2, .Label = c("12/09/2012", "13/09/2012 "), class = "factor"),
    name = structure(c(2L, 1L), .Label = c("abc\nxyz", "Mahesh\nRahul"), class = "factor"),
    score = structure(c(2L, 1L), .Label = c("110", "120"), class = "factor")),
    .Names = c("Date", "name", "score"), row.names = c(NA, -2L), class = "data.frame")
Aashu asked Jan 25 '26 01:01


2 Answers

Here's a base R solution:

> Names <- strsplit(df$name, "\n")
> n <- sapply(Names, length)
> data.frame(cbind(apply(df[,-2], 2, function(x) rep(x, n)), 
                   name=unlist(Names)), row.names = NULL)[,c(1,3,2)]
        Date   name score
1 12/09/2012 Mahesh   120
2 12/09/2012  Rahul   120
3 13/09/2012    abc   110
4 13/09/2012    xyz   110
5 13/09/2012    rep   110

where df is:

> dput(df)
structure(list(Date = c("12/09/2012", "13/09/2012"), name = c("Mahesh\nRahul", 
"abc\nxyz\nrep"), score = c(120, 110)), .Names = c("Date", "name", 
"score"), row.names = c(NA, -2L), class = "data.frame")
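
The same reshaping can also be sketched by repeating row indices instead of using `apply`. This is only a minimal sketch, not the answerer's code. It assumes every column is a factor, as in the question's `dput` output, and rebuilds the sample data by hand with the third name `rep` taken from the question's table. The names `parts`, `idx`, and `long` are illustrative.

```r
# Sample data with factor columns, mimicking the question's dput() output.
df <- data.frame(
  Date  = factor(c("12/09/2012", "13/09/2012")),
  name  = factor(c("Mahesh\nRahul", "abc\nxyz\nrep")),
  score = factor(c("120", "110"))
)

# strsplit() only accepts character vectors, so convert the factor first.
parts <- strsplit(as.character(df$name), "\n", fixed = TRUE)

# Repeat each row index once per name found in that row.
idx <- rep(seq_len(nrow(df)), lengths(parts))

# Pick the repeated rows, then overwrite name with the atomic values.
long <- df[idx, ]
long$name <- unlist(parts)
rownames(long) <- NULL
long
```

Indexing whole rows keeps each remaining column's original type and the original column order, so no reordering step is needed afterwards.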
Jilber Urbina answered Jan 27 '26 18:01


This is relatively easy with data.table (and fast, too).

require( data.table )
dt <- data.table( df )
dt[ , list( name = unlist( strsplit( name , "\n" ) ) ) , by = list( Date , score ) ]
#         Date score   name
#1: 12/09/2012   120 Mahesh
#2: 12/09/2012   120  Rahul
#3: 13/09/2012   110    abc
#4: 13/09/2012   110    xyz

As a note, I took df to be the following data (note the character columns, rather than the factor columns that appear in your actual data):

df <- read.delim( text = "Date    name    score
12/09/2012  'Mahesh\nRahul'   120
13/09/2012  'abc\nxyz'       110" ,
sep = "" , h = TRUE , quote = "\'" , stringsAsFactors = FALSE )
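
Because the question's `dput` output shows factor columns, a conversion step along the following lines may be needed before `strsplit` will accept the `name` column. This is a minimal base R sketch, assuming every column arrives as a factor. The name `is_fac` is illustrative.

```r
# Factor-typed data, mimicking the question's dput() output.
df <- data.frame(
  Date  = factor(c("12/09/2012", "13/09/2012")),
  name  = factor(c("Mahesh\nRahul", "abc\nxyz")),
  score = factor(c("120", "110"))
)

# Convert every factor column to character; other columns are left untouched.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

# score is now a character column of digits, so it converts cleanly to numeric.
df$score <- as.numeric(df$score)

str(df)
```

Converting through character matters for `score`: calling `as.numeric` directly on a factor would return the internal level codes rather than the values 120 and 110.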
Simon O'Hanlon answered Jan 27 '26 17:01


