Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

splitting a dataframe

Tags:

r

Merry Christmas

I would like to split a long dataframe. The dataframe looks like this

    x<-c('0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
    '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', 
    '3:30:00', '4:00:00','0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00',
     '2:30:00', '3:00:00', '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00',
     '2:30:00', '3:00:00' , '3:30:00', '4:00:00')

    y=seq(1:32)

    data1=data.frame(x,y)

i want to split in such a way that the output looks like

    0:00:00  1  8 17 24  
    0:30:00  2  9 18 25  
    1:00:00  3 10 19 26  
    1:30:00  4 11 20 27  
    2:00:00  5 12 21 28  
    2:30:00  6 13 22 29  
    3:00:00  7 14 23 30  
    3:30:00 NA 15 NA 31  
    4:00:00 NA 16 NA 32  

any ideas or functions that i look into for doing this? I tried using split function, but could not get it done. Thanks a lot for your help and time.

The below solution by Matthew works best. However if i increase the cycle time for x

    x<-c('0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00',
    '4:00:00', '4:30:00', '5:00:00', '5:30:00', '6:00:00', '6:30:00', '7:00:00', 
    '7:30:00','8:00:00', '8:30:00', '9:00:00', '9:30:00', '10:00:00', '10:30:00',
     '11:00:00','11:30:00','0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', 
    '3:00:00', '3:30:00', '4:00:00', '4:30:00', '5:00:00', '5:30:00', '6:00:00', '6:30:00', 
    '7:00:00', '7:30:00','8:00:00', '8:30:00', '9:00:00', '9:30:00', '10:00:00', '10:30:00', 
    '11:00:00','11:30:00', '12:00:00', '12:30:00', '13:00:00', '13:30:00')

and use the same code, i get the following error:

    Error in match.names(clabs, names(xi)) : names do not match previous names

Cheers, Swagath

like image 544
Nav Avatar asked Mar 12 '26 22:03

Nav


1 Answers

Here's your data for the edited question:

x <- c('0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', 
      '3:00:00', '3:30:00', '4:00:00', '4:30:00', '5:00:00', '5:30:00',
      '6:00:00', '6:30:00', '7:00:00', '7:30:00','8:00:00', '8:30:00',
      '9:00:00', '9:30:00', '10:00:00', '10:30:00', '11:00:00','11:30:00',
      '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00',
      '3:00:00', '3:30:00', '4:00:00', '4:30:00', '5:00:00', '5:30:00',
      '6:00:00', '6:30:00', '7:00:00', '7:30:00','8:00:00', '8:30:00', 
      '9:00:00', '9:30:00', '10:00:00', '10:30:00', '11:00:00','11:30:00', 
      '12:00:00', '12:30:00', '13:00:00', '13:30:00')

y=seq(1:52)

data1=data.frame(x,y)

We need to create a categorical variable indicating days, and all we have to work with here is the times. If the time regresses, assume that it is a new day. To do this, we will convert the time values to integers, in order, by using a factor.

Here is a vector lev of levels, c('0:00:00', '0:30:00', '1:00:00', ...), and a factor fac which contains the same strings as data$x, but uses this vector as levels:

lev <- paste(t(outer(0:23, c('00', '30'), paste, sep=':')), '00', sep=':')
fac <- factor(as.character(data1$x), levels=lev, ordered=TRUE)

Now we see when we regress in time by applying diff:

d <- c(0, diff(
  as.numeric(factor(as.character(data1$x), levels=lev, ordered=TRUE)))
       )

Now (inspired by both of the other two answers to this question), cumsum(d<0) is the categorical variable that we need, which can be applied the data frame, and used to reshape:

data1$grp <- cumsum(d<0)
res <- reshape(data1, direction="wide", idvar="x", timevar="grp")

> res
          x y.0 y.1
1   0:00:00   1  25
2   0:30:00   2  26
3   1:00:00   3  27
4   1:30:00   4  28
5   2:00:00   5  29
6   2:30:00   6  30
7   3:00:00   7  31
8   3:30:00   8  32
9   4:00:00   9  33
10  4:30:00  10  34
11  5:00:00  11  35
12  5:30:00  12  36
13  6:00:00  13  37
14  6:30:00  14  38
15  7:00:00  15  39
16  7:30:00  16  40
17  8:00:00  17  41
18  8:30:00  18  42
19  9:00:00  19  43
20  9:30:00  20  44
21 10:00:00  21  45
22 10:30:00  22  46
23 11:00:00  23  47
24 11:30:00  24  48
49 12:00:00  NA  49
50 12:30:00  NA  50
51 13:00:00  NA  51
52 13:30:00  NA  52

How this differs from the other answers: it does not assume that a day will always contain the time "0:00:00", and it does not require that data1$x be a character variable -- and even if it is, it gets the times in correct order. Comparing character will say that 2:00:00 occurs after 13:00:00.

like image 64
Matthew Lundberg Avatar answered Mar 15 '26 11:03

Matthew Lundberg