For loop takes forever to run

I have two tables. One has data from 2012 to 2014, sampled every 3 hours. It looks like this:

                    B   C
1   01.06.2012 00:00    10  0   
2   01.06.2012 03:00    10  0   
3   01.06.2012 06:00    10  6   
4   01.06.2012 09:00    7,5 0   
5   01.06.2012 12:00    6   2,5 
6   01.06.2012 15:00    6   0   
7   01.06.2012 18:00    4   0   
8   01.06.2012 21:00    4   0   
9   02.06.2012 00:00    0   0   
10  02.06.2012 03:00    0   0 

The other table covers the same time span but is sampled at 1-minute intervals:

1   01.06.2012 00:00       
2   01.06.2012 00:01       
3   01.06.2012 00:01       
4   01.06.2012 00:03       
5   01.06.2012 00:03       
6   01.06.2012 00:05       
7   01.06.2012 00:05       
8   01.06.2012 00:07       
9   01.06.2012 00:08       
10  01.06.2012 00:09       
11  01.06.2012 00:10

Now, I need to fill the 2nd and 3rd columns of the second table from the first: if a timestamp in the second table falls between timestamp(i) and timestamp(i+1) of the first table, it should take B(i) and C(i) and copy them. I have this code and I know it works, but it takes more than 12 hours to run, and I have many such files that I need to process in the same way.

clouds <- read.csv('~/2012-2014 clouds info.csv', sep=";", header = FALSE)
cloudFull <- read.csv('~/2012-2014 clouds.csv', sep=";", header = FALSE)

for (i in 1:nrow(cloudFull)){
  dateOne <- strptime(cloudFull[i,1], '%d.%m.%Y %H:%M')

  for (j in 1:nrow(clouds)){
    bottomDate = strptime(clouds[j,1], '%d.%m.%Y %H:%M')
    upperDate = strptime(clouds[j+1,1], '%d.%m.%Y %H:%M')
    if  ((dateOne >= bottomDate) && (dateOne < upperDate)) {
      cloudFull[i,2] <- clouds[j,2]
      cloudFull[i,3] <- clouds[j,3]
      break

    } 

  }
}

write.csv(cloudFull, file = 'cc.csv')

Now, how do I make it run faster? object.size(cloudFull) gives 39580744 bytes; it has 470000 rows, and other files will have even more data. I'm just beginning with R (I have worked in it for only 2 days so far) and I'd be grateful for any advice in very simple language :D

Dana Sharipova asked Feb 01 '26 06:02


1 Answer

It's hard to know what your real data looks like, but something along the lines of:

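## Parse each timestamp column once, vectorised, instead of inside the loops
full <- as.POSIXct(strptime(cloudFull[,1], '%d.%m.%Y %H:%M'))
ref <- as.POSIXct(strptime(clouds[,1], '%d.%m.%Y %H:%M'))
## findInterval() needs ref in increasing order; if it is not, reorder clouds too:
## clouds <- clouds[order(ref), ]; ref <- sort(ref)
## For each element of full, find the row of the last ref <= it and copy columns 2 and 3
cloudFull[, 2:3] <- clouds[findInterval(full, ref), 2:3]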

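Using findInterval() turns the problem into one that scales roughly linearly with the number of rows, rather than quadratically as the nested loops do. The timestamps are also parsed only once per table instead of on every loop iteration.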
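
To see what findInterval() returns, here is a minimal sketch on made-up data. The toy values, the default column names V1 to V3 (which read.csv assigns with header = FALSE) and the tz = "UTC" setting are all assumptions for illustration, not taken from the real files.

```r
# Toy stand-ins for the two tables (values are invented for illustration)
clouds <- data.frame(
  V1 = c("01.06.2012 00:00", "01.06.2012 03:00", "01.06.2012 06:00"),
  V2 = c(10, 10, 7.5),
  V3 = c(0, 6, 2.5)
)
cloudFull <- data.frame(
  V1 = c("01.06.2012 00:00", "01.06.2012 02:59", "01.06.2012 03:00",
         "01.06.2012 05:30", "01.06.2012 06:01")
)

fmt <- "%d.%m.%Y %H:%M"
# tz = "UTC" is an assumption; it sidesteps daylight-saving gaps in local time
ref  <- as.POSIXct(clouds$V1, format = fmt, tz = "UTC")
full <- as.POSIXct(cloudFull$V1, format = fmt, tz = "UTC")

# idx[k] is the row of clouds whose timestamp is the latest one <= full[k]
idx <- findInterval(full, ref)

# Copy B and C from the matching 3-hourly row into each 1-minute row
cloudFull$V2 <- clouds$V2[idx]
cloudFull$V3 <- clouds$V3[idx]

print(idx)
print(cloudFull)
```

A timestamp earlier than the first reference row would get index 0 and need separate handling, and anything after the last reference row is matched to that last row, so it may be worth checking range(idx) before copying.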

Martin Morgan answered Feb 02 '26 21:02