Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find all loops in a sequence of numbers using R

Tags:

r

Have a sequence of numbers.

seqNum <- sample(1:15, 30, replace = T)
[1] 10  7  6  5  4  1 15 11  7 15  1  2  3 14  8  3  5 10  8  3  14 8 14  3 14 12 15 12 10 14

Define "Loop": From the first number in the sequence, all numbers between two repeat number is defined as a loop. For example, the first loop in the given sequence above is "7 6 5 4 1 15 11". Remove it from the sequence then repeat the previous method to find next loop.
Sequence after remove the first loop became:

10 15 1 2  3 14  8  3  5 10  8  3  14 8 14  3 14 12 15 12 10 14

Second loop is "3 14 8".
Sequence after remove the second loop became:

10 15 1 2 5 10  8  3  14 8 14  3 14 12 15 12 10 14

Third loop is "10 15 1 2 5". Sequence after remove the third loop became:

8  3  14 8 14  3 14 12 15 12 10 14

Fourth loop is "8 3 14". Sequence after remove the fourth loop became:

14 3 14 12 15 12 10 14

Fifth loop is "14 12 15 12 10".(Always using 2 closest repeated number) Sequence after remove the fifth loop became:

14 3

DONE.

Note:

  • "loop" must contains at least 3 numbers

  • two loop consider to be the same loop if

    1. contains same number
    2. relative order is same, for example loop "1 2 3 4" is same as loop "3 4 1 2" or "4 1 2 3" or "2 3 4 1", but different from loop "1 3 2 4" or "2 1 3 4", etc.

Want:
Find all loops in the given sequence and give the count of each unique loop.

Desired result:

  count            loop
1     1 7-6-5-4-1-15-11
2     2          3-14-8
3     1     10-15-1-2-5
4     1  14-12-15-12-10

I understand this is a long question but I will appreciate any hints. Thank you!!

PS: This suppose to search loops on a very long sequence of numbers, say total around 10^8, please use sample(1:1024, 100000, replace = T) to test.

like image 945
xluo Avatar asked Sep 19 '25 02:09

xluo


2 Answers

the following code will find and print all 'loops' working on the example as expected:

seqNum <- c(10, 7, 6, 5, 4, 1, 15, 11,  7, 15,  1,  2,  3, 14,  8,  3,  5, 10,  8,  3,  14, 8, 14,  3, 14, 12, 15, 12, 10, 14)
loops <- matrix(ncol = 2,nrow = 0, dimnames = list(numeric(0),c("count","loop")))

remove_loop <- function(seqNum) {
  ht <- new.env()
  for(j in 1:length(seqNum)) {
    i <- seqNum[j]
    key <- as.character(i)
    if(exists(key,envir=ht)) {
      lastIdx <- ht[[key]]
      loop <- seqNum[lastIdx:(j-1)]
      if(length(unique(loop)) > 2) {
        return(list(loop=loop,newSeqNum=seqNum[-(lastIdx:j)]))
      }
    }
    ht[[key]] <- j
  }
}

newSeqNum <- seqNum
repeat{
  l <- remove_loop(newSeqNum)
  newSeqNum <- l$newSeqNum
  if(length(l$loop)){
    print(l$loop)
  } else {
    break
  }
}

The output is

[1]  7  6  5  4  1 15 11
[1]  3 14  8
[1] 10 15  1  2  5
[1]  8  3 14
[1] 14 12 15 12 10

It works by repeatedly removing the next loop, i.e. calling the function remove_loop on the remaining sequence. The function remove_loop works as follows: Using an environment as a hash table (variable ht) it keeps track of the last index each number was encountered at. If a number has been seen at least a second time (i.e. has an entry in the hash table) then a 'loop' may have been found. If the number has been seen more than two (say k) times it is true that the first (k-1) occurences did not meet the criteria for a 'loop'. Thus only the current and the last occurence are relevant. The current occurence is at position j and the previous occurence can be found in the hash table. If this 'loop' contains at least three distinct number a 'loop' has been found.

Edit: The following code counts all the occurences of different loops. Loops are represented in a 'canonical' form by wrapping them around such that they start with the smallest element:

repr_loop <- function(l) {
  idx <- which.min(l)
  if(idx != 1) l <- c(l[idx:length(l)],l[1:(idx-1)])
  paste0(l,collapse="-")
}

loops <- data.frame(count=numeric(),loop=character())

newSeqNum <- seqNum
repeat{
  l <- remove_loop(newSeqNum)
  newSeqNum <- l$newSeqNum
  if(length(l$loop)){
    s <- repr_loop(l$loop)
    idx <- match(s,loops[,"loop"])
    if(!is.na(idx)) {
      loops[idx,"count"] <- loops[idx,"count"] + 1
    } else {
      loops <- rbind(loops,data.frame(count=1,loop=s))
    }
  } else {
    break
  }
}
loops

which yields the following output

> loops
  count            loop
1     1 1-15-11-7-6-5-4
2     2          3-14-8
3     1     1-2-5-10-15
4     1  10-14-12-15-12
like image 172
Jonathan von Schroeder Avatar answered Sep 20 '25 16:09

Jonathan von Schroeder


You can try

Your data

d <- c(10,7,6,5,4,1,15,11,7,15,1,2,3,14,8,3,5,10,8,3,14,8,14,3,14,12,15,12,10,14)

I included everything in a function to easily use the code. The main idea is, to search for duplicates, check the length between the first duplicate and the first occurence of the number. If it is euqual or longer than three quit the loop and extract the segment, then update the vector and do all things as long as there are no further duplicates (while). I have to say that there could be problems when two loop segments are occuring at the same time in the vector.

foo <- function(x){
 d1 <- x
 res <- NULL  # vector for the results
  while(any(duplicated(d1))){
   gr <- which(duplicated(d1))
      for(i in gr){
        # here the magic happens
        pos <- which(d1 == d1[i])
        gr_pos <- which(diff(pos) >= 3)
        pos <- pos[c(gr_pos,gr_pos+1)]
        if(pos[2]-pos[1] >= 3) break
      }
   # extract the "loop" sequences
   extract <- d1[seq(pos[1],pos[2])][-length(seq(pos[1],pos[2]))]
   res <-  append(res,paste(sort(extract), collapse = "-")) # save the loop
   d1 <- d1[-seq(pos[1],pos[2])] # update input vector
   if(length(d1) < 3) break # emergency stop
  }
 data.frame(table(res)) # output
}

foo(d)
              res Freq
1     1-2-5-10-15    1
2 1-4-5-6-7-11-15    1
3  10-12-12-14-15    1
4          3-8-14    2
like image 25
Roman Avatar answered Sep 20 '25 14:09

Roman