Background:
I have a sequence of contiguous, time-stamped data. The data sequence has gaps in it where the data is not contiguous. I want to create a method that splits the sequence into a sequence of sequences so that each subsequence contains contiguous data (i.e. split the input sequence at the gaps).
Constraints:
Method signature
let groupContiguousDataPoints (timeBetweenContiguousDataPoints : TimeSpan) (dataPointsWithHoles : seq<DateTime * float>) : (seq<seq< DateTime * float >>)= ... 
On the face of it the problem looked trivial to me, but even employing Seq.pairwise, IEnumerator<_>, sequence comprehensions and yield statements, the solution eludes me. I am sure that this is because I still lack experience with combining F# idioms, or possibly because there are some language constructs that I have not yet been exposed to.
// Test data
open System
let numbers = seq { 1.0 .. 1000.0 }
let baseTime = DateTime.Now
let contiguousTimeStamps = seq { for n in numbers -> baseTime.AddMinutes(n) }
// Has a gap in the data every 77 items
let dataWithOccationalHoles =
    Seq.zip contiguousTimeStamps numbers
    |> Seq.filter (fun (_, num) -> num % 77.0 <> 0.0)
let timeBetweenContiguousValues = TimeSpan(0, 1, 0)
dataWithOccationalHoles
|> groupContiguousDataPoints timeBetweenContiguousValues
|> Seq.iteri (fun i sequence ->
    printfn "Group %d has %d data-points: Head: %f" i (Seq.length sequence) (snd (Seq.head sequence)))
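As a reference point, here is a minimal eager sketch that satisfies the requested signature, assuming it is acceptable to materialize every group before returning anything (so it gives up laziness). It opens a new group whenever the step from the previous point differs from timeBetweenContiguousDataPoints. With the test data above, the filter drops 77, 154, ..., 924 from 1..1000, so a correct implementation should yield 13 groups of 76 points each. A fixed base time replaces DateTime.Now here.

```fsharp
open System

/// Eager sketch: groups are built as reversed lists during a single fold,
/// so nothing is returned until the whole input has been read.
let groupContiguousDataPoints (timeBetweenContiguousDataPoints: TimeSpan)
                              (dataPointsWithHoles: seq<DateTime * float>)
                              : seq<seq<DateTime * float>> =
    let folder (groups: (DateTime * float) list list) (time: DateTime, value: float) =
        match groups with
        | ((prevTime, _) :: _ as current) :: rest
            when time - prevTime = timeBetweenContiguousDataPoints ->
            // contiguous with the previous point: extend the current group
            ((time, value) :: current) :: rest
        | _ ->
            // first point, or a gap: open a new group
            [ (time, value) ] :: groups
    dataPointsWithHoles
    |> Seq.fold folder []
    |> List.rev                          // groups back in chronological order
    |> List.map (List.rev >> Seq.ofList) // points back in chronological order
    |> Seq.ofList

// The question's test data, with a fixed base time
let numbers = seq { 1.0 .. 1000.0 }
let baseTime = DateTime(2009, 1, 1)
let contiguousTimeStamps = seq { for n in numbers -> baseTime.AddMinutes n }
let dataWithOccationalHoles =
    Seq.zip contiguousTimeStamps numbers
    |> Seq.filter (fun (_, num) -> num % 77.0 <> 0.0)

let groups =
    dataWithOccationalHoles
    |> groupContiguousDataPoints (TimeSpan(0, 1, 0))
    |> Seq.map List.ofSeq
    |> List.ofSeq

groups |> List.iteri (fun i g -> printfn "Group %d has %d data-points, head %f" i g.Length (snd g.Head))
```

The fold keeps each group reversed so that extending it is a constant-time cons; the two List.rev calls at the end restore chronological order.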
I think this does what you want
let firstPoint = Seq.head dataWithOccationalHoles // seed for the scan, so the first data point is not dropped
dataWithOccationalHoles
|> Seq.pairwise
// tag each pair with 1 if there is a gap before its second point, 0 if it is contiguous
|> Seq.map (fun (((time1, _), (time2, _)) as pair) ->
    (if time2 - time1 = timeBetweenContiguousValues then 0 else 1), pair)
// running total of the tags = group index of each point
|> Seq.scan (fun (indexres, _, _) (index, (p1, p2)) -> (index + indexres, p1, p2)) (0, firstPoint, firstPoint)
|> Seq.map (fun (index, _, p2) -> index, p2)
|> Seq.groupBy fst
|> Seq.sortBy fst // group indexes increase over time, so this keeps the groups in order
|> Seq.map (snd >> Seq.map snd)
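The running-index idea behind this answer is easier to see on toy data. In this sketch, integers stand in for the time stamps and a step of 1 counts as contiguous (both assumptions for illustration). The scan assigns group indexes 0, 0, 0, 1, 1, 2 to the values 1, 2, 3, 5, 6, 9, and grouping by that index yields the contiguous runs.

```fsharp
// Toy data: consecutive integers stand in for evenly spaced time stamps
let points = [ 1; 2; 3; 5; 6; 9 ]

let groups =
    points
    |> Seq.pairwise
    // 1 marks a gap before the second element of a pair, 0 marks contiguity
    |> Seq.map (fun (a, b) -> (if b - a = 1 then 0 else 1), b)
    // running total of the markers = group index of every element;
    // seeding with the first element keeps it from being dropped
    |> Seq.scan (fun (index, _) (gap, b) -> index + gap, b) (0, List.head points)
    |> Seq.groupBy fst
    |> Seq.sortBy fst
    |> Seq.map (fun (_, members) -> members |> Seq.map snd |> List.ofSeq)
    |> List.ofSeq

printfn "%A" groups
```

Note that Seq.groupBy has to read the entire input before it can return the first group, so this approach is not lazy.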
Thanks for asking this cool question
I translated Alexey's Haskell to F#, but it's not pretty in F#, and still one element too eager.
I expect there is a better way, but I'll have to try again later.
let N = 20
let data =  // produce some arbitrary data with holes
    seq {
        for x in 1..N do
            if x % 4 <> 0 && x % 7 <> 0 then
                printfn "producing %d" x
                yield x
    }
let rec GroupBy comp (input:LazyList<'a>) : LazyList<LazyList<'a>> = 
    LazyList.delayed (fun () ->
    match input with
    | LazyList.Nil -> LazyList.cons (LazyList.empty()) (LazyList.empty())
    | LazyList.Cons(x,LazyList.Nil) -> 
        LazyList.cons (LazyList.cons x (LazyList.empty())) (LazyList.empty())
    | LazyList.Cons(x,(LazyList.Cons(y,_) as xs)) ->
        let groups = GroupBy comp xs
        if comp x y then
            LazyList.consf 
                (LazyList.consf x (fun () -> 
                    let (LazyList.Cons(firstGroup,_)) = groups
                    firstGroup)) 
                (fun () -> 
                    let (LazyList.Cons(_,otherGroups)) = groups
                    otherGroups)
        else
            LazyList.cons (LazyList.cons x (LazyList.empty())) groups)
let result = data |> LazyList.of_seq |> GroupBy (fun x y -> y = x + 1)
printfn "Consuming..."
for group in result do
    printfn "about to do a group"
    for x in group do
        printfn "  %d" x
You seem to want a function that has the signature
('a -> bool) -> seq<'a> -> seq<seq<'a>>
i.e. given a function and a sequence, break the input sequence up into a sequence of sequences based on the result of the function.
Caching the values in a collection that implements IEnumerable would likely be simplest. It is not exactly purist, and it loses much of the laziness of the input, but it avoids iterating the input multiple times:
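let groupBy (pred: 'a -> bool) (input: seq<'a>) : seq<seq<'a>> =
  seq {
    // buffer holding the group currently being built
    let cache = ref (new System.Collections.Generic.List<'a>())
    for e in input do
      cache.Value.Add(e)
      // pred returning false marks e as the last element of its group
      if not (pred e) then
        yield cache.Value :> seq<'a>
        cache.Value <- new System.Collections.Generic.List<'a>()
    // emit the trailing group, if any
    if cache.Value.Count > 0 then
      yield cache.Value :> seq<'a>
  }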
An alternative implementation could pass the cache collection (as seq<'a>) to the function, so that it can see multiple elements when choosing the break points.
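One possible reading of that alternative, as a sketch: the hypothetical groupByWithCache passes the group collected so far (as seq<'a>) together with the next element to the function, which returns true if that element should start a new group. Seeing the group rather than a single element is what makes a comparison with the previous point possible. The name and this exact contract are assumptions, not part of the answer above.

```fsharp
/// Hypothetical variant: startsNewGroup receives the current group (as seq<'a>)
/// and the next element, and returns true if that element should open a new group.
let groupByWithCache (startsNewGroup: seq<'a> -> 'a -> bool) (input: seq<'a>) : seq<seq<'a>> =
    seq {
        let cache = ref (ResizeArray<'a>())
        for e in input do
            if cache.Value.Count > 0 && startsNewGroup (cache.Value :> seq<'a>) e then
                // e does not continue the current group: emit the group and start a new one
                yield cache.Value :> seq<'a>
                cache.Value <- ResizeArray<'a>()
            cache.Value.Add e
        // emit the final group, if any
        if cache.Value.Count > 0 then
            yield cache.Value :> seq<'a>
    }

// Usage: a value starts a new group unless it is one more than the last value in the group
let groups =
    groupByWithCache (fun group x -> x <> Seq.last group + 1) [ 1; 2; 3; 5; 6; 9 ]
    |> Seq.map List.ofSeq
    |> List.ofSeq
```

For the time-series case the same check would compare the time stamp of the next point with that of the last point in the group; for long groups, a check that avoids walking the group with Seq.last may be preferable.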
A Haskell solution, because I don't know F# syntax well, but it should be easy enough to translate:
type TimeStamp = Integer -- ticks
type TimeSpan  = Integer -- difference between TimeStamps
groupContiguousDataPoints :: TimeSpan -> [(TimeStamp, a)] -> [[(TimeStamp, a)]]
There is a function groupBy :: (a -> a -> Bool) -> [a] -> [[a]] in Data.List:
The group function takes a list and returns a list of lists such that the concatenation of the result is equal to the argument. Moreover, each sublist in the result contains only equal elements. For example,
group "Mississippi" = ["M","i","ss","i","ss","i","pp","i"]
It is a special case of groupBy, which allows the programmer to supply their own equality test.
It isn't quite what we want, because it compares each element in the list with the first element of the current group, and we need to compare consecutive elements. If we had such a function groupBy1, we could write groupContiguousDataPoints easily:
groupContiguousDataPoints maxTimeDiff list = groupBy1 (\(t1, _) (t2, _) -> t2 - t1 <= maxTimeDiff) list
So let's write it!
groupBy1 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy1 _    []            = [[]]
groupBy1 _    [x]           = [[x]]
groupBy1 comp (x : xs@(y : _))
  | comp x y  = (x : firstGroup) : otherGroups
  | otherwise = [x] : groups
  where groups@(firstGroup : otherGroups) = groupBy1 comp xs
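Because F# lists, unlike seq, support pattern matching, groupBy1 carries over almost line for line when the input is an F# list. This is a sketch, assuming the data fits in memory: F# lists are eager, and this version is not tail-recursive, so very long inputs could overflow the stack. The list-based groupContiguousDataPoints wrapper is illustrative, not the requested seq signature.

```fsharp
open System

/// Translation of groupBy1 for F# lists: comp is applied to
/// consecutive elements, not to the first element of the group.
let rec groupBy1 (comp: 'a -> 'a -> bool) (input: 'a list) : 'a list list =
    match input with
    | [] -> [ [] ]
    | [ x ] -> [ [ x ] ]
    | x :: (y :: _ as xs) ->
        match groupBy1 comp xs with
        | firstGroup :: otherGroups when comp x y -> (x :: firstGroup) :: otherGroups
        | groups -> [ x ] :: groups

// Illustrative list-based wrapper, mirroring the Haskell one-liner
let groupContiguousDataPoints (maxTimeDiff: TimeSpan) (points: (DateTime * float) list) =
    groupBy1 (fun (t1: DateTime, _) (t2: DateTime, _) -> t2 - t1 <= maxTimeDiff) points
```

As in the Haskell version, an empty input produces a single empty group ([[]]) rather than an empty list.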
UPDATE: it looks like F# doesn't let you pattern match on seq, so it isn't too easy to translate after all. However, this thread on HubFS shows a way to pattern match sequences by converting them to LazyList when needed.
UPDATE2: Haskell lists are lazy and generated as needed, so they correspond to F#'s LazyList rather than to seq: like a Haskell list, a LazyList caches the elements it has generated (and they are garbage collected, of course, once you no longer hold a reference to them).
Okay, trying again. Achieving the optimal amount of laziness turns out to be a bit difficult in F#... On the bright side, this is somewhat more functional than my last attempt, in that it doesn't use any ref cells.
let groupBy cmp (sq:seq<_>) =
  let en = sq.GetEnumerator()
  let next() = if en.MoveNext() then Some en.Current else None
  (* this function returns a pair containing the first sequence and a lazy option indicating the first element in the next sequence (if any) *)
  let rec seqStartingWith start : seq<_> * Lazy<_ option> =
    match next() with
    | Some y when cmp start y ->
        let rest_next = lazy (seqStartingWith y) // delay evaluation until forced - stores the rest of this sequence and the start of the next one as a pair
        seq { yield start; yield! fst rest_next.Value }, 
          lazy ((snd rest_next.Value).Value)
    | next -> seq { yield start }, lazy next
  let rec iter (start : Lazy<_ option>) =
    seq {
      match start.Value with
      | None -> ()
      | Some start -> 
          let (first,next) = seqStartingWith start
          yield first
          yield! iter next
    }
  Seq.cache (iter (lazy next()))
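A usage sketch applying this groupBy to the question's data. The function is repeated so the block compiles on its own, forcing lazy values with .Value; cmp compares consecutive points, so any step other than one minute starts a new group. A fixed base time replaces DateTime.Now, and as with the test data, a correct result is 13 groups of 76 points.

```fsharp
open System

let groupBy (cmp: 'a -> 'a -> bool) (sq: seq<'a>) : seq<seq<'a>> =
    let en = sq.GetEnumerator()
    let next () = if en.MoveNext() then Some en.Current else None
    // the group starting at `start`, plus the (lazy) first element of the next group
    let rec seqStartingWith (start: 'a) : seq<'a> * Lazy<'a option> =
        match next () with
        | Some y when cmp start y ->
            let restNext = lazy (seqStartingWith y)
            (seq { yield start; yield! fst restNext.Value },
             lazy ((snd restNext.Value).Value))
        | following -> (seq { yield start }, lazy following)
    let rec iter (start: Lazy<'a option>) : seq<seq<'a>> =
        seq {
            match start.Value with
            | None -> ()
            | Some s ->
                let group, nextStart = seqStartingWith s
                yield group
                yield! iter nextStart
        }
    Seq.cache (iter (lazy (next ())))

// The question's test data, with a fixed base time instead of DateTime.Now
let numbers = seq { 1.0 .. 1000.0 }
let baseTime = DateTime(2009, 1, 1)
let dataWithOccationalHoles =
    Seq.zip (seq { for n in numbers -> baseTime.AddMinutes n }) numbers
    |> Seq.filter (fun (_, num) -> num % 77.0 <> 0.0)

let groups =
    dataWithOccationalHoles
    |> groupBy (fun (t1: DateTime, _) (t2: DateTime, _) -> t2 - t1 = TimeSpan.FromMinutes 1.0)
    |> Seq.map List.ofSeq
    |> List.ofSeq
```

Because the lazy values are shared, each element is pulled from the enumerator only once, and Seq.cache keeps the outer sequence from re-running the enumerator when it is iterated again.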