I have a CSV file with two columns, text and count. The goal is to transform the file from this:
some text once,1
some text twice,2
some text thrice,3
To this:
some text once,1
some text twice,1
some text twice,1
some text thrice,1
some text thrice,1
some text thrice,1
repeating each line count times and spreading the count over that many lines.
This seems to me like a good candidate for Seq.unfold, generating the additional lines, as we read the file. I have the following generator function:
let expandRows (text:string, number:int32) =
    if number = 0 
    then None
    else
        let element = text                  // "element" will be in the generated sequence
        let nextState = (element, number-1) // threaded state replacing looping 
        Some (element, nextState)
FSI yields the following function signature:
val expandRows : text:string * number:int32 -> (string * (string * int32)) option
Executing the following in FSI:
let expandedRows = Seq.unfold expandRows ("some text thrice", 3)
yields the expected:
val it : seq<string> = seq ["some text thrice"; "some text thrice"; "some text thrice"]
The question is: how do I plug this into the context of a larger ETL pipeline? For example:
File.ReadLines(inFile)                  
    |> Seq.map createTupleWithCount
    |> Seq.unfold expandRows // type mismatch here
    |> Seq.iter outFile.WriteLine
The error below is on expandRows in the context of the pipeline.
Type mismatch. 
Expecting a 'seq<string * int32> -> ('a * seq<string * int32>) option'    
but given a     'string * int32 -> (string * (string * int32)) option' 
The type    'seq<string * int32>' does not match the type 'string * int32'
I was expecting expandRows to return a seq of string, as in my isolated test. As that is neither the "Expecting" nor the "given", I'm confused. Can someone point me in the right direction?
A gist for the code is here: https://gist.github.com/akucheck/e0ff316e516063e6db224ab116501498
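Seq.map produces a sequence, but Seq.unfold does not take a sequence; it takes a single initial state value. When you pipe the output of Seq.map into Seq.unfold, the compiler treats the whole sequence as that state, which is why the error says it expects a generator of type 'seq<string * int32> -> ...'. So you can't directly pipe the output of Seq.map into Seq.unfold. You need to apply it element by element instead.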
But then, for each element your Seq.unfold will produce a sequence, so the ultimate result will be a sequence of sequences. You can collect all those "subsequences" in a single sequence with Seq.collect:
File.ReadLines(inFile) 
    |> Seq.map createTupleWithCount 
    |> Seq.collect (Seq.unfold expandRows)
    |> Seq.iter outFile.WriteLine
Seq.collect takes a function and an input sequence. For every element of the input sequence, the function is supposed to produce another sequence, and Seq.collect will concatenate all those sequences into one. You may think of Seq.collect as Seq.map and Seq.concat combined in one function. Also, if you're coming from C#, Seq.collect is called SelectMany over there.
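To see the equivalence without touching any files, here is a minimal sketch that runs in FSI. The rows value is a hypothetical in-memory stand-in for the parsed lines, and expandRows is the generator from the question, only condensed:

```fsharp
// Generator from the question: emits `text` until the counter reaches 0.
let expandRows (text: string, number: int) =
    if number = 0 then None
    else Some (text, (text, number - 1))

// Hypothetical stand-in for the parsed rows (nothing is read from a file here).
let rows = seq [ ("some text once", 1); ("some text twice", 2) ]

// Seq.collect maps each row to a subsequence and flattens the results.
let viaCollect =
    rows |> Seq.collect (Seq.unfold expandRows) |> Seq.toList

// The same thing spelled out as Seq.map followed by Seq.concat.
let viaMapConcat =
    rows |> Seq.map (Seq.unfold expandRows) |> Seq.concat |> Seq.toList

printfn "%A" viaCollect               // ["some text once"; "some text twice"; "some text twice"]
printfn "%b" (viaCollect = viaMapConcat) // true
```

Both pipelines stay lazy until Seq.toList forces them, so in the file-based version the lines are still expanded as they are read.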
In this case, since you simply want to repeat a value a number of times, there's no reason to use Seq.unfold. You can use Seq.replicate instead:
// 'a * int -> seq<'a>
let expandRows (text, number) = Seq.replicate number text
You can use Seq.collect to compose it:
File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.collect expandRows
|> Seq.iter outFile.WriteLine
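Here is a self-contained sketch of that pipeline on in-memory data. Note that createTupleWithCount below is only my guess at what the parser does (split on the last comma); the real one is in the gist and may differ. The extra Seq.map that appends ",1" is also an assumption, based on the target file in the question keeping a count column of 1:

```fsharp
// Hypothetical parser: splits "text,count" at the last comma.
// The real createTupleWithCount lives in the asker's gist and may differ.
let createTupleWithCount (line: string) =
    let i = line.LastIndexOf(',')
    (line.Substring(0, i), int (line.Substring(i + 1)))

// 'a * int -> seq<'a>
let expandRows (text, number) = Seq.replicate number text

// In-memory stand-in for File.ReadLines(inFile).
let input = [ "some text once,1"; "some text twice,2"; "some text thrice,3" ]

let output =
    input
    |> Seq.map createTupleWithCount
    |> Seq.collect expandRows
    |> Seq.map (fun text -> text + ",1") // assumption: keep a count column of 1
    |> Seq.toList

output |> List.iter (printfn "%s")
```

One difference from the Seq.unfold version worth knowing: Seq.replicate throws an ArgumentException for a negative count, so bad input fails fast.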
In fact, the only work performed by this version of expandRows is to 'unpack' a tuple and pass its values, in swapped order, to a curried function.
While F# doesn't come with a generic function for that in its core library, you can easily define one (along with other similarly useful functions):
module Tuple2 =
    let curry f x y = f (x, y)    
    let uncurry f (x, y) = f x y    
    let swap (x, y) = (y, x)
This would enable you to compose your pipeline from well-known functional building blocks:
File.ReadLines(inFile)
|> Seq.map createTupleWithCount
|> Seq.collect (Tuple2.swap >> Tuple2.uncurry Seq.replicate)
|> Seq.iter outFile.WriteLine
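If the composition looks opaque, it may help to try it on a single tuple in FSI. This is just a sketch; the name expand and its type annotation are mine, added so the intermediate types are visible:

```fsharp
module Tuple2 =
    let curry f x y = f (x, y)
    let uncurry f (x, y) = f x y
    let swap (x, y) = (y, x)

// swap turns (text, count) into (count, text); uncurry then feeds that pair
// to Seq.replicate, whose curried signature is count -> value -> seq<value>.
let expand : string * int -> seq<string> =
    Tuple2.swap >> Tuple2.uncurry Seq.replicate

let result = expand ("some text twice", 2) |> Seq.toList
printfn "%A" result // ["some text twice"; "some text twice"]
```

The swap is needed only because the tuple is (text, count) while Seq.replicate takes the count first.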