I have a bunch of files several MiB in size which are very simple:
BinaryReader's ReadDouble() methodWhen lexicographically sorted, they contain all values in the sequence they need to be.
I can't keep everything in memory as a float list or float array so I need a float seq that goes through the necessary files when actually being accessed. The portion that goes through the sequence actually does it in imperative style using GetEnumerator() because I don't want any resource leaks and want to close all files correctly.
My first functional approach was:
let readFile file = 
    let rec readReader (maybeReader : BinaryReader option) = 
        match maybeReader with
        | None -> 
            let openFile() = 
                printfn "Opening the file"
                new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
                |> Some
                |> readReader
            seq { yield! openFile() }
        | Some reader when reader.BaseStream.Position >= reader.BaseStream.Length -> 
            printfn "Closing the file"
            reader.Dispose()
            Seq.empty
        | Some reader -> 
            reader.BaseStream.Position |> printfn "Reading from position %d"
            let bytesToRead = Math.Min(1048576L, reader.BaseStream.Length - reader.BaseStream.Position) |> int
            let bytes = reader.ReadBytes bytesToRead
            let doubles = Array.zeroCreate<float> (bytesToRead / 8)
            Buffer.BlockCopy(bytes, 0, doubles, 0, bytesToRead)
            seq { 
                yield! doubles
                yield! readReader maybeReader
            }
    readReader None
And then, when I have a string list containing all the files, I can say something like:
let values = files |> Seq.collect readFile
use ve = values.GetEnumerator()
// Do stuff that only gets partial data from one file
However, this only closes the files when the reader reaches its end (which is clear when looking at the function). So as a second approach I implemented the file enumerating imperatively:
type FileEnumerator(file : string) = 
    let reader = new BinaryReader(new FileStream(file, FileMode.Open, FileAccess.Read, FileShare.Read))
    let mutable _current : float = Double.NaN
    do file |> printfn "Enumerator active for %s"
    interface IDisposable with
        member this.Dispose() = 
            reader.Dispose()
            file |> printfn "Enumerator disposed for %s"
    interface IEnumerator with
        member this.Current = _current :> obj
        member this.Reset() = reader.BaseStream.Position <- 0L
        member this.MoveNext() = 
            let stream = reader.BaseStream
            if stream.Position >= stream.Length then false
            else 
                _current <- reader.ReadDouble()
                true
    interface IEnumerator<float> with
        member this.Current = _current
type FileEnumerable(file : string) = 
    interface IEnumerable with
        member this.GetEnumerator() = new FileEnumerator(file) :> IEnumerator
    interface IEnumerable<float> with
        member this.GetEnumerator() = new FileEnumerator(file) :> IEnumerator<float>
let readFile' file = new FileEnumerable(file) :> float seq
now, when I say
let values = files |> Seq.collect readFile'
use ve = values.GetEnumerator()
// do stuff with the enumerator
disposing the enumerator correctly bubbles through to my imperative enumerator.
While this is a feasible solution for what I want to achieve (I could make it faster by reading it blockwise like the first functional approach but for brevity I didn't do it here) I wonder if there is a truly functional approach for this avoiding the mutable state in the enumerator.
I don't quite get what you mean when you say that using GetEnumerator() will prevent resource leaks and allow to close all files correctly. The below would be my attempt at this (ignoring block copy part for demonstration purposes) and I think it results in the files properly closed.
let eof (br : BinaryReader) = 
  br.BaseStream.Position = br.BaseStream.Length  
let readFileAsFloats filePath = 
    seq{
        use file = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read)
        use reader = new BinaryReader(file)
        while (not (eof reader)) do
            yield reader.ReadDouble()
    }
let readFilesAsFloats filePaths = 
    filePaths |> Seq.collect readFileAsFloats
let floats = readFilesAsFloats ["D:\\floatFile1.txt"; "D:\\floatFile2.txt"]
Is that what you had in mind?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With