Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

F# ref-mutable vars vs object fields

I'm writing a parser in F#, and it needs to be as fast as possible (I'm hoping to parse a 100 MB file in less than a minute). As normal, it uses mutable variables to store the next available character and the next available token (i.e. both the lexer and the parser proper use one unit of lookahead).

My current partial implementation uses local variables for these. Since closure variables can't be mutable (anyone know the reason for this?) I've declared them as ref:

let rec read file includepath =
    let c = ref ' '
    let k = ref NONE
    let sb = new StringBuilder()
    use stream = File.OpenText file

    let readc() =
        c := stream.Read() |> char
    // etc

I assume this has some overhead (not much, I know, but I'm trying for maximum speed here), and it's a little inelegant. The most obvious alternative would be to create a parser class object and have the mutable variables be fields in it. Does anyone know which is likely to be faster? Is there any consensus on which is considered better/more idiomatic style? Is there another option I'm missing?

like image 735
rwallace Avatar asked Jan 28 '26 08:01

rwallace


1 Answers

You mentioned that local mutable values cannot be captured by a closure, so you need to use ref instead. The reason for this is that mutable values captured in the closure need to be allocated on the heap (because closure is allocated on the heap).

F# forces you to write this explicitly (using ref). In C# you can "capture mutable variable", but the compiler translates it to a field in a heap-allocated object behind the scene, so it will be on the heap anyway.

Summary is: If you want to use closures, mutable variables need to be allocated on the heap.

Now, regarding your code - your implementation uses ref, which creates a small object for every mutable variable that you're using. An alternative would be to create a single object with multiple mutable fields. Using records, you could write:

type ReadClosure = {
  mutable c : char
  mutable k : SomeType } // whatever type you use here

let rec read file includepath = 
  let state = { c = ' '; k = NONE } 
  // ... 
  let readc() = 
    state.c <- stream.Read() |> char 
    // etc...

This may be a bit more efficient, because you're allocating a single object instead of a few objects, but I don't expect the difference will be noticeable.

There is also one confusing thing about your code - the stream value will be disposed after the function read returns, so the call to stream.Read may be invalid (if you call readc after read completes).

let rec read file includepath =    
  let c = ref ' '    
  use stream = File.OpenText file    
  let readc() =    
    c := stream.Read() |> char    
  readc

let f = read a1 a2
f() // This would fail!

I'm not quite sure how you're actually using readc, but this may be a problem to think about. Also, if you're declaring it only as a helper closure, you could probably rewrite the code without closure (or write it explicitly using tail-recursion, which is translated to imperative loop with mutable variables) to avoid any allocations.

like image 161
Tomas Petricek Avatar answered Jan 31 '26 01:01

Tomas Petricek