I am writing a program to determine if the Levenshtein distance between two strings is exactly 2 in linear time.
I have an algorithm which does this. I use the naive recursive approach which scans the string and when it finds a discrepancy it branches into 3 options trying either to delete, insert or replace. However I make a modification that if we exceed 2 as our total we just give up on that branch. This limits the total number of branches to 9 and makes the algorithm linear.
Here's my initial implementation:
diffIs2 x y =
  2 == go 0 x y
  where
    go total xs@(x1:xs') ys@(y1:ys')
      | x1 == y1
      =
        go total xs' ys'
      | total < 2
      =
        minimum $ zipWith (go $ total + 1)
          [ xs', xs , xs' ]
          [ ys , ys', ys' ]
    go total xs ys =
      total + length xs + length ys
Testing confirms that this runs in linear time.
I also have a second similar program which as far as I can tell should be linear as well.
The idea here is to use short circuiting and lazy evaluation to limit the number of branches evaluated instead. We always allow a branching however, when we branch we take the minimum of the result of each and 3. That way if the branch total exceeds 3 the short circuit should prevent further evaluation.
We have to implement a natural number type for partial evaluation.
data Nat
  = Zero
  | Succ Nat
  deriving
    ( Eq
    , Ord
    )
natLength :: [a] -> Nat
natLength [] =
  Zero
natLength (a : b) =
  Succ (natLength b)
nat3 =
  Succ $ Succ $ Succ Zero
diffIs2 x y =
  Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1
      =
        go xs' ys'
      | otherwise
      =
        Succ $
          minimum $
            map (min nat3) $
              zipWith go
                [ xs', xs , xs' ]
                [ ys , ys', ys' ]
    go xs ys =
      natLength $ xs ++ ys
Testing this reveals it doesn't run in linear time.  Something makes it exponential but I can't figure out what.  The short circuiting does work as expected.  We can show this with the following program which halts in finite time because of the short circuiting of min:
f = Succ f
main =
  print $ (Succ Zero) == min (Succ Zero) f
but when put together in the algorithm it doesn't seem to work as I expect.
Why is the second code exponential? What is my misconception?
While the default min is lazy enough for your simple example at the end of your question, it is not as lazy as we would hope:
ghci> let inf = Succ inf
ghci> inf `min` inf `min` Zero == Zero
^CInterrupted.
To fix it we need a lazier min:
min' :: Nat -> Nat -> Nat
min' Zero _ = Zero
min' _ Zero = Zero
min' (Succ x) (Succ y) = Succ (min' x y)
The big difference is that now we can get a partial result before evaluating the arguments completely:
min (Succ undefined) (Succ undefined) === undefined
min' (Succ undefined) (Succ undefined) === Succ (min' undefined undefined)
Now we can use it as follows:
diffIs2 :: Eq a => [a] -> [a] -> Bool
diffIs2 x y = Succ (Succ Zero) == go x y
  where
    go xs@(x1:xs') ys@(y1:ys')
      | x1 == y1 = go xs' ys'
      | otherwise = Succ $ go xs' ys `min'` go xs ys' `min'` go xs' ys'
    go xs ys = natLength $ xs ++ ys
Note that you do not even really need the extra min' nat3, because the top level comparison will only force the first three Succs anyway.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With