I'm trying to solve the longest palindromic substring problem in Haskell in O(n) time without using indexing operations like !! or the like. I'm aware of two approaches that achieve optimal time complexity, Manacher's algorithm and an approach hinted at here using suffix trees.
I have already found this implementation of something very like Manacher's algorithm, but unfortunately it fundamentally uses index arithmetic to keep track of palindrome bounds.
Manacher's algorithm stores information about overlapping palindromes to avoid extra computation, and so I figured it might be possible to write a somewhat less sophisticated pure version of this thing and then use MemoTrie to speed it up. However I'm a bit lost as to how we could do that here.
I was able to find much less information about how to implement the suffix tree version. Naively, I'd expect anything that makes use of trees to be a much better fit for a purely functional language, except that the algorithms used to construct a suffix tree in O(n) time are rather complicated, and may make use of mutation/references that can't be ported to Haskell.
My question is: can we write an O(n) solution to this problem in Haskell that doesn't make use of indices or mutable references?
tl;dr full implementation at the end.
This is quite a fun exercise actually. Like Daniel mentioned in the comments, the Manacher algorithm does not actually need full random array access - at any point it only needs to look at consequitive array elements, so it lends itself very nicely to solving by a data structure that seems to be called a zipper.
A zipper (list) is conceptually a list with a focus point that can move left and right and can access neighbouring elements. For this case I used https://hackage.haskell.org/package/ListZipper-1.2.0.2/docs/Data-List-Zipper.html.
Now you may try to implement the algorithm directly traversing a single Zipper in the main function, but then[1] you'll get stuck at the step corresponding to the branch with the "break" in the imperative version in the wiki article. There the radius for the top function/loop is mutated (!) so that the next iteration starts not with the default 0 radius but with a strictly positive radius (which is ultimately the reason the implementation is O(n) and not O(n²)).
To avoid this difficulty we'll start by defining
type LookAround a =
( Zipper a -- (l) Left pointer
, Zipper a -- (c) Cursor
, Zipper a -- (r) Right pointer
, Int -- (w) Window radius
)
so that a quadruplet (l,c,r,w) is a "palindrome window" of radius w centered at c and with pointers to the left l and right r ends. It is straightforward to expand a LookAround to the widest window that is still a palindrome:
expandPalindrome :: (Eq a) => LookAround a -> LookAround a
expandPalindrome = until stop step where
-- l and r go left and right, respectively, increasing window radius by 1
step (l, c, r, w) = (left l, c, right r, w+1)
-- stop when we hit left or right end or when left≠right
stop (l, _, r, _) = beginp l' || endp r' || cursor l' /= cursor r'
where l' = left l; r' = right r
cursor z returns the zipper's focus; beginp/endp check if the zipper is focused at the ends; left/right move the focus left and right.
For easier cache management I decided to use a State (see earlier answer versions for a direct implementation). Here's an excerpt that provides a good overview of how it works:
newtype Cache = Cache (Zipper Int) deriving (Show)
newtype MirrorCache = MirrorCache (Zipper Int)
palindrome' :: (Eq a) => LookAround a -> State Cache (LookAround a)
palindrome' = iterateUntilM stop stepSearch where
stop (l, _, _, _) = endp l
stepSearch a = do
let expanded = expandPalindromeAround a
pushToCache $ width expanded
reuseCache expanded
reuseCache :: (Eq a)
-- A found palindrome slice for whose right half to reuse cache
=> LookAround a
-- Returns a LookAround from which to continue the palindrome search
-- in the next step, with cache reused as much as possible.
-> State Cache (LookAround a)
shrinkShift :: LookAround a -> Either (LookAround a) (LookAround a)
Note that both Cache and MirroredCache are Zippers, because they too will need to move forward (when populating the cache) and backward (when reusing the mirror cells). I defined them as newtypes to avoid mistakes when passing around.
palindrome' progresses the LookAround to the right, filling the Cache state (start with empty). Following the definition literally, it expands polinomes and reuses the cache within them, stopping when it hits the end.
reuseCache implements essentially the inner loop from the imperative algorithm. Given an already found palindrome p, it does it by
dragging a mirror cache pointer to the left, and
advancing p's center to the right, keeping its right end fixed and adjusting the left end to preserve left-right symmetry.
I added explanations how I implemented this as comments to the definition of the shrinkShift function.
Note that both functions need no arbitrary array access - all the information is already encoded in the requested Zippers.
One last note before the full implementation: I did only the odd-length case (mostly following the wikipedia implementation), so for testing I wrote a helper oddify to put the bars and a palindrome :: (Eq a, Show a) => [a] -> [Int] wrapper around palindrome' above.
import Control.Monad.Loops (iterateUntilM)
import Control.Monad.State.Lazy (State, execState, get, modify)
import Data.Functor (($>))
import Data.List.Zipper (Zipper, beginp, cursor, empty, endp,
fromList, left, push, right, toList)
type LookAround a =
( Zipper a -- (l) Left pointer
, Zipper a -- (c) Cursor
, Zipper a -- (r) Right pointer
, Int -- (w) Window radius
)
newtype Cache = Cache (Zipper Int) deriving (Show)
newtype MirrorCache = MirrorCache (Zipper Int) deriving (Show)
-- I assume this def is self-explanatory
palindrome' :: (Eq a) => LookAround a -> State Cache (LookAround a)
palindrome' = iterateUntilM stop stepSearch where
stop (l, _, _, _) = endp l
stepSearch a = do
let expanded = expandPalindromeAround a
pushToCache $ width expanded
reuseCache expanded
-- Expand the given LookAround to the wides palindrome around it.
expandPalindromeAround :: (Eq a) => LookAround a -> LookAround a
expandPalindromeAround = until stop step where
-- l and r go left and right, respectively, increasing window radius by 1
step (l, c, r, w) = (left l, c, right r, w+1)
-- stop when we hit left or right end or when left≠right
stop (l, _, r, _) = beginp l' || endp r' || cursor l' /= cursor r'
where l' = left l; r' = right r
reuseCache :: (Eq a)
-- A found palindrome slice for whose right half to reuse cache
=> LookAround a
-- Returns a LookAround from which to continue the palindrome search
-- in the next step, with cache reused as much as possible.
-> State Cache (LookAround a)
reuseCache p = do
cache <- get
(sliceAtEnd, _) <-
iterateUntilM stop step
-- THe loop threads through the palindrome the following data:
-- A right-shrink-shifting slice of the original list, anchored to
-- the right end of the palindrome. This slice represents the
-- boundaries at the current step at which the palindrome search
-- would have to continue in case the mirror cache value cannot
-- be reused.
( shrinkShift p
-- A left-moving pointer, mirror to the center of the above slice
-- with respect to the palindrome center.
, mkMirror cache
)
return $ fromEither sliceAtEnd where
shrinkShift :: LookAround a -> Either (LookAround a) (LookAround a)
-- If at a given step the slice is of radius 0, it has hit the end of the
-- given palindrome (p), so it's the last step at which cache can be
-- reused within this palindrome. We denote this case by a Right and
-- put in it the LookAround from which the palindrome search will have
-- to continue - a radius-0 slice at the next element.
shrinkShift (_, c, _, 0) = Right (c',c',c',0) where c' = right c
-- If the slice is of a positive radius, however, there's still room to
-- move and there is potentially reusable cache for the elements in the
-- interval (c, r]. A shrink-shift op in this case look like this:
-- _ _ (l _ _ c _ _ r) _ ...
-- ==> _ _ (l _ c _ r) _ ...
-- (only l,c,r pointers move, not the data behind them)
shrinkShift (l, c, r, w) = Left (right (right l), right c, r, w - 1)
stop :: (Either (LookAround a) (LookAround a), MirrorCache)
-> Bool
-- A Right denotes the case where the entire (right half of the)
-- palindrome did benefit from the cache, so cache reuse should stop.
stop (Right _, _) = True
-- A Left slice denotes a center which could benefit from a mirror
-- cache, but only if the mirror of the slice center (w.r.t. the parent
-- palindrome p in the argument of `reuseCache`) hosts a palindrome of
-- radius *unequal* to the current slice radius. If these are equal, we
-- can only guarantee that a palindrome centered at this slice is at
-- least as wide as this slice - and this is where the wikipedia
-- imperative algorithm mutates the radius and uses a `break`.
stop (Left slice, mirror) = cursor' mirror == width slice
step :: (Either (LookAround a) (LookAround a), MirrorCache)
-> State Cache (Either (LookAround a) (LookAround a), MirrorCache)
step (Left slice, mirror) =
-- In the Left case we reuse the cache value from the mirror.
-- If it exceeds the slice width, however, we take the latter. As
-- explained in the wiki article, if the palindrome at the current
-- slice's center extended beyond the slice (and thus beyond the
-- parent palindrome p), then the parent palindrome would have been
-- wider.
let reuseVal = cursor' mirror `min` width slice
in pushToCache reuseVal
-- The next step's slice shifts and its mirror moves 1 step left.
$> (shrinkShift slice, left' mirror)
-- Only to silence the partial pattern match warning. This case is
-- impossible because `stop` prevents Right's from proceeding.
step _ = undefined
-- When the parent palindrome's center was just pushed to the left in
-- the cache, the first mirror value is 2 positions to the left.
mkMirror (Cache c) = MirrorCache (left $ left c)
left' (MirrorCache m) = MirrorCache (left m)
cursor' (MirrorCache m) = cursor m
-- Whether the loop ended by a Right (exhausting all cache) or by a
-- Left (at an uncertain position), the palindrome search should
-- continue from the contained value.
fromEither :: Either a a -> a
fromEither = either id id
pushToCache :: Int -> State Cache ()
pushToCache v = modify $ \(Cache c) -> Cache $ push v c
width :: LookAround a -> Int
width (_, _, _, radius) = radius
-----------------------------------------------------
---- I test in ghci with:
---- mapM_ print $ palindrome s `zip` oddify s
---- where `s` is the test string
-- Only a wrapper around palindrome' to prepare the input in
-- the suitable form and execute the state monad.
palindrome :: (Eq a, Show a) => [a] -> [Int]
palindrome list =
let z = fromList $ oddify list
Cache cache = execState (palindrome' (z, z, z, 0)) (Cache empty)
in toList cache
-- I decided to use Maybe's for bars (where wikipedia uses '|' chars)
oddify :: [a] -> [Maybe a]
oddify (a:bs@(_:_)) = Just a : Nothing : oddify bs
oddify as = Just <$> as
s :: String
--s = "abracadabra"
s = "yabadabadooo"
If you have some test cases I'll be happy to check my code against them; I didn't find any set readily available.
until and iterateUntilM.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With