So my problem is as follows. I'm trying to implement a streaming parser for RDB files (the dump files that Redis produces). I want to implement a function similar to mapM_ whereby I can , say print out each object represented in the dump file as it is parsed. However, I can't seem to get it to operate in constant space. I find that what is happening is that I'm building a large IO() thunk inside of the Get monad, returning from the Get monad and then executing the IO. Is there anyway to stream my objects as they are parsed to print and then discard them? I've tried Enumerators and Conduits but I haven't seen any real gain. Here is what I have so far:
loadObjs_ :: (Monad m) => (Maybe Integer -> BL8.ByteString -> RDBObj -> Get (m a)) -> Get (m a)
loadObjs_ f = do
             code <- lookAhead getWord8
             case code of
                0xfd -> do
                 skip 1
                 expire <- loadTime
                 getPairs_ f (Just expire)
               0xfc -> do
                 skip 1
                 expire <- loadTimeMs
                 getPairs_ f (Just expire)
               0xfe -> f Nothing "Switching Database" RDBNull
               0xff -> f Nothing "" RDBNull
               _ -> getPairs_ f Nothing
getPairs_ :: (Monad m) => (Maybe Integer -> BL8.ByteString -> RDBObj -> Get (m a)) -> Maybe Integer -> Get (m a)
getPairs_ f ex = do
                !t <- getWord8
                !key <- loadStringObj False
                !obj <- loadObj t
                !rest <- loadObjs_ f
                !out <- f ex key obj
                return (out >> rest)
(loadObj does the actual parsing of a single object but I believe that whatever I need to fix the streaming to operate in constant or near-constant memory is at a higher level in the iteration than loadObj)
getDBs_ :: (Monad m) => (Maybe Integer -> BL8.ByteString -> RDBObj -> Get (m a)) -> Get (m a)
getDBs_ f = do
           opc <- lookAhead getWord8
           if opc == opcodeSelectdb
              then do
                  skip 1
                  (isEncType,dbnum) <- loadLen
                  objs <- loadObjs_ f
                  rest <- getDBs_ f
                  return (objs >> rest)
              else f Nothing "EOF" RDBNull
processRDB_ :: (Monad m) => (Maybe Integer -> BL8.ByteString -> RDBObj -> Get (m a)) -> Get (m a)
processRDB_ f = do
                header <- getBytes 9
                dbs <- getDBs_ f
                eof <- getWord8
                return (dbs)
printRDBObj :: Maybe Integer -> BL8.ByteString -> RDBObj -> Get (IO ())
printRDBObj (Just exp) key obj = return $ (print ("Expires: " ++ show exp) >>
                                           print ("Key: " ++ (BL8.unpack key)) >> 
                                           print ("Obj: " ++ show obj))
printRDBObj Nothing key RDBNull = return $ (print $ BL8.unpack key)
printRDBObj Nothing key obj = return $ (print ("Key: " ++ (BL8.unpack key)) >> 
                                        print ("Obj: " ++ show obj))
main = do
       testf <- BL8.readFile "./dump.rdb"
       runGet (processRDB_ printRDBObj)  testf
Thanks all in advance.
Best, Erik
EDIT: Here is my attempt to parse the objects into a lazy list and then IO over the lazy list.
processRDB :: Get [RDBObj]
processRDB = do
                header <- getBytes 9
                dbs <- getDBs
                eof <- getWord8
                return (dbs)
main = do
       testf <- BL8.readFile "./dump.rdb"
       mapM_ (print . show) $ runGet processRDB testf
If I understand your code correctly, you are trying to convert the file contents into IO actions incrementally, in the hope of then executing those actions incrementally.
A better approach would be to have your parser return a lazy list of objects which you then print out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With