
haskell write large string

Tags:

haskell

Hello Stackoverflow Community.

I'm relatively new to Haskell, and I have noticed that writing large strings to a file with writeFile or hPutStr is extremely slow.

For a 1.5 MB string, my program (compiled with GHC) takes about 2 seconds, while the "same" code in C++ takes only about 0.1 seconds. The string is generated from a list of about 10000 elements and then dumped with writeFile. I have also tried traversing the list with mapM_ and hPutStr, with the same result.

Is there a faster way to write a large string?

Update

As @applicative pointed out, the following code handles a 2 MB file in no time:

main = readFile "input.txt" >>= writeFile "ouput.txt"

So my problem seems to be somewhere else. Here are my two implementations for writing the list (WordIndex and CoordList are type aliases for a Map and a list):

with hPutStrLn

-- Print to File
indexToFile :: String -> WordIndex -> IO ()
indexToFile filename index =
    let 
        indexList = map (\(k, v) -> entryToString k v)  (Map.toList index)
    in do
        output <- openFile filename WriteMode
        mapM_ (\v -> hPutStrLn output v) indexList
        hClose output


-- Convert Listelement to String
entryToString :: String -> CoordList -> String
entryToString key value = (embedString 25 key) ++ (coordListToString value) ++ "\n"

with writeFile

-- Print to File
indexToFile :: String -> WordIndex -> IO ()
indexToFile filename index = writeFile filename (indexToString "" index)

-- Index to String
indexToString :: String -> WordIndex -> String
indexToString lead index = Map.foldrWithKey (\k v r -> lead ++ (entryToString k v) ++ r) "" index

Maybe you can help me find a speedup here.

Thanks in advance

The Dude asked Mar 11 '26 03:03


2 Answers

This is a well-known problem. The default Haskell String type is just [Char], a linked list of characters. It is inherently slow, and slower still when it is constructed lazily (the usual situation). As a list, however, it allows simple and clean processing with list combinators, so it is useful when performance is not an issue. When performance matters, use the ByteString or Text packages instead. ByteString is better as it ships with GHC, but it does not provide Unicode support. ByteString-based UTF-8 packages are available on Hackage.
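To make this concrete, here is a minimal sketch of the question's indexToFile rewritten with Data.ByteString.Builder. The question does not show CoordList, WordIndex, embedString or coordListToString, so the type aliases and the output format below (key padded to 25 columns, coordinates printed as (row,col)) are assumptions for illustration only.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.Map as Map
import System.IO (IOMode (WriteMode), hSetBinaryMode, withFile)

-- Assumed shapes for the question's type aliases (not given in the post).
type CoordList = [(Int, Int)]
type WordIndex = Map.Map String CoordList

-- One line per entry: the key padded to 25 columns, then the coordinates.
-- The padding and the "(row,col)" format are hypothetical stand-ins for
-- embedString and coordListToString.
entryBuilder :: String -> CoordList -> B.Builder
entryBuilder key coords =
  B.stringUtf8 (padTo 25 key) <> foldMap coordBuilder coords <> B.char7 '\n'
  where
    padTo n s = s ++ replicate (n - length s) ' '
    coordBuilder (r, c) =
      B.char7 '(' <> B.intDec r <> B.char7 ',' <> B.intDec c <> B.char7 ')'

-- Concatenate all entries in key order without building an intermediate String.
indexBuilder :: WordIndex -> B.Builder
indexBuilder = Map.foldMapWithKey entryBuilder

-- Stream the builder straight into the file handle.
indexToFile :: FilePath -> WordIndex -> IO ()
indexToFile filename index =
  withFile filename WriteMode $ \h -> do
    hSetBinaryMode h True
    B.hPutBuilder h (indexBuilder index)
```

The builder fills the output buffer directly, so no per-character list cells are allocated for the file contents. Whether this closes the gap to the C++ version depends on the rest of the program.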

permeakra answered Mar 13 '26 17:03


Yes. You could, for instance, use the Text type from the module Data.Text or Data.Text.Lazy, which internally represents text more efficiently than a list of Chars does: as a packed array, encoded as UTF-16 in versions of the text package before 2.0 and as UTF-8 since.
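As a rough sketch of what that could look like for the index in the question, assuming the text package is available: the type aliases and the line format (key padded to 25 columns, coordinates as (row,col)) are guesses, since the question does not show them, and the keys are assumed to be stored as Text rather than String.

```haskell
import qualified Data.Map as Map
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Assumed shapes for the question's type aliases (not given in the post).
type CoordList = [(Int, Int)]
type WordIndex = Map.Map T.Text CoordList

-- One entry without the trailing newline; the format is hypothetical.
entryLine :: T.Text -> CoordList -> T.Text
entryLine key coords = T.justifyLeft 25 ' ' key <> T.concat (map coord coords)
  where
    coord (r, c) = T.pack ("(" ++ show r ++ "," ++ show c ++ ")")

-- T.unlines appends a newline after every entry.
indexToText :: WordIndex -> T.Text
indexToText = T.unlines . map (uncurry entryLine) . Map.toList

-- Data.Text.IO.writeFile writes the whole Text using the locale encoding.
indexToFile :: FilePath -> WordIndex -> IO ()
indexToFile filename = TIO.writeFile filename . indexToText
```

Only the small coordinate fragments still go through String here. The keys and the assembled output stay in packed form.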

When writing binary data (which may or may not contain text encoded in some form), you can use ByteStrings or their lazy equivalents.

Some operations that modify Text or ByteStrings are faster on the lazy versions. If you only want to read from such a string after creating it, the non-lazy (strict) versions are generally the better choice.
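A small illustration of that split, assuming the text package: pieces are collected as a lazy Text, which is a sequence of chunks, and converted to a single strict Text once the value will only be read. The names assemble and freeze are made up for this sketch.

```haskell
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL

-- Collect many strict pieces as the chunks of one lazy Text.
-- The pieces are not copied at this point.
assemble :: [T.Text] -> TL.Text
assemble = TL.fromChunks

-- Copy the chunks into one contiguous strict Text for read-only use.
freeze :: TL.Text -> T.Text
freeze = TL.toStrict
```

The same pattern exists for ByteString, with fromChunks and toStrict in Data.ByteString.Lazy.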

AardvarkSoup answered Mar 13 '26 18:03


