Hello Stack Overflow community.
I'm relatively new to Haskell, and I have noticed that writing large strings to a file with
writeFile or hPutStr is extremely slow.
For a 1.5 MB string, my program (compiled with GHC) takes about 2 seconds, while the
"same" code in C++ takes only about 0.1 seconds.
The string is generated from a list of about 10,000 elements and then dumped with writeFile. I have also tried traversing the list with mapM_ and hPutStr, with the same result.
Is there a faster way to write a large string?
Update
As @applicative pointed out, the following code copies a 2 MB file almost instantly:
main = readFile "input.txt" >>= writeFile "output.txt"
So my problem seems to be somewhere else. Here are my two implementations for writing the list (WordIndex and CoordList are type aliases for a Map and a list).
With hPutStrLn:
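-- Imports this code needs
import qualified Data.Map as Map
import System.IO

-- Print to file
indexToFile :: String -> WordIndex -> IO ()
indexToFile filename index =
  let
    indexList = map (\(k, v) -> entryToString k v) (Map.toList index)
  in do
    output <- openFile filename WriteMode
    -- entryToString already ends each entry with "\n", so hPutStr is used;
    -- hPutStrLn would add a second newline (a blank line) after every entry.
    mapM_ (hPutStr output) indexList
    hClose output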
-- Convert a list element to a String
-- (embedString and coordListToString are helpers defined elsewhere)
entryToString :: String -> CoordList -> String
entryToString key value = embedString 25 key ++ coordListToString value ++ "\n"
With writeFile:
-- Print to File
indexToFile :: String -> WordIndex -> IO ()
indexToFile filename index = writeFile filename (indexToString "" index)
-- Index to String
indexToString :: String -> WordIndex -> String
indexToString lead index = Map.foldrWithKey (\k v r -> lead ++ (entryToString k v) ++ r) "" index
Maybe you can help me find a speed-up here.
Thanks in advance!
This is a well-known problem. Haskell's default String type is simply [Char], a linked list in which every character is a separate heap cell, so it is slow by design and very slow when it is constructed lazily (the usual situation). As a list, however, it allows simple and clean processing with list combinators and is fine when performance is not an issue. When it is, use the bytestring or text packages. ByteString has the advantage of shipping with GHC, but it has no Unicode-aware text API (Data.ByteString.Char8 truncates each Char to 8 bits); ByteString-based UTF-8 packages are available on Hackage.
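A minimal sketch of the ByteString route applied to the question's index: the output is assembled with Data.ByteString.Builder and streamed to the file with hPutBuilder, so no intermediate String is ever built. The CoordList and WordIndex definitions, the padTo helper (a stand-in for the question's embedString) and the "(x,y) " coordinate format are all assumptions, since the question does not show them.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.Map as Map
import System.IO (BufferMode (BlockBuffering), IOMode (WriteMode), hSetBuffering, withBinaryFile)

-- Hypothetical aliases: the question only says "a Map and a List".
type CoordList = [(Int, Int)]
type WordIndex = Map.Map String CoordList

-- Hypothetical stand-in for embedString: pad the key with spaces to width n.
padTo :: Int -> String -> B.Builder
padTo n s = B.stringUtf8 s <> B.string7 (replicate (n - length s) ' ')

-- Hypothetical coordinate format: "(x,y) " for every pair.
coordsB :: CoordList -> B.Builder
coordsB = foldMap (\(x, y) -> B.char7 '(' <> B.intDec x <> B.char7 ',' <> B.intDec y <> B.string7 ") ")

-- One line of output per map entry.
entryB :: String -> CoordList -> B.Builder
entryB k v = padTo 25 k <> coordsB v <> B.char7 '\n'

indexToFile :: FilePath -> WordIndex -> IO ()
indexToFile path index =
  withBinaryFile path WriteMode $ \h -> do
    -- hPutBuilder works best on a binary, block-buffered handle.
    hSetBuffering h (BlockBuffering Nothing)
    B.hPutBuilder h (Map.foldMapWithKey entryB index)

main :: IO ()
main = indexToFile "index.txt" (Map.fromList [("foo", [(1, 2)]), ("bar", [(3, 4), (5, 6)])])
```

Map.foldMapWithKey simply concatenates the per-entry builders in key order; the bytes are written straight into the handle's buffer instead of going through a [Char].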
Yes. You could, for instance, use the Text type from the module Data.Text or Data.Text.Lazy, which represents text internally far more compactly than a list of Chars does (as a packed array: UTF-16 in text versions before 2.0, UTF-8 from 2.0 on).
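For the Text route, a comparable sketch builds the output with Data.Text.Lazy.Builder and writes it with Data.Text.Lazy.IO.writeFile. The CoordList and WordIndex aliases are again guesses, the " x,y" coordinate format is made up for illustration, and key padding is left out.

```haskell
import qualified Data.Map as Map
import qualified Data.Text.Lazy.Builder as TB
import Data.Text.Lazy.Builder.Int (decimal)
import qualified Data.Text.Lazy.IO as TLIO

-- Hypothetical aliases, as in the question's description.
type CoordList = [(Int, Int)]
type WordIndex = Map.Map String CoordList

-- Render one entry: the key, each coordinate as " x,y", then a newline.
entryT :: String -> CoordList -> TB.Builder
entryT k v = TB.fromString k <> foldMap coord v <> TB.singleton '\n'
  where
    coord (x, y) = TB.singleton ' ' <> decimal x <> TB.singleton ',' <> decimal y

-- Concatenate all entries and write the resulting lazy Text in one go.
indexToFile :: FilePath -> WordIndex -> IO ()
indexToFile path = TLIO.writeFile path . TB.toLazyText . Map.foldMapWithKey entryT

main :: IO ()
main = indexToFile "index.txt" (Map.fromList [("foo", [(1, 2), (3, 4)])])
```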
When writing binary data (which may or may not contain text encoded in some form), you can use ByteString or its lazy equivalent.
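For binary output, a ByteString Builder can emit fixed-width integers directly. The format below (a little-endian Int32 count followed by each (x, y) pair as two little-endian Int32 values) and the file name are invented purely for illustration.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int32)

-- Hypothetical format: element count, then x and y of every pair,
-- all as 32-bit little-endian integers.
encodeCoords :: [(Int32, Int32)] -> B.Builder
encodeCoords cs =
  B.int32LE (fromIntegral (length cs))
    <> foldMap (\(x, y) -> B.int32LE x <> B.int32LE y) cs

main :: IO ()
main = BL.writeFile "coords.bin" (B.toLazyByteString (encodeCoords [(1, 2), (3, 4)]))
```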
When building up Text or ByteString values, some operations, such as appending, are faster on the lazy versions. If you only want to read from such a string after creating it, the strict versions are generally recommended.
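A small illustration of that trade-off with ByteString: the lines are collected as a lazy ByteString made of chunks, then converted once with toStrict into a single contiguous buffer, where index is O(1). buildLines is a hypothetical helper, and because Data.ByteString.Char8.pack keeps only the low 8 bits of each Char, this sketch is only suitable for ASCII input.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL

-- Hypothetical helper: one strict chunk per line, linked into a lazy ByteString.
-- BC.pack truncates each Char to 8 bits, so only use it for ASCII.
buildLines :: [String] -> BL.ByteString
buildLines = BL.fromChunks . map (\s -> BC.pack (s ++ "\n"))

main :: IO ()
main = do
  -- Appending lazy ByteStrings rebuilds only the chunk list of the left
  -- argument; the bytes inside the chunks are shared, not copied.
  let lazyBs = buildLines ["alpha", "beta"] <> buildLines ["gamma"]
      -- One copy into a contiguous strict buffer for repeated reads.
      strictBs = BL.toStrict lazyBs
  print (BS.length strictBs, BS.index strictBs 0)
```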