I have to parse a file, and of course I have to read it first. Here is my program:
import qualified Data.ByteString.Char8 as B
import System.Environment

main = do
    args <- getArgs
    let path = args !! 0
    content <- B.readFile path
    let lines = B.lines content
    foobar lines

foobar :: [B.ByteString] -> IO ()
foobar _ = return ()
But after compiling it with
> ghc --make -O2 tmp.hs
it fails with the following error when run on a 7 GB file:
> ./tmp big_big_file.dat
> tmp: {handle: big_big_file.dat}: hGet: illegal ByteString size (-1501792951): illegal operation
Thanks for any reply!
The length of a ByteString is an Int. If Int is 32 bits, the size of a 7 GB file exceeds the range of Int, so the buffer request wraps around to a wrong size, which can easily be negative.
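To check which case applies to your build, something like the following sketch should do. The names intBits and fitsInInt are made up for illustration; only finiteBitSize and maxBound come from base.

```haskell
import Data.Bits (finiteBitSize)

-- Width of Int for the compiler/platform this is built with
-- (32 on a 32-bit GHC, 64 on a 64-bit one).
intBits :: Int
intBits = finiteBitSize (0 :: Int)

-- Hypothetical helper: True if a size in bytes can be represented
-- as a non-wrapping Int, i.e. could be a strict ByteString length.
fitsInInt :: Integer -> Bool
fitsInInt n = n >= 0 && n <= toInteger (maxBound :: Int)
```

If intBits reports 32, then fitsInInt is False for any size above 2147483647 bytes, which rules out reading a 7 GB file strictly.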
The code for readFile converts the file size to Int for the buffer request:
readFile :: FilePath -> IO ByteString
readFile f = bracket (openBinaryFile f ReadMode) hClose
    (\h -> hFileSize h >>= hGet h . fromIntegral)
and if that overflows, an "illegal ByteString size" error or a segmentation fault are the most likely outcomes.
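The wrap-around can be reproduced without a big file. This is only a sketch: Int32 stands in for a 32-bit Int, toInt32 is a hypothetical helper, and the file size of 7088141641 bytes is an assumption (the question only says "7 Gigabyte"), chosen because it is the size near 7 GB that wraps to exactly the number in the error message.

```haskell
import Data.Int (Int32)

-- Hypothetical helper: the same fromIntegral conversion readFile performs,
-- with Int32 standing in for Int on a 32-bit platform.
toInt32 :: Integer -> Int32
toInt32 = fromIntegral

-- Assumed file size in bytes (not stated in the question).
assumedFileSize :: Integer
assumedFileSize = 7088141641

-- The size reduced modulo 2^32 and reinterpreted as a signed 32-bit value.
wrappedSize :: Int32
wrappedSize = toInt32 assumedFileSize  -- evaluates to -1501792951
```

The arithmetic: 7088141641 - 2^32 = 2793174345, which is still above 2^31 - 1, so as a signed 32-bit value it reads as 2793174345 - 2^32 = -1501792951.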
If at all possible, use lazy ByteStrings to handle files that big. In your case you pretty much have to make it possible, since with 32-bit Ints a 7 GB strict ByteString cannot be created.
If you need the lines to be strict ByteStrings for the processing, and no line is exceedingly long, you can go through lazy ByteStrings to achieve that:
import qualified Data.ByteString.Lazy.Char8 as LC
import qualified Data.ByteString.Char8 as C

main = do
    ...
    -- read lazily, so the file is pulled in chunk by chunk
    content <- LC.readFile path
    let llns = LC.lines content
        -- copy the chunks of each lazy line into one strict ByteString
        slns = map (C.concat . LC.toChunks) llns
    foobar slns
but if you can modify your processing to deal with lazy ByteStrings, that will probably be better overall.
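For instance, processing that works on the lazy lines directly might look roughly like this. It is only a sketch: longestLine is a hypothetical stand-in for foobar (the question does not say what foobar really does), and processFile is an illustrative name.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Int (Int64)
import Data.List (foldl')

-- Hypothetical processing step: length of the longest line.
-- The strict left fold consumes the lines one at a time, so lines
-- already seen can be garbage collected.
longestLine :: [LC.ByteString] -> Int64
longestLine = foldl' max 0 . map LC.length

-- Read the file lazily and feed its lines to the fold.
processFile :: FilePath -> IO Int64
processFile path = longestLine . LC.lines <$> LC.readFile path
```

Note that lazy ByteString lengths are Int64, so they do not hit the 32-bit limit. As long as the processing does not hold on to the whole list of lines, memory use should stay bounded by the chunk size and the longest line rather than by the file size.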
On a platform where Int is 32 bits, a strict ByteString can hold at most 2 GiB, because its length is an Int. You need to use lazy ByteStrings for a file this large.