Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient way to generate large amounts of text in Haskell

The full source code and profiling report are here: https://gist.github.com/anonymous/92334d00859c3db0ba8a

I'm trying to generating a large text file in the form of a text PPM image file, and I am running into some performance problems. I could use another image format, but the text generation performance has gotten me interested as I have other similar situations where I don't have the flexibility of choosing an alternative format.

I started with concatenating Strings together to generate my text file, and quickly found out that it took up almost 90% of the execution time. So, I switched to Data.Text, but found that the performance was not significantly improved. I ended up producing a test file to try to isolate the problem, comparing the two functions:

ppmS (Matrix mx) = unlines ["P3", show w', show w', "255", pixels]
  where
    w'      = mxSize mx * scale
    pixels  = unlines $ map row [0..w'-1]
    row j   = intercalate " " $ map (pix j) [0..w'-1]
    pix j i = color (mx ! (div i scale, div j scale))

ppmT (Matrix mx) = T.unlines ["P3", T.pack (show w'), T.pack (show w'), "255", pixels]
  where
    w'      = mxSize mx * scale
    pixels  = T.unlines $ map row [0..w'-1]
    row j   = T.intercalate " " $ map (pix j) [0..w'-1]
    pix j i = color (mx ! (div i scale, div j scale))

Running it through the profiler using the following commands:

ghc -O2 --make -prof -auto-all -caf-all -fforce-recomp test.hs
./test +RTS -p

I see the following:

total time  =        0.60 secs   (597 ticks @ 1000 us, 1 processor)
total alloc = 1,162,898,488 bytes  (excludes profiling overheads)

                                                     individual     inherited
COST CENTRE              MODULE  no.     entries  %time %alloc   %time %alloc

MAIN                     MAIN     96           0    0.0    0.0   100.0  100.0
 main                    Main    193           0   24.1   14.7    24.1   14.7
 CAF:main3               Main    188           0    0.0    0.0    39.9   37.0
  main                   Main    225           0    0.0    0.0    39.9   37.0
   main.ppmFromText      Main    226           0    0.0    0.0    39.9   37.0
    ppmT                 Main    227           0    8.5    9.3    39.9   37.0
     ppmT.row            Main    252           0    0.0    0.0     0.0    0.0
     ppmT.pixels         Main    250           1    8.7    9.3    31.3   27.7
      ppmT.row           Main    251         500   20.6   18.4    22.6   18.4
       ppmT.pix          Main    253      250000    1.8    0.0     2.0    0.0
        color            Main    254      250000    0.2    0.0     0.2    0.0
 CAF:main6               Main    171           0    0.0    0.0    35.8   48.3
  main                   Main    198           0    0.0    0.0    35.8   48.3
   main.ppmFromString    Main    199           0    0.0    0.0    35.8   48.3
    ppmS                 Main    200           0    9.4   14.4    35.8   48.3
     ppmS.pixels         Main    216           1    8.5   14.5    26.5   33.9
      ppmS.row           Main    217         500   13.9   19.4    17.9   19.4
       ppmS.pix          Main    218      250000    3.5    0.0     4.0    0.0
        color            Main    219      250000    0.5    0.0     0.5    0.0

which tells me that both the Text and String versions are taking significant time and allocating significant memory.

What is a better way to generate this text that is more efficient in time & memory?


1 Answers

Update: As is turns out, simply using ByteStrings instead of Text with:

import qualified Data.ByteString.Char8 as T
import qualified Data.ByteString as TI

achieves just the same or even better performance than the Blaze approach I initially tried. See the updated stats table at the end.

Original answer:

You can achieve better results using the blaze builder monoid.

Here is your algorithm adapted to use Blaze.ByteString.Builder:

import Blaze.ByteString.Builder
import Blaze.ByteString.Builder.Char8
import Data.Monoid
import Data.List (intersperse)

munlines = mconcat . map ( <> (fromChar '\n') )
mintercalate s xs = mconcat $ intersperse s xs

ppmB (Matrix mx) = munlines [ fromString "P3",
                              fromString (show w'),
                              fromString (show w'),
                              fromString "255",
                              pixels ]
  where
    w'      = mxSize mx * scale
    pixels  = munlines $ map row [0..w'-1]
    row j   = mintercalate (fromString " ") $ map (pix j) [0..w'-1]
    pix j i = fromString $ color (mx ! (div i scale, div j scale))

main = do
  let m = makeMatrix
  let ppmFromString = toLazyByteString $ ppmB m
  LBS.writeFile "output.ppm" ppmFromString

Full source available here.

On my machine I get the following RTS stats for the four versions:

             Allocated   Time      %GC
  string     561 MB      0.40 s    56.6 %
  text       601 MB      0.25 s     6.9 %
  blaze       95 MB      0.07 s     3.0 %
  bytestring  91 MB      0.06 s    10.1 %

Another option would be to use the Put monad from the binary package.

like image 112
ErikR Avatar answered Oct 19 '25 12:10

ErikR



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!