In this program:
module Main where
import Data.IntSet
main = do
print $ size
$ {-# SCC "fromAscList" #-} fromAscList
$ {-# SCC "generate_list" #-} [1..1000]
(Compile and run with stack ghc Main.hs -- -prof -fprof-auto && ./Main +RTS -p)
I get this output from the GHC profiler (edited for brevity):
total alloc = 202,704 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
generate_list Main Main.hs:6:40-48 0.0 39.5
fromAscList Main Main.hs:(5,38)-(6,48) 0.0 36.2
main Main Main.hs:(3,1)-(6,48) 0.0 5.0
individual inherited
COST CENTRE MODULE %time %alloc %time %alloc
main Main 0.0 0.1 0.0 75.8
fromAscList Main 0.0 36.2 0.0 75.7
generate_list Main 0.0 39.5 0.0 39.5
This seems to indicate that [1..a'] is allocating 80Kb (39.5% of 202,704 bytes) over the course of the program, and then fromAscList is allocating the same amount again. (One allocation for the input Int, and another for the stored Int?)
Two questions:
generate_list, or would it be reusing the same piece of memory repeatedly? (I want to know if heap allocations could be a bottleneck.)fromAscList?You should use fromDistinctAscList, otherwise every next Int has to be compared with its predecessor to determine whether to add it or to skip, which must be incurring the overhead you observe. (Though I have not looked at the source code, it seems a plausible guess that this happens, and we could hardly expect memory to be allocated in an optimal way in such case.)
Try this code:
main = do
print $ size
$ {-# SCC "fromDistinctAscList" #-} fromDistinctAscList
$ {-# SCC "generate_list" #-} [1..1000]
The performance that I observe is about 36% better memory-wise. (This figure seems to align with yours.) Here is the relevant section of the profiling report:
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
....
main ListAlloc ListAlloc.hs:(5,1)-(8,55) 276 1 0.0 0.2 0.0 62.5
fromDistinctAscList ListAlloc ListAlloc.hs:(7,46)-(8,55) 278 1 0.0 1.1 0.0 62.4
generate_list ListAlloc ListAlloc.hs:8:40-55 279 1 0.0 61.2 0.0 61.2
...
Concerning the reality of the allocations, I'm not sure I understand the question but, looking at garbage collections with +RTS -S, both variants run in almost same memory all along, linearly growing due to the growth of the IntMap. It seems all other computations are done in reasonably small constant memory.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With