I did not find much documentation in the commentery. Are there any good blog posts or similarly on this?
The best source for information on the profiling framework might still be the original paper by Patrick Sansom and Simon Peyton Jones. Additional details can be found in Sansom's PhD thesis as well as the later paper adding a formal specification. Simon Marlow also spoke about a few recent changes in the GHC Status Update at Haskell Implementors' Workshop 2011.
The idea behind cost-centre profiling is to annotate the expression tree with "cost centre" nodes, so for example with -auto-all the program will have annotations like follows:
fib n = {-# SCC foo #-} (case n of
                           0 -> 0
                           1 -> 1
                           n -> fib (n-1) + fib (n-2))
At runtime when entering fib, the program would look at the current "cost centre stack" and add "foo" to the top. This would be reversed once the evaluation exits the scope of the SCC annotation again. A bit of magic ensures that if, say, n happens to be a lazy value and the program needs to execute its code, the cost centre appropriate for that code is restored where necessary.
This infrastructure is then used for both time as well as space profiling:
A timer will check the cost-centre stack periodically. Every time a certain cost-centre stack is found, this counts as a "tick". In the end, the RTS will estimate the amount of time per cost-centre stack from the count of its ticks, giving you a time profile.
Every time an object is allocated, the program saves back a pointer to the cost-centre stack that was current at that point in time. This enables the garbage collector to provide a statistic of how many bytes were resident, broken down by allocation site.
As requested in the comment, a few words on optimization: For obvious reasons the framework can not allow optimizations that move non-constant costs from one cost centre to the other, forcing the optimizer to be quite pessimistic at times. For example, in the above example the current release GHC will not be able to unbox the return value, meaning that each recursive call do an unnecessary heap-allocation.
As a rule of thumb, one should not count on any code transformations happening across a SCC annotation. When in doubt, it is better to annotate a function sufficiently high in the call-stack, so the performance-critical bits do not get annotated at all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With