When I move a function from the module where it's used into a separate module, the performance of the program drops significantly.
calc = sum . nub . map third . filter isProd . concat . map parts . permutations
    where third (_,_,b)          = fromDigits b
          isProd (a,b,p)         = fromDigits a * fromDigits b == fromDigits p
          -- All possibilities have digits: A x AAAA or AA x AAA
          parts (a:b:c:d:e:rest) = [([a], [b,c,d,e], rest)
                                   ,([a,b], [c,d,e], rest)]
In another module:
fromDigits :: Integral a => [a] -> a                                   
fromDigits = foldl1' (\a b -> 10 * a + b)
This runs in 0.1 seconds when fromDigits is in the same module, but 0.4 seconds when I move it to another module.
I assume this is because GHC can't inline the function if it's in a different module, but I feel like it should be able to, since both modules are in the same package.
I'm not sure what the compiler settings are, but it's built with Leksah/cabal defaults. I'm fairly sure that's with -O2 as a minimum.
For the type-class polymorphic fromDigits, you get a function that is, due to the dictionary lookups for (+), (*) and fromInteger, too large to have its unfolding automatically exposed. That means it can't be specialised at the call sites and the dictionary lookups can't be eliminated to possibly inline addition and multiplication (which might enable further optimisation).
When it is defined in the same module as it is used in, with optimisations, GHC creates a specialised version for the type it's used at, if that is known. Then the dictionary lookups can be eliminated and the (+) and (*) operations can be inlined (if the type they're used at has operations suitable for inlining).
But that depends on the type being known. So if you have the polymorphic calc and fromDigits in one module, but use it only in some other module, you are again in the position that only the generic version is available, but since its unfolding is not exposed, it can't be specialised or otherwise optimised at the call site.
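To picture what specialisation buys, here is a rough sketch that puts the generic function next to a hand-written monomorphic copy. The name fromDigitsInt is hypothetical and only illustrates the kind of code GHC can derive by itself when the type is known; it is not GHC's actual output.

```haskell
import Data.List (foldl1')

-- Generic version: unless it is specialised, (+) and (*) are looked up
-- in the class dictionary that is passed in at run time.
fromDigits :: Integral a => [a] -> a
fromDigits = foldl1' (\a b -> 10 * a + b)

-- Hypothetical hand-specialised copy (illustration only). The type is
-- fixed to Int, so no dictionary is needed and the arithmetic can be
-- inlined.
fromDigitsInt :: [Int] -> Int
fromDigitsInt = foldl1' (\a b -> 10 * a + b)
```

Both functions compute the same result for Int lists; only the amount of information available to the optimiser differs.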
One solution is to make the unfolding of the function exposed in the interface file, so it can be properly optimised where it is used, when the necessary data (in particular the type) is available. You can expose the function's unfolding in the interface file by adding an {-# INLINE #-}, or, as of GHC 7, an {-# INLINABLE #-} pragma to the function. That makes the almost unchanged source code available when compiling the calling code, so the function can be properly optimised with more information available.
The downside to this is code bloat: you get a copy of the optimised code at every call site. (For INLINABLE it's not so extreme; you get at least one copy per calling module, which is usually not too bad.)
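Applied to the question's function, the INLINABLE approach might look like the sketch below. The module name Digits is an assumption, since the question doesn't name the module.

```haskell
-- Digits.hs (the module name is an assumption, not from the question)
module Digits (fromDigits) where

import Data.List (foldl1')

-- INLINABLE exposes the unfolding in the interface file, so a calling
-- module can specialise fromDigits at the concrete type it uses there.
{-# INLINABLE fromDigits #-}
fromDigits :: Integral a => [a] -> a
fromDigits = foldl1' (\a b -> 10 * a + b)
```

The calling module needs no changes; it imports fromDigits as before and has to be compiled with optimisations for the specialisation to take place.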
An alternative solution is to generate specialised versions in the defining module by adding {-# SPECIALISE #-} pragmas (US spelling also accepted) to let GHC create optimised versions for the important types (Int, Integer, Word, ...). That also creates rewrite rules, so that uses at the specialised-for types get rewritten to use the specialised version (when compiling with optimisations).
The downside to this is that some optimisations that would be possible when the code is inlined aren't.
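A minimal sketch of the SPECIALISE approach for the same function follows. The module name Digits and the choice of Int and Integer as the types to specialise for are assumptions; pick whichever types the callers actually use.

```haskell
-- Digits.hs (the module name is an assumption, not from the question)
module Digits (fromDigits) where

import Data.List (foldl1')

fromDigits :: Integral a => [a] -> a
fromDigits = foldl1' (\a b -> 10 * a + b)

-- Ask GHC to compile dedicated copies for these types in this module.
-- Calls at these types in other modules are then rewritten to use the
-- specialised copies when compiling with optimisations.
{-# SPECIALISE fromDigits :: [Int] -> Int #-}
{-# SPECIALISE fromDigits :: [Integer] -> Integer #-}
```

Callers at any other type still get the generic, dictionary-passing version.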