There are 3 function given below which find the last but second element in a list. The one using last . init seems much faster than the rest. I can't seem to figure out why.
For testing, I used an input list of [1..100000000] (100 million). The last one runs almost instantly whereas the others take several seconds.
-- slow
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"
-- decent
myButLast' :: [a] -> a
myButLast' = (!! 1) . reverse
-- fast
myButLast'' :: [a] -> a
myButLast'' = last . init
When studying speed and optimization, it is very easy to obtain wildly wrong results. In particular, you cannot really say that one variant is faster than another without mentioning the compiler version and the optimization mode of your benchmarking setup. Even then, modern processors are so sophisticated as to feature neural network based branch predictors, not to mention all kinds of caches, so, even with careful set-up, benchmarking results will be blurry.
That being said...
criterion is a package that provides advanced benchmarking tools. I quickly drafted a
benchmark like this:
module Main where
import Criterion
import Criterion.Main
-- slow
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"
-- decent
myButLast' :: [a] -> a
myButLast' = (!! 1) . reverse
-- fast
myButLast'' :: [a] -> a
myButLast'' = last . init
butLast2 :: [a] -> a
butLast2 (x :     _ : [ ] ) = x
butLast2 (_ : xs@(_ : _ ) ) = butLast2 xs
butLast2 _ = error "List too short"
setupEnv = do
  let xs = [1 .. 10^7] :: [Int]
  return xs
benches xs =
  [ bench "slow?"   $ nf myButLast   xs
  , bench "decent?" $ nf myButLast'  xs
  , bench "fast?"   $ nf myButLast'' xs
  , bench "match2"  $ nf butLast2    xs
  ]
main = defaultMain
    [ env setupEnv $ \ xs -> bgroup "main" $ let bs = benches xs in bs ++ reverse bs ]
As you see, I added the variant that explicitly matches on two elements at once, but otherwise it is the same code verbatim. I also run the benchmarks in reverse, so as to be aware of the bias due to caching. So, let us run and see!
% ghc --version
The Glorious Glasgow Haskell Compilation System, version 8.6.5
% ghc -O2 -package criterion A.hs && ./A
benchmarking main/slow?
time                 54.83 ms   (54.75 ms .. 54.90 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 54.86 ms   (54.82 ms .. 54.93 ms)
std dev              94.77 μs   (54.95 μs .. 146.6 μs)
benchmarking main/decent?
time                 794.3 ms   (32.56 ms .. 1.293 s)
                     0.907 R²   (0.689 R² .. 1.000 R²)
mean                 617.2 ms   (422.7 ms .. 744.8 ms)
std dev              201.3 ms   (105.5 ms .. 283.3 ms)
variance introduced by outliers: 73% (severely inflated)
benchmarking main/fast?
time                 84.60 ms   (84.37 ms .. 84.95 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 84.46 ms   (84.25 ms .. 84.77 ms)
std dev              435.1 μs   (239.0 μs .. 681.4 μs)
benchmarking main/match2
time                 54.87 ms   (54.81 ms .. 54.95 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 54.85 ms   (54.81 ms .. 54.92 ms)
std dev              104.9 μs   (57.03 μs .. 178.7 μs)
benchmarking main/match2
time                 50.60 ms   (47.17 ms .. 53.01 ms)
                     0.993 R²   (0.981 R² .. 0.999 R²)
mean                 60.74 ms   (56.57 ms .. 67.03 ms)
std dev              9.362 ms   (6.074 ms .. 10.95 ms)
variance introduced by outliers: 56% (severely inflated)
benchmarking main/fast?
time                 69.38 ms   (56.64 ms .. 78.73 ms)
                     0.948 R²   (0.835 R² .. 0.994 R²)
mean                 108.2 ms   (92.40 ms .. 129.5 ms)
std dev              30.75 ms   (19.08 ms .. 37.64 ms)
variance introduced by outliers: 76% (severely inflated)
benchmarking main/decent?
time                 770.8 ms   (345.9 ms .. 1.004 s)
                     0.967 R²   (0.894 R² .. 1.000 R²)
mean                 593.4 ms   (422.8 ms .. 691.4 ms)
std dev              167.0 ms   (50.32 ms .. 226.1 ms)
variance introduced by outliers: 72% (severely inflated)
benchmarking main/slow?
time                 54.87 ms   (54.77 ms .. 55.00 ms)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 54.95 ms   (54.88 ms .. 55.10 ms)
std dev              185.3 μs   (54.54 μs .. 251.8 μs)
Looks like our "slow" version is not slow at all! And the intricacies of pattern matching do not
add anything. (A slight speed up we see between two consecutive runs of match2 I ascribe to the
effects of caching.)
There is a way to get more "scientific" data: we can -ddump-simpl and take a look at the
way the compiler sees our code.
"Core" is an internal language of GHC. Every Haskell source file is simplified to Core before
being transformed into the final functional graph for the run time system to execute. If we look
at this intermediate stage, it will tell us that myButLast and butLast2 are equivalent. It
does take looking, since, at the renaming stage, all our nice identifiers are randomly mangled.
% for i in `seq 1 4`; do echo; cat A$i.hs; ghc -O2 -ddump-simpl A$i.hs > A$i.simpl; done
module A1 where
-- slow
myButLast :: [a] -> a
myButLast [x, y] = x
myButLast (x : xs) = myButLast xs
myButLast _ = error "List too short"
module A2 where
-- decent
myButLast' :: [a] -> a
myButLast' = (!! 1) . reverse
module A3 where
-- fast
myButLast'' :: [a] -> a
myButLast'' = last . init
module A4 where
butLast2 :: [a] -> a
butLast2 (x :     _ : [ ] ) = x
butLast2 (_ : xs@(_ : _ ) ) = butLast2 xs
butLast2 _ = error "List too short"
% ./EditDistance.hs *.simpl
(("A1.simpl","A2.simpl"),3866)
(("A1.simpl","A3.simpl"),3794)
(("A2.simpl","A3.simpl"),663)
(("A1.simpl","A4.simpl"),607)
(("A2.simpl","A4.simpl"),4188)
(("A3.simpl","A4.simpl"),4113)
It seems that A1 and A4 are the most similar. Thorough inspection will show that indeed the
code structures in A1 and A4 are identical. That A2 and A3 are alike is also reasonable
since both are defined as a composition of two functions.
If you are going to examine the core output extensively, it makes sense to also supply flags such as -dsuppress-module-prefixes and -dsuppress-uniques. They make it so much easier to read.
So, what can go wrong with benchmarking and optimization?
ghci, being designed for interactive play and quick iteration, compiles Haskell source to a certain flavour of byte code, rather than final executable, and eschews expensive optimizations in favour of quicker reload.This may look sad. But it is really not the thing that should concern a Haskell programmer, most of the time. Real story: I have a friend who just recently started learning Haskell. They had written a program for numerical integration, and it was turtle slow. So we sat down together and wrote a categorial description of the algorithm, with diagrams and stuff. When they re-wrote the code to align with the abstract description, it magically became, like, cheetah fast, and slim on memory too. We calculated π in no time. Moral of the story? Perfect abstract structure, and your code will optimize itself.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With