So I have my haskell program that looks something like this:
main = do
    secondData <- loadSecondBars "output.data"
    putStrLn $ "Generated Second Data " ++ (show $ length secondData)
    let tenMinBars = secondData `seq` generateBars (barSize featureSet) secondData
    putStrLn $ "Generated Ten Minute Bars " ++ (show $ length tenMinBars)
    let combinedData = seq tenMinBars sortBars (tenMinBars ++ secondData)
    putStrLn $ "Generated Combined" ++ (show $ length combinedData)
    let completedOrderManager = evalState (runBar combinedData) startState
    putStrLn "Ran Algo"
Doing this it will take about 8 seconds to load my second data, then about 3 seconds to do the rest of the functions.
However, if I remove the show length data, it will flash
"Generated Second Data"
"Generated Ten Minute Bars"
"Generated Combined"
"Ran Algo"
Then pause for a bit till it runs through all the actual functions.
It was my understanding by having seq in there I prevented the lazy evaluation. Am I using them wrong?
Yes.  There are two important points to consider: seq evaluates to WHNF (weak-head normal form), and secondData is a list.  WHNF means that the data will be evaluated to the outermost constructor, and the outermost constructor for secondData is : (unless it's empty, then the constructor is []).  So
secondData `seq` generateBars (barSize featureSet) secondData
will only do enough work to determine if secondData is the empty list, or if it has at least one element.
length evaluates the spine of a list, which basically means it determines how many elements are in the list by traversing the full structure.  This means that length will do more work than seq for lists with more than 1 element.
You can use deepseq (from deepseq) to fully evaluate a list, but you may not want to.  length and deepseq have to fully traverse the list.  If you don't need to know that information in advance, this is wasted effort as your consumer will have to traverse the list again also.  Depending on the consumer, this may also increase heap residency because deepseq will force all the data structures first, but they won't be GC'd until after the algorithm is complete.
seq only forces values to weak-head normal form; for lists, this means it only evaluates far enough to tell whether a list matches [] or _:_. At a guess, based purely on names (since you haven't given any types), this or something like it is what's happening to you: it's only evaluating far enough to realize that "yup, there's more than no data at all here".
The usual trick is to call seq on the length of the list or use deepseq.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With