 

Why do sequential loops run faster than concurrent methods in Go?

Tags:

concurrency

go

Scenario: I want to read a large text file quickly (e.g. lorem.txt with 4.5 million lines in the example below). I have tried three different approaches in the code below.

  1. Non-concurrent sequential processing
  2. Channels and goroutines
  3. Waitgroup and goroutines

Pseudo-Benchmarks: Here is a typical output I get when I run the code below; these are quick-and-dirty time deltas, not full-fledged profiling / benchmarking / testing.

sequential:     43.541828091 secs 4714074 lines
queued channel: 80.986544385 secs 4714074 lines
wait group:     260.200473751 secs 4712266 lines

Question: Why is the sequential loop faster than the two other methods? Am I missing something?

Update: I ran the example code on a much bigger text file (see Scenario). I also "reset" the logger for each example, just in case not doing so caused some compounding memory issue between the example functions. Also, as others have noted, my computer is dual-core, which may be one of many issues with my code. Thanks for all the feedback / answers.

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "log"
    "os"
    "sync"
    "time"
)

var (
    textFile = "lorem.txt"
    buf      bytes.Buffer
    l        = log.New(&buf, "logger: ", log.Lshortfile)
    wg       sync.WaitGroup
    delta1   float64
    delta2   float64
    delta3   float64
    cnt1     = 0
    cnt2     = 0
    cnt3     = 0
)

func main() {

    // Wait Group Example
    exampleWaitGroup()

    // Queued Channel Example
    exampleQueuedChannel()

    // Sequential Loop Example
    exampleSequentialLoop()

    benchmarks := fmt.Sprintf("sequential:\t%v secs %v lines\nqueued channel:\t%v secs %v lines\nwait group:\t%v secs %v lines\n",
        delta1, cnt1,
        delta2, cnt2,
        delta3, cnt3,
    )

    fmt.Println(benchmarks)

}

func exampleSequentialLoop() {

    buf.Reset()
    l = log.New(&buf, "logger: ", log.Lshortfile)

    start := time.Now()

    file1, err := os.Open(textFile)
    if err != nil {
        log.Fatal(err)
    }

    defer file1.Close()

    scanner := bufio.NewScanner(file1)

    for scanner.Scan() {
        cnt1++
        l.Println(scanner.Text())
    }

    end := time.Now()
    delta1 = end.Sub(start).Seconds()

}

func exampleQueuedChannel() {

    buf.Reset()
    l = log.New(&buf, "logger: ", log.Lshortfile)

    start := time.Now()
    queue := make(chan string)
    done := make(chan bool)

    go processQueue(queue, done)

    file2, err := os.Open(textFile)
    if err != nil {
        log.Fatal(err)
    }

    defer file2.Close()

    scanner := bufio.NewScanner(file2)

    for scanner.Scan() {
        queue <- scanner.Text()
    }

    close(queue) // let processQueue's range loop terminate
    <-done       // wait for the consumer to drain before stopping the clock

    end := time.Now()
    delta2 = end.Sub(start).Seconds()
}

func exampleWaitGroup() {

    buf.Reset()
    l = log.New(&buf, "logger: ", log.Lshortfile)

    start := time.Now()

    file3, err := os.Open(textFile)
    if err != nil {
        log.Fatal(err)
    }

    defer file3.Close()

    scanner := bufio.NewScanner(file3)

    for scanner.Scan() {
        wg.Add(1)
        go func(line string) {
            defer wg.Done()
            l.Println(line)
            cnt3++ // note: unsynchronized increment from many goroutines is a data race
        }(scanner.Text())
    }

    wg.Wait()

    end := time.Now()
    delta3 = end.Sub(start).Seconds()

}

func processQueue(queue chan string, done chan bool) {
    for line := range queue {
        l.Println(line)
        cnt2++
    }
    done <- true
}
asked Dec 19 '25 by openwonk

1 Answer

In exampleQueuedChannel you're not doing anything in parallel. Yes, you've launched another goroutine, but there is no parallel processing. The reason is that queue is an unbuffered chan. When you write to an unbuffered chan, the writer blocks until someone reads off it. So essentially you block while writing, then the scheduler has to put the writing goroutine to sleep and wake up the reading goroutine. Then that one goes to sleep and the writer wakes back up. So you're thrashing between the two goroutines, and the scheduler is getting a heavy workout.
If you want to get better performance here, use a buffered chan. And if you want even more performance, batch multiple items into each chan message (for a long technical explanation of the performance impact of channels, read this).

In exampleWaitGroup, you're launching a new goroutine for every line. While launching new goroutines isn't expensive, it isn't free either, and it means more work for the scheduler. defer is also not free. Also, your logger uses a mutex, so if two of your goroutines try to log at the same time, one is going to be put to sleep: again, more scheduler work.

You can investigate these matters yourself by running your code under a profiler and seeing where the bottlenecks lie.

answered Dec 22 '25 by phemmer