Rewriting a simple program from C# to Go, I found the resulting executable 3 to 4 times slower. Expecialy the Go version use 3 to 4 times more CPU. It's surprising because the code does many I/O and is not supposed to consume significant amount of CPU.
I made a very simple version only doing sequential writes, and made benchmarks. I ran the same benchmarks on Windows 10 and Linux (Debian Jessie). The time can't be compared (not the same systems, disks, ...) but the result is interesting.
I'm using the same Go version on both platforms : 1.6
On Windows os.File.Write use cgo (see runtime.cgocall below), not on Linux. Why ?
Here is the disk.go program :
    package main
    import (
        "crypto/rand"
        "fmt"
        "os"
        "time"
    )
    const (
        // size of the test file
        fullSize = 268435456
        // size of read/write per call
        partSize = 128
        // path of temporary test file
        filePath = "./bigfile.tmp"
    )
    func main() {
        buffer := make([]byte, partSize)
        seqWrite := func() error {
            return sequentialWrite(filePath, fullSize, buffer)
        }
        err := fillBuffer(buffer)
        panicIfError(err)
        duration, err := durationOf(seqWrite)
        panicIfError(err)
        fmt.Printf("Duration : %v\n", duration)
    }
    // It's just a test ;)
    func panicIfError(err error) {
        if err != nil {
            panic(err)
        }
    }
    func durationOf(f func() error) (time.Duration, error) {
        startTime := time.Now()
        err := f()
        return time.Since(startTime), err
    }
    func fillBuffer(buffer []byte) error {
        _, err := rand.Read(buffer)
        return err
    }
    func sequentialWrite(filePath string, fullSize int, buffer []byte) error {
        desc, err := os.OpenFile(filePath, os.O_WRONLY|os.O_CREATE, 0666)
        if err != nil {
            return err
        }
        defer func() {
            desc.Close()
            err := os.Remove(filePath)
            panicIfError(err)
        }()
        var totalWrote int
        for totalWrote < fullSize {
            wrote, err := desc.Write(buffer)
            totalWrote += wrote
            if err != nil {
                return err
            }
        }
        return nil
    }
The benchmark test (disk_test.go) :
    package main
    import (
        "testing"
    )
    // go test -bench SequentialWrite -cpuprofile=cpu.out
    // Windows : go tool pprof -text -nodecount=10 ./disk.test.exe cpu.out
    // Linux : go tool pprof -text -nodecount=10 ./disk.test cpu.out
    func BenchmarkSequentialWrite(t *testing.B) {
        buffer := make([]byte, partSize)
        err := sequentialWrite(filePath, fullSize, buffer)
        panicIfError(err)
    }
The Windows result (with cgo) :
    11.68s of 11.95s total (97.74%)
    Dropped 18 nodes (cum <= 0.06s)
    Showing top 10 nodes out of 26 (cum >= 0.09s)
          flat  flat%   sum%        cum   cum%
        11.08s 92.72% 92.72%     11.20s 93.72%  runtime.cgocall
         0.11s  0.92% 93.64%      0.11s  0.92%  runtime.deferreturn
         0.09s  0.75% 94.39%     11.45s 95.82%  os.(*File).write
         0.08s  0.67% 95.06%      0.16s  1.34%  runtime.deferproc.func1
         0.07s  0.59% 95.65%      0.07s  0.59%  runtime.newdefer
         0.06s   0.5% 96.15%      0.28s  2.34%  runtime.systemstack
         0.06s   0.5% 96.65%     11.25s 94.14%  syscall.Write
         0.05s  0.42% 97.07%      0.07s  0.59%  runtime.deferproc
         0.04s  0.33% 97.41%     11.49s 96.15%  os.(*File).Write
         0.04s  0.33% 97.74%      0.09s  0.75%  syscall.(*LazyProc).Find
The Linux result (without cgo) :
    5.04s of 5.10s total (98.82%)
    Dropped 5 nodes (cum <= 0.03s)
    Showing top 10 nodes out of 19 (cum >= 0.06s)
          flat  flat%   sum%        cum   cum%
         4.62s 90.59% 90.59%      4.87s 95.49%  syscall.Syscall
         0.09s  1.76% 92.35%      0.09s  1.76%  runtime/internal/atomic.Cas
         0.08s  1.57% 93.92%      0.19s  3.73%  runtime.exitsyscall
         0.06s  1.18% 95.10%      4.98s 97.65%  os.(*File).write
         0.04s  0.78% 95.88%      5.10s   100%  _/home/sam/Provisoire/go-disk.sequentialWrite
         0.04s  0.78% 96.67%      5.05s 99.02%  os.(*File).Write
         0.04s  0.78% 97.45%      0.04s  0.78%  runtime.memclr
         0.03s  0.59% 98.04%      0.08s  1.57%  runtime.exitsyscallfast
         0.02s  0.39% 98.43%      0.03s  0.59%  os.epipecheck
         0.02s  0.39% 98.82%      0.06s  1.18%  runtime.casgstatus
Go does not perform file I/O, it delegates the task to the operating system. See the Go operating system dependent syscall packages.
Linux and Windows are different operating systems with different OS ABIs. For example, Linux uses syscalls via syscall.Syscall and Windows uses Windows dlls. On Windows, the dll call is a C call. It doesn't use cgo. It does go through the same dynamic C pointer check used by cgo, runtime.cgocall. There is no runtime.wincall alias.
In summary, different operating systems have different OS call mechanisms.
Command cgo
Passing pointers
Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.
In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.
Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily. When passing a pointer to a field in a struct, the Go memory in question is the memory occupied by the field, not the entire struct. When passing a pointer to an element in an array or slice, the Go memory in question is the entire array or the entire backing array of the slice.
C code may not keep a copy of a Go pointer after the call returns.
A Go function called by C code may not return a Go pointer. A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer in memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.
Go code may not store a Go pointer in C memory. C code may store Go pointers in C memory, subject to the rule above: it must stop storing the Go pointer when the C function returns.
These rules are checked dynamically at runtime. The checking is controlled by the cgocheck setting of the GODEBUG environment variable. The default setting is GODEBUG=cgocheck=1, which implements reasonably cheap dynamic checks. These checks may be disabled entirely using GODEBUG=cgocheck=0. Complete checking of pointer handling, at some cost in run time, is available via GODEBUG=cgocheck=2.
It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.
"These rules are checked dynamically at runtime."
Benchmarks:
To paraphrase, there are lies, damn lies, and benchmarks.
For valid comparisons across operating systems you need to run on identical hardware. For example, the difference between CPUs, memory, and rust or silicon disk I/O. I dual-boot Linux and Windows on the same machine.
Run benchmarks at least three times back-to-back. Operating systems try to be smart. For example, caching I/O. Languages using virtual machines need warm-up time. And so on.
Know what you are measuring. If you are doing sequential I/O, you spend almost all your time in the operating system. Have you turned off malware protection? And so on.
And so on.
Here are some results for disk.go from the same machine using dual-boot Windows and Linux.
Windows:
>go build disk.go
>/TimeMem disk
Duration : 18.3300322s
Elapsed time   : 18.38
Kernel time    : 13.71 (74.6%)
User time      : 4.62 (25.1%)
Linux:
$ go build disk.go
$ time ./disk
Duration : 18.54350723s
real    0m18.547s
user    0m2.336s
sys     0m16.236s
Effectively, they are the same, 18 seconds disk.go duration. Just some variation between operating systems as to what is counted user time and what is counted as kernel or system time. Elapsed or real time is the same.
In your tests, kernel or system time was 93.72%  runtime.cgocall versus 95.49%  syscall.Syscall.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With