Remove leading digit from a uint without parsing to string and back

Tags:

go

I have a uint like 3123456 and I want to get 123456 as a number (ideally a uint), i.e. with the first digit stripped. I know I could format it as a string, remove the leading character, and convert it back:

func (id *MyID) Scan(v any) (err error) {
    asBytes, ok := v.([]byte)
    ...
    myDBNumber := binary.LittleEndian.Uint64(asBytes) // the field is of type BINARY in the db
    myDBNumberStr := strconv.Itoa(int(myDBNumber))
    myDBNumberWithoutPrefix, err := strconv.Atoi(myDBNumberStr[1:])
    ...
}

Since I'm using this logic in an sql.Scanner to parse IDs, I'm wondering whether there's a way to avoid this round trip through a string, to improve performance (or at least to compare the performance).

asked Oct 30 '25 by NotX


1 Answer

You are looking for an alternative to

// stripLeadingAlternate strips the leading decimal digit via a string round
// trip: format the number, drop the first character, parse it back.
func stripLeadingAlternate(i uint) uint {
    if i < 10 {
        return 0
    }
    stripped, _ := strconv.ParseUint(strconv.FormatUint(uint64(i), 10)[1:], 10, strconv.IntSize)

    return uint(stripped)
}

May I suggest calculating the largest power of 10 that is not greater than the number and taking the division remainder. For example, 3123456 has seven digits, the largest power of 10 not greater than it is 1,000,000, and 3123456 % 1,000,000 = 123456:

// stripLeading strips the leading decimal digit arithmetically: it finds the
// largest power of 10 that is not greater than i and returns i modulo that power.
func stripLeading(i uint) uint {
    if i < 10 {
        return 0
    }
    v, m := uint64(i), uint64(10)
    // e walks the powers of 10 while m trails one step behind; the c < 20
    // bound stops at 20 digits, the maximum for a uint64.
    for e, c := uint64(100), 2; e <= v && c < 20; c++ {
        m = e
        e *= 10
    }

    return uint(v % m)
}

Test it on the Go Playground:

package main

import (
    "math"
    "strconv"
    "testing"
)

func Test_stripLeading(t *testing.T) {
    t.Parallel()
    tests := []uint{0, 1, 2, 9, 10, 11, 123, 999, 1000, 1001, math.MaxUint}
    for _, tt := range tests {
        t.Run(strconv.FormatUint(uint64(tt), 10), func(t *testing.T) {
            t.Parallel()
            want := stripLeadingAlternate(tt)
            if got := stripLeading(tt); got != want {
                t.Errorf("stripLeading(%d) = %v, want %v", tt, got, want)
            }
        })
    }
}

This should work with uint being either 64 or 32 bits.

I'm not sure using a table would really improve performance, but you can easily try a lookup powers[c] into var powers = []uint64{10, 100, 1_000, ...} instead of multiplying e by 10:

var powers = []uint64{
    10,
    100,
    1_000,
    10_000,
    100_000,
    1_000_000,
    10_000_000,
    100_000_000,
    1_000_000_000,
    10_000_000_000,
    100_000_000_000,
    1_000_000_000_000,
    10_000_000_000_000,
    100_000_000_000_000,
    1_000_000_000_000_000,
    10_000_000_000_000_000,
    100_000_000_000_000_000,
    1_000_000_000_000_000_000,
    10_000_000_000_000_000_000,
}

func stripLeading(i uint) uint {
    if i < 10 {
        return 0
    }
    v, c := uint64(i), 1
    for ; c < len(powers) && powers[c] <= v; c++ {
    }

    return uint(v % powers[c-1])
}

On my machine the table-based version is only marginally faster (33.37 ns/op vs 35.41 ns/op), so I would still prefer the slightly slower version with the multiplication.

var benchmarkValues = []uint{0, 1, 2, 9, 10, 11, 123, 999, 1000, 1001, math.MaxUint}

func Benchmark_stripLeading1(b *testing.B) {
    for range b.N {
        for _, v := range benchmarkValues {
            _ = stripLeading(v)
        }
    }
}
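
The table-based figure above presumably came from a second benchmark of the same shape; a minimal sketch, assuming the table-based variant is renamed to stripLeading2 (the same naming the table-driven benchmark in Edit 2 further below uses) so both implementations can sit in one file:

// Hypothetical companion benchmark for the table-based variant; assumes that
// variant has been renamed to stripLeading2 so it can coexist with the
// multiplication-based stripLeading in one file.
func Benchmark_stripLeading2(b *testing.B) {
    for range b.N {
        for _, v := range benchmarkValues {
            _ = stripLeading2(v)
        }
    }
}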

You might want to change the sample numbers to match your use case, but there is no way you would notice the difference when you are doing SQL calls ;)

The version with string parsing comes in at 185.2 ns/op (40 B/op, 5 allocs/op), so avoiding it is probably a win.
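
Plugged back into the question's sql.Scanner, the strconv round trip disappears entirely. A minimal sketch, assuming MyID wraps a uint and the BINARY column really holds an 8-byte little-endian uint64 as in the question (error handling abbreviated):

package main

import (
    "encoding/binary"
    "fmt"
)

// MyID is assumed to be backed by a uint, as the question suggests.
type MyID uint

// Scan reads the 8-byte little-endian value from the BINARY column and strips
// the leading digit arithmetically with stripLeading, avoiding the strconv
// round trip from the question.
func (id *MyID) Scan(v any) (err error) {
    asBytes, ok := v.([]byte)
    if !ok {
        return fmt.Errorf("MyID.Scan: expected []byte, got %T", v)
    }
    if len(asBytes) != 8 {
        return fmt.Errorf("MyID.Scan: expected 8 bytes, got %d", len(asBytes))
    }
    myDBNumber := binary.LittleEndian.Uint64(asBytes)
    *id = MyID(stripLeading(uint(myDBNumber)))
    return nil
}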


Edit

Alternate version using binary search:

func stripLeading(i uint) uint {
    if i < 10 {
        return 0
    }
    v := uint64(i)
    c := sort.Search(len(powers), func(c int) bool { return powers[c] > v })

    return uint(v % powers[c-1])
}

This version is faster assuming a near-uniform distribution of the inputs:

package main

import (
    "math/rand"
    "testing"
)

func Benchmark_stripLeading(b *testing.B) {
    benchmarkValues := make([]uint, b.N)
    r := rand.New(rand.NewSource(42))
    for i := range benchmarkValues {
        benchmarkValues[i] = uint(r.Uint64())
    }

    b.ResetTimer()

    for _, v := range benchmarkValues {
        _ = stripLeading(v)
    }
}

It is slower in the initial benchmark (89.17 ns/op), though. It also loses with benchmarkValues[i] = uint(r.Intn(1_000_000)).


Edit 2

Just for fun:

package main

import (
    "math/rand/v2"
    "testing"
)

func BenchmarkStripLeading(b *testing.B) {
    benchmarks := []struct {
        name         string
        stripLeading func(uint) uint
    }{
        {"stripLeadingMult", stripLeading1},
        {"stripLeadingTable", stripLeading2},
        {"stripLeadingBinary", stripLeading3},
        {"stripLeadingAlternate", stripLeadingAlternate},
    }
    for _, bm := range benchmarks {
        b.Run(bm.name, func(b *testing.B) {
            benchmarkValues := newBenchmarkValues(b.N)
            b.ResetTimer()

            for _, i := range benchmarkValues {
                _ = bm.stripLeading(i)
            }
        })
    }
}

func newBenchmarkValues(n int) []uint {
    r := rand.New(rand.NewPCG(42, 24))
    benchmarkValues := make([]uint, n)
    for i := range benchmarkValues {
        benchmarkValues[i] = r.UintN(1_000_000_000)
    }

    return benchmarkValues
}

This is a table-driven benchmark, with stripLeadingMult being the first, multiplication-based solution, stripLeadingTable the second, table-based one, and stripLeadingBinary the table-based one with binary search (the three implementations renamed to stripLeading1, stripLeading2 and stripLeading3 so they can coexist in one file). It uses evenly distributed 9-digit numbers (r.UintN(1_000_000_000)).

On my machine it results in:

BenchmarkStripLeading/stripLeadingMult-10        225536900         5.380 ns/op       0 B/op      0 allocs/op
BenchmarkStripLeading/stripLeadingTable-10       203127973         5.898 ns/op       0 B/op      0 allocs/op
BenchmarkStripLeading/stripLeadingBinary-10      128632011         9.514 ns/op       0 B/op      0 allocs/op
BenchmarkStripLeading/stripLeadingAlternate-10    33422265        35.03 ns/op       15 B/op      1 allocs/op

This could be improved further by sorting the table with the most probable choices in front, but as mentioned, I do not believe it's worth the effort; the first (multiplication-based) approach should be fast enough for the problem.
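
For example, if the IDs were known to be mostly 7-digit numbers like 3123456 (an assumption, nothing in the question says so), a hypothetical fast path could check that magnitude first and fall back to the general loop otherwise:

// Hypothetical fast path, assuming most inputs are 7-digit IDs such as
// 3123456. The most probable magnitude is checked first; everything else
// falls back to the general multiplication-based stripLeading.
func stripLeadingFast(i uint) uint {
    if v := uint64(i); v >= 1_000_000 && v < 10_000_000 {
        return uint(v % 1_000_000) // strip the leading digit of a 7-digit number
    }
    return stripLeading(i)
}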

While we can learn a lot here, I think the optimization effort is better spent elsewhere.

answered Nov 01 '25 by eik


