I am trying to write a scanner in Go that handles continuation lines and also cleans each line up before returning it, so that it returns logical lines. Given the following split function and test program:
func ScanLogicalLines(data []byte, atEOF bool) (int, []byte, error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    i := bytes.IndexByte(data, '\n')
    for i > 0 && data[i-1] == '\\' {
        fmt.Printf("i: %d, data[i] = %q\n", i, data[i])
        i = i + bytes.IndexByte(data[i+1:], '\n')
    }
    var match []byte = nil
    advance := 0
    switch {
    case i >= 0:
        advance, match = i + 1, data[0:i]
    case atEOF: 
        advance, match = len(data), data
    }
    token := bytes.Replace(match, []byte("\\\n"), []byte(""), -1)
    return advance, token, nil
}
func main() {
    simple := `
Just a test.
See what is returned. \
when you have empty lines.
Followed by a newline.
`
    scanner := bufio.NewScanner(strings.NewReader(simple))
    scanner.Split(ScanLogicalLines)
    for scanner.Scan() {
        fmt.Printf("line: %q\n", scanner.Text())
    }
}
I expected the code to return something like:
line: "Just a test."
line: ""
line: "See what is returned. when you have empty lines."
line: ""
line: "Followed by a newline."
However, it stops after returning the first line. The second call returns 1, "", nil.
Does anybody have any ideas, or is this a bug?
I would regard this as a bug, because an advance value > 0 is not meant to trigger a further read, even when the returned token is nil. From the bufio.SplitFunc documentation:
If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.
The input buffer of the bufio.Scanner defaults to 4096 bytes. That means it reads up to this amount at once if it can, and then executes the split function. In your case the scanner can read your input all at once, as it is well below 4096 bytes. This means that the next read it does results in EOF, which is the main problem here.
Step by step:
1. scanner.Scan reads all your data.
2. The split function returns nil as a token by removing the newline from the match.
3. scanner.Scan assumes: user needs more data.
4. scanner.Scan attempts to read more.
5. EOF happens.
6. scanner.Scan tries to tokenize one last time.
7. The split function returns "Just a test."
8. scanner.Scan tries to tokenize one last time.
9. The split function returns nil as a token by removing the newline from the match.
10. scanner.Scan sees the nil token and sets the error (EOF).
Any token that is non-nil will prevent this. As long as you return non-nil tokens, the scanner will not check for EOF and continues executing your tokenizer.
The reason your code returns nil tokens is that bytes.Replace returns a copy built with append([]byte(nil), s...) when there is nothing to replace, and for an empty s that copy is nil.
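To see the nil result in isolation, here is a minimal sketch. The helper name replaceContinuations is made up for this illustration; it simply wraps the same bytes.Replace call used in the question.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceContinuations (hypothetical helper) mirrors the bytes.Replace
// call from the question: it removes every backslash-newline pair.
func replaceContinuations(match []byte) []byte {
	return bytes.Replace(match, []byte("\\\n"), []byte(""), -1)
}

func main() {
	data := []byte("\nJust a test.\n")

	// The match for a blank line: empty, but not nil.
	empty := data[0:0]
	fmt.Println(empty == nil) // false

	// With nothing to replace, the copy of an empty slice comes back nil.
	fmt.Println(replaceContinuations(empty) == nil) // true

	// A non-empty match is copied into a non-nil slice.
	fmt.Println(replaceContinuations(data[1:13]) == nil) // false
}
```

So the empty match for a blank line is non-nil going in, and it is the copy made by bytes.Replace that turns it into nil.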
You could prevent this by returning a slice with a capacity and no elements as
this would be non-nil: make([]byte, 0, 1) != nil.