
Is it possible to read characters from `io::stdin()` without caching input line-by-line?

Tags: string, stdin, rust

This question refers to the stable Rust version 1.2.0

I just want to iterate over the characters in the standard input of my CLI application. It's perfectly possible to read a line with stdin's read_line method into a temporary String instance and then iterate over its chars() iterator.

But I don't like this approach, as it allocates a totally unnecessary String object. According to the documentation, Stdin implements the Read trait, which has a chars() iterator, but that method is marked as unstable (and thus can't be used with a stable compiler version).

Is there an alternative, possibly less obvious, way to read stdin char-by-char without any additional Rust-side buffering?

kirushik asked Aug 31 '25 18:08


2 Answers

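You can do this by reading into a single-byte array and continuing until read returns Ok(0), which signals end of input, or an Err. There is a problem with this, however: it breaks as soon as the input is not ASCII, because a non-ASCII character spans several bytes and each byte is printed as if it were a character of its own. If you are going to come up against this problem, it would be better to just allocate a String and use the chars iterator, which handles it.

Sample code:

use std::io::{stdin, Read};

fn main() {
    let stdin = stdin();
    // Lock once so every read goes through the same handle.
    let mut input = stdin.lock();
    // One-byte buffer: each successful read yields a single byte.
    let mut character = [0u8];
    loop {
        match input.read(&mut character) {
            // Ok(0) means end of input; any error also ends the loop.
            Ok(0) | Err(_) => break,
            // Treats the raw byte as a char, which is only right for ASCII.
            Ok(_) => println!("CHAR {:?}", character[0] as char),
        }
    }
}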

Sample output:

Hello World
CHAR 'H'
CHAR 'e'
CHAR 'l'
CHAR 'l'
CHAR 'o'
CHAR ' '
CHAR 'W'
CHAR 'o'
CHAR 'r'
CHAR 'l'
CHAR 'd'
CHAR '\n'
你好世界
CHAR '\u{e4}'
CHAR '\u{bd}'
CHAR '\u{a0}'
CHAR '\u{e5}'
CHAR '\u{a5}'
CHAR '\u{bd}'
CHAR '\u{e4}'
CHAR '\u{b8}'
CHAR '\u{96}'
CHAR '\u{e7}'
CHAR '\u{95}'
CHAR '\u{8c}'
CHAR '\n'
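
For comparison, here is a minimal sketch of the String-based alternative mentioned above. It is not part of the original answer: the helper name for_each_char_by_line is hypothetical, and the code assumes a current stable compiler (it uses the ? operator, which Rust 1.2.0 does not have). The String is allocated once and reused for every line, so the cost is one growing buffer rather than one allocation per line.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: calls `f` for every char read from `input`,
/// reusing one String as the line buffer.
fn for_each_char_by_line<R: BufRead, F: FnMut(char)>(mut input: R, mut f: F) -> io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear(); // keeps the allocation, drops the old contents
        // read_line returns Ok(0) at end of input and an error on invalid UTF-8.
        if input.read_line(&mut line)? == 0 {
            return Ok(());
        }
        for c in line.chars() {
            f(c);
        }
    }
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for_each_char_by_line(stdin.lock(), |c| println!("CHAR {:?}", c))
}
```

Because read_line decodes UTF-8 itself, multi-byte input such as 你好世界 arrives as four chars instead of twelve separate bytes.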
XAMPPRocky answered Sep 02 '25 10:09


XAMPPRocky's answer is correct for the case that you probably care about: ASCII characters. I want to address the question as you phrased it:

I just want to iterate over the characters in the standard input of my CLI application.

In Rust, a char is a 32-bit (4-byte) type that represents a Unicode code point (strictly speaking, a Unicode scalar value). However, the IO abstraction operates on the level of bytes. You need some kind of encoding that maps code points to sequences of bytes, and the current winner in that war is UTF-8.

UTF-8 uses between 1 and 4 bytes to represent a single code point, and those bytes follow a different bit pattern than the 32-bit value stored in a char. To properly read character-by-character, you will always need some kind of buffer to hold the bytes of a character until all of them have arrived.

Then there's the problem of a partial character at the end of your buffer, which has to be moved back to the beginning of the buffer before the next read; that is comparatively expensive. The best solution is to amortize the cost over many characters, which is why reading in larger chunks can be faster.
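
To make the difference between the two representations concrete, the following sketch prints the code point of a few characters next to their UTF-8 bytes. The helper utf8_bytes is a hypothetical name used only for illustration; char::encode_utf8 is the standard library call doing the work.

```rust
/// Hypothetical helper: returns the UTF-8 bytes of `c`.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // 4 bytes is the UTF-8 maximum
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // One character from each UTF-8 length class (1 to 4 bytes).
    for &c in ['H', 'é', '你', '🦀'].iter() {
        println!("U+{:04X} -> {:02x?}", c as u32, utf8_bytes(c));
    }
}
```

The character 你 (U+4F60) encodes to the bytes e4 bd a0, which are exactly the first three values in the sample output of the byte-by-byte answer.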
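
The chunked approach described above could be sketched roughly as follows. This is one possible implementation under stated assumptions, not a definitive one: the helper name for_each_char_chunked and the 4096-byte buffer size are hypothetical choices, the code targets a current stable compiler, and it gives up at the first invalid byte sequence instead of trying to recover.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: reads `input` in chunks and calls `f` per char.
fn for_each_char_chunked<R: Read, F: FnMut(char)>(mut input: R, mut f: F) -> io::Result<()> {
    let mut buf = [0u8; 4096]; // assumed chunk size
    let mut pending = 0; // bytes of an incomplete char kept at the front
    loop {
        let n = match input.read(&mut buf[pending..]) {
            Ok(n) => n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            // End of input: leftover bytes mean a truncated character.
            return if pending == 0 {
                Ok(())
            } else {
                Err(io::Error::new(io::ErrorKind::InvalidData, "truncated UTF-8"))
            };
        }
        let filled = pending + n;
        let valid = match str::from_utf8(&buf[..filled]) {
            Ok(s) => s.len(),
            // error_len() == None: the chunk ends in the middle of a char.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            Err(_) => return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8")),
        };
        // The prefix was validated above, so this second check cannot fail.
        for c in str::from_utf8(&buf[..valid]).unwrap().chars() {
            f(c);
        }
        // Move the partial char (at most 3 bytes) back to the front.
        buf.copy_within(valid..filled, 0);
        pending = filled - valid;
    }
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    for_each_char_chunked(stdin.lock(), |c| println!("CHAR {:?}", c))
}
```

The copy at the end of each iteration moves at most three bytes per chunk, so its cost is spread over up to 4096 bytes of input; with a much smaller buffer the same copy would happen far more often per character.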

Shepmaster answered Sep 02 '25 11:09