I was working through K&R, which uses getchar() extensively for input in its early chapters.
But the problem is that I am unable to fully understand its behavior.
Below is a piece of code:
#include <stdio.h>
int main() {
    char c,i;
    char line[10000];
    i = 0;
    while((c=getchar()) != EOF && c!= '\n') {
        line[i++] = c;
    }
    printf("%s",line);
}
The code works as expected.
My problem is this: why does it terminate when I press Enter? How does it know that the newline is the termination condition while I am still typing input and the program is waiting at c=getchar()?
I know this is not default getchar() behavior, as it is with scanf(), because when I remove the newline condition the program doesn't terminate at the newline. Maybe my question goes beyond getchar() and is a more general one.
Suppose my input is Hello and I press Enter.
First, the variable c becomes 'H' and gets stored in line, then 'e', then 'l', then 'l', then 'o'; after that it encounters the newline and the loop terminates. That is well understood.
What I want to know is why it only started reading the characters after I pressed Enter. I expected to be able to type a newline and then write some more characters.
There are two parts to understanding that code, and there is also an error that chqrlie has made a good argument for fixing.
Part 0: why you should use int for reading with getchar
As many have commented, using char c is dangerous if you are going to read with getchar, because getchar() returns an int: either the character read, converted to unsigned char, or EOF, a negative value (generally #defined as -1) that signals end-of-file. Plain char may or may not be signed. If it is unsigned, storing EOF in it yields a positive value that never compares equal to EOF, so your program cannot recognize end-of-file; if it is signed, a valid input byte of 0xFF can be mistaken for EOF. So let us change the declaration to
int c,i;
Part 1: why is \n special
According to the man page, getchar() is equivalent to getc(stdin), which is equivalent to fgetc() except that it may be implemented as a macro which evaluates its stream (stdin, in this case) more than once.
Importantly, every time it is called, it consumes a character from its input. Every call to getchar returns the next character from the input, as long as there are characters to return. Once the input has ended, it returns EOF instead.
Now, stdin, the standard input, is generally line-buffered when it is connected to a terminal (typically the terminal driver holds the line back until Enter is pressed), which means that programs will not have access to the actual characters until the line is terminated with a \n. You can test this with this program:
#include <stdio.h>
int main() {
    int c,i;
    char line[10000];
    i = 0;
    while((c=getchar()) != EOF && c!= 'a') { // <-- replaced `\n` with `a`
        line[i++] = c;
    }
    printf("%s",line);
}
If you run it, it will still not do anything until Enter is pressed; but once it is, the input will end at the first a (not included). Note that the behavior of the printf call is undefined, since nothing guarantees that there is a \0 to terminate the string. To avoid this pitfall, see the rewritten program at the very end.
Part 2: why does the loop condition work as it does
You can rewrite the loop condition as follows. This makes it easier to see what is going on:
// loop condition reads the next char, then tests it against EOF and `\n`
while((c=getchar()) != EOF && c!= '\n') { line[i++] = c; }
// loop condition broken up for readability; fully equivalent to above code
while (1) { // `while (true)` would need <stdbool.h> before C23
    c = getchar();
    if (c == EOF || c == '\n') {
        break; // exit loop
    } else {
        line[i++] = c;
    }
}
Epilogue: improved code
#include <stdio.h>
#define BUFSIZE 10000
int main() {
    char line[BUFSIZE];              // avoid magic number
    int c, i = 0;                    // initialize at point of declaration
    while (i<BUFSIZE-1               // avoid buffer overflow
           && (c=getchar()) != EOF   // do not read past EOF
           && c!= '\n') {            // do not read past end-of-line
        line[i++] = c;
    }
    line[i] = '\0';                  // ensure that the string is null-terminated
    printf("%s",line);
    return 0;                        // explicitly return "no error"
}
The program is incorrect and can invoke undefined behavior.
For starters, the variable c should be declared as
int c;
Otherwise the condition
(c=getchar()) != EOF
can always be true, even when the user tries to end the input. The problem is that the macro EOF is a negative integer value of type int. The type char, on the other hand, can behave as unsigned char. In that case the variable c, promoted to type int, will always contain a non-negative value.
Secondly, on common platforms where char is 8 bits wide, the type char cannot hold a value equal to 10000, the size of the character array. So the variable i should be declared with at least the type short int.
The while loop should check that the current value of the index variable i is still less than the size of the character array. Otherwise this statement
line[i++] = c;
can write beyond the end of the character array.
And lastly, the resulting character array line does not contain a string, because the terminating zero character '\0' was not appended to the entered sequence of characters. As a result, this call
printf("%s",line);
invokes undefined behavior.
The program can look like this:
#include <stdio.h>
int main( void )
{
    enum { N = 10000 };
    char line[N];
    size_t i = 0;
    for ( int c; i + 1 < N && ( c = getchar() ) != EOF && c != '\n'; i++ )
    {
        line[i] = c;
    }
    line[i] = '\0';
    puts( line );
}
That is, the loop continues to fill the character array as long as there is enough space in the character array line
i + 1 < N
the user does not interrupt the input
( c = getchar() ) != EOF
and the user does not press the Enter key to finish entering the string
c != '\n'
After the loop, the terminating zero is appended
line[i] = '\0';
Now the array line contains a string, which is output by the statement
puts( line );
So, for example, if the user types this sequence of characters
Hello world!
and then presses the Enter key (which sends the newline character '\n' to the input buffer), the loop stops iterating. The newline character '\n' is not written to the string. After the loop, the terminating zero character '\0' is appended to the characters stored in the array line.
So the array will contain the following string
{ 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0' }
which is then output.