
Why does a loop containing getchar() exit when '\n' is entered?

I was working with K&R, and it extensively uses getchar() for input in basics. But the problem is I am unable to fully understand its behavior.

Below is a piece of code:

#include <stdio.h>

int main() {
    char c,i;
    char line[10000];
    i = 0;

    while((c=getchar()) != EOF && c!= '\n') {
        line[i++] = c;
    }

    printf("%s",line);
}

The code works as expected.

My problem with this is: why does it terminate when I press Enter? How does it know that newline is the termination condition while I am still typing input and the program is sitting at c=getchar()?

I know that stopping at newline is not built into getchar() the way it is into scanf(), because when I remove the newline condition, the program doesn't terminate at newline. Maybe my question goes beyond getchar() and is a more general one.

Suppose my input is Hello and I press enter.

First, the c variable becomes 'H', it gets stored in line, then 'e', then 'l', then 'l', then 'o', after that it encounters the newline and loop terminates. It's well understood.

I want to know why it starts reading the characters only after I press Enter. I was expecting to be able to enter a newline and then type some more characters.

mr.loop asked Oct 19 '25 11:10


2 Answers

There are two parts to understanding that code, and there is also an error that chqrlie has argued convincingly should be fixed.

Part 0: why you should use int for reading with getchar

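As many have commented, declaring c as char is dangerous if you are going to read with getchar(). getchar() returns an int: either the character read, as an unsigned char converted to int, or EOF, a negative int (commonly #defined as -1) that signals end-of-file or a read error. Whether plain char is signed is implementation-defined. If it is unsigned, c can never compare equal to EOF, so the loop cannot detect end-of-file; if it is signed, a legitimate input byte such as 0xFF can be mistaken for EOF. In addition, i is used as an index into a 10000-element array, which is more than a char can represent. So let us change the declaration to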

int c,i; 
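
As a rough illustration of both failure modes, the sketch below stores values in explicitly signed and unsigned char variables and compares them against EOF after promotion to int. The helper names are made up for this example, and the second result assumes the common case where EOF is -1 and the machine uses two's complement.

```c
#include <stdio.h>

/* Hypothetical helper: mimic a platform where plain char is unsigned.
   EOF is squeezed into an unsigned char and then compared with EOF. */
int eof_survives_unsigned_char(void) {
    unsigned char c = (unsigned char)EOF;  /* wraps around to UCHAR_MAX */
    int promoted = c;                      /* promotion gives a non-negative int */
    return promoted == EOF;                /* EOF is negative, so this is 0 */
}

/* Hypothetical helper: mimic a platform where plain char is signed.
   The valid data byte 0xFF is stored and then compared with EOF. */
int byte_ff_looks_like_eof(void) {
    signed char c = (signed char)0xFF;     /* typically becomes -1 */
    int promoted = c;                      /* sign is preserved on promotion */
    return promoted == EOF;                /* 1 when EOF is -1 */
}
```

Calling eof_survives_unsigned_char() yields 0, so a loop testing c != EOF would never stop at end-of-file; byte_ff_looks_like_eof() yields 1 on typical systems, so such a loop could stop early on binary input.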

Part 1: why is \n special

According to the man page, getchar() is equivalent to getc(stdin), and getc() is equivalent to fgetc() except that it may be implemented as a macro that evaluates its stream argument (stdin, in this case) more than once.

Importantly, every time it is called, it consumes a character from its input. Every call to getchar returns the next character from the input, as long as there are characters to return. If none remain, it returns EOF instead.
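
To see this one-character-per-call behavior without typing anything, here is a small sketch that reads from a temporary stream with getc() instead of from stdin. The function name record_getc_results and its parameters are invented for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: write `text` to a temporary stream, rewind it, and
   record what getc() returns on each call, including the final EOF.
   Returns the number of results stored in `out`, or -1 if no temporary
   stream could be created. */
int record_getc_results(const char *text, int *out, int max) {
    FILE *fp = tmpfile();
    if (fp == NULL) {
        return -1;
    }
    fputs(text, fp);
    rewind(fp);

    int n = 0;
    while (n < max) {
        int c = getc(fp);   /* each call consumes exactly one character */
        out[n++] = c;
        if (c == EOF) {     /* nothing left: stop after recording EOF */
            break;
        }
    }
    fclose(fp);
    return n;
}
```

For the text "Hi\n" the recorded values are 'H', 'i', '\n' and finally EOF. Note that '\n' is returned like any other character; it only ends the loop in the question because the loop condition tests for it.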

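Now, when stdin is connected to a terminal, input is generally line-buffered: the terminal collects what you type (which is why you can still correct it with backspace) and hands it to the program only once the line is terminated with a \n. Until then your program is blocked inside the first getchar() call; after you press Enter, the loop consumes the buffered characters one at a time. You can test this with this program: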

#include <stdio.h>

int main() {
    int c,i;
    char line[10000];
    i = 0;

    while((c=getchar()) != EOF && c!= 'a') { // <-- replaced `\n` with `a`
        line[i++] = c;
    }

    printf("%s",line);
}

If you run it, it will still not do anything until Enter is pressed; but once it is, the loop stops at the first a (which is not stored). Note that the printf call afterwards has undefined behavior, since nothing guarantees that a \0 terminates the string in line. To avoid this pitfall, see the rewritten program at the very end.

Part 2: why does the loop condition work as it does

You can rewrite the loop condition as follows. This makes it easier to see what is going on:

// loop condition looks up next char, tests it against EOF and `\n`
while((c=getchar()) != EOF && c!= '\n') { line[i++] = c; }

// loop condition broken up for readability; fully equivalent to above code
while (1) {   // `true` would need <stdbool.h> before C23
   c = getchar();
   if (c == EOF || c == '\n') {
      break; // exit loop
   } else {
      line [i++] = c;
   }
}

Epilogue: improved code

#include <stdio.h>
#define BUFSIZE 10000

int main() {
    char line[BUFSIZE]; // avoid magic number
    int c, i = 0;       // initialize at point of declaration
    
    while (i<BUFSIZE-1              // avoid buffer overflow
         && (c=getchar()) != EOF    // do not read past EOF
         && c!= '\n') {             // do not read past end-of-line
        line[i++] = c;
    }

    line[i] = '\0';     // ensure that the string is null-terminated
    printf("%s",line);
    return 0;           // explicitly return "no error"
}
tucuxi answered Oct 22 '25 02:10


The program is incorrect and can invoke undefined behavior.

First, the variable c should be declared as

int c;

Otherwise the condition

(c=getchar()) != EOF

can always be true, even when the user signals the end of input. The problem is that the macro EOF expands to a negative value of type int. The type char, on the other hand, can behave like the type unsigned char, in which case the value of c promoted to int is always non-negative and never equals EOF.

Secondly, the type char cannot hold values up to 10000, the size of the character array, so the variable i should be declared with at least the type short int.
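
The range problem can be checked against the limits the implementation publishes in <limits.h>. This is only a sketch, and the helper name char_can_index is made up for the illustration.

```c
#include <limits.h>

/* Hypothetical helper: can a plain char represent every valid index
   (0 .. n-1) of an array with n elements? */
int char_can_index(long n) {
    return n - 1 <= CHAR_MAX;   /* CHAR_MAX is 127 or 255 with 8-bit chars */
}
```

With 8-bit chars, char_can_index(10000) is 0, while an array of 100 elements would still be fine. SHRT_MAX is guaranteed to be at least 32767, which is why short int is the smallest standard type that is certain to work here.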

The while loop shall check whether the current value of the index variable i is already greater than or equal to the size of the character array. Otherwise this statement

    line[i++] = c;

can write beyond the character array.

Finally, the resulting character array line does not contain a string, because the terminating zero character '\0' was not appended to the entered sequence of characters. As a result, this call

printf("%s",line);

invokes undefined behavior.

The program can look like this:

#include <stdio.h>

int main( void ) 
{
    enum { N = 10000 };
    char line[N];

    size_t i = 0;
 
    for ( int c; i + 1 < N && ( c = getchar() ) != EOF && c != '\n'; i++ ) 
    {
        line[i] = c;
    }

    line[i] = '\0';

    puts( line );
}

That is, the loop keeps filling the character array as long as there is enough space left in the array line

i + 1 < N 

the user does not interrupt the input

( c = getchar() ) != EOF

and the user does not press the Enter key to finish entering the string

c != '\n'

After the loop the terminating zero is appended

    line[i] = '\0';

Now the array line contains a string, which is printed by the statement

    puts( line );

So, for example, if the user types this sequence of characters

Hello world!

and then presses the Enter key (which sends the newline character '\n' to the input buffer), the loop stops iterating. The newline character '\n' is not written to the string. After the loop, the terminating zero character '\0' is appended to the characters stored in the array line.

So the array will contain the following string

{ 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0' }

which is then printed.
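
The same three loop conditions can also be packaged as a reusable function that reads from any stream, which makes the behavior easy to check without a keyboard. This is a minimal sketch; the name read_line and its signature are assumptions made for the example, not a standard library function.

```c
#include <stdio.h>

/* Hypothetical helper: read one line from `fp` into `buf`, which has room
   for `size` chars. The '\n' is consumed but not stored, and the result is
   always null-terminated. Returns the number of characters stored. */
size_t read_line(FILE *fp, char *buf, size_t size) {
    size_t i = 0;
    int c;                       /* int, so that EOF can be recognized */

    if (size == 0) {
        return 0;                /* no room even for the terminator */
    }
    while (i + 1 < size          /* leave space for '\0' */
           && (c = getc(fp)) != EOF
           && c != '\n') {
        buf[i++] = (char)c;
    }
    buf[i] = '\0';
    return i;
}
```

Called as read_line(stdin, line, sizeof line), it behaves like the loop in the program above. If the buffer fills up before a newline arrives, the rest of the line stays in the stream and is returned by the next call.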

Vlad from Moscow answered Oct 22 '25 02:10


