Converting words from camelCase to snake_case in C

Question

What I am trying to code is this: if I input camelcase, it should just print camelcase, but if the input contains any uppercase letter, for example camelCase, it should print camel_case.

Below is what I have so far, but the problem is that if I input camelCase, it prints camel_ase.

Can someone please tell me the reason and how to fix it?

#include <stdio.h>
#include <ctype.h>

int main() {
    char ch;
    char input[100];
    int i = 0;

    while ((ch = getchar()) != EOF) {
        input[i] = ch;
        if (isupper(input[i])) {
            input[i] = '_';
            //input[i+1] = tolower(ch);
        } else {
            input[i] = ch;
        }
        printf("%c", input[i]);

        i++;
    }
}

Admin · Accepted Answer

First, look at your code and think about what happens when someone enters more than 100 characters: the writes to input run past the end of the array, which is undefined behavior. If you use a buffer for input, you always have to add checks so you don't overflow it.
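Such a check could look roughly like the sketch below. The helper name read_bounded and its parameters are made up for illustration; it stops storing characters once only the slot for the terminating '\0' is left.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the original code): reads characters
   from `in` until EOF or until `buf` is full, always keeping one slot
   free for the terminating '\0'. */
static size_t read_bounded(FILE *in, char *buf, size_t size)
{
    size_t i = 0;
    int ch;                      /* int, so EOF stays distinguishable */

    if (size == 0)
        return 0;                /* no room at all, not even for '\0' */

    /* The bound is tested first, so no character is consumed
       once the buffer is full. */
    while (i < size - 1 && (ch = getc(in)) != EOF)
        buf[i++] = (char)ch;

    buf[i] = '\0';
    return i;                    /* number of characters stored */
}
```

With char input[100], a call such as read_bounded(stdin, input, sizeof input) stores at most 99 characters, however long the input is.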

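But since you print each character right away, why do you need a buffer at all? It's completely unnecessary with the approach you show. As for the output you see: when a character is uppercase, you overwrite it with '_' and never print its lowercase form, so the letter is lost and you get camel_ase. Try this: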

#include <stdio.h>
#include <ctype.h>

int main()
{
    int ch;
    int firstChar = 1; // needed to also accept PascalCase
    while((ch = getchar())!= EOF)
    {
        if(isupper(ch))
        {
            if (!firstChar) putchar('_');
            putchar(tolower(ch));

        } else
        {
            putchar(ch);
        }
        firstChar = 0;
    }
}

Side note: I changed the type of ch to int. getchar() returns an int, and putchar(), isupper() and tolower() all take an int. The <ctype.h> functions expect a value that is either representable as an unsigned char or equal to EOF. As char is allowed to be signed, on a platform with signed char you would get undefined behavior calling isupper() or tolower() with a negative char. Storing the result of getchar() in a char also makes the comparison with EOF unreliable. I know, this is a bit complicated. Another way around the <ctype.h> issue is to always cast your char to unsigned char when calling a function that takes the value of an unsigned char as an int.
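To illustrate the cast, here is a minimal sketch for data that already sits in a plain char array. The function name lower_in_place is hypothetical and only serves the example.

```c
#include <ctype.h>

/* Hypothetical helper: lowercases a string stored in plain char.
   Each char is cast to unsigned char before it reaches tolower(),
   so a negative char value is never passed to a <ctype.h> function. */
static void lower_in_place(char *s)
{
    for (; *s; ++s)
        *s = (char)tolower((unsigned char)*s);
}
```

The same pattern applies to isupper(): write isupper((unsigned char)c) whenever c has type char.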

Since you use a buffer that is useless right now, you might be interested in a solution that makes good use of one: read and write a whole line at a time. This is slightly more efficient than calling a function for every single character. Here's an example doing that:

#include <stdio.h>

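static size_t toSnakeCase(char *out, size_t outSize, const char *in)
{
    const char *inp = in;
    size_t n = 0;

    if (outSize == 0) return 0; // no room, not even for the terminator
    while (n + 1 < outSize && *inp) // written so outSize cannot underflow
    {
        if (*inp >= 'A' && *inp <= 'Z')
        {
            // need room for '_', the letter and the terminator
            if (n + 3 > outSize)
            {
                out[n++] = 0;
                return n;
            }
            out[n++] = '_';
            out[n++] = *inp + ('a' - 'A');
        }
        else
        {
            out[n++] = *inp;
        }
        ++inp;
    }
    out[n++] = 0;
    return n; // bytes written, including the terminator
}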

int main(void)
{
    char inbuf[512];
    char outbuf[1024]; // twice the length of the input is an upper bound

    while (fgets(inbuf, 512, stdin))
    {
        toSnakeCase(outbuf, 1024, inbuf);
        fputs(outbuf, stdout);
    }
    return 0;
}

This version also avoids isupper() and tolower(), but sacrifices portability. It only works if the character encoding has the letters A-Z in sequence and keeps every uppercase letter at the same distance from its lowercase counterpart. For ASCII, these assumptions hold. Be aware that what is considered an (uppercase) letter could also depend on the locale. The program above only works for the letters A-Z as in the English language. Also note that, unlike the first program, it writes an underscore before an uppercase letter at the start of a line, so PascalCase becomes _pascal_case.
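If portability matters more than avoiding <ctype.h>, the conversion could be written along the lines of the sketch below. This is a variant, not the function shown above: the name to_snake_case_portable is made up, it skips the underscore for a leading uppercase letter, and it returns the string length without the terminator.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical variant of the buffer-based conversion. It uses
   isupper()/tolower() instead of ASCII arithmetic and does not put
   an underscore before the very first character.
   Returns the length of the result, NOT counting the terminator. */
static size_t to_snake_case_portable(char *out, size_t out_size, const char *in)
{
    size_t n = 0;

    if (out_size == 0)
        return 0;                          /* nowhere to put even '\0' */

    for (const char *p = in; *p; ++p) {
        unsigned char c = (unsigned char)*p;   /* safe for <ctype.h> */

        if (isupper(c)) {
            size_t need = (p != in) ? 2 : 1;   /* '_' plus letter, or letter only */
            if (n + need + 1 > out_size)       /* +1 for the terminator */
                break;
            if (p != in)
                out[n++] = '_';
            out[n++] = (char)tolower(c);
        } else {
            if (n + 2 > out_size)              /* character plus terminator */
                break;
            out[n++] = *p;
        }
    }
    out[n] = '\0';
    return n;
}
```

It can be called in the same place as the original function, for example to_snake_case_portable(outbuf, sizeof outbuf, inbuf). Which characters count as uppercase still depends on the current locale.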

Tags:

c

computer-science

seung

1 Answers
