 

Count occurrences of each letter in a file?

How can I count the occurrences of the letters A-Z, ignoring case, in an optimized way, even when the file is 4 GB or larger? What different implementations are possible in C or C++?

One implementation is:

Pseudocode

A[26]={0}
loop through each character ch in file
If isalpha(ch)
     A[tolower(ch)-'a'] += 1
End If
end loop
sp497 asked Dec 03 '25 17:12



1 Answer

Not much optimization left, I think.

  • Instead of computing tolower()-'a' for each character, just count occurrences of each byte value (in a 256-element array of counters wide enough not to overflow on a multi-gigabyte file), and do the case-insensitive grouping afterwards. This may or may not be faster; measure it.

  • Be sure to use buffered input (fopen, perhaps with a larger buffer assigned via setvbuf).

E.g.:

acum[256]={0}
loop through each character 'c' in file
     acum[c]++
end loop
group counts corresponding to same lowercase/uppercase letters
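
For reference, the pseudocode above could be sketched in C roughly as follows. This is only a sketch under assumptions: the function names (count_bytes, fold_letters, count_stream) and the 64 KiB block size are made up for illustration, and it reads with fread in large blocks rather than relying on setvbuf, which is a different way of getting the same effect. Counters are unsigned long long so they cannot overflow on a 4 GB file.

```c
#include <stddef.h>
#include <stdio.h>

/* Tally every byte value in buf[0..n) into counts[256]. */
void count_bytes(const unsigned char *buf, size_t n,
                 unsigned long long counts[256])
{
    for (size_t i = 0; i < n; i++)
        counts[buf[i]]++;
}

/* Group the 256 byte counts into 26 case-insensitive letter counts.
   Assumes ASCII or an ASCII-derived single-byte encoding. */
void fold_letters(const unsigned long long counts[256],
                  unsigned long long letters[26])
{
    for (int i = 0; i < 26; i++)
        letters[i] = counts['a' + i] + counts['A' + i];
}

/* Read an open stream in large blocks and fill letters[26].
   Returns 0 on success, -1 on a read error. */
int count_stream(FILE *fp, unsigned long long letters[26])
{
    static unsigned char buf[1 << 16];   /* 64 KiB block, arbitrary choice */
    unsigned long long counts[256] = {0};
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        count_bytes(buf, n, counts);
    if (ferror(fp))
        return -1;

    fold_letters(counts, letters);
    return 0;
}
```

A caller would open the file with fopen(path, "rb"), pass the stream to count_stream, and then print letters[0] through letters[25] as the counts for a through z. Whether this beats the per-character tolower version depends on the compiler and the I/O path, so it is worth timing both.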

Also, bear in mind that this assumes ASCII or derived (one octet = one character) encoding.

leonbloy answered Dec 06 '25 07:12
