An application on my computer needs to read in a text file. I have several, and one doesn't work; the program fails to read it and tells me that there is a bad character in it. My first guess is that there's a non-ASCII character in there somewhere, but I have no idea how to find it. Perl or any generic regex would be nice. Any ideas?
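You can use [^\x20-\x7E] to match any character outside the printable ASCII range (space through tilde). Note that this also flags tabs and other control characters, not only non-ASCII bytes.
e.g. grep -nP '[^\x20-\x7E]' suspicious_file
(-n prints the line number of each match; -P requires GNU grep.)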
perl -wne 'printf "byte %02X in line $.\n", ord $& while s/[^\t\n\x20-\x7E]//;' suspicious_file
will report every byte that is not a printable ASCII character (space included), a tab, or a newline, along with the line it occurs on.
If it reports 0Ds (carriage-returns) in files that are O.K., then change \t\n to \t\n\r.
If it only reports 0Ds in files that are bad, then you can probably fix those files by running dos2unix on them.
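If Perl isn't at hand, roughly the same check can be sketched in Python. This is only an illustration, not part of the answer above: the function name find_bad_bytes and the sample bytes are made up for the example, and it assumes carriage returns are acceptable (the \t\n\r variant), reporting a column as well as a line.

```python
import re

# Any byte that is NOT tab, LF, CR, or printable ASCII (0x20-0x7E).
BAD_BYTE = re.compile(rb"[^\t\n\r\x20-\x7E]")

def find_bad_bytes(data: bytes):
    """Return (line, column, byte_value) for every suspicious byte in data."""
    hits = []
    # Split on LF so line numbers match what an editor would show.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for m in BAD_BYTE.finditer(line):
            # Columns are 1-based byte offsets within the line.
            hits.append((line_no, m.start() + 1, line[m.start()]))
    return hits

# Hypothetical sample: a Latin-1 "e with acute" (0xE9) on the second line.
sample = b"ok line\nbad \xe9 here\r\n"
for line_no, col, value in find_bad_bytes(sample):
    print(f"line {line_no}, column {col}: byte 0x{value:02X}")
```

To check a real file, read it in binary mode, e.g. open("suspicious_file", "rb").read(), and pass the result to find_bad_bytes. Reading as bytes avoids a decoding error on exactly the character you are trying to find.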
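If your file contains tabs as well, try this pattern:
[^\x08-\x7E]
Be aware that the range \x08-\x7E accepts every control character from backspace (0x08) up to 0x1F, not just tab; [^\t\x20-\x7E] is stricter if tab is the only control character you want to allow.
This also works in Notepad++ (with the search mode set to "Regular expression").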
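A quick way to see what a given character class lets through is to run it against a few known bytes. The sketch below is an illustration only: the helper name flagged and the sample bytes are hypothetical. It compares [^\x08-\x7E] with a narrower class that allows only tab plus printable ASCII.

```python
import re

wide = re.compile(rb"[^\x08-\x7E]")      # the pattern suggested above
strict = re.compile(rb"[^\t\x20-\x7E]")  # tab plus printable ASCII only

def flagged(pattern, data: bytes):
    """Return the bytes in data that the pattern matches, as hex strings."""
    return [f"0x{m.group()[0]:02X}" for m in pattern.finditer(data)]

# Hypothetical sample: a tab, an ESC control byte (0x1B), and a 0xE9 byte.
sample = b"a\tb\x1bc\xe9"

print(flagged(wide, sample))    # ESC falls inside \x08-\x7E, so it is not reported
print(flagged(strict, sample))  # ESC is reported here, the tab is not
```

Which of the two is appropriate depends on what the failing application actually rejects, which the thread does not say.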