search document for non-ascii

Question

An application on my computer needs to read in a text file. I have several, and one doesn't work: the program fails to read it and reports that there is a bad character in it somewhere. My first guess is that there's a non-ASCII character in there, but I have no idea how to find it. Perl or any generic regex would be nice. Any ideas?

mathematical.coffee · Accepted Answer

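You can use [^\x20-\x7E] to match any character outside the printable ASCII range (space through ~). That catches every non-ASCII character, but it also flags control characters such as tab and carriage return.

For example:

grep -P '[^\x20-\x7E]' suspicious_file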
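A minimal sketch comparing that pattern with the stricter class [^\x00-\x7F], which matches only bytes 0x80 to 0xFF (true non-ASCII). It assumes GNU grep built with -P support; LC_ALL=C makes grep match raw bytes rather than decoded characters, and the file name sample.txt is only for illustration.

```shell
#!/bin/sh
# Hypothetical test file: line 1 is plain ASCII, line 2 contains a UTF-8 "é"
# (bytes C3 A9), line 3 contains a tab.
printf 'plain ascii\ncaf\303\251\nhas\ttab\n' > sample.txt

# Printable-ASCII pattern: reports lines 2 and 3,
# because a tab (0x09) is outside \x20-\x7E.
LC_ALL=C grep -nP '[^\x20-\x7E]' sample.txt

# Strict non-ASCII pattern: only bytes 0x80-0xFF, so only line 2 is reported.
LC_ALL=C grep -nP '[^\x00-\x7F]' sample.txt
```

The -n flag adds line numbers, which makes the offending spot easier to find in a large file.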

ruakh · Answer

perl -wne 'printf "byte %02X in line $.\n", ord $& while s/[^\t\n\x20-\x7E]//;'

will print the hex value and line number of every byte that is not an ASCII graphic character, tab, space, or newline.

If it reports 0Ds (carriage returns) in files that are O.K., then change \n to \r\n in the character class, so that carriage returns are allowed too.

If it only reports 0Ds in files that are bad, then you can probably fix those files by running dos2unix on them.

elwood · Answer

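If your source code also uses tab characters, try this pattern:

[^\x08-\x7E]

This also works in Notepad++.

Tab is \x09, so this range starts one byte early (at backspace) and also lets through the other control characters up to \x1F. If those should still be reported, [^\t\x20-\x7E] allows only the tab.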
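A quick comparison on a made-up file shows the effect of starting the range at \x08. This is a sketch assuming GNU grep with -P support; the file name code.txt is illustrative, and octal 027 (0x17) stands in for any stray control byte between \x0A and \x1F.

```shell
#!/bin/sh
# Hypothetical sample: line 1 is tab-indented code, line 2 has a UTF-8 "é",
# line 3 contains the control byte 0x17.
printf '\tint x;\ncaf\303\251\nctrl\027byte\n' > code.txt

# Printable-ASCII pattern: flags all three lines (tab and 0x17 are outside \x20-\x7E).
LC_ALL=C grep -nP '[^\x20-\x7E]' code.txt

# Tab-tolerant range: the tab line is no longer flagged, but neither is line 3,
# because 0x17 lies inside \x08-\x7E. Only line 2 is reported.
LC_ALL=C grep -nP '[^\x08-\x7E]' code.txt

# Allowing only the tab keeps other control bytes visible: lines 2 and 3.
LC_ALL=C grep -nP '[^\t\x20-\x7E]' code.txt
```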


Tags: regex, ascii, character, perl

Nate Glenn
