
Find all accented words (diacriticals) using grep?

Tags:

grep

I have a large list of words in a text file (one word per line). Some of the words contain accented characters (diacriticals). How can I use grep to display only the lines that contain accented characters?

asked Oct 28 '25 08:10 by R OMS


1 Answer

The best solution I have found, which covers the broader class of characters ("Which words are not pure ASCII?"), is to use PCRE via grep's -P option:

grep -P "[\x7f-\xff]" filename

This finds accented characters in UTF-8 and in single-byte encodings alike (ISO-8859-1/ISO-8859-15, i.e. Latin-1/Latin-9, as well as Windows-1252 and CP850).
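
As a rough illustration of the command above, here is a minimal sketch that assumes GNU grep built with PCRE support. The file name words.txt and its contents are made up for the example. Two details are my own assumptions rather than part of the answer: the range starts at \x80 so that the ASCII control character DEL (0x7F) is not counted, and LC_ALL=C is set so that grep compares raw bytes instead of decoded characters.

```shell
# Hypothetical sample file: two plain ASCII words and two accented words.
# The octal escapes \303\251 and \303\257 are the UTF-8 bytes for é and ï.
printf 'cafe\ncaf\303\251\nnaive\nna\303\257ve\n' > words.txt

# LC_ALL=C (an assumption, not in the original answer) makes grep work on
# raw bytes, so the class matches any byte from 0x80 to 0xFF, which covers
# every byte of a multi-byte UTF-8 character as well as Latin-1 letters.
matches=$(LC_ALL=C grep -P '[\x80-\xff]' words.txt)

# Show only the lines that contain a non-ASCII byte.
printf '%s\n' "$matches"
```

Note that this reports every line with a non-ASCII byte, not only accented letters, so typographic quotes or other symbols would be listed as well.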

answered Oct 31 '25 11:10 by LSerni


