Having emacs search for special ligatures

I just noticed a whole bunch of typos in a compiled LaTeX document typed in emacs. They came from text I pasted in from elsewhere without noticing that it brought along a lot of ligatures, like ﬁ instead of fi. I've done a search and replace to fix this particular instance, but it would be nice to be confident there weren't more of these. Is there anything more wholesale I could do in emacs to find all such cases?

Cam McLeman asked Nov 01 '25 10:11


2 Answers

If the entire document is expected to be in ASCII, then you could use a regexp search for anything outside that range:

C-M-s [^ C-j SPC -~]

That is, search for any character that is neither a newline (character code 10) nor in the range from space (32) to tilde (126). Any ligature lies outside this range.
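For checking a file outside Emacs, the same character class can be used from Python. This is a minimal sketch, not part of the original answer; the helper name `find_suspicious` is hypothetical. It reports the 1-based line and column of every character the regexp would stop on. Note that tabs (code 9) and carriage returns (code 13) are also outside the range, so they are reported too.

```python
import re

# Same character class as the Emacs regexp [^ C-j SPC -~]:
# anything that is not a newline and not printable ASCII (space..tilde).
NON_ASCII = re.compile(r"[^\n -~]")

def find_suspicious(text):
    """Return (line, column, char) for each character outside the class.

    Hypothetical helper; line and column numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in NON_ASCII.finditer(line):
            hits.append((lineno, m.start() + 1, m.group()))
    return hits

# "\ufb01" is U+FB01 LATIN SMALL LIGATURE FI, as produced by a bad paste.
sample = "We de\ufb01ne a \ufb01eld.\nAll ASCII here.\n"
for lineno, col, ch in find_suspicious(sample):
    # Print the code point rather than the character to keep output ASCII.
    print(f"line {lineno}, col {col}: U+{ord(ch):04X}")
```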

legoscia answered Nov 04 '25 10:11


I'm not really sure what you are asking, but you can easily Isearch for (or query-replace or replace-string) any Unicode chars that are ligatures, that is, that have LIGATURE as part of their Unicode character name. However, you must search for each of them separately (well, not really, but it is easiest to do that).

To search for a given ligature char, use C-x 8 RET during Isearch, then type part of the character name and complete it.
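The name-to-character mapping that C-x 8 RET relies on is the Unicode character name database, which Python also exposes. As a rough illustration (Python's `unicodedata` module, not Emacs), you can go from a full name to the character and back, which is handy for identifying a stray character that a search turned up:

```python
import unicodedata

# Name -> character, roughly what completing a name after C-x 8 RET inserts.
ffi = unicodedata.lookup("LATIN SMALL LIGATURE FFI")

# Character -> name, useful for identifying a character found by a search.
name = unicodedata.name("\ufb01")

# Print code points so the output stays ASCII.
print(f"U+{ord(ffi):04X}", name)
```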

For this it really helps to use Icicles, or at least some other completion enhancement that lets you complete a substring or other regexp.

With Icicles you also have progressive completion, which means that you can provide multiple substrings (more generally, regexps) to match.

For example, to search for the ligature whose Unicode character name is LATIN SMALL LIGATURE FFI, you can do the following:

C-s C-x 8 RET

That prompts you for the name of a Unicode char. Type ligature S-SPC to match all chars whose names contain ligature (matching is case-insensitive). Then type latin S-SPC to narrow to just the Latin ligatures. Then type small S-SPC to narrow these to only the lowercase ligatures. Then type ffi to get just the one you want.

C-s C-x 8 RET ligature S-SPC latin S-SPC small S-SPC ffi RET

The order in which you provide the multiple patterns is irrelevant. And of course you do not need to use multiple patterns. You could just as easily do it with a single regexp:

C-s C-x 8 RET latin.*small.*ligature.*ffi RET
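To make the difference between the two ways of narrowing concrete, here is a small Python sketch that filters Unicode character names the same way. It is only an analogy to Icicles completion, not its implementation; the function names are hypothetical. `progressive` keeps names matching every pattern in any order, while `single_regexp` needs one pattern that fixes the order. Both use case-insensitive matching, as the completion does.

```python
import re
import sys
import unicodedata

def all_names():
    """Yield (char, name) for every code point that has a Unicode name."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name:
            yield chr(cp), name

def progressive(*patterns):
    """Names matching every pattern, in any order (like repeated S-SPC)."""
    regexps = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [(c, n) for c, n in all_names() if all(r.search(n) for r in regexps)]

def single_regexp(pattern):
    """Names matching one regexp, whose order of parts is fixed."""
    regexp = re.compile(pattern, re.IGNORECASE)
    return [(c, n) for c, n in all_names() if regexp.search(n)]

# Print only the names, which keeps the output ASCII.
print([n for _, n in progressive("ligature", "latin", "small", "ffi")])
print([n for _, n in single_regexp("latin.*small.*ligature.*ffi")])
```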

If you use C-s C-x 8 RET ligature S-TAB (or S-SPC instead of S-TAB), you see all of the ligature characters (there are 517 of them). If you use C-s C-x 8 RET small.*ligature S-TAB, you see all lowercase ligatures (there are 22 of them, including Arabic, Armenian, Cyrillic, Hebrew, and Latin).
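A rough Python equivalent of listing the small.*ligature matches, grouped by the first word of each name (the script), is sketched below. It is an analogy using Python's Unicode database rather than Emacs's, so the counts may differ from the 517 and 22 above depending on the Unicode version each one ships with. Code points are printed instead of the glyphs to keep the output ASCII.

```python
import re
import sys
import unicodedata
from collections import Counter

# Same pattern as typing small.*ligature at the C-x 8 RET prompt.
PATTERN = re.compile(r"small.*ligature", re.IGNORECASE)

def matching_chars(pattern):
    """Return (char, name) for every named code point whose name matches."""
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name and pattern.search(name):
            hits.append((chr(cp), name))
    return hits

small_ligatures = matching_chars(PATTERN)
for ch, name in small_ligatures:
    print(f"U+{ord(ch):04X} {name}")

# Group by the first word of the name, e.g. LATIN, ARMENIAN, ARABIC.
print(len(small_ligatures), Counter(n.split()[0] for _, n in small_ligatures))
```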

Oh, and with Icicles, buffer *Completions* shows not only the character names but also the characters themselves (WYSIWYG), next to their names.

(For query-replace etc. the procedure is the same as for Isearch.)
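If the goal is to replace every ligature at once rather than one query-replace per character, one option outside Emacs is Unicode compatibility normalization. The sketch below swaps in a different technique from the answer above: it applies Python's NFKC normalization only to characters whose name contains LIGATURE, so that other text is left alone. The helper name is hypothetical. Ligatures with a compatibility decomposition (such as ﬁ and ﬃ) become plain letters; ones without one (such as œ, LATIN SMALL LIGATURE OE) come back unchanged.

```python
import unicodedata

def expand_ligatures(text):
    """Replace chars named ...LIGATURE... with their NFKC form.

    Hypothetical helper; characters without a compatibility
    decomposition are returned unchanged by NFKC.
    """
    out = []
    for ch in text:
        if "LIGATURE" in unicodedata.name(ch, ""):
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)

# U+FB03 is the ffi ligature and U+FB01 the fi ligature.
print(expand_ligatures("e\ufb03cient \ufb01eld"))
```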

Drew answered Nov 04 '25 09:11
