
Regexp for NVDA to put spaces between all capital letters?

Tags: python, regex

I use NVDA, a free screen reader for the blind that many people use, together with a speech synthesizer. I am building a library of modified versions of the add-ons it accepts, and of dictionaries that can contain regular expressions accepted by Python as well as standard word replacements. My problem is that I do not know how to design a regular expression that will place a space between capital letters, as in ANM, which the synth says as one word rather than spelling it out as it should. I do not know enough Python to code an add-on for this by hand; I only use regexps for this kind of thing. I do know the basics of regular expressions, the general implementation you can find by googling "regular expressions in about 55 minutes". I want it to do something like this:

Input: ANM
Output: A N M

Also, because of the way this speech synth works, I may have to replace A with eh, which would give this:

Input: ANM
Output: Eh N M

Could any of you provide a regular expression that does this, if it is possible? And no, I don't think I can compile them in loops, because I didn't write the Python.

Colton Hill asked Dec 22 '25 13:12



2 Answers

This should do the trick for the capital letters. It uses a lookahead, (?=...), to check for the next capital letter without consuming it:

>>> import re
>>> re.sub("([A-Z])(?=[A-Z])", r"\1 ", "ABC thIs iS XYZ a Test")
'A B C thIs iS X Y Z a Test'
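Since the question is about a dictionary entry rather than a script, here is a minimal sketch of the same substitution split into the two strings such an entry would need. It assumes the dictionary applies a regular-expression entry roughly the way re.sub does and that group references like \1 are honoured in the replacement; the names PATTERN, REPLACEMENT and apply_entry are made up for this illustration.

```python
import re

# Hypothetical names: the two strings one would type into a dictionary entry.
PATTERN = r"([A-Z])(?=[A-Z])"   # a capital that is followed by another capital
REPLACEMENT = r"\1 "            # the same capital, plus a space

def apply_entry(text):
    # Stand-in for what the dictionary is assumed to do with the entry.
    return re.sub(PATTERN, REPLACEMENT, text)

print(apply_entry("ANM"))  # A N M
```

If the entry behaves like this, no loop or extra Python is needed on your side; the lookahead does all the work in a single pattern.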

If you have a lot of replacements to make, it might be easiest to put them into a single list:

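import re

# Spoken forms for individual capital letters.
replacements = [("A", "eh"), ("B", "bee"), ("X", "ex")]
# First put a space after every capital that is followed by another capital.
result = re.sub(r"([A-Z])(?=[A-Z])", r"\1 ", "ABC thIs iS XYZX. A Xylophone")
for source, dest in replacements:
    # Replace the letter only when a non-word character follows it.
    result = re.sub("(" + source + r")(?=\W)", dest, result)
print(result)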

Output:

eh bee C thIs iS ex Y Z ex. eh Xylophone

The regex built in the replacements loop only matches a capital that is followed by a non-word character, so capitalised words such as Xylophone are left alone while standalone capitals at the end of a sentence are still replaced. If you want to avoid replacing e.g. the standalone 'A' with 'eh', then the callback-based replacement mentioned in @fjarri's answer is the way to go.
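One caveat with (?=\W): it needs an actual character after the letter, so a capital at the very end of the text is not replaced. A possible variation, sketched here, is the negative lookahead (?!\w), which also succeeds at the end of the string. The helper name respell is invented for this comparison.

```python
import re

def respell(text, replacements, lookahead):
    # Space out adjacent capitals, then swap in the spoken forms.
    spaced = re.sub(r"([A-Z])(?=[A-Z])", r"\1 ", text)
    for source, dest in replacements:
        spaced = re.sub(re.escape(source) + lookahead, dest, spaced)
    return spaced

replacements = [("A", "eh"), ("B", "bee"), ("X", "ex")]

# Positive lookahead: the final X has nothing after it, so it stays.
print(respell("ABX", replacements, r"(?=\W)"))  # eh bee X
# Negative lookahead: "not followed by a word character" also holds at the end.
print(respell("ABX", replacements, r"(?!\w)"))  # eh bee ex
```

On the example sentence above, both lookaheads give the same result, because every replaced capital there is followed by a space or a full stop.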

Galax answered Dec 24 '25 02:12



While @Galax's solution certainly works, it may be easier to perform further processing of abbreviations if you use callbacks on matches (this way you won't replace any standalone capitals):

import re

s = "This is a normal sentence featuring an abbreviation ANM. One, two, three."

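def process_abbreviation(match_object):
    # Put a space between the letters of the matched run of capitals.
    spaced = ' '.join(match_object.group(1))
    # Respell 'A' so the synth says it as a letter name.
    return spaced.replace('A', 'Eh')

# Only runs of two or more capitals are treated as abbreviations.
print(re.sub(r"([A-Z]{2,})", process_abbreviation, s))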
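If more letters than 'A' need respelling, the callback can look each letter up in a table instead of chaining replace calls. This is only a sketch: the respellings in LETTER_NAMES are hypothetical and would need tuning to the synth, and the \b word boundaries are an added assumption so that capitals glued to lowercase text are skipped.

```python
import re

# Hypothetical respellings; letters not listed here are kept as they are.
LETTER_NAMES = {"A": "Eh", "X": "Ex"}

def speak_abbreviation(match):
    # Respell each letter of the abbreviation and join them with spaces.
    return " ".join(LETTER_NAMES.get(ch, ch) for ch in match.group(0))

def expand(text):
    # Only whole words made of two or more capitals count as abbreviations.
    return re.sub(r"\b[A-Z]{2,}\b", speak_abbreviation, text)

print(expand("A fax from ANM about XML."))
# A fax from Eh N M about Ex M L.
```

The standalone "A" at the start is left alone because the pattern requires at least two capitals.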
fjarri answered Dec 24 '25 04:12



