I'm looking for a more elegant way to replace words in a string that are not known in advance, skipping only not, and, and or.
(The string below is only an example; it could be anything, but it will always be evaluable with eval().)
input: (DEFINE_A or not(DEFINE_B and not (DEFINE_C))) and DEFINE_A
output: (self.DEFINE_A or not(self.DEFINE_B and not (self.DEFINE_C))) and self.DEFINE_A
I created a solution, but it looks kind of strange. Is there a cleaner way?
import re

s = '(DEFINE_A or not(DEFINE_B and not (DEFINE_C))) and DEFINE_A'
# Split the string into runs of word characters, parentheses, and spaces
words = re.findall(r'[\w]+|[()]*|[ ]*', s)
for index, word in enumerate(words):
    # Only tokens made purely of letters and underscores count as names
    w = re.findall('^[a-zA-Z_]+$', word)
    if w and w[0] not in ['and','or','not']:
        z = 'self.' + w[0]
        words[index] = z
new = ''.join(str(x) for x in words)
print(new)
Will print correctly:
(self.DEFINE_A or not(self.DEFINE_B and not (self.DEFINE_C))) and self.DEFINE_A
First of all, you can match just the words with a simple \w+. Then, using a negative lookahead, you can exclude the ones you don't want. All that's left is to call re.sub directly with that pattern:
import re

s = '(DEFINE_A or not(DEFINE_B and not (DEFINE_C))) and DEFINE_A'
new = re.sub(r"(?!and|not|or)\b(\w+)", r"self.\1", s)
print(new)
Which will give:
(self.DEFINE_A or not(self.DEFINE_B and not (self.DEFINE_C))) and self.DEFINE_A
You can test this regex and see how it works here.
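One caveat with the lookahead pattern above: because (?!and|not|or) only looks at the first few characters, it also skips any name that merely starts with one of those words (order, nothing, android), and \w+ will happily prefix a bare number. If your expressions can contain such names, a slightly stricter variant may help. This is only a sketch; the names KEYWORDS, PATTERN and add_self are made up for illustration, and it still assumes the input contains no dotted names or string literals.

```python
import re

KEYWORDS = ("and", "or", "not")  # operator words to leave untouched

# \b anchors the match at the start of a word. The lookahead rejects a word
# only if a keyword is followed by a word boundary, so names that merely
# start with a keyword (e.g. "order") are still prefixed.
# [A-Za-z_]\w* requires a letter or underscore first, so bare numbers are skipped.
PATTERN = re.compile(r"\b(?!(?:%s)\b)([A-Za-z_]\w*)" % "|".join(KEYWORDS))

def add_self(expr):
    """Prefix every name in expr with 'self.', leaving and/or/not alone."""
    return PATTERN.sub(r"self.\1", expr)

print(add_self("(DEFINE_A or not(DEFINE_B and not (DEFINE_C))) and DEFINE_A"))
print(add_self("order and not android"))
```

On the example from the question this gives the same result as before; the second call shows that order and android are prefixed while and and not are left alone.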
If the names of your "variables" are always uppercase, the pattern becomes simpler and much more efficient. Simply use:
new = re.sub(r"([A-Z\d_]+)", r"self.\1", s)
This pattern is not only simpler to read but also much more efficient: on this example it takes only 70 steps, compared to 196 for the first pattern (the step count is shown in the top-right corner of the linked pages).
You can see the new pattern in action here.
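Note that [A-Z\d_]+ has no word boundaries, so it would also prefix a standalone number and the capital letters inside a mixed-case name. That doesn't matter for the example, but if your input might contain either, anchoring the pattern is a cheap safeguard. A minimal sketch, where UPPER_NAME and add_self_upper are hypothetical names and I'm assuming a name never starts with a digit:

```python
import re

# Stricter variant of the uppercase pattern: a name must start with an
# uppercase letter or underscore, and \b on both sides restricts the
# match to whole words.
UPPER_NAME = re.compile(r"\b([A-Z_][A-Z\d_]*)\b")

def add_self_upper(expr):
    """Prefix every all-uppercase name in expr with 'self.'."""
    return UPPER_NAME.sub(r"self.\1", expr)

print(add_self_upper("(DEFINE_A or not(DEFINE_B and not (DEFINE_C))) and DEFINE_A"))
print(add_self_upper("DEFINE_1 and 2 or MixedCase"))
```

The second call leaves both 2 and MixedCase untouched, whereas the unanchored pattern would turn them into self.2 and self.Mixedself.Case.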