I have a bunch of Java source files. I need to write a Python script that walks the source files and identifies all string literals and their locations.
The problem is the strings could be in a couple of different forms such as:
I have come up with a couple of ideas to accomplish this:
Do you have any comments on the ways I suggested on doing this or another method which I have not thought about?
In case you're wondering, we're doing internationalization on our code base. That's why I am trying to automate this process.
Using the re module is the quickest solution. You can use re.finditer(), which yields a match object for each occurrence, giving you both the content and the position:
>>> import re
>>> text = "clearly and slowly"
>>> for m in re.finditer(r"\w+ly", text):
...     print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
...
00-07: clearly
12-18: slowly
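Applied to your problem, here is a minimal sketch of matching Java double-quoted string literals with re — the pattern, the STRING_RE name, and the sample source are my own illustration, and it deliberately ignores char literals and strings inside comments:

```python
import re

# One Java double-quoted literal: an opening quote, then any run of
# escape sequences (\" , \\ , \n, ...) or ordinary non-quote,
# non-backslash characters, then the closing quote.
STRING_RE = re.compile(r'"(?:\\.|[^"\\\n])*"')

java_source = 'String greeting = "Hello, \\"world\\"";'
for m in STRING_RE.finditer(java_source):
    print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0)))
```

Note that a pure-regex approach will also report "strings" that appear inside // or /* */ comments, so you may want to strip comments first.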
Another option is PLY, a pure-Python implementation of lex/yacc written by David Beazley; he has some slides that demonstrate the functionality. This would require a BNF grammar to specify the syntax you are parsing. I'm not sure if you want to go that far.
If you don't want to use BNF, pyparsing is another choice.
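For instance, a quick sketch using pyparsing's built-in dblQuotedString expression, which already understands escaped quotes; scanString yields each match together with its start and end offsets (the sample source below is my own):

```python
import pyparsing as pp

java_source = 'String s = "first"; String t = "second";'

# scanString yields (tokens, start, end) for every literal found,
# where tokens[0] is the matched literal including its quotes.
for tokens, start, end in pp.dblQuotedString.scanString(java_source):
    print(start, end, tokens[0])
```

Like the regex approach, this will still pick up string-shaped text inside comments unless you filter those out separately.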