I am trying to determine what comment character a file may be using. For example:
site,rank
#alexa.com/rankings
google.com,1
yahoo.com,2
Is there a way to get the most common "startswith" prefix in a list of lines, and intersect that with a set of possible comment characters? What I'm doing now is the following, but it seems quite naive:
POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

def get_comment_char(file):
    with open(file) as f:
        for line in f:
            for _char in POSSIBLE_COMMENT_CHARS:
                if line.startswith(_char):
                    return _char
With the above file data it would return:
>>> get_comment_char(myalexafile)
'#'
I would match the start of each line against an alternation of your comment strings, then count the occurrences.
Finally, take the string with the highest number of occurrences:
text="""
site,rank
#alexa.com/rankings
google.com,1
#yahoo.com,2
//whatever
# another comment
"""
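import collections
import re

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']
# Match any candidate prefix at the start of a line, then count the matches.
pattern = "^({})".format("|".join(re.escape(x) for x in POSSIBLE_COMMENT_CHARS))
c = collections.Counter(re.findall(pattern, text, flags=re.MULTILINE))
print(max(c, key=c.get))  # prefix with the highest count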
prints #
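The same counting idea can also be written without a regex, which makes it easy to feed in an open file line by line. This is only a sketch: the function name most_common_comment_prefix and the choice to return None when nothing matches are assumptions of mine, not part of the original answer.

```python
import collections

# Candidate prefixes copied from the question; extend as needed.
POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

def most_common_comment_prefix(lines, candidates=POSSIBLE_COMMENT_CHARS):
    """Return the candidate that starts the most lines, or None if none match.

    Hypothetical helper: `lines` is any iterable of strings, such as an
    open file object.
    """
    counts = collections.Counter()
    for line in lines:
        for prefix in candidates:
            if line.startswith(prefix):
                counts[prefix] += 1
                break  # count each line once, under the first matching prefix
    if not counts:
        return None  # assumption: no match is reported as None
    return counts.most_common(1)[0][0]

sample = [
    "site,rank",
    "#alexa.com/rankings",
    "google.com,1",
    "#yahoo.com,2",
    "//whatever",
    "# another comment",
]
print(most_common_comment_prefix(sample))
```

For a real file you could pass the file object directly, for example inside `with open(path) as f:` call `most_common_comment_prefix(f)`, where `path` is whatever file you want to inspect.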
Be careful with "|".join(re.escape(x) for x in POSSIBLE_COMMENT_CHARS) in the general case, because the resulting alternation implies a linear search over the candidates. If you have 5000 strings in your list it can be quite slow. Here it's fine.
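If the candidate list does grow large, one possible way around the linear search is to look up line prefixes in a set, trying only the distinct candidate lengths. The sketch below is an assumption on my part rather than something from the original answer; count_prefixes is a hypothetical name, and it assumes every candidate is a non-empty string.

```python
import collections

def count_prefixes(lines, candidates):
    """Count how many lines start with each candidate, using set lookups.

    Hypothetical helper. The work per line depends on the number of
    distinct candidate lengths, not on the number of candidates.
    """
    candidate_set = set(candidates)
    # Try longer prefixes first so the longest matching candidate wins.
    lengths = sorted({len(c) for c in candidate_set}, reverse=True)
    counts = collections.Counter()
    for line in lines:
        for n in lengths:
            prefix = line[:n]
            if prefix in candidate_set:
                counts[prefix] += 1
                break  # count each line at most once
    return counts

counts = count_prefixes(["#a", "//b", "# c", "x,y"], ['#', '//', '/*', '*/'])
print(counts.most_common(1))
```

With only four candidates this is unlikely to beat the regex; it is meant for the case of thousands of candidate strings.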