Most common "startswith" in a file to infer comment char

Question

I am trying to determine what comment character a file may be using. For example:

site,rank
#alexa.com/rankings
google.com,1
yahoo.com,2

Is there a way to get the most common "startswith" prefix in a list of lines, and intersect that with a set of possible comment characters? What I'm doing now is the following, but it seems quite naive:

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

def get_comment_char(file):
    with open(file) as f:
        for line in f:
            for _char in POSSIBLE_COMMENT_CHARS:
                if line.startswith(_char):
                    return _char

With the above file data it would return:

>>> get_comment_char(myalexafile)
'#'

Jean-François Fabre · Accepted Answer

I would match the start of each line against an alternation of your comment strings, count the occurrences, and finally pick the string with the highest count.

text="""
site,rank
#alexa.com/rankings
google.com,1
#yahoo.com,2
//whatever
# another comment

"""

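import collections
import re

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

# One alternation of the escaped markers, anchored at the start of each line.
pattern = "^({})".format("|".join(re.escape(x) for x in POSSIBLE_COMMENT_CHARS))

# Count how often each marker starts a line (MULTILINE makes ^ match per line).
c = collections.Counter(re.findall(pattern, text, flags=re.MULTILINE))

# The marker with the highest count (raises ValueError if nothing matched).
print(max(c, key=c.get))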

This prints #.
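If it helps, the snippet above could be wrapped in a function along these lines. This is only a sketch: the name most_common_comment_prefix is made up for illustration, returning None when no line starts with a candidate is my assumption about the desired behavior, and sorting the candidates longest-first is an addition so that a marker such as '//' is not shadowed by a shorter one such as '/'.

```python
import collections
import re

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']


def most_common_comment_prefix(text, candidates=POSSIBLE_COMMENT_CHARS):
    """Return the candidate that starts the most lines, or None if none match.

    Hypothetical helper, not part of the original answer.
    """
    if not candidates:
        return None
    # Longest candidates first, so a longer marker is tried before a
    # shorter marker that happens to be its prefix.
    ordered = sorted(candidates, key=len, reverse=True)
    pattern = re.compile(
        "^(?:{})".format("|".join(re.escape(x) for x in ordered)),
        re.MULTILINE,
    )
    counts = collections.Counter(pattern.findall(text))
    if not counts:
        return None
    # most_common(1) returns a list with one (marker, count) pair.
    return counts.most_common(1)[0][0]
```

To use it on a file, read the contents first, for example with open(path) as f: most_common_comment_prefix(f.read()).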

Be careful with "|".join(re.escape(x) for x in POSSIBLE_COMMENT_CHARS) in the general case, because it implies a linear search over the alternatives. If you have 5000 strings in your list it can be quite slow. Here, with four strings, it's fine.
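For the large-list case mentioned above, one possible alternative (my suggestion, not part of the original answer) is to skip the regex and group the candidates by length, so that each line needs one set lookup per distinct candidate length rather than one comparison per candidate. The function name count_prefixes is hypothetical.

```python
import collections


def count_prefixes(lines, candidates):
    """Count how many lines start with each candidate string.

    Hypothetical helper: uses set lookups on line slices instead of a
    regex alternation.
    """
    # One set of candidates per distinct length; empty strings are ignored.
    by_length = collections.defaultdict(set)
    for cand in candidates:
        if cand:
            by_length[len(cand)].add(cand)
    # Try longer prefixes first so the longest matching candidate wins.
    lengths = sorted(by_length, reverse=True)

    counts = collections.Counter()
    for line in lines:
        for n in lengths:
            head = line[:n]
            if head in by_length[n]:
                counts[head] += 1
                break  # count each line at most once
    return counts
```

Passing an open file object as lines works too, since only the start of each line is inspected; counts.most_common(1) then gives the most frequent marker, if any.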

Tags: python, collections, python-3.x

David542
