
Most common "startswith" in a file to infer comment char

I am trying to determine what comment character a file may be using. For example:

site,rank
#alexa.com/rankings
google.com,1
yahoo.com,2

Is there a way to get the most common "startswith" prefix in a list of lines, and intersect that with a set of possible comment characters? What I'm doing now is the following, but it seems quite naive, since it simply returns the first match it finds:

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

def get_comment_char(file):
    with open(file) as f:
        for line in f:
            for _char in POSSIBLE_COMMENT_CHARS:
                if line.startswith(_char):
                    return _char

With the above file data it would return:

get_comment_char(myalexafile)
>>> #
David542 asked Feb 02 '26 10:02



1 Answer

I would match the start of each line against an alternation of your comment strings, count the occurrences of each one, and finally take the string with the highest count:

text="""
site,rank
#alexa.com/rankings
google.com,1
#yahoo.com,2
//whatever
# another comment

"""

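import collections
import re

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

# Build an alternation of the escaped candidates, anchored at each line start.
pattern = "^({})".format("|".join(re.escape(x) for x in POSSIBLE_COMMENT_CHARS))

# Count how many lines start with each candidate.
c = collections.Counter(re.findall(pattern, text, flags=re.MULTILINE))

# Most frequent candidate (max() raises ValueError if nothing matched).
print(max(c, key=c.get))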

prints #

Be careful with "|".join(re.escape(x) for x in POSSIBLE_COMMENT_CHARS) in the general case, because the alternation implies a linear search through the candidates. If you have 5000 strings in your list it can be quite slow. Here it's fine.
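
To tie this back to the function in the question, here is a minimal sketch of the same counting idea without a regex. It is only an illustration: the signature is an assumption (it takes any iterable of lines, such as an open file, rather than a filename), and counting each line only once for the first matching candidate is a design choice, not something stated above.

```python
import collections
import io

POSSIBLE_COMMENT_CHARS = ['#', '//', '/*', '*/']

def get_comment_char(lines, candidates=POSSIBLE_COMMENT_CHARS):
    """Return the candidate prefix that starts the most lines, or None."""
    counts = collections.Counter()
    for line in lines:
        for prefix in candidates:
            if line.startswith(prefix):
                counts[prefix] += 1
                break  # count each line once, for the first matching candidate
    if not counts:
        return None  # no line started with any candidate
    return counts.most_common(1)[0][0]

# Same sample data as in the answer, wrapped in a file-like object.
sample = io.StringIO(
    "site,rank\n"
    "#alexa.com/rankings\n"
    "google.com,1\n"
    "#yahoo.com,2\n"
    "//whatever\n"
    "# another comment\n"
)
print(get_comment_char(sample))  # prints #
```

With a real file you would pass the open handle, e.g. `with open(path) as f: get_comment_char(f)`. Returning None when nothing matches avoids the ValueError that max() raises on an empty counter.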

Jean-François Fabre answered Feb 05 '26 01:02
