
counting number of each substring in array python

I have an array of strings, for example ['a_text', 'b_text', 'ab_text', 'a_text']. I would like to count how many of them start with each prefix in a list such as ['a_', 'b_', 'ab_'], so the count for 'a_' would be 2.

So far I've been counting each prefix by filtering the array, e.g. num_a = len(filter(lambda x: x.startswith('a_'), array)). Since this filters the whole array once per prefix, I'm not sure whether it is slower than looping through the array once and incrementing a counter for each prefix. Are functions such as filter() faster than a for loop? With a for loop I wouldn't need to build the filtered list at all, which might make it faster.

Alternatively, could I use a list comprehension instead of filter() to make it faster?

mysticalstick asked Jul 01 '26 16:07



1 Answer

You can use collections.Counter with a regular expression (if all of your strings have prefixes):

import re
from collections import Counter

arr = ['a_text', 'b_text', 'ab_text', 'a_text']
Counter([re.match(r'^.*?_', i).group() for i in arr])

Output:

Counter({'a_': 2, 'b_': 1, 'ab_': 1})

If not all of your strings have prefixes, this will raise an AttributeError, since re.match returns None for a string with no match and None has no group() method. If this is a possibility, just add an extra step that skips the failed matches:

arr = ['a_text', 'b_text', 'ab_text', 'a_text', 'test']
matches = [re.match(r'^.*?_', i) for i in arr]
Counter([i.group() for i in matches if i])

Output:

Counter({'a_': 2, 'b_': 1, 'ab_': 1})
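
If the prefixes are known in advance, as in the question, a single pass with a plain loop is another option. The following is only a minimal sketch: count_prefixes is a hypothetical helper name, and it assumes a string should be counted under every prefix it starts with, which mirrors what the per-prefix filter() calls in the question do. It also shows a per-prefix count that avoids building an intermediate list. No timing claim is made here; measure with timeit on your own data.

```python
def count_prefixes(strings, prefixes):
    """Count, in one pass over strings, how many start with each prefix."""
    # Start every prefix at 0 so prefixes with no matches still appear.
    counts = dict.fromkeys(prefixes, 0)
    for s in strings:
        for p in prefixes:
            if s.startswith(p):
                counts[p] += 1
    return counts


arr = ['a_text', 'b_text', 'ab_text', 'a_text', 'test']

result = count_prefixes(arr, ['a_', 'b_', 'ab_'])
print(result)

# Counting a single prefix without materialising a filtered list:
# the generator yields one 1 per matching string and sum() adds them up.
num_a = sum(1 for s in arr if s.startswith('a_'))
print(num_a)
```

Note that in Python 3, filter() returns an iterator, so len(filter(...)) raises a TypeError; the sum() form above works in both Python 2 and Python 3.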
user3483203 answered Jul 03 '26 06:07
