I have a string array for example [a_text, b_text, ab_text, a_text]. I would like to get the number of objects that contain each prefix such as ['a_', 'b_', 'ab_'] so the number of 'a_' objects would be 2.
so far I've been counting each by filtering the array e.g num_a = len(filter(lambda x: x.startswith('a_'), array)). I'm not sure if this is slower than looping through all the fields and incrementing each counter since I am filtering the array for each prefix I am counting. Are functions such as filter() faster than a for loop? For this scenario I don't need to build the filtered list if I use a for loop so that may make it faster.
Also perhaps instead of the filter I could use list comprehension to make it faster?
You can use collections.Counter with a regular expression (if all of your strings have prefixes):
from collections import Counter
arr = ['a_text', 'b_text', 'ab_text', 'a_text']
Counter([re.match(r'^.*?_', i).group() for i in arr])
Output:
Counter({'a_': 2, 'b_': 1, 'ab_': 1})
If not all of your strings have prefixes, this will throw an error, since re.match will return None. If this is a possibility, just add an extra step:
arr = ['a_text', 'b_text', 'ab_text', 'a_text', 'test']
matches = [re.match(r'^.*?_', i) for i in arr]
Counter([i.group() for i in matches if i])
Output:
Counter({'a_': 2, 'b_': 1, 'ab_': 1})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With