
How can I count the occurrence of all the names in the names list which have the letter 'i' as their second letter

Tags:

python

list

I am trying to count the names in names_list whose second letter is 'i', using a nested loop.

def print_count(names_list):
    for name in names_list:
        count = 0
        for i in range(len(name)):
            if name[i] == 'i' and i == 1:
                count = count + 1

    print(count)

names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]
print_count(names)

I expected the output to be 2, but I got 0 instead.

Ashley asked Jan 22 '26 21:01


1 Answer

Update

The fastest solution so far is @KellyBundy's idea of using a slice:

>>> len([s for s in names if s[1:2] == 'i'])
2

Original answer (about twice as slow!)

You can express that simply and efficiently:

>>> len([s for s in names if s[1:].startswith('i')])
2

Why?

The argument of len is a list comprehension. It is the original list, filtered by the condition "must have a second letter, and that second letter must be 'i'":

>>> [s for s in names if s[1:2] == 'i']
['Vinnie', 'Aiden']
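For comparison with the question's code, here is a minimal sketch of the same filter written as an explicit loop. In the question, count = 0 sits inside the outer for loop, so the counter is reset for every name and only the last name ("Zaynab") decides the printed value, hence 0. The function name count_second_letter_i is hypothetical, chosen for this illustration only.

```python
def count_second_letter_i(names_list):
    # Initialise the counter once, before the loop, so it accumulates
    # across all names instead of being reset on every iteration.
    count = 0
    for name in names_list:
        # name[1:2] is the second character, or '' if the name is too short.
        if name[1:2] == 'i':
            count += 1
    return count


names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]
print(count_second_letter_i(names))  # 2
```

No inner loop over the characters is needed, because only one position of each name is inspected.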

But is it safe?

Q: "What if a word is empty or has only one letter? For sure s[1] would raise IndexError, right?"

A: It is safe. Yes, s[1] would raise if s is empty or contains a single char, but s[1:2] is just fine:

>>> 'foo'[1:2]
'o'

>>> 'f'[1:2]
''

>>> ''[1:2]
''
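The difference between indexing and slicing on short strings can be checked side by side. This is only a sketch: the helper names by_index and by_slice are made up for the illustration.

```python
def by_index(s):
    # Plain indexing: raises IndexError when s has fewer than two characters.
    return s[1] == 'i'


def by_slice(s):
    # Slicing never raises; out-of-range slices simply yield ''.
    return s[1:2] == 'i'


outcomes = {}
for word in ['', 'f', 'hi', 'foo']:
    try:
        indexed = by_index(word)
    except IndexError:
        indexed = 'IndexError'
    outcomes[word] = (indexed, by_slice(word))
    print(repr(word), indexed, by_slice(word))
```

The two helpers agree whenever the string has at least two characters; only the slice version handles the empty and single-character cases without an exception.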

Variations and timings

# setup: generate a large list of random names
import numpy as np

n = 1_000_000
names = list(map(''.join, np.random.choice(list('abcdefghijkl'), (n, 10))))

# 1. ***current winner*** @KellyBundy's idea to use a slice
%timeit len([s for s in names if s[1:2] == 'i'])
# 71.8 ms ± 32.2 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 2. @Blckknght's suggestion, len of list comprehension version
%timeit len([s for s in names if len(s) > 1 and s[1] == 'i'])
# 98.2 ms ± 82.3 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 3. @Blckknght's suggestion, generator version
%timeit sum(len(s) > 1 and s[1] == 'i' for s in names)
# 105 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 4. @AndrejKesely's solution
%timeit sum(n[1] == "i" for n in names if len(n) > 1)
# 106 ms ± 77.7 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 5. original answer
%timeit len([s for s in names if s[1:].startswith('i')])
# 140 ms ± 821 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 6. generator
%timeit sum(1 for s in names if s[1:].startswith('i'))
# 141 ms ± 27.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 7. sum of booleans, as list comprehension
%timeit sum([s[1:].startswith('i') for s in names])
# 154 ms ± 145 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 8. sum of booleans, as generator
%timeit sum(s[1:].startswith('i') for s in names)
# 163 ms ± 544 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Notice how operating on a generator instead of on a list comprehension sometimes takes longer (# 3. vs # 2. and # 8. vs # 7.). That surprised me when I heard about it a few years ago.
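The %timeit lines above need IPython or Jupyter. As a rough sketch, a few of the variants can also be compared in a plain Python script with the standard timeit module. The list size, the seed, and the labels are my own assumptions, and the absolute numbers will differ from machine to machine, so no timings are quoted here.

```python
import random
import timeit

# Assumption: a smaller list than in the answer, built with the stdlib
# instead of numpy, so the script runs quickly and without dependencies.
random.seed(0)
n = 100_000
names = [''.join(random.choices('abcdefghijkl', k=10)) for _ in range(n)]

candidates = {
    'slice, len(list)': lambda: len([s for s in names if s[1:2] == 'i']),
    'len check, len(list)': lambda: len([s for s in names if len(s) > 1 and s[1] == 'i']),
    'len check, sum(generator)': lambda: sum(len(s) > 1 and s[1] == 'i' for s in names),
    'startswith, len(list)': lambda: len([s for s in names if s[1:].startswith('i')]),
}

# Sanity check: every variant must produce the same count.
results = {label: fn() for label, fn in candidates.items()}
assert len(set(results.values())) == 1

calls = 5
for label, fn in candidates.items():
    # Take the best of three repeats to reduce noise from other processes.
    best = min(timeit.repeat(fn, number=calls, repeat=3))
    print(f'{label:28s} {best / calls * 1e3:8.2f} ms per call')
```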

Update (including @Blckknght's and @AndrejKesely's ideas)

Both of these solutions are about 40% faster than my initial code (kudos!).

Update 2 (including @KellyBundy's slice idea)

That idea (new # 1.) shaves roughly another 20% off the previous winner (# 2., @Blckknght's suggestion combined with len of a list comprehension), which makes it the new overall winner. In my tests, a precomputed slice object (slc = slice(1, 2), then s[slc]) was indistinguishable in speed from the expression in # 1.
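For reference, the constant-slice variant could look like the following sketch. The variable name slc comes from the text above; count is a name introduced here for illustration.

```python
# Build the slice object once, outside the comprehension.
slc = slice(1, 2)

names = ["Cody", "Baldassar", "Delilah", "Vinnie", "Leila", "Zac", "Aiden", "Zaynab"]

# s[slc] is equivalent to s[1:2].
count = len([s for s in names if s[slc] == 'i'])
print(count)  # 2
```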

Pierre D answered Jan 25 '26 11:01