
Using regex to find all lowercase letters in a string and append them to a list (Python)

I'm looking for a way to extract the lowercase substrings from a string that contains uppercase letters and possibly lowercase letters.

Here's an example:

sequences = ['CABCABCABdefgdefgdefgCABCAB','FEGFEGFEGwowhelloFEGFEGonemoreFEG','NONEARELOWERCASE'] #sequences with uppercase and potentially lowercase letters

This is what I want to output:

upper_output = ['CABCABCABCABCAB','FEGFEGFEGFEGFEGFEG','NONEARELOWERCASE'] #the upper case letters joined together
lower_output = [['defgdefgdefg'],['wowhello','onemore'],[]] #the lower case letters in lists within lists
lower_indx = [[9],[9,23],[]] #where the lower case values occur in the original sequence

So I want lower_output to be a list of sublists, where each sublist holds all the runs of lowercase letters from the corresponding sequence.

I was thinking of using regex:

import re

lower_indx = []

for seq in sequences:
    lower_indx.append(re.findall("[a-z]", seq).start())  # AttributeError: findall returns a list, which has no .start()

print lower_indx

For the lowercase lists I was trying:

lower_output = []

for seq in sequences:
    temp = ''
    temp = re.findall("[a-z]", seq)
    lower_output.append(temp)

print lower_output

But each letter comes back as its own element, and the separate runs are not split apart (I still need to join them):

[['d', 'e', 'f', 'g', 'd', 'e', 'f', 'g', 'd', 'e', 'f', 'g'], ['w', 'o', 'w', 'h', 'e', 'l', 'l', 'o', 'o', 'n', 'e', 'm', 'o', 'r', 'e'], []]
O.rka asked Jan 21 '26 04:01


1 Answer

Sounds like (I may be misunderstanding your question) you just need to capture runs of lowercase letters, rather than each individual lowercase letter. This is easy: just add the + quantifier to your regular expression.

for seq in sequences:
    lower_output.append(re.findall("[a-z]+", seq)) # add substrings

The + quantifier means "one or more of the preceding expression, as many as possible in a row" (here, '[a-z]'). Each full run of lowercase letters is therefore captured as a single match, so the runs should appear in your output lists the way you want them to.
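To make the difference concrete, here is a minimal sketch comparing the two patterns on the first sequence from the question. The variable names single_chars and runs are illustrative only and do not come from the original post.

```python
import re

seq = 'CABCABCABdefgdefgdefgCABCAB'

# Without a quantifier, every lowercase letter is its own match.
single_chars = re.findall("[a-z]", seq)

# With +, each maximal run of consecutive lowercase letters is one match.
runs = re.findall("[a-z]+", seq)

print(len(single_chars))  # 12 one-letter strings
print(runs)               # ['defgdefgdefg']
```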

It gets a little bit uglier if you want to preserve your list-of-lists structure and get the indices as well, but it's still very simple:

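lower_output = []
lower_indx = []

for seq in sequences:
    # finditer returns a one-shot iterator, so store the Match objects in a list to loop over them twice
    matches = list(re.finditer("[a-z]+", seq))
    lower_output.append([match.group(0) for match in matches]) # add substrings
    lower_indx.append([match.start(0) for match in matches]) # add indices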

print lower_output
# [['defgdefgdefg'], ['wowhello', 'onemore'], []]

print lower_indx
# [[9], [9, 23], []]
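
The question also asks for upper_output, which the loop above does not build. One possible way to collect all three lists in a single pass is sketched below. The helper name split_by_case is hypothetical, and the sketch assumes the sequences contain only ASCII letters. Since re.finditer returns an iterator that can be consumed only once, the matches are stored in a list before being read twice.

```python
import re

def split_by_case(sequences):
    """Return (upper_output, lower_output, lower_indx) for a list of strings."""
    upper_output, lower_output, lower_indx = [], [], []
    for seq in sequences:
        # finditer yields Match objects lazily; list() lets us iterate twice.
        matches = list(re.finditer("[a-z]+", seq))
        lower_output.append([m.group(0) for m in matches])
        lower_indx.append([m.start() for m in matches])
        # Deleting every lowercase run leaves the uppercase letters joined together.
        upper_output.append(re.sub("[a-z]+", "", seq))
    return upper_output, lower_output, lower_indx

sequences = ['CABCABCABdefgdefgdefgCABCAB',
             'FEGFEGFEGwowhelloFEGFEGonemoreFEG',
             'NONEARELOWERCASE']
upper_output, lower_output, lower_indx = split_by_case(sequences)
```

Using re.sub to delete the lowercase runs is just one option; joining the results of re.findall("[A-Z]+", seq) would give the same strings for input like this.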
Henry Keiter answered Jan 22 '26 16:01


