I'm looking for a way to get the lowercase values out of a string that has uppercase and potentially lowercase letters.
Here's an example:
sequences = ['CABCABCABdefgdefgdefgCABCAB','FEGFEGFEGwowhelloFEGFEGonemoreFEG','NONEARELOWERCASE'] #sequences with uppercase and potentially lowercase letters
This is what I want to output:
upper_output = ['CABCABCABCABCAB','FEGFEGFEGFEGFEGFEG','NONEARELOWERCASE'] #the upper case letters joined together
lower_output = [['defgdefgdefg'],['wowhello','onemore'],[]] #the lower case letters in lists within lists
lower_indx = [[9],[9,23],[]] #where the lower case values occur in the original sequence
So I want the lower_output list to be a LIST of SUBLISTS. Each SUBLIST would hold all the runs of lowercase letters from one sequence.
I was thinking of using regex...
import re
lower_indx = []
for seq in sequences:
    # this fails: findall() returns a list of strings, which has no .start()
    lower_indx.append(re.findall("[a-z]", seq).start())
print lower_indx
For the lowercase lists I was trying:
lower_output = []
for seq in sequences:
    temp = ''
    temp = re.findall("[a-z]", seq)
    lower_output.append(temp)
print lower_output
But the values are not grouped into separate strings (I still need to join them):
[['d', 'e', 'f', 'g', 'd', 'e', 'f', 'g', 'd', 'e', 'f', 'g'], ['w', 'o', 'w', 'h', 'e', 'l', 'l', 'o', 'o', 'n', 'e', 'm', 'o', 'r', 'e'], []]
Sounds like (I may be misunderstanding your question) you just need to capture runs of lowercase letters, rather than each individual lowercase letter. This is easy: just add the + quantifier to your regular expression.
lower_output = []
for seq in sequences:
    lower_output.append(re.findall("[a-z]+", seq)) # add substrings
The + quantifier specifies that you want "at least one, and as many as you can find in a row" of the preceding expression (in this case '[a-z]'). So this will capture your full runs of lowercase letters all in one group, which should cause them to appear as you want them to in your output lists.
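To make the difference concrete, here is a small sketch comparing the two patterns on one of the example sequences. The variable names single_letters and runs are only illustrative, and it assumes the input contains plain ASCII letters.

```python
import re

seq = 'FEGFEGFEGwowhelloFEGFEGonemoreFEG'

# Without a quantifier, every lowercase letter is its own match.
single_letters = re.findall("[a-z]", seq)

# With +, each unbroken run of lowercase letters is a single match.
runs = re.findall("[a-z]+", seq)

print(single_letters[:3])  # ['w', 'o', 'w']
print(runs)                # ['wowhello', 'onemore']
```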
It gets a little bit uglier if you want to preserve your list-of-lists structure and get the indices as well, but it's still very simple:
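lower_output = []
lower_indx = []
for seq in sequences:
    # finditer returns a one-shot iterator, so store the Match objects in a list to reuse them
    matches = list(re.finditer("[a-z]+", seq)) # List of Match objects.
    lower_output.append([match.group(0) for match in matches]) # add substrings
    lower_indx.append([match.start(0) for match in matches]) # add indices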
print lower_output
>>> [['defgdefgdefg'], ['wowhello', 'onemore'], []]
print lower_indx
>>> [[9], [9, 23], []]
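The question also asks for upper_output, which the loop above does not build. One way to get it, as a sketch, is to delete the lowercase runs with re.sub. The function name split_case is hypothetical, and this assumes the sequences contain only ASCII letters (any other character would be left in the "upper" string).

```python
import re

def split_case(sequences):
    """Hypothetical helper: return (upper_output, lower_output, lower_indx)."""
    upper_output, lower_output, lower_indx = [], [], []
    for seq in sequences:
        # Store the matches in a list so they can be iterated more than once.
        matches = list(re.finditer("[a-z]+", seq))
        # Removing every lowercase run leaves the uppercase letters joined together.
        upper_output.append(re.sub("[a-z]+", "", seq))
        lower_output.append([m.group(0) for m in matches])
        lower_indx.append([m.start() for m in matches])
    return upper_output, lower_output, lower_indx

sequences = ['CABCABCABdefgdefgdefgCABCAB',
             'FEGFEGFEGwowhelloFEGFEGonemoreFEG',
             'NONEARELOWERCASE']
upper_output, lower_output, lower_indx = split_case(sequences)
print(upper_output)  # ['CABCABCABCABCAB', 'FEGFEGFEGFEGFEGFEG', 'NONEARELOWERCASE']
```

The print call is written with parentheses so the snippet runs under both Python 2 and Python 3.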