First of all, sorry for the easy question, but I cannot figure out the simplest way to code my problem.
I have a directory with several different files whose names share common elements (the suffixes _25, _26, _28, etc.), such as:
xxxxx_25.txt
xxxxx_26.txt
xxxxx_27.txt
xxxxx_28.txt
yyyyy_25.txt
yyyyy_26.txt
yyyyy_27.txt
yyyyy_29.txt
mmmmm_25.txt
mmmmm_26.txt
mmmmm_27.txt
mmmmm_30.txt
I wish to get lists grouped by that number, such as:
xxxxx_25.txt
yyyyy_25.txt
mmmmm_25.txt
xxxxx_26.txt
yyyyy_26.txt
mmmmm_26.txt
xxxxx_27.txt
yyyyy_27.txt
mmmmm_27.txt
xxxxx_28.txt
yyyyy_29.txt
mmmmm_30.txt
import re

list_with_file_names = 'xxxxx_25.txt xxxxx_26.txt xxxxx_27.txt xxxxx_28.txt yyyyy_25.txt yyyyy_26.txt yyyyy_27.txt yyyyy_29.txt mmmmm_25.txt mmmmm_26.txt mmmmm_27.txt mmmmm_30.txt'.split()

def get_number_and_prefix(text):
    # Capture the prefix before the underscore and the full number after it.
    g = re.match(r'(\S+)_(\d+)', text)
    return (int(g.group(2)), g.group(1))

nice_list = sorted(list_with_file_names, key=get_number_and_prefix)
The tuples returned by get_number_and_prefix are compared element by element, so the list is sorted first by the number and then by the prefix.
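One detail worth checking: with a (number, name) key, ties on the number are broken alphabetically, which gives mmmmm, xxxxx, yyyyy rather than the xxxxx, yyyyy, mmmmm order shown in the question. Since Python's sorted is stable, sorting on the number alone keeps the names in their input order within each number. A minimal sketch, assuming every name ends in _<digits>.txt (number_of and the short names list are illustrative, not part of the answer above):

```python
import re

names = ['xxxxx_25.txt', 'xxxxx_26.txt',
         'yyyyy_25.txt', 'yyyyy_26.txt',
         'mmmmm_25.txt', 'mmmmm_26.txt']

def number_of(name):
    # Hypothetical helper: the integer between the last "_" and ".txt".
    return int(re.search(r'_(\d+)\.txt$', name).group(1))

# Ties on the number are broken alphabetically by the name.
by_number_then_name = sorted(names, key=lambda n: (number_of(n), n))

# Stable sort: ties on the number keep their original input order.
by_number_only = sorted(names, key=number_of)

print(by_number_then_name)
print(by_number_only)
```

Which of the two fits depends on whether the order inside each group matters to you.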
If, instead, you want to group the files by the number in the filename, you can use something like this:
def update_dict_with_file(dict_, filename):
    # Use the whole number before ".txt" as the key.
    g = re.search(r'_(\d+)\.txt$', filename)
    key = g.group(1)
    t = dict_.setdefault(key, [])
    t.append(filename)

mydict = {}
for filename in list_with_file_names:
    update_dict_with_file(mydict, filename)
mydict now contains the numbers from the file names (as strings) as keys, and lists of the matching file names as values.
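The same grouping can also be sketched with collections.defaultdict, which removes the need for setdefault. This is only one possible variant, assuming names end in _<digits>.txt; group_by_number is a hypothetical name, and converting the key to int is a choice made here so that the groups can be walked in numeric order:

```python
import re
from collections import defaultdict

def group_by_number(filenames):
    # Hypothetical helper: map each number to the names that carry it.
    groups = defaultdict(list)
    for filename in filenames:
        match = re.search(r'_(\d+)\.txt$', filename)
        if match is None:
            continue  # skip names that do not follow the pattern
        groups[int(match.group(1))].append(filename)
    return dict(groups)

groups = group_by_number([
    'xxxxx_25.txt', 'xxxxx_28.txt',
    'yyyyy_25.txt', 'yyyyy_29.txt',
    'mmmmm_25.txt', 'readme.md',
])

# Walk the groups in numeric order.
for number in sorted(groups):
    print(number, groups[number])
```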
Edit
To summarise all the answers so far: all you need is to build a sorted list out of your list, using a key function that extracts whatever you want from your filenames. You can do it with either a fancy one-liner using itertools and a list comprehension, or a lengthier for loop (no yield anywhere?). But basically they are all the same. No rocket science.
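For completeness, here is what a yield-based variant might look like. It is a sketch of the same idea (sort with a key function, then group on that key), not something any of the answers posted; iter_groups, number_key and NUMBER are hypothetical names, and names that do not match _<digits>.txt are simply skipped:

```python
import re
from itertools import groupby

NUMBER = re.compile(r'_(\d+)\.txt$')

def number_key(name):
    # Key function: the integer before ".txt".
    return int(NUMBER.search(name).group(1))

def iter_groups(filenames):
    # Yield (number, [names]) pairs in ascending numeric order.
    matching = [n for n in filenames if NUMBER.search(n)]
    ordered = sorted(matching, key=number_key)
    for number, group in groupby(ordered, key=number_key):
        yield number, list(group)

sample = ['xxxxx_25.txt', 'xxxxx_28.txt', 'yyyyy_25.txt', 'notes.md']
result = list(iter_groups(sample))
print(result)
```

groupby only merges adjacent items, which is why the list is sorted with the same key first.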
This will do it:
list_of_files = [
    'xxxxx_25.txt',
    'xxxxx_26.txt',
    'xxxxx_27.txt',
    'xxxxx_28.txt',
    'yyyyy_25.txt',
    'yyyyy_26.txt',
    'yyyyy_27.txt',
    'yyyyy_29.txt',
    'mmmmm_25.txt',
    'mmmmm_26.txt',
    'mmmmm_27.txt',
    'mmmmm_30.txt',
]
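import itertools
import re

regex = re.compile(r'_([0-9]+)\.txt$')

def keyfn(name):
    # Group key: the number before ".txt", or None if the name does not match.
    match = regex.search(name)
    if match is None:
        return None
    else:
        return match.group(1)

# groupby only merges adjacent items, so sort with the same key first.
for (key, group) in itertools.groupby(sorted(list_of_files, key=keyfn), keyfn):
    print(list(group))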
Or, if you want a list of lists as a result, replace the for loop with:
[list(group) for (key, group) in itertools.groupby(sorted(list_of_files, key=keyfn), keyfn)]
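The examples so far use a hard-coded list, while the question starts from a directory. A rough sketch of reading the names with os.listdir and grouping them the same way follows. grouped_files is a hypothetical helper, the temporary directory exists only for the demo, and because os.listdir returns names in arbitrary order, the names are sorted alphabetically first so that the order inside each group is predictable:

```python
import os
import re
import tempfile
from itertools import groupby

NUMBER = re.compile(r'_(\d+)\.txt$')

def number_key(name):
    # Key function: the integer before ".txt".
    return int(NUMBER.search(name).group(1))

def grouped_files(directory):
    # Keep only names matching _<digits>.txt, in alphabetical order.
    names = sorted(n for n in os.listdir(directory) if NUMBER.search(n))
    # Stable sort by number, then collect each run of equal numbers.
    ordered = sorted(names, key=number_key)
    return [list(group) for _, group in groupby(ordered, key=number_key)]

# Demo on a throwaway directory with a few empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ['xxxxx_25.txt', 'yyyyy_25.txt', 'mmmmm_25.txt', 'xxxxx_28.txt']:
        open(os.path.join(demo_dir, name), 'w').close()
    demo_groups = grouped_files(demo_dir)

print(demo_groups)
```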