
SQL-like MAX and GROUP BY over a list of dicts

I have a list of dicts of the structure

list_of_dicts = [  
{'apples': 123, 'bananas': '556685', 'date': some_date},  
{'apples': 99, 'bananas': '556685', 'date': some_date},  
{'apples': 99, 'bananas': '556685', 'date': some_other_date}, 
{'apples': 88, 'bananas': '2345566', 'date': some_other_date}]

plus a few other fields that do not need to be sorted by.

I've already sorted by apples and date, but I'm stuck on how to get a list of only the dicts with the most apples per day, à la an SQL query
SELECT max(apples) FROM TABLE WHERE location IN (list of location names) GROUP BY date
to get something like

[{'apples': 123, 'bananas': '556685', 'date': some_date}, {'apples': 99, 'bananas': '556685', 'date': some_other_date}]

I've already tried b = max(temp_top, key = lambda f: (max(f['apples']), f['date'])) but that gives me the dictionary with the most apples overall, while I'm trying to get the most apples for each day.

Isaac asked Nov 22 '25 18:11



1 Answer

Going straight ahead, no rocket science:

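# data is the list of dicts from the question (list_of_dicts)

# group by date: collect the distinct dates
unique_dates = {v['date'] for v in data}

# calculate the aggregation function (max apples) for each group
date_maxapples = {d: max(v['apples'] for v in data if v['date'] == d) for d in unique_dates}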
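The mapping above holds only the maximum apple count for each date. Since the question asks for the whole dicts, here is a minimal sketch of the same per-date approach that keeps the full record, using max with a key. The sample data and the name best_per_date are made up for illustration, and the dates are assumed to be sortable values such as datetime.date.

```python
from datetime import date
from operator import itemgetter

# Hypothetical sample data standing in for the question's list_of_dicts.
data = [
    {'apples': 123, 'bananas': '556685', 'date': date(2025, 11, 20)},
    {'apples': 99, 'bananas': '556685', 'date': date(2025, 11, 20)},
    {'apples': 99, 'bananas': '556685', 'date': date(2025, 11, 21)},
    {'apples': 88, 'bananas': '2345566', 'date': date(2025, 11, 21)},
]

# Group by date: collect the distinct dates.
unique_dates = {v['date'] for v in data}

# For each date, keep the whole dict with the most apples.
# On ties, max() returns the first matching dict in list order.
best_per_date = [
    max((v for v in data if v['date'] == d), key=itemgetter('apples'))
    for d in sorted(unique_dates)
]
```

Sorting the dates only makes the output order predictable; drop sorted() if the order does not matter.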

This may not be the fastest way algorithmically: the list is traversed once to collect the dates and then once more for each distinct date. Yet it's simple and readable while staying reasonably close to optimal, which is the Python way of doing things. It might actually be faster than a more sophisticated loop with on-the-fly max calculation (as one would do in C), since most of the functions used are built-ins (see "Faster alternatives to numpy.argmax/argmin which is slow" for an example of this paradox).
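For comparison, the "loop with on-the-fly max calculation" could look roughly like the sketch below. It walks the list once and remembers the best dict seen so far for each date. The function name max_apples_per_date and the sample rows are hypothetical, and whether this is actually faster than the comprehension version would have to be measured on real data.

```python
from datetime import date

def max_apples_per_date(rows):
    """Return one dict per date: the one with the most apples (single pass)."""
    best = {}  # maps date -> best dict seen so far for that date
    for row in rows:
        current = best.get(row['date'])
        # Strict '>' means the first dict wins when apple counts tie.
        if current is None or row['apples'] > current['apples']:
            best[row['date']] = row
    # Dicts preserve insertion order, so dates come out in first-seen order.
    return list(best.values())

# Hypothetical sample rows shaped like the question's data.
rows = [
    {'apples': 123, 'bananas': '556685', 'date': date(2025, 11, 20)},
    {'apples': 99, 'bananas': '556685', 'date': date(2025, 11, 20)},
    {'apples': 99, 'bananas': '556685', 'date': date(2025, 11, 21)},
    {'apples': 88, 'bananas': '2345566', 'date': date(2025, 11, 21)},
]
top_rows = max_apples_per_date(rows)
```

Unlike the set-based version, this does not need the dates to be sortable, only hashable.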

ivan_pozdeev answered Nov 25 '25 07:11
