I am working with a large dataset (1 million+ rows) and have been tasked with counting the truthy values for each ID and generating a new dict.
The solution I came up with works, but it performs very poorly.
I have a dictionary as follows:
"data": {
    "employees": [
        {
            "id": 274,
            "report": true
        },
        {
            "id": 274,
            "report": false
        },
        {
            "id": 276,
            "report": true
        },
        {
            "id": 276,
            "report": true
        },
        {
            "id": 278,
            "report": true
        },
        {
            "id": 278,
            "report": false
        }
    ]
}
I want to create a new dictionary keyed by employee ID, where each entry holds the number of true values for that ID.
Something like this:
{274: {'id': 274, 'count': 1}, 276: {'id': 276, 'count': 2}, 278: {'id': 278, 'count': 1}}
My current code:
final_dict = {}
for employee in result["data"]["employees"]:
    if employee["id"] not in final_dict.keys():
        final_dict[employee["id"]] = {"id": employee["id"]}
    grouped_results = [res for res in result["data"]["employees"] if
                       employee["id"] == res['id']]
    final_dict[employee["id"]]["count"] = len(
        [res for res in grouped_results if res["report"]]
    )
return final_dict
This does what it needs to do, but with the amount of data being processed it performs very poorly.
I am looking for some advice on how to avoid the multiple loops, in order to improve performance. Any advice helps!
There is no need to make multiple passes. Your code rescans the entire employee list for every record, which is quadratic; just accumulate as you go so that it runs in linear time:
result = {}
for employee in input_dict["data"]["employees"]:
    _id = employee["id"]
    if _id not in result:
        # note id is being added redundantly maybe rethink this
        result[_id] = dict(id=_id, count=0)
    result[_id]["count"] += employee["report"]
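For anyone who wants to try this on the sample data from the question, here is a minimal self-contained sketch of the same single-pass idea. The function name `count_reports` and the `sample` variable are made up for illustration, and the `bool(...)` call is an added assumption: it guards against "report" values that are truthy but not actual booleans, which the answer's code does not do.

```python
def count_reports(employees):
    """Count truthy "report" values per employee id in one pass."""
    counts = {}
    for employee in employees:
        emp_id = employee["id"]
        if emp_id not in counts:
            counts[emp_id] = {"id": emp_id, "count": 0}
        # bool is a subclass of int, so True adds 1 and False adds 0.
        # bool(...) is an assumption here, in case "report" is not a real bool.
        counts[emp_id]["count"] += bool(employee["report"])
    return counts


# Sample data from the question, written as a Python literal.
sample = {
    "data": {
        "employees": [
            {"id": 274, "report": True},
            {"id": 274, "report": False},
            {"id": 276, "report": True},
            {"id": 276, "report": True},
            {"id": 278, "report": True},
            {"id": 278, "report": False},
        ]
    }
}

counts = count_reports(sample["data"]["employees"])
print(counts)
# {274: {'id': 274, 'count': 1}, 276: {'id': 276, 'count': 2}, 278: {'id': 278, 'count': 1}}
```

Each record is visited exactly once and every dictionary lookup is constant time on average, so the cost grows linearly with the number of rows.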
With the dict.setdefault method:
report_counts = {}
for employee in result["data"]["employees"]:
    d = report_counts.setdefault(employee['id'], {'id': employee['id'], 'count': 0})
    d['count'] += employee['report']
print(report_counts)
{274: {'id': 274, 'count': 1}, 276: {'id': 276, 'count': 2}, 278: {'id': 278, 'count': 1}}
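As the comment in the first answer points out, storing the id inside each value duplicates the key. If a flat mapping of id to count is acceptable, one possible alternative (not what either answer uses) is `collections.Counter` from the standard library. This is only a sketch; the variable names `employees`, `true_counts` and `nested` are made up for illustration.

```python
from collections import Counter

# Sample data from the question, written as a Python literal.
employees = [
    {"id": 274, "report": True},
    {"id": 274, "report": False},
    {"id": 276, "report": True},
    {"id": 276, "report": True},
    {"id": 278, "report": True},
    {"id": 278, "report": False},
]

# Count one occurrence of the id for every record whose "report" is truthy.
true_counts = Counter(e["id"] for e in employees if e["report"])

# Rebuild the nested shape from the question if it is still needed.
# dict.fromkeys keeps first-seen order and removes duplicate ids;
# a Counter returns 0 for ids that never had a truthy report.
nested = {
    emp_id: {"id": emp_id, "count": true_counts[emp_id]}
    for emp_id in dict.fromkeys(e["id"] for e in employees)
}
```

One caveat: an id whose reports are all false never becomes a key of `true_counts` itself, although looking it up still gives 0. The second pass over the ids is what brings such ids back into `nested`.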