I have a dataframe as follows:
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
If I group by two columns and count the size,
df.groupby(['regiment','company']).size()
I get the following:
regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64
What I want as output is a dictionary like the following:
{'Dragoons':{'1st':2,'2nd':2},
'Nighthawks': {'1st':2,'2nd':2},
... }
I tried different methods, but to no avail. Is there a relatively clean way to achieve the above?
Thank you so much in advance!!!!
You can add Series.unstack with DataFrame.to_dict:
d = df.groupby(['regiment','company']).size().unstack().to_dict(orient='index')
print (d)
{'Dragoons': {'2nd': 2, '1st': 2},
'Nighthawks': {'2nd': 2, '1st': 2},
'Scouts': {'2nd': 2, '1st': 2}}
Another solution, very similar to another answer:
from collections import Counter
# store the result in d so the original DataFrame df is not overwritten
d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2},
'Nighthawks': {'2nd': 2, '1st': 2},
'Scouts': {'2nd': 2, '1st': 2}}
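The same nested dictionary can also be built directly from the grouped sizes, without unstacking. This is only a minimal sketch on a cut-down sample (the data and the name `nested` are illustrative assumptions); each key of the size Series is a `(regiment, company)` tuple:

```python
import pandas as pd

# Cut-down sample (assumption: only the two grouping columns matter here).
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company':  ['1st', '2nd', '1st', '1st', '1st', '3rd'],
})

# Series with a two-level index: (regiment, company) -> number of rows.
sizes = df.groupby(['regiment', 'company']).size()

# Walk the index tuples and fill one inner dict per regiment.
nested = {}
for (regiment, company), count in sizes.items():
    nested.setdefault(regiment, {})[company] = int(count)

print(nested)
```

Only combinations that actually occur in the data end up in the result; pairs that never occur are simply absent from the inner dictionaries.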
But with the first solution, there can be a problem with NaNs (it depends on the data).
Sample:
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '3rd'],
'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
print (df)
      regiment  company      name  preTestScore  postTestScore
 0  Nighthawks      1st    Miller             4             25
 1  Nighthawks      1st  Jacobson            24             94
 2  Nighthawks      2nd       Ali            31             57
 3  Nighthawks      2nd    Milner             2             62
 4    Dragoons      1st     Cooze             3             70
 5    Dragoons      1st     Jacon             4             25
 6    Dragoons      2nd    Ryaner            24             94
 7    Dragoons      2nd      Sone            31             57
 8      Scouts      1st     Sloan             2             62
 9      Scouts      1st     Piger             3             70
10      Scouts      2nd     Riani             2             62
11      Scouts      3rd       Ali             3             70
df1 = df.groupby(['regiment','company']).size().unstack()
print (df1)
company     1st  2nd  3rd
regiment
Dragoons    2.0  2.0  NaN
Nighthawks  2.0  2.0  NaN
Scouts      2.0  1.0  1.0
d = df1.to_dict(orient='index')
print (d)
{'Dragoons': {'3rd': nan, '2nd': 2.0, '1st': 2.0},
'Nighthawks': {'3rd': nan, '2nd': 2.0, '1st': 2.0},
'Scouts': {'3rd': 1.0, '2nd': 1.0, '1st': 2.0}}
In that case, it is necessary to use:
d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2},
'Nighthawks': {'2nd': 2, '1st': 2},
'Scouts': {'3rd': 1, '2nd': 1, '1st': 2}}
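If explicit zeros for missing combinations are acceptable, the unstack approach can also be kept by passing `fill_value=0` to `Series.unstack`, which avoids both the NaNs and the conversion to floats. A rough sketch on a cut-down sample (assumed data, not the full frame above):

```python
import pandas as pd

# Cut-down sample (assumption): Dragoons has no '3rd' company.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company':  ['1st', '2nd', '1st', '2nd', '3rd'],
})

# fill_value=0 replaces the missing (regiment, company) pairs with 0
# instead of NaN, so the counts stay integers.
counts = df.groupby(['regiment', 'company']).size().unstack(fill_value=0)

d = counts.to_dict(orient='index')
print(d)
```

Note the difference from the Counter version: every regiment gets a key for every company, with 0 where there are no rows.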
Or use the approach from John Galt's answer.
You can reset the index after the groupby and pivot the data as needed. The code below gives the required output.
df = df.groupby(['regiment','company']).size().reset_index()
print(pd.pivot_table(df, values=0, index='regiment', columns='company').to_dict(orient='index'))
output:
{'Nighthawks': {'2nd': 2, '1st': 2}, 'Scouts': {'2nd': 2, '1st': 2}, 'Dragoons': {'2nd': 2, '1st': 2}}
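`pivot_table` aggregates with the mean by default and leaves absent combinations as NaN, so on data where some regiment lacks a company it can run into the same NaN issue as `unstack`. A hedged variant, assuming zeros are wanted for missing pairs; the sample data and the column name `count` are illustrative choices, not part of the original answer:

```python
import pandas as pd

# Cut-down sample (assumption): Dragoons has no '3rd' company.
df = pd.DataFrame({
    'regiment': ['Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts'],
    'company':  ['1st', '2nd', '1st', '2nd', '3rd'],
})

# Name the size column explicitly instead of relying on the default label 0.
sizes = df.groupby(['regiment', 'company']).size().reset_index(name='count')

# aggfunc='sum' keeps the values as counts; fill_value=0 fills missing pairs.
table = pd.pivot_table(sizes, values='count', index='regiment',
                       columns='company', aggfunc='sum', fill_value=0)

d = table.to_dict(orient='index')
print(d)
```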