
multi-index pandas dataframe to a dictionary

I have a dataframe as follows:

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])

If I group by two columns and count the size,

df.groupby(['regiment','company']).size()

I get the following:

regiment    company
Dragoons    1st        2
            2nd        2
Nighthawks  1st        2
            2nd        2
Scouts      1st        2
            2nd        2
dtype: int64

What I want as output is a dictionary like the following:

{'Dragoons':{'1st':2,'2nd':2},
 'Nighthawks': {'1st':2,'2nd':2}, 
  ... }

I tried different methods, but to no avail. Is there a relatively clean way to achieve the above?

Thank you so much in advance!!!!

user4279562 asked Sep 07 '25 00:09


2 Answers

You can combine Series.unstack with DataFrame.to_dict:

d = df.groupby(['regiment','company']).size().unstack().to_dict(orient='index')
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
 'Nighthawks': {'2nd': 2, '1st': 2}, 
 'Scouts': {'2nd': 2, '1st': 2}}

Another solution, very similar to the other answer:

from collections import Counter

d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2}, 
'Scouts': {'2nd': 2, '1st': 2}}

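But the first solution can run into a problem with NaNs, depending on the data: if some regiment has no rows for a company that appears elsewhere, unstack fills that cell with NaN and the counts become floats.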

Sample:

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '3rd'],
    'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
    'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}

df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
print (df)
      regiment company      name  preTestScore  postTestScore
0   Nighthawks     1st    Miller             4             25
1   Nighthawks     1st  Jacobson            24             94
2   Nighthawks     2nd       Ali            31             57
3   Nighthawks     2nd    Milner             2             62
4     Dragoons     1st     Cooze             3             70
5     Dragoons     1st     Jacon             4             25
6     Dragoons     2nd    Ryaner            24             94
7     Dragoons     2nd      Sone            31             57
8       Scouts     1st     Sloan             2             62
9       Scouts     1st     Piger             3             70
10      Scouts     2nd     Riani             2             62
11      Scouts     3rd       Ali             3             70

df1 = df.groupby(['regiment','company']).size().unstack()
print (df1)
company     1st  2nd  3rd
regiment                 
Dragoons    2.0  2.0  NaN
Nighthawks  2.0  2.0  NaN
Scouts      2.0  1.0  1.0

d = df1.to_dict(orient='index')
print (d)
{'Dragoons': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
'Nighthawks': {'3rd': nan, '2nd': 2.0, '1st': 2.0}, 
'Scouts': {'3rd': 1.0, '2nd': 1.0, '1st': 2.0}}
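
If zero entries for the missing combinations are acceptable, one possible workaround is the fill_value parameter of Series.unstack, which keeps the counts as integers. This is only a sketch: it reuses the regiment and company columns of the sample above, and the names counts and d_filled are introduced here for illustration.

```python
import pandas as pd

# Only the two grouping columns from the sample above are needed here.
df = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd',
                '1st', '1st', '2nd', '3rd'],
})

counts = df.groupby(['regiment', 'company']).size()

# fill_value=0 replaces the cells that would otherwise be NaN,
# so the columns stay integer instead of being upcast to float.
d_filled = counts.unstack(fill_value=0).to_dict(orient='index')
print(d_filled)
```

Note that the result then contains explicit zeros (for example '3rd': 0 under 'Dragoons') rather than leaving those keys out.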

To leave out the missing combinations entirely, it is necessary to use:

d = {i: dict(Counter(x['company'])) for i, x in df.groupby('regiment')}
print (d)
{'Dragoons': {'2nd': 2, '1st': 2}, 
'Nighthawks': {'2nd': 2, '1st': 2},
 'Scouts': {'3rd': 1, '2nd': 1, '1st': 2}}

Or use the other answer, by John Galt.
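
If you would rather stay in pandas and still skip the missing combinations, a possible variation (my own sketch, not taken from either answer) is to build the nested dict straight from the grouped Series without unstacking it. The trimmed-down data and the names counts and nested are assumptions made for illustration.

```python
import pandas as pd

# Reduced sample: 'Scouts' has a '3rd' company that the other regiments lack.
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '1st', '2nd', '3rd'],
})

counts = df.groupby(['regiment', 'company']).size()

# Group the MultiIndex Series by its first level; dropping that level
# leaves 'company' as the only index, so to_dict() yields {company: count}.
nested = {
    regiment: group.droplevel('regiment').to_dict()
    for regiment, group in counts.groupby(level='regiment')
}
print(nested)
```

Because nothing is reshaped into a rectangular table, no NaN or zero placeholders appear: each regiment only lists the companies it actually has.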

jezrael answered Sep 08 '25 12:09


You can reset the index after the groupby and pivot the data as needed. The code below gives the required output.

df = df.groupby(['regiment','company']).size().reset_index()
print(pd.pivot_table(df, values=0, index='regiment', columns='company').to_dict(orient='index'))

Output:

{'Nighthawks': {'2nd': 2, '1st': 2}, 'Scouts': {'2nd': 2, '1st': 2}, 'Dragoons': {'2nd': 2, '1st': 2}}
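
A related shortcut, offered as a sketch rather than as part of this answer, is pd.crosstab, which counts the combinations of two columns in one call and fills absent combinations with 0. The reduced data and the name table are assumptions for illustration.

```python
import pandas as pd

# Reduced sample with one company ('3rd') that only 'Scouts' has.
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts'],
    'company': ['1st', '2nd', '1st', '1st', '2nd', '3rd'],
})

# crosstab builds the regiment x company frequency table directly;
# combinations that never occur are reported as 0, not NaN.
table = pd.crosstab(df['regiment'], df['company'])
print(table.to_dict(orient='index'))
```
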
Akshay Kandul answered Sep 08 '25 14:09