So I have a DataFrame; I labeled the columns a through i. I want to make a dictionary of dictionaries where the outer key is column "a", the inner key is column "d", and the value is column "e". I know how to do this by iterating through each row, but I feel like there is a more efficient way using DataFrame.to_dict(), and I can't figure out how. Maybe DataFrame.groupby could help, but that seems to be meant for grouping by column or index labels.
How can I use pandas (or numpy) to build this dictionary of dictionaries efficiently, without iterating through each row? My current method and the desired output are shown below.
#!/usr/bin/python
import numpy as np
import pandas as pd
# np.array coerces the mixed str/int values to strings, so every column holds str values
tmp_array = np.array([['AAA', 86880690, 86914111, '22RV1', 2, 2, 'H', '-'], ['ABA', 86880690, 86914111, 'A549', 2, 2, 'L', '-'], ['AAC', 86880690, 86914111, 'BFTC-905', 3, 3, 'H', '-'], ['AAB', 86880690, 86914111, 'BT-20', 2, 2, 'H', '-'], ['AAA', 86880690, 86914111, 'C32', 2, 2, 'H', '-']])
# pass the list of names directly; wrapping it in another list would build a MultiIndex
DF = pd.DataFrame(tmp_array, columns="a,b,c,d,e,g,h,i".split(","))
#print(DF)
     a         b         c         d  e  g  h  i
0  AAA  86880690  86914111     22RV1  2  2  H  -
1  ABA  86880690  86914111      A549  2  2  L  -
2  AAC  86880690  86914111  BFTC-905  3  3  H  -
3  AAB  86880690  86914111     BT-20  2  2  H  -
4  AAA  86880690  86914111       C32  2  2  H  -
from collections import defaultdict

# zip pairs up the three columns row by row (itertools.izip only exists in Python 2)
D_a_d_e = defaultdict(dict)
for a, d, e in zip(DF["a"], DF["d"], DF["e"]):
    D_a_d_e[a][d] = e
#print(D_a_d_e)
#ignore the defaultdict part
defaultdict(<type 'dict'>, {'ABA': {'A549': '2'}, 'AAA': {'22RV1': '2', 'C32': '2'}, 'AAC': {'BFTC-905': '3'}, 'AAB': {'BT-20': '2'}})
I saw this https://stackoverflow.com/questions/28820254/how-to-create-a-pandas-dataframe-using-a-dictionary-in-a-single-column but it was a little different and it also doesn't have an answer.
To create a dictionary from two column values, first create a pandas Series with the key column as the index and the other column as the values, then call that Series' to_dict() method to get the dictionary.
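As a minimal sketch of that two-column approach applied to the question's columns "d" and "e" (the DataFrame below is an assumed, reduced re-creation holding only columns a, d and e, with string values as in the question):

```python
import pandas as pd

# Reduced re-creation of the question's DF: only the columns used here, values as strings
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# Column "d" becomes the index (the keys), column "e" supplies the values
d_to_e = DF.set_index("d")["e"].to_dict()
print(d_to_e)
# {'22RV1': '2', 'A549': '2', 'BFTC-905': '3', 'BT-20': '2', 'C32': '2'}
```

On its own this gives one flat dictionary and ignores column "a", so it is only the building block for each inner dictionary. If the key column contains duplicates, later rows overwrite earlier ones, just as with repeated assignment to a plain dict.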
To convert a pandas DataFrame to a dictionary, use the to_dict() method. Its orient parameter defaults to 'dict', which returns the DataFrame in the format {column -> {index -> value}}.
To make a Series from a dictionary, simply pass the dictionary to the pandas.Series constructor. The keys of the dictionary become the index labels of the Series, and the values of the dictionary become its values.
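A short illustration of that constructor behaviour, using one of the question's desired inner dictionaries as sample data:

```python
import pandas as pd

inner = {"22RV1": "2", "C32": "2"}

# Dictionary keys become the index labels, dictionary values become the data
s = pd.Series(inner)
print(s.index.tolist())  # ['22RV1', 'C32']
print(s.tolist())        # ['2', '2']

# to_dict() reverses the conversion and gives back an equal dictionary
round_trip = s.to_dict()
```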
The to_dict() method converts a DataFrame into a dictionary (or, for 'records', a list of dictionaries) whose layout depends on the orient parameter. Parameters: orient: string, one of ('dict', 'list', 'series', 'split', 'records', 'index'); determines the shape of the result and what each column (Series) is converted into.
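To compare those orient values side by side, here is a sketch on a tiny two-row DataFrame (made-up sample data borrowing the question's columns d and e); the comments show the structure each call returns:

```python
import pandas as pd

DF = pd.DataFrame({"d": ["22RV1", "C32"], "e": ["2", "2"]})

as_dict = DF.to_dict()                     # {'d': {0: '22RV1', 1: 'C32'}, 'e': {0: '2', 1: '2'}}
as_list = DF.to_dict(orient="list")        # {'d': ['22RV1', 'C32'], 'e': ['2', '2']}
as_series = DF.to_dict(orient="series")    # {'d': <Series>, 'e': <Series>}
as_split = DF.to_dict(orient="split")      # {'index': [0, 1], 'columns': ['d', 'e'], 'data': [['22RV1', '2'], ['C32', '2']]}
as_records = DF.to_dict(orient="records")  # [{'d': '22RV1', 'e': '2'}, {'d': 'C32', 'e': '2'}]
as_index = DF.to_dict(orient="index")      # {0: {'d': '22RV1', 'e': '2'}, 1: {'d': 'C32', 'e': '2'}}
```

None of these orientations nests by the values of one column, which is why some form of grouping is still needed for the structure the question asks for.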
There's a to_dict method:
In [11]: DF.to_dict()
Out[11]:
{'a': {0: 'AAA', 1: 'ABA', 2: 'AAC', 3: 'AAB', 4: 'AAA'},
 'b': {0: '86880690', 1: '86880690', 2: '86880690', 3: '86880690', 4: '86880690'},
 'c': {0: '86914111', 1: '86914111', 2: '86914111', 3: '86914111', 4: '86914111'},
 'd': {0: '22RV1', 1: 'A549', 2: 'BFTC-905', 3: 'BT-20', 4: 'C32'},
 'e': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'g': {0: '2', 1: '2', 2: '3', 3: '2', 4: '2'},
 'h': {0: 'H', 1: 'L', 2: 'H', 3: 'H', 4: 'H'},
 'i': {0: '-', 1: '-', 2: '-', 3: '-', 4: '-'}}
In [12]: DF.to_dict(orient="index")
Out[12]:
{0: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': '22RV1', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 1: {'a': 'ABA', 'b': '86880690', 'c': '86914111', 'd': 'A549', 'e': '2', 'g': '2', 'h': 'L', 'i': '-'},
 2: {'a': 'AAC', 'b': '86880690', 'c': '86914111', 'd': 'BFTC-905', 'e': '3', 'g': '3', 'h': 'H', 'i': '-'},
 3: {'a': 'AAB', 'b': '86880690', 'c': '86914111', 'd': 'BT-20', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'},
 4: {'a': 'AAA', 'b': '86880690', 'c': '86914111', 'd': 'C32', 'e': '2', 'g': '2', 'h': 'H', 'i': '-'}}
With that in mind you can do the groupby:
In [21]: DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())
Out[21]:
a
AAA    {'C32': '2', '22RV1': '2'}
AAB                {'BT-20': '2'}
AAC             {'BFTC-905': '3'}
ABA                 {'A549': '2'}
dtype: object
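The result above is still a pandas Series whose values are dictionaries. A minimal sketch of the last step, on an assumed reduced re-creation of the question's data (only columns a, d and e): one more to_dict() turns that Series into the plain dictionary of dictionaries. The dict comprehension at the end is a possible alternative, not part of the answer; it loops over the groups (one per distinct value of "a"), not over the rows.

```python
import pandas as pd

# Reduced re-creation of the question's DF: only the columns used here, values as strings
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# The answer's groupby: a Series indexed by "a" whose values are inner dicts d -> e
per_a = DF.set_index("d").groupby("a")[["e"]].apply(lambda x: x["e"].to_dict())

# Series -> dict gives the dictionary of dictionaries
D_a_d_e = per_a.to_dict()

# Alternative: build each inner dict from its group
D_alt = {a: grp.set_index("d")["e"].to_dict() for a, grp in DF.groupby("a")}
```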
That said, you may be able to use a straight up MultiIndex instead of a dictionary of dictionaries:
In [31]: res = DF.set_index(["a", "d"])["e"]
In [32]: res
Out[32]:
a    d
AAA  22RV1       2
ABA  A549        2
AAC  BFTC-905    3
AAB  BT-20       2
AAA  C32         2
Name: e, dtype: object
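If a real dictionary is needed later, the MultiIndex Series converts back without a row loop. A sketch, assuming the same reduced data as the question (columns a, d and e only): to_dict() on res gives a flat dict keyed by (a, d) tuples, and grouping on level "a" rebuilds the nested form.

```python
import pandas as pd

# Reduced re-creation of the question's DF: only the columns used here, values as strings
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})
res = DF.set_index(["a", "d"])["e"]

# Flat dictionary keyed by (a, d) tuples
flat = res.to_dict()
print(flat[("AAA", "C32")])  # 2

# Nested dictionary: one inner dict per value of level "a", with that level dropped
nested = {a: grp.droplevel("a").to_dict() for a, grp in res.groupby(level="a")}
print(nested["AAA"])  # {'22RV1': '2', 'C32': '2'}
```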
It'll work much the same way:
In [33]: res["AAA"]
Out[33]:
d
22RV1    2
C32      2
Name: e, dtype: object
In [34]: res["AAA"]["22RV1"]
Out[34]: '2'
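Two small refinements, offered as a sketch rather than the only way (same assumed reduced data): sorting the MultiIndex with sort_index() is a common precaution, since some MultiIndex operations such as label slicing require a lexsorted index, and a single .loc call with an (a, d) tuple replaces the chained res["AAA"]["22RV1"].

```python
import pandas as pd

# Reduced re-creation of the question's DF: only the columns used here, values as strings
DF = pd.DataFrame({
    "a": ["AAA", "ABA", "AAC", "AAB", "AAA"],
    "d": ["22RV1", "A549", "BFTC-905", "BT-20", "C32"],
    "e": ["2", "2", "3", "2", "2"],
})

# sort_index() puts the (a, d) MultiIndex in lexicographic order
res = DF.set_index(["a", "d"])["e"].sort_index()

# Partial selection on the outer level returns a Series indexed by "d"
inner = res.loc["AAA"]
print(inner.to_dict())  # {'22RV1': '2', 'C32': '2'}

# One .loc call with the full key instead of chained indexing
value = res.loc[("AAA", "22RV1")]
print(value)  # 2

# Membership test on the full key, similar to `key in dict`
print(("AAA", "C32") in res.index)  # True
```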
But it will be more space-efficient, and you're still in pandas.