Convert jagged array to Pandas dataframe

I'm trying to get a jagged 2D list that looks like this

l = [
    [(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
    [(1, 0.5918752), (2, 0.04440181), (4, 0.05204634), (5, 0.3066661)],
    [(1, 0.26327166), (2, 0.26078925), (4, 0.24160784), (5, 0.22958432)],
    [(2, 0.92498404), (5, 0.065140516)],
    [(1, 0.9882947)],
    [(0, 0.23412614), (1, 0.031903207), (2, 0.03044448), (3, 0.6480669), (4, 0.053342175)],
    [(0, 0.056099385), (3, 0.9084766), (5, 0.031809118)],
    [(2, 0.39833495), (4, 0.52058107), (5, 0.077259734)],
    [(0, 0.46812743), (1, 0.10643007), (3, 0.15962379), (4, 0.017917762), (5, 0.24552101)],
    [(0, 0.2556301), (1, 0.7391994)]
]

to become a data frame that looks like this:

[image: the desired data frame]

In l, each row may or may not contain all columns. Each tuple is structured as (column_label, cell_value). If a column is missing for a row, its value should be set to 0 in the data frame.

I've tried

topics_df = pd.DataFrame(l).fillna(0)

but this results in a data frame that looks like this:

[image: the resulting data frame]

brienna asked Oct 28 '25 18:10


2 Answers

Let us try converting each row of the list to a dict, which a pandas DataFrame can recognize:

df = pd.DataFrame(dict(enumerate(list(map(dict,l))))).T.sort_index(axis=1).fillna(0)
Out[17]: 
          0         1         2         3         4         5
0  0.000000  0.865677  0.089029  0.000000  0.000000  0.040294
1  0.000000  0.591875  0.044402  0.000000  0.052046  0.306666
2  0.000000  0.263272  0.260789  0.000000  0.241608  0.229584
3  0.000000  0.000000  0.924984  0.000000  0.000000  0.065141
4  0.000000  0.988295  0.000000  0.000000  0.000000  0.000000
5  0.234126  0.031903  0.030444  0.648067  0.053342  0.000000
6  0.056099  0.000000  0.000000  0.908477  0.000000  0.031809
7  0.000000  0.000000  0.398335  0.000000  0.520581  0.077260
8  0.468127  0.106430  0.000000  0.159624  0.017918  0.245521
9  0.255630  0.739199  0.000000  0.000000  0.000000  0.000000
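The one-liner above packs several steps together. As a rough sketch of what each step does, here it is unrolled on a shortened copy of the list; the intermediate names (row_dicts, by_row, wide) are only illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened copy of the list from the question, to keep the sketch small.
l = [
    [(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
    [(2, 0.92498404), (5, 0.065140516)],
    [(0, 0.2556301), (1, 0.7391994)],
]

# Step 1: each row of (column_label, cell_value) tuples becomes a dict.
row_dicts = list(map(dict, l))

# Step 2: key each dict by its row position: {row_number: {label: value}}.
by_row = dict(enumerate(row_dicts))

# Step 3: pandas reads the outer keys as columns, so transpose to get
# one row per original list entry.
wide = pd.DataFrame(by_row).T

# Step 4: order the columns by label, then replace missing cells with 0.
df = wide.sort_index(axis=1).fillna(0)
print(df)
```

Only labels that occur somewhere in the data become columns, so this shortened list yields columns 0, 1, 2 and 5.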
BENY answered Oct 30 '25 07:10


You need to convert each list of tuples to a dictionary so that pandas can parse it:

# l = [{key: val for key, val in row} for row in l]
# df = pd.DataFrame(l).fillna(0).sort_index(axis=1)
df = pd.DataFrame([dict(row) for row in l]).fillna(0).sort_index(axis=1)

Output

          0         1         2         3         4         5
0  0.000000  0.865677  0.089029  0.000000  0.000000  0.040294
1  0.000000  0.591875  0.044402  0.000000  0.052046  0.306666
2  0.000000  0.263272  0.260789  0.000000  0.241608  0.229584
3  0.000000  0.000000  0.924984  0.000000  0.000000  0.065141
4  0.000000  0.988295  0.000000  0.000000  0.000000  0.000000
5  0.234126  0.031903  0.030444  0.648067  0.053342  0.000000
6  0.056099  0.000000  0.000000  0.908477  0.000000  0.031809
7  0.000000  0.000000  0.398335  0.000000  0.520581  0.077260
8  0.468127  0.106430  0.000000  0.159624  0.017918  0.245521
9  0.255630  0.739199  0.000000  0.000000  0.000000  0.000000
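One caveat: a column only appears if its label occurs in at least one row. If the full set of labels is known up front, reindexing the columns is one way to guarantee they are all present. The sketch below assumes the labels are 0 through 5; the names rows and all_labels are hypothetical and not from the question.

```python
import pandas as pd

# Small made-up sample in which labels 0, 3 and 5 never occur.
rows = [
    [(1, 0.8656769), (2, 0.08902887)],
    [(2, 0.92498404), (4, 0.065140516)],
]

# Assumption: the complete set of column labels is known in advance.
all_labels = list(range(6))

df = (
    pd.DataFrame([dict(row) for row in rows])
    .reindex(columns=all_labels)  # adds missing columns and fixes the order
    .fillna(0)                    # fills both new columns and per-row gaps
)
print(df)
```

Here reindex also puts the columns in order, so sort_index(axis=1) is not needed.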
RichieV answered Oct 30 '25 07:10


