Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert dataframe of dictionary entries to dataframe of all entries based on existing dictionary

I have a pandas dataframe that consists of an id and an associated count of different encoded words. For instance:

Original = pd.DataFrame(data=[[1,'1:2,2:3,3:1'],[2,'2:2,4:3']], columns=['id','words'])

I have a dictionary that has the mapping to the actual words, for instance:

WordDict = {1:'A',2:'B',3:'C',4:'D'}

What I would like to do is create a new dataframe that maps the counts to columns for all possible words, so it would look something like:

Final =pd.DataFrame(data=[[1,2,3,1,0],[2,0,2,0,3]], columns=['id','A','B','C','D']).set_index('id')

I know I can split the 'words' column of the original into separate columns, and can create a dataframe from WordDict so that it has all possible columns, but could not figure out how to create the mapping.

like image 349
Adam W. Avatar asked Oct 12 '25 11:10

Adam W.


1 Answers

You can use a regex, a list comprehension, and the DataFrame constructor:

import re

Final = pd.DataFrame([{WordDict.get(int(k), None): v
                      for k,v in re.findall('([^:,]+):([^:,]+)', s)}
                      for s in Original['words']], index=Original['id']
                     ).fillna(0).astype(int)

Or with split:

Final = pd.DataFrame([{WordDict.get(int(k), None): v
                       for x in s.split(',')
                       for k,v in [x.split(':')]}
                      for s in Original['words']], index=Original['id']
                     ).fillna(0).astype(int)

Or ast.literal_eval:

from ast import literal_eval

Final = pd.DataFrame([{WordDict.get(k, None): v for k,v in literal_eval(f'{{{s}}}').items()}
                      for s in Original['words']], index=Original['id']
                     ).fillna(0, downcast='infer')

Output:

    A  B  C  D
id            
1   2  3  1  0
2   0  2  0  3
like image 200
mozway Avatar answered Oct 14 '25 00:10

mozway