Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Column of lists to column of tuples in pandas

I have been scratching my head trying to figure out why my attempts to create a column of tuple from a column of lists doesn't work as I expect it to. So, Here is sommething that does work:

def convert(list):
    return tuple(list)

list = ['aaaa', 'aaaa', 'd33f3f', 'sdjd3i']
print(convert(list))


('aaaa', 'aaaa', 'd33f3f', 'sdjd3i')

Now, in my case my dataframe looks like this:

  CLOUD_STORE_ID                     CalculationId        Date
0             772  ff80818178f93bd9017bf5094e602b05  2021-09-17
1             772  ff80818178f93bd9017bf509b4f52b0a  2021-09-17
2             772  ff80818178f93bd9017bf50ccb2c2b20  2021-09-17
0             773  ff80818178f93bd9017bf510f2a12b40  2021-09-17
1             773  ff80818178f93bd9017bf514e2582b55  2021-09-17
2             773  ff80818178f93e26017bf510b2b30c0e  2021-09-17
0             900  ff80818178f93bd9017bf510791f2b39  2021-09-17
1             900  ff80818178f93bd9017bf5109db22b3b  2021-09-17
2             900  ff80818178f93bd9017bf5139e9a2b4d  2021-09-17
0             777  ff80818178f93e26017bf50a780e0bec  2021-09-17

but I need list of strinngs:

CalculationID_df = CalculationID_df.groupby('CLOUD_STORE_ID').agg({ 'CalculationId': lambda x: '{}'.format(list(x))}).reset_index()

which gives:

CLOUD_STORE_ID                                      CalculationId
0             486  ['ff80818178f93bd9017bf521bbfd2b6b', 'ff808181...
1             522               ['ff80818178f93e26017bf4efb6fb0bb6']
2             665  ['ff80818178f93bd9017bf4f7dda12acb', 'ff808181...
3             671  ['ff80818178f93bd9017bf51890802b5d', 'ff808181...
4             772  ['ff80818178f93bd9017bf5094e602b05', 'ff808181...

I actually need tuple to pass to a SQL-query, so I need to transform these lists into tuples. I thought that the above method would do the trick:

def convert(list):
    return tuple(list)
CalculationID_df['Calcs'] = CalculationID_df.CalculationId.apply(convert)

But it returns:

CLOUD_STORE_ID                                      CalculationId  \
0             486  ['ff80818178f93bd9017bf521bbfd2b6b', 'ff808181...   
1             522               ['ff80818178f93e26017bf4efb6fb0bb6']   
2             665  ['ff80818178f93bd9017bf4f7dda12acb', 'ff808181...   

                                               Calcs  
0  ([, ', f, f, 8, 0, 8, 1, 8, 1, 7, 8, f, 9, 3, ...  
1  ([, ', f, f, 8, 0, 8, 1, 8, 1, 7, 8, f, 9, 3, ...  
2  ([, ', f, f, 8, 0, 8, 1, 8, 1, 7, 8, f, 9, 3, ..

which is clearly not what I expected.

What am I doing wrong?

like image 515
Serge de Gosson de Varennes Avatar asked Dec 21 '25 10:12

Serge de Gosson de Varennes


2 Answers

The problem is that your list is a string "['dasdas','dasd']" you need to convert it to a list before your convert it to tuples.

After this row:

CalculationID_df = CalculationID_df.groupby('CLOUD_STORE_ID').agg({ 'CalculationId': lambda x: '{}'.format(list(x))}).reset_index()

Add this code:

import ast
CalculationID_df.CalculationId = CalculationID_df.CalculationId.apply(lambda x: ast.literal_eval(x))

And then convert it to tuples as you did before.

Result:

enter image description here

like image 162
Niv Dudovitch Avatar answered Dec 24 '25 05:12

Niv Dudovitch


IIUC

>>> df.groupby('CLOUD_STORE_ID')['CalculationId'] \
      .apply(tuple).astype(str).reset_index()

   CLOUD_STORE_ID                                      CalculationId
0             772  ('ff80818178f93bd9017bf5094e602b05', 'ff808181...
1             773  ('ff80818178f93bd9017bf510f2a12b40', 'ff808181...
2             777              ('ff80818178f93e26017bf50a780e0bec',)
3             900  ('ff80818178f93bd9017bf510791f2b39', 'ff808181...
like image 21
Corralien Avatar answered Dec 24 '25 04:12

Corralien



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!