I want to save the DataFrame df to the .h5 file MainDataFile.h5:
df.to_hdf("c:/Temp/MainDataFile.h5", "MainData", mode="w", format="table", data_columns=['_FirstDayOfPeriod', 'Category', 'ChannelId'])
and I get the following error:
*** Exception: cannot find the correct atom type -> [dtype->object,items->Index(['Libellé_Article', 'Libellé_segment'], dtype='object')]
If I drop the column 'Libellé_Article' (a string column) from df, the error goes away.
What could be wrong with this column? I suspect it contains a special, forbidden character, but I have not been able to find which one so far.
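Rather than hunting for a single forbidden character, one quick check is to list the values that are either not plain Python str objects (bytes, None, NaN, ...) or that contain non-ASCII characters such as the 'é' in the column name. Non-ASCII text is not necessarily a problem in itself; the point is only to narrow down which rows differ from the rest. This is a sketch assuming Python 3.7+ (for str.isascii); `find_suspect_values` and the sample data are hypothetical, not from the original post.

```python
import pandas as pd


def find_suspect_values(series: pd.Series) -> pd.Series:
    """Hypothetical helper (not part of pandas): return the entries of `series`
    that are not plain str, or that contain non-ASCII characters."""

    def is_suspect(value) -> bool:
        # Anything that is not a Python str (bytes, None, NaN, ...) is flagged
        if not isinstance(value, str):
            return True
        # Flag accented or other non-ASCII characters, e.g. 'É'
        return not value.isascii()

    # Boolean mask -> keep only the flagged entries (original index preserved)
    return series[series.map(is_suspect)]


# Hypothetical sample data mixing plain text, accented text, a missing value and bytes
df = pd.DataFrame(
    {"Libellé_Article": ["PAPETERIE", "ÉNERGIE VITALE", None, b"NR CONTRIBUTION DEEE"]}
)
print(find_suspect_values(df["Libellé_Article"]))
```

On the real data, an empty result would suggest the column is uniformly plain ASCII str, and the cause lies elsewhere.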
UPDATE 1
Following Jeff's comment, I tried encoding the column 'Libellé_Article':
df['Libellé_Article'] = df['Libellé_Article'].str.encode('utf-8')
The column now looks like this:
df['Libellé_Article']
0 b'PAPETERIE'
2 b'NR CONTRIBUTION DEEE'
4 b'NON UTILISE 103'
7 b"L'ENFANT SOUS TERREUR/MILLER A."
10 b'ENERGIE VITALE ET AUTOGUERISON/CHIA M.'
12 b'ENERGIE COSMIQUE CETTE PUISSANCE QUI EST EN ...
13 b'ENERGIE COSMIQUE CETTE PUISSANCE QUI EST EN ...
18 b"COMMENT ATTIRER L'ARGENT/MURPHY J."
19 b"COMMENT ATTIRER L'ARGENT/MURPHY J."
and when I run to_hdf, I get:
*** TypeError: Cannot serialize the column [Libellé_Article] because its data contents are [mixed] object dtype
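The `[mixed]` in this error appears to be pandas' inferred type for the object column. `pandas.api.types.infer_dtype` exposes that inference, so it can be used to check a column before calling `to_hdf`. As far as I can tell, HDFStore performs this check with `skipna=False`, which means a missing value among strings may also make a column "mixed"; treat that detail as an assumption. The sample values below are made up.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical sample values, not the actual data from the question
all_text = pd.Series(["PAPETERIE", "NON UTILISE 103"], dtype=object)
text_and_bytes = pd.Series(["PAPETERIE", b"NON UTILISE 103"], dtype=object)
text_and_missing = pd.Series(["PAPETERIE", None], dtype=object)

print(infer_dtype(all_text, skipna=False))          # 'string'
print(infer_dtype(text_and_bytes, skipna=False))    # 'mixed'
print(infer_dtype(text_and_missing, skipna=False))  # 'mixed': the missing value counts
print(infer_dtype(text_and_missing, skipna=True))   # 'string': the missing value is ignored
```

Running `infer_dtype(df['Libellé_Article'], skipna=False)` on the real column is a cheap way to confirm what the error is complaining about.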
This will work in Python 2. In Python 3, it should work without the encoding step. This is actually a 'mixed' column, as it contains both byte strings (str) and unicode strings.
In [24]: from pandas.compat import u
In [25]: df = DataFrame({'unicode':[u('\u03c3')] * 5 + list('abc') })
In [26]: df
Out[26]:
unicode
0 σ
1 σ
2 σ
3 σ
4 σ
5 a
6 b
7 c
In [27]: df['unicode'] = df.unicode.str.encode('utf-8')
In [28]: df.to_hdf('test.h5','df',mode='w',data_columns=['unicode'],format='table')
In [29]: pd.read_hdf('test.h5','df')
Out[29]:
unicode
0 σ
1 σ
2 σ
3 σ
4 σ
5 a
6 b
7 c
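For Python 3, here is a minimal sketch of the same round trip without the encoding step: keep the column as str throughout, and if it has already been partly encoded, decode any bytes back to str before writing. `to_text` is a hypothetical helper, and mapping missing values to an empty string is an assumption that may not suit your data. The HDF write only runs if PyTables (the `tables` package) is installed.

```python
import importlib.util
import os
import tempfile

import pandas as pd
from pandas.api.types import infer_dtype


def to_text(value, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes to str and map missing values to ''."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if pd.isna(value):
        return ""  # assumption: '' is an acceptable stand-in for missing text
    return str(value)


# Same data as the session above, but deliberately mixed: str, bytes and a missing value
raw = pd.Series(["\u03c3"] * 5 + [b"a", None, "c"], dtype=object)
df = pd.DataFrame({"unicode": raw.map(to_text).astype(object)})
print(infer_dtype(df["unicode"], skipna=False))  # 'string'

# Writing needs PyTables; skip this part if it is not installed
if importlib.util.find_spec("tables") is not None:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.h5")
        df.to_hdf(path, key="df", mode="w", format="table", data_columns=["unicode"])
        print(pd.read_hdf(path, "df"))
```

Normalizing to str (rather than encoding to bytes) keeps every value in the column the same type, which is what the "mixed" check cares about.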