Error message "Exception: cannot find the correct atom type" when executing pandas to_hdf

Question

I want to save the DataFrame df to the HDF5 file MainDataFile.h5:

df.to_hdf("c:/Temp/MainDataFile.h5", "MainData", mode="w", format="table", data_columns=['_FirstDayOfPeriod', 'Category', 'ChannelId'])

and I get the following error:

*** Exception: cannot find the correct atom type -> > [dtype->object,items->Index(['Libellé_Article', 'Libellé_segment'], dtype='object')]

If I drop the column 'Libellé_Article' (a string column) from df, the error message goes away.

What could be wrong with this column? I suspect it contains a special, forbidden character, but I have not been able to find which one so far.

UPDATE 1

Following Jeff's comment, I tried encoding the column 'Libellé_Article':

df['Libellé_Article'] = df['Libellé_Article'].str.encode('utf-8')

The column now looks like this:

df['Libellé_Article']
0                                               b'PAPETERIE'
2                                    b'NR CONTRIBUTION DEEE'
4                                         b'NON UTILISE 103'
7                         b"L'ENFANT SOUS TERREUR/MILLER A."
10                 b'ENERGIE VITALE ET AUTOGUERISON/CHIA M.'
12         b'ENERGIE COSMIQUE CETTE PUISSANCE QUI EST EN ...
13         b'ENERGIE COSMIQUE CETTE PUISSANCE QUI EST EN ...
18                     b"COMMENT ATTIRER L'ARGENT/MURPHY J."
19                     b"COMMENT ATTIRER L'ARGENT/MURPHY J."

and when I call to_hdf, I get:

*** TypeError: Cannot serialize the column [Libellé_Article] because its data contents are [mixed] object dtype

Jeff · Accepted Answer

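The approach shown here (encoding the column to UTF-8 before writing) works in Python 2. In Python 3, the write should work without the encoding step. The column is actually 'mixed' because it contains both plain (byte) strings and unicode strings.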
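To see which Python types a column really holds, one option is to count the type of each cell. The following is a minimal sketch, not part of the original answer: the sample data and the helper name count_cell_types are made up for illustration, and pd.api.types.infer_dtype is used only as a second opinion on the column.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's column (assumption).
df = pd.DataFrame({"label": ["PAPETERIE", b"NON UTILISE 103", None, 42]})


def count_cell_types(column):
    """Return a Series that maps each Python type name to its number of cells."""
    return column.map(lambda value: type(value).__name__).value_counts()


type_counts = count_cell_types(df["label"])
print(type_counts)

# pandas' own summary of what the column contains.
print(pd.api.types.infer_dtype(df["label"], skipna=True))
```

More than one type name in the result means the column holds several kinds of objects, which is the situation the "mixed" wording in the error message refers to.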

In [23]: import pandas as pd

In [24]: from pandas.compat import u  # Python 2-era compatibility helper

In [25]: df = pd.DataFrame({'unicode': [u('\u03c3')] * 5 + list('abc')})

In [26]: df
Out[26]: 
  unicode
0       σ
1       σ
2       σ
3       σ
4       σ
5       a
6       b
7       c

In [27]: df['unicode'] = df.unicode.str.encode('utf-8')

In [28]: df.to_hdf('test.h5','df',mode='w',data_columns=['unicode'],format='table')

In [29]: pd.read_hdf('test.h5','df')
Out[29]: 
  unicode
0       σ
1       σ
2       σ
3       σ
4       σ
5       a
6       b
7       c
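For Python 3, where the session above does not run as written (pandas.compat.u is a Python 2-era helper that recent pandas releases no longer provide), a rough equivalent is to make every cell a plain str before writing. This is only a sketch under assumptions: the sample data, the to_text and write_table helpers, and the choice to turn missing values into empty strings are illustrative and do not come from the answer. Whether it resolves the asker's exact error is not confirmed by the source.

```python
import pandas as pd


def to_text(value):
    """Decode UTF-8 bytes, map missing values to '', and stringify the rest."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    # None or a float NaN (NaN is the only float not equal to itself).
    if value is None or (isinstance(value, float) and value != value):
        return ""
    return str(value)


# Hypothetical data mixing str, bytes, and missing values (assumption).
df = pd.DataFrame({"label": ["Libellé", b"PAPETERIE", None, float("nan")]})
df["label"] = df["label"].map(to_text)


def write_table(frame, path="test.h5", key="df"):
    """Write the frame in table format; requires the PyTables package."""
    frame.to_hdf(path, key=key, mode="w", format="table", data_columns=["label"])
```

Calling write_table(df) then performs the same kind of write as In [28] above. Replacing missing values with '' loses the distinction between "missing" and "empty", so a different placeholder may be preferable depending on the data.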

Tags:

string

python-3.x

pandas

hdf5

Georges Casamatta
