I've created a pandas DataFrame by reading a file with scipy.io in the following way (file.sav is an IDL structure created on a different machine; scipy.io.readsav returns a standard Python dictionary):
from scipy import io
import pandas as pd
import numpy as np
tmp=io.readsav('file.sav', python_dict = True)
df=pd.DataFrame(tmp,index=tmp['shots'].astype('int32'))
The DataFrame contains a set of values (from file.sav) and, as its index, a series of integers of the form 19999, 20000, 30000, etc. Now I would like to take a subset of these indices, say
df.loc[[19999,20000]]
For some reason I get errors of the form
raise ValueError('Cannot index with multidimensional key')
plus others, and at the end
ValueError: Big-endian buffer not supported on little-endian compiler
But I've checked that both the machine I'm working on and the machine that created file.sav are little-endian, so I don't think this is the problem.
Your input data is big-endian. See here for how to convert it: http://pandas.pydata.org/pandas-docs/dev/gotchas.html#byte-ordering-issues
Compare before and after:
In [7]: df.dtypes
Out[7]:
a >f4
b >f4
c >f4
shots >f4
dtype: object
In [9]: df.apply(lambda x: x.values.byteswap().newbyteorder())
Out[9]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 20000 to 20099
Data columns (total 4 columns):
a 100 non-null values
b 100 non-null values
c 100 non-null values
shots 100 non-null values
dtypes: float32(4)
In [10]: df.apply(lambda x: x.values.byteswap().newbyteorder()).dtypes
Out[10]:
a float32
b float32
c float32
shots float32
dtype: object
Also set the index AFTER you do this (i.e. don't do it in the constructor):
df.set_index('shots',inplace=True)
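Putting the pieces together, a minimal end-to-end sketch (with a hypothetical dict standing in for the output of io.readsav; column names follow the example above). Note that on NumPy 2, ndarray.newbyteorder() was removed, so the byte-order flip is spelled with .view(dtype.newbyteorder()), which works on older NumPy too:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dict returned by io.readsav:
# every field arrives as big-endian float32 ('>f4'), as in the dtypes above.
tmp = {
    'a': np.array([1.0, 2.0, 3.0], dtype='>f4'),
    'shots': np.array([19999, 20000, 20001], dtype='>f4'),
}
df = pd.DataFrame(tmp)

# Swap the raw bytes, then relabel the dtype as native byte order.
# The two steps cancel out value-wise, so the data is unchanged, only native.
df = df.apply(lambda x: x.values.byteswap().view(x.values.dtype.newbyteorder()))

# Only now set the index, as an integer type.
df['shots'] = df['shots'].astype('int32')
df.set_index('shots', inplace=True)

subset = df.loc[[19999, 20000]]  # indexing works once the dtypes are native
```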
From your comments, I would approach the problem in the following way:
values_i_want = [19999, 20000, 20005, 20007]
subset = df.select(lambda x: x[0] in values_i_want)
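Note that df.select was deprecated and later removed (pandas 1.0); on current pandas, a boolean mask built with Index.isin does the same job in vectorized form (column name and index values below are just the ones from this example):

```python
import pandas as pd

# Example frame indexed by integer 'shots' values, as in the question.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0]},
                  index=pd.Index([19999, 20000, 20005, 30000], name='shots'))

values_i_want = [19999, 20000, 20005, 20007]

# Vectorized equivalent of the select() call above.
subset = df[df.index.isin(values_i_want)]
```

A side benefit: isin silently skips labels that are absent (20007 here), whereas .loc with a list of missing labels raises a KeyError on modern pandas.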
If your dataframe is very large (it sounds like it is), the select method will probably be pretty slow. In that case, another approach would be to loop through values_i_want taking cross sections (df.xs(val, level=0)) and appending them to an output dataframe. In other words (untested):
for n, val in enumerate(values_i_want):
    if n == 0:
        subset = df.xs(val, level=0)
    else:
        subset = subset.append(df.xs(val, level=0))
Not sure if that'll be any faster. But it's worth trying if the select approach is too slow.
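On current pandas the loop has to change anyway, since DataFrame.append was removed in pandas 2.0; the usual idiom is to collect the pieces in a list and concatenate once at the end, which also avoids copying the growing frame on every iteration. A sketch with a flat integer index, using .loc in place of .xs:

```python
import pandas as pd

# Same shape of problem as above, on a small example frame.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0]},
                  index=pd.Index([19999, 20000, 20005], name='shots'))
values_i_want = [19999, 20005]

# Collect one-row slices, then concatenate once (instead of repeated append).
pieces = [df.loc[[val]] for val in values_i_want]
subset = pd.concat(pieces)
```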