
Pandas DataFrame returns index with inaccurate decimals

I have a pandas DataFrame like this:

                0         1         2         3         4         5       \
    event_at                                                               
    0.00      1.000000  1.000000  1.000000  1.000000  1.000000  1.000000   
    0.01      0.975381  0.959061  0.979856  0.985625  0.986080  0.976601   
    0.02      0.959103  0.932374  0.966486  0.976037  0.976791  0.961114   
    0.03      0.946154  0.911362  0.955820  0.968362  0.969353  0.948785   
    0.04      0.935378  0.894024  0.946924  0.961940  0.963129  0.938518   
    0.05      0.926099  0.879201  0.939248  0.956385  0.957744  0.929672   
    0.06      0.917608  0.865726  0.932212  0.951282  0.952796  0.921574 
    ......
    0.96      0.072472  0.012264  0.117352  0.217737  0.228561  0.082670   
    0.97      0.066553  0.010632  0.109468  0.207225  0.217870  0.076244   
    0.98      0.060532  0.009069  0.101313  0.196119  0.206555  0.069677   
    0.99      0.054657  0.007642  0.093212  0.184828  0.195031  0.063237   
    1.00      0.019128  0.001314  0.039558  0.100442  0.108064  0.023328

I want to get all of the index values, but they come back with inaccurate decimals:

>>> df.index
[0.0, 0.01, 0.02, 0.029999999999999999, 0.040000000000000001, 0.050000000000000003, 0.059999999999999998,
...
0.95999999999999996, 0.96999999999999997, 0.97999999999999998, 0.98999999999999999, 1.0]


What I expect is something like:

    [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06,
     ...
     0.96, 0.97, 0.98, 0.99, 1.0]

This floating-point problem causes the following exception:

>>> df.loc[0.35].values
Traceback (most recent call last):
  File "I:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1395, in _has_valid_type
    error()
  File "I:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [0.35] is not in the [index]'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "J:\Workspace\dataset_loader.py", line 171, in <module>
    print(y_pred_cox_alldep.loc[0.35].values)
  File "I:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1296, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "I:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1466, in _getitem_axis
    self._has_valid_type(key, axis)
  File "I:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1403, in _has_valid_type
    error()
  File "I:\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1390, in error
    (key, self.obj._get_axis_name(axis)))
KeyError: 'the label [0.35] is not in the [index]'

asked Nov 18 '25 15:11 by Munichong


1 Answer

You can do it this way (assuming we want the row labeled 0.96, which is actually stored in the index as 0.95999999999):

In [466]: df.index
Out[466]: Float64Index([0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.95999999999, 0.97, 0.98, 0.99, 1.0], dtype='float64')

In [467]: df.loc[df.index[np.abs(df.index - 0.96) < 1e-6]]
Out[467]:
             0         1         2         3         4        5
0.96  0.072472  0.012264  0.117352  0.217737  0.228561  0.08267
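
The same tolerance-based lookup can be sketched with `np.isclose` and a boolean mask passed to `.loc`. This is only an illustration under stated assumptions: the small frame, its column names, and the helper `rows_near` are hypothetical and not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; 0.95999999999 mimics the imprecise index label shown above.
df = pd.DataFrame(
    {"a": [1.0, 0.5, 0.1], "b": [1.0, 0.4, 0.05]},
    index=[0.0, 0.95999999999, 1.0],
)


def rows_near(frame, label, atol=1e-6):
    """Return the rows whose float index label lies within atol of label."""
    # rtol=0.0 makes this a purely absolute comparison, like abs(x - label) < atol.
    mask = np.isclose(frame.index.to_numpy(), label, rtol=0.0, atol=atol)
    return frame.loc[mask]


near = rows_near(df, 0.96)
print(near)
```

A boolean mask never raises `KeyError`; when nothing is close enough, the result is simply an empty frame, so the caller has to check for that case.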

Alternatively, if you are free to modify the index, you can round it. First, a setup that reproduces the problem:

In [430]: df.index = [0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.95999999999, 0.97, 0.98, 0.99, 1.0]

In [431]: df
Out[431]:
             0         1         2         3         4         5
0.00  1.000000  1.000000  1.000000  1.000000  1.000000  1.000000
0.01  0.975381  0.959061  0.979856  0.985625  0.986080  0.976601
0.02  0.959103  0.932374  0.966486  0.976037  0.976791  0.961114
0.03  0.946154  0.911362  0.955820  0.968362  0.969353  0.948785
0.04  0.935378  0.894024  0.946924  0.961940  0.963129  0.938518
0.05  0.926099  0.879201  0.939248  0.956385  0.957744  0.929672
0.06  0.917608  0.865726  0.932212  0.951282  0.952796  0.921574
0.96  0.072472  0.012264  0.117352  0.217737  0.228561  0.082670
0.97  0.066553  0.010632  0.109468  0.207225  0.217870  0.076244
0.98  0.060532  0.009069  0.101313  0.196119  0.206555  0.069677
0.99  0.054657  0.007642  0.093212  0.184828  0.195031  0.063237
1.00  0.019128  0.001314  0.039558  0.100442  0.108064  0.023328

In [432]: df.index
Out[432]: Float64Index([0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.95999999999, 0.97, 0.98, 0.99, 1.0], dtype='float64')

In [433]: df.loc[.96]
... skipped ...
KeyError: 0.96

Now round the index to two decimals:

In [434]: df.index = df.index.values.round(2)

In [435]: df.index
Out[435]: Float64Index([0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.96, 0.97, 0.98, 0.99, 1.0], dtype='float64')

In [436]: df.loc[.96]
Out[436]:
0    0.072472
1    0.012264
2    0.117352
3    0.217737
4    0.228561
5    0.082670
Name: 0.96, dtype: float64
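
As a self-contained sketch of the rounding approach with the `.loc` indexer, assuming a small hypothetical frame (the `value` column and the imprecise labels are made up for illustration):

```python
import pandas as pd

# Hypothetical frame whose float labels are slightly off, as in the question.
df = pd.DataFrame(
    {"value": [10.0, 20.0, 30.0, 40.0]},
    index=[0.0, 0.34999999999, 0.95999999999, 1.0],
)

# Work on a copy so the original labels stay available for comparison.
rounded = df.copy()
rounded.index = df.index.to_numpy().round(2)

# Exact label lookup now succeeds because the stored label equals the literal 0.35.
row = rounded.loc[0.35]
print(row)
```

Rounding is only safe when the true labels are known to have a fixed number of decimals; if two labels round to the same value, the index ends up with duplicates.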

UPDATE: the .ix indexer was deprecated in pandas 0.20.0 and removed in pandas 1.0; use the stricter .loc (label-based) and .iloc (position-based) indexers instead.
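
With the stricter indexers, one possible way to do a tolerant lookup without touching the index is to resolve the label to a position with `Index.get_indexer(..., method="nearest", tolerance=...)` and then use `.iloc`. This is a sketch, not part of the original answer: the frame and the helper `nearest_row` are hypothetical, and it assumes the index is sorted and unique.

```python
import pandas as pd

# Hypothetical frame with a sorted, unique float index containing an imprecise label.
df = pd.DataFrame(
    {"value": [10.0, 20.0, 30.0]},
    index=[0.0, 0.95999999999, 1.0],
)


def nearest_row(frame, label, tolerance=1e-6):
    """Return the row whose index label is nearest to label, within tolerance."""
    # get_indexer returns -1 for targets that have no match inside the tolerance.
    pos = frame.index.get_indexer([label], method="nearest", tolerance=tolerance)[0]
    if pos == -1:
        raise KeyError(label)
    return frame.iloc[pos]


row = nearest_row(df, 0.96)
print(row)
```

Unlike the boolean-mask approach, this returns a single row and raises `KeyError` when no label is close enough, which mirrors the behaviour of an exact `.loc` lookup.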

answered Nov 20 '25 05:11 by MaxU - stop WAR against UA


