I have DataFrames ranging from 100k to 2m rows. The one I am dealing with for this question is this large, but note that I will have to do the same for the other frames:
>>> len(data)
357451
Now, this DataFrame was created by compiling many files, so its index is really odd. All I wanted to do was reindex it with range(len(data)), but I get this error:
>>> data.reindex(index=range(len(data)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2542, in reindex
    fill_value, limit)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2618, in _reindex_index
    limit=limit)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 893, in reindex
    limit=limit)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/index.py", line 812, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
This actually makes no sense. Since I am reindexing with an array containing numbers 0 through 357450, all Index objects are unique! Why is it returning this error?
Extra info: I am using Python 2.7 and pandas 0.11.0.
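When it complains that "Reindexing only valid with uniquely valued Index objects", it's not objecting that your new index isn't unique; it's objecting that your old one isn't. reindex looks up each new label among the existing labels, so the existing labels have to be unambiguous.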
For example:
>>> df = pd.DataFrame(range(5), index = [1,2,3,1,2])
>>> df
   0
1  0
2  1
3  2
1  3
2  4
>>> df.reindex(index=range(len(df)))
Traceback (most recent call last):
  [...]
  File "/usr/local/lib/python2.7/dist-packages/pandas-0.12.0.dev_0bd5e77-py2.7-linux-i686.egg/pandas/core/index.py", line 849, in get_indexer
    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects
but assigning a new index directly works, because nothing is looked up in the old one:
>>> df.index = range(len(df))
>>> df
   0
0  0
1  1
2  2
3  3
4  4
Although I think I'd write
df = df.reset_index(drop=True)
instead. Note that reset_index returns a new DataFrame, so the result has to be assigned back.
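To check whether this is what is happening with your own frame, here is a minimal sketch. It assumes a reasonably recent pandas (Index.duplicated is not available in versions as old as the ones in the tracebacks above), and the small frame and the "value" column are hypothetical stand-ins for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the compiled frame: the labels repeat
# because each source file numbered its own rows starting from 0.
data = pd.DataFrame({"value": [10, 20, 30, 40, 50]}, index=[0, 1, 2, 0, 1])

# reindex() validates the OLD index, so inspect that one.
has_unique_index = data.index.is_unique

# Labels that occur more than once in the old index.
repeated_labels = sorted(set(data.index[data.index.duplicated()]))

# Throw away the old labels and number the rows 0..len(data)-1.
# reset_index returns a new frame, so keep the result.
renumbered = data.reset_index(drop=True)
```

If the frames are being combined with pd.concat, passing ignore_index=True there should give a fresh 0..n-1 index up front, so the duplicate labels never appear.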