Pandas DataFrame column (Series) has different index than the Dataframe?

Consider this small script:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3]})
b = df.a
b.index = b.index + 1
df['b'] = b
print(df)
print(df.a - df.b)

The output is:

   a    b
0  1  NaN
1  2  1.0
2  3  2.0

0    NaN
1    0.0
2    0.0
3    NaN

I was expecting df.a - df.b to be:

0    NaN
1    1.0
2    1.0

How is this possible? Is it a Pandas bug?

Parapapuppolo asked Oct 20 '25 16:10


2 Answers

import pandas as pd

aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
aa = aa.reset_index(drop=True)  # add this; reset_index returns a new DataFrame, so assign the result

Your indexes do not match.
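A minimal sketch of one way to avoid the mismatch in the first place, assuming the goal is simply a shifted copy of column a. It takes an explicit copy before changing the index, so the Series returned by the frame is left untouched. The names df, b, result and shifted are illustrative. Whether the original script shows the surprising output may depend on the pandas version and on whether Copy-on-Write is enabled; that is an assumption, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# Work on an explicit copy so that changing its index cannot affect df['a'].
b = df['a'].copy()
b.index = b.index + 1

# Assignment aligns b on df's index (0, 1, 2):
# label 3 is dropped and label 0 has no match, so it becomes NaN.
df['b'] = b

# Both columns now share the frame's index, so no extra rows appear.
result = df['a'] - df['b']
print(result)

# shift(1) builds the same shifted values without touching any index.
shifted = df['a'].shift(1)
```

With the copy in place, the subtraction should give the three-row result the question expected.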

User answered Oct 23 '25 09:10


When you do aa.b - aa.a, you're subtracting two pandas.Series that have the same length but not the same index:

aa.a

1    1
2    2
3    3
Name: a, dtype: int64

Whereas:

aa.b

0    NaN
1    1.0
2    2.0
Name: b, dtype: float64

And when you do:

print(aa.b - aa.a)

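you're printing the result of aligning these two pandas.Series on their index labels (regardless of the operation type, addition or subtraction). That is why the indices [0,1,2] and [1,2,3] are combined into a new index from 0 to 3: [0,1,2,3]. Labels that exist in only one of the two Series (here 0 and 3) give NaN.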
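As a small sketch of that alignment rule, the two Series shown above can be rebuilt by hand, independently of any DataFrame. The names left and right are illustrative and stand in for aa.a and aa.b.

```python
import pandas as pd

# Same values as aa.a and aa.b above, built as two independent Series.
left = pd.Series([1, 2, 3], index=[1, 2, 3], name='a')
right = pd.Series([float('nan'), 1.0, 2.0], index=[0, 1, 2], name='b')

# Arithmetic aligns on index labels, not on position:
# the result index is the union of both indexes, and any label
# missing on one side (0 and 3 here) produces NaN.
diff = left - right
print(diff)
```

The result has four rows, one per label in the union, which matches the output shown in the question.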

For instance, if you shift bb.index by 2 instead of 1:

bb.index = bb.index + 2

this time you will have 5 rows in your new pandas.Series instead of 4, and so on:

bb.index = bb.index + 2
aa['b'] = bb
print(aa.a - aa.b)

0    NaN
1    NaN
2    0.0
3    NaN
4    NaN
dtype: float64

Pascal G. Bernard answered Oct 23 '25 08:10


