Pandas indexing and KeyError

Consider the following:

import pandas as pd

d = {'a': 0.0, 'b': 1.0, 'c': 2.0}

e = pd.Series(d, index=['a', 'b', 'c'])

df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

When I try to access the first row of column B in the following way:

>>> df.B[0]
0.0

I get the correct result.

However, after reading KeyError: 0 when accessing value in pandas series, I was under the assumption that, since I have specified the index as 'a', 'b' and 'c', the correct way to access the first row of column B by position is df.B.iloc[0], and that df.B[0] should raise a KeyError. I don't know what I am missing. Can someone clarify in which cases I get a KeyError?

Yash asked Sep 07 '25 01:09


1 Answer

The problem in the question you referenced is that the index there is made of integers but does not start at 0.

Pandas' behaviour for df.B[0] is ambiguous: it depends on the dtype of the index and on the type of the key passed inside the [] brackets. It can behave like df.B.loc[0] (label-based) or like df.B.iloc[0] (position-based), or possibly in some other way I'm not aware of. For predictable behaviour I recommend using loc and iloc.
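A minimal sketch of why the ambiguity matters, assuming pandas is installed; the series name `rev` and its values are hypothetical. With an integer index whose labels are not 0, 1, 2 in order, [] treats 0 as a label, so it can silently return a different row than .iloc[0]:

```python
import pandas as pd

# Hypothetical series whose integer labels run in reverse order.
rev = pd.Series([10.0, 20.0, 30.0], index=[2, 1, 0])

by_brackets = rev[0]       # integer index: [] treats 0 as a LABEL -> row labelled 0
by_label = rev.loc[0]      # explicitly label-based -> same row as above
by_position = rev.iloc[0]  # explicitly position-based -> first row

print(by_brackets, by_label, by_position)  # 30.0 30.0 10.0
```

No error is raised here, which makes this case harder to spot than a KeyError; spelling out .loc or .iloc states the intent directly.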

To illustrate this with your example:

import pandas as pd

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index=['a', 'b', 'c'])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # 0.0 - falls back to position-based lookup (pandas 2.1+ also emits a FutureWarning, as this fallback is deprecated)
df.B['0'] # KeyError - no label '0' in index
df.B['a'] # 0.0 - found label 'a' in index
df.B.loc[0] # error - no label 0 in a string index (TypeError in older pandas, KeyError in recent versions)
df.B.loc['0'] # KeyError - no label '0' in index
df.B.loc['a'] # 0.0 - found label 'a' in index
df.B.iloc[0] # 0.0 - position-based query for row 0
df.B.iloc['0'] # TypeError - a string can't be used as a position
df.B.iloc['a'] # TypeError - a string can't be used as a position
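Because some of these outcomes vary between pandas versions, a quick way to check them on your own installation is to run every lookup and record either the value or the exception name. This is a sketch under that assumption; `probe`, `cases` and `results` are hypothetical helper names, not pandas API:

```python
import warnings

import pandas as pd


def probe(fn):
    """Call fn() and return its value, or the name of the exception it raises."""
    with warnings.catch_warnings():
        # Recent pandas warns when [] falls back to positions; silence it for this table.
        warnings.simplefilter("ignore", FutureWarning)
        try:
            return fn()
        except (KeyError, TypeError, IndexError) as exc:
            return type(exc).__name__


s = pd.Series([0.0, 1.0, 2.0], index=["a", "b", "c"])

cases = {
    "s[0]": lambda: s[0],
    "s['0']": lambda: s["0"],
    "s['a']": lambda: s["a"],
    "s.loc[0]": lambda: s.loc[0],
    "s.loc['0']": lambda: s.loc["0"],
    "s.loc['a']": lambda: s.loc["a"],
    "s.iloc[0]": lambda: s.iloc[0],
    "s.iloc['0']": lambda: s.iloc["0"],
    "s.iloc['a']": lambda: s.iloc["a"],
}

# Map each expression to its value or exception name, then print a small table.
results = {expr: probe(fn) for expr, fn in cases.items()}
for expr, outcome in results.items():
    print(f"{expr:<12} -> {outcome}")
```

The rows for s[0] and s.loc[0] are the ones most likely to differ from the list above, depending on the pandas version.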

With the example from the referenced question:

d = [0.0, 1.0, 2.0]
e = pd.Series(d, index = [4, 5, 6])
df = pd.DataFrame({'A': 1., 'B': e, 'C': pd.Timestamp('20130102')})

df.B[0] # KeyError - label 0 not in index
df.B['0'] # KeyError - label '0' not in index
df.B.loc[0] # KeyError - label 0 not in index
df.B.loc['0'] # KeyError - label '0' not in index
df.B.iloc[0] # 0.0 - position based query for row 0
df.B.iloc['0'] # TypeError - string can't be used for position
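For the integer-index case, a short sketch of reaching the first row both ways, assuming pandas is installed; `first_by_position`, `first_by_label`, `position_of_4` and `label_zero_missing` are illustrative names. It also uses Index.get_loc to translate a label into its position:

```python
import pandas as pd

e = pd.Series([0.0, 1.0, 2.0], index=[4, 5, 6])
# The scalar columns are broadcast onto the Series' index, so df.index is [4, 5, 6].
df = pd.DataFrame({"A": 1.0, "B": e, "C": pd.Timestamp("20130102")})

# Label 0 does not exist in this index, so a label-based lookup raises KeyError.
try:
    df.B.loc[0]
    label_zero_missing = False
except KeyError:
    label_zero_missing = True

first_by_position = df.B.iloc[0]     # position 0 -> first row
first_by_label = df.B.loc[4]         # label 4 -> the same first row
position_of_4 = df.index.get_loc(4)  # label 4 sits at position 0

print(label_zero_missing, first_by_position, first_by_label, position_of_4)
```

Using .iloc[0] for "first row" keeps working no matter which labels the index happens to hold.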
Justinas Marozas answered Sep 09 '25 14:09