I'm intrigued/confused by this example from the documentation:
Selecting a single column, which yields a Series, equivalent to df.A
In [23]: df['A']
Out[23]:
2013-01-01 0.469112
2013-01-02 1.212112
and
Selecting via [], which slices the rows.
In [24]: df[0:3]
Out[24]:
A B C D
2013-01-01 0.469112 -0.282863 -1.509059 -1.135632
2013-01-02 1.212112 -0.173215 0.119209 -1.044236
2013-01-03 -0.861849 -2.104569 -0.494929 1.071804
It's not clear to me how the first accessor "knows" to pick columns and the second knows to pick rows. It's a little annoying to me since I want to access columns by index too.
The primary usage for [] is accessing columns. However, when you pass a slice, it slices the rows:
With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
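To make that dispatch concrete, here is a minimal sketch. The frame is a hypothetical stand-in for the one in the docs (same shape and labels, random values), so the numbers will differ; the point is only which axis [] picks for each kind of key.

```python
import numpy as np
import pandas as pd

# Hypothetical frame shaped like the docs example (values are random).
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

col = df["A"]            # a single label   -> one column, returned as a Series
cols = df[["A", "C"]]    # a list of labels -> those columns, as a DataFrame
rows = df[0:3]           # a slice          -> the first three rows
mask = df[df["A"] > 0]   # a boolean Series -> the rows where it is True

print(type(col).__name__, list(cols.columns), rows.shape)
```

So [] looks at the type of the key rather than at the axis you had in mind: labels and lists of labels go to the columns, while slices and boolean masks go to the rows.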
To access columns by integer position, use .iloc. For example, df.iloc[:, 2:4] selects the columns at positions 2 and 3 (the end of a positional slice is excluded). This is based purely on position: columns that happen to be named 2 and 3 will not be selected unless they also sit at those positions. To select by label, use .loc. For example, df.loc[:, "B":"D"] selects columns B through D. Unlike the integer slice, a label slice includes its endpoint, so you get column D too. For details: http://pandas.pydata.org/pandas-docs/stable/indexing.html
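A small sketch of the difference, using made-up frames (the names df, nums and the values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical frame with columns A..D.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

by_position = df.iloc[:, 2:4]   # positions 2 and 3; the end (4) is excluded
by_label = df.loc[:, "B":"D"]   # labels B through D; the end (D) is included

print(list(by_position.columns))  # ['C', 'D']
print(list(by_label.columns))     # ['B', 'C', 'D']

# Columns *named* with integers that do not match their positions.
nums = pd.DataFrame([[10, 20, 30, 40]], columns=[3, 2, 1, 0])

print(list(nums.iloc[:, 2:4].columns))     # [1, 0]  (what sits at positions 2 and 3)
print(list(nums.loc[:, [2, 3]].columns))   # [2, 3]  (the columns labelled 2 and 3)
```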
If you want to select a column by position, plain brackets will not do it:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]})
In [260]: df[:,1]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-260-ff65926f441e> in <module>()
----> 1 df[:,1]
You need to use iloc for selecting columns by number:
In [262]: df.iloc[:,1]
Out[262]:
0 3
1 4
2 6
Name: b, dtype: int64
To slice rows and select columns by position at the same time, use iloc:
In [263]: df.iloc[0:2,1]
Out[263]:
0 3
1 4
Name: b, dtype: int64
To slice rows and select columns by name, use loc. Note that loc slices by label and includes the end label, so 0:2 returns three rows here:
In [267]: df.loc[0:2,'a']
Out[267]:
0 1
1 2
2 5
Name: a, dtype: int64
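The two outputs above have different lengths because iloc and loc treat the end of a slice differently. A short sketch on the same frame; the relabeled frame is a hypothetical addition to show that loc really works on labels, not positions:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]})

by_pos = df.iloc[0:2, 0]      # positions 0 and 1; the end position is excluded
by_label = df.loc[0:2, 'a']   # labels 0, 1 and 2; the end label is included

print(by_pos.tolist())    # [1, 2]
print(by_label.tolist())  # [1, 2, 5]

# Same data with string row labels (hypothetical): loc now needs label bounds.
relabeled = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]}, index=['x', 'y', 'z'])
print(relabeled.loc['x':'y', 'a'].tolist())  # [1, 2]
```

With the default integer index the labels happen to equal the positions, which is what makes the two accessors easy to confuse.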
Hope this helps for slicing/selecting with the different conventions.