Does Pandas DataFrame bracket accessor [ ] prefer columns or rows?

I'm intrigued/confused by this example from the documentation:

Selecting a single column, which yields a Series, equivalent to df.A

In [23]: df['A']  
Out[23]:   
2013-01-01    0.469112  
2013-01-02    1.212112

and

Selecting via [], which slices the rows.

In [24]: df[0:3]  
Out[24]:   
                   A         B         C         D  
2013-01-01  0.469112 -0.282863 -1.509059 -1.135632  
2013-01-02  1.212112 -0.173215  0.119209 -1.044236  
2013-01-03 -0.861849 -2.104569 -0.494929  1.071804  

It's not clear to me how the first accessor "knows" to pick columns and the second knows to pick rows. It's a little annoying to me since I want to access columns by index too.

djechlin asked Oct 29 '25 08:10


2 Answers

The primary use of [] is selecting columns. However, when you pass a slice, it slices the rows, as the documentation notes:

With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
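One way to see the rule is to try each kind of key on a small frame. This is a sketch that builds a stand-in for the docs' DataFrame (the values are arbitrary placeholders, not the docs' random numbers) and labels what each key type selects:

```python
import numpy as np
import pandas as pd

# Stand-in for the docs example: a date index and columns A-D.
# Values are placeholders: row i holds 4*i, 4*i+1, 4*i+2, 4*i+3.
dates = pd.date_range("2013-01-01", periods=6)
df = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list("ABCD"))

col = df["A"]                             # single label -> one column (a Series)
cols = df[["A", "C"]]                     # list of labels -> several columns (a DataFrame)
rows = df[0:3]                            # integer slice -> rows, by position
by_label = df["2013-01-02":"2013-01-04"]  # label slice -> rows, end label included
mask = df[df["A"] > 8]                    # boolean Series -> the rows where it is True
```

In other words, [] dispatches on the type of the key: a label or a list of labels goes to the columns, while a slice or a boolean mask goes to the rows.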

In order to access columns by integer position, you need to use .iloc. For example, to access the columns at positions 2 and 3 you would use df.iloc[:, 2:4]. Note that this is based on the position of the columns: you may have columns named 2 and 3, but if they are not at those positions, they will not be selected.

If you want to select by labels, use .loc. For example, to get columns B through D: df.loc[:, "B":"D"]. Unlike integer slicing, this includes the end point, so you get column D too. For details: http://pandas.pydata.org/pandas-docs/stable/indexing.html
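The position-versus-label distinction can be checked directly. This sketch uses two hypothetical frames: one with columns A-E, and one whose integer column names deliberately do not match their positions:

```python
import pandas as pd

# Hypothetical frame with columns A-E.
df = pd.DataFrame([[0, 1, 2, 3, 4]], columns=list("ABCDE"))

pos = df.iloc[:, 2:4]     # positions 2 and 3, end excluded -> C, D
lab = df.loc[:, "B":"D"]  # labels B through D, end included -> B, C, D

# Hypothetical frame whose integer column names are out of positional order.
nums = pd.DataFrame([[10, 20, 30]], columns=[2, 0, 1])
first = nums.iloc[:, 0]      # position 0 -> the column labelled 2
named_zero = nums.loc[:, 0]  # label 0 -> the column at position 1
bracket = nums[0]            # [] with an int is also a label lookup -> label 0
```

Integer column names are still labels, so both .loc and plain [] treat 0 as a name, while only .iloc treats it as a position.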

If you want to select a column by its position (number), you can't use NumPy-style indexing inside single brackets:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]})

In [260]: df[:,1]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-260-ff65926f441e> in <module>()
----> 1 df[:,1]

You need to use iloc for selecting columns by number:

In [262]: df.iloc[:,1]
Out[262]:
0    3
1    4
2    6
Name: b, dtype: int64
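If you would rather stay with single brackets, one workaround (not something this answer proposes) is to look up the label at a given position through df.columns and then select by that label. A minimal sketch, reusing the frame above:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]})

# df.columns[1] is the label at position 1 ('b'); [] then selects by that label.
via_label = df[df.columns[1]]
via_iloc = df.iloc[:, 1]
```

This relies on the label being unique: if several columns share the name, df[label] returns all of them as a DataFrame, whereas iloc always picks exactly one position.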

To slice rows and select columns by position at the same time, use iloc:

In [263]: df.iloc[0:2,1]
Out[263]:
0    3
1    4
Name: b, dtype: int64

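To slice rows and select columns by label, use loc. Note that loc slices by label and includes the end point, so 0:2 returns three rows here, whereas the iloc slice above returned two: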

In [267]: df.loc[0:2,'a']
Out[267]:
0    1
1    2
2    5
Name: a, dtype: int64
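The difference in end points is easier to see once the row labels stop coinciding with positions. A sketch, using the same data plus a hypothetical string index:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]})

# Default RangeIndex: labels 0, 1, 2 equal the positions, yet the slices
# differ because iloc excludes the end point and loc includes it.
n_iloc = len(df.iloc[0:2, 1])   # positions 0 and 1
n_loc = len(df.loc[0:2, 'a'])   # labels 0, 1 and 2

# Same data with a hypothetical string index.
df2 = pd.DataFrame({'a': [1, 2, 5], 'b': [3, 4, 6]}, index=['x', 'y', 'z'])
rows_by_pos = df2.iloc[0:2]        # stops before position 2 -> x, y
rows_by_label = df2.loc['x':'z']   # end label included -> x, y, z
```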

Hope this helps with the different slicing/selecting conventions.

Colonel Beauvel answered Oct 31 '25 22:10


