I'm wondering why it is dtf.loc[x] instead of dtf.loc(x). I've read that loc is a property so it makes sense because it isn't a function call, but now I don't know why someone made it a property (don't know a lot about properties) instead of a function - it would be more intuitive to me.
<rant>Because Pandas is not Python</rant>...
More seriously, Pandas is a nice library which uses the powerful NumPy module (part of the SciPy ecosystem) to process large arrays at C speed. But it comes at the price of some caveats:
loc is just a property of a DataFrame that returns an indexer object. It has to be a separate property because df[x] is already defined to be the x column of the df DataFrame. It could have been a plain function, which would be less disruptive for Python users, but it was also essential to make clear that it is an indexing access.
And (the reason for my initial rant) efficiency and consistency with NumPy are more important in Pandas than consistency with core Python. A good example is equality between two Series. For consistency with NumPy, the result of == is also a Series and not a boolean. But that breaks a number of Python goodies. For instance, it prevents using in to check whether a Python container of Series contains a specific Series:
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([4, 5, 6])
pd.Series([1, 2, 3]) in [a, b]
raises:
Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    pd.Series([1, 2, 3]) in [a, b]
  File "...Python39\site-packages\pandas\core\generic.py", line 1442, in __nonzero__
    raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
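If the goal really is a membership test, one possible workaround is to compare with Series.equals, which returns a single bool instead of an element-wise Series. This is only a sketch: the names target and found are mine, and equals also compares the index and dtype, which may or may not be what you want.

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([4, 5, 6])
target = pd.Series([1, 2, 3])  # hypothetical name for the Series we look for

# Series.equals returns one bool per comparison, so any() can consume it,
# whereas == would return a Series whose truth value is ambiguous.
found = any(target.equals(s) for s in [a, b])
print(found)  # True
```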
What to learn from that: just accept the fact that Pandas syntax is sometimes inconsistent with normal Python usage...