I have the following code from a Jupyter notebook:
housing.plot(kind="scatter", x="longitude", y="latitude",
             s=housing["population"]/100, alpha=0.4, label="population", figsize=(10,7),
             c="median_house_value", cmap=plt.get_cmap("jet"), colorbar=True,
             sharex=False)
I can't seem to find what is meant by the parameters s and c anywhere in the documentation. Can someone please explain?
housing.plot with kind="scatter" is a pandas method that passes most of its parameters on to matplotlib's scatter function. When x, y, or c (and, in newer pandas versions, s) is given as a string that names a column (e.g. "median_house_value"), pandas looks up that column and passes its values to matplotlib.
So c="median_house_value" passes the values of that column as an array to the c= parameter of matplotlib's scatter, which sets the marker colors. When c is a sequence of numbers rather than color names, matplotlib first normalizes the numbers linearly to the range 0 to 1 (by default the smallest value maps to 0 and the largest to 1) and then looks up each normalized value in the colormap, here "jet".
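To make the normalization step concrete, here is a minimal sketch in plain Python of the default linear rescaling. The function name normalize and the sample house values are made up for illustration; matplotlib does this internally and then maps each resulting position to a color in the colormap.

```python
def normalize(values):
    """Linearly rescale values so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no range to spread over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical median house values, not taken from the real dataset.
house_values = [100_000, 250_000, 500_000]

# Positions in the colormap: 0.0 is one end of "jet", 1.0 the other.
positions = normalize(house_values)
print(positions)
```

The cheapest district ends up at one end of the colormap, the most expensive at the other, and everything else in between in proportion.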
The s=housing["population"]/100 passes each value of the "population" column, divided by 100, to matplotlib's s= parameter. This sets the marker sizes, measured in points squared: the size is interpreted as the area of the marker, not its diameter.
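Because s is an area, the visible width of a marker grows only with the square root of s. The following sketch illustrates that relationship with made-up population numbers; marker_width is a hypothetical helper, not a matplotlib function.

```python
import math

def marker_width(s):
    """Linear size of a marker whose size parameter s is an area in points**2."""
    return math.sqrt(s)

# Hypothetical populations, each four times the previous one.
populations = [400, 1600, 6400]

# Same scaling as in the question: population divided by 100.
sizes = [p / 100 for p in populations]

# Quadrupling the area only doubles the width.
widths = [marker_width(s) for s in sizes]
print(sizes, widths)
```

So a district with four times the population gets a marker with four times the area, but only twice the width.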
Note the **kwargs entry in the documentation. It collects any additional keyword arguments that are not listed explicitly, and these are passed on to deeper functions, e.g. to the function that plots lines.
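The forwarding pattern behind **kwargs can be sketched in a few lines of plain Python. Both functions below are hypothetical stand-ins, not the actual pandas or matplotlib code; they only show how a wrapper hands unknown keyword arguments to a deeper function.

```python
def draw_scatter(x, y, s=20, alpha=1.0):
    """Hypothetical stand-in for the low-level plotting function."""
    return {"x": x, "y": y, "s": s, "alpha": alpha}

def plot(x, y, **kwargs):
    """Hypothetical wrapper: extra keyword arguments land in the dict kwargs
    and are forwarded unchanged to the deeper function."""
    return draw_scatter(x, y, **kwargs)

# alpha is not a named parameter of plot, yet it reaches draw_scatter.
result = plot([1, 2], [3, 4], alpha=0.4)
print(result)
```

This is why parameters such as alpha work in housing.plot even when they are not listed in the pandas signature itself.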