I am trying to make a simple scatter plot in pyplot from a Pandas DataFrame. I want an efficient way to plot two variables with the marker symbols dictated by a third column (the key). I have tried various approaches using df.groupby, but without success. A sample script is below. It colours the markers according to 'key1', but I'd like to see a legend showing the 'key1' categories. Am I close? Thanks.
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.DataFrame(np.random.normal(10,1,30).reshape(10,3), index = pd.date_range('2010-01-01', freq = 'M', periods = 10), columns = ('one', 'two', 'three'))
    df['key1'] = (4,4,4,6,6,6,8,8,8,8)

    fig1 = plt.figure(1)
    ax1 = fig1.add_subplot(111)
    ax1.scatter(df['one'], df['two'], marker = 'o', c = df['key1'], alpha = 0.8)
    plt.show()

The primary difference between plt.scatter and plt.plot is that scatter can be used to create plots where the properties of each individual point (size, face color, edge color, etc.) are individually controlled or mapped to data.
You can use scatter for this, but that requires numerical values for key1, and you won't get a legend automatically, as you noticed.
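For reference, here is a minimal sketch of the scatter route. It assumes a newer Matplotlib (3.1 or later), where the object returned by scatter has a legend_elements method, and it uses pd.factorize to turn arbitrary keys into integer codes. The helper name scatter_with_legend and the demo DataFrame are made up for illustration; they are not part of the question's script.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_with_legend(df, x, y, key):
    """Scatter df[x] against df[y], coloured by df[key], with a legend.

    Hypothetical helper, written only to illustrate the idea.
    """
    # factorize maps each distinct key to an integer code 0..n-1,
    # in order of first appearance; uniques[i] is the key for code i
    codes, uniques = pd.factorize(df[key])

    fig, ax = plt.subplots()
    sc = ax.scatter(df[x], df[y], c=codes, marker='o', alpha=0.8)

    # num=None asks for one legend handle per distinct code,
    # returned in ascending code order (so it lines up with uniques)
    handles, _ = sc.legend_elements(num=None)
    ax.legend(handles, [str(u) for u in uniques], title=key)
    return fig, ax


# Demo data shaped like the question's DataFrame (date index omitted)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(10, 1, (10, 3)),
                    columns=['one', 'two', 'three'])
demo['key1'] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = scatter_with_legend(demo, 'one', 'two', 'key1')
```

Calling plt.show() afterwards displays the figure. This works, but it is more machinery than the groupby approach for a handful of discrete categories.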
It's better to just use plot for discrete categories like this. For example:
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    fig, ax = plt.subplots()
    ax.margins(0.05) # Optional, just adds 5% padding to the autoscaling
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend()

    plt.show()
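To connect this back to the question, the same groupby loop can be pointed at the 'one', 'two' and 'key1' columns. This is only a sketch: the DataFrame is rebuilt here with a seeded generator and without the date index, so the values will not match the original script.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the question's DataFrame (date index omitted for brevity)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(10, 1, (10, 3)),
                  columns=['one', 'two', 'three'])
df['key1'] = [4, 4, 4, 6, 6, 6, 8, 8, 8, 8]

fig, ax = plt.subplots()
ax.margins(0.05)

# One plot call per key value; each call gets its own colour
# from the default cycle and its own legend entry
for key, group in df.groupby('key1'):
    ax.plot(group['one'], group['two'], marker='o', linestyle='',
            alpha=0.8, label=str(key))

ax.legend(title='key1', numpoints=1)
```

Because groupby sorts the keys by default, the legend entries come out in ascending key order. Call plt.show() to display the result.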
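If you'd like things to look like the default pandas style, just update the rcParams with the pandas stylesheet and use its color generator (I'm also tweaking the legend slightly). Note that this relies on pd.tools.plotting and Axes.set_color_cycle, which are only available in older pandas and Matplotlib releases: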
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    np.random.seed(1974)

    # Generate Data
    num = 20
    x, y = np.random.random((2, num))
    labels = np.random.choice(['a', 'b', 'c'], num)
    df = pd.DataFrame(dict(x=x, y=y, label=labels))

    groups = df.groupby('label')

    # Plot
    plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)
    colors = pd.tools.plotting._get_standard_colors(len(groups), color_type='random')

    fig, ax = plt.subplots()
    ax.set_color_cycle(colors)
    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
    ax.legend(numpoints=1, loc='upper left')

    plt.show()
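On current pandas and Matplotlib versions, something along these lines should give a similar result. Treat it as a rough equivalent under two assumptions: Matplotlib's built-in 'ggplot' style stands in for the old pandas stylesheet (the two are not identical), and colours sampled from the viridis colormap replace the random ones from the private pandas helper. Axes.set_prop_cycle is the replacement for set_color_cycle.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1974)

# Generate data (same recipe as the examples above)
num = 20
x, y = np.random.random((2, num))
labels = np.random.choice(['a', 'b', 'c'], num)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

groups = df.groupby('label')

# Assumption: 'ggplot' approximates the look of the old pandas stylesheet
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()

    # Assumption: evenly spaced viridis colours instead of random ones
    colors = [tuple(c)
              for c in plt.cm.viridis(np.linspace(0, 1, groups.ngroups))]
    ax.set_prop_cycle(color=colors)

    ax.margins(0.05)
    for name, group in groups:
        ax.plot(group.x, group.y, marker='o', linestyle='', ms=12,
                label=name)
    ax.legend(numpoints=1, loc='upper left')
```

The style is applied through a context manager so it affects only this figure rather than every later plot in the session. Call plt.show() to display it.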