I have a data set from which I want to plot the number of keys per unique id count (x=unique_id_count, y=key_count), and I'm trying to learn how to take advantage of pandas.
In this case:
unique_ids 1 = key count 2
unique_ids 2 = key count 1
    from pandas import *

    key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
    id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")

    df = DataFrame({'keys': key_items, 'ids': id_data})

I've managed to mangle the data into what I want by pulling the data out of the dataframe, restructuring it, and building a new dataframe. In this case it's probably better to do it all in Python without pandas...
    from collections import defaultdict

    # collect the ids seen for each key; select the columns by name so the
    # result does not depend on the dataframe's column order
    unique_values = defaultdict(list)
    for key, v in df[["keys", "ids"]].itertuples(index=False):
        unique_values[key].append(v)

    # count the distinct ids per key
    unique_values_count = {}
    for k, values in unique_values.items():
        unique_values_count[k] = [len(set(values))]

    # reformat for plotting
    key_col = ("a", "b", "c")
    id_col = [unique_values_count[k][0] for k in key_col]

    df2 = DataFrame({"keys": key_col, "unique_id_count": id_col})
    df2.groupby("unique_id_count").size().plot(kind="bar")

Is there a better way to do this more directly using the initial dataframe?
You can use the nunique() function to count the number of unique values in a pandas DataFrame.
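As a minimal sketch of how nunique() could be applied to the data from the question: group by key, count the distinct ids in each group, then count how many keys share each distinct-id count. The variable names `ids_per_key` and `keys_per_id_count` are made up for illustration, and the plotting call is left commented out because it assumes matplotlib is installed.

```python
import pandas as pd

key_items = ("a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "c", "c", "c")
id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
df = pd.DataFrame({"keys": key_items, "ids": id_data})

# Number of distinct ids for each key (one row per key)
ids_per_key = df.groupby("keys")["ids"].nunique()

# Number of keys for each distinct-id count, ordered by that count
keys_per_id_count = ids_per_key.value_counts().sort_index()

print(keys_per_id_count)

# Uncomment to draw the bar chart (assumes matplotlib is available):
# keys_per_id_count.plot(kind="bar")
```

With this data, keys "a" and "c" each have one distinct id and "b" has two, which matches the counts stated in the question (unique_ids 1 = key count 2, unique_ids 2 = key count 1).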
You can also count values with the value_counts() method. Called on a single column (a Series), it counts how often each distinct value occurs; called on a whole DataFrame, it counts how often each distinct row occurs.
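To make the difference between the two methods concrete, here is a small sketch on the ids column from the question. The names `per_value` and `distinct` are illustrative only.

```python
import pandas as pd

id_data = ("X", "X", "X", "X", "X", "X", "X", "Y", "Y", "Y", "X", "X", "X")
ids = pd.Series(id_data, name="ids")

# value_counts(): how many rows carry each distinct id
per_value = ids.value_counts()

# nunique(): how many distinct ids there are in total
distinct = ids.nunique()

print(per_value)
print(distinct)
```

Note that value_counts() on the raw ids column counts rows per id, which is not the same as the number of keys per distinct-id count that the question asks for; that needs a group-by on the keys first.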
How about using value_counts() directly?

    df['ids'].value_counts().plot.bar()
    # number of distinct ids per key
    s = df.groupby("keys").ids.agg(lambda x: len(x.unique()))
    # number of keys per distinct-id count
    s.value_counts().plot(kind="bar")