The following code consistently produces histograms with empty bins, even when the number of samples is large. The empty bins appear at regular intervals but have the same width as the other bins. This is obviously wrong, so why is it happening? It seems like either the rvs method is not random, or the hist binning procedure is broken. Also, try changing the number of bins to 50, and another oddity emerges: every other bin seems to have a spuriously high count.
""" An example of how to plot histograms using matplotlib
This example samples from a Poisson distribution, plots the histogram
and overlays the Gaussian with the same mean and standard deviation
"""
from scipy.stats import poisson
from scipy.stats import norm
from matplotlib import pyplot as plt

EV = 100    # the expected value of the distribution
bins = 100  # number of bins in our histogram
n = 10000
RV = poisson(EV)     # define a Poisson-distributed random variable
samples = RV.rvs(n)  # draw an array of n random variates from that random variable
# make a histogram; density=True replaces the old normed=True keyword
events, edges, patches = plt.hist(samples, bins, density=True, histtype='stepfilled')
print(events)  # when I run this, some bins are empty, even when the number of samples is large
# pyplot.hist returns a tuple of three items:
#   events:  the value of each bin (a normalized density here, because density=True)
#   edges:   the bin edges; edges[i] is the lower edge of bin i, and the final
#            element is the upper edge of the last bin
#   patches: the matplotlib artists used to draw the histogram (not needed here)
# All three must be unpacked, even though only edges is used below.
mean = RV.mean()  # if we didn't know these values already, the mean() and std()
sd = RV.std()     # methods return the mean and standard deviation of any random variable
print("Mean is:", mean, " SD is: ", sd)
Y = norm.pdf(edges, mean, sd)  # SciPy's normal PDF, evaluated at each element of edges
binwidth = edges[1] - edges[0]  # all bins have the same width
print("Binwidth is:", binwidth)
# With density=True the histogram already has unit area, so Y can be plotted
# directly. Rescaling is only needed for raw counts (multiply Y by n * binwidth).
#Q = [edges[i+1] - edges[i] for i in range(len(edges)-1)]
#print(Q)  # this was to confirm that the bins are equally sized, which seems to be the case
plt.plot(edges, Y)
plt.show()
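One way to test the first hypothesis (that rvs is not random) is to look at which values the sampler actually returns. This is a minimal sketch; the fixed `random_state=0` is an addition for reproducibility and is not part of the original code:

```python
import numpy as np
from scipy.stats import poisson

EV = 100
n = 10000
# random_state is added only so that the run is reproducible
samples = poisson(EV).rvs(n, random_state=0)

values = np.unique(samples)
# Every variate is a whole number, because a Poisson RV is discrete
print("All integers:", bool(np.all(samples == np.round(samples))))
# Distinct values actually observed, to compare with the 100 bins requested
print("Distinct values:", values.size, "from", values.min(), "to", values.max())
```

With the question's settings the samples only cover integers within a few standard deviations (SD = 10) of the mean, so there are typically fewer distinct values than bins, which points at the binning rather than at the sampler.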

A histogram is a basic visualization for showing the distribution of values of a single variable. The data are grouped into bins, each covering a range of values, and each bin is drawn as a bar along the X-axis whose height reflects how many values fall in that range. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both edges.
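Matplotlib's hist() computes its counts with NumPy's histogram function, so the edge convention can be checked directly with `np.histogram`. A minimal sketch with made-up data:

```python
import numpy as np

data = [0, 1, 1, 2, 3]
# Three bins: [0, 1), [1, 2) and [2, 3]; only the last bin includes its right edge
counts, edges = np.histogram(data, bins=[0, 1, 2, 3])
print(counts)  # [1 2 2]
```

The value 3 sits exactly on the right edge of the last bin and is still counted, while the value 1 falls in the second bin rather than the first.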
In Matplotlib, the hist() function creates histograms. It takes an array of numbers as its argument and builds the histogram from it.
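The sketch below shows what hist() returns, which also answers the question's uncertainty about `patches`. The random normal data and the non-interactive Agg backend are assumptions chosen only so the example runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=1000)  # hypothetical example data

# hist() returns the bin counts, the bin edges and the artists that draw the bars
counts, edges, patches = plt.hist(data, bins=20)

print(len(counts), len(edges))  # 20 21
print(counts.sum())             # 1000.0
```

There is always one more edge than there are bins, and `patches` holds one bar artist per bin, which can be used to restyle individual bars.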
Bins are the intervals into which you divide the range of your data so that it can be displayed as bars on a histogram. A simple rule of thumb for choosing the number of bins is to take the square root of the total number of values in your distribution.
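The square-root rule fits in a couple of lines. This is a sketch; the helper name `sqrt_rule_bins` is hypothetical, and rounding up with `ceil` is one reasonable choice among several:

```python
import math

def sqrt_rule_bins(num_values: int) -> int:
    """Hypothetical helper: square-root rule, about sqrt(N) bins for N values."""
    return max(1, math.ceil(math.sqrt(num_values)))

print(sqrt_rule_bins(10000))  # 100
```

For the question's n = 10000 this gives exactly the 100 bins that were used, which shows that the rule of thumb does not account for discrete, integer-valued data. NumPy's `np.histogram` also accepts `bins='sqrt'` for the same idea.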
You specify the number of bins using the bins keyword argument of plt.hist().
The empty bins are to be expected: your input data only take integer values (as is the case for a Poisson RV), and you have more bins than there are integers in the range of the data. In that case some bins are too narrow to contain any integer and never capture a sample. With fewer bins (such as 50), each bin is wider than one unit, so some bins span two integers and others only one, which produces the alternating high counts. Change the number of bins and the range so that each bin covers exactly one integer, and the gaps go away.
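The effect can be reproduced without any randomness. In this sketch the data are a hypothetical stand-in for Poisson samples: every integer from 60 to 140 exactly once. The loop counts how many integers land in each bin for 100 and 50 bins, then repeats the count with edges placed on half-integers:

```python
import numpy as np

# Hypothetical stand-in for Poisson samples: every integer from 60 to 140 once
data = np.arange(60, 141)

for bins in (100, 50):
    counts, edges = np.histogram(data, bins=bins)
    width = edges[1] - edges[0]
    print(bins, "bins, width", round(width, 2),
          "-> distinct counts:", sorted(set(counts.tolist())))

# Edges at k - 0.5 give every bin a width of 1 and exactly one integer
edges = np.arange(data.min(), data.max() + 2) - 0.5
counts, _ = np.histogram(data, bins=edges)
print("half-integer edges -> distinct counts:", sorted(set(counts.tolist())))
```

With 100 bins the width is 0.8, so each bin holds either zero or one integer (the regularly spaced empty bins). With 50 bins the width is 1.6, so bins hold one or two integers (the alternating high counts). With half-integer edges every bin holds exactly one integer.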
# density=True replaces the normed keyword, which newer Matplotlib releases no longer accept
plt.hist(samples,
         range=(0, samples.max()),
         bins=samples.max() + 1,
         density=True, histtype='stepfilled')
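A fuller sketch of the same idea, in Python 3, centres one bin on each integer by placing the edges at half-integers, and overlays the Gaussian from the question. The Agg backend and `random_state=0` are assumptions added so that the sketch runs headless and reproducibly:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; use an interactive one to call plt.show()
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm, poisson

EV = 100
n = 10000
RV = poisson(EV)
samples = RV.rvs(n, random_state=0)  # random_state only for reproducibility

# One bin per integer: edges at k - 0.5, so every bin has width 1
edges = np.arange(samples.min(), samples.max() + 2) - 0.5
density, edges, patches = plt.hist(samples, bins=edges, density=True,
                                   histtype="stepfilled")

# With density=True and unit-width bins, each bar height estimates P(X = k),
# so the normal PDF can be overlaid without rescaling
x = np.linspace(edges[0], edges[-1], 400)
plt.plot(x, norm.pdf(x, RV.mean(), RV.std()))
# call plt.show() (interactive backend) or plt.savefig(...) to view the figure
```

Unlike `range=(0, samples.max())`, this does not create a long run of empty bins below the smallest sample, and each bar sits directly over the integer it counts.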
