In Plotly I can create a histogram, e.g. with this example code from the documentation:
import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()
which results in a histogram of the total_bill column.

My question is: how do I get the data values of the histogram? As far as I can tell, this should be equivalent to asking how to access the values of a trace (Google did not help with either).
I could use numpy to redo the histogram:
import numpy as np
np.histogram(df.total_bill)
But this will not always result in the same buckets, and it redoes the sometimes expensive computation that goes into creating a histogram.

In the same Plotly Histogram documentation, there's a section called "Accessing the counts yaxis values". It explains that the y values are calculated by the JavaScript in the browser when the figure renders, so you can't access them in the figure object (for example, through fig.layout or fig.data, which you might try for other types of charts).
They recommend calculating the counts and bins yourself using np.histogram, then passing these values to px.bar to ensure that your histogram matches the buckets as you intend.
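A minimal sketch of that approach, assuming the ten total_bill values quoted further down as sample data and a hand-picked range of 5 to 45 split into four bins. The helper name histogram_as_bar is hypothetical, and plotly is imported inside it so that the numeric part runs on its own:

```python
import numpy as np

# Sample data: the ten total_bill values quoted later in this thread.
values = np.array([15.53, 10.07, 12.6, 32.83, 35.83,
                   29.03, 27.18, 22.67, 17.82, 18.78])

# Fix the binning yourself so it does not depend on what the browser picks.
# Assumption for illustration: 4 equal-width bins covering 5 to 45.
counts, edges = np.histogram(values, bins=4, range=(5, 45))

# Place each bar at the midpoint of its bin.
centers = 0.5 * (edges[:-1] + edges[1:])


def histogram_as_bar(centers, counts):
    """Hypothetical helper: draw precomputed counts as a bar chart."""
    import plotly.express as px  # imported lazily; only needed for plotting

    fig = px.bar(x=centers, y=counts,
                 labels={"x": "total_bill", "y": "count"})
    fig.update_layout(bargap=0)  # no gaps, so the bars look like a histogram
    return fig
```

Calling histogram_as_bar(centers, counts).show() would then draw the chart, and counts and edges stay available as ordinary NumPy arrays.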
My understanding of your question is that you would like to get the exact intervals and counts displayed in the histogram. Take a histogram of a smaller subset of px.data.tips(), namely the last ten rows.

Reading the values off that chart gives:
counts = [2, 4, 3, 1]
bins = [5, 15, 25, 35, 45]
There's no direct way to do this, but it is possible if you're willing to use fig.full_figure_for_development() and a little numpy:
f = fig.full_figure_for_development(warn=False)
xbins = f.data[0].xbins
plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)
print(counts, bins)
which prints:
[2 4 3 1] [ 5 15 25 35 45]
What I'm guessing you would like to be able to do is this:
Run:
fig.data[0].count
And get:
[2, 4, 3, 1]
But the closest you'll get is this:
Run:
fig.data[0].x
And get:
[15.53, 10.07, 12.6 , 32.83, 35.83, 29.03, 27.18, 22.67, 17.82,
18.78]
And those are just the raw values from the input df['total_bill'].tail(10). So DerekO is right that the rest is handled by JavaScript. But fig.full_figure_for_development() will:
[...] return a new go.Figure object, prepopulated with the same values you provided, as well as all the default values computed by Plotly.js, to allow you to learn more about what attributes control every detail of your figure and how you can customize them.
So running f = fig.full_figure_for_development(warn=False), and then:
f.data[0].xbins
Will give you:
histogram.XBins({
'end': 45, 'size': 10, 'start': 5
})
And now you know enough to get the same values in your figure with a little numpy:
import plotly.express as px
import numpy as np
df = px.data.tips()
df = df.tail(10)
fig = px.histogram(df, x="total_bill")
f = fig.full_figure_for_development(warn=False)
xbins = f.data[0].xbins
plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)
print(counts, bins)
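One caveat with the np.arange call above: NumPy's documentation notes that a non-integer step can produce an unexpected number of elements, so a fractional bin size might yield one edge too many. The sketch below is one possible way around that. It assumes that end minus start is a whole multiple of size; edges_from_xbins is a hypothetical helper, and the start, end and size values are hard-coded from the XBins output shown earlier so the snippet runs without plotly:

```python
import numpy as np


def edges_from_xbins(start, end, size):
    """Hypothetical helper: bin edges start, start+size, ..., end.

    Assumes (end - start) is a whole multiple of size.
    """
    n_bins = int(round((end - start) / size))
    # Multiplying an integer index avoids accumulating float error.
    return start + size * np.arange(n_bins + 1)


# Raw values as returned by fig.data[0].x in this answer.
x = [15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78]

# start, end and size copied from the f.data[0].xbins output above.
edges = edges_from_xbins(start=5, end=45, size=10)
counts, edges = np.histogram(x, bins=edges)
print(counts, edges)
```

Keep in mind that np.histogram treats every bin as half-open except the last, which also includes its right edge.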