
Get the values from a histogram or the values from a trace

In Plotly I can create a histogram, e.g. with this example code from the documentation:

import plotly.express as px
df = px.data.tips()
fig = px.histogram(df, x="total_bill")
fig.show()

which results in a histogram of total_bill (image not included).

My question is: how do I get the data values of the histogram? As far as I can tell, this is equivalent to asking how to access the values of a trace. (Google did not help with either.)

I could use numpy to redo the histogram:

import numpy as np
np.histogram(df.total_bill)

But this will not always result in the same buckets, and it redoes the sometimes expensive computation that goes into creating a histogram.
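
To illustrate the point about buckets, here is a minimal sketch using the ten total_bill values quoted in one of the answers below (an assumption made only to keep the example small). Called without a bins argument, np.histogram splits the range from the smallest to the largest value into 10 equal-width bins, so the edges follow the data rather than the round numbers a plot typically shows.

```python
import numpy as np

# Ten total_bill values, copied from the answer that uses df.tail(10).
values = [15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78]

# No bins argument: numpy uses 10 equal-width bins spanning [min, max].
counts, edges = np.histogram(values)

print(len(counts))          # number of bins chosen by numpy
print(edges[0], edges[-1])  # first and last edge are the data min and max
```

Because the edges start at the data minimum and end at the data maximum, they generally will not line up with the bars drawn in the figure.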

ntg asked Oct 20 '25 15:10


2 Answers

In the same Plotly histogram documentation, there's a section called "Accessing the counts yaxis values". It explains that the y values are calculated by JavaScript in the browser when the figure renders, so you can't access them in the figure object (for example, through fig.layout or fig.data, which you might try for other types of charts).

They recommend calculating the counts and bins yourself using np.histogram, then passing these values to px.bar to ensure that your histogram matches the buckets as you intend.
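
A minimal sketch of that approach, not the documentation's exact code: the bin edges (5 to 45 in steps of 10) and the ten total_bill values are assumptions taken from the other answer on this page, and make_bar_figure is a hypothetical helper name. Plotly is imported inside the helper so the counts can be computed with numpy alone.

```python
import numpy as np

# Ten total_bill values, copied from the other answer (df.tail(10)).
total_bill = [15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78]

# Choose the bin edges explicitly so the buckets are known in advance.
edges = np.arange(5, 46, 10)  # assumed edges: 5, 15, 25, 35, 45
counts, edges = np.histogram(total_bill, bins=edges)

# Place each bar at the midpoint of its bin.
centers = 0.5 * (edges[:-1] + edges[1:])


def make_bar_figure(centers, counts):
    """Hypothetical helper: draw the precomputed counts as a bar chart."""
    import plotly.express as px  # imported lazily; only needed for plotting

    fig = px.bar(x=centers, y=counts, labels={"x": "total_bill", "y": "count"})
    fig.update_layout(bargap=0)  # remove gaps so it reads like a histogram
    return fig


print(counts, centers)
```

Calling make_bar_figure(centers, counts).show() then displays bars whose heights are exactly the values in counts, so the numbers you keep and the numbers you plot cannot drift apart.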

Derek O answered Oct 23 '25 23:10


My understanding of your question is that you would like to get the exact intervals and counts displayed in the histogram. For a smaller subset of px.data.tips() (the last 10 rows, df.tail(10)), the histogram has four bars (image not included).

Reading the values off that chart gives:

counts = [2, 4, 3, 1]
bins = [5, 15, 25, 35, 45]

There's no direct way to do this, but it's not impossible, at least if you're willing to use the awesome fig.full_figure_for_development() and a little numpy.

Code highlights (complete snippet at the very end; f is the figure returned by fig.full_figure_for_development()):

xbins = f.data[0].xbins
plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)

Output:

[2 4 3 1] [ 5 15 25 35 45]

All the details:

What I'm guessing you would like to be able to do is this:

Run:

fig.data[0].count

And get:

[2, 4, 3, 1]

But the closest you'll get is this:

Run:

fig.data[0].x

And get:

[15.53, 10.07, 12.6 , 32.83, 35.83, 29.03, 27.18, 22.67, 17.82,
   18.78]

Those are just the raw values from the input, df['total_bill'].tail(10). So Derek O is right that the rest is handled by JavaScript. But fig.full_figure_for_development() will:

[...] return a new go.Figure object, prepopulated with the same values you provided, as well as all the default values computed by Plotly.js, to allow you to learn more about what attributes control every detail of your figure and how you can customize them.

So running f = fig.full_figure_for_development(warn=False) (which needs the kaleido package installed), and then:

f.data[0].xbins

Will give you:

histogram.XBins({
    'end': 45, 'size': 10, 'start': 5
})

And now you know enough to get the same values in your figure with a little numpy:

Complete code:

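import plotly.express as px
import numpy as np

# Use the last 10 rows of the tips dataset as a small example
df = px.data.tips()
df = df.tail(10)
fig = px.histogram(df, x="total_bill")

# Get the fully populated figure, including the bin settings computed by Plotly.js
f = fig.full_figure_for_development(warn=False)

# Rebuild the plot's bin edges from start/end/size, then count with numpy
xbins = f.data[0].xbins
plotbins = list(np.arange(start=xbins['start'], stop=xbins['end']+xbins['size'], step=xbins['size']))
counts, bins = np.histogram(list(f.data[0].x), bins=plotbins)
print(counts, bins)  # [2 4 3 1] [ 5 15 25 35 45]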
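
If you need this more than once, the numpy part can be wrapped in a small helper. This is only a sketch under stated assumptions: edges_from_xbins and histogram_like_plot are hypothetical names, the axis is numeric (not dates), and the edges are built by counting bins instead of calling np.arange with a float step, which can occasionally produce one edge too many.

```python
import math

import numpy as np


def edges_from_xbins(start, end, size):
    """Hypothetical helper: edges start, start+size, ... up to the first edge >= end."""
    # Round before ceil to guard against float noise such as 4.0000000001.
    n_bins = math.ceil(round((end - start) / size, 9))
    return start + size * np.arange(n_bins + 1)


def histogram_like_plot(values, start, end, size):
    """Hypothetical helper: count values using bins rebuilt from start/end/size."""
    edges = edges_from_xbins(start, end, size)
    counts, edges = np.histogram(values, bins=edges)
    return counts, edges


# The ten raw values and the xbins settings shown earlier in this answer.
values = [15.53, 10.07, 12.6, 32.83, 35.83, 29.03, 27.18, 22.67, 17.82, 18.78]
counts, edges = histogram_like_plot(values, start=5, end=45, size=10)
print(counts, edges)
```

With a real figure you would pass f.data[0].x together with xbins['start'], xbins['end'] and xbins['size']. One caveat: np.histogram treats the last bin as closed on the right, so a value sitting exactly on the final edge is counted in the last bin, which may not match how the plot assigns it.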

vestland answered Oct 23 '25 21:10