In the Python package statsmodels, LogitResults.pred_table can be conveniently used to get a "confusion matrix" for an arbitrary threshold t, for a Logit model of the form
mod_fit = sm.Logit.from_formula('Y ~ a + b + c', train).fit()
...
mod_fit.pred_table(t)
#Conceptually: pred_table(t, predicted=mod_fit.predict(train), observed=train.Y)
Is there a way to get the equivalent information for test data? For example, if I compute
pred = mod_fit.predict(test)
how do I get the equivalent of
mod_fit.pred_table(t, predicted=pred, observed=test.Y)
Is there a way to get statsmodels to do this (e.g. a way to construct a LogitResults instance from pred and train.Y), or does it need to be done "by hand", and if so, how?
That's a good idea and easy to add. Can you post a GitHub issue about it? In the meantime, you can do this by hand with the following code:
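import numpy as np
# threshold is the cutoff probability (t in the question)
pred = np.array(mod_fit.predict(test) > threshold, dtype=float)
# Rows are observed classes, columns are predicted classes
table = np.histogram2d(test.Y, pred, bins=2)[0]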
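One caveat with bins=2: the bin edges are inferred from the data, so the table can come out wrong if either the observed or the predicted values contain only one class. Here is a minimal sketch of a variant with explicit bin edges that avoids this. The helper name pred_table_from_probs and the sample arrays are made up for illustration and are not part of statsmodels; it only assumes the observed values are coded 0/1.

```python
import numpy as np

def pred_table_from_probs(observed, probs, threshold=0.5):
    """Hypothetical helper (not a statsmodels function).

    Returns a 2x2 table where entry [i, j] counts the cases
    observed as class i and predicted as class j.
    """
    observed = np.asarray(observed, dtype=float)
    # Predict class 1 when the probability exceeds the threshold
    predicted = (np.asarray(probs, dtype=float) > threshold).astype(float)
    # Fixed edges keep the table 2x2 even if one class never occurs
    edges = np.array([0.0, 0.5, 1.0])
    return np.histogram2d(observed, predicted, bins=edges)[0]

# Made-up data, for illustration only
observed = np.array([0, 0, 1, 1, 1])
probs = np.array([0.2, 0.7, 0.4, 0.8, 0.9])
table = pred_table_from_probs(observed, probs, threshold=0.5)
```

With a fitted model this would be called as pred_table_from_probs(test.Y, mod_fit.predict(test), t).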