 

Getting 'pred_table' information for predicted values of a model in 'statsmodels'

In the Python package statsmodels, LogitResults.pred_table can be conveniently used to get a "confusion matrix", for an arbitrary threshold t, for a Logit model of the form

mod_fit = sm.Logit.from_formula('Y ~ a + b + c', train).fit() 
...
mod_fit.pred_table(t) 
#Conceptually: pred_table(t, predicted=mod_fit.predict(train), observed=train.Y)

Is there a way to get the equivalent information for test data? For example, if I

pred = mod_fit.predict(test)

how do I get the equivalent of

mod_fit.pred_table(t, predicted=pred, observed=test.Y)

Is there a way to get statsmodels to do this (e.g. a way to construct a LogitResults instance from pred and train.Y), or does it need to be done "by hand", and if so, how?

orome asked Sep 02 '25 02:09



1 Answer

That's a good idea and easy to add. Can you post a GitHub issue about it? In the meantime, you can do this by hand with the following code:

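import numpy as np
# threshold is the cutoff t from the question; pred holds 0.0/1.0 class labels
pred = np.array(mod_fit.predict(test) > threshold, dtype=float)
# Explicit bin edges keep the table 2x2 even if only one class occurs; rows are observed, columns predicted
table = np.histogram2d(test.Y, pred, bins=[0, 0.5, 1])[0]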
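The same idea can be wrapped in a small helper that works on any vector of predicted probabilities and observed 0/1 labels, so it applies equally to training and test data. This is only a sketch: the function name pred_table_from and the sample arrays are made up for illustration and are not part of statsmodels. It passes explicit bin edges to np.histogram2d so that the result stays a 2x2 table even when every prediction falls into a single class.

```python
import numpy as np

def pred_table_from(predicted_prob, observed, threshold=0.5):
    """Return a 2x2 table: rows = observed class (0, 1), columns = predicted class (0, 1).

    Hypothetical helper, not a statsmodels function.
    """
    # Turn probabilities into 0.0/1.0 class labels using the cutoff
    predicted = (np.asarray(predicted_prob, dtype=float) > threshold).astype(float)
    observed = np.asarray(observed, dtype=float)
    # Edges [0, 0.5, 1] put label 0 in the first bin and label 1 in the second
    bins = np.array([0, 0.5, 1])
    return np.histogram2d(observed, predicted, bins=bins)[0]

# Made-up data standing in for mod_fit.predict(test) and test.Y
probs = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
y = np.array([0, 0, 1, 1, 1, 0])
table = pred_table_from(probs, y, threshold=0.5)
print(table)
```

With a fitted model this would be called as pred_table_from(mod_fit.predict(test), test.Y, t). Entry [i, j] counts the cases whose observed class is i and predicted class is j.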
jseabold answered Sep 05 '25 09:09
