 

Fitting Survival Models in Python From Aggregate Data

Tags:

python

r

I am trying to fit some basic survival models in Python. The crux of the problem is that I have a large number of observations, so my data is binned: every row is not a single subject but a count of subjects with common attributes (i.e. the same covariates, the same time observed, and the same observed outcome). It seems that the lifelines package requires one observation per row to fit a Cox model. Similarly, trying to fit a Weibull curve with scipy.stats.rv_continuous.fit runs into the same problem.

I have done this in R without issue, as its survival regression packages all seem to accept a weights vector that allows for this type of data. It seems extremely inefficient to break my aggregated data up into millions of rows when there are only tens of thousands of unique rows in the original. Any pointers would be appreciated.
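To make the shape of the data concrete, here is a minimal sketch of the aggregated layout described above, with hypothetical column names (`time`, `outcome`, `f1`, `f2`, `num_in_group` match the identifiers used later in this post), along with the brute-force expansion to one row per subject that the question is trying to avoid:

```python
import pandas as pd

# Hypothetical aggregated data: each row represents a group of
# identical subjects (same covariates, same observed time, same outcome).
df = pd.DataFrame({
    "time":         [5, 5, 8, 12],
    "outcome":      [1, 0, 1, 1],    # 1 = event observed, 0 = censored
    "f1":           [0, 0, 1, 1],
    "f2":           [1.2, 1.2, 0.7, 3.1],
    "num_in_group": [1500, 40, 220, 7],
})

# The workaround an unweighted fitter forces on you: repeat each row
# num_in_group times so every subject gets its own row. This is the
# memory-hungry step the aggregation was meant to avoid.
expanded = df.loc[df.index.repeat(df["num_in_group"])].reset_index(drop=True)
print(len(expanded))  # 1767 rows instead of 4
```

With realistic group counts this expansion blows a table of tens of thousands of unique rows up into millions, which is why a `weights` argument is the preferable interface.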

Asked Dec 12 '25 by John Sears


1 Answer

I ended up using the rpy2 package in Python to actually just call R.

from rpy2.robjects import r, Formula, pandas2ri
from rpy2.robjects.packages import importr

# Load R's survival package and enable pandas <-> data.frame conversion
importr('survival')
pandas2ri.activate()

# Fit a weighted Cox model on the aggregated DataFrame
coxph_ = r('coxph')
model = coxph_(Formula("Surv(time, outcome) ~ f1 + f2"),
               data=df, weights=df.num_in_group)

base = importr('base')
print(base.summary(model))

Not great, but it gets the job done for now. I was surprised at how good rpy2 is: it has very nice pandas <-> data.frame interoperability, and the documentation is also quite good (rpy2.readthedocs.io).

Answered Dec 14 '25 by John Sears