I am trying to fit some basic survival models in Python. The crux of the problem is that I have a large number of observations, so my data is binned: each row is not a single subject but a count of subjects sharing common attributes (i.e. the same covariates, the same observed time, and the same observed outcome). It seems that the lifelines package requires one observation per row to fit a Cox model. Similarly, trying to fit a Weibull curve with scipy.stats.rv_continuous.fit runs into the same problem.
I have done this in R without issue, as its survival regression packages all seem to accept a weights vector that handles this type of data. It seems extremely inefficient to expand my aggregated data into millions of rows when there are only tens of thousands of unique rows in the original. Any pointers would be appreciated.
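For concreteness, the binned data looks something like this (the column names, including num_in_group, are just illustrative), along with the row-expansion workaround I am trying to avoid:

import pandas as pd

# Binned data: one row per unique (covariates, time, outcome) combination.
df = pd.DataFrame({
    'f1':           [0, 0, 1, 1],
    'f2':           [1.2, 3.4, 1.2, 5.6],
    'time':         [5, 5, 8, 12],
    'outcome':      [1, 0, 1, 1],        # 1 = event observed, 0 = censored
    'num_in_group': [310, 4500, 12, 95], # count of subjects in this bin
})

# The brute-force workaround: repeat each row num_in_group times so that
# every subject gets its own row. This blows tens of thousands of unique
# rows back up into millions of observation rows.
expanded = df.loc[df.index.repeat(df.num_in_group)].reset_index(drop=True)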
I ended up using the rpy2 package to call R from Python:
from rpy2.robjects import Formula, pandas2ri, r
from rpy2.robjects.packages import importr

survival = importr('survival')  # load R's survival package
pandas2ri.activate()            # automatic pandas <-> data.frame conversion

# coxph accepts a weights vector, so the binned counts can be passed directly.
coxph_ = r('coxph')
model = coxph_(Formula("Surv(time, outcome) ~ f1 + f2"),
               data=df, weights=df.num_in_group)

base = importr('base')
print(base.summary(model))
Not great, but it gets the job done for now. I was surprised at how good rpy2 is: it has very nice pandas <-> data.frame interoperability, and the documentation is also quite good (rpy2.readthedocs.io).
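If you need the fitted coefficients back on the Python side rather than just the printed summary, something like the following sketch should work (rx2 is rpy2's accessor for named elements of the underlying R list; the hazard-ratio conversion is my own addition, not part of the model output):

import numpy as np

# coxph returns an R list; pull out the named coefficient vector.
coefs = model.rx2('coefficients')
coef_dict = dict(zip(coefs.names, list(coefs)))

print(coef_dict)                                     # log hazard ratios
print({k: np.exp(v) for k, v in coef_dict.items()})  # hazard ratios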