I am trying to fit some basic survival models in Python. The crux of the problem is that I have a large number of observations, so my data is binned: each row is not a single subject but a count of subjects sharing common attributes (i.e. the same covariates, the same observed time, and the same observed outcome). It seems that the lifelines package requires one observation per row to fit a Cox model. Similarly, trying to fit a Weibull curve with scipy.stats.rv_continuous.fit runs into the same problem.
I have done this in R without issue, as its survival regression packages all seem to accept a weights vector that handles this type of data. It seems extremely inefficient to expand my aggregated data into millions of rows when there are only tens of thousands of unique rows in the original. Any pointers would be appreciated.
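For concreteness, the binned data looks something like this (the column names, including num_in_group, are just illustrative), along with the row-expansion workaround I am trying to avoid:

import pandas as pd

# Binned data: one row per unique (covariates, time, outcome) combination.
df = pd.DataFrame({
    'f1':           [0, 0, 1, 1],
    'f2':           [1.2, 3.4, 1.2, 5.6],
    'time':         [5, 5, 8, 12],
    'outcome':      [1, 0, 1, 1],        # 1 = event observed, 0 = censored
    'num_in_group': [310, 4500, 12, 95], # count of subjects in this bin
})

# The brute-force workaround: repeat each row num_in_group times so that
# every subject gets its own row. This blows tens of thousands of unique
# rows back up into millions of observation rows.
expanded = df.loc[df.index.repeat(df.num_in_group)].reset_index(drop=True)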
I ended up using the rpy2 package to call R from Python:
from rpy2.robjects import Formula, pandas2ri, r
from rpy2.robjects.packages import importr

survival = importr('survival')  # load R's survival package
pandas2ri.activate()            # automatic pandas <-> data.frame conversion

# coxph accepts a weights vector, so the binned counts can be passed directly.
coxph_ = r('coxph')
model = coxph_(Formula("Surv(time, outcome) ~ f1 + f2"),
               data=df, weights=df.num_in_group)

base = importr('base')
print(base.summary(model))
Not great, but it gets the job done for now. I was surprised at how good rpy2 is: it has very nice pandas <-> data.frame interoperability, and the documentation is also quite good (rpy2.readthedocs.io).
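If you need the fitted coefficients back on the Python side rather than just the printed summary, something like the following sketch should work (rx2 is rpy2's accessor for named elements of the underlying R list; the hazard-ratio conversion is my own addition, not part of the model output):

import numpy as np

# coxph returns an R list; pull out the named coefficient vector.
coefs = model.rx2('coefficients')
coef_dict = dict(zip(coefs.names, list(coefs)))

print(coef_dict)                                     # log hazard ratios
print({k: np.exp(v) for k, v in coef_dict.items()})  # hazard ratios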