Right now, fitting a model requires the user to know quite a bit about how survivalstan is built and about the details of the various models it supports.
For example, there are several sets of features which can be combined more or less independently across various models.
These features are:
- choice of baseline hazard (equivalently, of the baseline survival function):
  - parametric: weibull, exp, gamma, etc. (n.b. data typically in "wide" form)
  - semi-parametric: randomwalk, gamma prior, etc. (n.b. data typically in "long" form)
- estimate varying-coefficient (yes or no)
  - right now, takes a single column name to group by
- estimate time-varying effects (yes or no)
  - if yes, then all coefficients are treated as time-varying
To use these, the user has to know (1) which model (among those in survivalstan.models) implements the features they want, assuming such a model exists, and (2) what data format and which inputs the selected model requires. Both are unreasonable expectations of the user (per discussion with @julia326).
Ideally, we should enable the user to provide:
- their data frame
- a patsy formula
  - currently indicates whether to group by a variable, which signals YES to the varying-coefficient model
- the desired baseline_hazard (with some reasonable default)
- (for now) whether to estimate time-varying effects
  - could also eventually be implemented using patsy timevary(age) + ... syntax
  - (the above is also not yet supported in model code)
In theory, the fit_model function should then prep the data, select the appropriate Stan file, and fit the model. This would be a much cleaner process for fitting a model.
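As a rough illustration of the "select the appropriate Stan file" step, here is a minimal sketch of a lookup from requested features to a model and its required data format. Everything in it is an assumption made for illustration: the registry contents, the model names, the `select_model` helper, and the choice of `weibull` as the default are hypothetical and do not reflect the actual contents of survivalstan.models. The exception name follows the `FeatureNotImplemented` suggestion made in this issue.

```python
from typing import NamedTuple


class FeatureNotImplemented(NotImplementedError):
    """Raised when no known model supports the requested feature combination."""


class ModelSpec(NamedTuple):
    name: str         # hypothetical model identifier, not a real survivalstan name
    data_format: str  # "wide" or "long"


# Hypothetical registry keyed by (baseline_hazard, varying_coef, time_varying).
# The entries are placeholders chosen only to exercise the lookup.
MODEL_REGISTRY = {
    ("weibull", False, False): ModelSpec("weibull_basic", "wide"),
    ("weibull", True, False): ModelSpec("weibull_varying_coef", "wide"),
    ("randomwalk", False, False): ModelSpec("randomwalk_basic", "long"),
    ("randomwalk", False, True): ModelSpec("randomwalk_time_varying", "long"),
}


def select_model(baseline_hazard="weibull", varying_coef=False, time_varying=False):
    """Return the ModelSpec for a feature combination, or raise if unsupported."""
    key = (baseline_hazard, bool(varying_coef), bool(time_varying))
    try:
        return MODEL_REGISTRY[key]
    except KeyError:
        raise FeatureNotImplemented(
            f"no model for baseline_hazard={baseline_hazard!r}, "
            f"varying_coef={varying_coef}, time_varying={time_varying}"
        ) from None
```

With a table like this, the user never names a model directly; the required data format also falls out of the lookup, which the data-prep step could then use.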
However, some details in the implementation need to be worked out:
- Some features (e.g. time-varying effect estimation) are much cleaner to implement in the "long" data form than in the "wide" form.
  - we could (for example) rewrite all models to use the "long" format (see data-format issue, below)
  - or, we could throw a FeatureNotImplemented error if the user gives us an invalid combination of inputs
  - ultimately we will likely want to support varying-coef & time-varying effects (as well as other features) in all models
- A second problem is determining whether the user has provided data in "long" or "wide" format. If they provide wide data while the model requires long, we will want to convert to long using prep_data_long_surv. If they provide long data and the model requires wide, we throw an error (unless all models are coded to take long data).
  - It is useful to allow the user to provide "long" data since they may have time-varying covariate values. The auto-convert utility doesn't accommodate these.
  - However, most of the time the user will likely provide wide data.
  - One option would be to have a longSurv(..) patsy function which, if given, would signal that the data are in long format. Otherwise, we assume they are wide.
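The format-handling rules above could look roughly like the following sketch. It is not the implementation of prep_data_long_surv: `declared_format`, `wide_to_long`, `prepare`, the column names, and the `surv(time, event) ~ age` formula shape are all hypothetical, and detecting `longSurv(` with a regular expression is an assumption standing in for a real patsy function. Rows are plain dicts to keep the example dependency-free.

```python
import re

# Assumption: long-format data is signalled by a longSurv(...) call on the
# left-hand side of the formula; anything else is treated as wide.
LONG_MARKER = re.compile(r"\blongSurv\s*\(")


def declared_format(formula):
    """Return "long" if the formula's left-hand side uses longSurv(...), else "wide"."""
    lhs = formula.split("~", 1)[0]
    return "long" if LONG_MARKER.search(lhs) else "wide"


def wide_to_long(rows, time_col="time", event_col="event"):
    """Expand one row per subject into one row per observed failure time at risk."""
    failure_times = sorted({r[time_col] for r in rows if r[event_col]})
    long_rows = []
    for subject_id, r in enumerate(rows):
        for t in failure_times:
            if t > r[time_col]:
                break  # subject is no longer at risk after their own end time
            long_row = dict(r)  # covariates are copied unchanged (no time-varying values)
            long_row["subject_id"] = subject_id
            long_row["end_time"] = t
            long_row["end_failure"] = bool(r[event_col]) and t == r[time_col]
            long_rows.append(long_row)
    return long_rows


def prepare(rows, formula, required_format):
    """Return rows in the format the model requires, converting wide to long if needed."""
    given = declared_format(formula)
    if given == required_format:
        return rows
    if given == "wide" and required_format == "long":
        return wide_to_long(rows)
    raise ValueError("long-format data given, but the model requires wide-format data")
```

Because the conversion copies each subject's covariates onto every expanded row, it cannot represent covariates that change over time, which is why accepting user-supplied long data remains useful.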