memory usage tipping point #23

Open
Geoff-Holmes opened this issue Oct 17, 2018 · 2 comments

Comments

@Geoff-Holmes

Hi Gianluca

I have run into a problem when trying to build a model and have recreated it below using the bc data augmented with some dummy variables.

Both model 7 and model 9 below train very quickly with little memory overhead (as seen in Windows Task Manager).
Model 8, however, which includes all the terms from 7 and 9, doesn't complete because it eats up my 4GB of spare working memory.

# load packages: bc ships with flexsurv, fit.models() comes from survHE
library(flexsurv)
library(survHE)

# get some data
data(bc)
N <- nrow(bc)

# create some dummy categorical variables
bc$x1 <- round(2 * runif(N))
bc$x2 <- round(3 * runif(N))

# create some 'continuous' variables
bc$x3 <- round(10 * runif(N))
bc$x4 <- round(33 * runif(N))

# create indicator variables for the levels of categorical variables to allow interactions
bc$x1.1 <- as.factor(bc$x1 == 1)
bc$x1.2 <- as.factor(bc$x1 == 2)
bc$x2.1 <- as.factor(bc$x2 == 1)
bc$x2.2 <- as.factor(bc$x2 == 2)
bc$x2.3 <- as.factor(bc$x2 == 3)

# create interactions
# note: as.numeric() on a factor returns the level codes (1 = FALSE, 2 = TRUE), not 0/1
bc$x1.x2 <- as.factor(bc$x1 * bc$x2)
bc$x1.1.x3 <- as.numeric(bc$x1.1) * bc$x3
bc$x1.2.x3 <- as.numeric(bc$x1.2) * bc$x3

# modelling with interaction of categorical variables
form7 <- with(bc, Surv(rectime, censrec) ~ group + x1.1 + x1.2 + x2.1 + x2.2 + x2.3 + x1.x2 + x3 + x4)
m7 <- fit.models(form7, data = bc, distr = "rps", k = 1)
print(m7)

# modelling with interactions of categorical variable with categorical and continuous variable
form8 <- with(bc, Surv(rectime, censrec) ~ group + x1.1 + x1.2 + x2.1 + x2.2 + x2.3 + x1.x2 + x3 + x4 + x1.1.x3 + x1.2.x3)
m8 <- fit.models(form8, data = bc, distr = "rps", k = 1)
print(m8)

# modelling with interaction of categorical variable with continuous variable
form9 <- with(bc, Surv(rectime, censrec) ~ group + x1.1 + x1.2 + x3 + x1.1.x3 + x1.2.x3)
m9 <- fit.models(form9, data = bc, distr = "rps", k = 1)
print(m9)
@giabaio
Owner

giabaio commented Oct 18, 2018

Hi Geoff,
I don't think it's necessarily surprising... I personally find interactions between variables with multiple values rather complex to interpret and analyse anyway. So first of all, I wonder whether you could/should consider constructing specific interactions (e.g. re-group the variables as low/high and then have interaction terms for both at the low level, both at the high level and the two cross-terms)?

Did you try to see whether the problem is specific to RPS, or can you reproduce it for other distributions? And does it change much to use RPS with k=1, i.e. does a single knot improve the fit massively in comparison to the Weibull (which would be the reference distribution at k=0)?

Finally, you're kind of exploding the terms here: there are very many categories in your interactions! It may be an issue with memory, but equally there may not be an awful lot of data to estimate this many parameters...
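
The low/high regrouping suggested above could look roughly like the following. This is only a sketch on simulated stand-in data (not the bc data); the median cut-point, the data frame `d` and all the variable names are assumptions made for illustration. It also compares the number of design-matrix columns against a full factor-by-factor interaction, to show how quickly the terms multiply.

```r
# Simulated stand-in data (hypothetical; mirrors x2 and x3 from the example above)
set.seed(1)
n <- 500
d <- data.frame(
  x2 = round(3 * runif(n)),   # categorical, values 0..3
  x3 = round(10 * runif(n))   # 'continuous', values 0..10
)

# Regroup each variable as low/high; splitting at the median is an assumption
d$x2.grp <- factor(ifelse(d$x2 > median(d$x2), "high", "low"), levels = c("low", "high"))
d$x3.grp <- factor(ifelse(d$x3 > median(d$x3), "high", "low"), levels = c("low", "high"))

# One 4-level factor: both low, both high, and the two cross-terms
d$x2.x3.grp <- interaction(d$x2.grp, d$x3.grp)

# Design-matrix width: full interaction of all levels vs. the regrouped factor
X.full <- model.matrix(~ factor(x2) * factor(x3), data = d)
X.grp  <- model.matrix(~ x2.x3.grp, data = d)
print(c(full = ncol(X.full), regrouped = ncol(X.grp)))
```

The regrouped factor could then be used as a single covariate in the model formula in place of the separate indicator and product terms.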

@Geoff-Holmes
Author

Hi Gianluca
I tried a few other distributions (Weibull, genF) and also rps with k=0, and in all these cases it worked fine. The model also works fine with flexsurvspline with any number of knots (up to 5 anyway).
With MLE estimation, is the model passed to flexsurv in any case? If so, it should presumably work okay.

I found with survHE I had to split the interacting categorical covariates down into indicators to get it to work in the simpler cases.
In the data I'm really interested in I have N=16,000, and I have found that the one internal knot makes a significant difference (to the AIC).
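
For reference, the comparison between the Weibull and a one-knot spline model can be run directly in flexsurv, outside survHE. This is a minimal sketch, assuming flexsurv is installed; it uses only the `group` covariate for brevity, and the object names are illustrative rather than the models from this thread.

```r
library(flexsurv)  # attaches survival, which provides Surv()

data(bc)

# Weibull reference model
m.weib <- flexsurvreg(Surv(rectime, censrec) ~ group, data = bc, dist = "weibull")

# Royston-Parmar spline on the hazard scale with one internal knot
m.rps1 <- flexsurvspline(Surv(rectime, censrec) ~ group, data = bc, k = 1, scale = "hazard")

# Lower AIC indicates a better trade-off between fit and complexity
aic <- c(weibull = m.weib$AIC, rps.k1 = m.rps1$AIC)
print(aic)
```

If the same formula fits here but exhausts memory through fit.models(), that would help narrow the problem down to the survHE side.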
