specifying starting values in geeglm #5

Open
DamienGeorges opened this issue Oct 29, 2021 · 5 comments

@DamienGeorges

I want to use the start argument of geeglm to help a model converge. With the default starting values the model seems to run forever, and I suspect that specifying a starting point for the parameters would keep geeglm from getting stuck.

Specifying starting values leads to the following error:

Error in model.frame.default(formula = mf, data = dietox, start = gee1$coefficients,  : 
  variable lengths differ (found for '(start)')

Here is a small reproducible example in which I use the coefficients from a fitted model (gee1) as the starting point for a second one (gee1.start):

library(geepack)
data(dietox)
dietox$Cu     <- as.factor(dietox$Cu)
mf <- formula(Weight ~ Cu * (Time + I(Time^2) + I(Time^3)))
gee1 <- geeglm(mf, data=dietox, id=Pig, family=poisson("identity"), corstr="ar1")
gee1.start <- geeglm(mf, data=dietox, id=Pig, family=poisson("identity"), corstr="ar1", start = gee1$coefficients)

The second call returns the following error:

Error in model.frame.default(formula = mf, data = dietox, start = gee1$coefficients,  : 
  variable lengths differ (found for '(start)')

The geeglm help page indicates that the start argument behaves the same as in a regular glm, but applying the exact same technique with glm works without any issue, as in the example below:

## Dobson (1990) Page 93: Randomized Controlled Trial :
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
data.frame(treatment, outcome, counts) # showing data
glm1 <- glm(counts ~ outcome + treatment, family = poisson())
glm1.start <- glm(counts ~ outcome + treatment, family = poisson(), start = glm1$coefficients)

Does anyone know how to use the start argument correctly in geeglm?

@gordy2x

gordy2x commented Mar 9, 2022

I have the same problem as DamienGeorges.

@troelsgk

Also experiencing this.

@jeffeaton

Hello @DamienGeorges, @gordy2x, @troelsgk,

We just stumbled on this same issue. I have made a pull request for a patch for @hojsgaard's consideration: #18

I see the issue was opened some time ago, but if it is still relevant, you can test the fix by installing the patched version:

devtools::install_github("jeffeaton/geepack@patch-geeglm-start-arg")

Thanks,
Jeff

@DamienGeorges
Author

Thanks @jeffeaton, too late for the project I was running at that time but I am sure I will have the opportunity to try your patch in the future.
Thanks a lot,
Damien

@ad1729

ad1729 commented Dec 17, 2023

Edit: just saw the pull request by Jeff which fixes this issue. However, the pull request ignores the value of start if specified in favor of the glm coefficient estimates, which may not be desirable.


Running into the same issue. After running a local copy of the code line-by-line in debug mode, I see that this issue occurs specifically at line 173 when model.frame() is called:

geepack/R/geeglm.R

Lines 163 to 173 in 4a5ee4c

mf <- call
mf[[1]] <- as.name("model.frame")
mftmp <- mf
to_delete <- c("family", "corstr", "control", "zcor", "std.err", "scale.fix")
mftmp[match(to_delete, names(mftmp))] <- NULL
## mftmp$family <- mftmp$corstr <- mftmp$control <- mftmp$zcor <- mftmp$std.err <- NULL
## mftmp$scale.fix <- NULL
mf <- eval(mftmp, parent.frame())

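This is because start, when specified, is not removed from the call and is therefore passed on to model.frame(...). model.frame() treats any extra named argument as an additional per-observation variable (hence the name '(start)' in the error message), so a vector with one element per coefficient rather than one per row of the data fails the length check.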
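As a minimal sketch of that failure in base R, independent of geepack (the data frame df and its columns are invented for illustration): model.frame() turns extra named arguments into columns with parenthesised names, so a coefficient-length vector should fail the length check while a row-length vector is accepted.

```r
# Toy data, invented for illustration only.
df <- data.frame(y = c(1, 2, 3, 4, 5, 6), x = c(2, 1, 4, 3, 6, 5))

# One value per coefficient (2) instead of one per row (6):
# model.frame() rejects it; we capture the error message as text.
msg <- tryCatch(
  model.frame(y ~ x, data = df, start = c(0, 0)),
  error = function(e) conditionMessage(e)
)
print(msg)

# One value per row is accepted and stored as an extra column "(start)".
ok <- model.frame(y ~ x, data = df, start = rep(0, nrow(df)))
print(names(ok))
```

This is only meant to show where the '(start)' label in the error comes from; it says nothing about how geeglm() uses start afterwards.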

Viewing call at line 163 (which is just the geeglm() call):

geeglm(formula = resp ~ age + smoke + age:smoke, family = binomial, 
    data = ohio, start = rep(0, 4), id = id, corstr = "exch", 
    scale.fix = TRUE)

Viewing mf at line 165:

model.frame(formula = resp ~ age + smoke + age:smoke, family = binomial, 
    data = ohio, start = rep(0, 4), id = id, corstr = "exch", 
    scale.fix = TRUE)

model.frame() doesn't accept or need most of these arguments, so they're set to NULL at line 168, after which the mftmp object looks like:

model.frame(formula = resp ~ age + smoke + age:smoke, data = ohio, 
    start = rep(0, 4), id = id)

which, when run, gives the following error:

Browse[2]> eval(mftmp, parent.frame())
Error in model.frame.default(formula = resp ~ age + smoke + age:smoke,  : 
  variable lengths differ (found for '(start)')

The name "start" should be added to the to_delete vector at line 167 so that it is set to NULL as well and no longer gets passed to the model.frame() call.

When line 167 is changed to

Browse[2]> to_delete <- c("family", "corstr", "control", "zcor", "std.err", "scale.fix", "start")
Browse[2]> mftmp[match(to_delete, names(mftmp))] <- NULL
Browse[2]> mftmp
model.frame(formula = resp ~ age + smoke + age:smoke, data = ohio, 
    id = id)

then it runs without errors:

Browse[2]> str(eval(mftmp, parent.frame()))
'data.frame':	2148 obs. of  4 variables:
 $ resp : int  0 0 0 0 0 0 0 0 0 0 ...
 $ age  : int  -2 -1 0 1 -2 -1 0 1 -2 -1 ...
 $ smoke: int  0 0 0 0 0 0 0 0 0 0 ...
 $ (id) : int  0 0 0 0 1 1 1 1 2 2 ...
 - attr(*, "terms")=Classes 'terms', 'formula'  language resp ~ age + smoke + age:smoke
  .. ..- attr(*, "variables")= language list(resp, age, smoke)
  .. ..- attr(*, "factors")= int [1:3, 1:3] 0 1 0 0 0 1 0 1 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3] "resp" "age" "smoke"
  .. .. .. ..$ : chr [1:3] "age" "smoke" "age:smoke"
  .. ..- attr(*, "term.labels")= chr [1:3] "age" "smoke" "age:smoke"
  .. ..- attr(*, "order")= int [1:3] 1 1 2
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(resp, age, smoke)
  .. ..- attr(*, "dataClasses")= Named chr [1:4] "numeric" "numeric" "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:4] "resp" "age" "smoke" "(id)"
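An alternative to extending the to_delete list is to keep only the arguments model.frame() should see, which is the idiom lm() and glm() use internally. The sketch below is not geepack code: frame_from_call() is a hypothetical stand-in for the relevant part of geeglm(), and the toy data are invented.

```r
# Hypothetical stand-in for the model-frame step of a geeglm-like function.
# 'start' and 'corstr' are accepted but must never reach model.frame().
frame_from_call <- function(formula, data, id, start = NULL,
                            corstr = "independence") {
  mf <- match.call(expand.dots = FALSE)
  # Whitelist: positions of the arguments model.frame() understands
  # (0 for any that were not supplied, which drops out of the index).
  keep <- match(c("formula", "data", "id"), names(mf), 0L)
  mf <- mf[c(1L, keep)]
  mf[[1L]] <- quote(stats::model.frame)
  eval(mf, parent.frame())
}

# Toy data, invented for illustration only.
toy <- data.frame(y = c(1, 2, 3, 4, 5, 6),
                  x = c(2, 1, 4, 3, 6, 5),
                  grp = c(1, 1, 2, 2, 3, 3))

res <- frame_from_call(y ~ x, data = toy, id = grp,
                       start = c(0, 0), corstr = "ar1")
print(names(res))
```

With a whitelist, any glm-style argument added later is excluded by default, whereas a to_delete list has to be updated for each one. Whether that trade-off suits geepack is for the maintainers to judge.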

I haven't tested it for other parameters such as etastart, mustart, etc. but I expect them to fail in the same way as these are not parameters used by model.frame().

However, the glm call a few lines above for getting the initial coefficients runs fine, and uses values for start if passed to geeglm() call:

geepack/R/geeglm.R

Lines 146 to 151 in 4a5ee4c

glmcall <- call
glmcall$id <- glmcall$jack <- glmcall$control <- glmcall$corstr <-
glmcall$waves <- glmcall$zcor <- glmcall$std.err <-
glmcall$scale.fix <- glmcall$scale.value <- NULL
glmcall[[1]] <- as.name("glm")
glmFit <- eval(glmcall, parent.frame())
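The same call-rewriting pattern can be tried outside geepack to see that start survives it and reaches glm(). In this sketch initial_glm_fit() is a hypothetical wrapper that mimics only the lines quoted above, and the data are the Dobson example from earlier in the thread plus an invented grp column.

```r
# Hypothetical wrapper mimicking the glmcall construction quoted above:
# drop the GEE-only arguments, then evaluate the remainder as a glm() call.
initial_glm_fit <- function(formula, family = gaussian, data, id,
                            corstr = "independence", start = NULL) {
  glmcall <- match.call()
  glmcall$id <- glmcall$corstr <- NULL   # arguments glm() does not accept
  glmcall[[1]] <- as.name("glm")
  eval(glmcall, parent.frame())          # 'start' is still in the call
}

# Dobson (1990) data from the example above; 'grp' is invented here.
dobson <- data.frame(counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12),
                     outcome = gl(3, 1, 9),
                     treatment = gl(3, 3),
                     grp = rep(1:3, each = 3))

fit0 <- initial_glm_fit(counts ~ outcome + treatment, family = poisson(),
                        data = dobson, id = grp)
fit1 <- initial_glm_fit(counts ~ outcome + treatment, family = poisson(),
                        data = dobson, id = grp, start = coef(fit0))
print(all.equal(coef(fit0), coef(fit1), tolerance = 1e-6))
```

Because start is not among the arguments set to NULL, glm() receives it unchanged, which is consistent with the observation that only the later model.frame() step fails.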
