implement svychisq for multiply-imputed data #236

Open
ajdamico opened this issue May 15, 2017 · 16 comments
@ajdamico
Owner

https://stats.stackexchange.com/questions/78479/how-to-run-chi-squared-test-on-imputed-data

I am not totally sure how to do it (one commenter says there's a mistake). mitools:::MIcombine.default does not properly deal with a list of htest objects (the this_result object below).

If you think it's something you can implement, we can temporarily put it in lodown (which has lots of other dataset-specific helper functions, like pnad_postStratify) and ask dr. lumley whether he would prefer for it to go in library(mitools). I currently have two others implemented at: https://github.com/ajdamico/lodown/blob/master/R/survey_functions.R

thanks

library(mitools)
library(survey)

imp1 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )
imp2 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )

imp.design <- svydesign( ~1 , data = imputationList( list( imp1 , imp2 ) ) )

# svytotal works fine
MIcombine( with( imp.design , svytotal( ~col1 ) ) )

this_result <- with( imp.design , svychisq( ~col1+col2 ) )

# svychisq fails inside of mitools:::MIcombine.default
MIcombine( this_result )

# object types
class( this_result )
lapply( this_result , class )
ajdamico added this to the v0.1.0 milestone May 15, 2017
@ajdamico
Owner Author

@guilhermejacob could you:

(1) add your code here with maybe two easily-runnable test cases
(2) describe what you've done to djalma and see if he agrees with your math
(3) integrate your code into https://github.com/ajdamico/lodown/blob/master/R/survey_functions.R
(4) decide whether to propose it to lumley as an addition to library(mitools)

@guilhermejacob
Contributor

Ok, I was able to replicate this example (I don't think it's great, but it works).
There are a lot of warnings about this method.

And I don't know how to integrate it so that it will run on all Wald statistics.
Anyways, I think it's kind of good.

So, answering your points in the previous comment:

(1) If you want to give it a try, run this:

# our test function: pool m chi-square statistics following Li et al. (1991)
test_fun <- function (results, variances, call = sys.call(), df.complete = Inf, ...) {
  m <- length(results)
  if ( m != 3 ) { warning( "This test was designed for m = 3. Use it cautiously.") }
  oldcall <- attr(results, "call")

  # numerator degrees of freedom, taken from the first imputation's test
  ndf <- results[[1]]$parameter[[1]]

  # extract the chi-square statistics and their square roots
  if (missing(variances)) {
    results <- lapply(results, function(x) x$statistic )
    variances <- sqrt( unlist(results) )
  }

  # average of the statistics across the m imputations
  cbar <- results[[1]]
  for (i in 2:m) {
    cbar <- cbar + results[[i]]
  }
  cbar <- cbar/m

  # between-imputation variability: sample variance of the square roots
  evar <- var(variances)
  r <- (1 + 1/m) * evar

  # D_2 statistic (eq. 2.2) and its denominator degrees of freedom
  D_2 <- ( cbar/ndf - r * ( m + 1 ) / ( m - 1) ) / ( 1 + r )
  v3 <- (ndf^(-3/m)) * (m-1) * ( 1 + 1/r)^2

  # p-value from the F(ndf, v3) reference distribution
  pval <- stats::pf( D_2 , df1 = ndf , df2 = v3 , lower.tail = FALSE )[[1]]

  rval <- list(statistic = D_2 , ndf = ndf , ddf = v3 , p.value = pval )
  warning( "The real p-value could be half or twice the one presented below." )
  rval
}


library(mitools)
library(survey)

imp1 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )
imp2 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )

imp.design <- svydesign( ~1 , data = imputationList( list( imp1 , imp2 ) ) )
# imp.design <- as.svrepdesign( imp.design , replicates = 100 )

# svytotal works fine
MIcombine( with( imp.design , svytotal( ~col1 ) ) )

# svymean also works fine
MIcombine( with( imp.design , svymean( ~col1 ) ) )

this_result <- with( imp.design , svychisq( ~col1+col2 , statistic = "Chisq" ) )

# svychisq fails inside of mitools:::MIcombine.default
MIcombine( this_result )

# pool the Rao-Scott chi-square statistics with the function above
results <- with( imp.design , svychisq( ~col1+col2 , statistic = "Chisq" ) )

test_fun( results = results )

(2) It takes the test statistics, calculates the variance of the statistics' square roots across the m imputations, then calculates the statistic in Li et al. (1991). From that we get the p-value using the degrees of freedom from the chi-square test and the formula's denominator degrees of freedom. (A sketch of the combining rule follows this list.)

(3) We have to think about that. The only real improvement this function offers is minimal: it gets the dfs from the chi-square test. If you just supply it with the test statistics, you could use another package like miceadds to do it.

(4) Discussion ahead.
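
For reference, here is a sketch of the combining rule that test_fun follows, as I read Li et al. (1991); the LaTeX notation below is mine, so please check it against the article:

% m chi-square statistics d_1, ..., d_m, each with k numerator degrees of freedom
\bar{d} = \frac{1}{m} \sum_{i=1}^{m} d_i , \qquad
r = \left( 1 + \frac{1}{m} \right) \widehat{\operatorname{Var}}\left( \sqrt{d_1}, \ldots, \sqrt{d_m} \right)

% D_2 is referred to an F distribution with k and \nu degrees of freedom
D_2 = \frac{ \bar{d}/k - \frac{m+1}{m-1} \, r }{ 1 + r } \sim F_{k, \nu} , \qquad
\nu = k^{-3/m} (m - 1) \left( 1 + \frac{1}{r} \right)^{2}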

@guilhermejacob
Contributor

The test was meant for cases when m = 3. So, we have the following warnings:

The pooled Chi^2-test can be used when k is large, if U and B cannot be retrieved, or if only Chi^2-statistics are available. Compared to the other three methods, however, the results from the Chi^2-test are considerably less reliable. The results were optimized for m = 3 and, unlike the other tests, do not necessarily improve for larger m. According to Li et al. (1991a) the true result could be within a range of one half to twice the obtained p-value. This test should only be used as a rough guide. - Van Buuren (2012, p.159).

and

Li, Meng, et al. (1991) used Monte Carlo simulations to study the performance of D2 statistic under a variety of conditions. Their results suggest that type I error rates can either be too high or too low, depending on the fraction of missing information (e.g., when the fraction of missing information was less than 20%, type I errors dropped below the nominal 0.05 level). Their simulations also indicate that D2 has lower power than D1. Considered as a whole, these simulation results suggest that D2 does not yield accurate inferences, and the authors recommend using the procedure “primarily as a screening test statistic” (p. 83). - Enders (2010, p.240).

The original article can be found here: Li et al. (1991).

@DjalmaPessoa
Collaborator

The step below is not clear to me:

variances <- sqrt( unlist(results))

Could you please cite the formula in Li et al. (1991) that you are using to combine the results?

I believe a better way of testing the independence in a two-way table could be to fit a loglinear model using the function svyloglin and testing whether the interaction effect is null. It might be possible to use MIcombine for that.

@guilhermejacob
Contributor

I think it's equation 2.2 on page 74. I'm taking the square roots of every imputed result.
The part inside the square brackets in 2.2 is the sample variance of the square roots of each result.
Makes sense, right?

@DjalmaPessoa
Collaborator

DjalmaPessoa commented May 29, 2017 via email

@DjalmaPessoa
Collaborator

Where is the distribution of the test statistic in 2.1 defined? What are the degrees of freedom in the numerator and in the denominator of the F distribution?

@guilhermejacob
Contributor

Yes, I'm using the correct formula. It's also the same in the books above.
The numerator degrees of freedom k are those from the chi-square test (as on p. 67 of Li et al., 1991). The denominator degrees of freedom are defined in equations 2.16 and 2.17 of Li et al. (1991).

@DjalmaPessoa
Collaborator

You mentioned that:

The test was meant for cases when m = 3?

In your example k = (c-1) x (r-1) = 16; does the value of k have no influence on its applicability?

@guilhermejacob
Contributor

It doesn't seem to be a problem, judging from the books I read. Section 2.4 in the article might give some additional information about this.

@DjalmaPessoa
Collaborator

DjalmaPessoa commented May 30, 2017 via email

@guilhermejacob
Contributor

Sometimes, it's a matter of faith haha
It should work with Wald too. But X^2 uses the Rao-Scott adjustment, if that's what you're asking.

@DjalmaPessoa
Collaborator

I just said I'd rather use Wald's test!

@guilhermejacob
Contributor

guilhermejacob commented Jun 6, 2017

@DjalmaPessoa , you are absolutely right.
After rereading your comments, I decided to start from scratch.
Instead of writing it myself, I'll just borrow the miceadds::micombine.chisquare function, passing it just two arguments: the pooled results from svychisq and the chi-square degrees of freedom.

(1) add your code here with maybe two easily-runnable test cases

MIsvychisq <- function( formula , design , statistic = "Chisq" , ... ) {
  
  if ( !( statistic %in% c( "Chisq" ) ) ) { stop( "This method is only implemented for `statistic = 'Chisq'`." ) }
  
  # run svychisq on each imputed design
  m <- with( design , svychisq( formula , statistic = statistic ) )
  
  # extract the chi-square statistic and degrees of freedom from each test
  dk <- as.numeric( sapply( m , FUN = function( x ) x[["statistic"]] ) )
  df <- as.numeric( sapply( m , FUN = function( x ) x[["parameter"]][ "df" ] ) )
  
  # pool them with the Li et al. (1991) rule implemented in miceadds
  return( miceadds::micombine.chisquare( dk = dk , df = df[[1]] , display = TRUE , version = 1 ) )
  
}

library(mitools)
library(survey)

imp1 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )
imp2 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )
imp3 <- data.frame( col1 = sample( 1:5 , 100 , replace = TRUE ) , col2 = sample( 1:5 , 100 , replace = TRUE ) )

imp.design <- svydesign( ~1 , data = imputationList( list( imp1 , imp2 , imp3 ) ) )

MIsvychisq( formula = ~col1+col2 , imp.design )

It defaults to statistic = 'Chisq', as I'm not sure how it works with F-statistics.
(2) describe what you've done to djalma and see if he agrees with your math
Well, there's no math on my part here.
(3) integrate your code into https://github.com/ajdamico/lodown/blob/master/R/survey_functions.R
Will do.
(4) decide whether to propose it to lumley as an addition to library(mitools)
This function is too small to be a meaningful addition to survey.

@guilhermejacob
Contributor

For pooled F-statistics, this should help: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4029775/pdf/nihms560910.pdf

@DjalmaPessoa
Collaborator

Example using Lumley's suggestion:

library(mitools)
data.dir<-system.file("dta",package="mitools")
files.men<-list.files(data.dir,pattern="m.\\.dta$",full=TRUE)
men<-imputationList(lapply(files.men, foreign::read.dta))
files.women<-list.files(data.dir,pattern="f.\\.dta$",full=TRUE)
women<-imputationList(lapply(files.women, foreign::read.dta))
men<-update(men, sex=1)
women<-update(women,sex=0)
all<-rbind(men,women)
library(survey)
designs<-svydesign(id=~id, strata=~sex, data=all)
results_loglinear <- with(designs, svyloglin(~sex*alcdos))
MIcombine(results_loglinear)

But if we want to test whether the interaction effect between sex and alcdos is null, we need to get the p-values for the combined object.
This would correspond to the test using svychisq with the Rao-Scott correction.
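
One rough way to get such a p-value from the combined object, just as a sketch: it assumes the interaction coefficients can be picked out by a ":" in their names, and it ignores the usual MIcombine degrees-of-freedom adjustment, so the result is only approximate.

# sketch: Wald test of the pooled sex:alcdos interaction terms
combined <- MIcombine( results_loglinear )
b <- coef( combined )
V <- vcov( combined )
idx <- grepl( ":" , names( b ) )   # assumption: interaction terms contain ":"
W <- t( b[ idx ] ) %*% solve( V[ idx , idx ] ) %*% b[ idx ]
# one degree of freedom per interaction coefficient
pchisq( as.numeric( W ) , df = sum( idx ) , lower.tail = FALSE )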
