MICE (Multiple Imputation by Chained Equations) implementation #3

Closed
rofinn opened this issue Apr 11, 2017 · 18 comments

@rofinn
Member

rofinn commented Apr 11, 2017

No description provided.

@rofinn rofinn closed this as completed Apr 11, 2017
@rofinn rofinn reopened this Apr 11, 2017
@ohadle

ohadle commented Aug 7, 2017

+1, this would be awesome.
Running the R version on my data currently takes a week.

@skanskan

skanskan commented Dec 5, 2018

Hello.

I was about to open the same issue but I've seen you already created it :)
Does it mean Impute.jl will add this option soon?
Julia has many packages but it seems that none is able to do multiple imputation.

How can I do multiple imputation when I have several variables, or when I want to use more complex methods? For example: fully conditional specification (chained equations, MICE), Bayesian methods, random forests, multilevel imputation, nested imputation, censored data, categorical data, survival data…

I've also created a thread on the Julia Discourse, but nobody has replied.
https://discourse.julialang.org/t/how-to-do-multiple-imputation-on-julia/17713

I would like to move from R to Julia.

@rofinn
Member Author

rofinn commented Dec 9, 2018

> Does it mean Impute.jl will add this option soon?

I'm intending to switch back to this project in the next couple of weeks, so hopefully we should have some basic multiple imputation methods available relatively soon. We need to update this package for our 1.0 migration anyway. Unfortunately, I'll be starting out with some simpler methods like iterative PPCA, as MultivariateStats.jl is a pretty minimal dependency relative to GLM.jl.

> How can I do multiple imputation when I have several variables, or when I want to use more complex methods? For example: fully conditional specification (chained equations, MICE), Bayesian methods, random forests, multilevel imputation, nested imputation, censored data, categorical data, survival data…

Right now my recommendation would be to use existing libraries in R, Python or C (e.g., RCall.jl, PyCall.jl). In fact, I was considering starting off by wrapping the R mice package rather than writing it from scratch.
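For readers who want to try the RCall.jl route, a minimal sketch could look like the following. It assumes a working R installation with the `mice` package available, and the toy `df` and its column names are invented purely for illustration; this is not an Impute.jl API.

```julia
using DataFrames
using RCall   # assumption: R and the R package `mice` are installed

# Hypothetical toy data with missing values in every column.
df = DataFrame(
    age  = [25.0, missing, 40.0, 31.0, missing, 52.0, 47.0, 36.0],
    bmi  = [22.1, 27.3, missing, 24.8, 30.2, 26.0, missing, 23.5],
    chol = [180.0, 210.0, 195.0, missing, 240.0, 205.0, 220.0, 190.0],
)

# Copy the DataFrame into the R session; `missing` becomes `NA`.
@rput df

# Run chained-equations imputation in R and extract the first completed dataset.
R"""
library(mice)
imp <- mice(df, m = 5, method = "pmm", seed = 1, printFlag = FALSE)
completed <- complete(imp, 1)
"""

# Copy the completed data.frame back into Julia as a DataFrame.
@rget completed
```

Here `complete(imp, 1)` returns only the first of the `m = 5` imputed datasets; a real analysis would fit the model on every imputed dataset and pool the results.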

> I've also created a thread on the Julia Discourse, but nobody has replied.

Yeah, I don't know of anyone else focusing on imputation methods right now, and I don't generally follow Discourse.

@paulstey
Contributor

I would be interested in helping to work on this. It's been a while since I looked at the implementations in R; I remember the mice and mi packages. Is there a consensus as to which implementations in R (or Python) are considered canonical?

@rofinn
Member Author

rofinn commented Aug 24, 2019

I believe the R mice package is well regarded, but I don't think we can look at the implementation due to licensing restrictions (GPL vs MIT). It'd probably be better to work on our own implementation from first principles anyways.

NOTE: Some discussions we've been having internally may set the groundwork for this. Specifically, we're considering moving the current Context behaviour into an iterator interface and moving the current single-pass algorithms into an Impute.Iterators module. We'd then start supporting a multi-pass approach, which would require an Impute.Dataset type that encapsulates (1) the original data, (2) a mask of the missing data and (3) sets of imputed values. This should add some flexibility and also support an API for working with multiply imputed datasets.
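To make the three-part container concrete, here is a rough sketch of what such a type might look like. The names `ImputedDataset`, `add_imputation!` and `completed` are hypothetical and chosen for illustration only; nothing here reflects an actual or planned Impute.jl design.

```julia
# Hypothetical container: all names below are illustrative, not Impute.jl API.
struct ImputedDataset{T}
    original::Matrix{Union{Missing,T}}  # (1) the data as observed
    mask::BitMatrix                     # (2) true where a value is missing
    imputations::Vector{Vector{T}}      # (3) one set of fill-in values per imputation
end

# Build an empty container from a matrix that may contain `missing`.
function ImputedDataset(data::AbstractMatrix)
    T = nonmissingtype(eltype(data))
    return ImputedDataset{T}(Matrix{Union{Missing,T}}(data), ismissing.(data), Vector{T}[])
end

# Store one set of imputed values, ordered column-major over the missing cells.
function add_imputation!(ds::ImputedDataset{T}, values::AbstractVector) where {T}
    length(values) == count(ds.mask) ||
        throw(ArgumentError("expected one value per missing cell"))
    push!(ds.imputations, convert(Vector{T}, values))
    return ds
end

# Materialize the i-th completed dataset without touching the original.
function completed(ds::ImputedDataset{T}, i::Integer) where {T}
    out = Matrix{T}(undef, size(ds.original))
    out[.!ds.mask] = collect(skipmissing(ds.original))  # observed values
    out[ds.mask] = ds.imputations[i]                    # imputed values
    return out
end

data = [1.0 missing; missing 4.0; 5.0 6.0]
ds = ImputedDataset(data)
add_imputation!(ds, [2.0, 3.0])   # fills cells (2,1) and (1,2), in that order
add_imputation!(ds, [2.5, 3.5])
first_completed = completed(ds, 1)
```

Keeping the imputed values apart from the original data means any number of imputations can share one copy of the observed values, which is the property a multiply imputed dataset needs.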

@skanskan

skanskan commented Dec 5, 2019

Any news?

@Hasnep

Hasnep commented Oct 25, 2020

Has anyone made a start on this? If they have I would like to offer some help, and if not I can try making an initial version.

@paulstey
Contributor

@Hasnep: Thank you so much! This completely fell off my radar, but I'm happy to help if you and/or others want to start an initial version!

@rofinn
Member Author

rofinn commented Oct 25, 2020

I don't know of anyone who has worked on it yet. Having a Julia MICE implementation has become a pretty low priority for Invenia (vs improving the flexibility of the API), so I doubt I'll have time to work on it. As I mentioned above, I think the path of least resistance would be to create a separate MICE.jl package which uses RCall to wrap the R package for now. If that package wants to extend the Impute.jl API that's great, but I'm not sure it should be a requirement for a first pass. We can always choose to merge the two packages later if the internal MICE.jl implementation is ever replaced with a pure Julia version.

@skanskan

I've been reading some benchmarks and it seems that the fastest R package for imputation is Hmisc.
https://www.sciencedirect.com/science/article/abs/pii/S0950705118303381

@azev77

azev77 commented Feb 4, 2021

This Julia package by @bethandtownes wraps MICE:
https://github.com/bethandtownes/MatrixCompletion.jl/blob/master/test/simulation_chain_eq.jl

@skanskan

skanskan commented Feb 4, 2021

> This Julia package by @bethandtownes wraps MICE:
> https://github.com/bethandtownes/MatrixCompletion.jl/blob/master/test/simulation_chain_eq.jl

I didn't know about it.
I'll try it, thanks.

@bethandtownes

@skanskan would you like a more fully featured MICE wrapper? I can try to make that happen.

@skanskan

skanskan commented Feb 4, 2021

@bethandtownes The truth is that I don't need to use imputation libraries anymore.

I needed them a year ago because I was writing my thesis, but I have already finished. I am a physicist and I have done a PhD in biostatistics, analyzing large medical databases.

I would have liked to use Julia, but I found it lacks some important libraries: multiple imputation and meta-analysis. So I did the thesis in R, even though it is slow and doesn't manage memory well.

I don't know if I will ever need MICE again myself, but I think it would be a very useful addition to attract new users to Julia.

@azev77

azev77 commented Feb 5, 2021

> @skanskan would you like a more fully featured MICE wrapper? I can try to make that happen.

I would love that.

@bethandtownes

> > @skanskan would you like a more fully featured MICE wrapper? I can try to make that happen.
>
> I would love that.

OK, I will extend the MICE implementation.

@tom-metherell

Resurrecting this discussion to say:

I made a start on a MICE implementation in Julia here: https://github.com/tom-metherell/Mice.jl. It's heavily based on the R package, but is written entirely in Julia. I haven't tested it extensively yet, but maybe you'd like to 😊

@rofinn
Member Author

rofinn commented Nov 13, 2023

Nice! I don't think there's much to add here, as the MICE workflow is sufficiently different (e.g., only works with tables, returns an Mids type) from anything we were doing in Impute.jl. Feel free to open an issue / MR if there's a particular subset of the workflow you think Impute.jl should support. Otherwise, I'm gonna close this issue and folks can continue any ongoing discussion on the Mice.jl repo.

@rofinn rofinn closed this as completed Nov 13, 2023