Impute missing values prior to DE analysis #81

Open
cmeesters opened this issue Sep 2, 2024 · 0 comments

@cmeesters
Collaborator

Is your feature request related to a problem? Please describe.
With some data types (e.g. long-read sequencing data), certain positions may show missing values (e.g. a count of 0) in samples with overall low read counts. Batch effect correction and normalization then become harder in proportion to the fraction of missing data.
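To illustrate the problem, here is a minimal Python sketch. It assumes a samples × positions count matrix held in plain lists and a hypothetical depth threshold (min_total). Zeros are treated as missing only in samples whose total read count falls below that threshold, and the resulting fraction of missing cells is reported. The thresholding rule and the helper names (mask_dropouts, missing_fraction) are assumptions for illustration, not part of the project.

```python
from typing import Optional

# Rows are samples, columns are positions; None marks a missing value.
Matrix = list[list[Optional[float]]]


def mask_dropouts(counts: list[list[float]], min_total: float) -> Matrix:
    """Hypothetical helper: in samples whose total read count is below
    min_total, treat zero counts as missing (None). Zeros in well-covered
    samples are kept as genuine zeros."""
    masked: Matrix = []
    for row in counts:
        low_depth = sum(row) < min_total
        masked.append([None if (low_depth and v == 0) else float(v) for v in row])
    return masked


def missing_fraction(matrix: Matrix) -> float:
    """Fraction of cells that are missing (None)."""
    cells = [v for row in matrix for v in row]
    return sum(v is None for v in cells) / len(cells)


counts = [
    [10, 0, 5, 7],  # total 22: zero kept
    [1, 0, 0, 2],   # total 3: zeros become missing
    [8, 3, 0, 9],   # total 20: zero kept
]
masked = mask_dropouts(counts, min_total=10)
print(masked[1])                 # [1.0, None, None, 2.0]
print(missing_fraction(masked))  # 2 of 12 cells are missing
```

Whether a zero is a genuine zero or a dropout cannot be decided from the value alone; the depth threshold is just one possible heuristic.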

Describe the solution you'd like
We could implement imputation for such missing values, e.g. with the miceforest package, which follows the MICE approach (Multiple Imputation by Chained Equations). It supports gradient boosting as well as mean matching schemes.
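To make the chained-equations idea concrete, here is a minimal, self-contained Python sketch. It is not miceforest's implementation or API: it swaps in ordinary linear regression (solved via normal equations with a tiny ridge term) for miceforest's gradient-boosted models, and it produces a single deterministic imputation instead of multiple imputed datasets. Missing cells are first filled with column means; then, in each iteration, every column with missing values is regressed on all other columns and its missing cells are replaced by the predictions. All function names are hypothetical.

```python
from typing import Optional

# Rows are samples, columns are features; None marks a missing value.
Matrix = list[list[Optional[float]]]


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def _fit_linear(x_rows: list[list[float]], y: list[float], ridge: float = 1e-9) -> list[float]:
    """Least-squares coefficients; the tiny ridge term keeps X^T X invertible."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[k] for r in x_rows) + (ridge if i == k else 0.0)
            for k in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def chained_equations_impute(data: Matrix, iterations: int = 10) -> list[list[float]]:
    """Single, deterministic chained-equations imputation with linear models."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]
    # Step 0: initialise every missing cell with its column's observed mean.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    # Chained equations: regress each incomplete column on all other columns.
    for _ in range(iterations):
        for j in range(n_cols):
            obs_rows = [i for i in range(n_rows) if not missing[i][j]]
            mis_rows = [i for i in range(n_rows) if missing[i][j]]
            if not mis_rows:
                continue

            def features(i: int) -> list[float]:
                # Intercept plus the current values of all other columns.
                return [1.0] + [filled[i][k] for k in range(n_cols) if k != j]

            coef = _fit_linear([features(i) for i in obs_rows],
                               [filled[i][j] for i in obs_rows])
            for i in mis_rows:
                filled[i][j] = sum(c * x for c, x in zip(coef, features(i)))
    return filled


# Second column is exactly twice the first; one cell is missing in each column.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [None, 10.0]]
imputed = chained_equations_impute(data, iterations=50)
print(imputed[3], imputed[4])  # close to [4.0, 8.0] and [5.0, 10.0]
```

Proper MICE repeats this with random draws to obtain several imputed datasets, so that the uncertainty introduced by imputation can be carried into the downstream analysis.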

Additional context
The imputation schemes are limited compared to the MICE implementation in R (the mice package). However, while a Bayesian approach might be favourable, linear gradient mean matching might be equivalent in terms of statistical power and introduced errors: the overall spread of the data is narrow, and the data are not normally distributed (not even approximately). Hence, the features in miceforest should meet our needs.
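Mean matching is what makes this less dependent on normality: instead of inserting a model prediction directly, each missing cell receives an actually observed value from a "donor" whose prediction is close to the missing cell's prediction. Imputed values therefore stay within the observed range (e.g. no negative counts) and follow the empirical distribution. The Python sketch below shows generic predictive mean matching, assuming predictions for observed and missing cells are already available (for instance from the regression step of a MICE iteration). miceforest's own mean-matching schemes are configurable and may differ in detail; the function name is hypothetical.

```python
import random
from typing import Optional, Sequence


def predictive_mean_match(
    y_obs: Sequence[float],
    pred_obs: Sequence[float],
    pred_mis: Sequence[float],
    k: int = 3,
    rng: Optional[random.Random] = None,
) -> list[float]:
    """Hypothetical helper: impute each missing cell with the observed value of
    a donor drawn at random from the k observed cells whose predictions are
    closest to the missing cell's prediction."""
    rng = rng or random.Random()
    imputed = []
    for p in pred_mis:
        # Indices of the k observed cells with the closest predictions.
        donors = sorted(range(len(y_obs)), key=lambda i: abs(pred_obs[i] - p))[:k]
        imputed.append(y_obs[rng.choice(donors)])
    return imputed


y_obs = [0.0, 0.0, 3.0, 10.0]    # observed values (skewed, many zeros)
pred_obs = [0.5, 1.0, 2.5, 9.0]  # model predictions for the observed cells
pred_mis = [2.0, 12.0, -1.0]     # model predictions for the missing cells

# k=1 makes the choice deterministic: always the single nearest donor.
print(predictive_mean_match(y_obs, pred_obs, pred_mis, k=1))  # [3.0, 10.0, 0.0]
```

Note that the prediction of -1.0 would be an impossible negative count, yet the imputed value is the observed 0.0; with k > 1 the random donor choice adds the variability that multiple imputation relies on.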

This needs to be evaluated once #78 is implemented and the CI from #76 is complete.
