Impute missing values prior to DE analysis #81

cmeesters · 2024-09-02T13:27:24Z

Is your feature request related to a problem? Please describe.
With some data (e.g. long-read sequencing data), positions can show missing values (e.g. a count of 0) in samples with overall low read counts. Batch effect correction and normalization then become harder in proportion to the fraction of missing data.
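To make the problem concrete, here is a minimal sketch of how zeros in low-depth samples could be flagged as missing and how the missing fraction could be measured. The function names, the `min_total` threshold and the toy counts are assumptions for illustration only; nothing here is taken from the existing code base.

```python
def mask_zeros_in_low_depth_samples(counts, min_total):
    """Return a copy of `counts` where zeros in low-depth samples become None.

    counts:    dict mapping sample name -> list of read counts per position
    min_total: hypothetical depth threshold; samples whose summed counts fall
               below it are treated as low-depth
    """
    masked = {}
    for sample, values in counts.items():
        low_depth = sum(values) < min_total
        # Only zeros in low-depth samples are considered "missing";
        # zeros in well-covered samples are kept as real observations.
        masked[sample] = [
            None if (low_depth and value == 0) else value for value in values
        ]
    return masked


def missing_fraction(masked):
    """Fraction of all entries that are flagged as missing (None)."""
    total = sum(len(values) for values in masked.values())
    missing = sum(value is None for values in masked.values() for value in values)
    return missing / total if total else 0.0


# Toy data (made up): sample_b has very low overall coverage.
counts = {
    "sample_a": [120, 0, 340, 95],
    "sample_b": [3, 0, 0, 5],
}
masked = mask_zeros_in_low_depth_samples(counts, min_total=100)
print(masked)
print(missing_fraction(masked))
```

Whether a zero should count as missing at all, and which depth threshold is sensible, would need to be decided for the actual data.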

Describe the solution you'd like
We can try to implement imputation for such missing values, e.g. with the help of the miceforest package, which follows the MICE approach (Multiple Imputation by Chained Equations). The package supports gradient boosting as well as mean matching schemes.
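For readers unfamiliar with mean matching, the idea can be sketched in plain Python as follows. This is not miceforest's implementation: it uses a single predictor and an ordinary least-squares line instead of gradient boosting, it does not iterate over several columns as MICE does, and all names (`fit_line`, `pmm_impute`, `candidates`) are hypothetical. It only shows the core step: predict the missing value, find observed rows with similar predictions, and borrow one of their real values.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def pmm_impute(xs, ys, candidates=3, seed=0):
    """Fill None entries in `ys` by predictive mean matching on `xs`.

    For each missing y, the `candidates` observed rows whose predicted value
    is closest to the prediction for the missing row are collected, and the
    observed y of one randomly chosen donor is used as the imputed value.
    """
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed], [y for _, y in observed])

    def predict(x):
        return slope * x + intercept

    imputed = []
    for x, y in zip(xs, ys):
        if y is not None:
            imputed.append(y)  # observed values are never changed
            continue
        target = predict(x)
        donors = sorted(observed, key=lambda row: abs(predict(row[0]) - target))
        imputed.append(rng.choice(donors[:candidates])[1])
    return imputed


# Toy data (made up): two missing values in ys.
xs = [1, 2.2, 3, 4, 5.4, 6]
ys = [10, None, 31, 39, None, 62]
print(pmm_impute(xs, ys, candidates=1))
```

Because imputed values are always copied from observed rows, the result stays within the observed range of the data, which is one reason mean matching is attractive for data that are not normally distributed.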

Additional context
The imputation schemes are limited compared to the MICE implementation in R. However, while a Bayesian approach might be favourable, linear gradient mean matching might be equivalent in terms of statistical power and introduced errors: the overall dynamics (spread) of the data is narrow, and the data are not normally distributed (not even approximately). Hence, the features in miceforest ought to fulfil our needs.

Needs to be evaluated after the implementation of #78 and a completed CI with #76.
