Is your feature request related to a problem? Please describe.
When dealing with missing values (e.g. in long-sequencing data), some positions might show missing data (e.g. 0) for samples with overall low read counts. In this case, batch effect correction and normalization become harder in proportion to the fraction of missing data.
Describe the solution you'd like
We can try to implement imputation for such missing values, e.g. with the help of the miceforest package, which follows the MICE approach (Multiple Imputation by Chained Equations). The package supports gradient boosting as well as mean matching schemes.
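To make the mean matching idea concrete, here is a minimal, self-contained sketch of predictive mean matching for a single feature. It is not miceforest's API and not a proposed implementation: it swaps in an ordinary least-squares line where miceforest would use a gradient-boosted model, and the function name `pmm_impute`, the parameter `k`, and the example data (total reads per sample as predictor, counts at one position as target) are all hypothetical.

```python
import random


def pmm_impute(x, y, k=3, seed=0):
    """Impute None entries of y by predictive mean matching on predictor x.

    A least-squares line is fitted on the observed pairs (assumption: a
    stand-in for a gradient-boosted model). Each missing y is then replaced
    by an observed y drawn at random from the k donors whose predicted
    values lie closest to the prediction for the missing row.
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x

    def predict(xi):
        return intercept + slope * xi

    # Donor pool: (predicted value, actually observed value).
    donors = [(predict(xi), yi) for xi, yi in observed]

    imputed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            imputed.append(yi)  # observed values are never altered
            continue
        target = predict(xi)
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        imputed.append(rng.choice(nearest)[1])
    return imputed


# Hypothetical example: total reads per sample and counts at one position,
# with None marking positions treated as missing.
total_reads = [120, 150, 900, 1000, 1100, 130]
position_counts = [3, None, 40, 45, 50, None]
result = pmm_impute(total_reads, position_counts, k=2)
```

Because imputed values are always drawn from observed ones, the result stays within the observed range and makes no normality assumption, which is the property that matters for narrowly spread, non-normal data.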
Additional context
The imputation schemes are limited compared to MICE's implementation in R. However, while a Bayesian approach might be favourable, linear gradient mean matching might be equivalent in terms of statistical power and introduced errors: the overall dynamics (spread) of the data is narrow, and the data is not normally distributed (not even approximately). Hence, the features in miceforest ought to fulfil our needs.
This needs to be evaluated after #78 is implemented and CI is completed with #76.