Batch correction #3

Open
3 of 10 tasks
MicTott opened this issue Dec 18, 2023 · 0 comments

Roadmap for batch correction

Batch correction here is tricky because we need to correct for differences between samples within each species and then merge the data across species. I think it makes the most sense to correct for samples within species first, then merge and correct for differences across the species datasets.

I will perform testing using both MNN and Seurat as per this paper.

Comparing batch correction methods (Harmony vs MNN vs Seurat)

1. Correct samples within species

  • Remove lowly expressed genes: Keep only those genes that have at least 1 UMI in at least 5% of the data
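The filter above can be sketched in a few lines. This is a minimal, language-agnostic illustration in plain Python rather than the R workflow itself; the function name `keep_expressed_genes`, the toy matrix, and the reading of "5% of the data" as "5% of cells" are assumptions made for the example.

```python
def keep_expressed_genes(counts, min_umi=1, min_frac=0.05):
    """Return indices of genes with >= min_umi UMIs in >= min_frac of cells.

    counts: list of rows, one row per gene, each row a list of per-cell UMI counts.
    Assumption: "5% of the data" is interpreted as 5% of cells.
    """
    kept = []
    for gene_idx, row in enumerate(counts):
        n_cells = len(row)
        # Number of cells in which this gene reaches the UMI threshold.
        n_detected = sum(1 for c in row if c >= min_umi)
        if n_cells > 0 and n_detected / n_cells >= min_frac:
            kept.append(gene_idx)
    return kept


# Hypothetical toy matrix: 3 genes x 20 cells.
toy_counts = [
    [1] + [0] * 19,     # detected in 1 of 20 cells (5%)
    [0] * 20,           # never detected
    [3, 2] + [0] * 18,  # detected in 2 of 20 cells (10%)
]
kept_genes = keep_expressed_genes(toy_counts)
print(kept_genes)
```

In the toy matrix the second gene is never detected, so only the first and third gene indices survive the filter.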

Within each species...

  • Because this is a very large dataset and each species has a different number of total cells, let's randomly subset 10k cells from each species. Do not combine subsets.
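A minimal sketch of the per-species subsampling, again in plain Python for illustration only. The species labels, cell IDs, and the fixed seed are hypothetical; the cap at the number of available cells is an added assumption for species with fewer than 10k cells.

```python
import random


def subsample_cells(cell_ids_by_species, n_cells=10_000, seed=0):
    """Randomly pick up to n_cells cell IDs per species, keeping species separate."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    subsets = {}
    for species, cell_ids in cell_ids_by_species.items():
        # Assumption: if a species has fewer than n_cells cells, keep them all.
        k = min(n_cells, len(cell_ids))
        subsets[species] = rng.sample(cell_ids, k)  # sampling without replacement
    return subsets


# Hypothetical input: two species with different numbers of cells.
cells = {
    "species_a": [f"a_cell_{i}" for i in range(25_000)],
    "species_b": [f"b_cell_{i}" for i in range(8_000)],
}
subsets = subsample_cells(cells)
for species, ids in subsets.items():
    print(species, len(ids))
```

The result stays a dictionary keyed by species, which mirrors the "do not combine subsets" instruction: each species is corrected on its own before any merging.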

For Harmony and MNN

  • Compute approximate GLM-PCA using null residuals according to this workflow: 'scry::nullResiduals' + 'scater::runPCA' with chosen HVGs, n=50 components

  • Compute UMAP using scater::runUMAP on the uncorrected GLM-PCA dimred

  • Perform batch correction on sample_id using batchelor::reducedMNN with GLM-PCA dimred; add to reducedDims

  • Compute UMAP using scater::runUMAP on the MNN-corrected GLM-PCA dimred

  • Perform batch correction on 'sample_id' using 'harmony::RunHarmony' with GLM-PCA dimred

  • Compute UMAP using scater::runUMAP on the Harmony-corrected GLM-PCA dimred

  • Visualize batch correction results by plotting and printing uncorrected, MNN, and Harmony corrected UMAPs colored by sample_id
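For intuition about the MNN step in the list above, here is a rough sketch of how mutual nearest neighbour pairs between two batches can be found in a reduced-dimension space. It only illustrates the pairing idea; it is not what batchelor::reducedMNN does internally, which also derives and applies correction vectors from such pairs. The function names and the 2-D toy coordinates are hypothetical.

```python
import math


def nearest_neighbors(query, reference, k):
    """For each query point, return the set of indices of its k nearest reference points."""
    result = []
    for q in query:
        # Rank all reference points by Euclidean distance to q (brute force).
        order = sorted(range(len(reference)), key=lambda j: math.dist(q, reference[j]))
        result.append(set(order[:k]))
    return result


def mutual_nearest_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs where a[i] and b[j] are each among the other's k nearest neighbours."""
    a_to_b = nearest_neighbors(batch_a, batch_b, k)
    b_to_a = nearest_neighbors(batch_b, batch_a, k)
    return sorted(
        (i, j)
        for i, neighbors in enumerate(a_to_b)
        for j in neighbors
        if i in b_to_a[j]
    )


# Hypothetical 2-D embeddings: batch_b is batch_a's first two cells shifted by (1, 1).
batch_a = [(0.0, 0.0), (5.0, 5.0), (20.0, 20.0)]
batch_b = [(1.0, 1.0), (6.0, 6.0)]
pairs = mutual_nearest_pairs(batch_a, batch_b, k=1)
print(pairs)
```

The third cell of `batch_a` has no counterpart in `batch_b`, so it forms no mutual pair; this is the property that lets MNN-style methods tolerate populations present in only one batch.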

GLM-PCA resulted in "stringy" clusters no matter how I adjusted the hyperparameters. Because I had previously determined that Seurat was the best for clustering, I moved forward with that. Seurat uses Pearson residuals from negative binomial regression (sctransform) instead of GLM-PCA.

Seurat CCA

  • Perform batch correction using Seurat