Clarification in the PBMC reference (SeuratData vs Zenodo vs Azimuth website) #231

Open
batalha23 opened this issue Jul 11, 2024 · 2 comments

Comments

@batalha23

First of all, thank you for this tool!
I was about to annotate my query PBMC datasets with Azimuth's reference when I noticed what appear (to me) to be inconsistencies between the datasets referred to as "pbmcref" across the tutorials and repositories. Could you please clarify whether I'm interpreting this wrong?

refAzimuth <- readRDS("~/R/scHER2/data/ref.Rds")
View(refAzimuth)
refAzimuth
An object of class Seurat
5228 features across 36433 samples within 2 assays
Active assay: refAssay (5000 features, 0 variable features)
1 layer present: data
1 other assay present: ADT
2 dimensional reductions calculated: refUMAP, refDR

  • Looking for "pbmcref" in the SeuratData package (by calling AvailableData()), the listed dataset contains only 2700 cells and appears to have been generated with 10x Genomics v1. The Azimuth tutorial (https://satijalab.github.io/azimuth/articles/run_azimuth_tutorial.html), however, explicitly states that "pbmcref" is the reference dataset found on the Azimuth webpage. None of the PBMC datasets listed in SeuratData appears to correspond to the one stored on Zenodo or the one mentioned in the paper/website:

[image]

With all this in mind, could you please clarify whether these datasets are supposed to be the same, or whether this is a bug that needs to be fixed? Also, if they are indeed distinct datasets, which of them contains the single-cell RNA and ADT data generated in the paper (Hao and Hao et al., Cell 2021)?

Thank you very much in advance!

@yi6kim

yi6kim commented Aug 22, 2024

I have a similar issue. I didn't download the data from Zenodo; instead, I ran InstallData('pbmcref') after loading library(SeuratData).

When I run 'thepbmc <- LoadData("pbmcref", "azimuth")' and check its dimensions, it shows only 5000 genes and 36,433 cells, as opposed to the 161,764 cells it is supposed to have.

Using LoadData("pbmcref") and LoadData("pbmcref.SeuratData") both give me an error, so I've been using LoadData("pbmcref", "azimuth") to view the dataset, as advised here: satijalab/seurat-data#77.

[Screenshot 2024-08-22 at 5 17 02 PM]

I also see that AvailableData() lists 'pbmcref.SeuratData' as having 2700 cells, but this is probably a typo.

[Screenshot 2024-08-22 at 4 35 06 PM] [Screenshot 2024-08-22 at 4 35 27 PM] [Screenshot 2024-08-22 at 4 35 46 PM]

@michael-kotliar

michael-kotliar commented Sep 11, 2024

I believe you can just download the original dataset from https://atlas.fredhutch.org/nygc/multimodal-pbmc/ and then run the export script from https://github.com/satijalab/azimuth-references/blob/master/human_pbmc/scripts/export.R
Just make sure that your mapping.cells and plotting.cells include all cells (not a subset).
