"Number of mapped reads" from log file #144

Open
BingjieZhang opened this issue Nov 28, 2023 · 3 comments

@BingjieZhang

Hello Chromap Team,

Thank you very much for actively maintaining Chromap!

I recently used Chromap for mapping scATAC-seq data with a barcode whitelist. I found that the log file is a bit confusing. As stated in the documentation, when barcodes and a whitelist are given as input, Chromap will, by default, estimate barcode abundance and perform barcode correction.

I am looking to understand the following QC numbers from the log file:

  1. The total number of mapped reads (regardless of whether the read has a valid cell barcode or not).
  2. The number of mapped reads that have a valid cell barcode.
  3. The number of deduplicated, uniquely mapped reads.

In relation to these questions:

For Q1, should I refer to the "Number of mapped reads" in the log file?
For Q2, what does "Number of barcodes in whitelist" represent? Does it indicate the number of barcodes, or the number of reads with the whitelisted barcodes?

Number of reads: 153220788.
Number of mapped reads: 71076182.
Number of uniquely mapped reads: 66746464.
Number of reads have multi-mappings: 4329718.
Number of candidates: 842925044.
Number of mappings: 71076182.
Number of uni-mappings: 66746464.
Number of multi-mappings: 4329718.
Number of barcodes in whitelist: 37926511.
Number of corrected barcodes: 3772723.
Sorted, deduped and outputed mappings in 48.51s.
uni-mappings: 32567263, # multi-mappings: 1919563, total: 34486826.
Number of output mappings (passed filters): 30174748

These metrics are very useful for my experimental debugging, and I would greatly appreciate your clarification.
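
For anyone who wants to compare these counters across runs, the log excerpt above can be read into a dictionary with a short script. This is only a sketch: the function name `parse_chromap_log` is hypothetical (it is not part of Chromap), and the pattern assumes the `Number of ...: value.` line format shown in this excerpt, which may differ between Chromap versions.

```python
import re

# Log excerpt copied from this issue; in practice, read it from the log file.
LOG = """\
Number of reads: 153220788.
Number of mapped reads: 71076182.
Number of uniquely mapped reads: 66746464.
Number of reads have multi-mappings: 4329718.
Number of candidates: 842925044.
Number of mappings: 71076182.
Number of uni-mappings: 66746464.
Number of multi-mappings: 4329718.
Number of barcodes in whitelist: 37926511.
Number of corrected barcodes: 3772723.
Sorted, deduped and outputed mappings in 48.51s.
uni-mappings: 32567263, # multi-mappings: 1919563, total: 34486826.
Number of output mappings (passed filters): 30174748
"""

# Assumed line shape: "Number of <label>: <integer>" with an optional trailing period.
COUNTER_LINE = re.compile(r"(Number of [^:]+):\s*(\d+)\.?")


def parse_chromap_log(text):
    """Return {label: count} for every 'Number of ...' line in the text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.fullmatch(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


counters = parse_chromap_log(LOG)
for label, value in counters.items():
    print(f"{label}: {value:,}")
```

Lines that do not start with `Number of`, such as the timing line and the post-deduplication summary, are skipped by this pattern.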

@mourisl
Collaborator

mourisl commented Nov 29, 2023

For the "number of mapped reads", I believe it is only from those barcode-valid (barcode in the whitelist or corrected barcode) reads.

Q1: If you need the number of unfiltered mapped reads, "number of mapped reads" is the place to look. What number do you have in mind?
Q2: That is the number of reads with the whitelisted barcodes.

@BingjieZhang
Author

Thanks for your responses! Sorry, but I'm not sure if I fully understand what you mean. What do you mean by 'unfiltered' mapped reads? I prefer to know the number of mapped reads regardless of whether the reads have a valid cell barcode or not. I am trying to figure out why I started with 153,220,788 reads, but ended up with only 30,174,748, lol. The reason I feel confused is that for the same sample, I also did a bulk mapping with Bowtie2. As you can see below, the mapping rate is okay, with an 86.75% overall alignment rate and a 56% unique mapping rate (Bowtie2 counts paired-end fragments once, so it's half the number compared to Chromap, but they are mapped with the same FASTQ files).

However, for Chromap, even before deduplication, the ratio is 37,926,511/153,220,788 = 24.7%. So, I want to know at which step I am losing reads. If Number of mapped reads: 71,076,182 already includes the valid-barcode filtering step (filtering by the whitelist), which reads are filtered out between Number of uni-mappings: 66,746,464 and Number of barcodes in whitelist: 37,926,511? I initially thought 'Number of mapped reads' represented the overall mapping rate, but it is far lower than the result from Bowtie2.

Hopefully, I have explained my questions clearly, and thank you very much for your help in advance.

bulk mapping summary using bowtie2
bowtie2 -x /hg38/ -1 $name\_R1_val_1.fq* -2 $name\_R2_val_2.fq* --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700 -p 5 -q

76500578 reads; of these:
76500578 (100.00%) were paired; of these:
10140036 (13.25%) aligned concordantly 0 times
43298939 (56.60%) aligned concordantly exactly 1 time
23061603 (30.15%) aligned concordantly >1 times
86.75% overall alignment rate
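
The percentages in the Bowtie2 summary above can be reproduced from its raw counts, which makes clear that Bowtie2 reports rates per read pair. A minimal sketch, with illustrative variable names and the counts copied from the summary; it assumes that, with `--no-mixed` and `--no-discordant`, only concordant alignments contribute to the overall rate:

```python
# Counts copied from the Bowtie2 summary above (units: read pairs).
pairs_total = 76_500_578
pairs_unaligned = 10_140_036
pairs_unique = 43_298_939
pairs_multi = 23_061_603

# The three categories should account for every pair.
assert pairs_unaligned + pairs_unique + pairs_multi == pairs_total

unique_rate = pairs_unique / pairs_total
overall_rate = (pairs_unique + pairs_multi) / pairs_total

print(f"unique concordant rate: {unique_rate:.2%}")
print(f"overall alignment rate: {overall_rate:.2%}")
```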

@mourisl
Collaborator

mourisl commented Nov 30, 2023

Reads with an invalid barcode are not mapped, so the mapped read count does not include them.
The number 37926511 counts read fragments (each mate pair counted once), whereas 153220788 counts read ends (twice the number of fragments). Even so, too few barcodes are found in the whitelist, which causes the low overall alignment rate. You can run Chromap without the whitelist and check the alignment rate, which may confirm that the barcode-matching step is the culprit.
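
The unit mismatch described above can be checked with a few lines of arithmetic. This is a sketch with illustrative variable names; it assumes, as stated in the reply, that `Number of reads` counts read ends and `Number of barcodes in whitelist` counts fragments:

```python
# Values copied from the Chromap log earlier in this thread.
read_ends = 153_220_788               # "Number of reads" (both mates counted)
fragments_in_whitelist = 37_926_511   # "Number of barcodes in whitelist" (fragments)

# Each fragment contributes two read ends.
fragments_total = read_ends // 2

# Mixed units: fragments divided by read ends (the ratio computed in the question).
mixed_unit_rate = fragments_in_whitelist / read_ends

# Consistent units: fragments divided by fragments.
fragment_rate = fragments_in_whitelist / fragments_total

print(f"fragments in total:          {fragments_total:,}")
print(f"whitelist rate, mixed units: {mixed_unit_rate:.4f}")
print(f"whitelist rate, per fragment: {fragment_rate:.4f}")
```

With consistent units, roughly half of the fragments carry a barcode found in the whitelist, about twice the mixed-unit figure, which is still low enough to explain the gap relative to the Bowtie2 run.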
