Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primers contributing to Clean.Tot.cons #100

Open
martinghunt opened this issue Oct 11, 2022 · 7 comments
Open

Primers contributing to Clean.Tot.cons #100

martinghunt opened this issue Oct 11, 2022 · 7 comments

Comments

@martinghunt
Copy link
Member

We have examples where there is an amplicon with two right-hand primers. Neither primer is excluded at the primer/amplicon identification stage, and both are provided to cylon.

Observed behaviour: in all_stats.tsv, at the "inner" (or left most, whatever you want to call it) right-hand primer position, the reads from the "inner primer" are contributing to Clean.Tot.cons.

Expected behaviour: those reads do not contribute to Clean.Tot.cons.

An example is #99 , in amplicon SARS-CoV-2_76.

This may be confounded by the adjacent amplicon 77 being dropped, I don't know. The behaviour below may also be happening when the adjacent amplicon is not dropped - this needs checking.

For the sample in #99 this is what happens:

  1. no primers for amplicon SARS-CoV-2_76 are excluded (a separate issue to this one)
  2. all the reads arise from the leftmost primer SARS-CoV-2_76_RIGHT (coords 23029-23057).
  3. therefore the only read coverage at 23029-23057 is from reads ending with primer SARS-CoV-2_76_RIGHT . The primer is not (as far as the code knows) at the end of the amplicon - the decision earlier was to also keep SARS-CoV-2_76_RIGHT_alt1 (coords 23121-23141).
  4. all_stats.tsv reports non-zero Clean.Tot.cons values at 23029-23057 - they are fairly close to the values in RawDepth.
@jeff-k
Copy link
Contributor

jeff-k commented Oct 11, 2022

The primer id stage before cylon is different than what is used for masking the consensus. When an adjacent amplicon is dropped the overhanging primer bases will be considered, so we would expect those reads to contribute to Clean.Tot.cons.

Are there examples where this happens when the adjacent amplicon is not dropped?

@martinghunt
Copy link
Member Author

This issue has nothing to do with cylon. It's about when primer parts of reads are used to contirbute to good or bad coverage for masking, and in the stats.

We have sequence chunks like this in the amplicon from left to right:

L = left chunk (I don't care about left primers here)
P1 = primer1.
Q = sequence between primer1 and primer2
P2 = primer2.

and the subsequent amplicon is dropped.

I would expect that the P1 part of the reads that were made during PCR from P1 do not count towards the good depth, since there should be reads that arose from P2 during PCR. Those reads from P2 should also include the P1 sequence and can be relied upon for their P1 sequence.

@jeff-k
Copy link
Contributor

jeff-k commented Oct 11, 2022

I'm sorry I don't quite understand. How does this differ from what we're doing right now?

@martinghunt
Copy link
Member Author

Current behaviour:
Reads originating from primer P1 contribute to good coverage in the P1 region of the consensus.
Reads originating from primer P2 contribute to good coverage in the P1 region of the consensus consensus.

Expected behaviour:
Reads originating from primer P1 do not contribute to coverage in the P1 region of the consensus.
Reads originating from primer P2 contribute to good coverage in the P1 region of the consensus consensus.

@jeff-k
Copy link
Contributor

jeff-k commented Oct 11, 2022

P1 and P2 are alts on the same side of an amplicon, right? Do you have an example?

@iqbal-lab
Copy link
Contributor

this simulation is the example @jeff-k

@jeff-k
Copy link
Contributor

jeff-k commented Oct 11, 2022

In #99's example we are adjacent to a missing amplicon. This means that primer bases will be considered no matter what. I'm not sure how we can determine that the current behaviour is deviating from the expected behaviour.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants