Update 3.12.20240221 #298

Merged
merged 10 commits into from
Feb 21, 2024
erinyoung
Member

  • adds bbnorm option (disabled by default; set params.bbnorm = true to enable)
  • updates freyja to 1.4.9-2024-02-20
  • updates nextclade to 3.2.1
  • adds fixed file for pango-collapse
  • updates pangolin to 4.3.1-pdata-1.25.1
  • sets ACI and IGV-reports to false by default

@erinyoung
Member Author

When comparing with bbnorm (cecret/) and without bbnorm (nonorm/):

==> cecret/samtools_coverage/3528365.cov.hist <==
MN908947.3 (29.9Kbp)
>  90.00% │▁███████████████████████████████████▇██████████▇█ │ Number of reads: 199978
>  80.00% │██████████████████████████████████████████████████│ 
>  70.00% │██████████████████████████████████████████████████│ Covered bases:   29.8Kbp
>  60.00% │██████████████████████████████████████████████████│ Percent covered: 99.59%
>  50.00% │██████████████████████████████████████████████████│ Mean coverage:   862x
>  40.00% │██████████████████████████████████████████████████│ Mean baseQ:      32.8
>  30.00% │██████████████████████████████████████████████████│ Mean mapQ:       60
>  20.00% │██████████████████████████████████████████████████│ 
>  10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
>   0.00% │██████████████████████████████████████████████████│ Histo max bin:   100%
          1        6.0K     12.0K     17.9K     23.9K      29.9K  

==> cecret/samtools_coverage/3540826-UT-A01290-240207.cov.hist <==
MN908947.3 (29.9Kbp)
>  90.00% │▇███████████████████████████████████▇██████████▇█▁│ Number of reads: 71398
>  80.00% │██████████████████████████████████████████████████│ 
>  70.00% │██████████████████████████████████████████████████│ Covered bases:   29.8Kbp
>  60.00% │██████████████████████████████████████████████████│ Percent covered: 99.72%
>  50.00% │██████████████████████████████████████████████████│ Mean coverage:   325x
>  40.00% │██████████████████████████████████████████████████│ Mean baseQ:      36.1
>  30.00% │██████████████████████████████████████████████████│ Mean mapQ:       60
>  20.00% │██████████████████████████████████████████████████│ 
>  10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
>   0.00% │██████████████████████████████████████████████████│ Histo max bin:   100%
          1        6.0K     12.0K     17.9K     23.9K      29.9K  

==> nonorm/samtools_coverage/3528365.cov.hist <==
MN908947.3 (29.9Kbp)
>  90.00% │▇███████████████████████████████████▇██████████▇█▁│ Number of reads: 565714
>  80.00% │██████████████████████████████████████████████████│ 
>  70.00% │██████████████████████████████████████████████████│ Covered bases:   29.8Kbp
>  60.00% │██████████████████████████████████████████████████│ Percent covered: 99.73%
>  50.00% │██████████████████████████████████████████████████│ Mean coverage:   2.44e+03x
>  40.00% │██████████████████████████████████████████████████│ Mean baseQ:      32.7
>  30.00% │██████████████████████████████████████████████████│ Mean mapQ:       60
>  20.00% │██████████████████████████████████████████████████│ 
>  10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
>   0.00% │██████████████████████████████████████████████████│ Histo max bin:   100%
          1        6.0K     12.0K     17.9K     23.9K      29.9K  

==> nonorm/samtools_coverage/3540826-UT-A01290-240207.cov.hist <==
MN908947.3 (29.9Kbp)
>  90.00% │████████████████████████████████████▇████████████▁│ Number of reads: 22334310
>  80.00% │██████████████████████████████████████████████████│ 
>  70.00% │██████████████████████████████████████████████████│ Covered bases:   29.8Kbp
>  60.00% │██████████████████████████████████████████████████│ Percent covered: 99.79%
>  50.00% │██████████████████████████████████████████████████│ Mean coverage:   1.02e+05x
>  40.00% │██████████████████████████████████████████████████│ Mean baseQ:      36.2
>  30.00% │██████████████████████████████████████████████████│ Mean mapQ:       60
>  20.00% │██████████████████████████████████████████████████│ 
>  10.00% │██████████████████████████████████████████████████│ Histo bin width: 598bp
>   0.00% │██████████████████████████████████████████████████│ Histo max bin:   100%
          1        6.0K     12.0K     17.9K     23.9K      29.9K  
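To compare the histogram annotations without reading them by eye, the statistics printed to the right of the bars can be pulled out with a regular expression. This is a minimal sketch, not part of the pipeline: `parse_cov_hist` is a hypothetical helper, and the sample text is copied from the 3528365 histograms above.

```python
import re

# Matches the "name: value" annotations printed beside the histogram bars.
# Trailing units such as "x" or "%" are left out of the captured value.
STAT_RE = re.compile(r"(Number of reads|Percent covered|Mean coverage):\s+([0-9.eE+-]+)")


def parse_cov_hist(text):
    """Return {statistic name: float} for one histogram block (hypothetical helper)."""
    return {name: float(value) for name, value in STAT_RE.findall(text)}


# Annotation lines copied from the 3528365 histograms above.
with_bbnorm = (
    "│ Number of reads: 199978\n"
    "│ Percent covered: 99.59%\n"
    "│ Mean coverage:   862x\n"
)
without_bbnorm = (
    "│ Number of reads: 565714\n"
    "│ Percent covered: 99.73%\n"
    "│ Mean coverage:   2.44e+03x\n"
)

norm = parse_cov_hist(with_bbnorm)
nonorm = parse_cov_hist(without_bbnorm)
for key in norm:
    print(f"{key}: {nonorm[key]:g} (nonorm) -> {norm[key]:g} (bbnorm)")
```

With real output, the text of each `.cov.hist` file would be passed to the helper in place of the inline strings.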

@erinyoung
Member Author

And the final summary files:

==> cecret/cecret_results.txt <==
sample_id	sample	pangolin_lineage	nextclade_clade	vadr_p/f	fasta_line	fastqc_raw_reads_1	fastqc_raw_reads_2	num_N	num_total	seqyclean_PairsKept	seqyclean_Perc_Kept	num_pos_100X	insert_size_after_trimming	bcftools_variants_identified	samtools_meandepth_after_trimming	samtools_per_1X_coverage_after_trimming	vadr_model	vadr_alerts	nextclade_clade_who	nextclade_qc_overallscore	nextclade_qc_overallstatus	pangolin_conflict	pangolin_ambiguity_score	pangolin_scorpio_call	pangolin_scorpio_support	pangolin_scorpio_conflict	pangolin_scorpio_notes	pangolin_version	pangolin_pangolin_version	pangolin_scorpio_version	pangolin_constellation_version	pangolin_is_designated	pangolin_qc_status	pangolin_qc_notes	pangolin_note	pangocollapse_lineage	pangocollapse_Lineage_full	pangocollapse_Lineage_expanded	pangocollapse_Lineage_family	freyja_summarized	Cecret version	seqyclean	bwa	ivar	ivar consensus
3528365	3528365	XCR	recombinant	PASS	3528365	325979.0	325979.0	718	29759	109728.0	97.3586	29040	171.0	122	861.881	99.5887	NC_045512	-	recombinant	3.396763	good	0.0		Omicron (XBB.1.5-like)	0.94	0.01	scorpio call: Alt alleles 82; Ref alleles 1; Amb alleles 1; Oth alleles 3	PUSHER-v1.25.1	4.3.1	0.3.19	v0.1.12	False	pass	Ambiguous content: 4%	Usher placements: XCR(1/1); scorpio lineage XBB.1.5 conflicts with inference lineage XCR	XCR	XCR	XCR	Recombinant	[('Other'  0.9999999999996719)]	v3.12.20240221	seqyclean : Version: 1.10.09 (2018-10-16)	bwa : Version: 0.7.17-r1188	ivar : iVar version 1.4.2	iVar version 1.4.2
3540826-UT-A01290-240207	3540826-UT-A01290-240207	JN.1.1	23I	PASS	3540826-UT-A01290-240207	12181621.0	12181621.0	111	29796	48071.0	87.1799	29685	187.6	132	325.301	99.7224	NC_045512	-	Omicron	0.0	good	0.0		Omicron (BA.2-like)	0.92	0.03	scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3	PUSHER-v1.25.1	4.3.1	0.3.19	v0.1.12	False	pass	Ambiguous content: 2%	Usher placements: JN.1.1(1/1)	JN.1.1	B.1.1.529.2.86.1.1.1	B.1.1.529:BA.2.86.1:JN.1.1	BA.2	[('BA.2.86* (BA.2.86X)'  0.999999999994306)]	v3.12.20240221	seqyclean : Version: 1.10.09 (2018-10-16)	bwa : Version: 0.7.17-r1188	ivar : iVar version 1.4.2	iVar version 1.4.2
bbnorm_test	bbnorm	JN.1.1	23I	PASS	bbnorm_test	55140.0	55140.0	111	29796	44283.0	87.4588	29685	188.2	132	304.238	99.7224	NC_045512	-	Omicron	0.0	good	0.0		Omicron (BA.2-like)	0.92	0.03	scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3	PUSHER-v1.25.1	4.3.1	0.3.19	v0.1.12	False	pass	Ambiguous content: 2%	Usher placements: JN.1.1(1/1)	JN.1.1	B.1.1.529.2.86.1.1.1	B.1.1.529:BA.2.86.1:JN.1.1	BA.2	[('BA.2.86* (BA.2.86X)'  0.9999999999906976)]	v3.12.20240221	seqyclean : Version: 1.10.09 (2018-10-16)	bwa : Version: 0.7.17-r1188	ivar : iVar version 1.4.2	iVar version 1.4.2

==> nonorm/cecret_results.txt <==
sample_id	sample	pangolin_lineage	nextclade_clade	vadr_p/f	fasta_line	fastqc_raw_reads_1	fastqc_raw_reads_2	num_N	num_total	seqyclean_PairsKept	seqyclean_Perc_Kept	num_pos_100X	insert_size_after_trimming	bcftools_variants_identified	samtools_meandepth_after_trimming	samtools_per_1X_coverage_after_trimming	vadr_model	vadr_alerts	nextclade_clade_who	nextclade_qc_overallscore	nextclade_qc_overallstatus	pangolin_conflict	pangolin_ambiguity_score	pangolin_scorpio_call	pangolin_scorpio_support	pangolin_scorpio_conflict	pangolin_scorpio_notes	pangolin_version	pangolin_pangolin_version	pangolin_scorpio_version	pangolin_constellation_version	pangolin_is_designated	pangolin_qc_status	pangolin_qc_notes	pangolin_note	pangocollapse_lineage	pangocollapse_Lineage_full	pangocollapse_Lineage_expanded	pangocollapse_Lineage_family	freyja_summarized	Cecret version	seqyclean	bwa	ivar	ivar consensus
3528365	3528365	XCR	recombinant	PASS	3528365	325979.0	325979.0	758	29801	316176.0	96.9928	29043	170.8	125	2442.1	99.7325	NC_045512	-	recombinant	3.877421	good	0.0		Omicron (XBB.1.5-like)	0.94	0.01	scorpio call: Alt alleles 82; Ref alleles 1; Amb alleles 1; Oth alleles 3	PUSHER-v1.25.1	4.3.1	0.3.19	v0.1.12	False	pass	Ambiguous content: 4%	Usher placements: XCR(1/1); scorpio lineage XBB.1.5 conflicts with inference lineage XCR	XCR	XCR	XCR	Recombinant	[('Other'  0.9999999999965108)]	v3.12.20240221	seqyclean : Version: 1.10.09 (2018-10-16)	bwa : Version: 0.7.17-r1188	ivar : iVar version 1.4.2	iVar version 1.4.2
3540826-UT-A01290-240207	3540826-UT-A01290-240207	JN.1.1	23I	PASS	3540826-UT-A01290-240207	12181621.0	12181621.0	14	29805	11326208.0	92.9778	29815	186.5	137	101836.0	99.7927	NC_045512	-	Omicron	0.0	good	0.0		Omicron (BA.2-like)	0.92	0.03	scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3	PUSHER-v1.25.1	4.3.1	0.3.19	v0.1.12	False	pass	Ambiguous content: 2%	Usher placements: JN.1.1(1/1)	JN.1.1	B.1.1.529.2.86.1.1.1	B.1.1.529:BA.2.86.1:JN.1.1	BA.2	[('BA.2.86* (BA.2.86X)'  0.9977042473353005)]	v3.12.20240221	seqyclean : Version: 1.10.09 (2018-10-16)	bwa : Version: 0.7.17-r1188	ivar : iVar version 1.4.2	iVar version 1.4.2
bbnorm_test	bbnorm	JN.1.1	23I	PASS	bbnorm_test	55140.0	55140.0	111	29796	48071.0	87.1799	29685	187.6	132	325.301	99.7224	NC_045512	-	Omicron	0.0	good	0.0		Omicron (BA.2-like)	0.92	0.03	scorpio call: Alt alleles 57; Ref alleles 2; Amb alleles 0; Oth alleles 3	PUSHER-v1.25.1	4.3.1	0.3.19	v0.1.12	False	pass	Ambiguous content: 2%	Usher placements: JN.1.1(1/1)	JN.1.1	B.1.1.529.2.86.1.1.1	B.1.1.529:BA.2.86.1:JN.1.1	BA.2	[('BA.2.86* (BA.2.86X)'  0.999999999994306)]	v3.12.20240221	seqyclean : Version: 1.10.09 (2018-10-16)	bwa : Version: 0.7.17-r1188	ivar : iVar version 1.4.2	iVar version 1.4.2
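The two summary files are wide enough that differences are easy to miss. A rough way to surface them is to load both tab-separated files and report only the columns that changed per sample. The sketch below assumes the column names shown in the headers above; `load_results` and `diff_runs` are hypothetical helpers, and the inline data is a trimmed copy of the 3540826 rows.

```python
import csv
import io


def load_results(handle):
    """Read a tab-separated cecret_results.txt into {sample_id: row dict}."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["sample_id"]: row for row in reader}


def diff_runs(norm, nonorm, columns):
    """Yield (sample_id, column, nonorm value, norm value) where the runs disagree."""
    for sample_id in sorted(norm.keys() & nonorm.keys()):
        for col in columns:
            before = nonorm[sample_id].get(col)
            after = norm[sample_id].get(col)
            if before != after:
                yield sample_id, col, before, after


# Trimmed copies of the 3540826 rows above (only four of the columns).
HEADER = "sample_id\tpangolin_lineage\tnum_N\tbcftools_variants_identified\n"
NONORM_TSV = HEADER + "3540826-UT-A01290-240207\tJN.1.1\t14\t137\n"
NORM_TSV = HEADER + "3540826-UT-A01290-240207\tJN.1.1\t111\t132\n"

norm = load_results(io.StringIO(NORM_TSV))
nonorm = load_results(io.StringIO(NONORM_TSV))
columns = ["pangolin_lineage", "num_N", "bcftools_variants_identified"]
for sample_id, col, before, after in diff_runs(norm, nonorm, columns):
    print(f"{sample_id}\t{col}\t{before} -> {after}")
```

For the real files, `open("cecret/cecret_results.txt")` and `open("nonorm/cecret_results.txt")` would replace the `io.StringIO` objects.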

@erinyoung
Member Author

Notably, normalization should not be used on wastewater or mixed samples.

In general, bbnorm appears to slightly increase the number of "N"s in the consensus sequence (14 -> 111 for 3540826), which reduces the number of variants observed (137 -> 132 for 3540826, per the bcftools_variants_identified column above). It does not seem to impact the overall Freyja or Pangolin results, but key variants may end up missing.

It DOES speed up runtime, by a lot for samples with many reads.

These three samples without normalization: 1 h 44 m 5 s
These three samples with normalization: 21 m 7 s
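
To put a number on the difference, the two wall-clock times can be converted to seconds and divided. This is a small sketch; `to_seconds` is a hypothetical helper that only understands the `h`, `m`, and `s` units used above.

```python
import re

UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration):
    """Convert a duration written like '1 h 44 m 5 s' to whole seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", duration)
    return sum(int(number) * UNIT_SECONDS[unit] for number, unit in parts)


without_norm = to_seconds("1 h 44 m 5 s")
with_norm = to_seconds("21 m 7 s")
speedup = without_norm / with_norm
print(f"{without_norm} s without vs {with_norm} s with normalization: {speedup:.1f}x faster")
```

For these three samples that works out to 6245 s against 1267 s, or roughly a 4.9x reduction in runtime.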

@erinyoung erinyoung merged commit dc59fbe into master Feb 21, 2024
18 checks passed
@erinyoung erinyoung deleted the update-20240221 branch February 27, 2024 17:21