error in SomaticCombineChannel #887

Closed
anoronh4 opened this issue Mar 8, 2021 · 7 comments · Fixed by #983

@anoronh4
Collaborator

anoronh4 commented Mar 8, 2021

For the sample s_C_006353_M001_d__s_C_006353_N002_d we got the following error:

Traceback (most recent call last):
  File "/usr/bin/filter-vcf.py", line 82, in <module>
    tier2_other = sum([alleles[a][1] for a in valid_alleles if a not in [alt, ref]])
IndexError: tuple index out of range
@gongyixiao
Collaborator

Maybe this is related. I will check https://github.com/mskcc/tempo/pull/768/files

@gongyixiao
Collaborator

gongyixiao commented Mar 24, 2021

This particular variant causes the issue. More investigation is needed.

5	149210402	.	C	T	.	LowEVS	SOMATIC;QSS=12;TQSS=2;NT=ref;QSS_NT=12;TQSS_NT=2;SGT=CC->CT;DP=877;MQ=60;MQ0=0;ReadPosRankSum=-1.39;SNVSB=0;SomaticEVS=2.98;EVSF=12,1,0.01676,60,0,0,-1.3884,-2.9235,0.015625,0.016484,6,22,0,0;Strelka2;non_cancer_AC_nfe_onf=0;non_cancer_AF_nfe_onf=0;non_cancer_AC_nfe_seu=1;non_cancer_AF_nfe_seu=9.42329e-05;non_cancer_AC_eas=0;non_cancer_AF_eas=0;non_cancer_AC_asj=0;non_cancer_AF_asj=0;non_cancer_AC_eas_jpn=0;non_cancer_AF_eas_jpn=0;non_cancer_AC_afr=0;non_cancer_AF_afr=0;non_cancer_AC_amr=0;non_cancer_AF_amr=0;non_cancer_AC_oth=0;non_cancer_AF_oth=0;non_cancer_AC_nfe_nwe=1;non_cancer_AF_nfe_nwe=2.53203e-05;non_cancer_AC_nfe_bgr=1;non_cancer_AF_nfe_bgr=0.000395883;non_cancer_AC_nfe_est=0;non_cancer_AF_nfe_est=0;non_cancer_AC_nfe=4;non_cancer_AF_nfe=3.89347e-05;non_cancer_AC_nfe_swe=1;non_cancer_AF_nfe_swe=3.95413e-05;non_cancer_AC=7;non_cancer_AF=2.95451e-05;non_cancer_AC_fin=1;non_cancer_AF_fin=4.62278e-05;non_cancer_AC_eas_oea=0;non_cancer_AF_eas_oea=0;non_cancer_AC_raw=8;non_cancer_AF_raw=3.37615e-05;non_cancer_AC_sas=2;non_cancer_AF_sas=6.55222e-05;non_cancer_AC_eas_kor=0;non_cancer_AF_eas_kor=0;non_cancer_AC_popmax=2;non_cancer_AF_popmax=6.55222e-05;FLANKSEQ=GACCGCCTGG[C]GCCAGGCAGG	DP:FDP:SDP:SUBDP:AU:CU:GU:TU	512:8:0:0:0,0:504,512:0,0:0,0	364:6:0:0:0:.:.:.

@anoronh4
Collaborator Author

anoronh4 commented Mar 24, 2021

364:6:0:0:0:.:.:. 

the script expects each of the last four fields of the tumor column (above, delimited by :) to contain a comma. maybe we should check whether the strelka vcf from the preceding step is reproducible, but otherwise we should validate that every allele field has a , and skip the variants that don't. this code block has one function, which is to assign a flag of multiallelic2, and if the information is insufficient to assess, we should just skip.
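A minimal sketch of that validation, assuming each of AU/CU/GU/TU should hold a "tier1,tier2" pair of counts. The function name `parse_allele_counts` is hypothetical and is not taken from filter-vcf.py; it only illustrates returning nothing for a record like the one above so the caller can skip it instead of hitting the IndexError:

```python
def parse_allele_counts(format_field, sample_field):
    """Return {base: (tier1, tier2)}, or None if any allele count is malformed."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if len(keys) != len(values):
        return None
    sample = dict(zip(keys, values))

    alleles = {}
    for base in "ACGT":
        raw = sample.get(base + "U")  # AU, CU, GU, TU
        if raw is None:
            return None
        parts = raw.split(",")
        # A well-formed field has exactly two integer counts, e.g. "504,512".
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            return None
        alleles[base] = (int(parts[0]), int(parts[1]))
    return alleles


fmt = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
normal_column = "512:8:0:0:0,0:504,512:0,0:0,0"
tumor_column = "364:6:0:0:0:.:.:."

print(parse_allele_counts(fmt, normal_column))
print(parse_allele_counts(fmt, tumor_column))  # malformed, so the caller would skip the flag
```

With a check like this in front of the tier2 sum, a variant whose counts cannot be parsed would simply not get the multiallelic2 flag.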

@anoronh4 anoronh4 added bug Something isn't working somatic analysis related to post-alignment analysis of the tumor sample labels Mar 25, 2021
@anoronh4
Collaborator Author

anoronh4 commented Mar 29, 2021

actually a recent resume seems to have mysteriously fixed this issue:

$ grep s_C_006353_M001_d__s_C_006353_N002_d trace.txt | grep SomaticCombineChannel
332939	59/cc8dd0	72903605	SomaticCombineChannel	s_C_006353_M001_d__s_C_006353_N002_d	SomaticCombineChannel (s_C_006353_M001_d__s_C_006353_N002_d)	FAILED	130	-	/juno/work/taylorlab/cmopipeline/singularity_images/cmopipeline-bcftools-vt-1.2.2.img	1	3h	-	2 GB	1	2021-03-27 19:37:26.767	2021-03-27 23:15:51.135	2021-03-27 23:15:51.137	3h 38m 24s2ms	-	-	-	-	-	-	-	-	-	-	-	-	-
339820	24/607cef	72917064	SomaticCombineChannel	s_C_006353_M001_d__s_C_006353_N002_d	SomaticCombineChannel (s_C_006353_M001_d__s_C_006353_N002_d)	COMPLETED	0	-	/juno/work/taylorlab/cmopipeline/singularity_images/cmopipeline-bcftools-vt-1.2.2.img	1	6h	-	4 GB	2	2021-03-27 23:15:55.877	2021-03-28 00:18:17.549	2021-03-28 00:18:17.551	1h 2m 22s	10m 26s	-	61.6%	1.2%	3.1 GB	3.2 GB	3.3 GB	3.4 GB	54 GB	100.1 MB	90371	85860	13.6 MB	58.9 MB

for whatever reason the bam file for s_C_006353_N002_d has an updated timestamp, which is probably why all the somatic analysis for this sample was repeated upon -resume. the same variant was called by strelka, but it didn't have the irregularities in the AU:CU:GU:TU fields. i'm still curious about the error, maybe we can send it to the strelka dev team?

@anoronh4 anoronh4 added the backburner probably won't address in a near future label Mar 29, 2021
@anoronh4
Collaborator Author

Occurred again with s_C_001841_P002_d__s_C_001841_N003_d.

@anoronh4 anoronh4 changed the title error in SomaticCombineChannel error in SomaticCombineChannel Sep 14, 2021
@gongyixiao gongyixiao added this to the 2.0 milestone Mar 16, 2023
@anoronh4
Collaborator Author

anoronh4 commented Mar 17, 2023

Occurred again with s_C_LPN576_T001_d02__s_C_LPN576_N001_d01. I think i found the offending line:

zcat work/31/356d10c5d0b243c8a4190e068909c7/s_C_LPN576_T001_d02__s_C_LPN576_N001_d01.strelka2.vcf.gz | grep ":\.:\.:\." | cut -f 9-
DP:FDP:SDP:SUBDP:AU:CU:GU:TU	56:0:0:0:0,0:0,0:56,57:0,0	121:0:0:0:0:.:.:.

i have moved this Strelka2 work dir to a separate location so i can try rerunning and see if the issue resolves that way.

@anoronh4
Collaborator Author

anoronh4 commented Mar 21, 2023

Forcing a re-do of Strelka worked! The same variant is now annotated differently:

DP:FDP:SDP:SUBDP:AU:CU:GU:TU	56:0:0:0:0,0:0,0:56,57:0,0	121:0:0:0:0,0:0,0:118,120:3,3

This suggests that we can fix it by triggering an error whenever Strelka does this, causing nextflow to retry the process.
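One way such a check could look, as a rough sketch only. The helper names `malformed_records` and `check_vcf` are hypothetical, and the sketch assumes that SNV records carry AU:CU:GU:TU in FORMAT and that the Strelka process is configured to retry on a non-zero exit:

```python
import gzip
import re

# Tier counts in the SNV records above look like "504,512"; anything else is suspect.
COUNT_PAIR = re.compile(r"^\d+,\d+$")
ALLELE_KEYS = ("AU", "CU", "GU", "TU")


def malformed_records(lines):
    """Return (chrom, pos) for records whose allele count fields are not 'n,n'."""
    bad = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 10:
            continue
        keys = cols[8].split(":")
        if not all(k in keys for k in ALLELE_KEYS):
            continue  # record without AU/CU/GU/TU keys (assumed to be a non-SNV record)
        for sample in cols[9:]:
            values = sample.split(":")
            fields = dict(zip(keys, values))
            if len(values) != len(keys) or any(
                not COUNT_PAIR.match(fields.get(k, "")) for k in ALLELE_KEYS
            ):
                bad.append((cols[0], cols[1]))
                break
    return bad


def check_vcf(path):
    """Open a plain or gzipped VCF and list its malformed records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return malformed_records(handle)
```

Inside the process script this could be called right after Strelka finishes, for example `sys.exit(1)` when `check_vcf(...)` returns a non-empty list, so that the task fails and gets retried rather than passing the bad record downstream.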

Interestingly, i also found several other changes, such as a number of variants being dropped. The retry had 13 fewer variants. The re-do had only one variant that the original did not have.
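For reference, a comparison like this can be done by keying each record on chromosome, position, REF and ALT and taking set differences. This is only an illustrative sketch with hypothetical function names, not necessarily how the counts above were obtained:

```python
def variant_keys(lines):
    """Collect (chrom, pos, ref, alt) for every data line of a VCF."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        chrom, pos, _id, ref, alt = cols[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_runs(original_lines, rerun_lines):
    """Report variants present in only one of the two runs."""
    original = variant_keys(original_lines)
    rerun = variant_keys(rerun_lines)
    return {
        "only_in_original": sorted(original - rerun),
        "only_in_rerun": sorted(rerun - original),
    }
```

Both arguments can be any iterable of VCF lines, for example two files opened with `gzip.open(path, "rt")`.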

I submitted an issue to the Strelka GitHub.
