Error in GATK-SV joint-calling terra pipeline, 07-FilterBatchSites step #774

Open
sdwang008 opened this issue Feb 5, 2025 · 5 comments

@sdwang008

Hello, I am currently trying to do a pilot run using the GATK-SV pipeline in cohort mode on Terra. I've posted this issue on the Terra support forum and am cross-posting it here. I'm advancing through the pipeline using the pre-configured settings and inputs, but I'm encountering an error at step 07-FilterBatchSites.

The error message is:

Adjudicating BAF (1)...
Traceback (most recent call last):
  File "/opt/conda/envs/gatk-sv/bin/svtk", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/opt/svtk/scripts/svtk", line 65, in <module>
    main()
  File "/opt/svtk/scripts/svtk", line 62, in main
    getattr(cli, command)(sys.argv[2:])
  File "/opt/svtk/svtk/cli/adjudicate.py", line 33, in main
    scores, cutoffs = adjudicate_SV(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 342, in adjudicate_SV
    cutoffs[0] = adjudicate_BAF1(metrics)
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 67, in adjudicate_BAF1
    cutoffs = adjudicate_BAF(
  File "/opt/svtk/svtk/adjudicate/adjudicate_sv.py", line 34, in adjudicate_BAF
    del_cutoffs = rf_classify(metrics, trainable, testable, features,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 19, in rf_classify
    rf = RandomForest(trainable, testable, features, cutoffs, labeler, name,
  File "/opt/svtk/svtk/adjudicate/random_forest.py", line 44, in __init__
    raise Exception('No clean variants found')
Exception: No clean variants found

Digging a little deeper, I found that this is caused by the batch metrics file generated in the previous steps missing these two columns: BAF_snp_ratio and BAF_del_loglik. I think other columns are affected as well, but these two directly caused this error. I'm not sure if this is a bug or if I'm providing a wrong input. I'm still getting used to Terra and understanding the pipeline, so I appreciate any help, thanks!

@mwalker174
Collaborator

Hi @sdwang008, thanks for reporting this. Usually we see this error when there aren't enough samples in the batch (we recommend at least 100), but missing BAF columns is a new failure mode.

I've brought this to the attention of our GATK support specialist in your ticket on the Terra forum. Let's see if you can resolve it there first.

@sdwang008
Author

Hi @mwalker174, thanks for your prompt response! I am doing a pilot run of just 20 samples to test the cohort-mode pipeline before running it with all samples, so yes, the number of samples in each batch is very small. Would this be the cause? I can follow up in the Terra forum post afterwards.
And to be clear, the two column headers are there; the columns are just empty.
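
For anyone wanting to run the same check on their own metrics file, here is a minimal sketch of how one might count populated values in the two BAF columns. It assumes the batch metrics file is tab-delimited with a header row, and that empty cells show up as an empty string or a placeholder such as `NA`; both are assumptions, not something confirmed in this thread. The function name `count_populated` and the sample rows are made up for illustration.

```python
import csv
import io

# The two columns named in this thread.
BAF_COLUMNS = ("BAF_snp_ratio", "BAF_del_loglik")
# Assumption: values treated as "missing"; adjust to match the real file.
MISSING = {"", "NA", "NaN", "nan", "."}


def count_populated(handle, columns=BAF_COLUMNS):
    """Return {column: number of rows holding a non-missing value}.

    A column that is absent from the header is reported as None,
    which distinguishes "no such column" from "column present but empty".
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    counts = {col: (0 if col in header else None) for col in columns}
    for row in reader:
        for col in columns:
            if counts[col] is None:
                continue
            value = (row.get(col) or "").strip()
            if value not in MISSING:
                counts[col] += 1
    return counts


# Made-up rows: headers present, BAF values empty.
example = (
    "name\tBAF_snp_ratio\tBAF_del_loglik\n"
    "var1\t\t\n"
    "var2\tNA\t\n"
)
print(count_populated(io.StringIO(example)))
```

For a real file, the same function can be called on an open handle, e.g. `with open(path, newline="") as fh: count_populated(fh)`, where `path` points at the batch metrics file. A count of 0 for both columns would match the "headers present, columns empty" situation described above.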

@mwalker174
Collaborator

Yes, that's the most likely cause. We should probably make this clearer in the documentation.

@sdwang008
Author

I see, I guess this is because the BAFtest doesn't have enough samples to generate the metrics? How should I proceed then—can I work around it and still run this small pilot batch, or should I give a larger batch a go?

@mwalker174
Collaborator

Yes, you'll need to run more samples.
