This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Add GRIDSS support #68

Open
d-cameron opened this issue Jun 12, 2019 · 1 comment


@d-cameron

I'd like to add GRIDSS support and am happy to provide a PR. Actually getting GRIDSS to run looks relatively straightforward, but there are some downstream steps that I'm not sure how to handle.

It looks like I'd need to make the following changes:

  • Update bioconda to latest gridss
  • Update Dockerfile to add gridss/samtools/bwa
  • Update parliament2 driver scripts
  • Add gridssmerge.py

Outstanding issues:

  • GRIDSS makes a callout to bwa to do assembly realignment. I presume I'll need to add a bwa index preprocessing step, correct?
  • Is there any documentation on the merging script output format? There are a few issues here:
    • The default full GRIDSS call set is as spammy as Pindel. Should I filter down to just the high-confidence variants as a preprocessing step?
    • GRIDSS reports all variants in BND notation. I'm not sure exactly what to do to translate the GRIDSS VCF into the combined format. Is there any documentation on the merged format?
      • AFAIK, GRIDSS writes fully spec-compliant VCF. That doesn't look like it'll be much of an advantage, as Manta2merge.py replaces the spec-defined IMPRECISE field with your own PREC one and, if the header is anything to go by, the merged file isn't even a VCF. Is this file different from the SURVIVOR-merged one, so that it'll all be OK?
    • GRIDSS outputs a number of useful files of various sizes. They range from KB-sized metrics files to a ~5GB assembly contig BAM and a BAM, ~25% of the input file size, containing all non-reference (SR/SC/OEA/DP) reads genome-wide. Should I just keep all the outputs, or filter the large files down to the regions reported in the merged output?
    • GRIDSS reports single breakends. These don't have a partner and can't be converted to bedpe. How do I include these in the combined output?
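
For the BND and single-breakend questions above, one possible first step is to classify each ALT string before attempting any conversion. The following is a minimal sketch, not Parliament2 or GRIDSS code. The function names `parse_breakend` and `split_pass_breakends` are hypothetical, treating `FILTER == PASS` as "high confidence" is an assumption, and one ALT allele per record is assumed. The patterns follow the breakend forms in the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`, and `t.` / `.t` for single breakends).

```python
import re

# Paired breakend ALT forms from the VCF spec, where p is "chrom:pos":
#   t[p[   t]p]   ]p]t   [p[t
_BND = re.compile(
    r"^(?P<lead>[^\[\]]*)(?P<b1>[\[\]])(?P<chrom>.+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<trail>[^\[\]]*)$"
)


def parse_breakend(alt):
    """Classify one ALT string as a paired breakend, a single breakend, or neither.

    Returns a dict with kind 'paired' or 'single', or None for any other ALT.
    """
    m = _BND.match(alt)
    if m and m.group("b1") == m.group("b2"):
        return {
            "kind": "paired",
            "mate_chrom": m.group("chrom"),
            "mate_pos": int(m.group("pos")),
            # True for t[p[ and t]p]: the local sequence comes before the junction.
            "ref_base_first": bool(m.group("lead")),
            # '[' means the joined piece extends to the right of the mate position.
            "mate_extends_right": m.group("b1") == "[",
        }
    # Single breakends are written "t." or ".t"; a bare "." is a missing ALT.
    if len(alt) > 1 and (alt.startswith(".") or alt.endswith(".")):
        return {"kind": "single", "ref_base_first": alt.endswith(".")}
    return None


def split_pass_breakends(vcf_lines):
    """Keep PASS records (assumed high confidence) and split paired from single."""
    paired, single = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, rec_id, alt, filt = (
            fields[0], int(fields[1]), fields[2], fields[4], fields[6],
        )
        if filt != "PASS":
            continue
        info = parse_breakend(alt)
        if info is None:
            continue  # not breakend notation
        target = paired if info["kind"] == "paired" else single
        target.append((chrom, pos, rec_id, info))
    return paired, single
```

The paired records carry enough information to build a two-position record such as bedpe, while the single breakends are kept in their own list so that a separate decision can be made about how to report them.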

I'm pretty sure I've made some basic misinterpretations of how Parliament works. Any pointers would be much appreciated.

@AndrewCarroll
Contributor

Hi Daniel,

I have been curious about adding GRIDSS to Parliament2 as well, but am too constrained to easily make it happen. Here are some thoughts:

  1. Your list of changes looks roughly correct.

  2. You will need to add a BWA index for assembly realignment. Given how long a BWA index takes to generate, I would probably do one of the following:
     a) Start the indexing at the beginning of the job, then check back later and launch GRIDSS in the per-chromosome steps once the index is complete. This approach would be good if GRIDSS multi-threads efficiently or if it otherwise can't be run correctly in a per-chromosome manner.
     b) Include pre-built indices for a few common 37/38 flavors and only re-index if required.
     c) Just take the performance hit. Note, though, that we have heavily optimized Parliament2 for speed, since scalability is one of its major goals, so a slow BWA index step will hurt.

  3. For metrics, I would tar them into a GRIDSS-metrics.tar.gz file and add that as an output to Parliament2. This gives access to the files, but doesn't further inflate the file number.

  4. SV events are merged using SURVIVOR. The commands for this should be in the script itself. If you write a valid VCF, SURVIVOR should generally work with it. Let me know if you need more clarity. I would reduce the GRIDSS calls to high-confidence events for this.

  5. I am not 100% sure on the best way to incorporate single breakends. I am going to think about this some more.

  6. One important thing not present here is that Parliament2 uses caller overlap to assign confidence based on GIAB truth sets. This is very powerful for filtering events based on which callers make them. We'd need to make a new calibration table with GRIDSS. I think only I have done that so far, but it would be a good thing to democratize that process. This could be a good opportunity for that.
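
To make point 6 concrete, here is a rough sketch of what a caller-overlap calibration table could look like. This is not the existing Parliament2 calibration code. The function name `calibration_table`, the input shape (one pair of supporting callers and a truth-set match flag per merged event), and the use of plain precision as the confidence value are all assumptions for illustration.

```python
from collections import defaultdict


def calibration_table(events):
    """Summarize truth-set agreement for each combination of supporting callers.

    events: iterable of (callers, in_truth) pairs, where callers is a collection
    of caller names supporting a merged event and in_truth says whether that
    event matched the truth set (both hypothetical inputs).
    """
    counts = defaultdict(lambda: [0, 0])  # combo -> [matched, total]
    for callers, in_truth in events:
        combo = tuple(sorted(callers))  # order-independent key
        counts[combo][1] += 1
        if in_truth:
            counts[combo][0] += 1
    return {
        combo: {"n": total, "precision": matched / total}
        for combo, (matched, total) in counts.items()
    }


if __name__ == "__main__":
    example = [
        ({"GRIDSS", "Manta"}, True),
        ({"Manta", "GRIDSS"}, True),
        ({"GRIDSS"}, False),
        ({"GRIDSS"}, True),
    ]
    for combo, row in sorted(calibration_table(example).items()):
        print("+".join(combo), row["n"], row["precision"])
```

A real table would probably also need to be stratified, for example by SV type and size, which this sketch leaves out.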

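Going back to points 2 and 3, the index handling and the metrics bundling could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the Parliament2 driver scripts. The helper names are hypothetical, the caller is assumed to know which files count as GRIDSS metrics, and `bwa` is assumed to be on the `PATH`. The suffix list is the set of files that `bwa index` writes next to the reference FASTA.

```python
import os
import subprocess
import tarfile

# Files that `bwa index ref.fa` creates alongside ref.fa.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(reference):
    """Return the expected index files that do not exist yet."""
    return [
        reference + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not os.path.exists(reference + suffix)
    ]


def start_bwa_index_if_needed(reference):
    """Start `bwa index` in the background only when index files are missing.

    Returns the Popen handle so the caller can wait on it later, or None when
    a complete index is already present (for example a pre-built one).
    """
    if not missing_bwa_index_files(reference):
        return None
    return subprocess.Popen(["bwa", "index", reference])


def bundle_metrics(metrics_paths, out_path="GRIDSS-metrics.tar.gz"):
    """Pack the given metrics files into a single gzip-compressed tarball."""
    with tarfile.open(out_path, "w:gz") as tar:
        for path in metrics_paths:
            # Store each file under its base name to keep the archive flat.
            tar.add(path, arcname=os.path.basename(path))
    return out_path
```

With this shape, the job would call `start_bwa_index_if_needed` at the start, carry on with the other callers, and call `wait()` on the returned handle (if any) just before launching GRIDSS.
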
Thanks!
