Add XY filtration workflow #191
…122-GIABGermlineVariant
@@ -43,6 +48,9 @@ params {
bundle_omni_1000g_2p5_vcf_gz = "/hot/resource/tool-specific-input/GATK/GRCh38/1000G_omni2.5.hg38.vcf.gz"
bundle_phase1_1000g_snps_high_conf_vcf_gz = "/hot/resource/tool-specific-input/GATK/GRCh38/1000G_phase1.snps.high_confidence.hg38.vcf.gz"

// Specify BED file path for Pseudoautosomal Region (PAR)
par_bed = ""
Will this be a standardized reference in /hot/resource/ ?
I'll defer this to @yashpatel6 as I do not have permission to create a dir in /hot/resource/
Here's the GRCh38 version of PAR BED. You can remove the commented lines from this file when you make a copy in /hot/resource/
- /hot/project/method/AlgorithmEvaluation/BNCH-000122-GIABSexChrGermlineFilter/GIAB/AshkenazimTrio/germline-small-variant/filter_XY/pseudoautosomal_regions_hg38.bed
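For reference, a PAR BED with commented lines could be read along these lines. This is a minimal sketch, not the PR's code: the function names `load_par_bed` and `in_par` are hypothetical, and the coordinates in the example are the commonly cited GRCh38 PAR boundaries, included here as an assumption that should be checked against the actual file.

```python
import io
from collections import defaultdict


def load_par_bed(handle):
    """Parse a BED stream into {contig: [(start, end), ...]}, skipping comments."""
    regions = defaultdict(list)
    for line in handle:
        line = line.strip()
        # Skip blank lines and commented lines such as those in the shared file.
        if not line or line.startswith("#"):
            continue
        contig, start, end = line.split()[:3]
        regions[contig].append((int(start), int(end)))
    return dict(regions)


def in_par(regions, contig, pos):
    """True if the 1-based VCF position `pos` lies inside a PAR interval.

    BED intervals are 0-based and half-open, so a 1-based position p
    is covered when start < p <= end.
    """
    return any(start < pos <= end for start, end in regions.get(contig, []))


# Assumed GRCh38 PAR coordinates, for illustration only.
example_bed = io.StringIO(
    "#chrom\tstart\tend\tname\n"
    "chrX\t10000\t2781479\tPAR1\n"
    "chrX\t155701382\t156030895\tPAR2\n"
    "chrY\t10000\t2781479\tPAR1\n"
    "chrY\t56887902\t57217415\tPAR2\n"
)
par_regions = load_par_bed(example_bed)
```

Skipping the commented lines at parse time would make it unnecessary to edit the file before copying it, although a clean copy in /hot/resource/ is still the simpler contract for other tools.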
#Filter XY calls
##Extract XY calls
X_contig = vcf_matrix.locus.contig.startswith('chrX') | vcf_matrix.locus.contig.startswith('X')
I don't think we would ever encounter this in DNA-metapipeline, but just FYI I have seen X/Y encoded as chr23 and chr24 in some genetic data.
Yeah, I'm aware. However, to stay consistent with variant calls in the DNA-metapipeline, chrX and chrY notation should be fine in pipeline outputs.
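If inputs with other encodings ever did need to be supported, the aliases could be mapped to chrX/chrY before filtering. The following is only a sketch of that idea in plain Python; `SEX_CONTIG_ALIASES` and `normalize_sex_contig` are hypothetical names and are not part of this PR.

```python
# Hypothetical alias table: bare contig name (without "chr") -> canonical name.
SEX_CONTIG_ALIASES = {"X": "chrX", "23": "chrX", "Y": "chrY", "24": "chrY"}


def normalize_sex_contig(contig):
    """Map X/Y aliases to 'chrX'/'chrY'; return any other contig unchanged."""
    bare = contig[3:] if contig.lower().startswith("chr") else contig
    return SEX_CONTIG_ALIASES.get(bare.upper(), contig)
```

Because this matches the whole contig name rather than a prefix, names such as `chrX_KI270880v1_alt` are left untouched instead of being treated as chrX.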
@Faizal-Eeman Would it be possible to add a line to the output VCF header documenting the XY filtration, similar to how bcftools appends every operation to the header? It would be a good way to keep a record of what has been done to the file.
I skimmed through the output file and everything looks as expected: PARs remain diploid in both X and Y, non-PARs are haploid in X and Y. Missing genotypes are always in diploid notation:
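The expectations in this comment could also be spot-checked programmatically. The sketch below is not the reviewer's output and not the PR's code: it assumes an XY sample, assumes autosomes are diploid, uses hypothetical function names, and takes PAR intervals as 0-based half-open BED-style tuples (the single interval shown is an assumed GRCh38 PAR1 boundary).

```python
import re


def split_gt(gt):
    """Split a GT string such as '0/1', '0|1', '1' or './.' into its alleles."""
    return re.split(r"[/|]", gt)


def expected_ploidy(contig, pos, gt, par_regions):
    """Expected allele count for one genotype of an XY sample."""
    if contig not in ("chrX", "chrY"):
        return 2  # assumption: autosomes are left diploid
    if all(allele == "." for allele in split_gt(gt)):
        return 2  # missing genotypes stay in diploid notation
    in_par = any(start < pos <= end for start, end in par_regions.get(contig, []))
    return 2 if in_par else 1


def ploidy_ok(contig, pos, gt, par_regions):
    """True if the genotype's ploidy matches the expectation above."""
    return len(split_gt(gt)) == expected_ploidy(contig, pos, gt, par_regions)


# Assumed PAR1 interval only, for illustration.
par_regions = {"chrX": [(10000, 2781479)], "chrY": [(10000, 2781479)]}
```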
@alkaZeltser for now I'm appending the script command to the VCF header, as GATK does.
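As an illustration of the idea, a provenance line can be inserted just before the `#CHROM` column header. This is a sketch in plain Python, not the script from this PR; the helper name, the header key `XYFiltrationCommand`, and the command string are all hypothetical placeholders.

```python
def add_header_line(vcf_lines, key, value):
    """Return a copy of vcf_lines with '##key=value' placed before the #CHROM line."""
    out = []
    inserted = False
    for line in vcf_lines:
        if not inserted and line.startswith("#CHROM"):
            out.append(f"##{key}={value}")
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("no #CHROM header line found")
    return out


# Placeholder input and command string, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrX\t5000000\t.\tA\tG\t.\tPASS\t.",
]
annotated = add_header_line(example_vcf, "XYFiltrationCommand", "script.py --option value")
```

In a real script the value could be built from `sys.argv`, so the header records exactly what was run.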
As to the steps in the workflow, I've added a document to
The same sample was treated as an
By the time this test finished, I updated the Python script to name the output file based on the
Description
ADD XY filter workflow
Closes #190
Testing Results
N-T paired WGS (sample_sex = XY)
N-T paired WGS (sample_sex = XX)
Checklist
I have read the code review guidelines and the code review best practice on GitHub check-list.
I have reviewed the Nextflow pipeline standards.
The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].
I have set up or verified the branch protection rule following the github standards before opening this pull request.
I have added my name to the contributors listings in the manifest block in the nextflow.config as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)
I have added the changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.
I have updated the version number in the metadata.yaml and manifest block of the nextflow.config file following semver, or the version number has already been updated. (Leave it unchecked if you are unsure about new version number and discuss it with the infrastructure team in this PR.)
I have tested the pipeline on at least one A-mini sample.