Trying to run Unfazed #1
Comments
That is confusing. I don't see any mistakes in your command and I don't think you'd see that particular message if bad inputs had caused an issue (although I could be wrong and will keep thinking about it). The most likely explanation is that the particular informative site you mentioned didn't pass quality control filters built into unfazed. The main QC requirements (for a heterozygous variant) are:
For homozygous calls the same filters apply, except that the read allele balance must be between 0.0-0.2 or 0.8-1.0 (depending on whether it's a hom-alt or a hom-ref). Please let me know if this is helpful. I'd be happy to dig more into it with you, especially if the issue is a bug in my code that I can fix. |
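The homozygous allele-balance ranges quoted above can be checked by hand against the per-sample read depths. The following is a minimal sketch, not Unfazed's own code: the function names and genotype labels are hypothetical, and it assumes allele balance means the fraction of reads supporting the alternate allele, so that hom-ref maps to 0.0-0.2 and hom-alt to 0.8-1.0.

```python
def allele_balance(ref_depth: int, alt_depth: int) -> float:
    """Fraction of reads supporting the alternate allele (assumed definition)."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_depth / total


def passes_hom_allele_balance(ref_depth: int, alt_depth: int, genotype: str) -> bool:
    """Apply the homozygous allele-balance ranges from the comment above.

    genotype is "hom_ref" or "hom_alt"; these labels are made up for this sketch.
    """
    ab = allele_balance(ref_depth, alt_depth)
    if genotype == "hom_ref":
        # Almost all reads should support the reference allele.
        return 0.0 <= ab <= 0.2
    if genotype == "hom_alt":
        # Almost all reads should support the alternate allele.
        return 0.8 <= ab <= 1.0
    raise ValueError(f"unexpected genotype label: {genotype!r}")


if __name__ == "__main__":
    # A parent called 1/1 with 1 reference read and 19 alternate reads.
    print(passes_hom_allele_balance(1, 19, "hom_alt"))
```

The depth and genotype-quality filters mentioned in the same comment are not modelled here.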
What variant caller was used to create the VCF? Does it have AB, DP and GQ attributes in the genotype columns? |
Thanks for your reply. I am using FreeBayes (default run) to call the variants. I ran a test and it worked for some of my variants (including the one in the above example), so thanks for your help. For one of the variants (below), I can see that both the DNM site and the informative site pass these filters, but the variant is still not phased. Are there any other criteria used?
By the way, is there an option to play with the quality control filters set by the tool? |
A possible reason a variant like this one might not be phased could be if there are multiple informative sites in the region that give contradictory results, possibly because a recombination event or sequencing error occurred. If you use the Quality control filters are not currently exposed to the user, although this may change shortly. I'm also considering ways to improve the information given to the user about why certain variants aren't phased, although I want to avoid overwhelmingly large outputs as well. |
Great! Looking forward to the new versions, then. Thank you |
Hi, Thanks for making this tool. I tried it out yesterday on some data and the output for all sites is:
We know the parent-of-origin for many of the variants based on assessing the read data, so we would expect this program to find at least some of them. Based on your preprint, it looks like the number of variants where parent-of-origin can be determined is higher than in previous papers/methods, which is why we hope to use it. The command is as follows: The input of the bed file is as follows (only the top line shown for space reasons): The input of the childname.trio.vcf is a full genome-wide VCF from DeepVariant (the FORMAT field is as follows: GT:DP:AD:GQ:PL:RNC). I have also tried a VCF from GATK and also tried adding AB to one of these VCFs. None of these changes worked. I have also tried giving it only the child CRAM and not the parents … still the same result. The ped file is in standard PLINK format. I would like to run this tool, and any insight is appreciated. Thanks, Tychele |
I should note that I have also tried the bed file as follows:
|
Hmm, that's disappointing. It's possible there's an instability I'm not aware of. Could you try running with |
Okay, added
|
Also, just added |
Yes, there is a bug in the code at I think the bug I fixed is probably not the same one you ran into before |
Okay, thanks |
I have created a new release v1.0.1 which should fix the known bug |
Okay, I tried installing with conda, but it looks like the new version is not there yet? Also, I took some time to look through the test files in the GitHub repository and decided to change my bed from saying SNV or INDEL to saying POINT. Then I ran it on 10 sites and it reported parent-of-origin for four of them! I will do some more testing, but it looks like POINT is the key, and SNV or INDEL should not be used in the bed. Also, this worked on a VCF that did not contain AB, if that is helpful. Thanks, Tychele |
Ah, a great catch. I'll work on improving the generality of that term as well, and clarifying the necessary VCF fields (allele balance is calculated from AD in Unfazed). I really appreciate your patience and persistence in working through these new-software issues. |
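For anyone with an existing bed file that uses SNV or INDEL, the relabelling described above can be scripted. This is a minimal sketch under stated assumptions: the bed is tab-delimited and the variant type sits in the fifth column (index 4), which is a guess about the layout since the example bed line is not shown in this thread. The function name and file names are hypothetical.

```python
def relabel_point_variants(line: str, type_column: int = 4) -> str:
    """Rewrite SNV/INDEL in the variant-type column to POINT.

    type_column is an assumption about the bed layout; adjust it to match
    the file actually being used.
    """
    if not line.strip() or line.startswith("#"):
        return line  # leave blank lines and comments untouched
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > type_column and fields[type_column].upper() in {"SNV", "INDEL"}:
        fields[type_column] = "POINT"
    return "\t".join(fields) + newline


def relabel_bed(in_path: str, out_path: str) -> None:
    """Copy a bed file, relabelling SNV/INDEL rows as POINT."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(relabel_point_variants(line))


if __name__ == "__main__":
    # Hypothetical file names.
    relabel_bed("dnms.bed", "dnms.point.bed")
```

Rows with any other variant type are passed through unchanged.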
Hi, The program is generally working well now (currently using v1.0.1). However, for a subset of the trios in my dataset (n = 48), I keep getting this error:
When I remove the X chromosome for those trios it runs fine. The children are all males in these particular trios, so I am assuming the de novo variants would be primarily on the X chromosome from the mother (a possible exception being the pseudoautosomal regions). However, the whole program crashes because of the KeyError, and I don't get the results for the rest of the genome without removing the X chromosome and rerunning. Also, I looked over the bed inputs and they look fine. Tychele |
For male samples Unfazed is intended to only phase the PAR for chrX and automatically assign the rest of the chrX variants to maternal origin. Clearly that's not working and I'll figure it out. Thanks for pointing this out! |
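Until a fix is released, the workaround described above (removing chrX and rerunning) can be scripted so the X-linked variants are set aside rather than lost. This is a minimal sketch, assuming a tab-delimited bed whose first column is the chromosome, named either `X` or `chrX`; the function name is hypothetical.

```python
def split_out_chrx(lines, x_names=("X", "chrX")):
    """Partition bed lines into (non-X lines, X lines).

    Comment and blank lines are kept with the non-X output.
    """
    keep, x_only = [], []
    for line in lines:
        chrom = line.split("\t", 1)[0].strip()
        if chrom in x_names:
            x_only.append(line)
        else:
            keep.append(line)
    return keep, x_only


if __name__ == "__main__":
    example = [
        "chr1\t100\t101\tkid1\tPOINT\n",
        "chrX\t5000\t5001\tkid1\tPOINT\n",
    ]
    autosomal, x_linked = split_out_chrx(example)
    print(len(autosomal), len(x_linked))
```

The non-X lines can then be written to a new bed and run as usual, with the X lines handled separately.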
Thank you! The tool is working quite well. I compared it with some code we have for phasing using the nearest variant (WhatsHap phased VCF plus some custom scripts). Unfazed found more variants, and when both approaches looked at the same site there was 100% concordance on the parent-of-origin (awesome!). I also wanted to mention one other error that I have seen so far, and that was this one:
When I switched from the DeepVariant to the GATK VCF for this trio everything went great, so I am not sure if it is a DeepVariant VCF issue (the VCF is valid) or something else. |
Very helpful error messages. I'll push a new version with fixes asap |
Hi,
I am new to programming and I am trying to run Unfazed to phase my DNMs, but couldn't get it to work.
Here is my code:
unfazed -d My_DNM_File.vcf -s Main_VCF_File_With_All_Variants.vcf.gz -p pedigree.ped -b bamFiles/ -t 2 -o vcf --outfile test.vcf -r ref/hs37d5.fa
My_DNM_File.vcf contains the VCF file header and only the DNM variant I am testing (4 44986884).
Main_VCF_File_With_All_Variants.vcf.gz is a multi-sample, sorted, bgzipped, and indexed VCF file with all variants of the family (i.e. not only DNMs).
pedigree.ped is a tab-delimited file that shows the relationship of the trio I am testing (format: Family_ID Sample_ID Father Mother Sex Phenotype).
bamFiles/ is the directory that contains the three BAM files of the trio (named as recommended by the tool: sample_ID.bam).
Output:
No usable informative sites for variant 4:44986883-44986884
However, I am using this DNM for the test because I am sure it can be phased as there is an informative site in the same read near to it:
CHROM POS Child Father Mother
4 44986884 0/1 0/0 0/0 <-- My DNM site
4 44986893 0/1 0/0 1/1 <-- Informative site
Appreciate your help
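The genotype table above shows why the second site looks informative: the child is heterozygous, the father carries no alternate allele, and the mother carries only alternate alleles, so the child's alternate allele at that site must be maternal. A minimal sketch of that genotype-level reasoning follows. It assumes diploid, biallelic genotype strings such as `0/1`, the function name is hypothetical, and it does not reproduce Unfazed's read-level or quality filters.

```python
def alt_allele_origin(child: str, father: str, mother: str):
    """Return "father" or "mother" if the child's alt allele has one possible
    parental source, else None. Genotypes are strings like "0/1" or "1|1"."""

    def alt_count(gt: str):
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
            return None  # missing, multiallelic, or non-diploid genotype
        return alleles.count("1")

    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if c != 1 or f is None or m is None:
        return None  # child must be heterozygous and parents fully called

    # Alt from father requires father to carry alt and mother to carry ref.
    from_father = f > 0 and m < 2
    # Alt from mother requires mother to carry alt and father to carry ref.
    from_mother = m > 0 and f < 2

    if from_father and not from_mother:
        return "father"
    if from_mother and not from_father:
        return "mother"
    return None  # ambiguous (e.g. both parents het) or de novo (neither carries alt)


if __name__ == "__main__":
    print(alt_allele_origin("0/1", "0/0", "1/1"))  # site 4:44986893 from the table
    print(alt_allele_origin("0/1", "0/0", "0/0"))  # the DNM site itself
```

A site passing this check can still be unusable in practice if it fails the quality filters discussed elsewhere in this thread or is not covered by the same reads as the DNM.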