fvecVCF not creating fvec windows for the whole chromosome #60
Comments
Hi @rileycorcoran -- my initial guess is that your window size is so small that most windows contain too few SNPs to calculate the statistics. What happens if you increase the window size by 5x?
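One quick way to check whether sparse windows are the problem is to count SNPs per window directly from the VCF before running fvecVCF. The following is only a minimal sketch, not part of diploS/HIC: the function names and the `min_snps` threshold are hypothetical, and it assumes an uncompressed, tab-delimited VCF in which only single-base, biallelic records count as SNPs.

```python
def snps_per_window(vcf_lines, chrom, chrom_len, win_size):
    """Count biallelic SNP records in non-overlapping windows of one chromosome."""
    n_windows = -(-chrom_len // win_size)  # ceiling division
    counts = [0] * n_windows
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != chrom:
            continue  # malformed record or a different chromosome
        pos, ref, alt = int(fields[1]), fields[3].upper(), fields[4].upper()
        # keep only single-base REF and a single single-base ALT allele
        if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
            continue
        if pos < 1 or pos > chrom_len:
            continue  # position outside the stated chromosome length
        counts[(pos - 1) // win_size] += 1  # VCF positions are 1-based
    return counts


def sparse_fraction(counts, min_snps):
    """Fraction of windows holding fewer than min_snps SNPs (hypothetical threshold)."""
    return sum(c < min_snps for c in counts) / len(counts)


# Example usage with a hypothetical file name and chromosome:
# with open("my_calls.vcf") as handle:
#     counts = snps_per_window(handle, "chr1", 20_000_000, 5_000)
# print(sparse_fraction(counts, min_snps=5))
```

Running this at the current window size and again at a 5x or 10x larger size should show whether the fraction of nearly empty windows drops. For a gzipped VCF, the file would need to be opened with `gzip.open(path, "rt")` instead.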
Thank you @andrewkern for the quick reply and the help! Increasing to
Thank you again!
550kb doesn't sound too large to me, but this mostly depends on the recombination rate in your organism. For mosquitoes this was the approximate size we've used in the past. For humans even bigger.
For filtering, I'd recommend not using a MAF filter if sequencing depth is adequate (say 10-20x?). That will retain low-frequency variants, which will be informative. As for other QC filters, it's a bit hard to say without more detail.
Thank you for the clarification! It's good to know that something around the size of 550kb has been used in other systems. I don't know the recombination rate for my system (non-model), but it seems like 550kb is a fairly versatile size. After testing out some sizes, it seems like my HPC doesn't have enough memory to go a lot larger on My sequencing depth was ~10x, so it's good to know that I don't need to use a MAF filter. The rest of my QC filters were relatively unrestrictive, so I'm hoping that allowed me to retain enough SNPs for good analyses. Thank you again for the help!
Hello, I've been trying to get diploS/HIC working with my own data for a while, and while I've fixed many small errors, I can't figure out what could be going wrong here. I'm running fvecVCF with a whole-genome VCF file and specifying a single chromosome for analysis by name and length. It does run successfully (I think), since the error file shows the entire chromosome being processed in 5,000bp windows, but the resulting .fvec file (and the corresponding .preds file from predict) contains only seven 5,000bp windows.
I'm running this on an HPC cluster and using this NCBI reference genome (edited to remove excess header information so that it matches the reference in the Anopheles example).
Current fvecVCF code (sorry for the variables, I decided not to replace them since their names didn't seem useful for debugging):
example from the .err file:
The entire corresponding .preds file:
Am I trying to run this with too large an input (i.e. a whole chromosome rather than a 1 Mb segment)? Am I running out of memory? Is my reference not providing enough unmasked SNPs? Is this caused by an error earlier in the pipeline (i.e. fvecSim or training)?
I can provide any other code or data if necessary, and any thoughts or help on this would be greatly appreciated. Thanks!