Filtering: vcf_filter.py: added possibility to include a path to custom filters python file, not only a local script file #7
Closed
gitanoqevaporelmundoentero wants to merge 442 commits into jdoughertyii:master from gitanoqevaporelmundoentero:filtering
Don't choke on missing GT
Make metadata RE reluctant (stop on first = not last)
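As a rough illustration of why a reluctant quantifier matters here, the sketch below splits a metadata line on its first `=` rather than its last. The two patterns and the `split_meta` helper are hypothetical and are not the expressions PyVCF actually uses.

```python
import re

# Hypothetical patterns for illustration only (not PyVCF's own expressions).
GREEDY = re.compile(r"##(?P<key>.+)=(?P<val>.+)")      # key runs to the LAST '='
RELUCTANT = re.compile(r"##(?P<key>.+?)=(?P<val>.+)")  # key stops at the FIRST '='


def split_meta(line, pattern):
    """Return (key, value) for a '##key=value' metadata line."""
    match = pattern.match(line)
    return match.group("key"), match.group("val")


line = "##commandline=tool --min-depth=10"
greedy_result = split_meta(line, GREEDY)
reluctant_result = split_meta(line, RELUCTANT)
```

With the greedy pattern the key swallows everything up to the final `=`, so a value that itself contains `=` is split in the wrong place; the reluctant pattern keeps the whole value intact.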
Use requirements files to consolidate dependencies.
These coordinates should represent the zero-based, half-open region of the reference sequence affected by all the events included in ALT. These coordinates allow the user to identify precisely which bases are altered by the events in the record. Provides more thorough documentation on the coordinate schemes for _Record.POS, .start, and .end.
Adds _Record.affected_start and .affected_end.
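One way to picture a zero-based, half-open "affected" region is sketched below. It assumes a simplified convention (leading bases shared by REF and ALT are treated as unchanged padding, and the region is the union over all ALT alleles); the `affected_region` function is hypothetical and may differ from how `_Record.affected_start` and `.affected_end` are actually computed.

```python
def affected_region(pos, ref, alts):
    """Return (start, end), zero-based and half-open, covering the REF bases
    altered by any allele in alts. pos is the 1-based VCF POS.

    Assumption: leading bases shared by REF and an ALT are padding and are
    not counted as affected. Symbolic alleles are not handled.
    """
    starts, ends = [], []
    for alt in alts:
        shared = 0
        while shared < min(len(ref), len(alt)) and ref[shared] == alt[shared]:
            shared += 1
        starts.append(pos - 1 + shared)  # first altered base, zero-based
        ends.append(pos - 1 + len(ref))  # one past the last REF base
    return min(starts), max(ends)
```

Under this convention a SNP at POS 10 yields `(9, 10)`, and a deletion written as REF `ACG`, ALT `A` at POS 10 yields `(10, 12)`, since the leading `A` is unchanged.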
making alternate allele frequency work in the case of non-diploid genotypes
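A minimal sketch of the idea, assuming frequencies are computed by counting alleles rather than samples so that ploidy does not need to be fixed at two. The `alt_allele_frequencies` function and its inputs (raw GT strings) are hypothetical and not PyVCF's API.

```python
import re


def alt_allele_frequencies(genotypes, num_alts):
    """Return one frequency per ALT allele, counted over all called alleles.

    genotypes is a list of GT strings such as "0/1", "1" or "0|0|1";
    "." marks an uncalled allele and is skipped.
    """
    counts = [0] * num_alts
    total = 0
    for gt in genotypes:
        for allele in re.split(r"[/|]", gt):
            if allele == ".":
                continue  # uncalled allele contributes nothing
            total += 1
            index = int(allele)
            if index > 0:
                counts[index - 1] += 1
    if total == 0:
        return None  # frequency is undefined without any called allele
    return [count / total for count in counts]
```

Because the denominator is the number of called alleles, haploid and triploid samples contribute one and three alleles respectively instead of being assumed diploid.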
As reported in #164, we previously crashed on flag INFO fields declared as strings (and the number of values declared as 1). This is indeed not according to spec, but we should probably allow it anyway.
It is not valid according to the spec, but issue #164 shows a VCF file where the FORMAT column contains just a dot character. We have no way of interpreting the subsequent genotype columns in that case, so this patch ignores them.
Allow flag INFO field to be declared as string
Don't crash when FORMAT is set to the missing value (.)
The spec actually does not allow for metadata lines without value, but we shouldn't crash on them. Fixes #168
Before we figure out what causes this, let's have a working test suite by fixing pysam on the latest working release. Traceback:
Traceback (most recent call last):
  File "/home/travis/build/jamescasbon/PyVCF/build/lib.linux-x86_64-3.3/vcf/test/test_vcf.py", line 1109, in testNoVariantsInRange
    fetched_variants = self.reader.fetch('20', 14370, 17329)
  File "/home/travis/build/jamescasbon/PyVCF/build/lib.linux-x86_64-3.3/vcf/parser.py", line 623, in fetch
    self.reader = self._tabix.fetch(chrom, start, end)
  File "ctabix.pyx", line 345, in pysam.ctabix.Tabixfile.fetch (pysam/ctabix.c:4241)
TypeError: expected bytes, str found
See #175
- Add R as an INFO field count (number of alleles including reference).
- Support the optional Source and Version keys on INFO metainformation.
Thanks a lot @travc for contributing these fixes! See #172
Partial support for VCFv4.2
The VCF 4.0 and newer specifications say the ALT field is a comma separated list that includes "base Strings made up of the bases A,C,G,T,N". Notably, the last case was not handled by `Record.is_snp`, causing it to erroneously report `False` for records with "N" as the ALT.
Bugfix: SNP records with N as ALT now noted as SNPs.
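The check described above can be sketched as follows, assuming a record counts as a SNP when REF is a single base and every ALT is a single base drawn from A, C, G, T or N. This `is_snp` function is a standalone illustration, not the body of `Record.is_snp`.

```python
# Bases the spec allows in an ALT base string; N was the missing case.
SNP_BASES = frozenset("ACGTN")


def is_snp(ref, alts):
    """Return True if REF and every ALT allele is a single base."""
    if len(ref) != 1:
        return False
    # A multi-character or missing ALT is never a member of SNP_BASES.
    return all(alt is not None and alt.upper() in SNP_BASES for alt in alts)
```

With `N` in the accepted set, `is_snp("A", ["N"])` is true, while an indel such as `is_snp("A", ["AT"])` remains false.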
Run tests for Python 3.4.
* Remember the ploidy of uncalled genotypes so that the sample genotypes written by PyVCF.Writer match the sample genotypes read by PyVCF.Reader.
* For uncalled _Calls, gt_nums and gt_bases are None; gt_alleles is a list of "None" with a length of _Call.ploidity.
Warnings about open file handles muddle the output of unit tests and can confuse those interpreting the results.
The sample.data.GT attribute is no longer set to None for uncalled calls, which means that _format_sample can now rely on obtaining the original sample genotype.
Uncalled genotypes support
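The behaviour described in these commits might look roughly like the sketch below. The `Call` tuple and `parse_call` function are hypothetical stand-ins for `_Call`, and partially called genotypes such as `./1` are not treated specially here.

```python
import re
from typing import List, NamedTuple, Optional


class Call(NamedTuple):
    gt: str                          # original GT string, kept for writing
    ploidity: int                    # number of alleles in the genotype
    gt_alleles: List[Optional[str]]  # allele indices, or None when uncalled
    gt_nums: Optional[str]           # None when the genotype is uncalled


def parse_call(gt):
    """Parse a GT string while remembering the ploidy of uncalled genotypes."""
    alleles = re.split(r"[/|]", gt)
    if all(allele == "." for allele in alleles):
        # Uncalled: keep the raw string so a writer can reproduce it exactly.
        return Call(gt, len(alleles), [None] * len(alleles), None)
    return Call(gt, len(alleles), alleles, gt)
```

Keeping the original string on the call means a triploid uncalled genotype such as `././.` can be written back unchanged instead of collapsing to a single form.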
Fixes #181
Fix double quoting issue when writing VCFs
The issue in 0.8.0 seems to be fixed in 0.8.1, so it's now safe to just blacklist 0.8.0 specifically. See #175
… now it is possible to include a path to custom filters python file
Sorry, I was trying to include this PR in jamescasbon PyVCF... Better there or better here? :S
Better there, I've not looked at this repo in ages.
vcf_filter.py: local-script argument changed to custom-filters, since now it is possible to include a path to custom filters python file
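Loading filters from an arbitrary path, rather than only from a script in the working directory, could be done along these lines. This is a sketch using the standard library's `importlib`; the `load_filter_classes` function and the `custom_filters` module name are assumptions, and the actual change in vcf_filter.py may work differently.

```python
import importlib.util
import inspect


def load_filter_classes(path, module_name="custom_filters"):
    """Import the Python file at path and return the classes it defines."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError("cannot load filters from %r" % path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Keep only classes defined in the file itself, not ones it imported.
    return [
        obj
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module_name
    ]
```

A caller would pass the value of the custom-filters argument as `path` and then register whichever of the returned classes are filters.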