ID field as semicolon-separated list #8
Closed
Don't choke on missing GT
Alt classes
Make metadata RE reluctant (stop on first = not last)
Fix writing of Number=A and G INFO/FORMAT fields
Adds _Record.affected_start and .affected_end.
Make alternate allele frequency work for non-diploid genotypes
As reported in #164, we previously crashed on flag INFO fields declared as strings (and the number of values declared as 1). This is indeed not according to spec, but we should probably allow it anyway.
It is not valid according to the spec, but issue #164 shows a VCF file where the FORMAT column contains just a dot character. We have no way of interpreting the subsequent genotype columns in that case, so this patch ignores them.
Allow flag INFO field to be declared as string
Don't crash when FORMAT is set to the missing value (.)
The spec actually does not allow for metadata lines without value, but we shouldn't crash on them. Fixes #168
Until we figure out what causes this, let's keep the test suite working by pinning pysam to the latest working release. Traceback:

    Traceback (most recent call last):
      File "/home/travis/build/jamescasbon/PyVCF/build/lib.linux-x86_64-3.3/vcf/test/test_vcf.py", line 1109, in testNoVariantsInRange
        fetched_variants = self.reader.fetch('20', 14370, 17329)
      File "/home/travis/build/jamescasbon/PyVCF/build/lib.linux-x86_64-3.3/vcf/parser.py", line 623, in fetch
        self.reader = self._tabix.fetch(chrom, start, end)
      File "ctabix.pyx", line 345, in pysam.ctabix.Tabixfile.fetch (pysam/ctabix.c:4241)
    TypeError: expected bytes, str found

See #175
Partial support for VCFv4.2
The VCF 4.0 and newer specifications say the ALT field is a comma separated list that includes "base Strings made up of the bases A,C,G,T,N". Notably, the last case was not handled by `Record.is_snp`, causing it to erroneously report `False` for records with "N" as the ALT.
Bugfix: SNP records with N as ALT now noted as SNPs.
Run tests for Python 3.4.
* Remember the ploidy of uncalled genotypes so that the sample genotypes written by PyVCF.Writer match the sample genotypes read by PyVCF.Reader.
* For uncalled _Calls, gt_nums and gt_bases are None; gt_alleles is a list of "None" with a length of _Call.ploidity.
Warnings about open file handles muddle the output of the unit tests and can confuse those interpreting the results.
The sample.data.GT attribute is no longer set to None for uncalled calls, which means that _format_sample can now rely on obtaining the original sample genotype.
Uncalled genotypes support
Fix double quoting issue when writing VCFs
The issue in 0.8.0 seems to be fixed in 0.8.1, so it's now safe to just blacklist 0.8.0 specifically. See #175
Support ##contig headers with only ID attributes. Generated by bcftools 1.2 when inputs have no ##contig information
Allow for whitespace after commas in metadata lines
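To illustrate the last few metadata fixes (a reluctant match that stops at the first "=", ##contig headers carrying only an ID, and whitespace after commas), here is a minimal sketch. It is not PyVCF's actual parser; the function name `parse_structured_meta` and both regular expressions are assumptions made up for this example.

```python
import re

# Hypothetical pattern: one key=value pair inside <...>. The value is either a
# double-quoted string (which may contain commas) or a run of non-comma text.
# Leading whitespace is skipped so "ID=1, length=100" is accepted.
_PAIR = re.compile(r'\s*([^=,\s]+)=("(?:[^"\\]|\\.)*"|[^,]*)\s*(?:,|$)')

# Hypothetical pattern: the key is matched reluctantly (.+?) so it ends at the
# first "=<" rather than at a later "=" inside the body.
_META = re.compile(r'##(.+?)=<(.*)>\s*$')


def parse_structured_meta(line):
    """Return (key, fields) for a structured metadata line such as ##contig=<...>."""
    match = _META.match(line)
    if match is None:
        raise ValueError("not a structured metadata line: %r" % line)
    key, body = match.groups()
    fields = {}
    for name, value in _PAIR.findall(body):
        value = value.strip()
        # Drop surrounding double quotes from quoted values.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[name] = value
    return key, fields


if __name__ == "__main__":
    print(parse_structured_meta('##contig=<ID=20>'))
    print(parse_structured_meta('##contig=<ID=1, length=100>'))
```

Matching quoted values before unquoted ones keeps a comma or "=" inside a Description string from being read as a field separator.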
From the VCF spec: "ID - identifier: Semi-colon separated list of unique identifiers where available. If this is a dbSNP variant it is encouraged to use the rs number(s). No identifier should be present in more than one data record. If there is no identifier available, then the missing value should be used. (String, no white-space or semi-colons permitted)"
In parser.py, next(self)
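As a rough sketch of what the spec text implies for a parser: the ID column can be split on semicolons, with the missing value "." mapped to None. The helper names `parse_id_field` and `format_id_field` are hypothetical and are not part of PyVCF; whether the reader should expose a list or keep the raw string is a design choice this example does not settle.

```python
def parse_id_field(raw):
    """Split a VCF ID column into a list of identifiers.

    Returns None when the column holds the missing value ".".
    """
    if raw in (".", ""):
        return None
    # Skip empty pieces so a stray trailing semicolon does not yield "".
    return [part for part in raw.split(";") if part]


def format_id_field(ids):
    """Inverse of parse_id_field: join identifiers for writing."""
    if not ids:
        return "."
    return ";".join(ids)


if __name__ == "__main__":
    print(parse_id_field("rs6054257;rs123"))
    print(format_id_field(parse_id_field(".")))
```

Round-tripping through both helpers leaves a well-formed ID column unchanged, which matters if records are read and then written back out.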