ID field as semicolon-separated list #8

Closed
wants to merge 446 commits into from
446 commits
a7db94c
Merge pull request #28 from martijnvermaat/missing-gt
Jun 12, 2012
5cd3375
Merge branch 'master' of github.com:jamescasbon/PyVCF
Jun 12, 2012
d928de1
merge dzerbino's structural variation work
Jun 12, 2012
b749fbf
update HISTORY
Jun 12, 2012
0952467
cache _map reference
Jun 12, 2012
1b6ae32
remove tabs from tests
Jun 12, 2012
014baa8
typo
Jun 12, 2012
39147d0
Created a proper class structure for the different types of ALT calls…
dzerbino Jun 12, 2012
5024f6e
Conflict merging
dzerbino Jun 12, 2012
9d75113
SingleBreakend now a subclass of Breakend
dzerbino Jun 12, 2012
0e2fce2
Corrected docs
dzerbino Jun 12, 2012
5dfb5ed
Making breakends a bit more fool-proof
dzerbino Jun 12, 2012
c9640fd
Allow direct comparison of _Substitution and strings for backward com…
dzerbino Jun 14, 2012
a6613fc
ignore
Jun 14, 2012
49da991
mv test data
Jun 15, 2012
90fdc6d
#49 allow malformed INFO string fields
Jun 15, 2012
0360cc7
remove print left in
Jun 15, 2012
a99a4c4
Merge pull request #48 from dzerbino/altClasses
Jun 15, 2012
139dc6c
Make metadata RE reluctant (stop on first =)
Jun 16, 2012
9e075dd
Merge pull request #51 from lennax/lenna
Jun 18, 2012
ecb2ab9
ignore noseids
Jun 18, 2012
cb8effa
More extensible conversion of header field counts.
Jun 20, 2012
121223b
Refactor metadata writing: fix incorrect Number.
Jun 20, 2012
f29727c
Removing _map; in header only Number can be None
Jun 20, 2012
609764e
Fix ALT writing ('.' for None).
Jun 21, 2012
c309a10
reenable doctests, PEP8 enhancements
Jun 25, 2012
880ce55
Merge pull request #53 from lennax/lenna
Jun 25, 2012
8a79f57
Add error bias filter
Jun 25, 2012
87d9ad6
repr for SV
Jun 25, 2012
23f9c85
store multiple metadata values with the same key, closes #52
Jun 25, 2012
313d38f
LICENCSE update
Jun 25, 2012
2227574
Add tox.ini for tox (http://tox.testrun.org/)
msabramo Jun 25, 2012
8bf6d20
Add .tox to .gitignore
msabramo Jun 25, 2012
a23c749
Modify Writer to output FILTER field properly for multiple filters:
cmclean Jun 25, 2012
39f85fa
Merge pull request #22 from martijnvermaat/trim-variant
Jun 26, 2012
174065b
Merge branch 'master' of github.com:jamescasbon/PyVCF
Jun 26, 2012
b0ee8e7
disable python 3 travis, for the moment, HISTORY, better docs
Jun 26, 2012
54672d5
add doctests for breakends, fix str for breakends and add to tests
Jun 26, 2012
8eece3f
add filters from @libor-m
Jun 26, 2012
ffd6062
use first line of filter docstring in FILTER meta, update README
Jun 26, 2012
6416532
Merge pull request #55 from cmclean/master
Jun 26, 2012
b18c83a
Merge pull request #54 from msabramo/tox
Jun 26, 2012
388ccc8
fix syntax error from PR
Jun 26, 2012
7321960
vcf/__init__.py: Use absolute imports for better Python 3 compatibility
msabramo Jun 26, 2012
fea207d
Check that flag is true.
Jun 26, 2012
60538fa
Python 3 compatibility
msabramo Jun 26, 2012
8dccb00
Merge pull request #58 from lennax/lenna
Jun 27, 2012
c09858f
Merge pull request #59 from msabramo/py32
Jun 27, 2012
6f8e778
enable travis on py3.2 fixes #56
Jun 27, 2012
b4688cc
Bump version to 0.5.0
Jun 27, 2012
1217403
remove cruft
Jun 27, 2012
2118748
factor out model into separate module
Jun 27, 2012
618988f
Add cython code #25
Jun 27, 2012
1d6918d
cython support with pure python fallback, pypy tests
Jun 27, 2012
d4b127f
adjust travis deps
Jun 27, 2012
a3f7cc9
adjust travis deps
Jun 27, 2012
0b64db9
Modified logic on Flag fields to print only if True,
cmclean Jun 27, 2012
fe9a138
Error checking added to flush and close
cmclean Jun 27, 2012
939e5ce
Merge pull request #61 from cmclean/master
Jun 27, 2012
1a95a0c
statprof profiling
Jun 28, 2012
3fabc92
remove duplicate model code
Jun 28, 2012
6d38025
use namedtuple instead of a dict for calldata
Jun 28, 2012
2b85497
PEP8
Jun 28, 2012
cc894c5
Made _AltRecord into ABC.
Jul 1, 2012
1640ff0
Tweaked super() in Alts for multiple inheritance.
Jul 1, 2012
99d3453
Fix Reader super call; pep8.
Jul 2, 2012
6e91990
Merge pull request #64 from lennax/lenna
Jul 2, 2012
75cf8f8
Initial per-sample line filtering.
Jul 2, 2012
18deb2a
Improved samp filter performance, allow invert.
Jul 2, 2012
8477e6f
Args can be provided all at once or in sequence.
Jul 2, 2012
73376c8
Reduced amount of sample filter code in parser.
Jul 2, 2012
362bbab
Actually write out sample-filtered file.
Jul 3, 2012
d71b2cd
Switched Writer \r\n to os.linesep.
Jul 3, 2012
bce2c47
Fixed sample name list update/printing.
Jul 3, 2012
67744c0
Moved all sample filtering to filter script.
Jul 6, 2012
a048ec0
Implemented argparse.
Jul 7, 2012
19ce645
Tweak args, pep8, move empty outfile warning.
Jul 7, 2012
95fc70b
Fixed argparse arg names.
Jul 7, 2012
67afb27
Changed default out to sys.stdout
Jul 7, 2012
33d2b5c
Added unit test for sample filtering script.
Jul 7, 2012
792d685
Added authorship statement.
Jul 7, 2012
d78a945
Added sample filter to list of scripts in setup.
Jul 7, 2012
75c4775
Moved sample filter object to src dir.
Jul 9, 2012
0047032
Using logging for easy quiet mode.
Jul 9, 2012
6b1fa89
Unit test for sample filter module.
Jul 9, 2012
817f5e9
Docs/test for undo_monkey_patch
Jul 9, 2012
0b0d809
Changed tests to use subprocess returncode.
Jul 9, 2012
746ece9
Destructor undoes patch; warn if 0 samples kept
Jul 9, 2012
30321c5
Recommend explicit use of del.
Jul 9, 2012
49f8897
Added empty filter list; del is now less critical.
Jul 9, 2012
119f19d
bias_test try except: Exception is AttributeError not KeyError when i…
ian1roberts Jul 17, 2012
cf714f8
Remove debug print statement
ian1roberts Jul 17, 2012
f811332
get method no longer used. Return sample data attributes with __geta…
ian1roberts Jul 17, 2012
ae7f9d0
use getattr
Sep 10, 2012
27ee8e3
record.FILTER is now always a list
Sep 10, 2012
2a728f4
format empty filter lists
Sep 10, 2012
a2a1a8e
Write record without crashing on missing GT
martijnvermaat Sep 21, 2012
a6f1fab
Add missing test case to test suite
martijnvermaat Sep 22, 2012
8f3c0a9
Test writer on bcftools output
martijnvermaat Sep 22, 2012
964c372
Merge pull request #72 from martijnvermaat/missing-gt
Sep 23, 2012
61c08ec
Skip lines with only whitespace
martijnvermaat Sep 23, 2012
9712355
Strip whitespace only once
martijnvermaat Sep 23, 2012
5da6932
Merge pull request #73 from martijnvermaat/ignore-empty-lines
Sep 25, 2012
bc75d63
Allow vcf_melt to process format fields that are missing in some records
seandavi Oct 18, 2012
5b86e30
Fixed bug in walk_together when one or more VCFs have no records
seandavi Oct 18, 2012
9b9194d
Merge pull request #76 from seandavi/master
Oct 26, 2012
e63960c
apply 0.6.0 release which seemed to get commited off of a branch
Nov 27, 2012
fb835a2
Changed the rule to split records into columns
marcofalcioni Nov 14, 2012
b6c085b
add strict whitespace option to allow for well formed VCFs with space…
Nov 27, 2012
3cd09d5
0.6.1 release
Nov 27, 2012
f554810
Allow flexibility in parsing INFO values specified as integers in the…
chapmanb Dec 3, 2012
69ecc53
Merge pull request #79 from chapmanb/master
Dec 5, 2012
b793020
Fixes #78
seandavi Dec 5, 2012
fa91b6b
Merge pull request #80 from seandavi/master
Dec 6, 2012
c957aab
0.6.2 version bump
Dec 6, 2012
95fd749
history update
Dec 6, 2012
8acaeb3
Correctly format contig output lines from writer, making output VCFs …
chapmanb Dec 26, 2012
9ca7798
Merge pull request #81 from chapmanb/master
Jan 2, 2013
9d43fa9
Correctly write meta lines with dictionary value
martijnvermaat Jan 10, 2013
1225561
Preserve order in meta lines with dictionary value
martijnvermaat Jan 10, 2013
9bb2a04
Merge pull request #84 from martijnvermaat/dictionary-meta
Jan 14, 2013
3256c66
add missing cparse implementation of #79
Jan 16, 2013
6a64d4b
version bump to 0.6.3
Jan 16, 2013
53548b6
Update .travis.yml
Jan 17, 2013
4ce6aff
handle String INFO fields with multiple values
alimanfoo Jan 28, 2013
3540bb7
Update writer unit tests to test call data equality
bow Jan 30, 2013
e3e5484
Fix bug that removes sample data when GT field is not present
bow Jan 29, 2013
dfc938b
Merge pull request #88 from bow/patch_writer-no-gt-field
Jan 30, 2013
1dbe5b5
Merge pull request #87 from alimanfoo/master
Jan 30, 2013
10b26fc
Record with empty list of samples instead of None
martijnvermaat Feb 26, 2013
9d7f44f
Only write FORMAT if it is in the template
martijnvermaat Feb 26, 2013
ac099c0
Merge pull request #97 from martijnvermaat/no-format
Mar 4, 2013
4fb0c86
Adhere to `strict_whitespace` in parsing column headers
martijnvermaat Mar 16, 2013
0fd74aa
Forgot to add test file
martijnvermaat Mar 16, 2013
6280a65
Merge pull request #102 from martijnvermaat/column-headers-separator
Mar 16, 2013
46f83b1
* adding support for contigs in the VCF header.
cgnh Jun 6, 2013
33f0711
* ignore the rest of the contig information
cgnh Jun 6, 2013
bd21fdc
Merge pull request #105 from cgnh/master
jamescasbon Jun 21, 2013
c276e7b
tests and fix for gatk header issue
alimanfoo Jul 11, 2013
b19f2bd
added test and fix for commas inside quoted value
alimanfoo Jul 11, 2013
51fac4b
whitespace?
alimanfoo Jul 11, 2013
2c91665
Fix contig test case for new contig header parsing
martijnvermaat Jul 12, 2013
d2f96d8
Added pickling support for '_Record' and '_CallData' -- closes #108
superbobry Jul 12, 2013
39391ee
Merge pull request #111 from martijnvermaat/contig-test
jamescasbon Jul 15, 2013
6b11eef
Merge pull request #109 from alimanfoo/gatk_26_meta
jamescasbon Jul 15, 2013
435c050
Merge pull request #112 from superbobry/master
jamescasbon Jul 15, 2013
76afe77
add python 3.3 testing, HISTORY updates
Jul 15, 2013
2dd8622
version 0.6.4
Jul 15, 2013
67b21a1
Differentiate between no filtering and PASS
martijnvermaat Aug 7, 2013
cc70525
Allow fields in contig definition before length
martijnvermaat Aug 14, 2013
e347e91
Merge pull request #116 from martijnvermaat/allow-contig-fields
martijnvermaat Aug 14, 2013
cd30d62
Merge pull request #115 from martijnvermaat/filter-pass-none
martijnvermaat Aug 16, 2013
831c023
Test if contig lines are output by writer
martijnvermaat Sep 19, 2013
bb72c5b
Output contig lines in writer
martijnvermaat Sep 19, 2013
3575ba6
Merge pull request #119 from martijnvermaat/write-contigs
martijnvermaat Sep 19, 2013
c4c6925
Test parsing and writing INFO with type Character
martijnvermaat Sep 20, 2013
e3fc03a
Fix parsing INFO lines with type Character
martijnvermaat Sep 20, 2013
4a29a2a
Merge pull request #121 from martijnvermaat/info-type-character
martijnvermaat Sep 20, 2013
58ef505
Add TestInfoTypeCharacter to test suite
martijnvermaat Sep 24, 2013
6039805
Fixed exception when reading single breakends
Nov 5, 2013
915f356
Merge pull request #126 from pkrusche/master
martijnvermaat Nov 6, 2013
4892aab
Do not maintain the order of INFO fields within records
martijnvermaat Nov 16, 2013
a6f9e1d
Merge pull request #128 from martijnvermaat/no-info-order
martijnvermaat Nov 19, 2013
0bd567c
Fixed tox.ini error regarding duplicate test section.
theboocock Nov 25, 2013
bef2938
Merge pull request #129 from smilefreak/toxini
martijnvermaat Nov 25, 2013
e50d750
Fix incorrect and missing reserved INFO/FORMAT fields
martijnvermaat Nov 25, 2013
cfd7091
Added method to return alt. allele frequencies when there is more tha…
Nov 29, 2013
50a2fcb
made aaf a list, changed to use Counter
Dec 2, 2013
36b4b68
Changed aaf to use collections.Counter. Made aaf return a list with f…
Dec 2, 2013
5497120
Add custom equality function as walk_together argument
bow Dec 3, 2013
d1218a6
Merge pull request #131 from mgymrek/master
jamescasbon Dec 3, 2013
cede181
Merge pull request #132 from bow/patch_walk-customfunc
martijnvermaat Dec 3, 2013
226f56a
Add dependency on collections.Counter implementation for Python 2.6
martijnvermaat Dec 3, 2013
7dc48c0
Merge pull request #133 from martijnvermaat/counter-python2.6
jamescasbon Dec 3, 2013
322c212
Fix unit tests on Python 2.6 and add missing tests to the suite
martijnvermaat Dec 4, 2013
b8c0af7
Fix comparison of _Record objects on Python 3
martijnvermaat Dec 4, 2013
23d1fc0
Update __eq__ operators to return False for comparison with different…
bow Dec 4, 2013
1a103cd
Add tests for updated equality behavior
bow Dec 4, 2013
8dcca20
Update walk_together test to accomodate __eq__ behavior change
bow Dec 4, 2013
d39ffa0
Adding method to compute heterozygosity for a site
Dec 11, 2013
cfade35
added heterozygosity method, fixed typo in docstring
Dec 11, 2013
1bd477a
fixed small typo in readme.rst for heterozygosity...
Dec 11, 2013
9c3822d
Ensure spurious line ending characters on records are stripped away
bow Jan 11, 2014
a60ef2f
Fix so conversion to Py3 works
bow Jan 11, 2014
a06f583
Changed the default line ending in vcf.Writer() to '\n'.
Feb 5, 2014
ab941e0
Merge pull request #139 from azalea/master
jamescasbon Feb 6, 2014
e6129f5
Merge pull request #137 from bow/patch_parser-line-ends
jamescasbon Feb 6, 2014
f39db4c
Merge pull request #135 from mgymrek/master
jamescasbon Feb 6, 2014
a5318ba
Merge pull request #134 from bow/patch_equalities
jamescasbon Feb 6, 2014
7c27103
version 0.6.5
Feb 6, 2014
616f310
Fix for issue #140, add vcf_record_sort_key arg
datagram Feb 6, 2014
2de70ce
Fixed spacing and wrapping in utils.py, removed test for old walk_tog…
Feb 6, 2014
d7563dc
Fixed edge case where all inputs are empty, simplified logic
Feb 6, 2014
ce4d20f
finished fixing edge case where 'other' is None
Feb 7, 2014
d51db23
Test data for testing the fix for issue #140
Feb 7, 2014
28dfe37
Added tests for walk_together with more complex inputs
Feb 7, 2014
734daf4
bump version
Feb 10, 2014
d1a9fdc
fix missing .pyx
Feb 21, 2014
ba00d83
Merge branch 'master' of https://github.com/jamescasbon/PyVCF into lenna
Feb 22, 2014
cbe8d90
Restore subprocess import to test
Feb 22, 2014
45513dd
Merge pull request #66 from lennax/lenna
jamescasbon Feb 23, 2014
097f2d0
making alternate allele frequency work in the case of non-diploid all…
Mar 6, 2014
9a51b24
fixing small typo in elif in test case for aaf
Mar 6, 2014
608078a
adding one more test case for non-diploids
Mar 6, 2014
4952f63
updating ploidy vcf example file
Mar 6, 2014
eeb892c
Marks skipped tests as skipped, not passed.
gotgenes May 12, 2014
f3d6a35
Skips fragile tests broken for Python 3.
gotgenes May 13, 2014
0e757e1
Skips broken test for PyPy.
gotgenes May 13, 2014
49be99b
Decorate the TestTabix case rather than its tests.
gotgenes May 13, 2014
1842c48
Merge pull request #155 from gotgenes/feature-indicate_skipped_tests
martijnvermaat May 14, 2014
4fba62c
Reader.fetch uses zero-based, half-open coordinates.
gotgenes May 14, 2014
66256cc
Merge pull request #156 from gotgenes/feature-zbho_tabix_coordinates
martijnvermaat May 14, 2014
2e4498d
Fixes fetch documentation in package docstring.
gotgenes May 14, 2014
201f672
Merge pull request #157 from gotgenes/bugfix-update_fetch_documentation
martijnvermaat May 14, 2014
2d522b5
Removes setup import from distutils that overrides setuptools setup.
gotgenes May 14, 2014
2451c16
Tidies up Python 2.6 dependencies
gotgenes May 15, 2014
6066590
Updates PyPI trove classifiers.
gotgenes May 15, 2014
5f55a59
Merge pull request #158 from gotgenes/bugfix-dont_override_setuptools…
martijnvermaat May 16, 2014
2ef6a4f
Use requirements files to consolidate dependencies.
gotgenes May 16, 2014
dca065c
Merge pull request #159 from gotgenes/feature-use_requirements_files
martijnvermaat May 16, 2014
47acb56
Adds _Record.affected_start and .affected_end.
gotgenes May 19, 2014
13c7f94
Merge pull request #161 from gotgenes/feature-affected_coordinates
martijnvermaat Jun 8, 2014
82413a1
Merge pull request #148 from mgymrek/master
martijnvermaat Jun 8, 2014
2f0d577
Allow flag INFO field to be declared as string
martijnvermaat Jun 25, 2014
d927381
Don't crash when FORMAT is set to the missing value (.)
martijnvermaat Jun 25, 2014
fd390b1
Merge pull request #165 from martijnvermaat/string_as_flag
martijnvermaat Jun 25, 2014
bf37d96
Merge pull request #166 from martijnvermaat/format-none
martijnvermaat Jun 25, 2014
e7d350b
Don't crash on metadata lines without value
martijnvermaat Jul 6, 2014
c8f3f8d
Temporarily fix pysam on 0.7.8 (0.8.0 fails on Python 3)
martijnvermaat Sep 9, 2014
eafd842
Partial support for VCFv4.2
martijnvermaat Sep 9, 2014
6b866b4
Merge pull request #174 from jamescasbon/vcf-4.2
martijnvermaat Sep 9, 2014
f6e955f
Bugfix: SNP records with N as ALT now noted as SNPs.
gotgenes Sep 9, 2014
0a993e1
Run tests for Python 3.4.
gotgenes Sep 9, 2014
c4b0c8c
Merge pull request #177 from gotgenes/bugfix-snps_with_n
martijnvermaat Sep 9, 2014
8f01434
Merge pull request #178 from gotgenes/feature-test_python_3.4
martijnvermaat Sep 9, 2014
e8a05d9
Add Python 3.4 trove classifier
martijnvermaat Sep 13, 2014
82d8288
Add test cases for uncalled genotypes support
amwenger Sep 17, 2014
6f7b3d9
Close file handles in TestUncalledGenotypes tests
amwenger Sep 17, 2014
14e4837
Add support for uncalled genotypes
amwenger Sep 17, 2014
80a638c
Simplify _format_sample logic
amwenger Sep 17, 2014
123c6da
Merge pull request #179 from amwenger/amw-uncalled-genotypes-support
martijnvermaat Sep 19, 2014
28725da
Tolerate equals sign in INFO field value
martijnvermaat Oct 10, 2014
2fceb0c
fix double quoting issue when writing VCFs
davecap Oct 24, 2014
655b16a
Merge pull request #186 from davecap/fix-double-quoting-writer
martijnvermaat Oct 27, 2014
35ebae1
Blacklist pysam 0.8.0 in unit tests (fails on Python 3)
martijnvermaat Nov 10, 2014
2c8d94f
Support ##contig headers with only ID attributes. Generated by bcftoo…
chapmanb Feb 16, 2015
3184ce7
Merge pull request #190 from chapmanb/master
martijnvermaat Feb 16, 2015
5864f83
Allow for whitespace after commas in metadata lines
martijnvermaat Mar 14, 2015
9345ca9
Merge pull request #195 from martijnvermaat/metadata-whitespace
martijnvermaat Mar 14, 2015
ec193b1
Enable compression to be disabled for .gz filenames
cariaso Apr 19, 2015
13 changes: 13 additions & 0 deletions .gitignore
@@ -0,0 +1,13 @@
PyVCF.egg-info
build
dist
*.pyc
docs/_build
.ropeproject
1kg.prof
.noseids
.tox
.DS_Store
vcf/cparse.c
vcf/cparse.so
.coverage
13 changes: 13 additions & 0 deletions .travis.yml
@@ -0,0 +1,13 @@
# Validate this file using http://lint.travis-ci.org/
language: python
python:
- "2.6"
- "2.7"
- "3.2"
- "3.3"
- "3.4"
- "pypy"
install:
- "if [[ $TRAVIS_PYTHON_VERSION == '2.6' ]]; then pip install -r requirements/python2.6-requirements.txt; elif [[ $TRAVIS_PYTHON_VERSION == 'pypy' ]]; then pip install -r requirements/pypy-requirements.txt; else pip install -r requirements/common-requirements.txt; fi"
- python setup.py install
script: python setup.py test
28 changes: 28 additions & 0 deletions LICENSE
@@ -1,3 +1,31 @@
Copyright (c) 2011-2012, Population Genetics Technologies Ltd, All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.

3. Neither the name of the Population Genetics Technologies Ltd nor the names of
its contributors may be used to endorse or promote products derived from this
software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Copyright (c) 2011 John Dougherty

Permission is hereby granted, free of charge, to any person obtaining a copy of
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
recursive-include vcf *.pyx
140 changes: 112 additions & 28 deletions README.rst
@@ -1,4 +1,6 @@
A VCFv4.0 parser for Python.
A VCFv4.0 and 4.1 parser for Python.

An online version of the PyVCF documentation is available at http://pyvcf.rtfd.org/

The intent of this module is to mimic the ``csv`` module in the Python stdlib,
as opposed to more flexible serialization formats like JSON or YAML. ``vcf``
@@ -8,22 +10,22 @@ specified in the meta-information lines -- specifically the ##INFO and
against the reserved types mentioned in the spec. Failing that, it will just
return strings.

There is currently one piece of interface: ``VCFReader``. It takes a file-like
The main interface is the ``Reader`` class. It takes a file-like
object and acts as a reader::

>>> import vcf
>>> vcf_reader = vcf.VCFReader(open('example.vcf', 'rb'))
>>> vcf_reader = vcf.Reader(open('vcf/test/example-4.0.vcf', 'r'))
>>> for record in vcf_reader:
... print record
Record(CHROM='20', POS=14370, ID='rs6054257', REF='G', ALT=['A'], QUAL=29,
FILTER='PASS', INFO={'H2': True, 'NS': 3, 'DB': True, 'DP': 14, 'AF': [0.5]
}, FORMAT='GT:GQ:DP:HQ', samples=[{'GT': '0', 'HQ': [58, 50], 'DP': 3, 'GQ'
: 49, 'name': 'NA00001'}, {'GT': '0', 'HQ': [65, 3], 'DP': 5, 'GQ': 3, 'nam
e' : 'NA00002'}, {'GT': '0', 'DP': 3, 'GQ': 41, 'name': 'NA00003'}])
Record(CHROM=20, POS=14370, REF=G, ALT=[A])
Record(CHROM=20, POS=17330, REF=T, ALT=[A])
Record(CHROM=20, POS=1110696, REF=A, ALT=[G, T])
Record(CHROM=20, POS=1230237, REF=T, ALT=[None])
Record(CHROM=20, POS=1234567, REF=GTCT, ALT=[G, GTACT])


This produces a great deal of information, but it is conveniently accessed.
The attributes of a Record are the 8 fixed fields from the VCF spec plus two
more. That is:
The attributes of a Record are the 8 fixed fields from the VCF spec::

* ``Record.CHROM``
* ``Record.POS``
@@ -34,55 +36,137 @@ more. That is:
* ``Record.FILTER``
* ``Record.INFO``

plus two more attributes to handle genotype information:
plus attributes to handle genotype information:

* ``Record.FORMAT``
* ``Record.samples``
* ``Record.genotype``

``samples``, not being the title of any column, is left lowercase. The format
``samples`` and ``genotype``, not being the title of any column, are left lowercase. The format
of the fixed fields is from the spec. Comma-separated lists in the VCF are
converted to lists. In particular, one-entry VCF lists are converted to
one-entry Python lists (see, e.g., ``Record.ALT``). Semicolon-delimited lists
of key=value pairs are converted to Python dictionaries, with flags being given
a ``True`` value. Integers and floats are handled exactly as you'd expect::

>>> vcf_reader = vcf.Reader(open('vcf/test/example-4.0.vcf', 'r'))
>>> record = vcf_reader.next()
>>> print record.POS
17330
14370
>>> print record.ALT
['A']
[A]
>>> print record.INFO['AF']
[0.017]
[0.5]
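
The conversion rules described above (comma-separated values become lists, semicolon-delimited ``key=value`` pairs become a dictionary, bare keys become ``True`` flags) can be sketched in plain Python. This is only an illustration: ``parse_info`` and its ``list_keys`` argument are hypothetical names, and PyVCF's real parser decides types from the ``##INFO`` header lines rather than by guessing.

```python
def parse_info(info_field, list_keys=("AF",)):
    """Convert a raw INFO column into a dict (illustrative, not PyVCF's parser)."""

    def convert(text):
        # Try int first, then float, and fall back to the raw string.
        for cast in (int, float):
            try:
                return cast(text)
            except ValueError:
                pass
        return text

    info = {}
    if info_field == ".":
        return info  # '.' is the VCF missing value
    for entry in info_field.split(";"):
        if not entry:
            continue  # tolerate a stray trailing ';'
        if "=" not in entry:
            info[entry] = True  # a bare key is a flag
            continue
        key, raw = entry.split("=", 1)  # split on the first '=' only
        values = [convert(v) for v in raw.split(",")]
        # Assumption: keys named in list_keys stay lists even with one entry.
        if key in list_keys or len(values) > 1:
            info[key] = values
        else:
            info[key] = values[0]
    return info


print(parse_info("NS=3;DP=14;AF=0.5;DB;H2"))
```

Keeping ``AF`` as a one-entry list mirrors the ``[0.5]`` shown in the README example, since the field may hold one value per alternate allele.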

There are a number of convenience methods and properties for each ``Record`` allowing you to
examine properties of interest::

>>> print record.num_called, record.call_rate, record.num_unknown
3 1.0 0
>>> print record.num_hom_ref, record.num_het, record.num_hom_alt
1 1 1
>>> print record.nucl_diversity, record.aaf, record.heterozygosity
0.6 [0.5] 0.5
>>> print record.get_hets()
[Call(sample=NA00002, CallData(GT=1|0, GQ=48, DP=8, HQ=[51, 51]))]
>>> print record.is_snp, record.is_indel, record.is_transition, record.is_deletion
True False True False
>>> print record.var_type, record.var_subtype
snp ts
>>> print record.is_monomorphic
False
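
To make the numbers in this example easier to follow, here is a rough sketch of how such site statistics could be derived from the three genotype strings of the first record (``0|0``, ``1|0``, ``1/1``). The function name and the exact formulas are assumptions chosen to reproduce the output shown above for a biallelic site; they are not taken from PyVCF's source.

```python
import re
from collections import Counter


def site_stats(genotypes):
    """Summarise GT strings for one site (illustrative, not PyVCF's code)."""
    alleles = []
    num_called = 0
    for gt in genotypes:
        indices = re.split(r"[/|]", gt)
        if "." in indices:
            continue  # treat partially or fully missing calls as uncalled
        num_called += 1
        alleles.extend(int(i) for i in indices)

    n = len(alleles)  # number of called chromosomes
    freqs = dict((a, c / float(n)) for a, c in Counter(alleles).items())

    # Expected heterozygosity: one minus the sum of squared allele frequencies.
    heterozygosity = 1.0 - sum(f * f for f in freqs.values()) if n else None
    # Assumed small-sample correction n / (n - 1); equals 2pq * n/(n-1) when biallelic.
    nucl_diversity = n / (n - 1.0) * heterozygosity if n > 1 else None

    return {
        "num_called": num_called,
        "call_rate": num_called / float(len(genotypes)) if genotypes else None,
        "aaf": [freqs[a] for a in sorted(freqs) if a != 0],
        "heterozygosity": heterozygosity,
        "nucl_diversity": nucl_diversity,
    }


stats = site_stats(["0|0", "1|0", "1/1"])
print(stats["num_called"], stats["call_rate"], stats["aaf"])
```

With three of six called chromosomes carrying the alternate allele, the alternate allele frequency is 0.5, the heterozygosity is 0.5, and the corrected diversity is 6/5 × 0.5 = 0.6.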

``record.FORMAT`` will be a string specifying the format of the genotype
fields. In case the FORMAT column does not exist, ``record.FORMAT`` is
``None``. Finally, ``record.samples`` is a list of dictionaries containing the
parsed sample column::
parsed sample column and ``record.genotype`` is a way of looking up genotypes
by sample name::

>>> record = vcf_reader.next()
>>> for sample in record.samples:
... print sample['GT']
'1|2'
'2|1'
'2/2'
0|0
0|1
0/0
>>> print record.genotype('NA00001')['GT']
0|0

The genotypes are represented by ``Call`` objects, which have three attributes: the
corresponding Record ``site``, the sample name in ``sample`` and a dictionary of
call data in ``data``::

>>> call = record.genotype('NA00001')
>>> print call.site
Record(CHROM=20, POS=17330, REF=T, ALT=[A])
>>> print call.sample
NA00001
>>> print call.data
CallData(GT=0|0, GQ=49, DP=3, HQ=[58, 50])

Please note that as of release 0.4.0, attributes known to have single values (such as
``DP`` and ``GQ`` above) are returned as values. Other attributes are returned
as lists (such as ``HQ`` above).

There are also a number of methods::

>>> print call.called, call.gt_type, call.gt_bases, call.phased
True 0 T|T True
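
The four values printed above can all be read off the ``GT`` string together with the record's REF and ALT alleles. The following is a minimal sketch under that assumption; ``describe_genotype`` is a hypothetical helper, and the ``gt_type`` coding (0 for homozygous reference, 1 for heterozygous, 2 for homozygous alternate) is inferred from the example rather than quoted from the library.

```python
import re


def describe_genotype(gt, ref, alts):
    """Derive called / gt_type / gt_bases / phased from a GT string (sketch only)."""
    phased = "|" in gt
    indices = re.split(r"[/|]", gt)
    if "." in indices:
        # Uncalled genotype: no type or bases can be reported.
        return {"called": False, "gt_type": None, "gt_bases": None, "phased": phased}

    alleles = [ref] + list(alts)  # index 0 is REF, 1.. are the ALT alleles
    idx = [int(i) for i in indices]
    if len(set(idx)) > 1:
        gt_type = 1  # heterozygous
    elif idx[0] == 0:
        gt_type = 0  # homozygous reference
    else:
        gt_type = 2  # homozygous alternate

    separator = "|" if phased else "/"
    gt_bases = separator.join(alleles[i] for i in idx)
    return {"called": True, "gt_type": gt_type, "gt_bases": gt_bases, "phased": phased}


print(describe_genotype("0|0", "T", ["A"]))
```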

Metadata regarding the VCF file itself can be investigated through the
following attributes:

* ``VCFReader.metadata``
* ``VCFReader.infos``
* ``VCFReader.filters``
* ``VCFReader.formats``
* ``VCFReader.samples``
* ``Reader.metadata``
* ``Reader.infos``
* ``Reader.filters``
* ``Reader.formats``
* ``Reader.samples``

For example::

>>> vcf_reader.metadata['fileDate']
20090805
'20090805'
>>> vcf_reader.samples
['NA00001', 'NA00002', 'NA00003']
>>> vcf_reader.filters
{'q10': Filter(id='q10', desc='Quality below 10'),
's50': Filter(id='s50', desc='Less than 50% of samples have data')}
OrderedDict([('q10', Filter(id='q10', desc='Quality below 10')), ('s50', Filter(id='s50', desc='Less than 50% of samples have data'))])
>>> vcf_reader.infos['AA'].desc
Ancestral Allele
'Ancestral Allele'

ALT records are actually classes, so that you can interrogate them::

>>> reader = vcf.Reader(open('vcf/test/example-4.1-bnd.vcf'))
>>> _ = reader.next(); row = reader.next()
>>> print row
Record(CHROM=1, POS=2, REF=T, ALT=[T[2:3[])
>>> bnd = row.ALT[0]
>>> print bnd.withinMainAssembly, bnd.orientation, bnd.remoteOrientation, bnd.connectingSequence
True False True T

Random access is supported for files with tabix indexes. Simply call fetch for the
region you are interested in::

>>> vcf_reader = vcf.Reader(filename='vcf/test/tb.vcf.gz')
>>> for record in vcf_reader.fetch('20', 1110696, 1230237): # doctest: +SKIP
... print record
Record(CHROM=20, POS=1110696, REF=A, ALT=[G, T])
Record(CHROM=20, POS=1230237, REF=T, ALT=[None])

Or extract a single row::

>>> print vcf_reader.fetch('20', 1110696) # doctest: +SKIP
Record(CHROM=20, POS=1110696, REF=A, ALT=[G, T])
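
One of the commits in this pull request is titled "Reader.fetch uses zero-based, half-open coordinates", whereas ``POS`` values in a VCF file are 1-based. Assuming a caller starts from a 1-based, inclusive range, the conversion could look like the sketch below. The helper name is hypothetical and not part of PyVCF; check the ``fetch`` docstring of the installed version before relying on either convention.

```python
def to_zero_based_half_open(start, end=None):
    """Convert a 1-based inclusive range to 0-based half-open (sketch only)."""
    if end is None:
        end = start  # a single position is a range of length one
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts: the inclusive 1-based end equals the exclusive 0-based end.
    return start - 1, end


print(to_zero_based_half_open(1110696, 1230237))  # (1110695, 1230237)
print(to_zero_based_half_open(1110696))           # (1110695, 1110696)
```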


The ``Writer`` class provides a way of writing a VCF file. Currently, you must specify a
template ``Reader`` which provides the metadata::

>>> vcf_reader = vcf.Reader(filename='vcf/test/tb.vcf.gz')
>>> vcf_writer = vcf.Writer(open('/dev/null', 'w'), vcf_reader)
>>> for record in vcf_reader:
... vcf_writer.write_record(record)
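
Several commits in this pull request concern how the FILTER column is written ("record.FILTER is now always a list", "Differentiate between no filtering and PASS", and the fix for multiple filters). A small sketch of the resulting convention, assuming ``None`` means that no filtering was applied and an empty list means the record passed; the function name is illustrative and not the Writer's actual method.

```python
def format_filter(filters):
    """Render a FILTER value for a VCF line (illustrative sketch)."""
    if filters is None:
        return "."  # filters were not applied to this record
    if len(filters) == 0:
        return "PASS"  # filters were applied and none failed
    return ";".join(filters)  # failed filters are semicolon-separated


for value in (None, [], ["q10"], ["q10", "s50"]):
    print(format_filter(value))
```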


An extensible script, ``vcf_filter.py``, is available to filter VCF files. VCF filters
declared by other packages will be available for use in this script. Please
see :doc:`FILTERS` for a full description.

56 changes: 56 additions & 0 deletions docs/API.rst
@@ -0,0 +1,56 @@
API
===

vcf.Reader
----------

.. autoclass:: vcf.Reader
   :members:

vcf.Writer
----------

.. autoclass:: vcf.Writer
   :members:

vcf.model._Record
-----------------

.. autoclass:: vcf.model._Record
   :members:

vcf.model._Call
---------------

.. autoclass:: vcf.model._Call
   :members:

vcf.model._AltRecord
--------------------

.. autoclass:: vcf.model._AltRecord
   :members:

vcf.model._Substitution
-----------------------

.. autoclass:: vcf.model._Substitution
   :members:

vcf.model._SV
-------------

.. autoclass:: vcf.model._SV
   :members:

vcf.model._SingleBreakend
-------------------------

.. autoclass:: vcf.model._SingleBreakend
   :members:

vcf.model._Breakend
-------------------

.. autoclass:: vcf.model._Breakend
   :members: