Merge pull request #67 from sbslee/0.17.0-dev
0.17.0 dev
sbslee authored Jul 12, 2022
2 parents 74d5c14 + 05a8998 commit 79ec441
Showing 10 changed files with 259 additions and 34 deletions.
10 changes: 9 additions & 1 deletion CHANGELOG.rst
@@ -1,6 +1,15 @@
Changelog
*********

0.17.0 (2022-07-12)
-------------------

* :issue:`63`: Fix bug in :meth:`api.utils.estimate_phase_beagle` when there is only one variant in input VCF and Beagle throws an error.
* Update :command:`compare-genotypes` command to print all discordant calls when ``--verbose`` is used.
* Update :command:`compute-copy-number` command to ensure that the samples in CovFrame[ReadDepth] and SampleTable[Statistics] are in the same order.
* :issue:`64`: Update :meth:`api.utils.import_variants` method to 'diploidize' the input VCF when the target gene is G6PD. This is because some variant callers output haploid genotypes for males for the X chromosome, interfering with downstream analyses.
* Remove unnecessary optional argument ``assembly`` from :meth:`api.core.get_ref_allele`.
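
As a rough illustration of what 'diploidizing' means for :issue:`64`, the sketch below converts haploid genotype calls into diploid ones. It is not the code used by :meth:`api.utils.import_variants`: the function names are hypothetical, and writing a haploid call ``1`` as homozygous ``1/1`` is an assumption made for this example.

.. code-block:: python

    def diploidize_gt(gt: str) -> str:
        """Return a diploid GT string for a haploid call (hypothetical helper)."""
        if "/" in gt or "|" in gt:
            # Already has two or more alleles; leave it unchanged.
            return gt
        # Assumption: a haploid call is written as homozygous,
        # e.g. '1' becomes '1/1' and a missing call '.' becomes './.'.
        return f"{gt}/{gt}"

    def diploidize_sample_field(field: str) -> str:
        """Diploidize the GT subfield of one VCF sample column.

        Assumes GT is the first colon-separated subfield, as the VCF
        specification requires whenever GT is present.
        """
        parts = field.split(":")
        parts[0] = diploidize_gt(parts[0])
        return ":".join(parts)

    print(diploidize_sample_field("1:35:99"))

Calls that are already diploid, such as ``0|1``, pass through untouched, so the same helper could be applied to male and female samples alike.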

0.16.0 (2022-06-08)
-------------------

@@ -71,7 +80,6 @@ Changelog
* Deprecate :meth:`sdk.utils.parse_input_bams` method.
* Update :meth:`api.utils.predict_alleles` method to match ``0.31.0`` version of ``fuc`` package.
* Fix bug in :command:`filter-samples` command when ``--exclude`` argument is used for archive files with SampleTable type.
* Remove unnecessary optional argument ``assembly`` from :meth:`api.core.get_ref_allele`.
* Improve CNV caller for CYP2A6, CYP2B6, CYP2D6, CYP2E1, CYP4F2, GSTM1, SLC22A2, SULT1A1, UGT1A4, UGT2B15, and UGT2B17.
* Add a new CNV call for CYP2D6: ``PseudogeneDeletion``.
* In CYP2E1 CNV nomenclature, ``PartialDuplication`` has been renamed to ``PartialDuplicationHet`` and a new CNV call ``PartialDuplicationHom`` has been added. Furthermore, the calling algorithm for the CYP2E1\*S1 allele has been updated. When partial duplication is present, the algorithm now requires only \*7 to call \*S1 instead of both \*7 and \*4.
20 changes: 17 additions & 3 deletions README.rst
@@ -177,6 +177,15 @@ you can access a development branch with the ``git checkout`` command. When
you do this, please make sure your environment already has all the
dependencies installed.

.. note::
    `Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>`__
    is one of the default software tools used by PyPGx for haplotype phasing
    of SNVs and indels. The program is freely available and published under
    the `GNU General Public License <https://faculty.washington.edu/browning/
    beagle/gpl_license>`__. Users do not need to download Beagle separately
    because a copy of the software (``beagle.28Jun21.220.jar``) is already
    included in PyPGx.

.. warning::
    You're not done yet! Keep scrolling down to obtain the resource bundle
    for PyPGx, which is essential for running the package.
@@ -238,13 +247,13 @@ visually inspect SV calls. Below are CYP2D6 examples:
* - Normal
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-1.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-8.png
* - DeletionHet
* - WholeDel1
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-2.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-1.png
* - DeletionHom
* - WholeDel1Hom
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-3.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-6.png
* - Duplication
* - WholeDup1
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-4.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-2.png
* - Tandem3
@@ -254,6 +263,11 @@ visually inspect SV calls. Below are CYP2D6 examples:
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-10.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-7.png

PyPGx was recently applied to the entire high-coverage WGS dataset from 1KGP
(N=2,504). Click `here <https://github.com/sbslee/1kgp-pgx-paper/tree/main/
sv-tables>`__ to see individual SV calls, and corresponding copy number
profiles and allele fraction profiles.

GRCh37 vs. GRCh38
=================

20 changes: 17 additions & 3 deletions docs/create.py
@@ -204,6 +204,15 @@
you do this, please make sure your environment already has all the
dependencies installed.
.. note::
    `Beagle <https://faculty.washington.edu/browning/beagle/beagle.html>`__
    is one of the default software tools used by PyPGx for haplotype phasing
    of SNVs and indels. The program is freely available and published under
    the `GNU General Public License <https://faculty.washington.edu/browning/
    beagle/gpl_license>`__. Users do not need to download Beagle separately
    because a copy of the software (``beagle.28Jun21.220.jar``) is already
    included in PyPGx.
.. warning::
    You're not done yet! Keep scrolling down to obtain the resource bundle
    for PyPGx, which is essential for running the package.
@@ -265,13 +274,13 @@
* - Normal
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-1.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-8.png
* - DeletionHet
* - WholeDel1
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-2.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-1.png
* - DeletionHom
* - WholeDel1Hom
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-3.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-6.png
* - Duplication
* - WholeDup1
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-4.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-2.png
* - Tandem3
@@ -281,6 +290,11 @@
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/gene-model-CYP2D6-10.png
- .. image:: https://raw.githubusercontent.com/sbslee/pypgx-data/main/dpsv/GRCh37-CYP2D6-7.png
PyPGx was recently applied to the entire high-coverage WGS dataset from 1KGP
(N=2,504). Click `here <https://github.com/sbslee/1kgp-pgx-paper/tree/main/
sv-tables>`__ to see individual SV calls, and corresponding copy number
profiles and allele fraction profiles.
GRCh37 vs. GRCh38
=================
46 changes: 46 additions & 0 deletions docs/faq.rst
@@ -77,3 +77,49 @@ consistent with the other variant-level analyses you may also just use the
same VCF for PyPGx. The bottom line is, if you are going to create your own
input VCF, then you need to know what you are doing. Otherwise, it's probably
safer to use :command:`create-input-vcf`.

``chr22_KI270879v1_alt`` in GRCh38
==================================

Users may encounter an error like the one below when working with GRCh38 data:

.. code-block:: text

    $ pypgx prepare-depth-of-coverage \
    depth-of-coverage.zip \
    in.bam \
    --assembly GRCh38
    Traceback (most recent call last):
      File "/Users/sbslee/opt/anaconda3/envs/fuc/bin/pypgx", line 33, in <module>
        sys.exit(load_entry_point('pypgx', 'console_scripts', 'pypgx')())
      File "/Users/sbslee/Desktop/pypgx/pypgx/__main__.py", line 33, in main
        commands[args.command].main(args)
      File "/Users/sbslee/Desktop/pypgx/pypgx/cli/prepare_depth_of_coverage.py", line 90, in main
        archive = utils.prepare_depth_of_coverage(
      File "/Users/sbslee/Desktop/pypgx/pypgx/api/utils.py", line 1247, in prepare_depth_of_coverage
        cf = pycov.CovFrame.from_bam(bams, regions=regions, zero=True)
      File "/Users/sbslee/Desktop/fuc/fuc/api/pycov.py", line 345, in from_bam
        results += pysam.depth(*(bams + args + ['-r', region]))
      File "/Users/sbslee/opt/anaconda3/envs/fuc/lib/python3.9/site-packages/pysam/utils.py", line 69, in __call__
        raise SamtoolsError(
    pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools depth: cannot parse region "chr22_KI270879v1_alt:267307-281486"\n'

This is a GRCh38-specific issue. GSTT1, one of the genes with SV, is located
in the contig ``chr22_KI270879v1_alt``, which is missing from the input BAM
file. That's why the :command:`prepare-depth-of-coverage` command fails. To
solve this issue, you can either re-align the sequence reads to a FASTA
reference genome that includes the contig, or work around it by excluding
GSTT1 from your analysis:

.. code-block:: text

    $ pypgx prepare-depth-of-coverage \
    depth-of-coverage.zip \
    in.bam \
    --assembly GRCh38 \
    --genes GSTT1 \
    --exclude

For more details, please see the following articles:
:ref:`readme:GRCh37 vs. GRCh38` and :ref:`genes:GRCh38 data for GSTT1`.

Related GitHub issues: :issue:`65`.
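
Before running the command, it can help to check whether the contig is listed in the BAM header at all. The sketch below is only an illustration and is not part of PyPGx: the function names are hypothetical, and it assumes the header text has already been obtained, for example from ``samtools view -H in.bam``.

.. code-block:: python

    def contigs_in_header(header_text: str) -> set:
        """Collect reference sequence names from the @SQ lines of a SAM header."""
        names = set()
        for line in header_text.splitlines():
            if not line.startswith("@SQ"):
                continue
            # @SQ fields are tab-separated; SN holds the contig name.
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
        return names

    def has_contig(header_text: str, contig: str = "chr22_KI270879v1_alt") -> bool:
        """Return True if the given contig appears in the header (hypothetical helper)."""
        return contig in contigs_in_header(header_text)

    # Toy header for demonstration; a real one would come from the BAM file.
    example = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr22\tLN:50818468\n"
    print(has_contig(example))

If the check returns ``False``, the BAM file was aligned without the contig, and either of the two remedies described above applies.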