Merge pull request #58 from sbslee/0.15.0-dev
0.15.0 dev
sbslee authored May 3, 2022
2 parents bbac2c7 + 83ef1fa commit 0c8c33a
Showing 20 changed files with 608 additions and 62 deletions.
15 changes: 15 additions & 0 deletions CHANGELOG.rst
@@ -1,6 +1,21 @@
Changelog
*********

0.15.0 (2022-05-03)
-------------------

* Add new optional arguments ``--genes`` and ``--exclude`` to :command:`prepare-depth-of-coverage` command.
* Add new command :command:`slice-bam`.
* Add new command :command:`print-data`.
* Fix typo "statistcs" to "statistics" throughout the package.
* Update :meth:`sdk.utils.simulate_copy_number` method to automatically handle duplicate sample names.
* Improve CNV caller for CYP2A6, CYP2B6, CYP2D6, CYP2E1, GSTM1, SLC22A2, SULT1A1, UGT1A4, UGT2B15, UGT2B17.
* Add new CNV calls for CYP2A6: ``Deletion2Hom``, ``Hybrid5``, ``Hybrid6``, ``PseudogeneDeletion``.
* Add new CNV call for CYP2D6: ``Tandem2F``.
* Add new CNV call for GSTM1: ``Normal,Deletion2``.
* Add new CNV call for SULT1A1: ``Unknown1``.
* Add new CNV call for UGT2B17: ``Deletion,PartialDeletion3``.

0.14.0 (2022-04-03)
-------------------

25 changes: 21 additions & 4 deletions README.rst
@@ -357,7 +357,7 @@ currently defined semantic types:
- ``SampleTable[Results]``
* TSV file for storing various results for each sample.
* Requires following metadata: ``Gene``, ``Assembly``, ``SemanticType``.
- ``SampleTable[Statistcs]``
- ``SampleTable[Statistics]``
* TSV file for storing control gene's various statistics on read depth for each sample. Used for converting target gene's read depth to copy number.
* Requires following metadata: ``Control``, ``Assembly``, ``SemanticType``, ``Platform``.
- ``VcfFrame[Consolidated]``
@@ -370,11 +370,12 @@ currently defined semantic types:
* VcfFrame for storing target gene's phased variant data.
* Requires following metadata: ``Platform``, ``Gene``, ``Assembly``, ``SemanticType``, ``Program``.

Wroking with archive files
Working with archive files
--------------------------

To demonstrate how easy it is to work with PyPGx archive files, below we will
show some examples. First, download an archive:
show some examples. First, download an archive to play with, which has
``SampleTable[Results]`` as semantic type:

.. code-block:: text
@@ -389,6 +390,14 @@ Let's print its metadata:
Assembly=GRCh37
SemanticType=SampleTable[Results]
Now print its main data (but display first sample only):

.. code-block:: text
$ pypgx print-data grch37-CYP2D6-results.zip | head -n 2
Genotype Phenotype Haplotype1 Haplotype2 AlternativePhase VariantData CNV
HG00276_PyPGx *4/*5 Poor Metabolizer *4;*10;*74;*2; *10;*74;*2; ; *4:22-42524947-C-T:0.913;*10:22-42526694-G-A,22-42523943-A-G:1.0,1.0;*74:22-42525821-G-T:1.0;*2:default; DeletionHet
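The ``VariantData`` column in the row above packs several values into one string. As a rough sketch, and assuming (from this single example row, not from a documented specification) that entries are separated by ``;``, that each entry reads ``allele:variants:fractions``, and that ``default`` marks an allele with no listed variants, it could be unpacked as follows. The function name ``parse_variant_data`` is hypothetical and is not part of PyPGx.

```python
def parse_variant_data(field):
    """Split a VariantData string into {allele: [(variant, fraction), ...]}.

    Hypothetical helper: the field layout is inferred from the example row,
    not taken from PyPGx itself.
    """
    parsed = {}
    for entry in field.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # the trailing ';' leaves an empty chunk
        parts = entry.split(":")
        allele = parts[0]
        if len(parts) == 2:
            # e.g. '*2:default' carries no per-variant fractions
            parsed[allele] = [(parts[1], None)]
            continue
        variants = parts[1].split(",")
        fractions = [float(x) for x in parts[2].split(",")]
        parsed[allele] = list(zip(variants, fractions))
    return parsed


example = (
    "*4:22-42524947-C-T:0.913;"
    "*10:22-42526694-G-A,22-42523943-A-G:1.0,1.0;"
    "*74:22-42525821-G-T:1.0;"
    "*2:default;"
)
result = parse_variant_data(example)
for allele, calls in result.items():
    print(allele, calls)
```

Each allele maps to a list of ``(variant, fraction)`` pairs, so alleles defined by more than one variant (``*10`` here) keep their variants and numeric values aligned.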
We can unzip it to extract files inside (note that ``tmpcty4c_cr`` is the
original folder name):

@@ -500,7 +509,7 @@ input data is from whole genome sequencing (WGS) or targeted sequencing
This pipeline supports SV detection based on copy number analysis for genes
that are known to have SV. Therefore, if the target gene is associated with
SV (e.g. CYP2D6) it's strongly recommended to provide a
``CovFrame[DepthOfCoverage]`` file and a ``SampleTable[Statistcs]`` file in
``CovFrame[DepthOfCoverage]`` file and a ``SampleTable[Statistics]`` file in
addtion to a VCF file containing SNVs/indels. If the target gene is not
associated with SV (e.g. CYP3A5) providing a VCF file alone is enough. You can
visit the `Genes <https://pypgx.readthedocs.io/en/latest/genes.html>`__ page
@@ -515,6 +524,9 @@ HaplotypeCaller). See the `Variant caller choice <https://pypgx.readthedocs.
io/en/latest/faq.html#variant-caller-choice>`__ section for detailed
discussion on when to use either option.

Check out the `GeT-RM WGS tutorial <https://pypgx.readthedocs.io/en/latest/
tutorials.html#get-rm-wgs-tutorial>`__ to see this pipeline in action.

Chip pipeline
-------------

@@ -534,6 +546,9 @@ The pipeline currently does not support SV detection. Please post a GitHub
issue if you want to contribute your development skills and/or data for
devising an SV detection algorithm.

Check out the `Coriell Affy tutorial <https://pypgx.readthedocs.io/en/latest/
tutorials.html#coriell-affy-tutorial>`__ to see this pipeline in action.

Long-read pipeline
------------------

@@ -664,11 +679,13 @@ For getting help on the CLI:
prepare-depth-of-coverage
Prepare a depth of coverage file for all target
genes with SV from BAM files.
print-data Print the main data of specified archive.
print-metadata Print the metadata of specified archive.
run-chip-pipeline Run genotyping pipeline for chip data.
run-long-read-pipeline
Run genotyping pipeline for long-read sequencing data.
run-ngs-pipeline Run genotyping pipeline for NGS data.
slice-bam Slice BAM file for all genes used by PyPGx.
test-cnv-caller Test CNV caller for target gene.
train-cnv-caller Train CNV caller for target gene.
93 changes: 71 additions & 22 deletions docs/cli.rst
@@ -55,11 +55,13 @@ For getting help on the CLI:
prepare-depth-of-coverage
Prepare a depth of coverage file for all target
genes with SV from BAM files.
print-data Print the main data of specified archive.
print-metadata Print the metadata of specified archive.
run-chip-pipeline Run genotyping pipeline for chip data.
run-long-read-pipeline
Run genotyping pipeline for long-read sequencing data.
run-ngs-pipeline Run genotyping pipeline for NGS data.
slice-bam Slice BAM file for all genes used by PyPGx.
test-cnv-caller Test CNV caller for target gene.
train-cnv-caller Train CNV caller for target gene.
@@ -201,13 +203,13 @@ compute-control-statistics
[Example] For the VDR gene from WGS data:
$ pypgx compute-control-statistics \
VDR \
control-statistcs.zip \
control-statistics.zip \
1.bam 2.bam
[Example] For a custom region from targeted sequencing data:
$ pypgx compute-control-statistics \
chr1:100-200 \
control-statistcs.zip \
control-statistics.zip \
bam.list \
--bed probes.bed
@@ -218,7 +220,7 @@ compute-copy-number
$ pypgx compute-copy-number -h
usage: pypgx compute-copy-number [-h] [--samples-without-sv TEXT [TEXT ...]]
read-depth control-statistcs copy-number
read-depth control-statistics copy-number
Compute copy number from read depth for target gene.
@@ -233,7 +235,7 @@ compute-copy-number
Positional arguments:
read-depth Input archive file with the semantic type
CovFrame[ReadDepth].
control-statistcs Input archive file with the semantic type
control-statistics Input archive file with the semantic type
SampleTable[Statistics].
copy-number Output archive file with the semantic type
CovFrame[CopyNumber].
@@ -703,6 +705,7 @@ prepare-depth-of-coverage
$ pypgx prepare-depth-of-coverage -h
usage: pypgx prepare-depth-of-coverage [-h] [--assembly TEXT] [--bed PATH]
[--genes TEXT [TEXT ...]] [--exclude]
depth-of-coverage bams [bams ...]
Prepare a depth of coverage file for all target genes with SV from BAM files.
@@ -713,22 +716,26 @@ prepare-depth-of-coverage
have star alleles defined only by SNVs/indels (e.g. CYP3A5).
Positional arguments:
depth-of-coverage Output archive file with the semantic type
CovFrame[DepthOfCoverage].
bams One or more input BAM files. Alternatively, you can
provide a text file (.txt, .tsv, .csv, or .list)
containing one BAM file per line.
depth-of-coverage Output archive file with the semantic type
CovFrame[DepthOfCoverage].
bams One or more input BAM files. Alternatively, you can
provide a text file (.txt, .tsv, .csv, or .list)
containing one BAM file per line.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--bed PATH By default, the input data is assumed to be WGS. If
it's targeted sequencing, you must provide a BED file
to indicate probed regions. Note that the 'chr' prefix
in contig names (e.g. 'chr1' vs. '1') will be
automatically added or removed as necessary to match
the input BAM's contig names.
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--bed PATH By default, the input data is assumed to be WGS. If
it's targeted sequencing, you must provide a BED file
to indicate probed regions. Note that the 'chr' prefix
in contig names (e.g. 'chr1' vs. '1') will be
automatically added or removed as necessary to match
the input BAM's contig names.
--genes TEXT [TEXT ...]
List of genes to include.
--exclude Exclude specified genes. Ignored when --genes is not
used.
[Example] From WGS data:
$ pypgx prepare-depth-of-coverage \
@@ -741,6 +748,22 @@ prepare-depth-of-coverage
bam.list \
--bed probes.bed
print-data
==========

.. code-block:: text
$ pypgx print-data -h
usage: pypgx print-data [-h] input
Print the main data of specified archive.
Positional arguments:
input Input archive file.
Optional arguments:
-h, --help Show this help message and exit.
print-metadata
==============

@@ -876,7 +899,7 @@ run-ngs-pipeline
CovFrame[DepthOfCoverage].
--control-statistics PATH
Archive file with the semantic type
SampleTable[Statistcs].
SampleTable[Statistics].
--platform TEXT Genotyping platform (default: 'WGS') (choices: 'WGS',
'Targeted')
--assembly TEXT Reference genome assembly (default: 'GRCh37')
@@ -897,7 +920,7 @@ run-ngs-pipeline
Do not plot copy number profile.
--do-not-plot-allele-fraction
Do not plot allele fraction profile.
--cnv-caller PATH Archive file with the semantic type Model[CNV]. By
--cnv-caller PATH Archive file with the semantic type Model[CNV]. By
default, a pre-trained CNV caller in the ~/pypgx-bundle
directory will be used.
@@ -913,17 +936,43 @@ run-ngs-pipeline
CYP2D6-pipeline \
--variants variants.vcf.gz \
--depth-of-coverage depth-of-coverage.tsv \
--control-statistcs control-statistics-VDR.zip
--control-statistics control-statistics-VDR.zip
[Example] To genotype the CYP2D6 gene from targeted sequencing data:
$ pypgx run-ngs-pipeline \
CYP2D6 \
CYP2D6-pipeline \
--variants variants.vcf.gz \
--depth-of-coverage depth-of-coverage.tsv \
--control-statistcs control-statistics-VDR.zip \
--control-statistics control-statistics-VDR.zip \
--platform Targeted
slice-bam
=========

.. code-block:: text
$ pypgx slice-bam -h
usage: pypgx slice-bam [-h] [--assembly TEXT] [--genes TEXT [TEXT ...]]
[--exclude]
input output
Slice BAM file for all genes used by PyPGx.
Positional arguments:
input Input BAM file. It must be already indexed to allow
random access.
output Output BAM file.
Optional arguments:
-h, --help Show this help message and exit.
--assembly TEXT Reference genome assembly (default: 'GRCh37')
(choices: 'GRCh37', 'GRCh38').
--genes TEXT [TEXT ...]
List of genes to include.
--exclude Exclude specified genes. Ignored when --genes is not
used.
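The help text for ``--genes`` and ``--exclude`` (shared by :command:`slice-bam` and :command:`prepare-depth-of-coverage`) describes a small selection rule: with ``--genes`` alone only the listed genes are kept, with ``--genes`` plus ``--exclude`` the listed genes are dropped, and ``--exclude`` on its own has no effect. The sketch below illustrates that rule only. ``select_genes`` is a hypothetical helper, not PyPGx's implementation, and the gene list is illustrative.

```python
def select_genes(all_genes, genes=None, exclude=False):
    """Return the genes to process under the documented --genes/--exclude rule.

    Hypothetical helper for illustration; not taken from the PyPGx source.
    """
    if not genes:
        # --exclude is ignored when --genes is not used
        return list(all_genes)
    requested = set(genes)
    if exclude:
        # --genes A B --exclude: keep everything except A and B
        return [g for g in all_genes if g not in requested]
    # --genes A B: keep only A and B (in the original order)
    return [g for g in all_genes if g in requested]


# Illustrative subset of gene names mentioned in the changelog above.
supported = ["CYP2A6", "CYP2B6", "CYP2D6", "GSTM1", "UGT2B17"]

print(select_genes(supported))
print(select_genes(supported, genes=["CYP2D6", "GSTM1"]))
print(select_genes(supported, genes=["CYP2D6", "GSTM1"], exclude=True))
print(select_genes(supported, exclude=True))
```

Preserving the order of the full gene list keeps the result deterministic regardless of the order in which genes are passed on the command line.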
test-cnv-caller
===============

23 changes: 19 additions & 4 deletions docs/create.py
@@ -384,7 +384,7 @@
- ``SampleTable[Results]``
* TSV file for storing various results for each sample.
* Requires following metadata: ``Gene``, ``Assembly``, ``SemanticType``.
- ``SampleTable[Statistcs]``
- ``SampleTable[Statistics]``
* TSV file for storing control gene's various statistics on read depth for each sample. Used for converting target gene's read depth to copy number.
* Requires following metadata: ``Control``, ``Assembly``, ``SemanticType``, ``Platform``.
- ``VcfFrame[Consolidated]``
@@ -397,11 +397,12 @@
* VcfFrame for storing target gene's phased variant data.
* Requires following metadata: ``Platform``, ``Gene``, ``Assembly``, ``SemanticType``, ``Program``.
Wroking with archive files
Working with archive files
--------------------------
To demonstrate how easy it is to work with PyPGx archive files, below we will
show some examples. First, download an archive:
show some examples. First, download an archive to play with, which has
``SampleTable[Results]`` as semantic type:
.. code-block:: text
@@ -416,6 +417,14 @@
Assembly=GRCh37
SemanticType=SampleTable[Results]
Now print its main data (but display first sample only):
.. code-block:: text
$ pypgx print-data grch37-CYP2D6-results.zip | head -n 2
Genotype Phenotype Haplotype1 Haplotype2 AlternativePhase VariantData CNV
HG00276_PyPGx *4/*5 Poor Metabolizer *4;*10;*74;*2; *10;*74;*2; ; *4:22-42524947-C-T:0.913;*10:22-42526694-G-A,22-42523943-A-G:1.0,1.0;*74:22-42525821-G-T:1.0;*2:default; DeletionHet
We can unzip it to extract files inside (note that ``tmpcty4c_cr`` is the
original folder name):
@@ -527,7 +536,7 @@
This pipeline supports SV detection based on copy number analysis for genes
that are known to have SV. Therefore, if the target gene is associated with
SV (e.g. CYP2D6) it's strongly recommended to provide a
``CovFrame[DepthOfCoverage]`` file and a ``SampleTable[Statistcs]`` file in
``CovFrame[DepthOfCoverage]`` file and a ``SampleTable[Statistics]`` file in
addtion to a VCF file containing SNVs/indels. If the target gene is not
associated with SV (e.g. CYP3A5) providing a VCF file alone is enough. You can
visit the `Genes <https://pypgx.readthedocs.io/en/latest/genes.html>`__ page
@@ -542,6 +551,9 @@
io/en/latest/faq.html#variant-caller-choice>`__ section for detailed
discussion on when to use either option.
Check out the `GeT-RM WGS tutorial <https://pypgx.readthedocs.io/en/latest/
tutorials.html#get-rm-wgs-tutorial>`__ to see this pipeline in action.
Chip pipeline
-------------
@@ -561,6 +573,9 @@
issue if you want to contribute your development skills and/or data for
devising an SV detection algorithm.
Check out the `Coriell Affy tutorial <https://pypgx.readthedocs.io/en/latest/
tutorials.html#coriell-affy-tutorial>`__ to see this pipeline in action.
Long-read pipeline
------------------