Merge pull request #40 from sbslee/0.10.0-dev

0.10.0 dev
sbslee · Dec 19, 2021 · 855ec33
2 parents ad4dda0 + ab88275
commit 855ec33

Showing 41 changed files with 3,494 additions and 430 deletions.
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -1,6 +1,26 @@
 Changelog
 *********
 
+0.10.0 (2021-12-19)
+-------------------
+
+* :issue:`32`: Update :command:`import-variants` command to accept phased VCF as input. It will output VcfFrame[Consolidated] if the input VCF is fully phased or otherwise VcfFrame[Imported] as usual.
+* Add new property ``sdk.utils.Archive.type`` to quickly access the archive's semantic type.
+* Update :meth:`sdk.utils.Archive.check_type` method to be able to test more than one semantic type at once.
+* Update :meth:`api.plot.plot_vcf_allele_fraction` method to accept both VcfFrame[Imported] and VcfFrame[Consolidated].
+* :issue:`32`: Update :command:`run-ngs-pipeline` command to accept phased VCF as input. In this case, the command will skip statistical haplotype phasing.
+* :issue:`34`: Update commands :command:`run-ngs-pipeline` and :command:`run-chip-pipeline` to load large VCF files significantly faster by allowing random access. This also means, from now on, input VCF files must be BGZF compressed (.gz) and indexed (.tbi).
+* :issue:`36`: Update phenotype data for CACNA1S, CFTR, IFNL3, RYR1 (thanks `@NTNguyen13 <https://github.com/NTNguyen13>`__).
+* :pr:`39`: Add new gene F5 (thanks `@NTNguyen13 <https://github.com/NTNguyen13>`__).
+* Update :command:`import-variants` command to be able to subset/exclude specified samples.
+* Update :command:`import-read-depth` command to be able to subset/exclude specified samples.
+* Rename ``--samples`` argument from :command:`compute-copy-number` command to ``--samples-without-sv``.
+* Rename ``--samples`` argument from :command:`run-ngs-pipeline` command to ``--samples-without-sv``.
+* Update :command:`run-ngs-pipeline` and :command:`run-chip-pipeline` commands to be able to subset/exclude specified samples.
+* Remove ``--fn`` argument from :command:`filter-samples` command.
+* Update CNV caller for CYP2D6, GSTM1, and UGT1A4.
+* Update :meth:`api.plot.plot_cn_af` method to accept both VcfFrame[Imported] and VcfFrame[Consolidated].
+
 0.9.0 (2021-12-05)
 ------------------
 
@@ -19,7 +39,7 @@ Changelog
 * Add new method :meth:`api.core.get_strand`.
 * Add new method :meth:`api.core.get_exon_starts`.
 * Add new method :meth:`api.core.get_exon_ends`.
-* :pr:`31`: Fix minor bug in commands :command:`run-ngs-pipeline` and :command:`import-read-depth` (thanks to `@NTNguyen13 <https://github.com/NTNguyen13>`__).
+* :pr:`31`: Fix minor bug in commands :command:`run-ngs-pipeline` and :command:`import-read-depth` (thanks `@NTNguyen13 <https://github.com/NTNguyen13>`__).
 * Fix minor bug in :meth:`api.core.predict_score` method.
 * Update variant information for following alleles: CYP2D6\*27, CYP2D6\*32, CYP2D6\*131, CYP2D6\*141.
 

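Review note on the :issue:`34` changelog entry above: input VCF files must now be BGZF compressed (.gz) and indexed (.tbi). A minimal pre-flight check could look like the sketch below. It is not part of PyPGx; the function names ``is_bgzf`` and ``has_tabix_index`` are hypothetical, and the header test assumes the ``BC`` extra subfield comes first in the gzip header, which is how bgzip normally writes it.

```python
import os


def is_bgzf(path):
    """Return True if the file starts with a BGZF block header.

    Hypothetical helper for illustration; assumes the 'BC' subfield is the
    first entry in the gzip extra field.
    """
    with open(path, "rb") as f:
        header = f.read(14)
    return (
        len(header) == 14
        and header[:3] == b"\x1f\x8b\x08"   # gzip magic + deflate method
        and (header[3] & 0x04) != 0          # FEXTRA flag is set
        and header[12:14] == b"BC"           # BGZF subfield identifier
    )


def has_tabix_index(path):
    """Return True if a sibling '.tbi' index file exists (hypothetical helper)."""
    return os.path.exists(path + ".tbi")
```

A file written by plain gzip fails the first check because its header has no extra field, which is a common cause of random-access errors on ``.gz`` inputs.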
diff --git a/README.rst b/README.rst
@@ -36,7 +36,7 @@ available at the `Read the Docs <https://pypgx.readthedocs.io/en/latest/>`_.
 PyPGx is compatible with both of the Genome Reference Consortium Human (GRCh)
 builds, GRCh37 (hg19) and GRCh38 (hg38).
 
-There are currently 57 pharmacogenes in PyPGx:
+There are currently 58 pharmacogenes in PyPGx:
 
 .. list-table::
 
@@ -71,35 +71,35 @@ There are currently 57 pharmacogenes in PyPGx:
      - CYP19A1
      - CYP26A1
    * - DPYD
+     - F5
      - G6PD
      - GSTM1
      - GSTP1
-     - GSTT1
-   * - IFNL3
+   * - GSTT1
+     - IFNL3
      - NAT1
      - NAT2
      - NUDT15
-     - POR
-   * - PTGIS
+   * - POR
+     - PTGIS
      - RYR1
      - SLC15A2
      - SLC22A2
-     - SLCO1B1
-   * - SLCO1B3
+   * - SLCO1B1
+     - SLCO1B3
      - SLCO2B1
      - SULT1A1
      - TBXAS1
-     - TPMT
-   * - UGT1A1
+   * - TPMT
+     - UGT1A1
      - UGT1A4
      - UGT2B7
      - UGT2B15
-     - UGT2B17
-   * - VKORC1
+   * - UGT2B17
+     - VKORC1
      - XPC
      -
      -
-     -
 
 Your contributions (e.g. feature ideas, pull requests) are most welcome.
 
@@ -179,7 +179,7 @@ the presence of ALT contigs reduces the sensitivity of variant calling
 and many other analyses including SV detection. Therefore, if you have
 sequencing data in GRCh38, make sure it's aligned to the main contigs only.
 
-The only exception to above rule is the *GSTT1* gene, which is located on
+The only exception to above rule is the GSTT1 gene, which is located on
 ``chr22`` for GRCh37 but on ``chr22_KI270879v1_alt`` for GRCh38. This gene is
 known to have an extremely high rate of gene deletion polymorphism in the
 population and thus requires SV analysis. Therefore, if you are interested in
@@ -288,28 +288,30 @@ currently defined semantic types:
 Phenotype prediction
 ====================
 
-Many of the genes in PyPGx have a diplotype-phenotype table available from
-the Clinical Pharmacogenetics Implementation Consortium (CPIC). PyPGx will
-use this information to perform phenotype prediction. Note that there two
-types of phenotype prediction:
-
-- Method 1. Diplotype-phenotype mapping: This method directly uses the
-  diplotype-phenotype mapping as defined by CPIC. Using the CYP2B6 gene as an
-  example, the diplotypes \*6/\*6, \*1/\*29, \*1/\*2, \*1/\*4, and \*4/\*4
-  correspond to Poor Metabolizer, Intermediate Metabolizer, Normal
-  Metabolizer, Rapid Metabolizer, and Ultrarapid Metabolizer.
-- Method 2. Activity score: This method uses a standard unit of enzyme
-  activity known as an activity score. Using the CYP2D6 gene as an example,
-  the fully functional reference \*1 allele is assigned a value of 1,
-  decreased-function alleles such as \*9 and \*17 receive a value of
-  0.5, and nonfunctional alleles including \*4 and \*5 have a value of
-  0. The sum of values assigned to both alleles constitutes the activity
-  score of a diplotype. Consequently, subjects with \*1/\*1, \*1/\*4,
-  and \*4/\*5 diplotypes have an activity score of 2 (Normal Metabolizer),
-  1 (Intermediate Metabolizer), and 0 (Poor Metabolizer), respectively.
+Many genes in PyPGx have a genotype-phenotype table available from the
+Clinical Pharmacogenetics Implementation Consortium (CPIC) or
+the Pharmacogenomics Knowledge Base (PharmGKB). PyPGx uses these tables to
+perform phenotype prediction with one of the two methods:
+
+- Method 1. Simple diplotype-phenotype mapping: This method directly uses the
+  diplotype-phenotype mapping as defined by CPIC or PharmGKB. Using the
+  CYP2B6 gene as an example, the diplotypes \*6/\*6, \*1/\*29, \*1/\*2,
+  \*1/\*4, and \*4/\*4 correspond to Poor Metabolizer, Intermediate
+  Metabolizer, Normal Metabolizer, Rapid Metabolizer, and Ultrarapid
+  Metabolizer.
+- Method 2. Summation of haplotype activity scores: This method uses a
+  standard unit of enzyme activity known as an activity score. Using the
+  CYP2D6 gene as an example, the fully functional reference \*1 allele is
+  assigned a value of 1, decreased-function alleles such as \*9 and \*17
+  receive a value of 0.5, and nonfunctional alleles including \*4 and \*5
+  have a value of 0. The sum of values assigned to both alleles constitutes
+  the activity score of a diplotype. Consequently, subjects with \*1/\*1,
+  \*1/\*4, and \*4/\*5 diplotypes have an activity score of 2 (Normal
+  Metabolizer), 1 (Intermediate Metabolizer), and 0 (Poor Metabolizer),
+  respectively.
 
 Please visit the `Genes <https://pypgx.readthedocs.io/en/latest/
-genes.html>`__ page to see the list of genes with a CPIC diplotype-phenotype
+genes.html>`__ page to see the list of genes with a genotype-phenotype
 table and each of their prediction method.
 
 Getting help
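Review note on Method 2 in the hunk above: the summation of haplotype activity scores can be sketched as follows. This is an illustration only, not PyPGx's implementation. The names ``ALLELE_SCORES``, ``SCORE_TO_PHENOTYPE``, ``activity_score``, and ``phenotype`` are hypothetical, and the tables hold only the CYP2D6 values quoted in the README text.

```python
# Per-allele activity values quoted in the README text for CYP2D6.
ALLELE_SCORES = {"*1": 1.0, "*9": 0.5, "*17": 0.5, "*4": 0.0, "*5": 0.0}

# Only the three score-to-phenotype pairs stated in the README are listed;
# any other score is deliberately left unmapped rather than guessed.
SCORE_TO_PHENOTYPE = {
    2.0: "Normal Metabolizer",
    1.0: "Intermediate Metabolizer",
    0.0: "Poor Metabolizer",
}


def activity_score(diplotype):
    """Sum the activity values of the two alleles in a diplotype like '*1/*4'."""
    first, second = diplotype.split("/")
    return ALLELE_SCORES[first] + ALLELE_SCORES[second]


def phenotype(diplotype):
    """Return the phenotype for a diplotype, or None if the score is unmapped."""
    return SCORE_TO_PHENOTYPE.get(activity_score(diplotype))


for d in ("*1/*1", "*1/*4", "*4/*5"):
    print(d, activity_score(d), phenotype(d))
```

A diplotype such as ``*1/*9`` sums to 1.5, which the README text does not map to a phenotype, so the sketch returns ``None`` for it.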
@@ -345,7 +347,7 @@ For getting help on the CLI:
                            Estimate haplotype phase of observed variants with the Beagle program.
        filter-samples      Filter Archive file for specified samples.
        import-read-depth   Import read depth data for the target gene.
-       import-variants     Import variant data for the target gene.
+       import-variants     Import variant (SNV/indel) data for the target gene
        plot-bam-copy-number
                            Plot copy number profile from CovFrame[CopyNumber].
        plot-bam-read-depth
@@ -383,7 +385,6 @@ Below is the list of submodules available in the API:
 - **plot** : The plot submodule is used to plot various kinds of profiles such as read depth, copy number, and allele fraction.
 - **utils** : The utils submodule contains main actions of PyPGx.
 
-
 For getting help on a specific submodule (e.g. utils):
 
 .. code:: python3