Commit

Merge pull request #53 from PGScatalog/dev
Minor Documentation Edits
nebfield authored Sep 16, 2022
2 parents 8e173f5 + db0cad2 commit 4952d21
Showing 6 changed files with 29 additions and 20 deletions.
25 changes: 14 additions & 11 deletions README.md
@@ -17,17 +17,20 @@ and/or user-defined PGS/PRS.

## Pipeline summary

1. Optionally, fetch a scorefile from the PGS Catalog API
2. Validate and optionally liftover PGS Catalog and/or user-defined scoring file
formats
3. Standardise variant data to a common specification (PLINK2)
4. Match variants in the scoring file against variants in the genotyping data
5. Calculate scores for each sample (handling multiple scores in parallel)
6. Produce a summary report
1. Downloading scoring files in a specified genome build (GRCh37 or GRCh38) using the PGS Catalog API.
2. Reading custom scoring files (and performing a liftover if the genotyping data are in a different build).
3. Automatically combining and formatting scoring files for efficient parallel computation of multiple PGS.
   - Matching variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format).
4. Calculating PGS for all samples (linear sum of weights and dosages).
5. Creating a summary report to visualize score distributions and pipeline metadata (variant matching QC).
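The scoring step above is conceptually just a weighted sum over matched variants. A minimal Python sketch of that idea (illustrative only — the pipeline delegates the real computation to PLINK 2, and all names below are hypothetical):

```python
# Sketch of a PGS as a linear sum of effect weights * allele dosages.
# Hypothetical helper, not pipeline code: pgsc_calc uses PLINK 2 for this step.

def polygenic_score(dosages: dict, weights: dict) -> float:
    """Sum weight * dosage over variants present in both the genotyping
    data and the scoring file (i.e. the matched variants)."""
    matched = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in matched)

# Toy data keyed by chrom:pos:ref:alt identifiers.
sample_dosages = {"1:100:A:G": 2.0, "1:200:C:T": 1.0, "2:300:G:A": 0.0}
score_weights = {"1:100:A:G": 0.5, "1:200:C:T": -0.25}

print(polygenic_score(sample_dosages, score_weights))  # 0.75
```

Variants that fail to match simply drop out of the sum, which is why the variant matching QC in the summary report matters.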

### Features in development

1. Ancestry estimation using reference datasets
- *Genetic Ancestry*: calculate similarity of target samples to populations in a
reference dataset (e.g. [1000 Genomes (1000G)](http://www.nature.com/nature/journal/v526/n7571/full/nature15393.html),
[Human Genome Diversity Project (HGDP)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115999/)) using principal components analysis (PCA).
- *PGS Normalization*: Using reference population data and/or PCA projections to report
individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry.
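One common way to report individual-level predictions is to express a raw PGS as a z-score against a reference distribution. A sketch of that idea only — the pipeline's method is still in development and may instead adjust along PCA projections; all names and data here are illustrative:

```python
# Illustrative z-score normalisation of a raw PGS against reference samples.
# Not the pipeline's implementation, which is in development.
from statistics import mean, stdev

def pgs_zscore(raw_score: float, reference_scores: list) -> float:
    """Centre and scale raw_score by the reference distribution."""
    mu = mean(reference_scores)
    sd = stdev(reference_scores)  # sample standard deviation
    return (raw_score - mu) / sd

reference = [0.8, 1.0, 1.2, 1.0, 1.0]  # made-up reference PGS values
print(round(pgs_zscore(1.2, reference), 2))  # 1.41
```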

## Quick start

@@ -48,16 +48,16 @@ and/or user-defined PGS/PRS.
4. Start running your own analysis!

```console
nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --accession PGS001229
nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229
```
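The ``--input`` samplesheet describes the target genotyping data. A minimal sketch for a single plink bfile dataset split by chromosome — the column names below are assumptions based on the samplesheet schema, so verify them against the linked docs and example template:

```csv
sampleset,vcf_path,bfile_path,pfile_path,chrom
mystudy,,path/to/bfile_prefix,,22
```

In this sketch only the bfile path column is filled for the row; the other genomic data path columns are left empty.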

See [getting
started](https://pgscatalog.github.io/pgsc_calc/getting-started.html) for more
started](https://pgsc-calc.readthedocs.io/en/latest/getting-started.html) for more
details.

## Documentation

[Full documentation is available on Read the Docs](https://pgsc-calc.readthedocs.io/en/)
[Full documentation is available on Read the Docs](https://pgsc-calc.readthedocs.io/)

## Credits

6 changes: 3 additions & 3 deletions docs/_templates/globaltoc.html
@@ -2,12 +2,12 @@ <h3>Contents</h3>

<ul>
<li><a href="{{ pathto('index') }}">Home</a></li>
<li><a href="{{ pathto('getting-started') }}">Get started</a></li>
<li><a href="{{ pathto('getting-started') }}">Getting started</a></li>
<li><a href="{{ pathto('how-to/index') }}">How-to guides</a></li>
<li><a href="{{ pathto('reference/index') }}">Reference guide</a></li>
<ul>
<li><a href="{{ pathto('reference/input') }}">Input Parameters/Flags</a></li>
<li><a href="{{ pathto('reference/params') }}">Samplesheet schema</a></li>
<li><a href="{{ pathto('reference/params') }}">Input Parameters/Flags</a></li>
<li><a href="{{ pathto('reference/input') }}">Samplesheet schema</a></li>
</ul>
<li><a href="{{ pathto('output') }}">Outputs & results</a></li>
<li><a href="{{ pathto('troubleshooting') }}">Troubleshooting</a></li>
13 changes: 9 additions & 4 deletions docs/getting-started.rst
@@ -2,8 +2,8 @@

.. _get started:

Get started
===========
Getting started
===============

``pgsc_calc`` requires Nextflow and one of Docker, Singularity, or
Anaconda. You will need a POSIX compatible system, like Linux or macOS, to run ``pgsc_calc``.
@@ -169,13 +169,18 @@ There are five mandatory columns. Columns that specify genomic data paths
Save this spreadsheet in :term:`CSV` format (e.g., ``samplesheet.csv``). An
example template is `available here`_.

.. note::
   All samplesets have to be in the same genome build (either GRCh37 or GRCh38), which is specified
   using the ``--target_build [GRCh3#]`` parameter. All scoring files are downloaded or mapped to match the
   specified genome build; no liftover/re-mapping of the genotyping data is performed within the pipeline.

.. _`available here`: https://github.com/PGScatalog/pgsc_calc/blob/master/assets/examples/example_data/bfile_samplesheet.csv

2. Select scoring files
-----------------------

pgsc_calc makes it simple to work with polygenic scores that have been published
in the PGS Catalog. You can specify one or more scores using the ``--accession``
in the PGS Catalog. You can specify one or more scores using the ``--pgs_id``
parameter:

.. code-block:: console
@@ -197,7 +202,7 @@ to using the ``--target_build`` parameter. The ``--target_build`` parameter only
In the case of the example above, both ``PGS001229`` and ``PGS001405`` are reported in genome build GRCh37.
In cases where the build of your genomic data is different from the original build of the PGS Catalog score,
the pipeline will download a `harmonized (remapped rsIDs and/or lifted positions)`_ version of the
scoring file(s) in the user-specified build.
scoring file(s) in the user-specified build of the genotyping datasets.

Custom scoring files can be lifted between genome builds using the ``--liftover`` flag (see :ref:`liftover`
for more information). An example would look like:
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -22,7 +22,7 @@ Currently the pipeline (implemented in `nextflow`_) works by:
- Reading custom scoring files (and performing a liftover if genotyping data is in a different build).
- Matching variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format)
- Automatically combines and creates scoring files for efficient parallel computation of multiple PGS
- Calculates PGS SUMS for all samples
- Calculates PGS for all samples (linear sum of weights and dosages)
- Creates a summary report to visualize score distributions and pipeline metadata (variant matching QC)

The pipeline is built on top of `PLINK 2`_ and the `PGS Catalog Utilities`_ python package (for interacting
1 change: 1 addition & 0 deletions environments/pgscatalog_utils/environment.yml
@@ -1,5 +1,6 @@
name: pgscatalog_utils
dependencies:
- python=3.10
- pip
- pip:
- pgscatalog_utils==0.1.2
2 changes: 1 addition & 1 deletion tests/config/nextflow.config
@@ -9,7 +9,7 @@ params {
process {
cpus = 2
memory = 4.GB
time = 2.h
time = 1.h
publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }
}

