Commit

Merge pull request #53 from PGScatalog/dev
Minor Documentation Edits
nebfield authored Sep 16, 2022
2 parents 8e173f5 + db0cad2 commit 4952d21
Showing 6 changed files with 29 additions and 20 deletions.
25 changes: 14 additions & 11 deletions README.md
@@ -17,17 +17,20 @@ and/or user-defined PGS/PRS.

## Pipeline summary

1. Optionally, fetch a scorefile from the PGS Catalog API
2. Validate and optionally liftover PGS Catalog and/or user-defined scoring file
formats
3. Standardise variant data to a common specification (PLINK2)
4. Match variants in the scoring file against variants in the genotyping data
5. Calculate scores for each sample (handling multiple scores in parallel)
6. Produce a summary report
1. Downloading scoring files in a specified genome build (GRCh37 or GRCh38) using the PGS Catalog API.
2. Reading custom scoring files (and performing a liftover if the genotyping data are in a different build).
3. Automatically combining and formatting scoring files for efficient parallel computation of multiple PGS.
   - Matching variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format).
4. Calculating PGS for all samples (linear sum of weights and dosages).
5. Creating a summary report to visualize score distributions and pipeline metadata (variant matching QC).
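The scoring step above is conceptually just a weighted sum over matched variants. A minimal Python sketch of that idea (illustrative only — the pipeline delegates the real computation to PLINK 2, and all names below are hypothetical):

```python
# Sketch of a PGS as a linear sum of effect weights * allele dosages.
# Hypothetical helper, not pipeline code: pgsc_calc uses PLINK 2 for this step.

def polygenic_score(dosages: dict, weights: dict) -> float:
    """Sum weight * dosage over variants present in both the genotyping
    data and the scoring file (i.e. the matched variants)."""
    matched = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in matched)

# Toy data keyed by chrom:pos:ref:alt identifiers.
sample_dosages = {"1:100:A:G": 2.0, "1:200:C:T": 1.0, "2:300:G:A": 0.0}
score_weights = {"1:100:A:G": 0.5, "1:200:C:T": -0.25}

print(polygenic_score(sample_dosages, score_weights))  # 0.75
```

Variants that fail to match simply drop out of the sum, which is why the variant matching QC in the summary report matters.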

### Features in development

1. Ancestry estimation using reference datasets
- *Genetic Ancestry*: calculate similarity of target samples to populations in a
reference dataset (e.g. [1000 Genomes (1000G)](http://www.nature.com/nature/journal/v526/n7571/full/nature15393.html),
[Human Genome Diversity Project (HGDP)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115999/)) using principal components analysis (PCA).
- *PGS Normalization*: Using reference population data and/or PCA projections to report
individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry.
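One common way to report individual-level predictions is to express a raw PGS as a z-score against a reference distribution. A sketch of that idea only — the pipeline's method is still in development and may instead adjust along PCA projections; all names and data here are illustrative:

```python
# Illustrative z-score normalisation of a raw PGS against reference samples.
# Not the pipeline's implementation, which is in development.
from statistics import mean, stdev

def pgs_zscore(raw_score: float, reference_scores: list) -> float:
    """Centre and scale raw_score by the reference distribution."""
    mu = mean(reference_scores)
    sd = stdev(reference_scores)  # sample standard deviation
    return (raw_score - mu) / sd

reference = [0.8, 1.0, 1.2, 1.0, 1.0]  # made-up reference PGS values
print(round(pgs_zscore(1.2, reference), 2))  # 1.41
```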

## Quick start

@@ -48,16 +48,16 @@ and/or user-defined PGS/PRS.
4. Start running your own analysis!

```console
nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --accession PGS001229
nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229
```
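The ``--input`` samplesheet describes the target genotyping data. A minimal sketch for a single plink bfile dataset split by chromosome — the column names below are assumptions based on the samplesheet schema, so verify them against the linked docs and example template:

```csv
sampleset,vcf_path,bfile_path,pfile_path,chrom
mystudy,,path/to/bfile_prefix,,22
```

In this sketch only the bfile path column is filled for the row; the other genomic data path columns are left empty.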

See [getting
started](https://pgscatalog.github.io/pgsc_calc/getting-started.html) for more
started](https://pgsc-calc.readthedocs.io/en/latest/getting-started.html) for more
details.

## Documentation

[Full documentation is available on Read the Docs](https://pgsc-calc.readthedocs.io/en/)
[Full documentation is available on Read the Docs](https://pgsc-calc.readthedocs.io/)

## Credits

6 changes: 3 additions & 3 deletions docs/_templates/globaltoc.html
@@ -2,12 +2,12 @@ <h3>Contents</h3>

<ul>
<li><a href="{{ pathto('index') }}">Home</a></li>
<li><a href="{{ pathto('getting-started') }}">Get started</a></li>
<li><a href="{{ pathto('getting-started') }}">Getting started</a></li>
<li><a href="{{ pathto('how-to/index') }}">How-to guides</a></li>
<li><a href="{{ pathto('reference/index') }}">Reference guide</a></li>
<ul>
<li><a href="{{ pathto('reference/input') }}">Input Parameters/Flags</a></li>
<li><a href="{{ pathto('reference/params') }}">Samplesheet schema</a></li>
<li><a href="{{ pathto('reference/params') }}">Input Parameters/Flags</a></li>
<li><a href="{{ pathto('reference/input') }}">Samplesheet schema</a></li>
</ul>
<li><a href="{{ pathto('output') }}">Outputs & results</a></li>
<li><a href="{{ pathto('troubleshooting') }}">Troubleshooting</a></li>
13 changes: 9 additions & 4 deletions docs/getting-started.rst
@@ -2,8 +2,8 @@

.. _get started:

Get started
===========
Getting started
===============

``pgsc_calc`` requires Nextflow and one of Docker, Singularity, or
Anaconda. You will need a POSIX compatible system, like Linux or macOS, to run ``pgsc_calc``.
@@ -169,13 +169,18 @@ There are five mandatory columns. Columns that specify genomic data paths
Save this spreadsheet in :term:`CSV` format (e.g., ``samplesheet.csv``). An
example template is `available here`_.

.. note::
   All samplesets have to be in the same genome build (either GRCh37 or GRCh38), which is specified
   using the ``--target_build [GRCh3#]`` parameter. All scoring files are downloaded or mapped to match the
   specified genome build; no liftover/re-mapping of the genotyping data is performed within the pipeline.

.. _`available here`: https://github.com/PGScatalog/pgsc_calc/blob/master/assets/examples/example_data/bfile_samplesheet.csv

2. Select scoring files
-----------------------

pgsc_calc makes it simple to work with polygenic scores that have been published
in the PGS Catalog. You can specify one or more scores using the ``--accession``
in the PGS Catalog. You can specify one or more scores using the ``--pgs_id``
parameter:

.. code-block:: console
@@ -197,7 +202,7 @@ to using the ``--target_build`` parameter. The ``--target_build`` parameter only
In the case of the example above, both ``PGS001229`` and ``PGS001405`` are reported in genome build GRCh37.
In cases where the build of your genomic data is different from the original build of the PGS Catalog score,
the pipeline will download a `harmonized (remapped rsIDs and/or lifted positions)`_ version of the
scoring file(s) in the user-specified build.
scoring file(s) in the user-specified build of the genotyping datasets.

Custom scoring files can be lifted between genome builds using the ``--liftover`` flag (see :ref:`liftover`
for more information). An example would look like:
2 changes: 1 addition & 1 deletion docs/index.rst
@@ -22,7 +22,7 @@ Currently the pipeline (implemented in `nextflow`_) works by:
- Reading custom scoring files (and performing a liftover if genotyping data is in a different build).
- Matching variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format)
- Automatically combines and creates scoring files for efficient parallel computation of multiple PGS
- Calculates PGS SUMS for all samples
- Calculates PGS for all samples (linear sum of weights and dosages)
- Creates a summary report to visualize score distributions and pipeline metadata (variant matching QC)

The pipeline is built on top of `PLINK 2`_ and the `PGS Catalog Utilities`_ python package (for interacting
1 change: 1 addition & 0 deletions environments/pgscatalog_utils/environment.yml
@@ -1,5 +1,6 @@
name: pgscatalog_utils
dependencies:
- python=3.10
- pip
- pip:
- pgscatalog_utils==0.1.2
2 changes: 1 addition & 1 deletion tests/config/nextflow.config
@@ -9,7 +9,7 @@ params {
process {
cpus = 2
memory = 4.GB
time = 2.h
time = 1.h
publishDir = { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }
}

