Merge pull request #194 from cidgoh/development
Merge dev to master
anwarMZ authored Dec 16, 2024
2 parents bdf3434 + f6dcdf8 commit d147c94
Showing 112 changed files with 19,807 additions and 5,277 deletions.
49 changes: 49 additions & 0 deletions .github/workflows/nextflow_CI.yml
@@ -0,0 +1,49 @@
name: Nextflow CI

on:
  push:
    branches:
      - development
  pull_request:
    branches:
      - master

jobs:
  test_sarscov2_user:
    name: Run pipeline test (user)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Nextflow
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Run pipeline test (user)
        run: |
          nextflow run main.nf -profile docker --prefix "covidmvp-user-$(date +%Y-%m-%d)" -params-file covidmvp_user.yaml
  test_sarscov2_reference:
    name: Run pipeline test (reference)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Nextflow
        run: |
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Run pipeline test (reference)
        run: |
          nextflow run main.nf -profile docker --prefix "covidmvp-$(date +%Y-%m-%d)" --end_date $(date +%Y-%m-%d) -params-file covidmvp_clinical_params.yaml
  check_success:
    name: Check if all tests passed
    needs: [test_sarscov2_user, test_sarscov2_reference]
    runs-on: ubuntu-latest
    steps:
      - name: Check job status
        if: ${{ failure() }}
        run: exit 1
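The `test_sarscov2_user` job can be approximated outside GitHub Actions by assembling the same command locally. The following is a minimal sketch, assuming Bash and a checkout of the repository; the helper names `build_prefix` and `print_user_test_cmd` are hypothetical and not part of the repository. It only prints the command, so nothing is executed until the output is run by hand.

```shell
#!/usr/bin/env bash

# Build the date-stamped value the CI jobs pass to --prefix.
# $1 is an optional label: "user" for the first job, empty for the reference job.
# (Hypothetical helper, not part of nf-ncov-voc.)
build_prefix() {
    local label="${1:-}"
    local today
    today="$(date +%Y-%m-%d)"
    if [ -n "$label" ]; then
        printf 'covidmvp-%s-%s\n' "$label" "$today"
    else
        printf 'covidmvp-%s\n' "$today"
    fi
}

# Print (do not run) the command used by the "user" test job.
print_user_test_cmd() {
    printf 'nextflow run main.nf -profile docker --prefix "%s" -params-file covidmvp_user.yaml\n' \
        "$(build_prefix user)"
}

print_user_test_cmd
```

Printing rather than executing keeps the sketch safe on machines without Docker or Nextflow; the printed line can be pasted into a shell from the repository root once both are installed.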
9 changes: 2 additions & 7 deletions .gitignore
@@ -1,23 +1,18 @@
.nextflow*
work/*
results/*
.idea/*
.github/data/metadata
.github/data/sequence
.DS_store
testing/*
*.ipynb
weekly*
*.error
*.out
bin/__*
*.sh
latest*
data/*
*.fa.xz
*.gz
hMPXV_*
bin/web.log
assets/config.ini
*-params.yaml
*.xz
*.xz
*.log
3 changes: 3 additions & 0 deletions .vscode/extensions.json
@@ -0,0 +1,3 @@
{
  "recommendations": ["nextflow.nextflow"]
}
6 changes: 6 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,6 @@
{
  "cSpell.words": ["bioinformatics", "cpus"],
  "nextflow.debug": true,
  "nextflow.formatting.harshilAlignment": true,
  "nextflow.java.home": "/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/java/17.0.2/"
}
55 changes: 31 additions & 24 deletions README.md
@@ -7,19 +7,37 @@

## Introduction

**nf-ncov-voc** is a bioinformatics analysis workflow used for
performing variant calling on SARS-CoV-2 genomes to identify and
profile mutations in Variants of Concern (VOCs), Variants of
Interest (VOIs) and Variants under Monitoring (VUMs). This workflow has
four main stages - **Preprocessing**, **Genomic Analysis (Variant
Calling)** , **Functional Annotation** and **Surveillance**.
**nf-ncov-voc** is a bioinformatics workflow developed to process viral genomes and integrate contextual data.
The workflow was initially designed to process SARS-CoV-2 genomes for the COVID-19 pandemic response and was later adapted to other priority viruses, e.g., Mpox and Influenza. It has a modular structure, with several modules and sub-workflows leveraged from [nf-core](https://nf-co.re). These modules and sub-workflows are assembled in a plug-and-play manner based on the data and viral characteristics. Each virus supported by the workflow has its own workflow file that directs the assembly of sub-workflows and modules.

The workflow is built using [Nextflow](https://www.nextflow.io) [DSL2](https://www.nextflow.io/docs/latest/dsl2.html), a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It can use `conda`/`Docker`/`Singularity` containers, making installation trivial and results highly reproducible.

**nf-ncov-voc** workflow can be used in combination with an interactive
visualization tool [COVID-MVP](https://github.com/cidgoh/COVID-MVP)
or as a stand-alone high-throughput analysis tool to produce
mutation profiles and surveillance reports.
visualization tool [VIRUS-MVP](https://github.com/cidgoh/VIRUS-MVP) or as a stand-alone high-throughput analysis tool to produce mutation profiles and surveillance reports.

The detailed structure of the workflow and each of its modules are presented
in the dataflow diagram below.

### nf-ncov-voc Dataflow

![DataFlow](figs/COVIDMVP.drawio.png)

### Functional Annotation

**nf-ncov-voc** offers a unique opportunity to integrate contextual data with genomics data. The Variant Call Format (VCF) file generated for each group or sample is converted to a Genome Variation Format (GVF) file to integrate the functions associated with different mutations. For more information on how the functions are curated and structured, see the dedicated repository [_Pokay_](https://github.com/nodrogluap/pokay).

With the help of the functional data in _Pokay_, **nf-ncov-voc** produces surveillance reports, developed in collaboration with Public Health partners, that offer a high-level yet comprehensive view of each mutation and its associated functions in the literature.

### Input data

**nf-ncov-voc** accepts input in several formats: Whole Genome Sequences (WGS) in `FASTA` format with a Metadata file in `TSV` format, or paired-end short-read sequences in `FASTQ` format with a Metadata file in `TSV` format. The input can also be a `VCF` file containing called variants.
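
Before a run with `FASTA` and `TSV` input, it can help to confirm that every sequence has a metadata row. The following is a minimal sketch, not part of the workflow: the function name `missing_metadata_ids` is hypothetical, and it assumes the sequence ID is the first token of each `>` header and the first column of a non-empty TSV with one header row. The actual metadata schema expected by **nf-ncov-voc** may differ.

```shell
#!/usr/bin/env bash

# Print FASTA record IDs that have no matching row in the metadata TSV.
# Usage: missing_metadata_ids <sequences.fasta> <metadata.tsv>
# Assumptions (hypothetical layout): the ID is the first whitespace-delimited
# token after ">" and also the first TSV column; the TSV is non-empty and its
# first line is a header.
missing_metadata_ids() {
    local fasta="$1" tsv="$2"
    awk -F'\t' '
        # First file (metadata): remember every ID, skipping the header row.
        NR == FNR { if (FNR > 1) seen[$1] = 1; next }
        # Second file (FASTA): test the ID on each header line.
        /^>/ {
            id = substr($1, 2)          # drop the leading ">"
            split(id, parts, /[ \t]/)   # keep only the first token
            if (!(parts[1] in seen)) print parts[1]
        }
    ' "$tsv" "$fasta"
}
```

Calling `missing_metadata_ids sequences.fasta metadata.tsv` (both file names are placeholders) prints one unmatched ID per line and prints nothing when every sequence is covered.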

### Grouping data

### Quality control

### Variant Calling

As an input, **nf-ncov-voc** workflow requires SARS-CoV-2 consensus
sequences in `FASTA` format and Metadata file in `TSV` format.
Sequences in pre-processing stage are filtered using Metadata
variables, quality filtered and assigned lineages. Sequences
assigned as VOCs, VOIs and VUMs are then mapped to SARS-CoV-2 genome,
@@ -35,19 +53,6 @@ summarized using functional indicators to highlight key functions
and the mutations responsible for them, e.g. the role of **P618H** in
_convalescent plasma escape_.

The workflow is built using [Nextflow](https://www.nextflow.io)-
[DSL2](https://www.nextflow.io/docs/latest/dsl2.html), a workflow
tool to run tasks across multiple compute infrastructures in a very
portable manner. It can use `conda`/`Docker`/`Singularity`
containers making installation trivial and results highly reproducible.

A detailed structure and each module of the workflow is presented
below in the dataflow diagram

### nf-ncov-voc Dataflow

![DataFlow](figs/COVIDMVP.drawio.png)

### Pre-Processing

This module offers two ways to get lineage information for each
@@ -121,6 +126,8 @@ See the
[parameters](https://github.com/cidgoh/nf-ncov-voc/blob/master/docs/PARAMETERS.md)
docs for all available options when running the workflow.

**Further development will continue to adapt nf-ncov-voc to other viruses in the near future.**

## Usage

1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`)