SARS-CoV-2 version of RAMPART #1

Open: wants to merge 25 commits into master (changes shown from 20 of the 25 commits)

Commits:
45640ed
created python scripts and snakemake files for detection of SARS-COV-…
janka000 May 2, 2021
68145fa
adding strand matching pipeline to server scripts
janka000 May 2, 2021
74c38ff
showing variant data on front-end
janka000 May 2, 2021
169e17a
moving old docs and examples to /old folder
janka000 May 2, 2021
6d2c7b9
added new docs and SARS-CoV-2 protocols
janka000 May 2, 2021
6783cf4
added new environment file
janka000 May 2, 2021
3867c67
removing comments and fixing code style&formatting
janka000 May 2, 2021
f70b095
I figured out that we need the if statement we removed
janka000 May 3, 2021
4465258
added new screenshots
janka000 May 3, 2021
acdab22
removing custom configuration
janka000 May 3, 2021
1bfdd05
removing __pycache__
janka000 May 3, 2021
2b37599
moving pipleines to separate folder and restoring EBOLA examples
janka000 May 4, 2021
681249e
removing default_protocol from /old folder
janka000 May 4, 2021
8e4b169
removing forgotten changes
janka000 May 4, 2021
ba32447
tidying up
janka000 May 5, 2021
412c26b
tidying up python scripts + requested changes
janka000 May 5, 2021
199ca87
new mutations file + some additional info in json
janka000 May 6, 2021
bba5d22
another requested changes
janka000 May 7, 2021
e955c8a
small fix
janka000 May 10, 2021
119a3ce
variantsTree maybe fixed
janka000 May 14, 2021
981fd70
fixing whitespaces and mutationsTree
janka000 May 14, 2021
1d03b3b
annotatedPath fix
janka000 May 16, 2021
cebd4ea
added log files for variant calling pipeline
janka000 Jul 27, 2021
4cba686
added support for structure with csv in subfolders to strand_matching…
janka000 Jul 27, 2021
9ad8c7e
added support for gziped fastq files in variant calling pipeline
janka000 Jul 27, 2021
3 changes: 3 additions & 0 deletions .gitignore
@@ -42,3 +42,6 @@ binned*

# snakemake stuff
.snakemake/

# python stuff
__pycache__/
25 changes: 4 additions & 21 deletions README.md
@@ -4,7 +4,7 @@ Read Assignment, Mapping, and Phylogenetic Analysis in Real Time.

RAMPART runs concurrently with MinKNOW and shows you demuxing / mapping results in real time.

![](docs/images/main.png)
![](docs/img/main.png)


## Motivation
@@ -13,29 +13,12 @@ Furthermore, the small size of many pathogens mean that insightful sequence data
RAMPART run concurrently with MinION sequencing of such pathogens.
It provides a real-time overview of genome coverage and reference matching for each barcode.

RAMPART was originally designed to work with amplicon-based primer schemes (e.g. for [ebola](https://github.com/artic-network/primer-schemes)), but this isn't a requirement.


This version of RAMPART is designed for ... <!-- #todo -->

## Documentation

* [Installation](docs/installation.md)
* [Running an example dataset & understanding the visualisations](docs/examples.md)
* [Installation](docs/installation.md) <!-- * [Running an example dataset & understanding the visualisations](docs/examples.md) -->
* [Setting up for your own run](docs/setting-up.md)
* [Configuring RAMPART using protocols](docs/protocols.md)
* [Debugging when things don't work](docs/debugging.md)
* [Notes relating to RAMPART development](docs/developing.md)




## Status

RAMPART is in development with a publication forthcoming.
Please [get in contact](https://twitter.com/hamesjadfield) if you have any issues, questions or comments.


## RAMPART has been deployed to sequence:
* [Covid strand matching pipeline](docs/barcode_strand_match.md)

* [Yellow Fever Virus in Brazil](https://twitter.com/Hill_SarahC/status/1149372404260593664)
* [ARTIC workshop in Accra, Ghana](https://twitter.com/george_l/status/1073245364197711874)
4 changes: 2 additions & 2 deletions default_protocol/pipelines.json
@@ -1,7 +1,7 @@
{
"annotation": {
"name": "Annotate reads",
"path": "pipelines/demux_map",
"path": "../pipelines/default_pipeline/demux_map",
"config_file": "config.yaml",
"requires": [
{
@@ -12,7 +12,7 @@
},
"export_reads": {
"name": "Export reads",
"path": "pipelines/bin_to_fastq",
"path": "../pipelines/default_pipeline/bin_to_fastq",
"config_file": "config.yaml",
"run_per_sample": true
}
92 changes: 92 additions & 0 deletions docs/barcode_strand_match.md
@@ -0,0 +1,92 @@
# Covid strand matching pipeline

This pipeline looks for new `.csv` files that RAMPART's annotation pipeline has created (in the `/annotations` folder) since the last time this pipeline was triggered.

Then it looks for the matching `.fastq` files and creates `.bam` files from them one by one, using `minimap2`.

The `.bam` files are no longer needed after our Python script has processed them, so they are deleted afterwards.
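This mapping step boils down to piping `minimap2` output into a sorted BAM and deleting it once counted. A minimal sketch of what one such per-file call might look like follows; the function names, the `-ax map-ont` preset, and the use of `samtools sort` are illustrative assumptions, not the pipeline's actual code:

```python
import subprocess

def minimap2_command(reference, fastq_path):
    # ONT preset for nanopore reads; the real pipeline may use different flags.
    return ["minimap2", "-ax", "map-ont", reference, fastq_path]

def map_to_bam(reference, fastq_path, bam_path):
    """Map one FASTQ file to the reference and sort the alignments into a BAM."""
    mapper = subprocess.Popen(minimap2_command(reference, fastq_path),
                              stdout=subprocess.PIPE)
    # Read SAM from minimap2's stdout and write a sorted BAM.
    subprocess.run(["samtools", "sort", "-o", bam_path, "-"],
                   stdin=mapper.stdout, check=True)
    mapper.stdout.close()
    mapper.wait()
```

The intermediate BAM can then simply be removed (`os.remove(bam_path)`) after the base counts have been extracted.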

The first output file of this pipeline is located at `/annotations/base_count/count.csv` and contains the count of each base (A, C, G or T) at each position of the reference for each barcode. Each line of the file contains

* position in the reference genome
* barcode name
* count of A's mapped to this position in reference genome
* count of C's
* count of G's
* count of T's

The next time you trigger the pipeline, it will use this output file as one of its inputs: the new counts will be added to those from the existing `count.csv` file, and a new file will be created as output.
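The incremental update described above amounts to summing per-base tallies keyed by position and barcode. A sketch in Python, assuming the column order from the bullet list (the function names are illustrative, not the pipeline's actual code):

```python
import csv
from collections import defaultdict

def load_counts(path):
    """Read count.csv rows into {(position, barcode): [A, C, G, T]}.

    Assumes the column order described above: position, barcode, A, C, G, T.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    with open(path, newline="") as fh:
        for pos, barcode, a, c, g, t in csv.reader(fh):
            counts[(int(pos), barcode)] = [int(a), int(c), int(g), int(t)]
    return counts

def merge_counts(old, new):
    """Add the current run's per-base counts onto the previous totals."""
    merged = defaultdict(lambda: [0, 0, 0, 0])
    for table in (old, new):
        for key, bases in table.items():
            for i, n in enumerate(bases):
                merged[key][i] += n
    return merged
```

The merged table is then written back out as the next run's `count.csv`.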

The next step of our pipeline is determining the variants based on a provided `.txt` file of a specific format (see the mutations file section below), which lists the changes relative to the reference genome that are specific to known variants of SARS-CoV-2.

By default our pipeline uses one of the `.txt` files we have created. All the `.txt` files are located in `/covid_protocol/pipelines/run_python_scripts/rules/mut_files/`.
You may also want to provide your own. To do this, create a `.txt` file in the directory mentioned above, then in `covid_protocol/pipelines/run_python_scripts/config.yaml` replace the name of the file to be used with your own.

You can also set your own threshold value, which determines the minimal number of reads that must be mapped to a position in the reference genome for our Python script to classify a mutation as significant enough to support that the barcode sample corresponds to a variant.

These are the default settings:

```yaml
###mutations###
coverage_threshold: 10
mutations_file: mutbb.txt
```


At the end, the `annotations/results` folder should contain a `mutations.json` file containing the mutations we matched to the barcodes.

Once the JSON file is available, it will be loaded into RAMPART and the results will be shown.

## The mutations file
This file specifies the SARS-CoV-2 variants to look for and the mutations that are specific to each variant.
Each line starts with the label of a variant,
followed by exactly one space and the number of mutations that must match before we can say that a barcode corresponds to this variant.
Then come the mutations typical for the variant, separated by spaces.
Lines starting with `#` are comments and are ignored when parsing the file.

### Example
```
UK 5 C3267T C5388A ... G28280C A28281T T28282A
```
You can see this line in our default file.

It starts with "UK", which is our label for this variant.

The label is followed by a number, 5, which says "if at least 5 of the following mutations are present in the sample, classify the sample as this variant".

The number is followed by the mutations typical for this variant (for example, C changed to T at position 3267 of the reference genome), separated by spaces.
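A single line of this shape can be parsed and checked against a sample's observed mutations roughly as follows (an illustrative sketch, not the pipeline's actual code):

```python
def parse_variant_line(line):
    """Split 'LABEL N MUT1 MUT2 ...' into its three parts."""
    label, threshold, *mutations = line.split(" ")
    return label, int(threshold), mutations

def matches_variant(line, observed_mutations):
    """True if at least `threshold` of the listed mutations were observed."""
    label, threshold, mutations = parse_variant_line(line)
    found = sum(1 for m in mutations if m in observed_mutations)
    return found >= threshold
```

For instance, with a threshold of 2, a sample showing two of the three listed mutations would be classified as the variant, while a sample with only one would not.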

### Tree-like structure
You can also provide further variants in a tree-like structure, using the keywords `start_sub` and `end_sub` on separate lines:
```
#UK variant
UK 5 C3267T ... T28282A

#more specific variants for UK
start_sub
UK-subvariant_1 1 A17615G

#subvariants for UK-subvariant_1
start_sub
.
.
UK-subvariant_1-Poland 4 C5301T C7420T C9693T G23811T C25350T C28677T G29348T
UK-subvariant_1-Gambia 3 T6916C T18083C G22132A C23929T
end_sub

end_sub

#CZ variant
CZ 3 G12988T G15598A G18028T T24910C T26972C
```

This means that we will look for the UK variant, and if we find at least 5 mutations from the list provided on the UK line, we will also continue searching for other, more specific variants.

For example, if the UK variant is matched, we will check whether the mutation A17615G is also present;

if it is, we will then check whether some of the mutations specified in its subsection (UK-subvariant_1-Poland or UK-subvariant_1-Gambia) are present.

We stop searching when there are no more subsections specified or when fewer than the required number of mutations were found for a sample.

We will look for the CZ variant too. This one has no subvariants specified in this example file, so no further search is made.
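A small stack-based parser is enough to read this nested layout. The sketch below assumes every non-keyword line has the `LABEL N MUT...` shape described above and represents each variant as a `(label, threshold, mutations, children)` node; the names are illustrative, not the pipeline's actual code:

```python
def parse_mutations_file(lines):
    """Parse variant lines into a tree of (label, threshold, mutations, children).

    `start_sub` descends into the most recent variant's children; `end_sub`
    returns to the enclosing level. Comments (#) and blank lines are skipped.
    """
    root = []          # top-level variants
    stack = [root]     # lists we are currently appending into
    last_node = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "start_sub":
            stack.append(last_node[3])   # nest under the last variant seen
        elif line == "end_sub":
            stack.pop()
        else:
            label, threshold, *mutations = line.split(" ")
            last_node = (label, int(threshold), mutations, [])
            stack[-1].append(last_node)
    return root
```

Matching then walks this tree: a node's children are only visited when the node itself reached its threshold, which mirrors the search described above.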

Binary file added docs/img/main.png
Binary file added docs/img/main_2.png
Binary file added docs/img/old/main.png
Binary file added docs/img/old/main_2.png
Binary file added docs/img/old/main_3.png
Binary file added docs/img/s1.png
Binary file added docs/img/s2.png
Binary file added docs/img/s3.png
Binary file added docs/img/s4.png
Binary file added docs/img/s5.png
73 changes: 8 additions & 65 deletions docs/installation.md
@@ -4,84 +4,25 @@
These instructions assume that you have installed [MinKNOW](https://community.nanoporetech.com/downloads) and are able to run it.


## Install from conda

We also assume that you are using conda -- See [instructions here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to install conda on your machine.

### Step 1: Create a new conda environment or install nodejs into your current conda environment

Create a new conda environment and activate it via:

```bash
conda create -n artic-rampart -y nodejs=12 # any version >10 should be fine
conda activate artic-rampart
```

Or install NodeJS into your currently activated environment via:

```bash
conda install -y nodejs=12 # any version >10 should be fine
```

### Step 2: Install RAMPART

```bash
conda install -y artic-network::rampart=1.1.0
```

### Step 3: Install dependencies

Note that you may already have some or all of these in your environment, in which case they can be skipped.
Additionally, some are only needed for certain analyses and can also be skipped as desired.

> If you are installing RAMPART into the [artic-ncov2019](https://github.com/artic-network/artic-ncov2019) conda environment, you will already have all of these dependencies.


Python, biopython, snakemake and minimap2 are required

```bash
conda install -y "python>=3.6"
conda install -y anaconda::biopython
conda install -y -c conda-forge -c bioconda "snakemake<5.11" # snakemake 5.11 will not work currently
conda install -y bioconda::minimap2=2.17
```

If you are using guppy to demux samples, you don't need Porechop,
however if you require RAMPART to perform demuxing then you must install the ARTIC fork of Porechop:

```bash
python -m pip install git+https://github.com/artic-network/[email protected]
```

If you wish to use the post-processing functionality available in RAMPART to bin reads, then you'll need `binlorry`:

```bash
python -m pip install binlorry==1.3.0_alpha1
```

### Step 4: Check that it works

```
rampart --help
```

---

## Install from source

(1) Clone the Github repo

```bash
git clone https://github.com/artic-network/rampart.git
git clone https://github.com/fmfi-compbio/rampart.git
cd rampart
```

(2) Create and activate the conda environment with the required dependencies.
You can either follow steps 1 & 3 above, or use the provided `environment.yml` file via
(2) Create and activate the conda environment with the required dependencies using the provided `environment.yml` file via

*note: we use a modified version of Porechop in which we fixed a bug that, in the original version of RAMPART, caused the first 12 barcodes of the 96 PCR barcode set to be missing*

```bash
conda env create -f environment.yml
conda activate artic-rampart
conda activate covid-artic-rampart
```

(3) Install dependencies using `npm`
@@ -92,6 +92,8 @@ npm install

(4) Build the RAMPART client bundle

*note: you will have to run this command any time you pull a new version from GitHub*

```bash
npm run build
```
@@ -100,7 +100,7 @@ npm run build
so that it is available via the `rampart` command

```bash
npm install --global .
npm install --global
```

Check that things work by running `rampart --help`