SARS-CoV-2 version of RAMPART #1

Open: wants to merge 25 commits into master (changes shown from 20 of the 25 commits)

Commits:
45640ed
created python scripts and snakemake files for detection of SARS-COV-…
janka000 May 2, 2021
68145fa
adding strand matching pipeline to server scripts
janka000 May 2, 2021
74c38ff
showing variant data on front-end
janka000 May 2, 2021
169e17a
moving old docs and examples to /old folder
janka000 May 2, 2021
6d2c7b9
added new docs and SARS-CoV-2 protocols
janka000 May 2, 2021
6783cf4
added new environment file
janka000 May 2, 2021
3867c67
removing comments and fixing code style&formatting
janka000 May 2, 2021
f70b095
I figured out that we need the if statement we removed
janka000 May 3, 2021
4465258
added new screenshots
janka000 May 3, 2021
acdab22
removing custom configuration
janka000 May 3, 2021
1bfdd05
removing __pycache__
janka000 May 3, 2021
2b37599
moving pipleines to separate folder and restoring EBOLA examples
janka000 May 4, 2021
681249e
removing default_protocol from /old folder
janka000 May 4, 2021
8e4b169
removing forgotten changes
janka000 May 4, 2021
ba32447
tidying up
janka000 May 5, 2021
412c26b
tidying up python scripts + requested changes
janka000 May 5, 2021
199ca87
new mutations file + some additional info in json
janka000 May 6, 2021
bba5d22
another requested changes
janka000 May 7, 2021
e955c8a
small fix
janka000 May 10, 2021
119a3ce
variantsTree maybe fixed
janka000 May 14, 2021
981fd70
fixing whitespaces and mutationsTree
janka000 May 14, 2021
1d03b3b
annotatedPath fix
janka000 May 16, 2021
cebd4ea
added log files for variant calling pipeline
janka000 Jul 27, 2021
4cba686
added support for structure with csv in subfolders to strand_matching…
janka000 Jul 27, 2021
9ad8c7e
added support for gziped fastq files in variant calling pipeline
janka000 Jul 27, 2021
3 changes: 3 additions & 0 deletions .gitignore
@@ -42,3 +42,6 @@ binned*

# snakemake stuff
.snakemake/

# python stuff
__pycache__/
25 changes: 4 additions & 21 deletions README.md
@@ -4,7 +4,7 @@ Read Assignment, Mapping, and Phylogenetic Analysis in Real Time.

RAMPART runs concurrently with MinKNOW and shows you demuxing / mapping results in real time.

![](docs/images/main.png)
![](docs/img/main.png)


## Motivation
@@ -13,29 +13,12 @@ Furthermore, the small size of many pathogens mean that insightful sequence data
RAMPART run concurrently with MinION sequencing of such pathogens.
It provides a real-time overview of genome coverage and reference matching for each barcode.

RAMPART was originally designed to work with amplicon-based primer schemes (e.g. for [ebola](https://github.com/artic-network/primer-schemes)), but this isn't a requirement.


This version of RAMPART is designed for ... <!-- #todo -->

## Documentation

* [Installation](docs/installation.md)
* [Running an example dataset & understanding the visualisations](docs/examples.md)
* [Installation](docs/installation.md) <!-- * [Running an example dataset & understanding the visualisations](docs/examples.md) -->
* [Setting up for your own run](docs/setting-up.md)
* [Configuring RAMPART using protocols](docs/protocols.md)
* [Debugging when things don't work](docs/debugging.md)
* [Notes relating to RAMPART development](docs/developing.md)




## Status

RAMPART is in development with a publication forthcoming.
Please [get in contact](https://twitter.com/hamesjadfield) if you have any issues, questions or comments.


## RAMPART has been deployed to sequence:
* [Covid strand matching pipeline](docs/barcode_strand_match.md)

* [Yellow Fever Virus in Brazil](https://twitter.com/Hill_SarahC/status/1149372404260593664)
* [ARTIC workshop in Accra, Ghana](https://twitter.com/george_l/status/1073245364197711874)
4 changes: 2 additions & 2 deletions default_protocol/pipelines.json
@@ -1,7 +1,7 @@
{
"annotation": {
"name": "Annotate reads",
"path": "pipelines/demux_map",
"path": "../pipelines/default_pipeline/demux_map",
"config_file": "config.yaml",
"requires": [
{
@@ -12,7 +12,7 @@
},
"export_reads": {
"name": "Export reads",
"path": "pipelines/bin_to_fastq",
"path": "../pipelines/default_pipeline/bin_to_fastq",
"config_file": "config.yaml",
"run_per_sample": true
}
92 changes: 92 additions & 0 deletions docs/barcode_strand_match.md
@@ -0,0 +1,92 @@
# Covid strand matching pipeline

This pipeline looks for new `.csv` files that RAMPART's annotation pipeline has created (in the `/annotations` folder) since the last time this pipeline was triggered.

Then it looks for the matching `.fastq` files and creates `.bam` files from them one by one, using `minimap2`.

The `.bam` files are no longer needed after our Python script has processed them, so they are deleted afterwards.
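This mapping step boils down to piping `minimap2` output into a sorted BAM and deleting it once counted. A minimal sketch of what one such per-file call might look like follows; the function names, the `-ax map-ont` preset, and the use of `samtools sort` are illustrative assumptions, not the pipeline's actual code:

```python
import subprocess

def minimap2_command(reference, fastq_path):
    # ONT preset for nanopore reads; the real pipeline may use different flags.
    return ["minimap2", "-ax", "map-ont", reference, fastq_path]

def map_to_bam(reference, fastq_path, bam_path):
    """Map one FASTQ file to the reference and sort the alignments into a BAM."""
    mapper = subprocess.Popen(minimap2_command(reference, fastq_path),
                              stdout=subprocess.PIPE)
    # Read SAM from minimap2's stdout and write a sorted BAM.
    subprocess.run(["samtools", "sort", "-o", bam_path, "-"],
                   stdin=mapper.stdout, check=True)
    mapper.stdout.close()
    mapper.wait()
```

The intermediate BAM can then simply be removed (`os.remove(bam_path)`) after the base counts have been extracted.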

The first output file of this pipeline is located at `/annotations/base_count/count.csv` and contains the count of each base (A, C, G or T) at each position of the reference for each barcode. Each line of the file contains

* position in the reference genome
* barcode name
* count of A's mapped to this position in reference genome
* count of C's
* count of G's
* count of T's

The next time you trigger the pipeline, it will use this output file as one of its inputs: the new counts will be added to those from the existing `count.csv` file, and a new file will be created as output.
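The incremental update described above amounts to summing per-base tallies keyed by position and barcode. A sketch in Python, assuming the column order from the bullet list (the function names are illustrative, not the pipeline's actual code):

```python
import csv
from collections import defaultdict

def load_counts(path):
    """Read count.csv rows into {(position, barcode): [A, C, G, T]}.

    Assumes the column order described above: position, barcode, A, C, G, T.
    """
    counts = defaultdict(lambda: [0, 0, 0, 0])
    with open(path, newline="") as fh:
        for pos, barcode, a, c, g, t in csv.reader(fh):
            counts[(int(pos), barcode)] = [int(a), int(c), int(g), int(t)]
    return counts

def merge_counts(old, new):
    """Add the current run's per-base counts onto the previous totals."""
    merged = defaultdict(lambda: [0, 0, 0, 0])
    for table in (old, new):
        for key, bases in table.items():
            for i, n in enumerate(bases):
                merged[key][i] += n
    return merged
```

The merged table is then written back out as the next run's `count.csv`.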

The next step of our pipeline is determining the variants based on a provided `.txt` file of a specific format (see the mutations file section below), which lists the changes relative to the reference genome that are specific to known variants of SARS-CoV-2.

By default our pipeline uses one of the `.txt` files we have created. All the `.txt` files are located in `/covid_protocol/pipelines/run_python_scripts/rules/mut_files/`.
You may also want to provide your own. To do this, create a `.txt` file in the directory mentioned above, then in `covid_protocol/pipelines/run_python_scripts/config.yaml` replace the name of the file to be used with your own.

You can also set your own threshold value, which determines the minimal number of reads that must be mapped to a position in the reference genome for our Python script to classify a mutation as significant enough to support that the barcode sample corresponds to a variant.

These are the default settings:

```yaml
###mutations###
coverage_threshold: 10
mutations_file: mutbb.txt
```


At the end, the `annotations/results` folder should contain a `mutations.json` file containing the mutations we matched to the barcodes.

Once the JSON file is available, it will be loaded into RAMPART and the results will be shown.

## The mutations file
This file specifies the SARS-CoV-2 variants to look for and the mutations that are specific to each variant.
Each line starts with the label of a variant,
followed by exactly one space and the number of mutations that must match before we can say that a barcode corresponds to this variant.
Then come the mutations typical for the variant, separated by spaces.
Lines starting with `#` are comments and are ignored when parsing the file.

### Example
```
UK 5 C3267T C5388A ... G28280C A28281T T28282A
```
You can see this line in our default file.

It starts with "UK", which is our label for this variant.

The label is followed by a number, 5, which says "if at least 5 of the following mutations are present in the sample, classify the sample as this variant".

The number is followed by the mutations typical for this variant (for example, C changed to T at position 3267 of the reference genome), separated by spaces.
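A single line of this shape can be parsed and checked against a sample's observed mutations roughly as follows (an illustrative sketch, not the pipeline's actual code):

```python
def parse_variant_line(line):
    """Split 'LABEL N MUT1 MUT2 ...' into its three parts."""
    label, threshold, *mutations = line.split(" ")
    return label, int(threshold), mutations

def matches_variant(line, observed_mutations):
    """True if at least `threshold` of the listed mutations were observed."""
    label, threshold, mutations = parse_variant_line(line)
    found = sum(1 for m in mutations if m in observed_mutations)
    return found >= threshold
```

For instance, with a threshold of 2, a sample showing two of the three listed mutations would be classified as the variant, while a sample with only one would not.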

### Tree-like structure
You can also provide further variants in a tree-like structure, using the keywords `start_sub` and `end_sub` on separate lines:
```
#UK variant
UK 5 C3267T ... T28282A

#more specific variants for UK
start_sub
UK-subvariant_1 1 A17615G

#subvariants for UK-subvariant_1
start_sub
.
.
UK-subvariant_1-Poland 4 C5301T C7420T C9693T G23811T C25350T C28677T G29348T
UK-subvariant_1-Gambia 3 T6916C T18083C G22132A C23929T
end_sub

end_sub

#CZ variant
CZ 3 G12988T G15598A G18028T T24910C T26972C
```

This means that we will look for the UK variant, and if we find at least 5 mutations from the list provided on the UK line, we will also continue searching for other, more specific variants.

For example, if the UK variant is matched, we will check whether the mutation A17615G is also present;

if it is, we will then check whether some of the mutations specified in its subsection (UK-subvariant_1-Poland or UK-subvariant_1-Gambia) are present.

We stop searching when there are no more subsections specified or when fewer than the required number of mutations were found for a sample.

We will look for the CZ variant too. This one has no subvariants specified in this example file, so no further search is made.
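A small stack-based parser is enough to read this nested layout. The sketch below assumes every non-keyword line has the `LABEL N MUT...` shape described above and represents each variant as a `(label, threshold, mutations, children)` node; the names are illustrative, not the pipeline's actual code:

```python
def parse_mutations_file(lines):
    """Parse variant lines into a tree of (label, threshold, mutations, children).

    `start_sub` descends into the most recent variant's children; `end_sub`
    returns to the enclosing level. Comments (#) and blank lines are skipped.
    """
    root = []          # top-level variants
    stack = [root]     # lists we are currently appending into
    last_node = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "start_sub":
            stack.append(last_node[3])   # nest under the last variant seen
        elif line == "end_sub":
            stack.pop()
        else:
            label, threshold, *mutations = line.split(" ")
            last_node = (label, int(threshold), mutations, [])
            stack[-1].append(last_node)
    return root
```

Matching then walks this tree: a node's children are only visited when the node itself reached its threshold, which mirrors the search described above.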

Binary file added docs/img/main.png
Binary file added docs/img/main_2.png
Binary file added docs/img/old/main.png
Binary file added docs/img/old/main_2.png
Binary file added docs/img/old/main_3.png
Binary file added docs/img/s1.png
Binary file added docs/img/s2.png
Binary file added docs/img/s3.png
Binary file added docs/img/s4.png
Binary file added docs/img/s5.png
73 changes: 8 additions & 65 deletions docs/installation.md
@@ -4,84 +4,25 @@
These instructions assume that you have installed [MinKNOW](https://community.nanoporetech.com/downloads) and are able to run it.


## Install from conda

We also assume that you are using conda -- See [instructions here](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) to install conda on your machine.

### Step 1: Create a new conda environment or install nodejs into your current conda environment

Create a new conda environment and activate it via:

```bash
conda create -n artic-rampart -y nodejs=12 # any version >10 should be fine
conda activate artic-rampart
```

Or install NodeJS into your currently activated environment via:

```bash
conda install -y nodejs=12 # any version >10 should be fine
```

### Step 2: Install RAMPART

```bash
conda install -y artic-network::rampart=1.1.0
```

### Step 3: Install dependencies

Note that you may already have some or all of these in your environment, in which case they can be skipped.
Additionally, some are only needed for certain analyses and can also be skipped as desired.

> If you are installing RAMPART into the [artic-ncov2019](https://github.com/artic-network/artic-ncov2019) conda environment, you will already have all of these dependencies.


Python, biopython, snakemake and minimap2 are required

```bash
conda install -y "python>=3.6"
conda install -y anaconda::biopython
conda install -y -c conda-forge -c bioconda "snakemake<5.11" # snakemake 5.11 will not work currently
conda install -y bioconda::minimap2=2.17
```

If you are using guppy to demux samples, you don't need Porechop,
however if you require RAMPART to perform demuxing then you must install the ARTIC fork of Porechop:

```bash
python -m pip install git+https://github.com/artic-network/[email protected]
```

If you wish to use the post-processing functionality available in RAMPART to bin reads, then you'll need `binlorry`:

```bash
python -m pip install binlorry==1.3.0_alpha1
```

### Step 4: Check that it works

```
rampart --help
```

---

## Install from source

(1) Clone the Github repo

```bash
git clone https://github.com/artic-network/rampart.git
git clone https://github.com/fmfi-compbio/rampart.git
cd rampart
```

(2) Create and activate the conda environment with the required dependencies.
You can either follow steps 1 & 3 above, or use the provided `environment.yml` file via
(2) Create and activate the conda environment with the required dependencies using the provided `environment.yml` file via

*note: we use a modified version of Porechop in which we fixed a bug that, in the original version of RAMPART, caused the first 12 barcodes of the 96 PCR barcode set to be missing*

```bash
conda env create -f environment.yml
conda activate artic-rampart
conda activate covid-artic-rampart
```

(3) Install dependencies using `npm`
@@ -92,6 +92,8 @@ npm install

(4) Build the RAMPART client bundle

*note: you will have to run this command any time you pull a new version from GitHub*

```bash
npm run build
```
@@ -100,7 +100,7 @@ npm run build
so that it is available via the `rampart` command

```bash
npm install --global .
npm install --global
```

Check that things work by running `rampart --help`