Merge pull request #44 from qbic-pipelines/dev
Update docs for release
ggabernet authored Nov 19, 2021
2 parents c5ce6c7 + 66d1dea commit 20a6502
Showing 13 changed files with 77 additions and 96 deletions.
Binary file added docs/images/tower_run.png
Binary file added docs/images/tower_run1.png
Binary file added docs/images/tower_run2.png
Binary file added docs/images/tower_workspaces.png
60 changes: 44 additions & 16 deletions docs/markdown/clusters/denbi_cloud.md

## Access to a project

Either request a project to [deNBI cloud](https://cloud.denbi.de) or ask someone at QBiC if they can create an instance for you in one of their projects.
Either request a new project at [deNBI cloud](https://cloud.denbi.de), or ask the RDDS team or your team leader if you can create an instance in one of their projects. The info on how to apply for a new project is collected on the [deNBI wiki](https://cloud.denbi.de/wiki/portal/allocation/). We recommend applying for an `OpenStack Project` so that you can configure your own settings and instances.

## Documentation
You should register with your University account to obtain an ELIXIR ID that will allow you to log into deNBI cloud, and once you have an account there, you can be added to an existing project. The instructions on how to register are found [here](https://cloud.denbi.de/wiki/registration/).

> Important! After registering, it is necessary to send an email to the Tübingen cloud administrators so that they can activate your account.
## deNBI official documentation

The documentation on how to create instances and perform other important tasks is collected on the [deNBI Tübingen page](https://cloud.denbi.de/wiki/Compute_Center/Tuebingen). This documentation is not exhaustive, though, so we have added a few more notes here.

## Create an instance
## Creating an instance

1. Log into `cloud.denbi.de`, select your project, and log into the OpenStack web interface by clicking the green button `Log into Openstack`.

![Login openstack](./denbi_cloud_images/login_openstack.png)

2. You should then see the project overview board. This overview shows the number of instances, total CPUs, memory (GB), and storage (GB) still available for this project. If this is not enough for your needs, you can ask for access to another project or create a new project.

![Openstack project overview](./denbi_cloud_images/project_overview_board.png)

3. To create a new instance, go to the left menu: Compute -> Instances -> Launch Instance button. This will prompt a step-by-step guide:
* Details: add an Instance Name
* Source: select either "Image" for a generic image e.g. CentOS operating system, or "Instance Snapshot" for creating an Instance from a previous snapshot. For running `Nextflow` workflows, you can use the Instance Snapshot `nextflow-singularity` which already has `java-jdk12`, `Nextflow`, `Singularity` and `Docker` installed (check if Nextflow should be updated with `nextflow self-update`).
* Flavour: select the instance flavour (number of CPUs and RAM).
* Networks: select `denbi_uni_tuebingen_external` network.
* Network Ports: leave empty.
* Security Groups: add `default` AND `external_access`.
* Key Pair: add a new key pair or select yours. Only one Key Pair is allowed per instance, and if you lose the private key you will not be able to access the instance any more! If you choose to create a new key pair, make sure to copy the private key that is displayed to your computer and store it under the `~/.ssh/` directory. You will also need to adapt the permissions of this file so that only you (the main user of the computer) can read it. You can do that on the command line with:

```bash
chmod 600 <your_private_ssh_key>
```

* Rest of the fields: leave the defaults.
* Click on `create instance`.

You should now see your image being Spawn on the Instance dashboard. It might take several minutes to spawn, especially if created from an Instance Snapshot.
You should now see your instance being spawned on the **Instance dashboard**. It might take several minutes to spawn, especially if it was created from an Instance Snapshot. On this dashboard you will also be able to see the instance IP and the operating system, which you will need to log into the instance via `SSH`.

![instance board](./denbi_cloud_images/instance_IP_username.png)

## SSH to an instance

To ssh to an instance, you need the private key of the Key Pair that was used to create the instance:

```bash
ssh -i /path/to/private/ssh-key <username>@<IP>
```

The username is the name of the operating system that was used in the image. For the `nextflow-singularity` instance snapshot, it is `centos`.
The username is the name of the operating system that was used in the image. For the `nextflow-singularity` instance snapshot, it is `centos`. For an Ubuntu-based instance, that will be `ubuntu`.

```bash
ssh -i /path/to/private/ssh-key centos@<IP>
```

In order to use an external cinder volume, you need to first create one on the OpenStack web interface.
## Setting up Nextflow, Singularity, Docker
If you haven't created an instance based on an Image that already has java, Nextflow and singularity or docker installed (e.g. the `nextflow-singularity` image), you will need to install this software.
* Installation instructions for [Java](https://phoenixnap.com/kb/install-java-on-centos) on CentOS. For Nextflow you will need Java jdk <= 11.
* Instructions for installing Nextflow can be found [here](https://www.nextflow.io/docs/latest/getstarted.html)
* On CentOS, singularity can be installed with the package manager `yum`. First install the [dependencies](https://sylabs.io/guides/3.0/user-guide/installation.html#before-you-begin) and then head straight to the [CentOS section](https://sylabs.io/guides/3.0/user-guide/installation.html#install-the-centos-rhel-package-using-yum)
* For installing docker, please follow the [instructions](https://docs.docker.com/engine/install/centos/) and the [post-installation steps](https://docs.docker.com/engine/install/linux-postinstall/)
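
As a rough sketch, the typical installation steps on a fresh CentOS instance look like this (package names are examples; the linked guides above are the authoritative reference):

```bash
# Java (OpenJDK 11) from the CentOS repositories
sudo yum install -y java-11-openjdk

# Nextflow, downloaded into the current directory and moved onto the PATH
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/

# Singularity from the EPEL repository
sudo yum install -y epel-release
sudo yum install -y singularity
```
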
## Running Nextflow pipelines on deNBI
Running Nextflow pipelines on deNBI VMs is like running them locally on your computer. When launching a pipeline, make sure to define the maximum resources available at your instance, either with the appropriate parameters or with a custom config file:
Running Nextflow pipelines on deNBI VMs is like running them locally on your computer. When launching a pipeline, make sure to define the maximum resources available on your instance, either with the appropriate parameters or with a custom config file (e.g. in a file called `custom.config`):
```console
params {
  // example values, adjust these to the resources of your instance flavour
  max_cpus = 16
  max_memory = 64.GB
  max_time = 960.h
}
```
Then run the pipeline with the `singularity` or `docker` profile, whichever container system you prefer and have installed on the instance, and provide this config file with `-c`. It is best to start the run inside a `screen` session. For example:
```bash
screen -S newscreen
nextflow pull nf-core/rnaseq -r 3.4
nextflow run nf-core/rnaseq -r 3.4 -profile singularity,test -c custom.config
```
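
To detach from the screen session while the run keeps going, press `Ctrl-A` followed by `D`; you can list and reattach to sessions later:

```bash
screen -ls            # list running screen sessions
screen -r newscreen   # reattach to the session started above
```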
30 changes: 8 additions & 22 deletions docs/markdown/clusters/running_jobs.md
These are general statements on how to submit jobs to our clusters.

* [Dependencies](#dependencies)
* [Install Nextflow](#install-nextflow)
* [Nextflow version](#nextflow-version)
* [Loading singularity modules](#loading-singularity-modules)
* [Pipeline profiles](#pipeline-profiles)
* [Example bash file](#example-bash-file)
* [Screen sessions](#screen-sessions)
…some of the functionality might not be available, for example the `cfc` and `binac` profiles.

Please start all your nextflow jobs from the head node.
Nextflow interacts directly with the Slurm scheduler and will take care of submitting individual jobs to the nodes.
If you submit via an interactive job, weird errors can occur, e.g. the cache directory for the containers is mounted read only on the compute nodes, so you can't pull new containers from a node.
If you submit via an interactive job, strange errors can occur, e.g. the cache directory for the containers is mounted read-only on the compute nodes, so you can't pull new containers from a node.

### Dependencies

To run Nextflow pipelines in our clusters, you will need Nextflow, java and singularity installed.
Luckily, our sysadmin made a module for singularity and java in our clusters already, so you will just need to load these modules.
To run Nextflow pipelines in our clusters, you will need Nextflow, Java and Singularity installed. Java and Singularity are already available on all cluster nodes, so you do not need to install them or load any modules.

You will still have to install Nextflow for your user; that is very simple and described in the next section.

### Install Nextflow

```bash
wget -qO- get.nextflow.io | bash
```

* Optionally, move the nextflow file in a directory accessible by your `$PATH` variable (this is only required to avoid to remember and type the Nextflow full path each time you need to run it).
* Optionally, move the nextflow file to a directory accessible by your `$PATH` variable (this is only needed so that you do not have to remember and type the full path to Nextflow each time you run it).

For more information, visit the [Nextflow documentation](https://www.nextflow.io/docs/latest/getstarted.html).

### Nextflow version

Check that your Nextflow version matches the version required by the pipeline. If not, set it by running:

```bash
NXF_VER=19.10.0
```
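
The `nextflow` launcher reads this variable, so you can also pin the version for a single run (the version and pipeline here are just examples):

```bash
NXF_VER=21.10.6 nextflow run nf-core/rnaseq -r 3.4 -profile cfc,test
```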

### Loading singularity modules

We currently load Singularity as a module on both BinAC and CFC to make sure that all paths are set accordingly and load required configuration parameters tailored for the respective system.
Please do use *only* these Singularity versions and *NOT* any other (custom) singularity versions out there.
These singularity modules will already load the required java module
so you don't need to take care of that.
```bash
module load devel/singularity/3.4.2
```
### Pipeline profiles

#### On CFC
#### On BinAC
Please use the [binac profile](https://github.com/nf-core/configs/blob/master/conf/binac.config) by adding `-profile binac` to run your analyses. For example:

```bash
nextflow run nf-core/rnaseq -r 1.4.2 -profile cfc_dev
nextflow run nf-core/rnaseq -r 1.4.2 -profile binac
```

For Nextflow pipelines that are not part of nf-core and were not created with the `nf-core create` command, these profiles will not be available.
### Example bash file

Here is an example bash file:
```bash
#!/bin/bash
module purge
module load devel/singularity/3.4.2
nextflow run nf-core/sarek -r 2.6.2 -profile cfc,test
```
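
You can then start the script on the head node, ideally inside a screen session (the file name is just an example):

```bash
screen -S sarek_run
bash run_sarek.sh
```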

Here are some useful commands for the Slurm scheduler.
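
For example (a small selection of standard Slurm commands, not specific to our clusters):

```bash
squeue -u $USER            # list your pending and running jobs
scontrol show job <jobid>  # detailed information on a single job
scancel <jobid>            # cancel a job
sinfo                      # overview of partitions and node states
sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS   # stats of a finished job
```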
## Submitting custom jobs

> *Important note*: running scripts without containerizing them is never 100% reproducible, even when using conda environments.
It is ok for testing, but talk to your group leader about the possibilities of containerizing the analysis or adding your scripts to a pipeline.
It is ok to test pipelines, but talk to your group leader about the possibilities of containerizing the analysis or adding your scripts to a pipeline.

To run custom scripts (R or python or whatever you need) in the cluster, it is mandatory to use a dependency management system. This ensures at least some reproducibility for the results. You have two possibilities: use a clean conda environment and eexport it as an `environment.yml` file, or working on Rstudio and then using Rmaggedon.
To run custom scripts (R or Python, or any other tool needed) in the cluster, it is mandatory to use a dependency management system. This ensures at least some reproducibility of the results. You have two possibilities: (1) use a clean conda environment and export it as an `environment.yml` file, or (2) work in RStudio and then use Rmaggedon.

* *Using conda*: create a conda environment and install all the necessary dependencies there. Once you have them all, export the dependencies to a yml file whose name contains the project code:
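
A minimal sketch of that export (the environment name and the `QABCD` project code are placeholders):

```bash
conda activate my_analysis_env
conda env export > QABCD_environment.yml
```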

To request an interactive session on a compute node, run:

```bash
srun -N 1 --ntasks-per-node=8 --mem=16G --time=12000 --pty bash
```

Change the resources as needed:

* N are the number of nodes
* `-N` is the number of nodes
* `--ntasks-per-node` is the number of CPUs per node
* `--mem` is the required memory
* `--time` is the requested time in minutes

You should see your job listed when running `squeue`.

### Submitting a bash script with `sbatch`

If you have a batch script, you can submit it to the cluster with the `sbatch` command.
Please mind the [above-mentioned instructions](#submitting-nextflow-pipelines) for submitting Nextflow pipelines. If you have a batch script that is not a Nextflow pipeline run, you can submit it to the cluster with the `sbatch` command.

```bash
sbatch <your_script.sh>
```
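
A minimal example of such a batch script (job name, resources, and the final command are placeholders to adapt):

```bash
#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=12:00:00

# activate your environment here if needed, then run the analysis
Rscript analysis.R
```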
15 changes: 13 additions & 2 deletions docs/markdown/clusters/tower.md
# Nextflow tower

To be able to follow the Nextflow workflow rusn via tower, you can add Tower access credentials in your Nextflow configuration file (`~/.nextflow/config`) using the following snippet:
To be able to follow the Nextflow workflow runs via tower, you can add Tower access credentials in your Nextflow configuration file (`~/.nextflow/config`) using the following snippet:

```console
tower {
    enabled = true
    accessToken = '<your access token>'
    workspaceId = '<workspace ID>'
}
```

Your access token can be created on [this page](http://cfgateway1.zdv.uni-tuebingen.de/tokens).

The workspace ID can be found on the organisation's Workspaces overview page. [Here](http://cfgateway1.zdv.uni-tuebingen.de/orgs/QBiC/workspaces) you can find QBiC's workspaces:

![workspaces](../../images/tower_workspaces.png)

To submit a pipeline to a different workspace using the Nextflow command line tool, you can provide the workspace ID as an environment variable. For example:

```console
export TOWER_WORKSPACE_ID=000000000000000
```
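
Monitoring can also be switched on for a single run from the command line, assuming the access token is available in `~/.nextflow/config` or as the `TOWER_ACCESS_TOKEN` environment variable:

```bash
export TOWER_ACCESS_TOKEN=<your_access_token>
nextflow run nf-core/rnaseq -r 3.4 -profile cfc,test -with-tower
```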

If you are outside of the university network, access to Tower is only possible via VPN. Once you have started your run, you can track its progress [here](http://cfgateway1.zdv.uni-tuebingen.de) after selecting your workspace and your run. Here is an example of what it looks like:

![example run](../../images/tower_run.png)
![example run](../../images/tower_run1.png)
![example run](../../images/tower_run2.png)
You can select your run on the left. You will see the name of the run, your command line and the progress and stats of the run.

For more info on how to use Tower, please refer to the [Tower docs](https://help.tower.nf/).
19 changes: 8 additions & 11 deletions docs/markdown/pipelines/ampliseq.md
# 16S amplicon sequencing

This is how to perform 16S amplicon analyses.
This is how to perform 16S amplicon analyses. A video explanation of the biology, the bioinformatics problem and the analysis pipeline can be found for version 2.1.0 in the [nf-core bytesize talk 25](https://nf-co.re/events/2021/bytesize-25-nf-core-ampliseq).

## Ampliseq pipeline

To perform 16S amplicon sequencing analyses we employ the [nf-core/ampliseq](https://github.com/nf-core/ampliseq) pipeline.

### Quick start

* Latest stable release `-r 2.1.0`
* Latest stable release `-r 2.1.1`

A typical command would look like this

```bash
nextflow run nf-core/ampliseq -profile cfc -r 2.1.0 \
nextflow run nf-core/ampliseq -profile cfc -r 2.1.1 \
--input "data" \
--FW_primer "GTGYCAGCMGCCGCGGTAA" \
--RV_primer "GGACTACNVGGGTWTCTAAT" \
--metadata "metadata.tsv" \
--trunc_qmin 35 \
--classifier_removeHash
--trunc_qmin 35
```

See [here](https://nf-co.re/ampliseq/1.2.0/parameters#manifest) the info on how to create the `metadata.tsv` file.
Sequencing data can be analysed with the pipeline using a folder containing `.fastq.gz` files with [direct fastq input](https://nf-co.re/ampliseq/2.1.1/usage#direct-fastq-input) or [samplesheet input](https://nf-co.re/ampliseq/2.1.1/usage#samplesheet-input), also see [here](https://nf-co.re/ampliseq/2.1.1/parameters#input).


### Known bugs
See [here](https://nf-co.re/ampliseq/2.1.1/parameters#metadata) for info on how to create the `metadata.tsv` file.

* All versions include a known bug that is why the `--classifier_removeHash` param should be used.
If data are distributed on multiple sequencing runs, please use `--multipleSequencingRuns` and note the different requirements for metadata file and folder structure in the [pipeline documentation](https://nf-co.re/ampliseq/1.2.0/parameters#multiplesequencingruns)

## Reporting

There are no details about reporting yet.
There are no details about reporting yet. Please refer to the [output documentation](https://nf-co.re/ampliseq/2.1.1/output).
4 changes: 1 addition & 3 deletions docs/markdown/pipelines/debugging.md
This error occurs because the `scratch` space on the nodes for staging files the…
```bash
process {
    withName:MarkDuplicates {
        scratch = '/sfs/7/workspace/ws/my-ws-name/tmp'
    }
    scratch = false
}
```