Add files via upload
RamiyapriyaS authored Dec 23, 2024
1 parent 0989c2e commit 609eff7
Showing 1 changed file with 61 additions and 44 deletions.
105 changes: 61 additions & 44 deletions notebooks/AWS-ParallelCluster.ipynb
@@ -34,8 +34,8 @@
"\n",
"Please follow the installation instructions for the ParallelCluster UI provided here: [here](https://github.com/STRIDES/NIHCloudLabAWS/blob/main/docs/Install_AWSParallelCluster.md). These instructions will guide you through the necessary steps to create a CloudFormation Stack through which you can access the AWS ParallelCluster UI. \n",
"\n",
"Additionally, we urge you to check out the documents within the `docs/` folder of the repository for more bioinformatics and Gen AI tutorials.",
"\n",
"Additionally, we urge you to check out the documents within the `docs/` folder of the repository for more bioinformatics and Gen AI tutorials.\n",
"\n",
"Once you have created the Cloud Formation Stack for the PCUI, navigate to the user interface URL. It will look like this:"
]
},
@@ -63,13 +63,13 @@
"metadata": {},
"source": [
"### Create a Cluster \n",
"Let's create a cluster within the ParallelCluster environment.",
"Let's create a cluster within the ParallelCluster environment.\n",
"\n",
"![create-cluster.png](attachment:create-cluster.png)\n",
"\n",
"1. In the PCUI Clusters view, choose **Create cluster** > **Step by step**.\n",
"2. In Cluster, **Name**, enter a name for your cluster.\n",
"3. Choose a **VPC** with a public subnet for your cluster, and choose Next.\n",
"3. Choose a **VPC** from the available options and choose Next. CloudLab users will have access to pre-configured VPC networks.\n",
"4. In **Head node**, choose Add **SSM session**. This will allow you to access the head node through the **`Shell`** button. Change the instance type of your head node to **t2.xlarge**. \n",
"5. In **Queues**, provide a name and subnet for your queue.\n",
"6. In **Compute resources**, choose 1 for **Static nodes** and select **c5n.large** as the instance type for your compute resources. \n",
@@ -110,7 +110,7 @@
" AdditionalIamPolicies:\n",
" - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore\n",
" Ssh:\n",
" KeyName: snakemake-cluster-key-pair\n",
" KeyName: Snakemake-cluster-key-pair\n",
"Scheduling:\n",
" Scheduler: slurm\n",
" SlurmQueues:\n",
@@ -171,7 +171,7 @@
"id": "73692fbe",
"metadata": {},
"source": [
"3. Install conda. We will be executing snakemake using conda. "
"3. Install conda. We will be executing Snakemake using conda. "
]
},
{
@@ -196,7 +196,7 @@
"source": [
"3. Install Snakemake and the Snakemake ParallelCluster plugin. \n",
"\n",
"Note: the PCluster plugin requires snakemake > 8.0.0"
"Note: the PCluster plugin requires Snakemake > 8.0.0"
]
},
{
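As a hedged illustration of the version note above (the plugin requires Snakemake > 8.0.0), the following Python sketch compares a version string against that bound. The helper names `parse_version` and `meets_plugin_requirement` are hypothetical and not part of Snakemake or the plugin; the parser assumes plain dotted numeric versions such as `8.25.5`.

```python
# Bound taken from the notebook's note: the plugin requires Snakemake > 8.0.0.
MINIMUM_EXCLUSIVE = (8, 0, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '8.25.5' into a tuple of ints.

    Assumption: no pre-release or local suffixes (e.g. '8.0.0rc1').
    """
    return tuple(int(part) for part in version.split("."))


def meets_plugin_requirement(version: str) -> bool:
    """Return True when the version is strictly newer than 8.0.0."""
    return parse_version(version) > MINIMUM_EXCLUSIVE


if __name__ == "__main__":
    for candidate in ("8.25.5", "8.0.0", "7.32.4"):
        print(candidate, meets_plugin_requirement(candidate))
```

The pinned release used in the install step, 8.25.5, satisfies the bound under this reading of the note.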
@@ -205,8 +205,8 @@
"metadata": {},
"source": [
"```bash\n",
"pip3 install snakemake==8.25.5\n",
"pip3 install snakemake-executor-plugin-pcluster-slurm\n",
"pip3 install Snakemake==8.25.5\n",
"pip3 install Snakemake-executor-plugin-pcluster-slurm\n",
"```"
]
},
@@ -215,7 +215,7 @@
"id": "5302b6ba",
"metadata": {},
"source": [
"Alternatively, you may use conda to install snakemake using the following command: "
"Alternatively, you may use conda to install Snakemake using the following command: "
]
},
{
@@ -224,7 +224,7 @@
"metadata": {},
"source": [
"```bash\n",
"conda install bioconda::snakemake==8.25.5\n",
"conda install bioconda::Snakemake==8.25.5\n",
"```"
]
},
@@ -275,7 +275,7 @@
"id": "907420c9",
"metadata": {},
"source": [
"2. Submit the job using an sbatch command "
"2. Submit the job using an `sbatch` command "
]
},
{
@@ -308,7 +308,7 @@
"metadata": {},
"source": [
"```bash \n",
"mkdir hello-world-snakemake\n",
"mkdir hello-world-Snakemake\n",
"vim Snakefile\n",
"```"
]
@@ -339,7 +339,7 @@
"id": "4590ade3",
"metadata": {},
"source": [
"2. Execute the workflow using the snakemake command, specifying `pcluster-slurm` as the executor."
"2. Execute the workflow using the Snakemake command, specifying `pcluster-slurm` as the executor."
]
},
{
@@ -348,7 +348,7 @@
"metadata": {},
"source": [
"```bash\n",
"snakemake --executor pcluster-slurm \n",
"Snakemake --executor pcluster-slurm \n",
"```"
]
},
@@ -390,20 +390,27 @@
" \"output_2.txt\" \n",
"```\n",
"\n",
"**Shell Command:** The shell keyword is used to specify the shell command that will be executed to produce the output files.\n",
"**Shell Command:** \n",
"\n",
"The shell keyword is used to specify the shell command that will be executed to produce the output files.\n",
"\n",
"#### Commandline Command Breakdown: \n",
"\n",
"**snakemake:** Invokes the snakemake tool. This tool will look for a Snakefile in the current working directory \n",
"**--executor pcluster-slurm:** The flag enables the workflow to be executed through the slurm cluster connected to the head node"
"**Snakemake:** \n",
"\n",
"Invokes the Snakemake tool. This tool will look for a Snakefile in the current working directory \n",
"\n",
"**--executor pcluster-slurm:** \n",
"\n",
"The flag enables the workflow to be executed through the slurm cluster connected to the head node"
]
},
{
"cell_type": "markdown",
"id": "fbf47ae7",
"metadata": {},
"source": [
"## Submitting a Bioinformatics Snakemake workflow to the Slurm cluster\n",
"## Submitting a bioinformatics Snakemake workflow to the Slurm cluster\n",
"\n",
"In this example, we will use Snakemake and the pcluster-slurm plugin to run a Bioinformatics pipeline. \n",
"\n",
@@ -612,17 +619,18 @@
"* `SAMPLES = [\"A\", \"B\"]` defines the samples to be processed.\n",
"\n",
"**Workflow:**\n",
"* all: Specifies the final output files required to complete the workflow.\n",
"* **all:** Specifies the final output files required to complete the workflow.\n",
"* Bioinformatics rules \n",
" - bwa_index: Indexes the reference genome file (data/genome.fa) for alignment.\n",
" - bwa_map: Maps the sequencing reads (data/samples/{sample}.fastq) to the indexed genome and converts the output to BAM format.\n",
" - samtools_sort: Sorts the BAM files generated from the mapping step.\n",
" - samtools_index: Indexes the sorted BAM files for faster access.\n",
" - bcftools_call: Calls genetic variants from the sorted and indexed BAM files.\n",
" - plot_quals: Generates a plot of the quality of the called variants.\n",
"* The conda environment required for each rule is derived from the `conda_env` variable found within the config file\n",
"* Each rule uses shell commands to perform the required bioinformatics tasks (e.g., bwa index, bwa mem, samtools sort, samtools index, bcftools mpileup, bcftools call).\n",
"* The order in which each rule must be run, is defined from the input and output parameters. For example, as the `bcf_tools` rule requires a sorted and indexed bam file as the input, it will be executed after the `samtools_index` rule. \n",
" * **bwa_index:** Indexes the reference genome file (data/genome.fa) for alignment.\n",
" * **bwa_map:** Maps the sequencing reads (data/samples/{sample}.fastq) to the indexed genome and converts the output to BAM format.\n",
" * **samtools_sort:** Sorts the BAM files generated from the mapping step.\n",
" * **samtools_index:** Indexes the sorted BAM files for faster access.\n",
" * **bcftools_call:** Calls genetic variants from the sorted and indexed BAM files.\n",
" * **plot_quals:** Generates a plot of the quality of the called variants.\n",
" \n",
"* The **conda environment** required for each rule is derived from the `conda_env` variable found within the config file\n",
"* Each rule uses **shell commands** to perform the required bioinformatics tasks (e.g., bwa index, bwa mem, samtools sort, samtools index, bcftools mpileup, bcftools call).\n",
"* The **order** in which each rule must be run, is defined from the **input and output parameters**. For example, as the `bcf_tools` rule requires a sorted and indexed bam file as the input, it will be executed after the `samtools_index` rule. \n",
"\n"
]
},
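The ordering behavior described in the workflow notes can be sketched in Python: each rule is linked to the rules that produce its inputs, and a topological sort yields a valid run order. This is a minimal illustration, not Snakemake's own scheduler. The rule names come from the notebook, but every file path except `data/genome.fa` and the `data/samples/` location is a hypothetical placeholder, as is the helper name `execution_order`.

```python
from graphlib import TopologicalSorter

# Rule names are from the notebook; most file paths are hypothetical placeholders
# chosen only to show how inputs and outputs link the rules together.
RULES = {
    "bwa_index": {"inputs": ["data/genome.fa"], "outputs": ["index/genome.idx"]},
    "bwa_map": {
        "inputs": ["index/genome.idx", "data/samples/A.fastq"],
        "outputs": ["mapped/A.bam"],
    },
    "samtools_sort": {"inputs": ["mapped/A.bam"], "outputs": ["sorted/A.bam"]},
    "samtools_index": {"inputs": ["sorted/A.bam"], "outputs": ["sorted/A.bam.bai"]},
    "bcftools_call": {
        "inputs": ["sorted/A.bam", "sorted/A.bam.bai"],
        "outputs": ["calls/all.vcf"],
    },
    "plot_quals": {"inputs": ["calls/all.vcf"], "outputs": ["plots/quals.svg"]},
}


def execution_order(rules: dict) -> list[str]:
    """Order rules so that every rule runs after the rules producing its inputs."""
    # Map each output file to the rule that creates it.
    producer = {out: name for name, spec in rules.items() for out in spec["outputs"]}
    # For each rule, collect the rules it depends on (inputs that some rule produces).
    graph = {
        name: {producer[path] for path in spec["inputs"] if path in producer}
        for name, spec in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(RULES))
```

Inputs that no rule produces, such as the reference genome and the FASTQ files, are treated as files that already exist on disk, so they add no dependency.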
@@ -635,13 +643,32 @@
}
},
"source": [
"### Execute the workflow using the **snakemake** command, specifying **pcluster-slurm** as the executor and **conda** as the environment management system\n",
"### Execute the workflow \n",
"\n",
"Execute the workflow using the **Snakemake** command, specifying **pcluster-slurm** as the executor and **conda** as the environment management system\n",
"\n",
"\n",
"```bash\n",
"snakemake --executor pcluster-slurm --use-conda -j 5\n",
"Snakemake --executor pcluster-slurm --use-conda -j 5\n",
"```\n",
"### Command Breakdown \n"
"#### Commandline Command Breakdown: \n",
"\n",
"**Snakemake:** \n",
"\n",
"Invoke the Snakemake tool. \n",
"\n",
"**--executor pcluster-slurm:** \n",
"\n",
"Specify `pcluster-slurm` as the `--executor`\n",
"\n",
"**--use-conda** \n",
"\n",
"This flag tells Snakemake to use Conda environments for managing dependencies. When this flag is used, Snakemake will look for environment.yaml files specified in the workflow rules and create Conda environments accordingly. \n",
"\n",
"**-j** \n",
"\n",
"This flag specifies the number of jobs (or threads) to run in parallel.\n",
"\n"
]
},
{
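For readers who launch the workflow from a script rather than typing the command, the flags in the breakdown above could be assembled as follows. This is a sketch under stated assumptions: `build_snakemake_command` is a hypothetical helper, and only the options shown in the notebook (`--executor`, `--use-conda`, `-j`) are used.

```python
import shlex


def build_snakemake_command(executor: str, jobs: int, use_conda: bool = True) -> list[str]:
    """Assemble the snakemake invocation from the notebook as an argument list."""
    command = ["snakemake", "--executor", executor]
    if use_conda:
        # Ask Snakemake to manage rule dependencies through conda environments.
        command.append("--use-conda")
    # Upper limit on the number of jobs run in parallel.
    command += ["-j", str(jobs)]
    return command


command = build_snakemake_command("pcluster-slurm", jobs=5)

if __name__ == "__main__":
    # Print the command as a shell-safe string for inspection.
    print(shlex.join(command))
```

On the head node, the resulting list could be passed to `subprocess.run(command, check=True)` from the directory that contains the Snakefile.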
@@ -655,18 +682,8 @@
"\n",
"## References: \n",
"* [AWS ParallelCluster](https://docs.aws.amazon.com/parallelcluster/)\n",
"* [Snakemake Documentation](https://snakemake.readthedocs.io/en/stable/)\n",
"* [Snakemake `pcluster-slurm` plugin](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/pcluster-slurm.html)\n"
]
},
{
"cell_type": "markdown",
"id": "cf514196",
"metadata": {},
"source": [
"Graphics \n",
"- parallelcluster graphic \n",
"- snakemake files graphic"
"* [Snakemake Documentation](https://Snakemake.readthedocs.io/en/stable/)\n",
"* [Snakemake `pcluster-slurm` plugin](https://Snakemake.github.io/Snakemake-plugin-catalog/plugins/executor/pcluster-slurm.html)\n"
]
}
],