Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Upload hands on activity for distubuted comput and multiprocessing (#16)
* Initial commit
* scripts
* HPC script
* HPC script
* HPC troubleshooting
* HPC script working
* combining files
* combining files
* alpine tested
* almost final
* parallel compute graph
* upload cheatsheet
* update the README
* Update lectures/10.HPC_and_parallel_compute/README.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/scripts/analyze_sequences.py Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md Co-authored-by: Gregory Way <[email protected]>
* Update lectures/10.HPC_and_parallel_compute/scripts/analyze_sequences.py Co-authored-by: Gregory Way <[email protected]>
* updates lecture 10 files
* course files udpate
* add python script to utils
* HPC run
* add the data
* run all analyses
* update activity

---------

Co-authored-by: Gregory Way <[email protected]>
1 parent 4eacc26, commit 84e6f3b
Showing 22 changed files with 865 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,4 +5,5 @@ | |
*.Rproj | ||
.Rhistory | ||
.RData | ||
*__pycache__/ | ||
*.snakemake |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
repos: | ||
- repo: https://gitlab.com/vojko.pribudic.foss/pre-commit-update | ||
rev: v0.6.0post1 # Insert the latest tag here | ||
hooks: | ||
- id: pre-commit-update | ||
args: [--exclude, black, --keep, isort] | ||
# Formats import order | ||
- repo: https://github.com/pycqa/isort | ||
rev: 5.12.0 | ||
hooks: | ||
- id: isort | ||
name: isort (python) | ||
args: ["--profile", "black", "--filter-files"] | ||
|
||
#Code formatter for both python files and jupyter notebooks | ||
- repo: https://github.com/psf/black | ||
rev: 22.10.0 | ||
hooks: | ||
- id: black-jupyter | ||
- id: black | ||
language_version: python3.10 | ||
|
||
- repo: https://github.com/nbQA-dev/nbQA | ||
rev: 1.9.0 | ||
hooks: | ||
- id: nbqa-isort | ||
additional_dependencies: [isort==5.6.4] | ||
args: [--profile=black] | ||
|
||
|
||
# remove unused imports | ||
- repo: https://github.com/hadialqattan/pycln.git | ||
rev: v2.4.0 | ||
hooks: | ||
- id: pycln | ||
|
||
# additional hooks found within the pre-commit lib | |
- repo: https://github.com/pre-commit/pre-commit-hooks | ||
rev: v5.0.0 | ||
hooks: | ||
- id: trailing-whitespace # removes trailing white spaces | ||
- id: mixed-line-ending # removes mixed end of line | ||
args: | ||
- --fix=lf | ||
- id: pretty-format-json # JSON Formatter | ||
args: | ||
- --autofix | ||
- --indent=4 | ||
- --no-sort-keys |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
# Lecture 10: High performance computing and parallel computing | ||
|
||
This lecture will cover parallel computing and high performance computing. | ||
We have the following learning objectives: | ||
1. Become familiar with the concept of parallel computing | |
2. Understand how to leverage parallel computing | ||
3. Learn about high performance computing | ||
4. Understand how to leverage high performance computing | ||
5. Learn how to use HPC resources and best practices | ||
|
||
We will be using some pre-written scripts to explore parallel computing and high performance computing. | ||
The following scripts are available in the [scripts](./scripts) directory: | ||
* [analyze_sequences](scripts/analyze_sequences.py) | ||
* This script contains the core sequence analysis function that we use to analyze sequences. | ||
Note that this script can be run on a single sequence in a serial fashion, but we will also call it in parallel. | |
* [multiprocessing_run](scripts/multiprocessing_run.sh) | ||
* This shell script runs the sequence analysis in parallel using the `multiprocessing` module in Python. | |
It calls the `multiprocessing_sequence_analysis.py` script below. | |
* [multiprocessing_sequence_analysis](scripts/multiprocessing_sequence_analysis.py) | ||
* The script is called by the `multiprocessing_run.sh` script. | ||
* [plot_parallel_compute_analysis](scripts/plot_parallel_compute_analysis.py) | ||
* This script plots the results of the parallel computing analysis. | ||
* [serial_run](scripts/serial_run.sh) | ||
* This script runs the `analyze_sequences.py` script in serial. | ||
* [submit_jobs_HPC](scripts/submit_jobs_HPC.sh) | ||
* This script submits jobs to the HPC cluster in an array job. |
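As a rough illustration of the serial and `multiprocessing` modes described above, here is a minimal sketch. The function `analyze_sequence`, the helper names, and the example input are stand-ins chosen for illustration (assumptions); they are not the contents of `analyze_sequences.py` or `multiprocessing_sequence_analysis.py`.

```python
from multiprocessing import Pool


def analyze_sequence(sequence: str) -> int:
    """Hypothetical stand-in for the real analysis: count 'X' characters."""
    return sequence.count("X")


def run_serial(sequences: list[str]) -> list[int]:
    """Analyze the sequences one after another in a single process."""
    return [analyze_sequence(seq) for seq in sequences]


def run_parallel(sequences: list[str], processes: int = 2) -> list[int]:
    """Distribute the sequences across worker processes.

    Pool.map returns results in the same order as the input.
    """
    with Pool(processes=processes) as pool:
        return pool.map(analyze_sequence, sequences)


if __name__ == "__main__":
    # Made-up input, only to show that both modes give the same answer.
    example = ["GCXCCXAG", "ACTTCTAT", "CGXX"]
    print(run_serial(example))
    print(run_parallel(example))
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by spawning a fresh interpreter, the module is re-imported in each worker, and the guard keeps the pool from being created again.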
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
# Slurm Guide | ||
|
||
For bash scripts, the shebang (read aloud as "shebang slash bin slash bash") should be the first line of every script, with nothing else on that line: | |
``` | |
#!/bin/bash | |
``` | |
|
||
Next are the SBATCH directives, which tell the Slurm scheduler how to handle your job. | |
These directives should be at the top of your script, but under the shebang line. | ||
|
||
### Frequent SLURM directives | ||
``` | ||
#SBATCH --job-name=parallel_job # job name | ||
#SBATCH -t 1-23:59:59 # D-HH:MM:SS | |
#SBATCH -t 59 # MM | |
#SBATCH -t 59:59 # MM:SS | |
#SBATCH -t 59:59:59 # HH:MM:SS | |
#SBATCH -t 1-23 # D-HH | |
#SBATCH -t 1-23:59 # D-HH:MM | |
#SBATCH --mem=16G # 16 Gigabytes | |
#SBATCH --output=out_%j.log # output file (%j is the job ID) | |
#SBATCH --ntasks=1 # number of tasks | |
#SBATCH --mail-type=END,FAIL # email events: NONE, BEGIN, END, FAIL, ALL (comma-separated, no spaces) | |
#SBATCH --mail-user=[email protected] # address to email | |
``` | ||
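To make the `-t` time formats concrete, the sketch below converts each accepted form (MM, MM:SS, HH:MM:SS, D-HH, D-HH:MM, D-HH:MM:SS) into seconds. The function name `slurm_time_to_seconds` is hypothetical and this is not part of Slurm itself; it only mirrors the parsing rules listed above.

```python
def slurm_time_to_seconds(spec: str) -> int:
    """Convert a Slurm time-limit string into a number of seconds."""
    days = 0
    if "-" in spec:
        # A day prefix means the remaining fields read HH[:MM[:SS]].
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"unrecognized time format: {spec!r}")
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:  # MM
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:  # MM:SS
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:  # HH:MM:SS
            hours, minutes, seconds = parts
        else:
            raise ValueError(f"unrecognized time format: {spec!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


if __name__ == "__main__":
    for spec in ["59", "59:59", "59:59:59", "1-23", "1-23:59", "1-23:59:59"]:
        print(spec, slurm_time_to_seconds(spec))
```

Note the asymmetry this exposes: a bare number is minutes, while the first field after a day prefix is hours.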
### Slurm Commands | ||
#### Environment modules | ||
``` | ||
module purge # removes all modules | ||
module avail # lists all modules available for loading | |
module list # list all currently loaded modules | |
module load # loads module (hint: use the tab key to autocomplete) | |
``` | ||
#### Submitting a job | ||
``` | ||
sbatch script.sh # submit script.sh | ||
``` | ||
#### Checking job status | ||
``` | ||
squeue -u {User} # check submitted jobs in queue | ||
``` | ||
#### Canceling a job or all jobs | ||
``` | ||
scancel {jobid} # Cancel job | ||
scancel -u {User} # Cancel all jobs for user | ||
``` | ||
#### Check job details | ||
``` | ||
jobstats $USER {days} # Check job stats for user for the last {days} | ||
``` | ||
#### Check job efficiency | ||
``` | ||
seff {jobid} | ||
``` | ||
#### Check fairshare | ||
``` | ||
module load slurmtools | ||
levelfs $USER | ||
``` | ||
#### Check user and institution account billings | ||
``` | ||
suuser $USER | ||
suacct amc-general | ||
``` | ||
|
||
#### Example SBATCH | ||
``` | ||
#!/bin/bash | ||
#SBATCH --job-name=Slurm_job # job name "Slurm_job" | |
#SBATCH -t 1-23 # Time 1 day, 23 hours | ||
#SBATCH --mem=16G # 16 Gigabytes of RAM | ||
#SBATCH --output=out_%j.log # std output/error file | ||
#SBATCH --mail-type=END,FAIL # send email on job end/fail | ||
#SBATCH --mail-user=[email protected] # send email to this address | |
module load python/3.9.6 | ||
module list | ||
``` |
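The layout of the example above (shebang first, `#SBATCH` directives next, commands last) can also be generated programmatically. The following is a minimal sketch under that assumption; `render_sbatch_script` and its parameters are hypothetical names introduced for illustration and are not part of Slurm or of this repository.

```python
def render_sbatch_script(
    job_name: str,
    time_limit: str,
    memory: str,
    commands: list[str],
    output: str = "out_%j.log",
) -> str:
    """Build the text of a batch script in the order Slurm expects."""
    lines = [
        "#!/bin/bash",  # shebang must be the very first line
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH -t {time_limit}",
        f"#SBATCH --mem={memory}",
        f"#SBATCH --output={output}",
    ]
    # Commands come after all directives.
    lines.extend(commands)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = render_sbatch_script(
        job_name="Slurm_job",
        time_limit="1-23",
        memory="16G",
        commands=["module load python/3.9.6", "module list"],
    )
    print(script)
```

The resulting text could be written to a file such as `script.sh` and submitted with `sbatch script.sh`, as shown in the cheatsheet.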
111 changes: 111 additions & 0 deletions
111
lectures/10.hpc_and_parallel_compute/data/parallel_compute_analysis.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
sequences,time_per_sequence(s),core_count | ||
10,0.5,1 | ||
100,0.5,1 | ||
1000,0.5,1 | ||
10000,0.5,1 | ||
100000,0.5,1 | ||
1000000,0.5,1 | ||
10000000,0.5,1 | ||
100000000,0.5,1 | ||
1000000000,0.5,1 | ||
10000000000,0.5,1 | ||
10,0.5,2 | ||
100,0.5,2 | ||
1000,0.5,2 | ||
10000,0.5,2 | ||
100000,0.5,2 | ||
1000000,0.5,2 | ||
10000000,0.5,2 | ||
100000000,0.5,2 | ||
1000000000,0.5,2 | ||
10000000000,0.5,2 | ||
10,0.5,4 | ||
100,0.5,4 | ||
1000,0.5,4 | ||
10000,0.5,4 | ||
100000,0.5,4 | ||
1000000,0.5,4 | ||
10000000,0.5,4 | ||
100000000,0.5,4 | ||
1000000000,0.5,4 | ||
10000000000,0.5,4 | ||
10,0.5,8 | ||
100,0.5,8 | ||
1000,0.5,8 | ||
10000,0.5,8 | ||
100000,0.5,8 | ||
1000000,0.5,8 | ||
10000000,0.5,8 | ||
100000000,0.5,8 | ||
1000000000,0.5,8 | ||
10000000000,0.5,8 | ||
10,0.5,16 | ||
100,0.5,16 | ||
1000,0.5,16 | ||
10000,0.5,16 | ||
100000,0.5,16 | ||
1000000,0.5,16 | ||
10000000,0.5,16 | ||
100000000,0.5,16 | ||
1000000000,0.5,16 | ||
10000000000,0.5,16 | ||
10,0.5,32 | ||
100,0.5,32 | ||
1000,0.5,32 | ||
10000,0.5,32 | ||
100000,0.5,32 | ||
1000000,0.5,32 | ||
10000000,0.5,32 | ||
100000000,0.5,32 | ||
1000000000,0.5,32 | ||
10000000000,0.5,32 | ||
10,0.5,64 | ||
100,0.5,64 | ||
1000,0.5,64 | ||
10000,0.5,64 | ||
100000,0.5,64 | ||
1000000,0.5,64 | ||
10000000,0.5,64 | ||
100000000,0.5,64 | ||
1000000000,0.5,64 | ||
10000000000,0.5,64 | ||
10,0.5,128 | ||
100,0.5,128 | ||
1000,0.5,128 | ||
10000,0.5,128 | ||
100000,0.5,128 | ||
1000000,0.5,128 | ||
10000000,0.5,128 | ||
100000000,0.5,128 | ||
1000000000,0.5,128 | ||
10000000000,0.5,128 | ||
10,0.5,256 | ||
100,0.5,256 | ||
1000,0.5,256 | ||
10000,0.5,256 | ||
100000,0.5,256 | ||
1000000,0.5,256 | ||
10000000,0.5,256 | ||
100000000,0.5,256 | ||
1000000000,0.5,256 | ||
10000000000,0.5,256 | ||
10,0.5,512 | ||
100,0.5,512 | ||
1000,0.5,512 | ||
10000,0.5,512 | ||
100000,0.5,512 | ||
1000000,0.5,512 | ||
10000000,0.5,512 | ||
100000000,0.5,512 | ||
1000000000,0.5,512 | ||
10000000000,0.5,512 | ||
10,0.5,1024 | ||
100,0.5,1024 | ||
1000,0.5,1024 | ||
10000,0.5,1024 | ||
100000,0.5,1024 | ||
1000000,0.5,1024 | ||
10000000,0.5,1024 | ||
100000000,0.5,1024 | ||
1000000000,0.5,1024 | ||
10000000000,0.5,1024 |
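One way to read this table is as input for an idealized scaling estimate: if every sequence takes the same time and the work splits evenly across cores, the wall time is the number of rounds times the per-sequence time. The sketch below is an assumption about how these columns could be used, not a description of what `plot_parallel_compute_analysis.py` actually computes; the helper names and the three sample rows are chosen for illustration.

```python
import csv
import io
import math

# Three rows in the same format as parallel_compute_analysis.csv.
SAMPLE = """sequences,time_per_sequence(s),core_count
10,0.5,1
10,0.5,4
1000,0.5,8
"""


def ideal_wall_time(sequences: int, seconds_per_sequence: float, cores: int) -> float:
    """Wall time assuming perfect load balancing and no overhead."""
    rounds = math.ceil(sequences / cores)  # each core handles one sequence per round
    return rounds * seconds_per_sequence


def estimate_rows(csv_text: str) -> list[float]:
    """Compute the idealized wall time for every row of the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        ideal_wall_time(
            int(row["sequences"]),
            float(row["time_per_sequence(s)"]),
            int(row["core_count"]),
        )
        for row in reader
    ]


if __name__ == "__main__":
    for seconds in estimate_rows(SAMPLE):
        print(seconds)
```

Real runs usually fall short of this estimate because of scheduling, startup, and communication overhead.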
10 changes: 10 additions & 0 deletions
10
lectures/10.hpc_and_parallel_compute/data/sequences_to_analyze.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
GCXCCXAGGGTTGCAGTCAAATGTCCA | ||
CGGCCAATGAGGGXCGCXTAGGTCAT | ||
TAGGTGGATACCXCTXATATATGATT | ||
CCXATATTAAGACATATAATTGGAGG | ||
TATTACACGCCCAAATAATTTGGCXA | ||
TCAGCXGCXGGGAAGCGGGCGCXATACT | ||
CGGATGATCATCXGGGATGATGTCTA | ||
GCGCCXGGAAGACGAATCTTAATTA | ||
TTAGGAACXTXXCAATATGTTTCGGT | ||
ACTTCTATGTCTXTGGATTACAAACA |
10 changes: 10 additions & 0 deletions
10
lectures/10.hpc_and_parallel_compute/environments/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Environment creation | ||
We need to create the environment for this lecture and hands-on activity. | |
To do so, run the following command from this directory: | |
```bash | ||
conda env create -f parallel_and_hpc_compute_env.yaml | ||
``` | ||
OR | ||
```bash | ||
mamba env create -f parallel_and_hpc_compute_env.yaml | ||
``` |
16 changes: 16 additions & 0 deletions
16
lectures/10.hpc_and_parallel_compute/environments/parallel_and_hpc_compute_env.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
name: parallel_and_hpc_compute_env | ||
channels: | ||
- conda-forge | ||
- defaults | ||
dependencies: | ||
- python=3.11 | ||
- conda-forge::pandas | ||
- conda-forge::jupyter | ||
- conda-forge::ipykernel | ||
- conda-forge::nbconvert | ||
- conda-forge::pip | ||
- conda-forge::matplotlib | ||
- conda-forge::seaborn | ||
- pip: | ||
- argparse | ||
|
24 changes: 24 additions & 0 deletions
24
...10.hpc_and_parallel_compute/hands_on_activity/5mc_sequence_analysis_activity.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
# Hands-on: 5mC sequence analysis activity | |
|
||
You want to determine the 5mC (5-methylcytosine) content of each of the 10 sequences below, where X denotes 5mC and C denotes cytosine. | |
The goal is to count the number of 5mC in each sequence using multiple compute approaches: | |
* Serial approach | ||
* Parallel approach | ||
* Python multiprocessing approach | ||
* GNU parallel approach | ||
* HPC approach | ||
|
||
Sequences: | ||
0. GCXCCXAGGGTTGCAGTCAAATGTCC | ||
1. ACTTCTATGTCTXTGGATTACAAACA | ||
2. CGGCCAATGAGGGXCGCXTAGGTCAT | ||
3. TAGGTGGATACCXCTXATATATGATT | ||
4. CCXATATTAAGACATATAATTGGAGG | ||
5. TATTACACGCCCAAATAATTTGGCXA | ||
6. TCAGCXGCXGGGAAGCGGGCGCXATA | ||
7. CGGATGATCATCXGGGATGATGTCTA | ||
8. GCGCCXGGAAGACGAATCTTAATTAX | ||
9. TTAGGAACXTXXCAATATGTTTCGGT | ||
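For the HPC approach, one possible layout is to let each Slurm array task handle the sequence whose index matches its task ID. The sketch below assumes the job is submitted with array indices 0 to 9 (for example via `--array=0-9`) and reads the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for array tasks; when that variable is absent it falls back to the serial approach. The function names are hypothetical and this is not the repository's `submit_jobs_HPC.sh` workflow.

```python
import os
from typing import Optional

# The ten sequences from the list above, in the same order (index 0 to 9).
SEQUENCES = [
    "GCXCCXAGGGTTGCAGTCAAATGTCC",
    "ACTTCTATGTCTXTGGATTACAAACA",
    "CGGCCAATGAGGGXCGCXTAGGTCAT",
    "TAGGTGGATACCXCTXATATATGATT",
    "CCXATATTAAGACATATAATTGGAGG",
    "TATTACACGCCCAAATAATTTGGCXA",
    "TCAGCXGCXGGGAAGCGGGCGCXATA",
    "CGGATGATCATCXGGGATGATGTCTA",
    "GCGCCXGGAAGACGAATCTTAATTAX",
    "TTAGGAACXTXXCAATATGTTTCGGT",
]


def count_5mc(sequence: str) -> int:
    """Count 5mC positions, which are encoded as 'X'."""
    return sequence.count("X")


def select_sequences(sequences: list[str], task_id: Optional[str]) -> list[str]:
    """Pick one sequence for an array task, or all of them for a serial run."""
    if task_id is None:
        return list(sequences)
    return [sequences[int(task_id)]]


if __name__ == "__main__":
    # Set by Slurm inside an array job; unset when run on a laptop.
    task_id = os.environ.get("SLURM_ARRAY_TASK_ID")
    for seq in select_sequences(SEQUENCES, task_id):
        print(seq, count_5mc(seq))
```

Run without Slurm, the script processes all ten sequences serially, which gives a baseline to compare against the parallel approaches.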
|