Upload hands-on activity for distributed compute and multiprocessing (#16)
* Initial commit

* scripts

* HPC script

* HPC script

* HPC troubleshooting

* HPC script working

* combining files

* combining files

* alpine tested

* almost final

* parallel compute graph

* upload cheatsheet

* update the README

* Update lectures/10.HPC_and_parallel_compute/README.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/scripts/analyze_sequences.py

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/SLURM_cheatsheet.md

Co-authored-by: Gregory Way <[email protected]>

* Update lectures/10.HPC_and_parallel_compute/scripts/analyze_sequences.py

Co-authored-by: Gregory Way <[email protected]>

* updates lecture 10 files

* course files update

* add python script to utils

* HPC run

* add the data

* run all analyses

* update activity

---------

Co-authored-by: Gregory Way <[email protected]>
MikeLippincott and gwaybio authored Nov 7, 2024
1 parent 4eacc26 commit 84e6f3b
Showing 22 changed files with 865 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -5,4 +5,5 @@
*.Rproj
.Rhistory
.RData
*__pycache__/
*.snakemake
49 changes: 49 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,49 @@
repos:
- repo: https://gitlab.com/vojko.pribudic.foss/pre-commit-update
rev: v0.6.0post1 # Insert the latest tag here
hooks:
- id: pre-commit-update
args: [--exclude, black, --keep, isort]
# Formats import order
- repo: https://github.com/pycqa/isort
rev: 5.12.0
hooks:
- id: isort
name: isort (python)
args: ["--profile", "black", "--filter-files"]

# Code formatter for both Python files and Jupyter notebooks
- repo: https://github.com/psf/black
rev: 22.10.0
hooks:
- id: black-jupyter
- id: black
language_version: python3.10

- repo: https://github.com/nbQA-dev/nbQA
rev: 1.9.0
hooks:
- id: nbqa-isort
additional_dependencies: [isort==5.6.4]
args: [--profile=black]


# remove unused imports
- repo: https://github.com/hadialqattan/pycln.git
rev: v2.4.0
hooks:
- id: pycln

# additional hooks found within the pre-commit library
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: trailing-whitespace # removes trailing whitespace
- id: mixed-line-ending # fixes mixed line endings
args:
- --fix=lf
- id: pretty-format-json # JSON Formatter
args:
- --autofix
- --indent=4
- --no-sort-keys
26 changes: 26 additions & 0 deletions lectures/10.hpc_and_parallel_compute/README.md
@@ -0,0 +1,26 @@
# Lecture 10: High performance computing and parallel computing

This lecture will cover parallel computing and high performance computing.
We have the following learning objectives:
1. Familiarize yourself with the concept of parallel computing
2. Understand how to leverage parallel computing
3. Learn about high performance computing
4. Understand how to leverage high performance computing
5. Learn how to use HPC resources and best practices

We will be using some pre-written scripts to explore parallel computing and high performance computing.
The following scripts are available in the [scripts](./scripts) directory:
* [analyze_sequences](scripts/analyze_sequences.py)
* This script contains the core sequence analysis function that we use to analyze sequences.
Note that this script can be run on a single sequence in a serial fashion, but we will also call it from the parallelized workflows.
* [multiprocessing_run](scripts/multiprocessing_run.sh)
* This shell script runs the analysis in parallel; it calls the `multiprocessing_sequence_analysis.py` script below, which uses Python's `multiprocessing` module (a minimal sketch of this pattern follows the list).
* [multiprocessing_sequence_analysis](scripts/multiprocessing_sequence_analysis.py)
* The script is called by the `multiprocessing_run.sh` script.
* [plot_parallel_compute_analysis](scripts/plot_parallel_compute_analysis.py)
* This script plots the results of the parallel computing analysis.
* [serial_run](scripts/serial_run.sh)
* This script runs the `analyze_sequences.py` script in serial.
* [submit_jobs_HPC](scripts/submit_jobs_HPC.sh)
* This script submits jobs to the HPC cluster in an array job.
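
The multiprocessing scripts follow a common pattern: a per-sequence analysis function is mapped across a pool of worker processes. The sketch below illustrates that pattern with Python's `multiprocessing` module; it is a minimal, hypothetical example (the `analyze_sequence` function and the hard-coded sequence list are placeholders), not the exact code in `multiprocessing_sequence_analysis.py`.

```python
import multiprocessing as mp


def analyze_sequence(sequence: str) -> int:
    """Placeholder per-sequence analysis: count 5mC bases (marked 'X')."""
    return sequence.count("X")


if __name__ == "__main__":
    # Placeholder input list for illustration
    sequences = ["GCXCCXAGGGTTGCAGTCAAATGTCC", "ACTTCTATGTCTXTGGATTACAAACA"]
    # Map the analysis function across 4 worker processes
    with mp.Pool(processes=4) as pool:
        counts = pool.map(analyze_sequence, sequences)
    for sequence, count in zip(sequences, counts):
        print(f"{sequence}: {count} 5mC bases")
```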
79 changes: 79 additions & 0 deletions lectures/10.hpc_and_parallel_compute/SLURM_cheatsheet.md
@@ -0,0 +1,79 @@
# Slurm Guide

For bash scripts, this line should be the first line of every script:
```
#!/bin/bash # Shebang slash bin slash bash
```

Next are the SBATCH directives that tell the Slurm scheduler how to handle your job.
These directives should be at the top of your script, but under the shebang line.

### Frequent SLURM directives
```
#SBATCH --job-name=parallel_job # job name
#SBATCH -t 1-23:59:59 # D-HH:MM:SS
#SBATCH -t 59 # MM
#SBATCH -t 59:59 # MM:SS
#SBATCH -t 59:59:59 # HH:MM:SS
#SBATCH -t 1-23 # D-HH
#SBATCH -t 1-23:59 # D-HH:MM
#SBATCH --mem=16G # 16 Gigabytes
#SBATCH --output=out_%j.log
#SBATCH --ntasks=1 # number of tasks
#SBATCH --mail-type=ALL # email events: NONE, BEGIN, END, FAIL, or ALL
#SBATCH [email protected]
```
### Slurm Commands
#### Environment modules
```
module purge # removes all modules
module avail # lists all modules available for loading
module list # list all currently loaded modules
module load {module} # loads a module (hint: use the tab key to autocomplete)
```
#### Submitting a job
```
sbatch script.sh # submit script.sh
```
#### Checking job status
```
squeue -u {User} # check submitted jobs in queue
```
#### Canceling a job or all jobs
```
scancel {jobid} # Cancel job
scancel -u {User} # Cancel all jobs for user
```
#### Check job details
```
jobstats $USER {days} # Check job stats for the user over the last {days} days
```
#### Check job efficiency
```
seff {jobid}
```
#### Check fairshare
```
module load slurmtools
levelfs $USER
```
#### Check user and institution account billings
```
suuser $USER
suacct amc-general
```

#### Example SBATCH
```
#!/bin/bash
#SBATCH --job-name=Slurm_job # job name "Slurm_job"
#SBATCH -t 1-23 # Time 1 day, 23 hours
#SBATCH --mem=16G # 16 Gigabytes of RAM
#SBATCH --output=out_%j.log # std output/error file
#SBATCH --mail-type=END,FAIL # send email on job end/fail
#SBATCH [email protected] # send email to this address
module load python/3.9.6
module list
```
@@ -0,0 +1,111 @@
sequences,time_per_sequence(s),core_count
10,0.5,1
100,0.5,1
1000,0.5,1
10000,0.5,1
100000,0.5,1
1000000,0.5,1
10000000,0.5,1
100000000,0.5,1
1000000000,0.5,1
10000000000,0.5,1
10,0.5,2
100,0.5,2
1000,0.5,2
10000,0.5,2
100000,0.5,2
1000000,0.5,2
10000000,0.5,2
100000000,0.5,2
1000000000,0.5,2
10000000000,0.5,2
10,0.5,4
100,0.5,4
1000,0.5,4
10000,0.5,4
100000,0.5,4
1000000,0.5,4
10000000,0.5,4
100000000,0.5,4
1000000000,0.5,4
10000000000,0.5,4
10,0.5,8
100,0.5,8
1000,0.5,8
10000,0.5,8
100000,0.5,8
1000000,0.5,8
10000000,0.5,8
100000000,0.5,8
1000000000,0.5,8
10000000000,0.5,8
10,0.5,16
100,0.5,16
1000,0.5,16
10000,0.5,16
100000,0.5,16
1000000,0.5,16
10000000,0.5,16
100000000,0.5,16
1000000000,0.5,16
10000000000,0.5,16
10,0.5,32
100,0.5,32
1000,0.5,32
10000,0.5,32
100000,0.5,32
1000000,0.5,32
10000000,0.5,32
100000000,0.5,32
1000000000,0.5,32
10000000000,0.5,32
10,0.5,64
100,0.5,64
1000,0.5,64
10000,0.5,64
100000,0.5,64
1000000,0.5,64
10000000,0.5,64
100000000,0.5,64
1000000000,0.5,64
10000000000,0.5,64
10,0.5,128
100,0.5,128
1000,0.5,128
10000,0.5,128
100000,0.5,128
1000000,0.5,128
10000000,0.5,128
100000000,0.5,128
1000000000,0.5,128
10000000000,0.5,128
10,0.5,256
100,0.5,256
1000,0.5,256
10000,0.5,256
100000,0.5,256
1000000,0.5,256
10000000,0.5,256
100000000,0.5,256
1000000000,0.5,256
10000000000,0.5,256
10,0.5,512
100,0.5,512
1000,0.5,512
10000,0.5,512
100000,0.5,512
1000000,0.5,512
10000000,0.5,512
100000000,0.5,512
1000000000,0.5,512
10000000000,0.5,512
10,0.5,1024
100,0.5,1024
1000,0.5,1024
10000,0.5,1024
100000,0.5,1024
1000000,0.5,1024
10000000,0.5,1024
100000000,0.5,1024
1000000000,0.5,1024
10000000000,0.5,1024
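
The table above is a parameter grid: the number of sequences, a fixed per-sequence processing time, and the core count. Under an idealized, embarrassingly parallel assumption, the total runtime is roughly `sequences * time_per_sequence / core_count`. The sketch below shows how such a grid could be turned into a runtime-versus-cores plot with pandas and matplotlib (both in the lecture environment); it is a hypothetical illustration rather than the repository's `plot_parallel_compute_analysis.py`, and the CSV path is a placeholder.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder path: point this at the parameter grid CSV shown above
df = pd.read_csv("parallel_compute_grid.csv")

# Idealized estimate: perfect scaling with no parallel overhead
df["estimated_runtime_s"] = (
    df["sequences"] * df["time_per_sequence(s)"] / df["core_count"]
)

fig, ax = plt.subplots()
for n_sequences, group in df.groupby("sequences"):
    ax.plot(
        group["core_count"],
        group["estimated_runtime_s"],
        marker="o",
        label=f"{int(n_sequences):,} sequences",
    )
ax.set_xscale("log", base=2)
ax.set_yscale("log")
ax.set_xlabel("Core count")
ax.set_ylabel("Estimated runtime (s)")
ax.legend(fontsize="small")
fig.savefig("estimated_runtime_vs_cores.png", dpi=150)
```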
10 changes: 10 additions & 0 deletions lectures/10.hpc_and_parallel_compute/data/sequences_to_analyze.txt
@@ -0,0 +1,10 @@
GCXCCXAGGGTTGCAGTCAAATGTCCA
CGGCCAATGAGGGXCGCXTAGGTCAT
TAGGTGGATACCXCTXATATATGATT
CCXATATTAAGACATATAATTGGAGG
TATTACACGCCCAAATAATTTGGCXA
TCAGCXGCXGGGAAGCGGGCGCXATACT
CGGATGATCATCXGGGATGATGTCTA
GCGCCXGGAAGACGAATCTTAATTA
TTAGGAACXTXXCAATATGTTTCGGT
ACTTCTATGTCTXTGGATTACAAACA
10 changes: 10 additions & 0 deletions lectures/10.hpc_and_parallel_compute/environments/README.md
@@ -0,0 +1,10 @@
# Environment creation
We need to create the conda environment needed for this lecture and hands-on activity.
To do so, run one of the following commands from this directory:
```bash
conda env create -f parallel_and_hpc_compute_env.yaml
```
OR
```bash
mamba env create -f parallel_and_hpc_compute_env.yaml
```
@@ -0,0 +1,16 @@
name: parallel_and_hpc_compute_env
channels:
- conda-forge
- defaults
dependencies:
- python=3.11
- conda-forge::pandas
- conda-forge::jupyter
- conda-forge::ipykernel
- conda-forge::nbconvert
- conda-forge::pip
- conda-forge::matplotlib
- conda-forge::seaborn
- pip:
- argparse

@@ -0,0 +1,24 @@
# Hands-on: 5mC sequence analysis activity

You want to identify the 5mC (5-methylcytosine) content in each of 10 sequences, where X represents 5mC and C represents unmethylated cytosine.
The goal is to count the number of 5mC bases in each sequence using multiple compute approaches.
For the sequences below, count the number of 5mC bases using each of the following approaches (a minimal serial sketch follows the sequence list):
* Serial approach
* Parallel approach
* Python multiprocessing approach
* GNU parallel approach
* HPC approach

Sequences:
0. GCXCCXAGGGTTGCAGTCAAATGTCC
1. ACTTCTATGTCTXTGGATTACAAACA
2. CGGCCAATGAGGGXCGCXTAGGTCAT
3. TAGGTGGATACCXCTXATATATGATT
4. CCXATATTAAGACATATAATTGGAGG
5. TATTACACGCCCAAATAATTTGGCXA
6. TCAGCXGCXGGGAAGCGGGCGCXATA
7. CGGATGATCATCXGGGATGATGTCTA
8. GCGCCXGGAAGACGAATCTTAATTAX
9. TTAGGAACXTXXCAATATGTTTCGGT
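
As a minimal starting point for the serial approach, the sketch below counts the 5mC bases (the `X` characters) in each sequence one at a time. It is a hypothetical illustration rather than the provided `analyze_sequences.py`, and the hard-coded list stands in for reading `data/sequences_to_analyze.txt`.

```python
# Serial baseline: count 5mC ('X') in each sequence, one sequence at a time
sequences = [
    "GCXCCXAGGGTTGCAGTCAAATGTCC",
    "ACTTCTATGTCTXTGGATTACAAACA",
    "CGGCCAATGAGGGXCGCXTAGGTCAT",
]

for index, sequence in enumerate(sequences):
    five_mc_count = sequence.count("X")
    print(f"Sequence {index}: {five_mc_count} 5mC bases")
```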
