From d6dddc63f818eace67ae8efe011cdaee70b3ffae Mon Sep 17 00:00:00 2001 From: samumantha Date: Thu, 28 Sep 2023 22:59:16 +0300 Subject: [PATCH] cleanup --- materials/exercise_basics.md | 5 --- materials/fair_share.md | 24 +++++++++- materials/htc.md | 86 ++++++++++-------------------------- materials/own_project.md | 20 +++------ 4 files changed, 53 insertions(+), 82 deletions(-) diff --git a/materials/exercise_basics.md b/materials/exercise_basics.md index da48d22e..8c7fc4b2 100644 --- a/materials/exercise_basics.md +++ b/materials/exercise_basics.md @@ -2,10 +2,6 @@ ## Interactive ---- -topic: Batch jobs -title: Tutorial - Interactive batch jobs ---- # Batch job tutorial - Interactive jobs @@ -160,5 +156,4 @@ squeue -u $USER 💡 [FAQ on CSC batch jobs](https://docs.csc.fi/support/faq/#batch-jobs) in Docs CSC ---- diff --git a/materials/fair_share.md b/materials/fair_share.md index 7ec01e5f..0ea342ee 100644 --- a/materials/fair_share.md +++ b/materials/fair_share.md @@ -20,7 +20,7 @@ TODO: restaurant analogy? * Getting started with Slurm batch jobs on Puhti/Mahti and LUMI -## Queing +## Queueing - A job is queued and starts when the requested resources become available - The order in which the queued jobs start depends on their priority and currently available resources @@ -30,6 +30,28 @@ TODO: restaurant analogy? - Some queues have a lower priority (e.g. _longrun_ -- use shorter if you can!) - See our documentation for more information on [Getting started with running batch jobs on Puhti/Mahti](https://docs.csc.fi/computing/running/getting-started/) and [LUMI](https://docs.lumi-supercomputer.eu/runjobs/). 
+### Optimal usage on multi-user computing platforms + +- The computing resources are shared among hundreds of your colleagues, who all have different resource needs +- Resources allocated to your job are not available for others to use + - Important to request only the resources you need and ensure that the resources are used efficiently +- Even if you _can_ use more resources, should you? + +### One resource type will be a bottleneck + +
+- A single node can host many jobs from different users
+- Different jobs need different resources
+- Typically, the cores run out before the memory does
+- Sometimes a job uses only one core, but consumes all the memory
+  - No further jobs will fit in the node
+  - If the job is _not_ using the memory (just reserving it), resources are wasted
+</div>
+
+![](img/node-cpu-full.svg "Node cpu full"){width=45%} +![](img/node-mem-full.svg "Node memory full from one job"){width=45%} +
+</div>
 
 # Schema of how the batch job scheduler works
 
 ![](./images/slurm-sketch.svg)
\ No newline at end of file
diff --git a/materials/htc.md b/materials/htc.md
index fe9d2114..72e32a6d 100644
--- a/materials/htc.md
+++ b/materials/htc.md
@@ -1,32 +1,17 @@
 # High Throughput Computing (HTC) and parallelization
 
-## Slurm accounting
-
-### Optimal usage on multi-user computing platforms
-
-- The computing resources are shared among hundreds of your colleagues, who all have different resource needs
-- Resources allocated to your job are not available for others to use
-  - Important to request only the resources you need and ensure that the resources are used efficiently
-- Even if you _can_ use more resources, should you?
+## Running things at the same time
 
-### One resource type will be a bottleneck
+* within batch script
+

→ array job, GNU parallel

+* within python script +

→ multiprocessing, joblib, dask

+* within R script +

→ future, foreach, snow

-
-- A single node can host many jobs from different users -- Different jobs need different resources -- Typically, the cores run out before the memory does -- Sometimes a job uses only one core, but will consume all memory - - No further jobs will fit in the node - - If the job is _not_ using the memory (just reserving it), resources are wasted -
-
-![](img/node-cpu-full.svg "Node cpu full"){width=45%} -![](img/node-mem-full.svg "Node memory full from one job"){width=45%} -
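The "Running things at the same time" list added above names array jobs as the batch-script option. A minimal sketch of such a Slurm script — the project number, partition name, and input-file naming scheme are hypothetical placeholders, not CSC specifics:

```shell
#!/bin/bash
#SBATCH --account=project_2001234   # hypothetical project number, use your own
#SBATCH --partition=small           # partition names vary per system
#SBATCH --time=00:15:00
#SBATCH --mem=2G
#SBATCH --array=1-10                # 10 independent tasks, run concurrently

# Slurm sets SLURM_ARRAY_TASK_ID for each array task; default to 1 so the
# script can also be dry-run outside Slurm.
i=${SLURM_ARRAY_TASK_ID:-1}

# Each task handles its own input file (hypothetical naming scheme).
input="input_${i}.txt"
echo "task ${i} would process ${input}"
```

Submitted once with `sbatch`, this queues ten small independent jobs instead of one large request, which typically gets through the queue faster.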
+## Slurm accounting: batch job resource usage -### Slurm accounting: batch job resource usage 1/2 -
- Resource usage can be queried with `seff ` - Points to pay attention to: - Low CPU Efficiency: @@ -39,12 +24,9 @@ - Lots of caveats here - Low GPU efficiency: - Better to use CPUs? Disk I/O? -
-
+ ![](img/seff-output-new.png "Seff output"){width=90%} -
-### Slurm accounting: batch job resource usage 2/2 - Not all usage is captured by Slurm accounting - If CPU efficiency seems too low, look at the completion time @@ -83,7 +65,7 @@ - Parallelism simplified: - You use hundreds of ordinary computers simultaneously to solve a single problem -# First steps for fast jobs (1/2) +# First steps for fast jobs - Spend a little time to investigate: - Which of the available software would be the best to solve the kind of problem you have? @@ -92,16 +74,13 @@ - The software that solves your problem fastest might not always be the best - Issues like ease-of-use and compute power/memory/disk demands are also highly relevant - Quite often it is useful to start simple and gradually use more complex approaches if needed - -# First steps for fast jobs (2/2) - - When you've found the software you want to use, check if it is available at CSC as a [pre-installed optimized version](https://docs.csc.fi/apps/) - Familiarize yourself with the software manual, if available - If you need to install a software package distributed through Conda, [you need to containerize it](https://docs.csc.fi/computing/usage-policy/#conda-installations) - Containerizing greatly speeds up performance at startup and can be done easily with the [Tykky wrapper](https://docs.csc.fi/computing/containers/tykky/) - If you can't find suitable software, consider writing your own code -# Optimize the performance of your own code (1/2) +# Optimize the performance of your own code - If you have written your own code, compile it with optimizing compiler options - Docs CSC: compiling on [Puhti](https://docs.csc.fi/computing/compiling-puhti/) and [Mahti](https://docs.csc.fi/computing/compiling-mahti/) @@ -110,9 +89,6 @@ - Docs CSC: [Queue options](https://docs.csc.fi/computing/running/batch-job-partitions/) - [Available partitions on LUMI](https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/partitions/) - Use the test case to optimize computations before starting 
massive ones - -# Optimize the performance of your own code (2/2) - - Use profiling tools to find out how much time is spent in different parts of the code - Docs CSC: [Performance analysis](https://docs.csc.fi/computing/performance/) - [Profiling on LUMI](https://docs.lumi-supercomputer.eu/development/profiling/strategies/) @@ -144,7 +120,13 @@ - [GPUs in Mahti batch jobs](https://docs.csc.fi/computing/running/creating-job-scripts-mahti/#gpu-batch-jobs) - [GPUs in LUMI batch jobs](https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/lumig-job/) -# What is MPI? + +:::{note} +:class: dropdown + +## Advanced topics + +### What is MPI? - MPI (Message Passing Interface) is a widely used standard for writing software that runs in parallel - MPI utilizes parallel **processes** that _do not share memory_ @@ -152,7 +134,7 @@ - Communication can be a performance bottleneck - MPI is required when running on multiple nodes -# What is OpenMP? +### What is OpenMP? - OpenMP (Open Multi-Processing) is a standard that utilizes compute cores that share memory, i.e. 
**threads** - They do not need to send messages between each other @@ -160,7 +142,7 @@ - This appears when different compute cores process and update the same data without proper synchronization - OpenMP is restricted to a single node -# Self study materials for OpenMP and MPI +### Self study materials for OpenMP and MPI - There are many tutorials available online - Look with simple searches for _e.g._ "MPI tutorial" @@ -168,6 +150,9 @@ - Available on [GitHub](https://github.com/csc-training/parallel-prog/) - See also the [materials of CSC Summer School in HPC](https://github.com/csc-training/summerschool) + +::: + # Task farming -- running multiple independent jobs simultaneously - Task farming == running many similar independent jobs simultaneously @@ -178,8 +163,6 @@ - Guidelines and solutions are suggested in [Docs CSC](https://docs.csc.fi/computing/running/throughput/) - Many options: [FireWorks](https://docs.csc.fi/computing/running/fireworks/), [Nextflow](https://docs.csc.fi/support/tutorials/nextflow-puhti/), [Snakemake](https://snakemake.github.io/), [Knime](https://www.knime.com/), [BioBB](http://mmb.irbbarcelona.org/biobb/), ... -# Task farming 2.0 - - Before opting for a workflow manager, check if the code you run has built-in high-throughput features - Many chemistry software ([CP2K](https://docs.csc.fi/apps/cp2k/#high-throughput-computing-with-cp2k), [GROMACS](https://docs.csc.fi/apps/gromacs/#high-throughput-computing-with-gromacs), [Amber](https://docs.csc.fi/apps/amber/#high-throughput-computing-with-amber), _etc._) provide methods for efficient task farming - Also [Python](https://docs.csc.fi/apps/python/#python-parallel-jobs) and [R](https://docs.csc.fi/support/tutorials/parallel-r/), if you write your own code @@ -208,15 +191,12 @@ - Also, if you process lots of data, make sure you [use the disk efficiently](https://docs.csc.fi/support/tutorials/ml-data/#using-the-shared-file-system-efficiently) - Does your code run on AMD GPUs? 
[LUMI](https://docs.lumi-supercomputer.eu/hardware/compute/lumig/) has a massive GPU capacity! -# Tricks of the trade 1/4 +# Tricks of the trade - Although it is reasonable to try to achieve best performance by using the fastest computers available, it is not the only important issue - Different codes may give very different performance for a given use case - Compare the options you have in [CSC's software selection](https://docs.csc.fi/apps/) - Before launching massive simulations, look for the most efficient algorithms to get the job done - -# Tricks of the trade 2/4 - - Well-known boosters are: - Enhanced sampling methods _vs._ brute force molecular dynamics - Machine learning methods @@ -228,17 +208,11 @@ - When using separate runs to scan a parameter space, start with a coarse scan, and improve resolution where needed - Be mindful of the number of jobs/job steps, use meta-schedulers if needed - Try to use or implement checkpoints/restarts in your software, and _check results between restarts_ - -# Tricks of the trade 3/4 - - Try to formulate your scientific results when you have a minimum amount of computational results - Helps to clarify what you still need to compute, what computations would be redundant and what data you need to store - Reserving more memory and/or more compute cores does not necessary equal faster computations - Check with `seff`, `sacct` and from software-specific log files if the memory was used and whether the job ran faster - Testing for optimal amount of cores and memory is advised before performing massive computations - -# Tricks of the trade 4/4 - - If possible, running the same job on a laptop may be useful for comparison - Avoid unnecessary reads and writes of data and containerize Conda environments to improve I/O performance - Read and write in big chunks and avoid reading/writing lots of small files @@ -249,22 +223,10 @@ - Don't run too long jobs without a restart option - Increased risk of something going wrong, resulting in 
lost time/results -## Speed up --> https://a3s.fi/CSC_training/10_speed_up_jobs.html -"Running large amount of jobs, often same analysis to different input data." -

→ Map sheets, tiles, data from different areas

-## Running things at same time - -* within batch script -

→ array job, GNU parallel

-* within python script -

→ multiprocessing, joblib, dask

-* within R script -

→ future, foreach, snow

diff --git a/materials/own_project.md b/materials/own_project.md index 38971eb0..e9e229e5 100644 --- a/materials/own_project.md +++ b/materials/own_project.md @@ -11,6 +11,10 @@ On project organization: [CodeRefinery lesson - Reproducible research](https://coderefinery.github.io/reproducible-research/organizing-projects/)
[CodeRefinery lesson - Modular code development](https://coderefinery.github.io/modular-type-along/instructor-guide/) +## Making use of HPC resources + +* Just moving a script to HPC does not make it run faster +* ## Moving from GUI to CLI/scripts @@ -32,7 +36,7 @@ Google the toolname from eg QGIS and the desired scripting language to get some * know your resources * make sure your code works as expected before moving to HPC -# Before starting large-scale calculations +## Before starting large-scale calculations - Check how the software and your actual input performs - Common job errors are caused by typos in batch/input scripts @@ -42,7 +46,7 @@ Google the toolname from eg QGIS and the desired scripting language to get some - It's _much worse_ to always run with excessively large requests "just in case" -# Running a new application in Puhti 1/2 +## Running a new application in Puhti - If it comes with tutorials, do at least one - This will likely be the fastest way forward @@ -54,9 +58,6 @@ Google the toolname from eg QGIS and the desired scripting language to get some - Perhaps it is easier to find the correct command line options - Use the `top` command to get rough estimate of memory use, _etc_. - If developers provide some test or example data, run it first and make sure results are correct - -# Running a new application in Puhti 2/2 - - You can use the _test_ queue to check that your batch job script is correct - Limits : 15 min, 2 nodes - Job turnaround usually very fast even if machine is "full" @@ -67,17 +68,8 @@ Google the toolname from eg QGIS and the desired scripting language to get some - How many cores to allocate? 
- This depends on many things, so you have to try, see our [instructions about a scaling test](https://docs.csc.fi/support/tutorials/cmdline-handson/#scaling-test-for-an-mpi-parallel-job) -#### Python -* Package availability: `module load geoconda` and `list-packages` ([adding Python packages for your own usage](https://docs.csc.fi/apps/python/#installing-python-packages-to-existing-modules)) -#### R -#### Other -#### Making use of HPC resources - -* Just moving a script to HPC does not make it run faster -* -
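As a companion to the test-queue advice above, a throwaway script along these lines can verify that the batch script itself goes through before the real run — the account and program name are hypothetical placeholders; the partition limits follow the text (15 min, 2 nodes):

```shell
#!/bin/bash
#SBATCH --account=project_2001234   # hypothetical project number
#SBATCH --partition=test            # test queue: max 15 min, 2 nodes
#SBATCH --time=00:05:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=1G

# Echo instead of launching the real program, so a mistake costs nothing;
# swap in your actual commands once the script is accepted and runs.
program="my_analysis"               # hypothetical program name
echo "would run: ${program} in ${SLURM_SUBMIT_DIR:-$PWD}"
```

After the test job finishes, `seff <jobid>` (as discussed earlier) shows whether the requested time and memory were sensible before scaling up.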