Commit

Added all changes associated with EPCC_Cirrus_slurm.
mbareford authored and tkphd committed Jan 10, 2024
1 parent 5f19e45 commit 8525205
Showing 87 changed files with 532 additions and 246 deletions.
19 changes: 9 additions & 10 deletions _episodes/14-modules.md
@@ -27,16 +27,16 @@ understand the reasoning behind this approach. The three biggest factors are:

Software incompatibility is a major headache for programmers. Sometimes the
presence (or absence) of a software package will break others that depend on
it. Two of the most famous examples are Python 2 and 3 and C compiler versions.
it. Two well known examples are Python and C compiler versions.
Python 3 famously provides a `python` command that conflicts with that provided
by Python 2. Software compiled against a newer version of the C libraries and
then used when they are not present will result in a nasty `'GLIBCXX_3.4.20'
not found` error, for instance.
then run on a machine that has older C libraries installed will result in a
nasty `'GLIBCXX_3.4.20' not found` error.

Software versioning is another common issue. A team might depend on a certain
package version for their research project - if the software version was to
change (for instance, if a package was updated), it might affect their results.
Having access to multiple software versions allow a set of researchers to
Having access to multiple software versions allows a set of researchers to
prevent software versioning issues from affecting their results.

Dependencies are where a particular software package (or even a particular
@@ -89,10 +89,7 @@ message telling you so
```
{: .language-bash}

```
No Modulefiles Currently Loaded.
```
{: .output}
{% include {{ site.snippets }}/modules/default-modules.snip %}

## Loading and Unloading Software

@@ -198,7 +195,9 @@ Let's examine the output of `module avail` more closely.
> >
> > ```
> > {{ site.remote.bash_shebang }}
> >
> > {% include {{ site.snippets }}/scheduler/sbatch-options.snip %}
> > {{ site.sched.comment }} {{ site.sched.flag.time }} 00:00:30
> >
> > module load {{ site.remote.module_python3 }}
> >
> > python3 --version
@@ -212,4 +211,4 @@ Let's examine the output of `module avail` more closely.
> {: .solution}
{: .challenge}
{% include links.md %}
{% include links.md %}
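
For reference, the templated solution script in this hunk can be expanded by hand. The following is a minimal Python sketch, assuming the values from the `EPCC_Cirrus_slurm` `_config_options.yml` added in this commit (`bash_shebang`, `comment`, `flag.time`, `module_python3`). The contents of `scheduler/sbatch-options.snip` are not shown in this diff, so that include is left as a placeholder comment rather than guessed. `render_job_script` is a hypothetical helper, not part of the lesson tooling.

```python
# Sketch: expand the Liquid placeholders of the solution script by hand.
# Values are copied from the EPCC_Cirrus_slurm _config_options.yml in this commit.
SITE = {
    "remote.bash_shebang": "#!/bin/bash",
    "remote.module_python3": "anaconda/python3-2021.11",
    "sched.comment": "#SBATCH",
    "sched.flag.time": "-t",
}


def render_job_script(site):
    """Return the job script text with placeholders substituted (hypothetical helper)."""
    lines = [
        site["remote.bash_shebang"],
        "",
        "# (contents of scheduler/sbatch-options.snip go here; not shown in this diff)",
        f'{site["sched.comment"]} {site["sched.flag.time"]} 00:00:30',
        "",
        f'module load {site["remote.module_python3"]}',
        "",
        "python3 --version",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_job_script(SITE), end="")
```

With these values the time limit line would render as `#SBATCH -t 00:00:30`, and the module line as `module load anaconda/python3-2021.11`.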
@@ -55,6 +55,7 @@ sched:
  info: "sinfo"
  comment: "#SBATCH"
  hist: "sacct -u yourUsername"
  hist_filter: ""

episode_order:
  - 10-hpc-intro
@@ -0,0 +1,6 @@
```
bin etc lib64 proc sbin sys var
boot {{ site.remote.homedir | replace: "/", "" }} mnt root scratch tmp working
dev lib opt run srv usr
```
{: .output}
@@ -0,0 +1,4 @@
```
No Modulefiles Currently Loaded.
```
{: .output}
@@ -0,0 +1,6 @@
* **Hostname**: Where did your job run?
* **MaxRSS**: What was the maximum amount of memory used?
* **Elapsed**: How long did the job take?
* **State**: What is the job currently doing/what happened to it?
* **MaxDiskRead**: Amount of data read from disk.
* **MaxDiskWrite**: Amount of data written to disk.
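
These fields line up with the `hist_filter` format string that this commit adds to the Cirrus Slurm configuration (the **Hostname** bullet appears to map to the `NodeList` column). As a rough Python sketch, assuming the lesson combines `hist` and `hist_filter` in this order and using a made-up job ID, the accounting query could be assembled as follows. `--format` and `-j` are standard `sacct` options; `build_sacct_command` and `requested_fields` are hypothetical helpers.

```python
import shlex

# Values copied from the EPCC_Cirrus_slurm _config_options.yml in this commit.
HIST = "sacct"
HIST_FILTER = "--format=JobID,JobName,State,Elapsed,NodeList,MaxRSS,MaxDiskRead,MaxDiskWrite"


def build_sacct_command(job_id):
    """Return the argument list for an accounting query on one job (hypothetical helper)."""
    return [HIST, HIST_FILTER, "-j", str(job_id)]


def requested_fields(hist_filter):
    """Split the --format value into the individual column names."""
    return hist_filter.split("=", 1)[1].split(",")


if __name__ == "__main__":
    # 123456 is a made-up job ID for illustration only.
    print(shlex.join(build_sacct_command(123456)))
    print(requested_fields(HIST_FILTER))
```

Keeping the command as an argument list (rather than one string) means it can be passed directly to `subprocess.run` on a system where `sacct` is available.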
@@ -0,0 +1,19 @@
> Jobs on an HPC system might run for days or even weeks. We probably have
> better things to do than constantly check on the status of our job with
> `{{ site.sched.status }}`. Looking at the manual page for
> `{{ site.sched.submit.name }}`, can you set up our test job to send you an email
> when it finishes?
>
> > ## Hint
> >
> > You can use the *manual pages* for {{ site.sched.name }} utilities to find
> > more about their capabilities. On the command line, these are accessed
> > through the `man` utility: run `man <program-name>`. You can find the same
> > information online by searching "man <program-name>".
> >
> > ```
> > {{ site.remote.prompt }} man {{ site.sched.submit.name }}
> > ```
> > {: .language-bash}
> {: .solution}
{: .challenge}
23 changes: 0 additions & 23 deletions _includes/snippets_library/EPCC_Cirrus_pbs/cluster/queue-info.snip

This file was deleted.


74 changes: 74 additions & 0 deletions _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml
@@ -0,0 +1,74 @@
#------------------------------------------------------------
# EPCC, The University of Edinburgh: Cirrus + Slurm
#------------------------------------------------------------

# Cluster host and scheduler options: the defaults come from
# Graham at Compute Canada, running Slurm. Other options can
# be found in the library of snippets,
# `_includes/snippets_library`. To use one, replace options
# below with those in `_config_options.yml` from the
# library. E.g, to customise for Cirrus at EPCC, running
# Slurm, we could replace the options below with those from
#
# _includes/snippets_library/EPCC_Cirrus_slurm/_config_options.yml
#
# If your cluster is not represented in the library, please
# copy an existing folder, rename it, and customize for your
# installation. Remember to keep the leading slash on the
# `snippets` variable below!

snippets: "/snippets_library/EPCC_Cirrus_slurm"

local:
  prompt: "[auser@laptop ~]$"
  bash_shebang: "#!/bin/bash"

remote:
  name: "Cirrus"
  login: "login.cirrus.ac.uk"
  host: "cirrus-login1"
  node: "r1i0n32"
  location: "EPCC, The University of Edinburgh"
  homedir: "/lustre/home/tc001"
  user: "auser"
  group: "tc001"
  prompt: "[auser@cirrus-login1 ~]$"
  bash_shebang: "#!/bin/bash"
  module_python3: "anaconda/python3-2021.11"

sched:
  name: "Slurm"
  submit:
    name: "sbatch"
    options: "--partition=standard --qos=standard --time=00:02:00"
  queue:
    debug: "debug"
    testing: "testing"
  status: "squeue"
  flag:
    user: "-u auser"
    interactive: "--time=00:20:00 --partition=standard --qos=standard --pty /usr/bin/bash --login"
    histdetail: "-l -j"
    name: "-J"
    partition: "-p standard"
    qos: "-q standard"
    time: "-t"
    queue: "-p"
    nodes: "-N"
    tasks: "-n"
  del: "scancel"
  interactive: "srun"
  info: "sinfo"
  comment: "#SBATCH"
  hist: "sacct"
  hist_filter: "--format=JobID,JobName,State,Elapsed,NodeList,MaxRSS,MaxDiskRead,MaxDiskWrite"

episode_order:
  - 11-hpc-intro
  - 12-cluster
  - 13-scheduler
  - 14-modules
  - 15-transferring-files
  - 16-parallel
  - 17-resources
  - 18-responsibility
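
The lesson templates read these options through dotted Liquid paths such as `site.sched.flag.time` and `site.sched.submit.name`, which is why keys like `name`, `queue`, and `interactive` can appear at more than one level without clashing. The sketch below uses Python with a hand-copied subset of the values above as a nested dictionary. The placement of `flag.name` and `flag.queue` is an assumption based on the key order, and `lookup` is a hypothetical helper, not Jekyll's actual resolver.

```python
# Hand-copied subset of the configuration above, nested the way the
# dotted template paths (site.sched.flag.time, ...) imply.
SITE = {
    "sched": {
        "name": "Slurm",
        "submit": {
            "name": "sbatch",
            "options": "--partition=standard --qos=standard --time=00:02:00",
        },
        # Assumed nesting: these keys sit under `flag`, not directly under `sched`.
        "flag": {"name": "-J", "time": "-t", "queue": "-p"},
        "interactive": "srun",
        "comment": "#SBATCH",
    },
}


def lookup(config, dotted_path):
    """Resolve a path like 'site.sched.flag.time' against a nested dict."""
    keys = dotted_path.split(".")
    if keys and keys[0] == "site":
        keys = keys[1:]  # 'site' is the root object in the templates
    value = config
    for key in keys:
        value = value[key]  # raises KeyError if the option is missing
    return value


if __name__ == "__main__":
    for path in ("site.sched.name", "site.sched.submit.name", "site.sched.flag.name"):
        print(path, "->", lookup(SITE, path))
```

The three `name` lookups return different values because each path walks to a different level of the mapping, so the indentation of the YAML file is significant.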