For most of the exercises, skeleton codes are provided for both
Fortran and C/C++ in the corresponding subdirectory. Some exercise
skeletons have sections marked with “TODO” that need to be completed.
In addition, all of the exercises have complete example codes (that can
be compiled and run) in the solutions folder. Note that these are seldom
the only or even the best way to solve the problem.
The exercise material can be downloaded with the command
git clone https://github.com/csc-training/advanced-mpi.git
If you have a GitHub account, you can also fork this repository and then clone your fork.
Exercises can be carried out on CSC's Puhti supercomputer. See the CSC User Documentation for general instructions on using Puhti.
If you have a working parallel program development environment on your laptop (Fortran or C compiler, MPI development library, etc.), you may also use that. Note, however, that no support for installing an MPI environment can be provided during the course.
Puhti can be accessed via ssh using the provided username (trainingxxx) and password:
ssh -Y trainingxxx@puhti.csc.fi
For editing program source files you can use e.g. the nano editor:
nano prog.f90
(^ in nano's shortcuts refers to the Ctrl key, i.e. in order to save the file and exit the editor, press Ctrl+X and confirm the save.)
Other popular editors (emacs, vim, gedit) are also available.
All exercises on the supercomputers should be carried out in the
scratch disk area. The name of the scratch directory can be
queried with the command csc-workspaces. As the base directory is
shared between members of the project, you should create your own
directory:
cd /scratch/project_2000745
mkdir -p $USER
cd $USER
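The directory setup above can also be wrapped in a small helper, which is convenient if it is repeated in several scripts. The following is only a sketch: the function name make_workdir is hypothetical (it is not a CSC tool), and it assumes a POSIX shell.

```shell
# Create (if needed) a personal directory under a shared base directory
# and print its path. make_workdir is a hypothetical helper name.
make_workdir() {
    # Fall back to `id -un` in case USER is not set in the environment.
    user_name="${USER:-$(id -un)}"
    mkdir -p "$1/$user_name" && printf '%s\n' "$1/$user_name"
}

# Possible use on Puhti (not executed here):
#   cd "$(make_workdir /scratch/project_2000745)"
```

Because mkdir -p does not fail when the directory already exists, the helper can be called repeatedly.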
MPI programs can be compiled with the mpif90, mpicxx, and mpicc wrapper commands:
mpif90 -o my_mpi_exe test.f90
or
mpicxx -o my_mpi_exe test.cpp
or
mpicc -o my_mpi_exe test.c
The wrapper commands automatically include all the flags needed for building MPI programs.
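As an illustration of how the three wrappers map to source languages, the sketch below picks a wrapper from the file extension. The function name mpi_wrapper_for and the extra extensions (.F90, .cc, .cxx) are assumptions made for this example; only the .f90, .cpp, and .c cases come from the commands above.

```shell
# Print the MPI compiler wrapper matching a source file's extension.
# mpi_wrapper_for is a hypothetical helper, not part of any MPI distribution.
mpi_wrapper_for() {
    case "$1" in
        *.f90|*.F90)      echo mpif90 ;;   # Fortran
        *.cpp|*.cc|*.cxx) echo mpicxx ;;   # C++
        *.c)              echo mpicc ;;    # C
        *) echo "unknown source type: $1" >&2; return 1 ;;
    esac
}

# Possible use (not executed here):
#   "$(mpi_wrapper_for test.c)" -o my_mpi_exe test.c
```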
In order to use HDF5 on Puhti, you need to load the HDF5 module with MPI I/O support:
module load hdf5/1.10.4-mpi
When building programs, -lhdf5 needs to be added to the linker flags, e.g.
mpif90 -o my_hdf5_exe test.f90 -lhdf5
or by setting LDFLAGS etc. in a Makefile:
LDFLAGS=... -lhdf5
On Puhti, programs need to be executed via the batch job system. A simple job running with 4 MPI tasks can be submitted with the following batch job script:
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=project_2000745
#SBATCH --partition=small
#SBATCH --reservation=advance-mpi
#SBATCH --time=00:05:00
#SBATCH --ntasks=4
srun my_mpi_exe
Save the script e.g. as job.sh and submit it with sbatch job.sh.
The output of the job will be in the file slurm-xxxxx.out. You can check the status of your jobs with squeue -u $USER
and kill possible hanging applications with scancel JOBID.
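The job ID needed for scancel, and for finding the right slurm-xxxxx.out file, can be taken from the message that sbatch prints at submission. The sketch below assumes that message has the usual form "Submitted batch job 123456"; the helper names are hypothetical.

```shell
# Extract the job ID from the sbatch submission message, assumed to look
# like "Submitted batch job 123456". Helper names are hypothetical.
job_id_from_message() {
    # The job ID is the last whitespace-separated field of the message.
    printf '%s\n' "$1" | awk '{print $NF}'
}

# Name of the default Slurm output file for a given job ID.
job_output_file() {
    printf 'slurm-%s.out\n' "$1"
}

# Possible use on Puhti (not executed here):
#   msg=$(sbatch job.sh)
#   id=$(job_id_from_message "$msg")
#   squeue -j "$id"                  # status of this job only
#   cat "$(job_output_file "$id")"   # output, once the job has run
#   scancel "$id"                    # cancel the job if it hangs
```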
The reservation mpi_intro is available during the course days and
is accessible only with the training user accounts.
In most MPI implementations, a parallel program can be started with the mpiexec launcher:
mpiexec -n 4 ./my_mpi_exe
The Allinea DDT parallel debugger is available on CSC
supercomputers. In order to use the debugger, first build your code with the -g flag. DDT is
then enabled via the module system:
module load ddt
The debugger is run in an interactive session, and for proper
functioning the environment variable SLURM_OVERLAP needs to be set.
- Set SLURM_OVERLAP and request a Slurm allocation interactively:
export SLURM_OVERLAP=1
salloc --nodes=1 --ntasks-per-node=2 --account=project_2000745 --partition=small --reservation=mpi_intro
- Start the application under the debugger:
ddt srun ./buggy
For smoother GUI performance, we recommend using NoMachine remote desktop to connect to Puhti.
Start by loading the scorep and scalasca modules:
module load scorep scalasca
Instrument the application by prepending the compile command with scorep:
scorep mpicc -o my_mpi_app my_mpi_code.c
Collect and create a flat profile by prepending srun with scan:
...
#SBATCH --ntasks=8
module load scalasca
scan srun ./my_mpi_app
The Scalasca analysis report explorer square does not currently work on
the CSC supercomputers, but the experiment directory can be copied to a
local workstation for visual analysis:
(On local workstation)
rsync -r puhti.csc.fi:/path_to_rundir/scorep_my_mpi_app_8_sum .
The scorep-score command can also be used on the supercomputers to
estimate storage requirements before starting tracing:
scorep-score -r scorep_my_mpi_app_8_sum/profile.cubex
In order to collect and analyze the trace, add the -q and -t options to scan:
...
#SBATCH --ntasks=8
module load scalasca
scan -q -t srun ./my_mpi_app
The experiment directory containing the trace can now be copied to a local workstation for visual analysis:
rsync -r puhti.csc.fi:/path_to_rundir/scorep_my_mpi_app_8_trace .
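The experiment directory names in the examples above appear to follow the pattern scorep_<executable>_<tasks>_<mode>, with mode sum for the flat profile and trace for the trace run. Assuming that pattern holds for a pure MPI run, the name can be built as sketched below; experiment_dir is a hypothetical helper and not part of Scalasca.

```shell
# Build the experiment directory name for a pure MPI run, assuming the
# pattern scorep_<executable>_<tasks>_<mode> seen in the examples above.
# experiment_dir is a hypothetical helper name.
experiment_dir() {
    case "$3" in
        sum|trace) printf 'scorep_%s_%s_%s\n' "$1" "$2" "$3" ;;
        *) echo "mode must be 'sum' or 'trace'" >&2; return 1 ;;
    esac
}

# Possible use on the local workstation (not executed here):
#   rsync -r "puhti.csc.fi:/path_to_rundir/$(experiment_dir my_mpi_app 8 trace)" .
```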
On CSC supercomputers, one can use Intel Traceanalyzer for
investigating the trace (Traceanalyzer can read the .otf2 file produced
by Score-P / Scalasca):
module load intel-itac
traceanalyzer &
Next, choose the "Open" dialog and select the trace.otf2 file within
the experiment directory (e.g. scorep_my_mpi_app_8_trace). For smoother GUI
performance, we recommend using NoMachine remote desktop
to connect to Puhti.
More information about Scalasca can be found in the CSC User Documentation.