How do I reduce Pilgrim overhead? #38

michael-beebe · 2023-08-11T18:14:18Z

Every scientific application (LAMMPS, WarpX) I try to run with Pilgrim fails to finish. For example, I used a very simple Lennard-Jones (LJ) LAMMPS potential with a very small problem size that finishes in about 15 seconds on a single node. When I turn on Pilgrim like so:

`#!/bin/bash -l
#SBATCH ...

ml load cray-mpich/8.1.25
ml load PrgEnv-gnu/8.3.3

export PILGRIM_INSTALL=""
export PILGRIM_DEBUG=0
export PILGRIM_TIMING_MODE=ZSTD # or LOSSLESS, or AGGREGATED; I've tried them all
export PILGRIM_TRACING=ON
export PILGRIM_TRACING_MODE=DEFAULT
pilgrim_flags="--export=ALL,LD_PRELOAD=${PILGRIM_INSTALL}/.libs/libpilgrim.so"

EXE=../bin/warpx.3d.MPI.CUDA.DP.PDP.OPMD.QED
INPUTS=./inputs

export MPICH_OFI_NIC_POLICY=GPU
GPU_AWARE_MPI="amrex.use_gpu_aware_mpi=1"
SRUN_FLAGS="--cpus-per-task=16 --cpu-bind=cores"

srun --cpu-bind=cores $pilgrim_flags bash -c "
export CUDA_VISIBLE_DEVICES=$((3-SLURM_LOCALID));
${EXE} ${INPUTS} ${GPU_AWARE_MPI}" \

${PILGRIM_INSTALL}/pilgrim2text ./pilgrim-logs`

The job times out after 3 hours. Any suggestions to reduce the overhead so I can get the job to finish? I have been successful with small test cases but not with any "real world" apps.

wangvsa · 2023-08-11T19:45:34Z

Did you see any output suggesting the simulation was still running? Going from 15 seconds to 3 hours doesn't look like an overhead issue; it is more likely a deadlock or blocking bug in Pilgrim. Can you share your LAMMPS configuration file? I could try it on my side.
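One rough way to tell "slow" from "stuck" is to watch whether the job's output file is still growing. The snippet below is only a sketch under that assumption: `log_is_growing` is a hypothetical helper, not part of Pilgrim, and it takes the path of whatever file your job writes its output to plus an observation window in seconds.

```shell
#!/bin/bash
# Hypothetical helper (not part of Pilgrim): succeed if FILE gets larger
# during an observation window of INTERVAL seconds (default 5).
# Returns 0 if the file grew, 1 if it did not, 2 if it could not be read.
log_is_growing() {
    local file=$1 interval=${2:-5}
    local before after
    before=$(wc -c < "$file") || return 2
    sleep "$interval"
    after=$(wc -c < "$file") || return 2
    # Arithmetic comparison tolerates the padding some wc versions print.
    (( after > before ))
}
```

For example, `log_is_growing "$LOG" 30 && echo "still producing output" || echo "no new output, possibly hung"`, where `$LOG` stands for your job's output file. A quiet log does not prove a deadlock (with `thermo 1000` and a 10-step run LAMMPS prints very little), so treat the result as a hint only.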

michael-beebe · 2023-08-11T23:31:54Z

Sure thing! Here is the input I mentioned:

`# 3d Lennard-Jones melt

variable N index off # Newton Setting
variable w index 0 # Warmup Timesteps
variable t index 10 # Main Run Timesteps
variable m index 1 # Main Run Timestep Multiplier
variable n index 0 # Use NUMA Mapping for Multi-Node
variable p index 0 # Use Power Measurement

variable x index 1
variable y index 1
variable z index 1

variable xx equal $x
variable yy equal $y
variable zz equal $z
variable rr equal floor($t*$m)

newton $N
if "$n > 0" then "processors * * * grid numa"

units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 ${xx} 0 ${yy} 0 ${zz}
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve
thermo 1000

if "$p > 0" then "run_style verlet/power"

if "$w > 0" then "run $w"
run ${rr}`

wangvsa · 2023-08-15T18:00:30Z

I just tried your input and a few other LAMMPS configurations, and they all worked fine on my side.
Which machine were you using? It's unlikely to be the issue, but could you try some applications without GPU?
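When trying a CPU-only run, it may help to put a short wall-clock limit on it so a hang shows up in minutes rather than after a 3-hour job timeout. Here is a minimal sketch using the coreutils `timeout` command, which exits with status 124 when the limit is hit; `run_bounded` is a hypothetical wrapper name, not something Pilgrim provides.

```shell
#!/bin/bash
# Hypothetical wrapper: run a command under a wall-clock limit (seconds)
# and print a one-line classification of how it ended.
# Relies on coreutils `timeout`, which exits with 124 on timeout.
run_bounded() {
    local limit=$1
    shift
    timeout "$limit" "$@"
    local status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s"
    elif [ "$status" -eq 0 ]; then
        echo "finished"
    else
        echo "failed with exit status $status"
    fi
    # Hand the original status back to the caller.
    return "$status"
}
```

A call could look like `run_bounded 300 srun $pilgrim_flags lmp -in in.lj`, reusing `$pilgrim_flags` from the script above; the binary name `lmp`, the input name `in.lj`, and the 300-second limit are assumptions to adapt to your build. If the same command finishes without `$pilgrim_flags` but times out with it, that points toward a hang rather than overhead.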
