Updated wrap_rrdesi to fix multiple use cases.
The root bug was that if the number of GPUs available was greater than the size of the MPI communicator, wrap_rrdesi made the bad assumption that you wanted to use at least ngpu ranks. So when calling wrap_rrdesi directly without srun, the communicator size was 1 but there were 4 GPUs in the node, so it split the input files four ways and rank 0 took only 1/4 of them, with no other ranks to run the rest. I fixed this, added informative warning messages where appropriate, and cleaned up the login-node logic that had been copy/pasted from elsewhere.

Here are a number of example test cases.

Run on a login node:

```
cdwarner@perlmutter:login16:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
wrap_rrdesi should not be run on a login node.
```

The following were all run after getting an interactive node with:

```
salloc -N 1 -C gpu -q interactive -t 3:00:00 -A desi_g --gpus-per-node=4
```

Run directly - now works, with warnings:

```
cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: Detected that wrap_rrdesi is not being run with srun command.
WARNING: Calling directly can lead to under-utilizing resources.
Recommended syntax: srun -N nodes -n tasks -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi [options]
Ex: 8 tasks each with GPU support on 2 nodes: srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0 wrap_rrdesi ...
Ex: 64 tasks on 1 node and 4 GPUs - this will run on both GPU and non-GPU nodes at once: srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0 wrap_rrdesi ...
WARNING: wrap_rrdesi was called with 4 GPUs but only 1 MPI ranks.
WARNING: Will only use 1 GPUs.
Running 18 input files on 1 GPUs and 1 total procs...
```
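The corrected accounting can be sketched in a few lines. This is a hypothetical reconstruction for illustration only; `cap_gpus` and `split_files` are made-up names, not the actual wrap_rrdesi internals:

```python
# Sketch of the fix: never assume more GPU workers than the MPI
# communicator actually has ranks for.

def cap_gpus(ngpu_available, comm_size):
    """Use at most comm_size GPUs; warn when ranks are the limiting factor."""
    ngpu = min(ngpu_available, comm_size)
    if ngpu < ngpu_available:
        print(f"WARNING: wrap_rrdesi was called with {ngpu_available} GPUs "
              f"but only {comm_size} MPI ranks.")
        print(f"WARNING: Will only use {ngpu} GPUs.")
    return ngpu

def split_files(files, nworkers):
    """Round-robin the input files over the ranks that actually exist."""
    return [files[i::nworkers] for i in range(nworkers)]

# Before the fix, a direct call (communicator size 1) on a 4-GPU node split
# the 18 files four ways, and ranks 1-3 never existed to take their share.
# With the cap, all 18 files go to the single rank that is really there.
files = [f"coadd-{i}.fits" for i in range(18)]
ngpu = cap_gpus(4, 1)                # capped to 1
chunks = split_files(files, ngpu)
assert len(chunks) == 1 and len(chunks[0]) == 18
```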
Run with srun and n < ngpu:

```
cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 2 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: wrap_rrdesi was called with 4 GPUs but only 2 MPI ranks.
WARNING: Will only use 2 GPUs.
Running 18 input files on 2 GPUs and 2 total procs...
```

Run as expected:

```
cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 4 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 4 GPUs and 4 total procs...
```

Run with GPU + CPU:

```
cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 4 GPUs and 6 total procs...
```

Run with -n 64 but --gpuonly:

```
cdwarner@nid001133:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite --gpuonly
Running 18 input files on 4 GPUs and 4 total procs...
```

Run with too many nodes requested (handled by srun):

```
cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
srun: error: Only allocated 1 nodes asked for 2
```

The following were all run after getting an interactive two-node allocation with:

```
salloc --nodes 2 --qos interactive --time 4:00:00 --constraint gpu --gpus-per-node=4 --account desi_g
```

Run as expected:

```
cdwarner@nid001048:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 8 GPUs and 8 total procs...
```
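The reported proc counts in these runs follow a simple pattern: with --gpuonly only the GPU-driving ranks do work, while otherwise extra ranks can join as CPU workers. A minimal sketch under that reading (`total_procs` and the CPU-worker count are illustrative assumptions; the real rank-grouping logic lives inside wrap_rrdesi):

```python
def total_procs(ngpu_used, ncpu_workers, gpuonly):
    """Total worker processes as reported in the runs above: GPU ranks
    always count; CPU workers join only when --gpuonly is not given."""
    return ngpu_used if gpuonly else ngpu_used + ncpu_workers

# Matches the transcripts: -n 64 reported "4 GPUs and 6 total procs",
# and the same command with --gpuonly reported "4 GPUs and 4 total procs".
assert total_procs(4, 2, gpuonly=False) == 6
assert total_procs(4, 2, gpuonly=True) == 4
```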
Run with too few n:

```
cdwarner@nid001048:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 2 -n 6 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: wrap_rrdesi was called with 8 GPUs but only 6 MPI ranks.
WARNING: Will only use 6 GPUs.
Running 18 input files on 6 GPUs and 6 total procs...
```

The following were run with an interactive node obtained with the -n argument:

```
salloc --nodes 1 -n 128 --qos interactive --time 4:00:00 --constraint gpu --gpus-per-node=4 --account desi_g
```

Run directly:

```
cdwarner@nid001133:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: Detected that wrap_rrdesi is not being run with srun command.
WARNING: Calling directly can lead to under-utilizing resources.
Recommended syntax: srun -N nodes -n tasks -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi [options]
Ex: 8 tasks each with GPU support on 2 nodes: srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0 wrap_rrdesi ...
Ex: 64 tasks on 1 node and 4 GPUs - this will run on both GPU and non-GPU nodes at once: srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0 wrap_rrdesi ...
WARNING: wrap_rrdesi was called with 4 GPUs but only 1 MPI ranks.
WARNING: Will only use 1 GPUs.
Running 18 input files on 1 GPUs and 1 total procs...
```

Run as expected:

```
cdwarner@nid001133:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 4 -c 2 --gpu-bind=map_gpu:3,2,1,0 ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 4 GPUs and 4 total procs...
```

Finally, if MPI is not available, wrap_rrdesi prints a message and exits:

```python
import sys  # added here so the excerpt is self-contained

try:
    import mpi4py.MPI as MPI
except ImportError:
    have_mpi = False
    print("MPI not available - required to run wrap_rrdesi")
    sys.exit(0)
```
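The "not being run with srun" warning seen above can be driven by Slurm's environment variables. A sketch, assuming the documented Slurm behavior that srun exports step-level variables such as SLURM_STEP_ID into each launched task, while a plain salloc shell or a login node does not have them (the function name is illustrative, not the actual wrap_rrdesi code):

```python
import os

def launched_with_srun(environ=None):
    """Heuristic srun detection: step-level Slurm variables only exist
    inside tasks that srun launched."""
    env = os.environ if environ is None else environ
    return "SLURM_STEP_ID" in env

# Illustrative fake environments:
assert launched_with_srun({"SLURM_STEP_ID": "0", "SLURM_PROCID": "0"})
assert not launched_with_srun({"SLURM_JOB_ID": "12345"})  # salloc shell
assert not launched_with_srun({})                         # login node
```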