
Updated wrap_rrdesi to fix multiple use cases. #2429

Open. Wants to merge 7 commits into base: main.
Conversation

@craigwarner-ufastro (Contributor)

The root bug was that if the number of GPUs available was greater than the size of the communicator, wrap_rrdesi made a bad assumption that you wanted to use at least ngpu ranks. So when calling wrap_rrdesi directly without srun, the communicator size was obviously 1, but there were 4 GPUs on the node, so it split the input files four ways: rank 0 took only 1/4 of them, and there were no other ranks to run the rest.
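In sketch form, the fix caps the number of GPUs used at the communicator size instead of assuming extra ranks exist. The function and variable names here are illustrative, not the actual wrap_rrdesi code:

```python
def effective_gpus(n_gpus_available, comm_size):
    """Cap GPU usage at the number of MPI ranks actually present."""
    # The old behavior assumed at least n_gpus_available ranks would exist;
    # the fix uses at most one GPU per rank in the real communicator.
    n_gpus = min(n_gpus_available, comm_size)
    if n_gpus < n_gpus_available:
        print(f"WARNING: wrap_rrdesi was called with {n_gpus_available} GPUs "
              f"but only {comm_size} MPI ranks.")
        print(f"WARNING: Will only use {n_gpus} GPUs.")
    return n_gpus
```

With a communicator size of 1 and 4 GPUs this yields 1, matching the warning output in the test cases that follow.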

I fixed this, added informative warning messages where appropriate, and cleaned up the login node logic that had been copy/pasted from elsewhere. Here are a bunch of example test cases:

    Run on login node

cdwarner@perlmutter:login16:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
wrap_rrdesi should not be run on a login node.
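The cleaned-up login-node guard can be approximated with a hostname check: Perlmutter login nodes report names like login16 while compute nodes look like nid001173. This heuristic is an assumption for illustration, not necessarily the actual check in the script:

```python
import socket

def on_login_node(hostname=None):
    # Perlmutter login hostnames look like "login16"; compute nodes
    # look like "nid001173". Heuristic for illustration only.
    if hostname is None:
        hostname = socket.gethostname()
    return hostname.startswith("login")
```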

The following were all run after getting an interactive node with

 salloc -N 1 -C gpu -q interactive -t 3:00:00 -A desi_g --gpus-per-node=4

    Run directly - now works with warnings:

cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: Detected that wrap_rrdesi is not being run with srun command.
WARNING: Calling directly can lead to under-utilizing resources.
Recommended syntax: srun -N nodes -n tasks -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi [options]
	Ex: 8 tasks each with GPU support on 2 nodes:
		srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0  wrap_rrdesi ...
	Ex: 64 tasks on 1 node and 4 GPUs - this will run on both GPU and non-GPU nodes at once:
		srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0  wrap_rrdesi ...
WARNING: wrap_rrdesi was called with 4 GPUs but only 1 MPI ranks.
WARNING: Will only use 1 GPUs.
Running 18 input files on 1 GPUs and 1 total procs...
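One way to drive the "not being run with srun" warning above is to check for the step-level environment variables that srun exports to its tasks, such as SLURM_STEP_ID. This is an assumed heuristic; the real script may detect this differently:

```python
import os

def launched_with_srun(env=None):
    # srun exports step-level variables like SLURM_STEP_ID to its tasks;
    # a script run directly inside an allocation does not see them.
    if env is None:
        env = os.environ
    return "SLURM_STEP_ID" in env
```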

    Run with srun and n < ngpu:

cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 2 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: wrap_rrdesi was called with 4 GPUs but only 2 MPI ranks.
WARNING: Will only use 2 GPUs.
Running 18 input files on 2 GPUs and 2 total procs...

    As expected run:

cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 4 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 4 GPUs and 4 total procs...
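The "Running 18 input files on 4 GPUs and 4 total procs" line reflects the input list being partitioned across the working procs. A simple round-robin version of that split (illustrative; the real batching may differ) is:

```python
def split_files(files, n_procs):
    # Round-robin share of the input files across procs; with 18 files
    # and 4 procs the shares have sizes 5, 5, 4, 4.
    return [files[i::n_procs] for i in range(n_procs)]
```

This is exactly the distribution that went wrong before the fix: splitting 4 ways with only 1 rank present left 3/4 of the files unassigned.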

    Run with GPU + CPU:

cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 4 GPUs and 6 total procs...

    Run with -n 64 but --gpuonly

cdwarner@nid001133:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite --gpuonly
Running 18 input files on 4 GPUs and 4 total procs...
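The difference between the GPU+CPU run (6 total procs) and the --gpuonly run (4 total procs) comes down to the role each rank is given. Sketched as a pure function, assuming the lowest ranks drive the GPUs (hypothetical names; the real code also groups the non-GPU ranks into larger CPU worker pools, which this sketch does not show):

```python
def rank_role(rank, n_gpus, gpu_only=False):
    # The lowest ranks each drive one GPU; the remaining ranks become
    # CPU workers unless --gpuonly drops them entirely.
    if rank < n_gpus:
        return "gpu"
    return None if gpu_only else "cpu"
```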

    Run with too many nodes requested (handled by srun):

cdwarner@nid001173:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
srun: error: Only allocated 1 nodes asked for 2

The following were all run after getting an interactive 2 nodes with

salloc --nodes 2 --qos interactive --time 4:00:00 --constraint gpu --gpus-per-node=4 --account desi_g

    Run as expected

cdwarner@nid001048:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 8 GPUs and 8 total procs...

    Run with too few n

cdwarner@nid001048:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 2 -n 6 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: wrap_rrdesi was called with 8 GPUs but only 6 MPI ranks.
WARNING: Will only use 6 GPUs.
Running 18 input files on 6 GPUs and 6 total procs...

The following were run with an interactive node obtained with -n argument:

salloc --nodes 1 -n 128 --qos interactive --time 4:00:00 --constraint gpu --gpus-per-node=4 --account desi_g

    Run directly

cdwarner@nid001133:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
WARNING: Detected that wrap_rrdesi is not being run with srun command.
WARNING: Calling directly can lead to under-utilizing resources.
Recommended syntax: srun -N nodes -n tasks -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi [options]
	Ex: 8 tasks each with GPU support on 2 nodes:
		srun -N 2 -n 8 -c 2 --gpu-bind=map_gpu:3,2,1,0  wrap_rrdesi ...
	Ex: 64 tasks on 1 node and 4 GPUs - this will run on both GPU and non-GPU nodes at once:
		srun -N 1 -n 64 -c 2 --gpu-bind=map_gpu:3,2,1,0  wrap_rrdesi ...
WARNING: wrap_rrdesi was called with 4 GPUs but only 1 MPI ranks.
WARNING: Will only use 1 GPUs.
Running 18 input files on 1 GPUs and 1 total procs...

    Run as expected

cdwarner@nid001133:/global/cfs/cdirs/desi/users/cdwarner/code/desispec/bin> srun -N 1 -n 4 -c 2 --gpu-bind=map_gpu:3,2,1,0  ./wrap_rrdesi -i $MYRRDIR/list_coadds.ascii -o $SCRATCH/wrap/ --gpu --overwrite
Running 18 input files on 4 GPUs and 4 total procs...

Finally, if MPI is not available:

    import sys

    try:
        import mpi4py.MPI as MPI
    except ImportError:
        have_mpi = False
        print("MPI not available - required to run wrap_rrdesi")
        sys.exit(0)
@craigwarner-ufastro (Contributor, Author)

This closes redrock issue desihub/redrock#321.

@sbailey (Contributor)

sbailey commented Dec 19, 2024

@weaverba137 could we get your help on the test configuration? It isn't clear to me why the regular tests get a working environment and pass while the coverage tests don't. In the current state I tried bumping the coverage test DESIUTIL_VERSION to 3.4.3 to give more numpy version flexibility, and temporarily commented out "python -m pip install specutils" in the coverage tests in case specutils restrictions were driving the incompatible environment. No luck. The only remaining configuration difference I see is including pytest-cov and coveralls, both of which really are needed for testing coverage...

@weaverba137 (Member)

There are two separate version incompatibilities in the coverage test. Here's what's happening, based on the most recent GitHub actions configurations on this branch:

  1. When desiutil 3.4.2 is installed, compatible numpy and matplotlib versions are installed, and these are not significantly changed by subsequent installation steps.
  2. When desiutil 3.4.3 is installed, compatible numpy and matplotlib versions are also installed, however a subsequent installation step significantly downgrades the numpy version (1.26 --> 1.22), but does not also downgrade the matplotlib version. Note that the astropy version is also downgraded in the installation to a version that is compatible with the downgraded numpy version.
  3. I believe that doing a "bare" install of specutils without specifying a version installs an incompatible version of scipy, overriding the scipy version installed via requirements.txt (specutils was being installed after requirements.txt). Again, the subsequent downgrades of numpy and astropy result in an incompatibility, since scipy is not also downgraded.

@weaverba137 (Member)

Interesting! Now the failures are due to fitsio not finding the bz2 library. I tried setting specutils<1.15 to match NERSC.

@weaverba137 (Member)

Between yesterday and today, ubuntu-latest went from 22.04 to 24.04, so now it looks like we also need to explicitly set the operating system.

@weaverba137 (Member)

Unfortunately the older version of specutils still brings in an incompatible version of scipy. In order to sort this out, you'll have to think about what the installation of specutils is trying to accomplish.

@sbailey (Contributor)

sbailey commented Dec 20, 2024

Thanks for looking into these packaging version issues. Including specutils here is for including desispec.spectra.Spectra.to_specutils and .from_specutils in the coverage tests. If needed, we could temporarily drop these from the github testing just by not installing specutils; they will still be included in the nightly NERSC tests since we have specutils installed there.

At the same time, this might be a canary in the coal mine about whether we are losing the ability to install a self-consistent dependency stack prior to updating to numpy 2 and the latest astropy. While making those updates would be a good thing, they shouldn't be a blocking factor for this unrelated PR, and I was hoping we'd also be able to maintain support for older numpy and astropy (and be able to test that).
