[Question] How would one use MPI with the ReplicaExchangeSampler? #738
I would also be interested in this. I am trying to run a ReplicaExchangeSampler using multiple nodes with one GPU each for free energy calculations.
Update: I managed to get it working. I now get around 1700 ns/day using 4 nodes and 8 replicas, which is very close to perfect scaling. The only thing I needed to change was the `group_size` parameter inside the openmmtools code (see `openmmtools/multistate/multistatesampler.py`, lines 1301 to 1302 at commit 9fc8ab7).
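Schematically, the change amounts to something like this (a sketch of the `mpiplus.distribute` call, not the verbatim openmmtools source):

```python
# Before (sketch): group_size defaults to None and mpiplus tries to infer it.
mpiplus.distribute(self._propagate_replica, range(self.n_replicas),
                   send_results_to=None)

# After: force exactly one replica-propagation task per MPI process.
mpiplus.distribute(self._propagate_replica, range(self.n_replicas),
                   send_results_to=None, group_size=1)
```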
Simply append `group_size=1` to the function call. As far as I can see, it otherwise calls the function with `group_size=None`, and mpiplus tries to figure it out by itself; in my case that led to the behavior described above. I am not sure if that is intended, or if I'm just missing some kind of environment variable, but it seems to work for me. If desired, I can submit a pull request for a corresponding change. Anyway, here is the code I came up with so far:

```python
from mpi4py import MPI  # importing mpi4py.MPI initializes the MPI environment
from openmm import unit, XmlSerializer, app
import openmm as mm
from openmmtools import states, mcmc, multistate
import numpy as np
import os
# Deserialize system and load pdb file
system = XmlSerializer.deserializeSystem(open('system.xml', 'r').read())
pdbFile = app.PDBFile('eq.pdb')
n_replicas = 8
lambdas = np.round(np.linspace(0, 1, n_replicas), 8)
class LambdaState(states.GlobalParameterState):
    lambda_en = states.GlobalParameterState.GlobalParameter('lambda_en', 0.)
# Create thermodynamic states
thermodynamic_states = []
for lambda_value in lambdas:
    thermodynamic_state = states.ThermodynamicState(system=system, temperature=300*unit.kelvin)
    lambda_state = LambdaState.from_system(system)
    lambda_state.lambda_en = lambda_value
    compound_state = states.CompoundThermodynamicState(thermodynamic_state, composable_states=[lambda_state])
    thermodynamic_states.append(compound_state)
# MCMC move setup
mcmc_move = mcmc.LangevinDynamicsMove(timestep=2*unit.femtosecond, n_steps=50000)
# ReplicaExchangeSampler setup
simulation = multistate.ReplicaExchangeSampler(
    mcmc_moves=mcmc_move,
    number_of_iterations=10,
    online_analysis_interval=1,
    online_analysis_target_error=0.,
    replica_mixing_scheme='swap-neighbors',
)
reporter = multistate.MultiStateReporter('output.nc', checkpoint_interval=1)
# Initialize simulation
simulation.create(
    thermodynamic_states=thermodynamic_states,
    sampler_states=states.SamplerState(pdbFile.positions, box_vectors=system.getDefaultPeriodicBoxVectors()),
    storage=reporter,
)
# Run simulation
simulation.equilibrate(2)
simulation.run()
```

which I submit to our SLURM cluster with the following script:

```bash
#!/bin/bash
#SBATCH --job-name=hrex_test
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --time=24:00:00
#SBATCH --gpus-per-task=1
#SBATCH --partition=NNNGN
source /home/schuhmarc/anaconda3/etc/profile.d/conda.sh
conda activate openMMCluster
mpirun python hrex_cluster_test.py
```

(I have also tested this on CPU nodes and it seems to work there as well.) Note that I am not experienced with MPI at all, so take all this with a grain of salt. I would be very happy about additional input here as well.
I tested out the modification in multistatesampler.py, and with the settings and software on my cluster it did not seem to lead to the correct behavior with just a plain `mpirun` launch. I'm wondering if, in your case, you can get the correct scaling this way because you seem to have only 1 GPU per node on your cluster? If you want to use MPI within a node, you have to use a command similar to this:
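A sketch of the kind of command meant here, assuming OpenMPI (`hostfile` and `appfile` are the file names used below):

```bash
mpirun --hostfile hostfile --app appfile
```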
Where the hostfile contains the name of the node repeated once for each process:
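For example, for three processes on a single node (`node001` is a hypothetical node name):

```
node001
node001
node001
```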
And the appfile contains:
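A sketch of such an appfile: one line per process, each exporting a `CUDA_VISIBLE_DEVICES` mask so that the process sees exactly one GPU (the script name is assumed from the earlier comment):

```
-np 1 -x CUDA_VISIBLE_DEVICES=0 python hrex_cluster_test.py
-np 1 -x CUDA_VISIBLE_DEVICES=1 python hrex_cluster_test.py
-np 1 -x CUDA_VISIBLE_DEVICES=2 python hrex_cluster_test.py
```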
Here I've got 3 GPUs in one node, so I'm launching 3 processes, each explicitly allocated one GPU. This approach should also work fine for multiple nodes (depending, of course, on cluster settings); we just have to launch one process per GPU. I've made a simple bash script to generate files in the correct format for my hardware/MPI setup, with 3 GPUs/node and OpenMPI. For MPICH, you can use the clusterutils package (also from the Chodera lab) to generate MPICH-compatible hostfiles and configfiles; it should detect the number and configuration of GPUs automatically, and the command becomes something like:
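A sketch following the clusterutils documentation pattern (the script name is again an assumption):

```bash
# Generate a hostfile and configfile for the current allocation
build_mpirun_configfile "python hrex_cluster_test.py"
# Launch through MPICH's Hydra process manager with the generated files
mpiexec.hydra -f hostfile -configfile configfile
```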
An important point is that you need to install/load an mpi4py version that's compatible with your cluster's MPI build for this to work properly (one quick way to check is shown after this paragraph). I haven't tested it, but one should be able to use a similar approach to get this working with the CPU platform. Regarding the Python script, you use the same script whether running with MPI or not. I've used openmmtools for repex without MPI quite a lot because this MPI setup can be quite fiddly; the Python scripts are identical in both cases.
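A generic snippet (not specific to this setup) to print which MPI library your mpi4py was built against:

```bash
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
```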
That is true; we only have single-GPU nodes in our cluster. If you look at previous issues, it seems that otherwise some kind of masking is indeed necessary. While investigating, I have also tried running with the option […]. With the modification, I get about 1150 ns/day on three nodes; without it, performance drops to 576 ns/day on the same three nodes. I am not sure you need to define the hosts you are running on: per the mpirun documentation, this should not be necessary in a SLURM environment (see here). Note that this should also be true for older versions (the version we are using is 3.1.3). On that note, I was not able to start my jobs with […]. However, there is also some interesting discussion about this in #713, including a script.
Just note that it has not been updated for 7 years already. Anyway, could you maybe check whether my proposed change above makes a difference in performance for you, in both single-node multi-GPU and multi-node single-GPU setups? I don't have access to any multi-GPU nodes at the moment.
Hi there!
I've been futzing around with the ReplicaExchangeSampler class, trying my best to figure out how the Python code in my main file should be laid out to actually run a simulation in MPI. Is it possible to get a minimal working example of a simulation running in MPI so that I can adjust it to fit my needs?
For context, I would be running a simulation on the SLURM job management system, parallelized across CPU cores. GPUs would not be a part of this simulation.
Thank you in advance!