Question about the Parallelization #358
-
Dear all, when I perform an NPT calculation via SSCHA+QE, I run into some trouble with the parallelization. First, I followed the tutorial for automatic submission to a cluster, but it failed because of the protection set up on the HPC cluster. So instead I requested several nodes directly in the slurm script and launched the calculation with mpirun. See the following script for details:
where the npt_relax.py contains the following command:
After submitting the job, 4 nodes are allocated successfully and SSCHA creates 4 input and output files in the calculation folder. Unfortunately, only one output file is actually being updated; the others stay at 0 bytes. When I ssh into the other nodes and check the processes, I find no running process on the other 3 nodes. Why does this happen? By the way, I checked the same operation on the login node, and it works well with the following code:
Besides, I also tried submitting 4 python jobs on a single node: again 4 input and output files are created, but only one output file is updated. So this seems to indicate a problem with the slurm scheduling system? How can I fix it? In addition, since sscha creates the input files correctly with mpirun -np NPROC python *.py, can I run the scf calculations on these input files separately, then collect the results for the sscha step and create the input files for the next iteration? If that is possible, how should the sscha calculation be set up? I have already done some minimisations in a similar way, but I am curious whether this approach can also be applied to NPT calculations. Any suggestions and discussions will be greatly appreciated. Thanks and Regards!
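For reference, the kind of manual split I have in mind is roughly the ensemble save/load pattern from the python-sscha tutorials; the dynamical-matrix prefix, nqirr, temperature and number of configurations below are just placeholders, and I am not sure whether the same pattern also covers the cell degrees of freedom of an NPT run:

```python
import cellconstructor as CC
import cellconstructor.Phonons
import sscha, sscha.Ensemble

# Load the current dynamical matrices (placeholder prefix / nqirr)
dyn = CC.Phonons.Phonons("dyn_start", nqirr=4)

# Generate the stochastic ensemble and dump the configurations to disk
ens = sscha.Ensemble.Ensemble(dyn, T0=300, supercell=dyn.GetSupercell())
ens.generate(N=100)
ens.save("data_ensemble_manual", population=1)

# ... run the pw.x scf calculations for these configurations separately ...

# Reload the ensemble together with the computed energies and forces,
# then feed it to the minimizer to produce the dyn for the next population
ens.load("data_ensemble_manual", population=1, N=100)
```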
-
Dear Jianguo Si,
We recently implemented a workaround to the protection from the cluster. (This is available only from the repository's master branch; it has not yet been released to PyPI, so you must clone the latest development version from github. We will release it in version 1.5 in the next few months.)
Instead of using the Cluster class, you can replace it with LocalCluster. You initialize my_cluster exactly as done in the standard Cluster calculation. You can find here an example of how it works. You need to readapt the script a bit (changing the modules to load quantum espresso and the keywords for submitting jobs...). In this way you can submit your SSCHA calculation as a long serial job, and it will automatically submit subjobs within the cluster for higher parallelization. This correctly exploits MPI and all parallelization.
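Roughly, the setup could look like the sketch below, assuming LocalCluster is exposed from sscha.Cluster and accepts the same attributes as the standard Cluster class; the workdir, account, modules, resources and binary line are placeholders to adapt to your machine (double-check everything against the linked example):

```python
import sscha, sscha.Cluster

# LocalCluster instead of Cluster: jobs are submitted from the machine where
# this script runs, so no ssh connection is needed (assumed constructor).
my_cluster = sscha.Cluster.LocalCluster()

my_cluster.workdir = "/scratch/myuser/sscha_run"   # placeholder scratch path
my_cluster.account_name = "my_account"             # placeholder slurm account
my_cluster.n_nodes = 1
my_cluster.n_cpu = 40
my_cluster.time = "02:00:00"
my_cluster.job_number = 10     # placeholder: jobs submitted at the same time
my_cluster.batch_size = 10     # placeholder: configurations grouped per job
my_cluster.load_modules = """
module load quantum-espresso
"""
# Command used for each scf run; prepend your mpirun/srun launcher as needed
my_cluster.binary = "pw.x -npool NPOOL -i PREFIX.pwi > PREFIX.pwo"
my_cluster.setup_workdir()

# Then pass it to the relax object exactly as with the standard Cluster, e.g.
# relax = sscha.Relax.SSCHA(minim, ase_calculator=espresso_calc,
#                           N_configs=N_CONFIGS, max_pop=MAX_POP,
#                           cluster=my_cluster)
# relax.vc_relax(target_press=0)   # variable-cell (NPT) relaxation
```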
-
Re-post!
-
Dear Dr. Lorenzo Monacelli, I have uploaded the python script and the log files as attachments (named 1113.zip), please check them. Besides, I used the Cluster module to define the HPC-related information. For me, there are two HPC machines available. I submit the python script at the login node of HPC-1 by
Thanks & Regards!