Is it possible to modify the slurm.conf files on the compute nodes of a "hpc-slurm"-based cluster? #2559
Replies: 9 comments 1 reply
-
Hi @noahharrison64,
I don't know your use case, but would using an "exclusive" partition work for you?
Manually modifying slurm.conf on the running nodes isn't recommended. If you want to have a custom slurm.conf, you can supply your own template to the controller module via the slurm_conf_tpl setting.
"Re-terraforming" is the preferable way of making modifications (though it doesn't always work as expected).
-
Hi @mr0re1, Thanks for your reply.
I don't see why this wouldn't work for my use case! How would you go about achieving this? My current toolkit blueprint is based off the hpc-slurm blueprint.
So would I create a custom slurm.conf file based on the existing one but with RebootProgram set, save it in my Cloud Shell workspace, and supply the path to this file as the variable slurm_conf_tpl?
Cheers,
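For example, the copied template might differ from the stock one only by the reboot line mentioned above (sketch; everything else in the template left as shipped):

```
# Added to a copy of the default slurm.conf template; all other lines unchanged.
RebootProgram=/sbin/reboot
```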
-
Giving it a second thought, an "exclusive" partition may not be the best solution for your problem; there would still be a chance of consecutive execution of jobs (back-to-back) on the same node.
Yes, please either use an absolute path to the file or stage it so it is available at deployment time.
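For reference, the "exclusive" partition discussed here corresponds roughly to a partition setting like the following (sketch only, assuming the SlurmGCP V5 partition module; ids are placeholders):

```yaml
# Sketch only: exclusive gives jobs dedicated nodes (no sharing between
# concurrently running jobs), but as noted above it does not guarantee a
# fresh node for every job.
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v5-partition
  settings:
    partition_name: compute
    exclusive: true
```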
-
Hi @mr0re1, Thanks for the info. I'm hoping I've managed to implement a fix that avoids the need to reboot. I'm running some tests over the weekend; if they fail, I'll have a look at the advice you've given in more detail so I can properly modify the compute node conf files.
Thanks,
-
Hi @mr0re1, Do you suggest supplying the slurm_conf_tpl filepath in the blueprint / slurm-controller / settings section? Will this automatically be propagated to the compute nodes' slurm.conf files even if we just update the slurm-controller module?
Also, is it possible to save the current state of my compute cluster login node (i.e. the file system) and then load this when I re-terraform the new compute partition group (which I assume will be necessary, since the controller node will need updating)?
-
Also, is the ghpc_stage function available now?
-
Not yet, it will be available in the next release, though you can use an absolute path in the meantime. I didn't realize you use SlurmGCP V5; I would recommend switching to V6. The reconfigure in V6 works much better.
Yes, the slurm_conf_tpl filepath goes in the blueprint / slurm-controller / settings section.
Yes, it will be propagated to the compute nodes. Please let us know if it doesn't WAI (work as intended); once again, I would advise switching to V6.
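Once ghpc_stage ships, usage would look roughly like this (sketch only, assuming the V6 controller module accepts slurm_conf_tpl the same way V5 does; the file name is a placeholder):

```yaml
# Sketch only: ghpc_stage copies a local file into the deployment folder so
# modules can reference it relative to the deployment.
- id: slurm_controller
  source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
  settings:
    slurm_conf_tpl: $(ghpc_stage("slurm.conf.tpl"))
```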
-
UPD.
-
Thanks @mr0re1
I more meant: is it possible to save the current state of my compute cluster before re-terraforming and making a new cluster? I'd like to be able to directly transfer the files that already exist onto my new login node when it is created. If I understand correctly, your suggestion would only allow me to save the state of any new clusters created with "disk_auto_delete: false", rather than the one I have running at the moment.
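For reference, the setting mentioned above would look roughly like this in a blueprint (sketch only, assuming the SlurmGCP V5 login module exposes disk_auto_delete; ids are placeholders):

```yaml
# Sketch only: keeps the login node's boot disk when the VM is destroyed,
# so its contents can be copied or reattached to a new instance later.
- id: slurm_login
  source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
  use: [network1, slurm_controller]
  settings:
    disk_auto_delete: false
```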
-
Hi,
I'm sure this is quite a naive question since I don't have a deep understanding of slurm or the hpc-toolkit.
I want to force my compute nodes to reboot after each job is completed. I assumed this would be somewhat trivial, and attempted to achieve it by running sbatch with the --reboot flag. I also modified the slurm.conf file on the controller node to include a RebootProgram=/sbin/reboot line. However, when testing this, the jobs appeared to get stuck in a continuous "CF/CONFIGURING" state. Checking the slurmctld.log I saw the following line:
This made me realise I probably need to update the slurm.conf file on the compute nodes. Since these nodes are dynamic and booted from an image when requested, it's unclear to me how I should go about this process. This leads me to 3 questions:
Many thanks,
Noah