Slurm nodes installation under Ubuntu16.04 #2

Open
wants to merge 9 commits into
base: master

Conversation

jianlianggao

These files can be used to install Slurm nodes on Ubuntu 16.04. The Slurm master node is biomedia03.

Member

@jopasserat jopasserat left a comment

A couple of changes; the most critical are:

  • use Jinja templates
  • use Pillar

{% set rocsList = [] %}
{% for node, values in pillar['slurm']['nodes']['batch']['cpus'].items() %} {% if node.startswith('roc') %} {% set rocsListTrash = rocsList.append(node) %} {% endif %} {% endfor %}
NodeName=biomedia01 RealMemory=64000 CPUs=24 State=UNKNOWN

Member

This should be part of the Jinja template.
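
As a rough sketch of what that could look like, the NodeName lines might be generated by looping over the same pillar mapping the template already reads. The keys 'mem' and 'cores' are hypothetical: the real pillar layout is not shown in this thread, so treat them as placeholders.

```jinja
{# Sketch only. Assumes each node entry under
   pillar['slurm']['nodes']['batch']['cpus'] is a mapping with the
   hypothetical keys 'mem' (in MB) and 'cores'; adjust to the real pillar. #}
{% for node, values in pillar['slurm']['nodes']['batch']['cpus'] | dictsort %}
NodeName={{ node }} RealMemory={{ values['mem'] }} CPUs={{ values['cores'] }} State=UNKNOWN
{% endfor %}
```

Sorting with dictsort keeps the rendered file stable between runs, which avoids spurious diffs when the state is applied.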

NodeName=monal01 RealMemory=80000 CPUs=12 Gres=gpu:8 State=UNKNOWN

# Partitions
PartitionName=long Nodes=biomedia01,biomedia02,biomedia05,biomedia06,biomedia07,biomedia08,biomedia09,biomedia10,roc01,roc02,roc03 Default=YES MaxTime=43200
Member

All roc machines should be in the long partition as well. It's fine to have two overlapping partitions.
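
One possible way to keep the roc machines in the long partition without listing hosts by hand is to build the node list from Pillar. This is only a sketch: it assumes that every key under pillar['slurm']['nodes']['batch']['cpus'] (roc machines included) belongs in long, and batchNodes is a name introduced here for illustration.

```jinja
{# Sketch only. Assumes every node listed under
   pillar['slurm']['nodes']['batch']['cpus'] should be part of "long". #}
{% set batchNodes = pillar['slurm']['nodes']['batch']['cpus'].keys() | sort %}
PartitionName=long Nodes={{ batchNodes | join(',') }} Default=YES MaxTime=43200
```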

@@ -6,7 +6,7 @@ ArchiveSuspend=no
#ArchiveScript=/usr/sbin/slurm.dbd.archive
#AuthInfo=/var/run/munge/munge.socket.2
AuthType=auth/munge
DbdHost={{ pillar['slurm']['controller'] }}
DbdHost=biomedia03
Member

No hardcoded values please: use Pillar.

init.sls Outdated
{% endif %}
install slurm packages from local repo:
pkg.installed:
- sources:
Member

nice 👍

@@ -9,4 +9,4 @@
# who wants to be able to SSH in as root via public-key on Biomedia servers.
# disable SSH for anybody but root
+:root:ALL
-:ALL EXCEPT (csg) dr jpassera bglocker:ALL
-:ALL EXCEPT (csg) (biomedia) dr jpassera bglocker jgao:ALL
Member

Regular users shouldn't have SSH access to the cluster nodes, hence the previous config.

ConstrainSwapSpace=yes
AllowedSwapSpace=10.0
# Not well supported until Slurm v14.11.4 https://groups.google.com/d/msg/slurm-devel/oKAUed7AETs/Eb6thh9Lc0YJ
#ConstrainDevices=yes
Member

Should that be enabled and not commented out then?

@@ -3,7 +3,7 @@
# See the slurm.conf man page for more information.
#
# Workaround because Slurm does not recognize full hostname...
ControlMachine={{ pillar['slurm']['controller'] }}
ControlMachine=biomedia03
Member

Pillar

@@ -1,3 +0,0 @@
#!/bin/bash
Member

This script is useful for adding new members. It should stay.

@@ -1,13 +0,0 @@
#!/bin/bash
Member

Script used to bootstrap new minions. Any reason to remove it?

* Slurm
* Slurm Database
* SSH
To install Slurm nodes, you need to copy (on Slurm master node)
Member

Maybe move that lower in an "Instructions" subsection instead of replacing what the formula contains?

@jianlianggao
Author

jianlianggao commented Apr 30, 2018 via email

PartitionName=rocsShort Nodes={{ ','.join(rocsList) }} Default=NO MaxTime=60 Priority=5000
#PartitionName=long Nodes=biomedia01,biomedia02,biomedia03,biomedia05 Default=YES MaxTime=43200
#PartitionName=short Nodes=biomedia01,biomedia03,biomedia05 Default=NO MaxTime=60 Priority=5000
PartitionName=gpus Nodes=monal01 Default=NO MaxTime=10080 MaxCPUsPerNode=4 MaxMemPerNode=30720
Member

@jopasserat jopasserat May 1, 2018

You can remove the MaxCPUsPerNode=4 MaxMemPerNode=30720 settings. They are just legacy code.

@jianlianggao
Author

jianlianggao commented May 2, 2018 via email

StoragePass={{ pillar['slurm']['db']['password'] }}
StorageLoc=slurmdb
StorageUser=slurm
StoragePass=1BUy4eVv7X
Member

Password in cleartext in the commit history...
