Setup instructions

Gaudeval edited this page Jul 21, 2022 · 18 revisions

Please note that this project is an active work in progress; documentation and features are subject to frequent changes.

The following guide was produced using the Ubuntu 22.04 Server Linux distribution for the cluster setup. Please adapt commands and configuration parameters to your own distribution. The Controller and Workers should run the same version of Slurm to ensure compatibility (this is easily achieved by using the same Linux distribution across all cluster members).

Cluster roles

The pakupi setup guide refers to a number of roles assumed by machines in the cluster. We recommend the use of distinct machines for each role, to mitigate the impact of reboots or issues during installation. This guide identifies the following roles:

  • Host: the machine used to configure the cluster
  • Controller: the main node for providing cluster services and interacting with the cluster
  • Worker: any node whose processing power is made available to the cluster

Preparing the Host

The Host is the main entry point to the cluster. It will ensure all machines in the cluster are correctly configured through Ansible. The Host should have access to the Internet to download the required packages, and network access to the cluster.

  • Install Ansible: Ansible is an IT automation framework which allows users to manage an inventory of machines, configure systems, and deploy software. The pakupi setup process relies on Ansible to check the configuration of the cluster, and run additional scripts if required. For more information on how to set up Ansible, see the Ansible installation guide. Under Ubuntu 22.04 you can install Ansible using the command: apt install ansible.

    CHECK: Run the following command in your terminal to ensure Ansible is correctly installed: ansible --version

  • Install pakupi Ansible requirements: The pakupi setup process uses community-provided Ansible roles and collections (see the Ansible user guide on roles). The requirements are listed in the requirements-galaxy.yml file at the root of the repository. The packages are available for download and review through Ansible Galaxy. You can install the required packages using the command: ansible-galaxy install -r requirements-galaxy.yml.

    CHECK: Running the installation command ansible-galaxy install -r requirements-galaxy.yml again, once the packages are installed, should report Nothing to do.

  • Prepare an SSH key: SSH allows remote connections to the machines in the cluster, and it is used by Ansible to configure the system. Authentication through SSH can be password-based, using the target account password, or key-based, using a list of authorised public keys. Key-based authentication is recommended as it does not prompt the user for a password. To generate a new SSH key, use the ssh-keygen command. By default, the public key is written to ~/.ssh/id_rsa.pub and the private key (which should not be distributed) to ~/.ssh/id_rsa.

    If you reuse an existing key, or specify a different target file for the generated key using the -f flag of ssh-keygen, remember to add the key to your SSH agent (see man ssh-add).

    CHECK: Your public and private key files exist (respectively under ~/.ssh/id_rsa.pub and ~/.ssh/id_rsa by default).
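
The three CHECK items above can be gathered into a small preflight script run on the Host. The following is a minimal sketch, not part of pakupi: the function names (check_ansible, check_requirements_file, check_ssh_key, report, host_preflight) are hypothetical, and the defaults assume the script is run from the repository root with the key pair in its default location. It only verifies that the expected command and files are present; it does not install anything or contact Ansible Galaxy.

```shell
#!/bin/sh
# Host preflight sketch mirroring the CHECK items of "Preparing the Host".
# All function names are illustrative and are not provided by pakupi.

# CHECK 1: the ansible command is available on PATH.
check_ansible() {
    command -v ansible >/dev/null 2>&1
}

# CHECK 2: the Galaxy requirements file exists.
# Defaults to requirements-galaxy.yml in the current directory.
check_requirements_file() {
    [ -f "${1:-requirements-galaxy.yml}" ]
}

# CHECK 3: both the private key and its .pub counterpart exist.
# Defaults to the ssh-keygen location named in this guide.
check_ssh_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    [ -f "$key" ] && [ -f "$key.pub" ]
}

# Run one check and print "ok: <label>" or "FAIL: <label>".
report() {
    label="$1"
    shift
    if "$@"; then
        echo "ok: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Run every check; return non-zero if at least one failed.
# Usage: host_preflight [requirements-file] [private-key-path]
host_preflight() {
    status=0
    report "ansible on PATH" check_ansible || status=1
    report "requirements file present" check_requirements_file "${1:-}" || status=1
    report "SSH key pair present" check_ssh_key "${2:-}" || status=1
    return "$status"
}
```

After sourcing the file, calling host_preflight from the repository root prints one line per check. A FAIL line points back to the corresponding step above (apt install ansible, cloning the repository, or ssh-keygen).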

Preparing the Controller

Setup OS

Setup Network

Add to inventory

Setup DHCP server

Preparing a Worker

Setup OS

Add SSH key

Setup Network

Identify MAC address

Add to the inventory

Quickstart

  • Install OS on master (Ubuntu Server 22.04):
    • Fixed IP
    • Open SSH
    • Keep user/password
  • Prepare master for Ansible:
    • Deploy ssh key
  • Install Ansible on host
    • Install required packages on host: ansible-galaxy install -r requirements-galaxy.yml
  • Update inventory to include master IP
  • Run DHCP setup playbook (01)
  • TODO: How to add new worker/client to the inventory/dhcp lease list?
  • Install OS on worker (Ubuntu Server 22.04):
    • Keep user/password
    • Setup worker for SSH access
    • Setup DHCP on Worker (/boot/firmware/network-config)
  • Run NIS setup playbook (02)
  • Run Slurm setup playbook (03)
  • Run NFS setup playbook (04)
  • Run worker configuration playbook (05)
  • Run Grafana setup playbook (06)
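
The numbered playbooks in the list above are meant to run in order, and a failed step should stop the sequence before later steps depend on it. A minimal sketch of such a runner follows. The function name run_playbooks and the PLAYBOOK_CMD override are hypothetical, and the inventory and playbook paths are passed as arguments because this page does not give their file names.

```shell
#!/bin/sh
# Sketch: run a list of playbooks in order against one inventory,
# stopping at the first failure. Not part of pakupi.
# PLAYBOOK_CMD is a hypothetical override used for dry runs; it
# defaults to the real ansible-playbook command.

# Usage: run_playbooks <inventory> <playbook> [<playbook> ...]
run_playbooks() {
    inventory="$1"
    shift
    for playbook in "$@"; do
        # Progress goes to stderr so stdout carries only the runner's output.
        echo "==> $playbook" >&2
        "${PLAYBOOK_CMD:-ansible-playbook}" -i "$inventory" "$playbook" || return 1
    done
}
```

Pass the playbooks in the order given by the Quickstart (01 to 06), using the paths from your checkout of the repository.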