Specific instructions to run harvester with SSH RPC middleware

FaHui Lin edited this page May 23, 2019 · 5 revisions

Instructions

Basic description of SSH RPC middleware: see here

The queue configuration depends on the architecture of the cluster (HPC) and the use case.

Generic use case

Consider an HPC cluster with the following properties:

  • No outbound connectivity on login nodes and worker nodes
  • One can access login nodes via SSH
  • Login nodes have the same environment and mount the same shared filesystem as worker nodes do
  • Service processes are allowed to run on login nodes (no CPU-time limit per process or other limitations)
  • DTNs (data transfer nodes) are available, with outbound connectivity and grid data transfer tools (globus, gfal, xroot, etc.)
  • DTNs are accessible from login nodes

Then it suffices to run the harvester rpc_bot process on a login node of the HPC and let all harvester plugins run on that login node, i.e. remotely from the point of view of the harvester instance.

That is, harvester runs the following plugins remotely:

  • submitter
  • monitor
  • sweeper
  • messenger
  • preparator
  • stager
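
As a rough illustration of how this list maps onto a queue configuration, the Python sketch below scans a configuration dictionary for sections that carry "middleware": "rpc". The helper name remote_plugins, the trimmed sample configuration and the "someLocalPlugin" section are hypothetical and not part of harvester; only the "middleware" key and the plugin names are taken from this page.

```python
import json

# Hypothetical, trimmed-down queue configuration used only for illustration.
# "someLocalPlugin" is an invented section without any middleware setting.
SAMPLE_QUEUE_CONFIG = json.loads("""
{
    "submitter": {"name": "SlurmSubmitter", "middleware": "rpc"},
    "monitor": {"name": "SlurmMonitor", "middleware": "rpc"},
    "someLocalPlugin": {"name": "SomeLocalPlugin"},
    "rpc": {"name": "RpcHerder", "remoteHost": "some.remote.host"}
}
""")


def remote_plugins(queue_config, middleware="rpc"):
    """Return the sorted names of sections that reference the given middleware."""
    return sorted(
        section
        for section, params in queue_config.items()
        if isinstance(params, dict) and params.get("middleware") == middleware
    )


if __name__ == "__main__":
    # Prints ['monitor', 'submitter']
    print(remote_plugins(SAMPLE_QUEUE_CONFIG))
```

A check of this kind can help confirm that every plugin meant to run on the login node actually has the middleware entry set.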

The queue configuration (partial) may look like this:

                "preparator": {
                        "name": "SomePreparator",
                        "module": "pandaharvester.harvesterpreparator.some_preparator",
                        "basePath": "/some/remote/base/path",
                        "middleware": "rpc"
                },
                "submitter": {
                        "name": "SlurmSubmitter",
                        "module": "pandaharvester.harvestersubmitter.slurm_submitter",
                        "nCore": 9600,
                        "nCorePerNode": 48,
                        "templateFile": "/some/remote/template.sh",
                        "middleware": "rpc"
                },
                "messenger": {
                        "name": "SharedFileMessenger",
                        "module": "pandaharvester.harvestermessenger.shared_file_messenger",
                        "accessPoint": "/some/remote/path/${workerID}",
                        "middleware": "rpc"
                },
                "stager": {
                        "name": "SomeStager",
                        "module": "pandaharvester.harvesterstager.some_stager",
                        "middleware": "rpc"
                },
                "monitor": {
                        "name": "SlurmMonitor",
                        "module": "pandaharvester.harvestermonitor.slurm_monitor",
                        "middleware": "rpc"
                },
                "sweeper": {
                        "name": "SlurmSweeper",
                        "module": "pandaharvester.harvestersweeper.slurm_sweeper",
                        "middleware": "rpc"
                },
                "rpc": {
                        "name": "RpcHerder",
                        "module": "pandaharvester.harvestermiddleware.rpc_herder",
                        "remoteHost": "some.remote.host",
                        "remoteBindPort": 18861,
                        "numTunnels": 3,
                        "sshUserName": "someusername",
                        "sshPassword": null,
                        "privateKey": "/some/private/key",
                        "passPhrase": "somepassphrase",
                        "jumpHost": "some.jump.host",
                        "jumpPort": 22
                }

Note that the paths configured for a plugin that uses the rpc middleware (e.g. the messenger accessPoint) are remote paths, i.e. paths on the HPC side.
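
To make the note about remote paths concrete, the following Python sketch lists the path-like values of every section that uses the rpc middleware, so that they can be checked against the HPC filesystem. It is a heuristic under stated assumptions: the helper name remote_paths is hypothetical, a value is treated as a path simply because it is a string starting with "/", and the sample configuration is a shortened copy of the example above.

```python
# Shortened copy of the example queue configuration, for illustration only.
SAMPLE_QUEUE_CONFIG = {
    "preparator": {
        "name": "SomePreparator",
        "basePath": "/some/remote/base/path",
        "middleware": "rpc",
    },
    "submitter": {
        "name": "SlurmSubmitter",
        "templateFile": "/some/remote/template.sh",
        "middleware": "rpc",
    },
    "messenger": {
        "name": "SharedFileMessenger",
        "accessPoint": "/some/remote/path/${workerID}",
        "middleware": "rpc",
    },
    "rpc": {
        "name": "RpcHerder",
        "privateKey": "/some/private/key",
    },
}


def remote_paths(queue_config, middleware="rpc"):
    """Map 'section.key' to each path-like string in sections using the middleware.

    Heuristic (an assumption of this sketch): any string value that starts
    with "/" is treated as a path.
    """
    found = {}
    for section, params in queue_config.items():
        # Skip sections that are not plugin settings or do not use the middleware.
        if not isinstance(params, dict) or params.get("middleware") != middleware:
            continue
        for key, value in params.items():
            if isinstance(value, str) and value.startswith("/"):
                found[f"{section}.{key}"] = value
    return found


if __name__ == "__main__":
    for name, path in remote_paths(SAMPLE_QUEUE_CONFIG).items():
        print(f"{name}: {path}")
```

The "rpc" section itself has no "middleware" key, so its entries (such as privateKey) are not reported by this heuristic.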