Feature/slurm remote support #250
Draft
mhrtmnn wants to merge 30 commits into esa-tu-darmstadt:develop from mhrtmnn:feature/SlurmRemoteSupport
Conversation
Job scheduling logic shall be moved from the {Compose,HLS}-Task into the Slurm object. For this, some additional information is required.
The preamble copies all files required for the current job to the SLURM node; the postamble copies all generated artefacts back from the node.
The absolute paths to both scripts may be supplied via the keys "PreambleScript" and "PostambleScript" in the SLURM JSON config file.
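For illustration, the two entries might look like this in the config file (the key names are taken from this PR; the paths are made-up placeholders):

```json
{
  "PreambleScript": "/home/user/slurm/preamble.sh",
  "PostambleScript": "/home/user/slurm/postamble.sh"
}
```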
mhrtmnn force-pushed the feature/SlurmRemoteSupport branch from f1e7781 to b7c0a20 on December 14, 2020 12:37
Previously, a job would be broken into its tasks, and a new tapasco job would be created for each task. These jobs were then executed on the SLURM cluster. Refactor this so that the original job is executed on the SLURM cluster as-is, which simplifies the SLURM logic.
Since the SLURM cluster now processes whole jobs (instead of single tasks), the dependencies (preamble) and produced artefacts (postamble) of multiple platform/architecture pairs may need to be transferred.
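As a rough sketch of the idea, a preamble of this kind could pull the dependencies for each platform/architecture pair from the workstation onto the SLURM node. The directory layout, host name, and use of rsync below are assumptions for illustration only, not taken from this PR:

```bash
#!/bin/bash
# Hypothetical preamble: fetch job dependencies for every platform/architecture
# pair from the workstation into the local staging directory (all names made up).
set -e
WORKSTATION="workstation.example.org"
JOB_DIR="$1"   # staging directory for the current job

for pair in axi4mm/vc709 axi4mm/pynq; do
    mkdir -p "$JOB_DIR/$pair"
    rsync -a "$WORKSTATION:/srv/tapasco/jobs/$(basename "$JOB_DIR")/$pair/" "$JOB_DIR/$pair/"
done
```

A postamble would work the same way in the opposite direction, pushing the generated artefacts back to the workstation.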
For executing Tapasco in SLURM mode, Tapasco can be installed on the compute node, e.g. via a SLURM job script like the following:
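The original script is not reproduced in this excerpt; the following is only a rough sketch of what such an installation job script could look like. The resource settings, workspace path, and setup commands are assumptions and may differ from the actual TaPaSCo setup procedure:

```bash
#!/bin/bash
#SBATCH --job-name=tapasco-install   # illustrative resource settings
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00

# Hypothetical installation steps; exact commands depend on the TaPaSCo version.
git clone https://github.com/esa-tu-darmstadt/tapasco.git "$HOME/tapasco"
cd "$HOME/tapasco"
./tapasco-init.sh           # create a workspace
source tapasco-setup.sh     # set up the environment
tapasco-build-toolflow      # build the toolflow (see note below)
```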
Note: Building toolflow via |
This pull request extends the SLURM support of tapasco such that remote compute nodes can be used for carrying out HLS and compose jobs.
The required architecture consists of three networked machines:

Host (front end): Runs a tapasco instance that takes the user CLI arguments and collects all files required for the selected job (e.g. kernel source files for HLS jobs or IP cores for compose jobs). These dependencies are copied over the network to a separate node referred to as the Workstation. The artefacts generated by a job (e.g. an IP core for HLS, a bitstream for compose) are copied back to the Host once the job finishes.

Workstation: In the simplest case, a network-attached storage. It is required because, in the general case, we cannot push files directly to the SLURM compute node. Instead, the files are deposited in a known directory on this node, and the SLURM compute node pulls them from there by itself.

SLURM node (back end): The login node for the compute node, with SLURM control tools such as sbatch and squeue installed. The compute node runs its own tapasco instance.

The above setup is configurable through a JSON config file. This PR contains an example file at toolflow/vivado/common/SLURM/ESA.json that describes an ESA-internal compute node. Different configurations can be selected via tapasco CLI options at the Host, for example --slurm ESA.
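The actual contents of ESA.json are not shown in this excerpt. Purely as an illustration of the idea, such a config file could contain entries along these lines; all key names except "PreambleScript" and "PostambleScript" (described above) are hypothetical, as are the host names and paths:

```json
{
  "Workstation": "workstation.example.org",
  "WorkstationDirectory": "/srv/tapasco/jobs",
  "SlurmLoginNode": "slurm-login.example.org",
  "PreambleScript": "/srv/tapasco/scripts/preamble.sh",
  "PostambleScript": "/srv/tapasco/scripts/postamble.sh"
}
```

On the Host, this configuration would then be selected with --slurm ESA, presumably the config file name without its extension.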