Allocating Resources
Resource allocation requests generally come in two forms. First, users include an initial resource request when submitting applications to the scheduler queue. Traditionally, this has been in the form of a static envelope designed to fit the maximum size of the job.
Second, applications may ask the scheduler to change their resources while they run. Interest in such dynamic resource management has been growing as new programming models continue to emerge in the high-performance computing and data analytics areas. The primary goal is to enable application-directed resource changes made in partnership with the scheduler. Several broad categories are envisioned, including the ability to:
- request allocation of additional resources, including memory, bandwidth, and compute. This should be done in a non-blocking manner so that the application can continue to progress while waiting for resources to become available. Note that the new allocation will be disjoint from (i.e., not affiliated with) the allocation of the requestor; thus, the termination of one allocation will not impact the other.
- extend the reservation on currently allocated resources, subject to scheduling availability and priorities. This includes extending the time limit on current resources and/or requesting that additional resources be allocated to the requesting job. Any additional resources allocated this way are considered part of the current allocation and are therefore released at the same time.
- release currently allocated resources that are no longer required. This supports partial release of resources, since all resources are normally released only upon termination of the job. Identified use cases include resource variations across discrete steps of a workflow, as well as applications that spawn sub-jobs and/or dynamically grow or shrink over time.
- "lend" resources back to the scheduler with the expectation of getting them back later in the job. This can be a proactive operation (e.g., to save on computing costs when resources are temporarily not needed) or a response to scheduler requests in lieu of preemption. A corresponding ability to "reacquire" previously lent resources would be required.
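The four categories above can be illustrated with a toy model. The sketch below is a minimal, single-process illustration, assuming a scheduler that only tracks a pool of free nodes and an allocation that only tracks a node count, a count of lent nodes, and a time limit. Every type and function name (`sched_t`, `alloc_t`, `alloc_new`, `alloc_extend`, `alloc_release`, `alloc_lend`, `alloc_reacquire`) is hypothetical and is not part of PMIx or any scheduler's API. It also ignores the non-blocking requirement: each call simply succeeds or fails immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT the PMIx API. */
typedef struct {
    int nodes;          /* nodes currently held by the allocation */
    int lent;           /* nodes lent back to the scheduler, awaiting reacquire */
    int time_limit_min; /* remaining time limit, in minutes */
} alloc_t;

typedef struct {
    int free_nodes;     /* nodes the scheduler can hand out */
} sched_t;

/* New allocation: written into a separate alloc_t, so its lifetime is
 * independent of the requestor's allocation. */
bool alloc_new(sched_t *s, int nodes, int time_limit_min, alloc_t *out) {
    assert(s != NULL && out != NULL);
    if (nodes <= 0 || nodes > s->free_nodes) return false;
    s->free_nodes -= nodes;
    out->nodes = nodes;
    out->lent = 0;
    out->time_limit_min = time_limit_min;
    return true;
}

/* Extend: extra nodes and/or time become part of the caller's allocation
 * and would be released together with it. */
bool alloc_extend(sched_t *s, alloc_t *a, int extra_nodes, int extra_min) {
    assert(s != NULL && a != NULL);
    if (extra_nodes < 0 || extra_min < 0 || extra_nodes > s->free_nodes) return false;
    s->free_nodes -= extra_nodes;
    a->nodes += extra_nodes;
    a->time_limit_min += extra_min;
    return true;
}

/* Release: give nodes back permanently (partial release). */
bool alloc_release(sched_t *s, alloc_t *a, int nodes) {
    assert(s != NULL && a != NULL);
    if (nodes <= 0 || nodes > a->nodes) return false;
    a->nodes -= nodes;
    s->free_nodes += nodes;
    return true;
}

/* Lend: like release, but the allocation remembers the nodes it lent. */
bool alloc_lend(sched_t *s, alloc_t *a, int nodes) {
    assert(s != NULL && a != NULL);
    if (nodes <= 0 || nodes > a->nodes) return false;
    a->nodes -= nodes;
    a->lent += nodes;
    s->free_nodes += nodes;
    return true;
}

/* Reacquire: take back previously lent nodes, if the scheduler has them free. */
bool alloc_reacquire(sched_t *s, alloc_t *a, int nodes) {
    assert(s != NULL && a != NULL);
    if (nodes <= 0 || nodes > a->lent || nodes > s->free_nodes) return false;
    s->free_nodes -= nodes;
    a->lent -= nodes;
    a->nodes += nodes;
    return true;
}
```

The main design point is where the resources end up: `alloc_new` fills a separate `alloc_t` with its own lifetime, while `alloc_extend` folds resources into the caller's allocation. `alloc_lend` differs from `alloc_release` only in remembering the lent nodes, which is what lets `alloc_reacquire` ask for them back.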
```c
#define JSRUN_NRS                "jsrun.nrs"
#define JSRUN_USE_ALLOCATION     "jsrun.use_allocation"
#define JSRUN_GPUS_PER_RS        "jsrun.gpu_per_rs"
#define JSRUN_MEM_PER_RS         "jsrun.mem_per_rs"
#define JSRUN_CPUS_PER_RS        "jsrun.cpu_per_rs"
#define JSRUN_LATENCY_PRIORITY   "jsrun.latency_priority"
#define JSRUN_LAUNCH_ON_ROOT     "jsrun.root_scheduling"
#define JSRUN_USE_RESOURCE       "jsrun.use_resource"
#define JSRUN_RESOURCES_PER_HOST "jsrun.nrs_per_host"
```
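The keys above are plain strings, so a launcher request can be treated as a list of key/value pairs. A minimal sketch follows, assuming integer values and that `rs` stands for a jsrun resource set. The `rs_attr_t` type and the `rs_attr_get`/`rs_total_cpus` helpers are hypothetical, as are the default of 1 for a missing key and the interpretation of the values (the units of `jsrun.mem_per_rs`, for instance, are not specified here). The sketch re-declares only the keys it uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Subset of the jsrun attribute keys defined above. */
#define JSRUN_NRS         "jsrun.nrs"
#define JSRUN_GPUS_PER_RS "jsrun.gpu_per_rs"
#define JSRUN_MEM_PER_RS  "jsrun.mem_per_rs"
#define JSRUN_CPUS_PER_RS "jsrun.cpu_per_rs"

/* Hypothetical key/value pair standing in for one launcher attribute. */
typedef struct {
    const char *key;
    long value;
} rs_attr_t;

/* Hypothetical lookup: value stored under key, or fallback if absent. */
long rs_attr_get(const rs_attr_t *attrs, size_t n, const char *key, long fallback) {
    assert(attrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(attrs[i].key, key) == 0) return attrs[i].value;
    }
    return fallback;
}

/* Hypothetical helper: total CPUs = resource sets x CPUs per resource set,
 * assuming 1 for either value when it is not given. */
long rs_total_cpus(const rs_attr_t *attrs, size_t n) {
    long nrs = rs_attr_get(attrs, n, JSRUN_NRS, 1);
    long cpus = rs_attr_get(attrs, n, JSRUN_CPUS_PER_RS, 1);
    return nrs * cpus;
}
```

Keeping requests as string-keyed pairs lets a launcher ignore keys it does not recognize, which is one reason attribute-style interfaces tend to use string keys rather than fixed structs.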