_______________________
SLURM BANKING PLUGINS
John Doe
_______________________
Table of Contents
_________________
1. Introduction
2. Limitations
3. Build Requirements
4. Building
.. 1. On Savio
.. 2. NixOS
.. 3. Help
5. Usage
.. 1. Install, enable, and configure
..... 1. Install the `.so' files
..... 2. /etc/slurm/slurm.conf
..... 3. /etc/slurm/plugstack.conf
..... 4. /etc/slurm/bank-config.toml
.. 2. Help/Debugging
6. Developing
.. 1. Project Structure
.. 2. myBRC API Codegen
.. 3. Testing with myBRC
.. 4. Creating a Release
1 Introduction
==============
Slurm banking plugins provide allocation management to Slurm. The
plugins deduct service units for completed and running jobs and
prevent jobs from running if there are insufficient service units
available. The plugins interact with a REST API (provided by myBRC),
documented in the [spec/swagger.json]. The following three plugins are
used:
- [job_submit_plugin] (job submission stage): Estimate maximum job
cost based on submission parameters, and reject job if the API
reports that the user/account has insufficient service units
available.
- [spank_plugin] (job running stage): Report job and estimated cost to
the API.
- [job_completion_plugin] (job completing stage): Modify job in API to
reflect actual usage.
These plugins are written in [Rust] to help with safety. They use
[rust-bindgen] to automatically generate the Rust foreign function
interface (FFI) bindings from the Slurm C header files.
[spec/swagger.json] <./spec/swagger.json>
[job_submit_plugin] <./job_submit_plugin>
[spank_plugin] <./spank_plugin>
[job_completion_plugin] <./job_completion_plugin>
[Rust] <https://www.rust-lang.org>
[rust-bindgen] <https://github.com/rust-lang/rust-bindgen>
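The submission-stage check described above can be pictured with a
small shell sketch. This is only an illustration, not the plugins'
actual code: the function names and the cost model (CPUs x time limit
in hours x a per-partition rate) are assumptions made here, and the
real calculation lives in the Rust sources.

```shell
#!/bin/sh
# Hypothetical sketch of the submit-stage decision; NOT the plugins' code.
# Assumed cost model: CPUs x time limit (hours) x per-partition rate.

# estimate_max_cost CPUS TIME_LIMIT_MINUTES RATE
# Prints the maximum cost in service units with two decimals.
estimate_max_cost() {
    awk -v c="$1" -v m="$2" -v r="$3" \
        'BEGIN { printf "%.2f\n", c * (m / 60) * r }'
}

# submit_decision BALANCE COST
# Prints "accept" if the balance covers the cost, otherwise "reject".
submit_decision() {
    awk -v b="$1" -v c="$2" \
        'BEGIN { if (b + 0 >= c + 0) print "accept"; else print "reject" }'
}
```

Under these assumptions, `estimate_max_cost 4 120 1.5' prints `12.00',
and `submit_decision 10 12.00' prints `reject'.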
2 Limitations
=============
- Since the spank plugin cannot cancel a job, a user can overdraw
their service unit allocation. Units are only deducted from the
balance when the job starts running, so a job that had enough
service units at the time of submission may start after the balance
has dropped below its cost.
- If `--ntasks' or `--cpus-per-task' are unspecified, the job
completion plugin will assume the value is 0 and will always allow
the job, as long as the balance is non-negative. This can be
improved in the future by checking whether the requested partition
is exclusive and how many CPUs each node has, and then using that
information to estimate the number of CPUs that will be used.
- If the myBRC API is offline (or returns errors), the submit plugin
will let the job go through.
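The last two limitations can be sketched in shell as follows. The
function name is hypothetical, and the balance lookup is passed in as
an arbitrary command standing in for the API call. The sketch only
shows the behaviour described above (fail open on API errors, and a
zero cost always passing a non-negative balance), not the plugins'
implementation.

```shell
#!/bin/sh
# Hypothetical sketch of the fail-open behaviour; NOT the plugins' code.

# decide_with_fail_open COST LOOKUP_CMD [ARGS...]
# LOOKUP_CMD stands in for the API balance query and must print a number.
decide_with_fail_open() {
    cost="$1"
    shift
    # If the lookup fails (API offline or returning errors), let the job through.
    if ! balance="$("$@" 2>/dev/null)"; then
        echo "accept"
        return 0
    fi
    awk -v b="$balance" -v c="$cost" \
        'BEGIN { if (b + 0 >= c + 0) print "accept"; else print "reject" }'
}
```

For example, `decide_with_fail_open 100 false' prints `accept' because
the lookup failed, while `decide_with_fail_open 100 echo 50' prints
`reject'. A cost of 0, as assumed when the CPU count is unspecified,
is accepted for any non-negative balance.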
3 Build Requirements
====================
- [Rust] (including [cargo])
- [OpenSSL] (needed for making an HTTPS connection to the API)
- [Slurm] header files and source code
- [Clang] (build dependency for [rust-bindgen])
[Rust] <https://www.rust-lang.org/>
[cargo] <https://doc.rust-lang.org/cargo/>
[OpenSSL] <https://openssl.org>
[Slurm] <https://github.com/SchedMD/slurm>
[Clang] <http://clang.llvm.org/get_started.html>
[rust-bindgen]
<https://rust-lang.github.io/rust-bindgen/requirements.html>
4 Building
==========
Since the Slurm `jobcomp' plugins need access to the
`src/common/slurm_jobcomp.h' header, we need access to the Slurm
source code `src' directory in order to build (as well as the normal
`<slurm/slurm.h>' headers on the `CPATH').
You will have to first run `./configure' on the Slurm source code,
otherwise `<slurm/slurm.h>' will not exist. If you don't run
`./configure', the Makefile will try to do it for you.
1. Edit the path at the top of the Makefile to point to the Slurm
source code directory, or symlink `./slurm' in this repository to
point to it.
2. Once you have all the dependencies, just run `make' :)
3. After building, you will find the `.so' files in the same directory
as the Makefile.
4.1 On Savio
~~~~~~~~~~~~
You will need the Rust and `clang' dependencies. Rust can be installed
following the instructions on [rustup.rs], and is easiest if installed
locally for each user. `clang' can be loaded as a module (or by
setting the environment variables).
The plugins can be built as an unprivileged user, as long as that user
can read the Slurm source code.
,----
| # Install Rust locally for your user, and select default installation
| curl --tlsv1.2 -sSf https://sh.rustup.rs | sh
| source $HOME/.cargo/env
|
| # Clone the plugins repository
| git clone https://github.com/ucb-rit/slurm-banking-plugins.git && cd slurm-banking-plugins
|
| # Compile clang from source and load environment
| scripts/clang/build-clang.sh
| source scripts/clang/clang-env.sh
|
| # Copy in the Slurm source code (OR symlink ./slurm to it instead)
| rmdir slurm && cp -r /path/to/slurm/source slurm
|
| # Compile plugins
| make
`----
Then, follow the instructions in [Usage] to install, enable, and
configure the plugins.
*When adding the .so binaries to the nodes with Warewulf, you must use
"wwsh file import" instead of "wwsh file new". Make sure the format
in "wwsh file print" is listed as binary.*
[rustup.rs] <https://rustup.rs>
[Usage] See section 5
4.2 NixOS
~~~~~~~~~
`shell.nix' provides the environment for development on [NixOS]. I run
the following:
,----
| nix-shell
| make
`----
[NixOS] <https://nixos.org>
4.3 Help
~~~~~~~~
For additional reference on building, check [the build on travis-ci].
[the build on travis-ci]
<https://travis-ci.org/ucb-rit/slurm-banking-plugins>
5 Usage
=======
5.1 Install, enable, and configure
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5.1.1 Install the `.so' files
-----------------------------
The `job_submit_slurm_banking.so' and `jobcomp_slurm_banking.so'
should be installed in `/usr/lib64/slurm/'. The
`spank_slurm_banking.so' plugin should be installed in
`/etc/slurm/spank/'.
,----
| make install
`----
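To confirm that the files landed where this section says they should,
something like the following sketch may help. The function name and
the optional root-prefix argument (handy for checking a staged or
chroot tree) are assumptions added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: check that the three plugin files are in place.

# check_plugin_files [ROOT]
# ROOT defaults to empty, i.e. the real filesystem root.
check_plugin_files() {
    root="${1:-}"
    missing=0
    for f in \
        /usr/lib64/slurm/job_submit_slurm_banking.so \
        /usr/lib64/slurm/jobcomp_slurm_banking.so \
        /etc/slurm/spank/spank_slurm_banking.so
    do
        if [ -f "$root$f" ]; then
            echo "ok       $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    # Non-zero exit status if any file was not found.
    return "$missing"
}
```

Run `check_plugin_files' with no arguments after `make install' to
check the live system.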
5.1.2 /etc/slurm/slurm.conf
---------------------------
Enable the submit and completion plugins:
,----
| # other config options above...
| JobSubmitPlugins=job_submit/slurm_banking
| JobCompType=jobcomp/slurm_banking
`----
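A quick way to check that both lines made it into the configuration is
sketched below. The function name is hypothetical. It only looks for
the two settings shown above and does not validate the rest of the
file.

```shell
#!/bin/sh
# Hypothetical helper: verify that slurm.conf enables both plugins.

# banking_plugins_enabled SLURM_CONF
# Exit status 0 only if both settings are present and uncommented.
banking_plugins_enabled() {
    conf="$1"
    # JobSubmitPlugins may list several plugins, so match anywhere in the value.
    grep -Eiq '^[[:space:]]*JobSubmitPlugins=.*job_submit/slurm_banking' "$conf" &&
    grep -Eiq '^[[:space:]]*JobCompType=jobcomp/slurm_banking' "$conf"
}
```

For example:
`banking_plugins_enabled /etc/slurm/slurm.conf && echo enabled'.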
5.1.3 /etc/slurm/plugstack.conf
-------------------------------
Enable the spank plugin:
,----
| optional /etc/slurm/spank/spank_slurm_banking.so
`----
5.1.4 /etc/slurm/bank-config.toml
---------------------------------
Configure the plugin settings. Options that *must* be set properly
include the API URL, API token, and partition names. You can use the
example provided as a template.
,----
| cp bank-config.toml.example /etc/slurm/bank-config.toml
`----
5.2 Help/Debugging
~~~~~~~~~~~~~~~~~~
- The plugins log errors to the slurmd (spank plugin) and slurmctld
(job submit and job completion plugins) logs. You can filter for
their output by grepping for `_bank'.
- For a working example installation, refer to [the Docker files].
[the Docker files] <./docker>
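The log filtering mentioned above can be wrapped in a one-line helper.
The function name is hypothetical, and the log file locations are
site-specific: check `SlurmctldLogFile' and `SlurmdLogFile' in your
`slurm.conf' rather than relying on the example path in the comment.

```shell
#!/bin/sh
# Hypothetical helper: show only the banking plugins' log lines.

# bank_log_lines LOGFILE...
bank_log_lines() {
    # -h drops file name prefixes when several logs are given.
    grep -h -- '_bank' "$@"
}

# Example (the path is an assumption, adjust to your site):
#   bank_log_lines /var/log/slurm/slurmctld.log
```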
6 Developing
============
I use the [docker-centos7-slurm] Docker container as a base, and build
the plugins on top of it. For newer versions of Slurm, we use our own
fork at [docker-centos7-slurm]. For CentOS 6 testing we also have
[docker-centos6-slurm].
`make docker-dev' builds the development container with Slurm (CentOS
7) plus all the other necessary dependencies for the plugins and drops
you into a shell. The code is stored in `/slurm-banking-plugins' in
the container.
Once in the container, check the Slurm version with `scontrol -V' and
check out the corresponding Slurm version in
`/slurm-banking-plugins/slurm' so that the plugins are compiled
against the correct Slurm version:
,----
| pushd /slurm-banking-plugins/slurm
| git checkout tags/slurm-20-02-6-1 # for example
| popd
`----
After making your changes, use `make && make install' to compile and
install the plugins, copy the `plugstack.conf' and `bank-config.toml'
config files to `/etc/slurm/', make configuration changes as desired,
and finally restart Slurm with `supervisorctl restart all'.
If the services do not start correctly, try starting them one-by-one
with:
,----
| supervisorctl status # inspect status
| supervisorctl start slurmctld
`----
There is also the CentOS 6 equivalent with `make docker-centos6-dev'.
[docker-centos7-slurm]
<https://github.com/giovtorres/docker-centos7-slurm>
[docker-centos7-slurm] <https://github.com/ucb-rit/docker-centos7-slurm>
[docker-centos6-slurm] <https://github.com/ucb-rit/docker-centos6-slurm>
6.1 Project Structure
~~~~~~~~~~~~~~~~~~~~~
Each plugin is its own Rust project: [job_completion_plugin],
[job_submit_plugin], and [spank_plugin]. Each of these uses the
[slurm_banking] project, which includes the job calculation
functionality and helpers for calling the API. Communication with the
myBRC API is done through [mybrc_rest_client], described in the next
section.
[job_completion_plugin] <./job_completion_plugin>
[job_submit_plugin] <./job_submit_plugin>
[spank_plugin] <./spank_plugin>
[slurm_banking] <./slurm_banking>
[mybrc_rest_client] <./mybrc_rest_client>
6.2 myBRC API Codegen
~~~~~~~~~~~~~~~~~~~~~
I use [openapi-generator] to generate a library to abstract away
access to the API. The API is described by a schema file in
[spec/swagger.json]. This file is automatically generated by the myBRC
API, and can be obtained at `/swagger.json' on the myBRC API.
If the API spec changes and you need to update the plugins, just
regenerate the API client. First, put the new `swagger.json' in
[spec/swagger.json]. To generate the API client based on this new
schema, I use the Dockerized version of [openapi-generator] like so:
,----
| docker run --rm -v "$(pwd)":/local openapitools/openapi-generator-cli generate \
| -i /local/spec/swagger.json \
| -g rust \
| -o /local/mybrc_rest_client \
| --library=reqwest
`----
You may find the generated files are not owned by your user, so just
run `chown -R $USER mybrc_rest_client'.
[openapi-generator] <https://github.com/OpenAPITools/openapi-generator>
[spec/swagger.json] <./spec/swagger.json>
[swagger-codegen] <https://github.com/swagger-api/swagger-codegen>
6.3 Testing with myBRC
~~~~~~~~~~~~~~~~~~~~~~
,----
| # Build mybrc-rest Docker image from scgup
| docker build -f Dockerfile.mybrc-rest -t mybrc-rest .
|
| # Build slurm-banking-plugins-dev image
| make docker-dev
|
| # Launch containers
| docker run --name=mybrc-rest -d -p 8181:8181 mybrc-rest
| docker run \
| -v $(pwd)/job_submit_plugin/src:/slurm-banking-plugins/job_submit_plugin/src \
| -v $(pwd)/job_completion_plugin/src:/slurm-banking-plugins/job_completion_plugin/src \
| -v $(pwd)/slurm_banking/src:/slurm-banking-plugins/slurm_banking/src \
| --link mybrc-rest -it -h ernie slurm-banking-plugins-dev
`----
6.4 Creating a Release
~~~~~~~~~~~~~~~~~~~~~~
GitHub Actions is set up to automatically build [releases] for tags
starting with a `v'. For example, if I push a tag `v0.1.0', it will
build releases for the code at that point. There is a GitHub action to
build using Docker for CentOS 6 and CentOS 7. In each of these, you
may specify the version of Slurm to compile against in the "Compile
plugins" stage by changing the Slurm source tag that is checked out.
The GitHub Actions are in [.github/workflows]. In this example, the
CentOS 6 build environment uses `slurm-18-08-7-1':
,----
| - name: Compile plugins
| run: |
| scripts/build-with-docker.sh slurm-18-08-7-1 slurm-banking-plugins-centos6:latest
`----
[releases] <https://github.com/ucb-rit/slurm-banking-plugins/releases>
[.github/workflows] <./.github/workflows>
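Since only tags starting with a `v' trigger a release build, a small
guard before pushing can catch a mistyped tag. The helper name below
is hypothetical, and the remote name `origin' in the comment is an
assumption.

```shell
#!/bin/sh
# Hypothetical helper: does this tag name trigger a release build?

# is_release_tag TAG
# Exit status 0 if TAG starts with "v", non-zero otherwise.
is_release_tag() {
    case "$1" in
        v*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Example (remote name is an assumption):
#   is_release_tag v0.1.0 && git tag v0.1.0 && git push origin v0.1.0
```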