Slurm can track disk usage as a consumable resource, but it only checks for available space before launching a task. This is problematic: if 100 GB remain on the NFS disk and 20 concurrently launching tasks each require 10 GB, the NFS disk will ultimately fill, since Slurm sees 100 GB free at each task's launch and does not continuously monitor each task's disk consumption.
Qing had the idea of using fallocate to reserve the full amount of space each job will use ahead of time (as specified by the user). We would then iteratively shrink the file created by fallocate by the amount the job's workspace directory grows in size.
If the user underestimates the total amount of space a job will use, the job should be killed. Because the monitoring process is backgrounded, we would need to trap a signal sent from it.
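A minimal sketch of what this could look like in a bash job wrapper, assuming the reservation file lives in the workspace on the NFS disk; the names (`WORKSPACE`, `RESERVED_GB`), the polling interval, and the use of `SIGUSR1` are placeholders, not a settled design:

```bash
#!/bin/bash
# Rough sketch only: WORKSPACE, RESERVED_GB, and the 30 s poll are placeholders.

RESERVED_BYTES=$(( RESERVED_GB * 1024**3 ))
RESERVE_FILE="$WORKSPACE/.disk_reservation"

# Kill the job if the background monitor signals that the reservation is exhausted.
trap 'echo "job exceeded its reserved disk space" >&2; rm -f "$RESERVE_FILE"; exit 1' USR1

# Reserve the full user-requested amount up front on the NFS disk.
fallocate -l "$RESERVED_BYTES" "$RESERVE_FILE"

# Background monitor: shrink the reservation by however much the workspace has
# grown; if the workspace outgrows the reservation, signal the job script.
(
  while sleep 30; do
    used=$(du -sb --exclude=.disk_reservation "$WORKSPACE" | cut -f1)
    remaining=$(( RESERVED_BYTES - used ))
    if (( remaining <= 0 )); then
      kill -USR1 $$   # $$ is the parent job script, not the subshell
      break
    fi
    truncate -s "$remaining" "$RESERVE_FILE"
  done
) &
```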
Because the overhead of monitoring disk usage can be high (e.g., a workspace directory with many subfolders or many small files), this should be disabled by default and only activated if the user explicitly requests disk space as a consumable resource. This also means we should set the default disk CRES in Slurm to 0. Finally, we should caution users to reserve disk space only for tasks with nontrivial output sizes (e.g., localization, aligners, etc.).
I think it would be much easier to run a daemon that monitors how full the NFS is and increases its size as it fills up. The wasted space would be much cheaper than the amount of developer time it would take to implement something this complicated.
This would be especially easy to implement now because the NFS is served from the controller.
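Not prescribing an implementation, but a daemon like that could be very small. Everything below is a placeholder: the mount point, disk name, zone, 80% threshold, and the assumption that the cluster runs on GCP with an ext4-backed persistent disk:

```bash
#!/bin/bash
# Rough sketch: watch how full the NFS export on the controller is and grow
# the backing disk when usage crosses a threshold. All names are assumptions.

NFS_MOUNT=/mnt/nfs           # assumed mount point of the exported directory
DISK_NAME=controller-nfs     # assumed name of the backing persistent disk
ZONE=us-east1-b              # assumed zone of that disk
GROW_GB=100                  # grow by this much each time

while sleep 60; do
  pct=$(df --output=pcent "$NFS_MOUNT" | tail -1 | tr -dc '0-9')
  if (( pct >= 80 )); then
    cur_gb=$(df --output=size -BG "$NFS_MOUNT" | tail -1 | tr -dc '0-9')
    gcloud compute disks resize "$DISK_NAME" --zone "$ZONE" \
      --size "$(( cur_gb + GROW_GB ))GB" --quiet
    # Grow the filesystem to match the new disk size (assumes ext4).
    resize2fs "$(findmnt -no SOURCE "$NFS_MOUNT")"
  fi
done
```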