Semantics for runtime disks mount points are confusing #672

Open · adamnovak opened this issue Jul 2, 2024 · 2 comments · May be fixed by #670
@adamnovak (Collaborator)

The disks key in the runtime section is documented as providing "persistent volumes" at certain mount point paths. Those paths are specified as being "in the host environment" in WDL 1.2, and I think "on the host machine" in 1.1.

wdl/SPEC.md, lines 5072 to 5088 at 664adc3:

##### `disks`

* Accepted types:
    * `Int`: Amount of disk space to request, in `GiB`.
    * `String`: A disk specification - one of the following:
        * `"<size>"`: Amount of disk space to request, in `GiB`.
        * `"<size> <units>"`: Amount of disk space to request, in the given units.
        * `"<mount-point> <size>"`: A mount point and the amount of disk space to request, in `GiB`.
        * `"<mount-point> <size> <units>"`: A mount point and the amount of disk space to request, in the given units.
    * `Array[String]` - An array of disk specifications.
* Default value: `1 GiB`

The `disks` attribute provides a way to request one or more persistent volumes, each of which has a minimum size and is mounted at a specific location. When the `disks` attribute is provided, the execution engine must guarantee the requested resources are available or immediately fail the task prior to instantiating the command.

If a mount point is specified, then it must be an absolute path to a location in the host environment. If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task.

The execution engine is free to provision any class(es) of persistent volume it has available (e.g., SSD or HDD). The [`disks`](#-disks) hint can be used to request specific attributes for the provisioned disks.
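As an illustration (not taken from the spec; the mount paths and sizes here are invented), the accepted forms would look like this in a task's `runtime` section:

```wdl
version 1.1

task disk_forms {
  command <<<
    df -h
  >>>
  runtime {
    # Each commented line is an alternative form of the same attribute:
    # disks: 10                          # Int: 10 GiB, default mount point
    # disks: "10"                        # "<size>": 10 GiB
    # disks: "500 MiB"                   # "<size> <units>"
    # disks: "/mnt/outputs 500 GiB"      # "<mount-point> <size> <units>"
    disks: ["/mnt/inputs 100", "/mnt/outputs 500 GiB"]  # Array[String]
  }
}
```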

If the mount point is a host-side path, then where is the storage expected to be mounted in the container where the WDL command section gets run? If the storage is meant to be mounted at the given path in the container, then why does that path also need to exist on the host?

Is this meant to just let the WDL task get access to a particular directory on the host, by mounting that path on the host into the container at the same path?

What kind of "persistence" specifically is supposed to be available? If two tasks run one after the other, and they both have a disks entry with a given mount point and size, should the second task be guaranteed to see files there written by the first task? Or can the "persistent" volume be a fresh empty directory for each task? Or is some kind of opportunistic sharing expected?

If a task requests a 100 GiB persistent volume, does it have to deal with the possibility that, upon being mounted, the volume already contains 50 GiB of files left over from previous tasks and has only 50 GiB of free space?

If two tasks run at the same time, can they ever share a persistent volume? Or does a task get exclusive ownership of a persistent volume while it is mounted?
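To make the persistence questions concrete, here is a minimal two-task sketch (the task names and the `/mnt/scratch` path are hypothetical) whose behavior the current wording leaves ambiguous:

```wdl
version 1.1

task writer {
  command <<<
    echo "leftover" > /mnt/scratch/leftover.txt
  >>>
  runtime {
    disks: "/mnt/scratch 100 GiB"
  }
}

task reader {
  command <<<
    # Under a truly persistent reading, this may list writer's file;
    # under a fresh-empty-volume reading, it must always show nothing.
    ls /mnt/scratch
  >>>
  runtime {
    disks: "/mnt/scratch 100 GiB"
  }
  output {
    Array[String] contents = read_lines(stdout())
  }
}
```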

We're trying to implement this in Toil in DataBiosphere/toil#5001, and so far we've come up with an implementation that just mounts the specified path from the host into the container. But I think it really makes more sense to mount fresh empty directories with the given amount of reserved space into the container instead, since that matches what I would imagine a workflow would actually want. That, however, completely ignores the "persistent" part of the spec.

Are there any workflow examples that use mount points, beyond the test examples in the spec that just measure their size? What kind of behavior do they expect w.r.t. persistence or the relationship between in-container and host-side paths?

@adamnovak (Collaborator, Author)

Cromwell's implementation provides explicitly ephemeral storage ("All disks are set to auto-delete after the job completes.") and appears to mount the requested amount of storage at the requested path as seen from the task's perspective, with no reference to the host machine's filesystem.

@jdidion (Collaborator) commented Jul 24, 2024

The specified mount points must be available to the task. So if the task is executing in a Docker container, the volumes must be mounted in the container. Whether and how those volumes map to volumes on the host machine is up to the execution engine.

I don't think we ever considered whether the mount point is allowed to already exist. I think the way Cromwell does it is probably the most logical: if the mount point already exists, it should be an error, although I suppose it would be acceptable for an execution engine to attach an existing, empty mount point that has at least the requested amount of storage.

We probably want to add a separate volume requirement that would cause an existing volume to be mounted read-only to allow accessing files on a shared file system, EBS volume, etc.
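A sketch of what such a requirement might look like; the `volumes` key and its value format are hypothetical, invented here for illustration, and not part of any WDL version:

```wdl
version 1.1

task use_reference {
  command <<<
    head /mnt/reference/genome.fa
  >>>
  runtime {
    disks: "10 GiB"
    # Hypothetical: attach an existing volume read-only. Neither this
    # key nor its value syntax exists in the current spec.
    volumes: "/mnt/reference ro"
  }
}
```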

@jdidion linked a pull request (#670) on Jul 24, 2024 that will close this issue.
@jdidion self-assigned this on Jul 26, 2024.