
Feature request: Attach the disk of specified size to Batch VM #206

Open
tonybendis opened this issue Apr 12, 2021 · 3 comments · May be fixed by microsoft/ga4gh-tes#778
Labels: enhancement (New feature or request)

@tonybendis (Contributor) commented Apr 12, 2021

The Batch VM is currently selected based on the resource requirements of the Cromwell task. The code selects the cheapest VM whose CPU count, memory, and disk size are all equal to or larger than requested. For tasks requiring a large disk, this increases the cost of the VM, because a large disk requirement forces the selection of a VM whose CPU and memory are also much higher than needed.
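For illustration, the current behavior amounts to a filter-then-minimize over the SKU whitelist. The following Python sketch is illustrative only; the VmSku fields and function names are hypothetical stand-ins, not the actual TES types:

    from dataclasses import dataclass

    @dataclass
    class VmSku:
        # Hypothetical stand-in for an entry in the VM whitelist.
        name: str
        vcpus: int
        memory_gib: float
        resource_disk_gib: int
        price_per_hour: float

    def select_vm(skus, cpu, memory_gib, disk_gib):
        # Keep only SKUs that meet all three requirements...
        candidates = [s for s in skus
                      if s.vcpus >= cpu
                      and s.memory_gib >= memory_gib
                      and s.resource_disk_gib >= disk_gib]
        # ...then take the cheapest. Note how a large disk_gib can force
        # a SKU with far more CPU/memory than the task needs.
        return min(candidates, key=lambda s: s.price_per_hour)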

Proposed solution:
TES now supports local NVMe drives out of the box, and these drives are found on more than just the L-series SKUs; however, the selection code described above does not take NVMe drive presence into account. The new proposal:

  1. Add the total size of the NVMe drives in any given SKU to the current VM whitelist.
  2. Add Standard LRS SSD storage pricing to the PriceApi client, and capture it in the GenerateBatchVmSkus tool.
  3. Change the selection logic: for each SKU, take the greater of ResourceDiskSizeInGiB and the total NVMe drive size; select among SKUs where that value is >= the requested disk size at their list cost, and for all other SKUs add the cost of an attached disk of the requested size to the cost of the SKU (see the sketch after this list).
  4. When generating the pool specification, if the requested disk size is greater than the larger of ResourceDiskSizeInGiB and the total NVMe drive size, add a Standard LRS disk of the requested size to the pool spec, then format and mount it at the same mount point used by the current NVMe start task.
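Sketched in Python, the proposed costing could look like the following (the nvme_total_gib field and the per-GiB hourly rate are hypothetical stand-ins; real prices would come from the PriceApi):

    from dataclasses import dataclass

    @dataclass
    class VmSku:
        # Extends the earlier sketch with a hypothetical NVMe field.
        name: str
        vcpus: int
        memory_gib: float
        resource_disk_gib: int
        nvme_total_gib: int
        price_per_hour: float

    def effective_local_disk_gib(sku):
        # Step 3: usable local capacity is the greater of the resource disk
        # and the combined size of the SKU's NVMe drives.
        return max(sku.resource_disk_gib, sku.nvme_total_gib)

    def effective_price_per_hour(sku, disk_gib, lrs_price_per_gib_hour):
        # Enough local disk: no extra cost. Otherwise price in an attached
        # Standard LRS disk of the requested size (the per-GiB hourly rate
        # is an assumed input for illustration).
        if effective_local_disk_gib(sku) >= disk_gib:
            return sku.price_per_hour
        return sku.price_per_hour + disk_gib * lrs_price_per_gib_hour

    def select_vm(skus, cpu, memory_gib, disk_gib, lrs_price_per_gib_hour):
        # Disk size no longer filters out SKUs; it only adjusts their price.
        candidates = [s for s in skus
                      if s.vcpus >= cpu and s.memory_gib >= memory_gib]
        return min(candidates,
                   key=lambda s: effective_price_per_hour(s, disk_gib, lrs_price_per_gib_hour))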

Estimated effort: 3 days

Note: Premium SSD is not available for all SKUs, and Premium SSD v2 is not available through Batch (the same is true of Ultra disks). To simplify the implementation and keep costs down, Standard SSD was chosen.

Consider dividing the requested capacity by the number of additional drives that can be attached, and pooling the drives together to improve disk I/O.
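For example, splitting the requested capacity evenly across the attachable drives (a small Python sketch; the ceiling division rounds up so the pooled capacity is never short of the request):

    import math

    def per_disk_size_gib(requested_gib, num_disks):
        # Round up so that num_disks disks of this size always cover the request.
        return math.ceil(requested_gib / num_disks)

    # e.g. a 1000 GiB request spread across 4 attachable data disks:
    print(per_disk_size_gib(1000, 4))  # -> 250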

Previous proposed solution:

Change the logic to select the VM based on CPU count and memory only, then attach a disk of the requested size (up to a limit, settable in the configuration?). Additionally, to further improve disk I/O, attach multiple smaller disks and pool them together into a single logical volume.

See https://learn.microsoft.com/en-us/azure/virtual-machines/premium-storage-performance

Example code to run on the Batch VM to prepare the volume (assumes that 4 data disks are attached):

  1. Create physical volumes:
    pvcreate /dev/sdc /dev/sdd /dev/sde /dev/sdf
    OR pvcreate /dev/sd[c-f]

  2. Create volume group (group of physical volumes):
    vgcreate vg1 /dev/sdc /dev/sdd /dev/sde /dev/sdf
    OR vgcreate vg1 /dev/sd[c-f]

  3. Create the logical volume, striped across the volumes in the volume group (the stripe count -i must equal the number of disks; -I 512k is the stripe size):
    lvcreate -i 4 -I 512k -n lv1 -L $(vgs vg1 -o vg_size --noheadings) vg1
    OR lvcreate --extents 100%FREE --stripes 4 -I 512k --name lv1 vg1
    (the second form sizes the volume without parsing the vgs output)

  4. Create the file system on the logical volume:
    mkfs -t xfs /dev/vg1/lv1

  5. Mount the volume:
    mkdir -p /mnt1
    mount /dev/vg1/lv1 /mnt1

@tonybendis tonybendis added the enhancement New feature or request label Apr 12, 2021
@tonybendis tonybendis added this to the 2.3 milestone Apr 12, 2021
@patmagee (Contributor) commented

@tonybendis have you created a branch for this yet that I can poke around in?

@shanamatthews shanamatthews modified the milestones: 2.3, backlog Apr 19, 2021
@tonybendis tonybendis modified the milestones: backlog, 2.5 Aug 3, 2021
@tonybendis tonybendis modified the milestones: 2.5, backlog Sep 23, 2021
@olesya13 olesya13 modified the milestones: backlog, next Oct 14, 2022
@olesya13 olesya13 added the needs discussion Team discussion is needed label Oct 14, 2022
@olesya13 olesya13 removed the needs discussion Team discussion is needed label Nov 11, 2022
@ngambani commented

@BMurri this looks like a long-pending issue, open since 2021. Are we actively working on this, or can it be closed?

@BMurri BMurri added the up for grabs Available for community contributions. Please ask in the issue if you'd like to implement it label Jan 31, 2024
@BMurri (Collaborator) commented Jan 31, 2024

@ngambani This is another one that would be great for someone in the community to pitch in on. It can help control costs for some types of workflows that don't need more cores but still need more storage space. microsoft/ga4gh-tes#454 would accomplish the same thing in a crude way, but without as much cost savings or as precise control.

@ngambani ngambani added the tobegroomed Add this label while creating new issues to get issues prioritized on the backlog label Feb 15, 2024
@BMurri BMurri removed the up for grabs Available for community contributions. Please ask in the issue if you'd like to implement it label Aug 30, 2024
@BMurri BMurri self-assigned this Aug 30, 2024
@BMurri BMurri removed the tobegroomed Add this label while creating new issues to get issues prioritized on the backlog label Aug 31, 2024