
Adjust WDL task runtime attributes (mem, cpu, etc.) depending on size of input #134

Description

@holmeso

A lot of our WDL tasks have been failing due to insufficient resources when dealing with data that is not within our usual expected size range.
It's been proposed (thanks Scott) that rather than dialling up all the resources requested by WDL tasks, a more "scientific" approach be used: check the size of the input before running a task on it, and set the resource request for that task accordingly.

For example, there was a recent failure in the VcfMerge task, which takes 2 VCF files and merges them together.
The WDL task that calls this command is set up to use 20 GB of memory.
VCF files from WGS typically have around 5 million records in them. A recent run against some NIH data resulted in VCF files containing over 12 million records. 20 GB of memory was not enough for this dataset and the job ran out of memory.
If there were a means of checking the size of the VCF files before they are merged, the resources for the merge WDL task could be adjusted appropriately, as in the sketch below.
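
A minimal sketch of what size-aware resourcing could look like, assuming WDL 1.0's `size()` and `ceil()` standard-library functions and Cromwell-style `memory`/`cpu` runtime attributes. The command line, scaling factor, and memory floor are illustrative placeholders, not the actual pipeline code:

```wdl
version 1.0

# Hypothetical size-aware version of the merge task; values are for illustration only.
task VcfMerge {
  input {
    File vcf1
    File vcf2
    Int gb_mem_per_gb_input = 4   # assumed scaling factor, would need tuning
    Int min_mem_gb = 20           # current fixed request kept as the floor
  }

  # size() returns the file size as a Float in the requested unit
  Float input_gb = size(vcf1, "GB") + size(vcf2, "GB")
  Int scaled_mem_gb = ceil(input_gb * gb_mem_per_gb_input)
  Int mem_gb = if scaled_mem_gb > min_mem_gb then scaled_mem_gb else min_mem_gb

  command <<<
    # placeholder for the actual merge command used by the pipeline
    merge-vcfs ~{vcf1} ~{vcf2} > merged.vcf
  >>>

  output {
    File merged_vcf = "merged.vcf"
  }

  runtime {
    memory: "~{mem_gb} GB"
    cpu: 1
  }
}
```

One caveat with this approach: for gzipped VCFs the on-disk size under-represents the record count, so a line-count check may be a better predictor of memory needs.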

Describe the solution you'd like

I believe that there is a way within WDL of getting line counts / file sizes. We would then need to come up with some cutoffs and resource parameters; one possible shape is sketched below.
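
As a sketch of the line-count route, assuming a lightweight preliminary task that counts non-header records with standard shell tools and WDL's `read_int()`. The task names, cutoffs, and memory tiers below are made up and would need tuning against real runs:

```wdl
version 1.0

# Illustrative only: count records cheaply, then choose a memory tier
# for the downstream merge based on the combined count.
task CountVcfRecords {
  input {
    File vcf
  }

  command <<<
    zcat -f ~{vcf} | grep -vc '^#'
  >>>

  output {
    Int records = read_int(stdout())
  }

  runtime {
    memory: "1 GB"
    cpu: 1
  }
}

workflow SizeAwareMerge {
  input {
    File vcf1
    File vcf2
  }

  call CountVcfRecords as count1 { input: vcf = vcf1 }
  call CountVcfRecords as count2 { input: vcf = vcf2 }

  Int total_records = count1.records + count2.records

  # Cutoffs are placeholders: ~5M records fits the current 20 GB request,
  # while the ~12M-record NIH inputs would get a larger tier.
  Int merge_mem_gb = if total_records > 10000000 then 48 else 20

  # The merge task would then take merge_mem_gb as an input and use it
  # in its runtime block, e.g.:
  # call VcfMerge { input: vcf1 = vcf1, vcf2 = vcf2, mem_gb = merge_mem_gb }
}
```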

Describe alternatives you've considered
Bump all WDL task resources when running with large datasets.

Additional context
??

Metadata

Labels

enhancement (New feature or request)
