A lot of our WDL tasks have been failing due to insufficient resources when handling data outside our usual expected size range.
It's been proposed (thanks Scott) that rather than dialling up all the resources requested by WDL tasks, a more "scientific" approach be used: check the size of the input before running a task on it, then set that task's resource request appropriately.
For example, there was a recent failure in the VcfMerge task, which takes two VCF files and merges them.
The WDL task that calls this command is set up to use 20 GB of memory.
VCF files from WGS typically contain around 5 million records. A recent run against some NIH data produced VCF files containing over 12 million records; 20 GB was not enough for this dataset and the job ran out of memory.
If there was a means of checking the size of the VCF files before they are merged, the resources for the merge WDL task could be adjusted appropriately.
Describe the solution you'd like
WDL's standard library provides `size()` for file sizes, and a preliminary task could produce line/record counts. We would then need to come up with some cutoffs and resource parameters.
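As a starting point, here is a minimal sketch of how `size()` could drive the memory request directly in the runtime block. The task body, the `vcf-merge` command, the 2 GB-per-GB multiplier, and the 20 GB floor are all illustrative assumptions, not our actual VcfMerge task:

```wdl
version 1.0

task VcfMerge {
  input {
    File vcf_a
    File vcf_b
  }

  # Assumed heuristic: ~2 GB of memory per GB of input, with a 20 GB floor.
  Int needed_gb = ceil(2 * size([vcf_a, vcf_b], "GB"))
  Int mem_gb = if needed_gb > 20 then needed_gb else 20

  command <<<
    # Placeholder merge command; substitute the real one.
    vcf-merge ~{vcf_a} ~{vcf_b} > merged.vcf
  >>>

  output {
    File merged = "merged.vcf"
  }

  runtime {
    memory: mem_gb + " GB"
  }
}
```

Note this only covers on-disk file size, which `size()` can evaluate before the command runs; record counts would still need a separate measuring task.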
Describe alternatives you've considered
Bump all wdl task resources when running with large datasets.
Additional context
??
I first discussed this with Ross a couple of months ago when the NIH bams hit, and I like the idea in principle. Bespoke resource requests per task, based on measuring inputs, would be ideal. However, because the runtime block is evaluated before the command runs, you can't measure within the task itself: you'd have to create a separate custom task that measured input sizes and produced the appropriate resource values to pass to the following task that actually did the work.
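The two-task pattern described above could look something like this. The record cutoff, memory tiers, task names, and the `vcf-merge` command are all hypothetical placeholders:

```wdl
version 1.0

# Small preliminary task: count non-header records in a (possibly gzipped) VCF.
task MeasureVcf {
  input {
    File vcf
  }
  command <<<
    zcat -f ~{vcf} | grep -vc '^#'
  >>>
  output {
    Int n_records = read_int(stdout())
  }
  runtime {
    memory: "1 GB"
  }
}

# The real work, with memory supplied by the workflow.
task Merge {
  input {
    File vcf_a
    File vcf_b
    Int mem_gb
  }
  command <<<
    # Placeholder merge command; substitute the real one.
    vcf-merge ~{vcf_a} ~{vcf_b} > merged.vcf
  >>>
  output {
    File merged = "merged.vcf"
  }
  runtime {
    memory: mem_gb + " GB"
  }
}

workflow SizedMerge {
  input {
    File vcf_a
    File vcf_b
  }
  call MeasureVcf as measure_a { input: vcf = vcf_a }
  call MeasureVcf as measure_b { input: vcf = vcf_b }

  # Assumed cutoff: 20 GB up to 10M combined records, 40 GB beyond that.
  Int mem_gb = if measure_a.n_records + measure_b.n_records > 10000000 then 40 else 20

  call Merge { input: vcf_a = vcf_a, vcf_b = vcf_b, mem_gb = mem_gb }

  output {
    File merged = Merge.merged
  }
}
```

The extra measuring calls cost a little scheduling overhead per input, but they're cheap single-pass `wc`-style jobs compared to re-running a failed merge.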
I also considered exactly the alternative suggested above: a single float-valued "scaling" parameter in the workflow, defaulting to 1, passed to all tasks to multiply their memory and walltime requests. It'd be less accurate, but simpler.
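For comparison, that scaling-parameter alternative is a much smaller change per task. The task name, command, and 20 GB baseline here are illustrative:

```wdl
version 1.0

task SomeTask {
  input {
    File in_file
    # Workflow-wide knob, defaulting to 1; a run against oversized data
    # might pass e.g. resource_scale = 2.0 to every task.
    Float resource_scale = 1.0
  }

  Int mem_gb = ceil(20 * resource_scale)  # 20 GB baseline, illustrative

  command <<<
    # Placeholder command.
    do_work ~{in_file}
  >>>

  runtime {
    memory: mem_gb + " GB"
  }
}
```

One knob covers every task, at the cost of over-provisioning the tasks that didn't need scaling.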
I'm happy to take this on, with the caveat that it's a learning task and I have one or two higher-priority things in flight. That said, I should have plenty of time between tests to tinker with this, so hopefully it won't take too long.