Replace file system as source for status data to enable at scale compute tasks

## New feature

Hi! Thanks for using Nextflow and submitting the proposal 
for a new feature or the enhancement of an existing functionality. 

Please replace this text providing a short description of your 
proposal.

## Usage scenario 

(What's the main usage case and the deployment scenario addressed by this proposal)

Large scale compute tasks that run in parallel (i.e. 10,000 tasks running concurrently) currently created many many small data files that are required to manage the workflow status and enable a workflow restart.

As parallel workloads scale over large clusters or cloud environments, this creates issues with data management, IO bottlenecks and lock contention, all of which impacts and impedes data analysis at scale.

## Suggest implementation 

(Highlight the main building blocks of a possible implementation and/or related components)

By managing status data within a data base structure, ideally of a resilient structure, the impact of the small data packages will be significantly improved and managed in a single service location.

Various databases support resilient infrastructure, i.e. MongoDB, MySQL, PostgreSQL etc etc. Would it be reasonable to push data into one such backend and ideally manage connection pooling to remove or reduce the overhead of establishing new connections etc ?


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace file system as source for status data to enable at scale compute tasks #1245

New feature

Usage scenario

Suggest implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Replace file system as source for status data to enable at scale compute tasks #1245

Description

New feature

Usage scenario

Suggest implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions