
Eliminate config options for approximate input file size #427

Open
hannes-ucsc opened this issue Aug 24, 2016 · 0 comments


Asking the user to specify the input size is error-prone and inconvenient.

Whenever we download a file to a local disk for the purpose of uploading it to the job store, we should switch to using Toil's import functionality instead. It streams the data rather than staging it on local disk, thereby eliminating the need to estimate a disk requirement for the import job. As imports are implemented in Toil right now, this approach might be less reliable and slower than using s3am, but we can address those issues in Toil if and when they occur.

What do we do in cases where files are processed immediately after being downloaded from an external location and the job store upload is skipped? Not skipping the upload is one option. Trying to determine the file size up front is another. For HTTP this can be done with a HEAD request; for S3 there is an API call, which is probably also a HEAD request under the hood.
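
To illustrate the HEAD-request option, here is a minimal sketch using only the Python standard library. The function name `http_content_length` and its `opener` parameter are hypothetical and not part of Toil or this project; the sketch assumes the server reports a `Content-Length` header and returns `None` when it does not.

```python
import urllib.request
from typing import Callable, Optional


def http_content_length(
    url: str,
    opener: Callable = urllib.request.urlopen,  # injectable so it can be stubbed
    timeout: float = 30.0,
) -> Optional[int]:
    """Return the size in bytes reported for `url`, or None if the server
    does not send a Content-Length header.

    Only a HEAD request is issued, so the body is never downloaded.
    """
    request = urllib.request.Request(url, method="HEAD")
    with opener(request, timeout=timeout) as response:
        value = response.headers.get("Content-Length")
    return int(value) if value is not None else None
```

The result could then feed a job's disk requirement instead of a user-supplied estimate. Callers would still need a fallback for servers that omit `Content-Length` (for example with chunked responses). For S3, boto3's `head_object` call returns a `ContentLength` field that could serve the same purpose.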
