Advice about running on gcs? #190

Open
njbernstein opened this issue Apr 28, 2021 · 1 comment
Comments

@njbernstein

Hi there,

Do you have any advice about running needlestack on a large number of samples on Google Cloud?

Any chance you have a config already for it?

Do all reads get loaded into memory at the same time?

Do you have a ballpark estimate of how much RAM would be necessary for 1,000 samples, or even 10,000 samples?

@mfoll
Member

mfoll commented Jun 7, 2021

Hi,

Sorry, we don't have a config ready for this. It may be worth reading the Nextflow documentation for Google Cloud: https://www.nextflow.io/docs/latest/google.html

The main parameter for controlling memory is nsplit: the genome (or the target regions, if you provide a BED file) is split into nsplit chunks. Each chunk runs as a separate job, in which the reads are processed by samtools and converted into a text file that R loads entirely into memory. The higher you set nsplit, the smaller this file will be.
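
To make the effect of nsplit a bit more concrete, here is a rough Python sketch of the arithmetic only. It assumes the target is cut into near-equal chunks by number of positions, which may not match how needlestack actually splits regions, and the function names (`chunk_sizes`, `min_nsplit`) are hypothetical helpers, not part of needlestack or Nextflow. It does not estimate RAM; it only shows how the per-job chunk shrinks as nsplit grows.

```python
import math


def chunk_sizes(total_positions: int, nsplit: int) -> list[int]:
    """Split total_positions into nsplit near-equal chunks (illustrative only).

    Assumption: an even split by position count; the real pipeline may
    split along region boundaries instead.
    """
    if nsplit < 1:
        raise ValueError("nsplit must be at least 1")
    base, extra = divmod(total_positions, nsplit)
    # The first `extra` chunks each take one additional position.
    return [base + 1 if i < extra else base for i in range(nsplit)]


def min_nsplit(total_positions: int, max_positions_per_chunk: int) -> int:
    """Smallest nsplit that keeps every chunk at or below the given size."""
    if max_positions_per_chunk < 1:
        raise ValueError("max_positions_per_chunk must be at least 1")
    return max(1, math.ceil(total_positions / max_positions_per_chunk))


if __name__ == "__main__":
    target = 1_000_000  # hypothetical number of target positions
    for n in (1, 10, 100):
        print(f"nsplit={n}: largest chunk = {max(chunk_sizes(target, n))} positions")
```

Since the text file for a chunk holds data for every sample, its size presumably grows with the number of samples as well, so one practical approach is to measure memory on a small run and then raise nsplit until each job fits the machine type you want to use.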
