Advice about running on gcs? #190

njbernstein · 2021-04-28T20:46:54Z

Hi there,

Do you have any advice about running needlestack on a large number of samples on Google Cloud?

Any chance you have a config already for it?

Do all reads get loaded into memory at the same time?

Do you have a ballpark figure for how much RAM would be needed for 1,000 or even 10,000 samples?

mfoll · 2021-06-07T07:41:45Z

Hi,

Sorry, we don't have a config ready for this. Maybe have a read of the Nextflow documentation here: https://www.nextflow.io/docs/latest/google.html

The main parameter for controlling memory is nsplit: the genome (or the target region, if you provide a BED file) is split into nsplit chunks. Each chunk runs as a separate job, in which the reads are processed by samtools and converted into a text file that is loaded fully into memory by R. The more you increase nsplit, the smaller this file (and therefore the memory needed per job) will be.
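To make the effect of nsplit concrete, here is a minimal sketch of splitting a set of BED-style regions into nsplit chunks of near-equal total length. The function `split_regions` is hypothetical and is not needlestack's actual code; needlestack's own splitting logic may differ. The assumption it illustrates is that each per-chunk text file grows roughly with (chunk length × number of samples), so doubling nsplit should roughly halve the data each R job has to hold in memory.

```python
from typing import List, Tuple

# Hypothetical illustration, not needlestack's implementation.
Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def split_regions(regions: List[Interval], nsplit: int) -> List[List[Interval]]:
    """Split regions into (at most) nsplit chunks whose lengths differ by at most 1 base."""
    if nsplit < 1:
        raise ValueError("nsplit must be >= 1")
    total = sum(end - start for _, start, end in regions)
    base, extra = divmod(total, nsplit)
    # Target length of each chunk; the first `extra` chunks get one extra base.
    sizes = [base + (1 if i < extra else 0) for i in range(nsplit)]
    sizes = [s for s in sizes if s > 0]  # drop empty chunks when total < nsplit

    chunks: List[List[Interval]] = []
    current: List[Interval] = []
    idx = 0
    remaining = sizes[0] if sizes else 0  # bases still needed for the current chunk
    for chrom, start, end in regions:
        pos = start
        while pos < end:
            take = min(end - pos, remaining)
            current.append((chrom, pos, pos + take))
            pos += take
            remaining -= take
            if remaining == 0:  # current chunk is full: start the next one
                chunks.append(current)
                current = []
                idx += 1
                remaining = sizes[idx] if idx < len(sizes) else 0
    return chunks


if __name__ == "__main__":
    regions = [("chr1", 0, 100), ("chr2", 0, 50)]
    for chunk in split_regions(regions, 4):
        print(chunk)
```

Chunks may span chromosome boundaries here (e.g. the third chunk covers the end of chr1 and the start of chr2); each chunk would then be processed as an independent job.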
