AWS Environment Setup

Getting Started on AWS

Install the AWS CLI, then run:

aws configure

Also create a key pair by following the instructions on this website.
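If the key pair meant here is an EC2 key pair (an assumption; the linked instructions remain the authoritative source), it can also be created from the command line. The sketch below wraps the AWS CLI call in a function. The function name and the key name in the usage comment are hypothetical.

```shell
# Sketch: create an EC2 key pair and save the private key locally.
# Assumes `aws configure` has been run and the account may create key pairs.
create_key_pair() {
    key_name="$1"                      # hypothetical name, chosen by you
    key_file="${2:-${key_name}.pem}"   # where the private key is written

    # The private key is returned only once, so write it straight to a file.
    aws ec2 create-key-pair \
        --key-name "$key_name" \
        --query 'KeyMaterial' \
        --output text > "$key_file" || return 1

    # Restrict permissions so SSH clients will accept the key file.
    chmod 400 "$key_file"
}

# Usage (not run here):
#   create_key_pair my-cromwell-key
```

The key name is what the CloudFormation templates are likely to ask for, so it is worth noting it down.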

Cromwell

We use Cromwell as the job scheduler. Follow this tutorial to learn how to configure Cromwell on your local machine.

Once Cromwell is up and running on your local machine, configure AWS Batch so that Cromwell can execute workflows on it.

This website gives an overview of how to configure the AWS environment properly. Go to its Cromwell subsection and launch the three CloudFormation templates provided there. These templates launch and configure all the resources required to execute workflows using Cromwell on AWS. Keep in mind that the stack name you provide when launching the genomics workflow core must also be passed as a reference to the CloudFormation template that launches Cromwell.

Next, create the configuration file that Cromwell will use during execution. The subsection "Configuring Cromwell to use AWS Batch" on this website contains a sample configuration file. You can remove its database section, since this pipeline does not use that Cromwell capability. Another sample configuration file can be found here.

Fill in the region, root bucket, queue ARN, and script bucket. The queue ARN is that of the Batch queue created earlier by the CloudFormation template; you can find it in the Batch section of the AWS console. Set the script bucket one level above the root bucket. For example, if your root bucket is s3://course/cromwell-execution, set the script bucket to s3://course. Do not include the s3:// prefix when entering the script bucket in the configuration file, so in this example the value entered is course.
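The following is a minimal sketch of filling in those four values, not the sample file itself. It derives the script bucket from the root bucket and writes a configuration file with the database section omitted. The region, queue ARN, and file name are placeholders. The configuration keys follow the general shape of Cromwell's AWS Batch samples as best recalled and are an assumption, so check them against the sample file linked above before use.

```shell
# Sketch: derive the script bucket and write a Cromwell config for AWS Batch.
# All values below are placeholders; replace them with your own.
region="us-east-1"                              # placeholder region
root_bucket="s3://course/cromwell-execution"    # example from the text above
queue_arn="arn:aws:batch:us-east-1:111122223333:job-queue/example-queue"  # placeholder
config_file="aws.conf"                          # hypothetical file name

# To list queue ARNs from the command line instead of the console:
#   aws batch describe-job-queues --query 'jobQueues[].jobQueueArn' --output text

# One level above the root bucket, without the s3:// prefix.
no_scheme="${root_bucket#s3://}"    # course/cromwell-execution
script_bucket="${no_scheme%/*}"     # course

# Key names are assumed from Cromwell's AWS Batch sample; verify before use.
cat > "$config_file" <<EOF
include required(classpath("application"))

aws {
  application-name = "cromwell"
  auths = [{ name = "default", scheme = "default" }]
  region = "${region}"
}

engine { filesystems { s3 { auth = "default" } } }

backend {
  default = "AWSBATCH"
  providers {
    AWSBATCH {
      actor-factory = "cromwell.backend.impl.aws.AwsBatchBackendLifecycleActorFactory"
      config {
        root = "${root_bucket}"
        auth = "default"
        default-runtime-attributes {
          queueArn = "${queue_arn}"
          scriptBucketName = "${script_bucket}"
        }
        filesystems { s3 { auth = "default" } }
      }
    }
  }
}
EOF

echo "script bucket: ${script_bucket}"   # prints "script bucket: course"
```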

Now we are ready to execute pipelines on AWS using Cromwell.
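As a rough sketch of what such an execution might look like, the function below runs a workflow with Cromwell's `run` command while pointing it at the AWS configuration file. The function name and all file names in the usage comment are hypothetical placeholders.

```shell
# Sketch: run a workflow with Cromwell using the AWS configuration file.
# File names are placeholders; point them at your own jar, config, and workflow.
run_on_aws() {
    cromwell_jar="$1"   # path to the Cromwell jar
    config_file="$2"    # Cromwell configuration file for AWS Batch
    workflow="$3"       # WDL workflow to run
    inputs="$4"         # JSON file with the workflow inputs

    # -Dconfig.file makes Cromwell load this configuration on top of its defaults.
    java -Dconfig.file="$config_file" -jar "$cromwell_jar" \
        run "$workflow" --inputs "$inputs"
}

# Usage (not run here):
#   run_on_aws cromwell.jar aws.conf hello.wdl hello.inputs.json
```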

Google Cloud Environment Setup

Getting Started on Google Cloud

Install the Google Cloud SDK, then run:

gcloud init

This will set up your default project and grant credentials to the Google Cloud SDK. Also, provide credentials so that dsub can call Google APIs:

gcloud auth application-default login

Install dsub by following the instructions on its GitHub page. Familiarize yourself with the parameters required to execute a job on Google Cloud using dsub, as it is the job scheduler we will be using.
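To get a feel for those parameters, here is a minimal sketch of a "hello world" submission, modeled on the example in dsub's own README. The function name, project ID, and bucket are hypothetical. The provider name is an assumption that depends on your dsub version, and the flags that go with it may differ too, so treat the dsub documentation as authoritative.

```shell
# Sketch: submit a minimal "hello world" job with dsub.
# Install dsub first, for example inside a Python virtual environment:
#   pip install dsub
submit_hello() {
    project="$1"                     # your Google Cloud project ID
    bucket="$2"                      # a Cloud Storage bucket you can write to
    provider="${3:-google-cls-v2}"   # assumption: depends on your dsub version

    # --logging is where dsub writes job logs; --output maps the OUT
    # environment variable inside the job to a Cloud Storage path.
    dsub \
        --provider "$provider" \
        --project "$project" \
        --regions us-central1 \
        --logging "${bucket}/logging/" \
        --output OUT="${bucket}/output/out.txt" \
        --command 'echo "Hello World" > "${OUT}"' \
        --wait
}

# Usage (not run here):
#   submit_hello my-cloud-project gs://my-bucket
```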

To execute WDL files on Google Cloud, follow the tutorial at: https://cloud.google.com/life-sciences/docs/tutorials/gatk

StanfordBioinformatics/StanfordDeepMedicine
