
Running h3aGWAS on Amazon EC2

  1. We assume you have an Amazon AWS account and some familiarity with EC2. The easiest way to run is by building an Amazon Elastic File System (EFS), which persists between runs. Each time you run, you attach the EFS to the cluster you use. We assume you have
  • your Amazon accessKey and secretKey;
  • the ID of your EFS;
  • the ID of the subnet you will use for your Amazon EC2 instances (if you need to look up the last two, see the sketch below).
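
If you do not have the EFS or subnet IDs to hand, they can be looked up with the AWS CLI (assuming it is installed and configured for your account; the region below is just the one used in the example config):

    # list the IDs of your EFS file systems
    aws efs describe-file-systems --region eu-west-1 --query 'FileSystems[*].FileSystemId'
    # list the subnets in your account, with their availability zones
    aws ec2 describe-subnets --region eu-west-1 --query 'Subnets[*].[SubnetId,AvailabilityZone]'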

Something like this should be put in the nextflow config file

    aws {
        accessKey = 'AAAAAAAAAAAAAAAAA'
        secretKey = 'jaghdkGAHHGH13hg3hGAH18382GAJHAJHG11'
        region    = 'eu-west-1'
    }

    cloud {
             ...
             ...
             ... other options
             imageId            = "ami-94c9ebe7"    // AMI which has cloud-init installed
             sharedStorageId    = "fs-XXXXXXXXX"    // ID of the EFS to attach to the cluster
             sharedStorageMount = "/mnt/shared"     // common mount point for the EFS on all nodes
             subnetId           = "subnet-XXXXXXX"
    }
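
The other options are deliberately elided above; they typically include at least the instance type and the EC2 key pair used for logging in. A minimal sketch with placeholder values (check the Nextflow cloud documentation for your version for the full list of options):

    cloud {
        instanceType = "m4.large"         // placeholder: EC2 instance type for the cluster nodes
        keyName      = "my-ec2-keypair"   // placeholder: name of an existing EC2 key pair
    }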
  2. Create the cloud. For the simple example, you only need one machine. If you have many big files, adjust accordingly.

    nextflow cloud create h3agwascloud -c 1
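
Creating the cluster can take a few minutes. With Nextflow versions that support the cloud command, you can check which clusters you have running with:

    nextflow cloud list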

  3. If successful, you will be given the ID of the head node of the cluster to log in to. You should see a message with the login details.

  4. ssh into the head node. The EFS is mounted at /mnt/shared. In our example, we will analyse the files sampleA.{bed,bim,fam} in the /mnt/shared/input directory.
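
The input files need to be on the EFS before the run. A minimal sketch of getting them there, assuming you copy them from your local machine (the key file, user name and head node address are placeholders and depend on your setup):

    # on the head node: create the input directory on the shared file system
    mkdir -p /mnt/shared/input

    # from your local machine: copy the PLINK files across
    scp -i my-key.pem sampleA.bed sampleA.bim sampleA.fam \
        ec2-user@<head-node-address>:/mnt/shared/input/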

  5. Run the workflow -- you can run it directly from GitHub. The AMI doesn't have any of the bioinformatics software installed: specify the docker profile and Nextflow will run using Docker, fetching any necessary images.

    nextflow run h3abionet/h3agwas --work_dir /mnt/shared --input_pat sampleA --max_plink_cores 2 -profile docker
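
If the run is interrupted, it can be restarted with Nextflow's standard -resume option so that completed steps are taken from the cache rather than recomputed:

    nextflow run h3abionet/h3agwas --work_dir /mnt/shared --input_pat sampleA --max_plink_cores 2 -profile docker -resume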

  6. The output can be found in /mnt/shared/output. The file sampleA.pdf is a report of the analysis that was done.

  7. Remember to shut down the Amazon cluster to avoid unduly boosting Amazon's share price.
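
Assuming the cluster was created with the name used above, it can be shut down from the same machine where you ran nextflow cloud create:

    nextflow cloud shutdown h3agwascloud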