finished reformatting all the gcp notebooks
kyleoconnell-NIH committed May 14, 2024
1 parent 78115a4 commit fd85e53
Showing 8 changed files with 472 additions and 1,313 deletions.
60 changes: 53 additions & 7 deletions notebooks/GWASCoatColor/GWAS_coat_color.ipynb
@@ -5,18 +5,44 @@
"id": "7a244bb3",
"metadata": {},
"source": [
"# GWAS in the cloud\n",
"We adapted the NIH CFDE tutorial from [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) and fit it to a notebook. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial to find out more.\n",
"# GWAS in the Cloud\n",
"\n",
"## Overview \n",
"We retrofitted the NIH CFDE tutorial from [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) to a notebook so that you could run it on Vertex AI. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial to find out more.\n",
"Most of this notebook is bash, but expects that you are using a Python kernel, until step 3, plotting, you will need to switch your kernel to R."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning Objectives\n",
"+ Learn how to run a GWAS analysis in Google Cloud"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"+ You only need access to a Vertex AI environment to run this notebook"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Started"
]
},
{
"cell_type": "markdown",
"id": "8fbf6304",
"metadata": {},
"source": [
"## 1. Setup\n",
"### Download the data\n",
"### Install packages and set up environment\n",
"\n",
"#### Download the data\n",
"use %%bash to denote a bash block. You can also use '!' to denote a single bash command within a Python notebook"
]
},
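For readers running the same steps outside a notebook, where the `%%bash` and `!` magics are not available, a rough Python equivalent is sketched below. It is only an illustration and not part of the commit; the echoed text is a placeholder command.

```python
import subprocess

# shell=True hands the string to the system shell, much like a single `!` command.
result = subprocess.run(
    "echo hello from the shell",
    shell=True,
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the command exits non-zero
)
print(result.stdout.strip())
```

Inside the notebook itself, the magics remain the simpler choice.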
@@ -51,7 +77,7 @@
"id": "28aadbf8",
"metadata": {},
"source": [
"### Install dependencies\n",
"#### Install dependencies\n",
"Here we install mamba, which is faster than conda, but it can be tricky to add to path in a Sagemaker notebook so we just call the whole path. You could also skip this install and just use conda since that is preinstalled in the kernel."
]
},
@@ -93,7 +119,7 @@
"id": "3de2fc4c",
"metadata": {},
"source": [
"## 2. Analyze"
"## Begin the Analysis"
]
},
{
@@ -269,7 +295,7 @@
"id": "1f52e97c",
"metadata": {},
"source": [
"## 3. Plotting\n",
"## Plotting\n",
"In this tutorial, plotting is done in R, so at this point you can change your kernel to R in the top right. Wait for it to say 'idle' in the bottom left, then continue. You could also plot using Python native packages and maintain the Python notebook kernel."
]
},
@@ -353,6 +379,13 @@
" chrlabs = c(1:38, \"X\"), annotateTop=TRUE, cex = 1.2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusions"
]
},
{
"cell_type": "markdown",
"id": "26787d84",
@@ -362,6 +395,19 @@
"\n",
"The top associated mutation is a nonsense SNP in the gene MC1R known to control pigment production. The MC1R allele encoding yellow coat color contains a single base change (from C to T) at the 916th nucleotide."
]
},
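As a rough illustration of the conclusion cell above (not part of the notebook), the sketch below works out where the 916th nucleotide sits in its codon and which codons a C to T change at that position could turn into a stop codon. It assumes position 916 is counted from the first base of the coding sequence and uses only the stop codons of the standard genetic code; the function names are hypothetical.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_location(nt_position):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    codon_number = (nt_position - 1) // 3 + 1
    position_in_codon = (nt_position - 1) % 3 + 1
    return codon_number, position_in_codon


def codons_made_nonsense_by_c_to_t(position_in_codon):
    """List (original, mutated) codons where C->T at this position yields a stop."""
    index = position_in_codon - 1
    hits = []
    for stop in sorted(STOP_CODONS):
        if stop[index] == "T":
            # Put the C back to recover the codon before the mutation.
            original = stop[:index] + "C" + stop[index + 1:]
            hits.append((original, stop))
    return hits


codon_number, position_in_codon = codon_location(916)
print(f"Nucleotide 916 is position {position_in_codon} of codon {codon_number}")
for original, stop in codons_made_nonsense_by_c_to_t(position_in_codon):
    print(f"{original} -> {stop}")
```

Under these assumptions the change falls on the first base of codon 306, which is consistent with a single C to T substitution producing a nonsense mutation.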
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean Up\n",
"You just need to stop this instance and optionally delete the instance and storage bucket"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
85 changes: 54 additions & 31 deletions notebooks/LifeSciencesAPI/nextflow/Part1_LS_API_Nextflow.ipynb
@@ -6,6 +8,8 @@
"metadata": {},
"source": [
"# Use Nextflow to run workflows using the Cloud Life Sciences API Part I\n",
"\n",
"## Overview\n",
"Here we are going to walk through submitting simple jobs directly to the Life Sciences API, then dive into interacting with the API using Nextflow. We will run some basic Hello World jobs, then move to a more complex [nf-core Methylseq workflow](https://nf-co.re/methylseq). "
]
},
@@ -17,6 +19,38 @@
"<div class=\"alert alert-block alert-danger\"> <b>Warning:</b> Google Life Sciences API is deprecated and will no longer be available on the platform by July 8, 2025. Please switch to the <a href=\"../../GoogleBatch/nextflow/Part1_GBatch_Nextflow.ipynb\">Google Batch Nextflow tutorials</a>. </div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning Objectives\n",
"+ Learn to use Nextflow on Google Cloud\n",
"+ Learn to submit Nextflow jobs to Google Life Sciences API"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
"Make sure that Cloud Life Sciences, Compute Engine, and Cloud Storage APIs are all enabled.\n",
"\n",
"You also want to make sure your Compute Engine Default Service Account has the following Roles:\n",
"\n",
" - lifesciences.workflowsRunner\n",
" - iam.serviceAccountUser\n",
" - serviceusage.serviceUsageConsumer\n",
" - storage.objectAdmin\n",
"Your Service Account should already have these roles assigned, but if not, reach out to Support to have your account updated."
]
},
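One way to check the roles listed in the Prerequisites cell is to compare them against a project's IAM policy. The sketch below is an illustration only: it assumes a policy dictionary shaped like the JSON printed by `gcloud projects get-iam-policy PROJECT_ID --format=json`, and the example policy, project number, and function name are hypothetical.

```python
REQUIRED_ROLES = {
    "roles/lifesciences.workflowsRunner",
    "roles/iam.serviceAccountUser",
    "roles/serviceusage.serviceUsageConsumer",
    "roles/storage.objectAdmin",
}


def missing_roles(policy, member):
    """Return the required roles that the policy does not grant to member."""
    granted = {
        binding["role"]
        for binding in policy.get("bindings", [])
        if member in binding.get("members", [])
    }
    return sorted(REQUIRED_ROLES - granted)


# Hypothetical member and policy, for illustration only.
member = "serviceAccount:123456789-compute@developer.gserviceaccount.com"
example_policy = {
    "bindings": [
        {"role": "roles/lifesciences.workflowsRunner", "members": [member]},
        {"role": "roles/storage.objectAdmin", "members": [member]},
    ]
}

print(missing_roles(example_policy, member))
```

An empty list would mean all four roles are already bound to the service account.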
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Get Started"
]
},
{
"cell_type": "markdown",
"id": "0f8f4b85-9459-497d-97ec-5909e8aeacae",
@@ -25,7 +59,7 @@
"tags": []
},
"source": [
"## 1. Setup your environment"
"### Install packages and setup your environment"
]
},
{
@@ -35,7 +69,7 @@
"id": "f2e4a5ca-8a2b-4156-b83e-c89f0c1ffc9c"
},
"source": [
"### Create a bucket"
"#### Create a bucket"
]
},
{
@@ -45,7 +79,7 @@
"metadata": {},
"outputs": [],
"source": [
"#make sure you change this name, it needs to be globally unique\n",
"# make sure you change this name, it needs to be globally unique\n",
"%env BUCKET=gls-api-nextflow"
]
},
@@ -56,7 +90,7 @@
"metadata": {},
"outputs": [],
"source": [
"#will only create the bucket if it doesn't yet exist\n",
"# will only create the bucket if it doesn't yet exist\n",
"! gsutil ls gs://$BUCKET >& /dev/null || gsutil mb gs://$BUCKET"
]
},
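Because the bucket name has to be globally unique, one option is to append a random suffix rather than pick a name by hand. The sketch below is an assumption about how that could be done and is not part of the notebook; `unique_bucket_name` is a hypothetical helper, and the pattern covers only dot-free Cloud Storage names (3 to 63 characters, lowercase letters, digits, dashes, and underscores, starting and ending with a letter or digit).

```python
import os
import re
import uuid


def unique_bucket_name(prefix="gls-api-nextflow"):
    """Append a random 8-character hex suffix so the name is unlikely to collide."""
    name = f"{prefix}-{uuid.uuid4().hex[:8]}".lower()
    # Validate against the dot-free bucket naming rules described above.
    if not re.fullmatch(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]", name):
        raise ValueError(f"not a valid bucket name: {name}")
    return name


bucket = unique_bucket_name()
# Export the name so later shell commands can read it as $BUCKET.
os.environ["BUCKET"] = bucket
print(bucket)
```

Setting `os.environ["BUCKET"]` plays the same role as the `%env BUCKET=...` line in the notebook.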
@@ -67,7 +101,7 @@
"metadata": {},
"outputs": [],
"source": [
"#set versioning on the bucket so it can overwrite old files\n",
"# set versioning on the bucket so it can overwrite old files\n",
"! gsutil versioning set on gs://$BUCKET"
]
},
@@ -76,7 +110,7 @@
"id": "f5d588a5-83b2-42ef-a65f-64b2c80bca3f",
"metadata": {},
"source": [
"### Install dependencies"
"#### Install dependencies"
]
},
{
@@ -108,29 +142,12 @@
"! ./nextflow self-update"
]
},
{
"cell_type": "markdown",
"id": "07d1891a-8338-4592-a3a0-eaab55cd8df0",
"metadata": {},
"source": [
"### Ensure you have APIs enabled and IAM permissions\n",
"Make sure that Cloud Life Sciences, Compute Engine, and Cloud Storage APIs are all enabled.\n",
"\n",
"You also want to make sure your Compute Engine Default Service Account has the following Roles:\n",
"\n",
" - lifesciences.workflowsRunner\n",
" - iam.serviceAccountUser\n",
" - serviceusage.serviceUsageConsumer\n",
" - storage.objectAdmin\n",
"Your Service Account should already have these roles assigned, but if not, reach out to Support to have your account updated."
]
},
{
"cell_type": "markdown",
"id": "a73b5bf4-3e68-44c2-9874-02c637e730bf",
"metadata": {},
"source": [
"## 2. Submit Hello World to the API"
"## Submit Hello World to the API"
]
},
{
@@ -172,7 +189,7 @@
"metadata": {},
"outputs": [],
"source": [
"#set your operation ID here\n",
"# set your operation ID here\n",
"%env ID=10485099716669037373"
]
},
@@ -188,7 +205,7 @@
},
"outputs": [],
"source": [
"!gcloud beta lifesciences operations describe $ID"
"! gcloud beta lifesciences operations describe $ID"
]
},
{
@@ -232,7 +249,7 @@
"id": "33a142e0-bd9a-405d-91f9-827503ff5fb1"
},
"source": [
"## 3. Run Nextflow Locally"
"## Run Nextflow Locally"
]
},
{
@@ -327,7 +344,7 @@
"tags": []
},
"source": [
"## 4. Submit Nextflow Job to the Life Sciences API\n",
"## Submit Nextflow Job to the Life Sciences API\n",
"Create and modify your own config file to include a 'gls' profile block to tell Nextflow to submit the job to the API instead of running locally"
]
},
@@ -458,8 +475,14 @@
"id": "e386ccb3-aa6d-4a77-8d7d-c20ed0419f84"
},
"source": [
"In the output above, the text within the __[ ]__ to the left of each process is a __tag__ that you can search for in Life Sciences, and the text before the __/__ corresponds to the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed.\n",
"\n",
"In the output above, the text within the __[ ]__ to the left of each process is a __tag__ that you can search for in Life Sciences, and the text before the __/__ corresponds to the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"Congrats! You are done with Part I. If you want to keep going and learn how to use the Methylseq workflow with real data, then move to Part II. If not, then feel free to clean up your resources. "
]
},
@@ -468,7 +491,7 @@
"id": "f7bf5cba-995d-4404-94d1-9bc9c4a04482",
"metadata": {},
"source": [
"## 5. Clean up\n",
"## Clean up\n",
"If you want to clean up all resources associated with this tutorial then \n",
"+ delete your bucket with `gsutil rm -r $BUCKET`\n",
"+ delete this VM in either Vertex AI or Compute Engine"
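The clean-up step can also be scripted. Since `gsutil` addresses buckets by `gs://` URL and the notebook stores only the bare bucket name in `BUCKET`, the sketch below adds the prefix when it is missing. It only builds and prints the command and does not run it; `bucket_cleanup_command` is a hypothetical helper, not something from the notebook.

```python
import os
import shlex


def bucket_cleanup_command(bucket):
    """Build the gsutil command that recursively deletes a bucket and its contents."""
    url = bucket if bucket.startswith("gs://") else f"gs://{bucket}"
    return ["gsutil", "rm", "-r", url]


# Falls back to the notebook's example bucket name if BUCKET is not set.
command = bucket_cleanup_command(os.environ.get("BUCKET", "gls-api-nextflow"))
print(shlex.join(command))
```

Printing the command first gives a chance to confirm the target bucket before deleting anything.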