Commit
added the tutorials folder back so that cloud lab email flows will still work for the next few months
Parent: 9af8b6c
Commit: 8d2d793
Showing 79 changed files with 26,859 additions and 0 deletions.
tutorials/notebooks/ATACseq/ATACseq_Tutorial1_Preprocessing.ipynb: 4,910 additions, 0 deletions (large diff not rendered)
tutorials/notebooks/ATACseq/ATACseq_Tutorial2_PeakDetection.ipynb: 4,476 additions, 0 deletions (large diff not rendered)
tutorials/notebooks/ATACseq/ATACseq_Tutorial3_Downstream.ipynb: 3,026 additions, 0 deletions (large diff not rendered)
tutorials/notebooks/DL-gwas-gcp-example/1-d10-run-first.ipynb: 1,205 additions, 0 deletions (large diff not rendered)
tutorials/notebooks/DL-gwas-gcp-example/2-mse-run-second-in-jupyter.ipynb: 1,057 additions, 0 deletions (large diff not rendered)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/00-create-new-notebook1.png (+409 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/01-create-new-notebook2.png (+253 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/01-r2-create-new-notebook2.png (+190 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/01-r3-create-new-notebook2.png (+166 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/02-create-new-notebook3.png (+202 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/04-upload-notebook-and-data.png (+338 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/06-pipeline-parameters.png (+75.7 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/07-pipeline-parameters-katib.png (+336 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/09-pipeline-metrics.png (+20.9 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/14-successful-katib-run.png (+420 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/click-set-cell-kind.png (+227 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/enable-compute-engine.png (+169 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/enable-service-management.png (+112 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/old-11-r2-setup-job.png (+103 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/old-x002-final-results-page.png (+272 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/run-minikf-startup.png (+94.6 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/service-management-api.png (+153 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/updated-pipeline-params.png (+388 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/x002-final-results-page.png (+1.26 MB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/x004-launch-terminal.png (+150 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/xx0001-navigate-to-experiment.png (+226 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/assets/xx0003-pick-pipeline-step.png (+206 KB)
Binary file added: tutorials/notebooks/DL-gwas-gcp-example/nb_assets/x002-final-results-page.png (+272 KB)
tutorials/notebooks/GWASCoatColor/GWAS_coat_color.ipynb: 394 additions, 0 deletions
@@ -0,0 +1,394 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7a244bb3",
   "metadata": {},
   "source": [
    "# GWAS in the cloud\n",
    "We adapted the [NIH CFDE GWAS in the cloud tutorial](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) to a notebook and greatly simplified the instructions. For more detail, see the full tutorial.\n",
    "Most of this notebook runs bash commands from a Python kernel. For step 3 (plotting), you will need to switch the kernel to R."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8fbf6304",
   "metadata": {},
   "source": [
    "## 1. Setup\n",
    "### Download the data\n",
    "Start a cell with `%%bash` to run the whole cell as bash. In a Python notebook, you can also prefix a single line with `!` to run it as a shell command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8ec900bd",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# -p: do not fail if GWAS already exists (e.g. when re-running this cell)\n",
    "mkdir -p GWAS\n",
    "curl -LO https://de.cyverse.org/dl/d/E0A502CC-F806-4857-9C3A-BAEAA0CCC694/pruned_coatColor_maf_geno.vcf.gz\n",
    "curl -LO https://de.cyverse.org/dl/d/3B5C1853-C092-488C-8C2F-CE6E8526E96B/coatColor.pheno"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4d43ae73",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "mv *.gz GWAS\n",
    "mv *.pheno GWAS\n",
    "ls GWAS"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28aadbf8",
   "metadata": {},
   "source": [
    "### Install dependencies\n",
    "Here we install mamba, which installs packages faster than conda. Mamba can be tricky to add to the path in a SageMaker notebook, so the next cells install it to a fixed location and append that location's `bin` directory to `PATH` from Python. You could also skip this install and use conda, which is preinstalled in the kernel."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3ba3eef",
   "metadata": {},
   "outputs": [],
   "source": [
    "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
    "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f22059df-5a9c-4982-9b2f-bd15ce746bb2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add mambaforge/bin to PATH for this kernel session and the shell commands it runs\n",
    "import os\n",
    "os.environ[\"PATH\"] += os.pathsep + os.path.join(os.environ[\"HOME\"], \"mambaforge\", \"bin\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b219074a",
   "metadata": {},
   "outputs": [],
   "source": [
    "! mamba install -y -c bioconda plink vcftools"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3de2fc4c",
   "metadata": {},
   "source": [
    "## 2. Analyze"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "013d960d",
   "metadata": {},
   "source": [
    "### Convert the VCF file to PLINK .map and .ped files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e91c7a01",
   "metadata": {},
   "outputs": [],
   "source": [
    "# %cd changes the kernel's working directory, so later cells run inside GWAS\n",
    "%cd GWAS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6570875d",
   "metadata": {},
   "outputs": [],
   "source": [
    "! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --plink --out coatColor"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b9a38761",
   "metadata": {},
   "source": [
    "### Create a list of minor alleles\n",
    "\n",
    "For background on these terms, see step 2 of the [CFDE analysis page](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/analyze/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6c868a67",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write an uncompressed copy of the VCF (vcftools names it pruned_coatColor_maf_geno.recode.vcf)\n",
    "! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --recode --out pruned_coatColor_maf_geno"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e11f991",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the list of minor alleles: one line per variant with its ID (CHROM:POS if the ID is '.') and its ALT allele\n",
    "! cat pruned_coatColor_maf_geno.recode.vcf | awk 'BEGIN{FS=\"\\t\";OFS=\"\\t\";}/#/{next;}{{if($3==\".\")$3=$1\":\"$2;}print $3,$5;}' > minor_alleles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cff47e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "! head minor_alleles"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56d901c7",
   "metadata": {},
   "source": [
    "### Run quality control"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dafa14a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate missingness per locus (miss_stat.lmiss) and per individual (miss_stat.imiss)\n",
    "! plink --file coatColor --make-pheno coatColor.pheno \"yellow\" --missing --out miss_stat --noweb --dog --reference-allele minor_alleles --allow-no-sex --adjust"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cf5f51b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Peek at the .lmiss file: per-locus missingness rates\n",
    "! head miss_stat.lmiss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "915bb263",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Peek at the .imiss file: per-individual missingness rates\n",
    "! head miss_stat.imiss"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c11ca71",
   "metadata": {},
   "source": [
    "### Convert to PLINK binary format"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3b8f2d7f",
   "metadata": {},
   "outputs": [],
   "source": [
    "! plink --file coatColor --allow-no-sex --dog --make-bed --noweb --out coatColor.binary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e36f6cd7",
   "metadata": {},
   "source": [
    "### Run a simple association step (the GWAS part!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f926ef9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "! plink --bfile coatColor.binary --make-pheno coatColor.pheno \"yellow\" --assoc --reference-allele minor_alleles --allow-no-sex --adjust --dog --noweb --out coatColor"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b397d484",
   "metadata": {},
   "source": [
    "### Identify statistical cutoffs\n",
    "This code finds the unadjusted p values that correspond to adjusted p values of 0.05 and 0.01 in PLINK's adjusted results file (coatColor.assoc.adjusted). These cutoffs can be drawn as horizontal lines on the Manhattan plot, whose y axis is -log10(p), to show haplotypes that cross the 0.05 and 0.01 thresholds (i.e. have a statistically significant association with yellow coat color)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b94e1e2a",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# Skip the header, take the first row whose column 10 reaches the threshold, and print its column 3\n",
    "unad_cutoff_sug=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.05' | head -n1 | awk '{print $3}')\n",
    "unad_cutoff_conf=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.01' | head -n1 | awk '{print $3}')\n",
    "# Shell variables are lost when a %%bash cell ends, so print the cutoffs\n",
    "echo \"suggestive cutoff (0.05): $unad_cutoff_sug\"\n",
    "echo \"confident cutoff (0.01): $unad_cutoff_conf\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f52e97c",
   "metadata": {},
   "source": [
    "## 3. Plotting\n",
    "Plotting in this tutorial is done in R, so switch your kernel to R now (top right). Wait until the status in the bottom left reads 'idle', then continue. Alternatively, you could plot with native Python packages and keep the Python kernel."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "effb5acd",
   "metadata": {},
   "source": [
    "### Install qqman"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60feed89",
   "metadata": {},
   "outputs": [],
   "source": [
    "install.packages('qqman', repos = 'https://cran.r-project.org')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3f1fcd2",
   "metadata": {},
   "source": [
    "### Run the plotting function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7e8cd2b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Changing the kernel may reset the working directory, so move back into GWAS\n",
    "setwd('GWAS')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7946a3a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "library(qqman)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0d28ef2c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read PLINK's association results (columns include CHR, SNP, BP and P)\n",
    "data=read.table(\"coatColor.assoc\", header=TRUE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e5207be",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Drop rows with a missing (NA) p value before plotting\n",
    "data=data[!is.na(data$P),]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6330b1e0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# suggestiveline and genomewideline are on the -log10(p) scale\n",
    "# chrlabs: the 38 dog autosomes plus X\n",
    "manhattan(data, p = \"P\", col = c(\"blue4\", \"orange3\"),\n",
    "          suggestiveline = 12,\n",
    "          genomewideline = 15,\n",
    "          chrlabs = c(1:38, \"X\"), annotateTop = TRUE, cex = 1.2)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26787d84",
   "metadata": {},
   "source": [
    "In our graph, haplotypes in four parts of the genome (chromosomes 2, 5, 28, and X) are found to be associated with an increased occurrence of the yellow coat color phenotype.\n",
    "\n",
    "The top associated mutation is a nonsense SNP in MC1R, a gene known to control pigment production. The MC1R allele encoding yellow coat color carries a single base change (C to T) at nucleotide 916."
   ]
  }
 ],
 "metadata": {
  "environment": {
   "kernel": "python3",
   "name": "tf2-gpu.2-11.m110",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m110"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
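The awk one-liner in the notebook's "Create a list of minor alleles" step can be sketched in Python as below. This is an illustrative equivalent, not part of the tutorial, and the function names are hypothetical. Like the awk script, it treats the VCF's ALT column (column 5) as the minor allele and names unnamed variants `CHROM:POS`. One small difference: it skips only lines that *start* with `#`, whereas the awk pattern `/#/` skips any line containing `#`.

```python
from typing import Iterable, Iterator


def minor_allele_rows(vcf_lines: Iterable[str]) -> Iterator[str]:
    """Yield "ID<TAB>ALT" for each VCF data line (hypothetical helper)."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Skip meta-information (##) and column-header (#CHROM) lines.
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, var_id, alt = fields[0], fields[1], fields[2], fields[4]
        if var_id == ".":
            # Unnamed variants get a CHROM:POS identifier, as in the awk script.
            var_id = f"{chrom}:{pos}"
        yield f"{var_id}\t{alt}"


def write_minor_alleles(vcf_path: str, out_path: str) -> None:
    """Write a two-column (variant ID, allele) file like the notebook's minor_alleles."""
    with open(vcf_path) as src, open(out_path, "w") as dst:
        for row in minor_allele_rows(src):
            dst.write(row + "\n")


if __name__ == "__main__":
    # Made-up VCF lines, for illustration only.
    demo = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t1000\t.\tC\tT\t.\tPASS\t.\n",
        "5\t2500\trs42\tG\tA\t.\tPASS\t.\n",
    ]
    for row in minor_allele_rows(demo):
        print(row)
```

Run from the GWAS directory, `write_minor_alleles("pruned_coatColor_maf_geno.recode.vcf", "minor_alleles")` would produce the same kind of file as the awk cell. Using a generator keeps memory flat for large VCFs.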
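The notebook's "Identify statistical cutoffs" cell (`tail -n+2`, `awk '$10>=0.05'`, `head -n1`, `awk '{print $3}'`) can be sketched in Python as below, together with the -log10 conversion that qqman's `suggestiveline` and `genomewideline` arguments expect. This is a hypothetical helper, not the tutorial's code. Like the bash, it assumes one header line, rows sorted by ascending p value, column 10 holding the adjusted p value being thresholded, and column 3 holding the unadjusted p value. Check these positions against the header of your PLINK version's `.adjusted` file before relying on them.

```python
import math
from typing import Iterable, Optional


def unadjusted_cutoff(lines: Iterable[str], threshold: float,
                      adj_col: int = 9, unadj_col: int = 2) -> Optional[float]:
    """Return the unadjusted p of the first row whose adjusted p >= threshold.

    Column indices are 0-based: adj_col=9 is awk's $10, unadj_col=2 is awk's $3.
    """
    rows = iter(lines)
    next(rows, None)  # skip the header line, like `tail -n+2`
    for line in rows:
        fields = line.split()  # whitespace-delimited, like awk's default FS
        if len(fields) <= max(adj_col, unadj_col):
            continue
        try:
            adjusted = float(fields[adj_col])
        except ValueError:
            continue  # e.g. an "NA" entry
        if adjusted >= threshold:
            return float(fields[unadj_col])  # first match only, like `head -n1`
    return None


def neg_log10(p: float) -> float:
    """Convert a p value to the -log10 scale used by qqman's threshold lines."""
    return -math.log10(p)


if __name__ == "__main__":
    # Made-up rows: only columns 3 and 10 matter to this sketch.
    demo = [
        "HEADER",
        "1 a 1e-20 0 0 0 0 0 0 1e-15",
        "2 b 1e-06 0 0 0 0 0 0 0.02",
        "3 c 1e-04 0 0 0 0 0 0 0.06",
    ]
    for t in (0.05, 0.01):
        cutoff = unadjusted_cutoff(demo, t)
        if cutoff is not None:
            print(t, cutoff, neg_log10(cutoff))
```

The -log10 value is what would be passed to `manhattan()` in place of the fixed 12 and 15 if you wanted the lines to track these cutoffs.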