Skip to content

Commit

Permalink
added the tutorials folder back so that cloud lab emails flows will s…
Browse files Browse the repository at this point in the history
…till work for the next few months
  • Loading branch information
kyleoconnell-NIH committed Mar 21, 2024
1 parent 9af8b6c commit 8d2d793
Show file tree
Hide file tree
Showing 79 changed files with 26,859 additions and 0 deletions.
4,910 changes: 4,910 additions & 0 deletions tutorials/notebooks/ATACseq/ATACseq_Tutorial1_Preprocessing.ipynb

Large diffs are not rendered by default.

4,476 changes: 4,476 additions & 0 deletions tutorials/notebooks/ATACseq/ATACseq_Tutorial2_PeakDetection.ipynb

Large diffs are not rendered by default.

3,026 changes: 3,026 additions & 0 deletions tutorials/notebooks/ATACseq/ATACseq_Tutorial3_Downstream.ipynb

Large diffs are not rendered by default.

1,205 changes: 1,205 additions & 0 deletions tutorials/notebooks/DL-gwas-gcp-example/1-d10-run-first.ipynb

Large diffs are not rendered by default.

1,057 changes: 1,057 additions & 0 deletions tutorials/notebooks/DL-gwas-gcp-example/2-mse-run-second-in-jupyter.ipynb

Large diffs are not rendered by default.

210 changes: 210 additions & 0 deletions tutorials/notebooks/DL-gwas-gcp-example/README.md

Large diffs are not rendered by default.

Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
394 changes: 394 additions & 0 deletions tutorials/notebooks/GWASCoatColor/GWAS_coat_color.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,394 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "7a244bb3",
"metadata": {},
"source": [
"# GWAS in the cloud\n",
"We adapted the NIH CFDE tutorial from [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) and fit it to a notebook. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial to find out more.\n",
"Most of this notebook is bash, but expects that you are using a Python kernel, until step 3, plotting, you will need to switch your kernel to R."
]
},
{
"cell_type": "markdown",
"id": "8fbf6304",
"metadata": {},
"source": [
"## 1. Setup\n",
"### Download the data\n",
"use %%bash to denote a bash block. You can also use '!' to denote a single bash command within a Python notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8ec900bd",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"mkdir GWAS\n",
"curl -LO https://de.cyverse.org/dl/d/E0A502CC-F806-4857-9C3A-BAEAA0CCC694/pruned_coatColor_maf_geno.vcf.gz\n",
"curl -LO https://de.cyverse.org/dl/d/3B5C1853-C092-488C-8C2F-CE6E8526E96B/coatColor.pheno"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d43ae73",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"mv *.gz GWAS\n",
"mv *.pheno GWAS\n",
"ls GWAS"
]
},
{
"cell_type": "markdown",
"id": "28aadbf8",
"metadata": {},
"source": [
"### Install dependencies\n",
"Here we install mamba, which is faster than conda, but it can be tricky to add to path in a Sagemaker notebook so we just call the whole path. You could also skip this install and just use conda since that is preinstalled in the kernel."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3ba3eef",
"metadata": {},
"outputs": [],
"source": [
"! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
"! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f22059df-5a9c-4982-9b2f-bd15ce746bb2",
"metadata": {},
"outputs": [],
"source": [
"#add to your path\n",
"import os\n",
"os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b219074a",
"metadata": {},
"outputs": [],
"source": [
"! mamba install -y -c bioconda plink vcftools"
]
},
{
"cell_type": "markdown",
"id": "3de2fc4c",
"metadata": {},
"source": [
"## 2. Analyze"
]
},
{
"cell_type": "markdown",
"id": "013d960d",
"metadata": {},
"source": [
"### Make map and ped files from the vcf file to feed into plink"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e91c7a01",
"metadata": {},
"outputs": [],
"source": [
"cd GWAS"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6570875d",
"metadata": {},
"outputs": [],
"source": [
"! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --plink --out coatColor"
]
},
{
"cell_type": "markdown",
"id": "b9a38761",
"metadata": {},
"source": [
"### Create a list of minor alleles\n",
"\n",
"For more info on these terms, look at step 2 [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/analyze/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6c868a67",
"metadata": {},
"outputs": [],
"source": [
"#unzip vcf\n",
"! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --recode --out pruned_coatColor_maf_geno"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e11f991",
"metadata": {},
"outputs": [],
"source": [
"#create list of minor alleles\n",
"! cat pruned_coatColor_maf_geno.recode.vcf | awk 'BEGIN{FS=\"\\t\";OFS=\"\\t\";}/#/{next;}{{if($3==\".\")$3=$1\":\"$2;}print $3,$5;}' > minor_alleles"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8cff47e3",
"metadata": {},
"outputs": [],
"source": [
"! head minor_alleles"
]
},
{
"cell_type": "markdown",
"id": "56d901c7",
"metadata": {},
"source": [
"### Run quality controls"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dafa14a6",
"metadata": {},
"outputs": [],
"source": [
"#calculate missingness per locus\n",
"! plink --file coatColor --make-pheno coatColor.pheno \"yellow\" --missing --out miss_stat --noweb --dog --reference-allele minor_alleles --allow-no-sex --adjust"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5cf5f51b",
"metadata": {},
"outputs": [],
"source": [
"#take a look at lmiss, which is the per locus rates of missingness\n",
"! head miss_stat.lmiss"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "915bb263",
"metadata": {},
"outputs": [],
"source": [
"#peek at imiss which is the individual rates of missingness\n",
"! head miss_stat.imiss"
]
},
{
"cell_type": "markdown",
"id": "4c11ca71",
"metadata": {},
"source": [
"### Convert to plink binary format"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3b8f2d7f",
"metadata": {},
"outputs": [],
"source": [
"! plink --file coatColor --allow-no-sex --dog --make-bed --noweb --out coatColor.binary"
]
},
{
"cell_type": "markdown",
"id": "e36f6cd7",
"metadata": {},
"source": [
"### Run a simple association step (the GWAS part!)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f926ef9b",
"metadata": {},
"outputs": [],
"source": [
"! plink --bfile coatColor.binary --make-pheno coatColor.pheno \"yellow\" --assoc --reference-allele minor_alleles --allow-no-sex --adjust --dog --noweb --out coatColor"
]
},
{
"cell_type": "markdown",
"id": "b397d484",
"metadata": {},
"source": [
"### Identify statistical cutoffs\n",
"This code finds the equivalent of 0.05 and 0.01 p value in the negative-log-transformed p values file. We will use these cutoffs to draw horizontal lines in the Manhattan plot for visualization of haplotypes that cross the 0.05 and 0.01 statistical threshold (i.e. have a statistically significant association with yellow coat color)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b94e1e2a",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"unad_cutoff_sug=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.05' | head -n1 | awk '{print $3}')\n",
"unad_cutoff_conf=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.01' | head -n1 | awk '{print $3}')"
]
},
{
"cell_type": "markdown",
"id": "1f52e97c",
"metadata": {},
"source": [
"## 3. Plotting\n",
"In this tutorial, plotting is done in R, so at this point you can change your kernel to R in the top right. Wait for it to say 'idle' in the bottom left, then continue. You could also plot using Python native packages and maintain the Python notebook kernel."
]
},
{
"cell_type": "markdown",
"id": "effb5acd",
"metadata": {},
"source": [
"### Install qqman"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "60feed89",
"metadata": {},
"outputs": [],
"source": [
"install.packages('qqman', contriburl=contrib.url('http://cran.r-project.org/'))"
]
},
{
"cell_type": "markdown",
"id": "d3f1fcd2",
"metadata": {},
"source": [
"### Run the plotting function"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a7e8cd2b",
"metadata": {},
"outputs": [],
"source": [
"#make sure you are still CD in GWAS, when you change kernel it may reset to home\n",
"setwd('GWAS')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7946a3a7",
"metadata": {},
"outputs": [],
"source": [
"require(qqman)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0d28ef2c",
"metadata": {},
"outputs": [],
"source": [
"data=read.table(\"coatColor.assoc\", header=TRUE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8e5207be",
"metadata": {},
"outputs": [],
"source": [
"data=data[!is.na(data$P),]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6330b1e0",
"metadata": {},
"outputs": [],
"source": [
"manhattan(data, p = \"P\", col = c(\"blue4\", \"orange3\"),\n",
" suggestiveline = 12,\n",
" genomewideline = 15,\n",
" chrlabs = c(1:38, \"X\"), annotateTop=TRUE, cex = 1.2)"
]
},
{
"cell_type": "markdown",
"id": "26787d84",
"metadata": {},
"source": [
"In our graph, haplotypes in four parts of the genome (chromosome 2, 5, 28 and X) are found to be associated with an increased occurrence of the yellow coat color phenotype.\n",
"\n",
"The top associated mutation is a nonsense SNP in the gene MC1R known to control pigment production. The MC1R allele encoding yellow coat color contains a single base change (from C to T) at the 916th nucleotide."
]
}
],
"metadata": {
"environment": {
"kernel": "python3",
"name": "tf2-gpu.2-11.m110",
"type": "gcloud",
"uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m110"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Loading

0 comments on commit 8d2d793

Please sign in to comment.