diff --git a/tutorials/notebooks/ATACseq/ATACseq_Tutorial1_Preprocessing.ipynb b/tutorials/notebooks/ATACseq/ATACseq_Tutorial1_Preprocessing.ipynb
new file mode 100644
index 0000000..1d5b2e6
--- /dev/null
+++ b/tutorials/notebooks/ATACseq/ATACseq_Tutorial1_Preprocessing.ipynb
@@ -0,0 +1,4910 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "f7ded04b-10a3-4eee-b7a2-9e1f8528c239",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "# ATAC-seq Module1: Preprocessing and Quality Control\n",
+ "\n",
+ " \n",
+ "\n",
+ "## Overview & Purpose\n",
+ "This short tutorial demonstrates the initial processing steps for ATAC-seq analysis. In this module we focus on generating quality reports of the fastq files, adapter trimming, mapping, and removal of PCR duplicates.\n",
+ "\n",
+ "In this tutorial we will process a randomly chosen published dataset. This is available from GEO: [GSE67382](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE67382)\n",
+ "Bao X, Rubin AJ, Qu K, Zhang J et al. A novel ATAC-seq approach reveals lineage-specific reinforcement of the open chromatin landscape via cooperation between BAF and p63. Genome Biol 2015 Dec 18;16:284. PMID: 26683334\n",
+ "\n",
+ "This dataset is paired-end 50 bp sequencing. We will analyze two samples representing NHEK cells with BAF depletion compared to a control. Note that to allow faster processing we have limited the reads to those from chromosome 4. \n",
+ " \n",
+ "### Required Files\n",
+ "In this stage of the module you will use the fastq files that have been prepared. In step1 we will copy these files over to your instance. You can also use this module on your own data or any published ATAC-seq dataset. \n",
+ "\n",
+ "\n",
+ "STEP1: Setup Environment\n",
+ "\n",
+ "\n",
+ "First we configure your Google Cloud environment. In this step we will use conda to install the following packages:\n",
+ "\n",
+ "Quality Reporting:\n",
+ "[fastqc](https://anaconda.org/bioconda/fastqc), [multiqc](https://anaconda.org/bioconda/multiqc)\n",
+ "\n",
+ "Read Trimming: \n",
+ "[cutadapt](https://anaconda.org/bioconda/cutadapt), [trim-galore](https://anaconda.org/bioconda/trim-galore)\n",
+ "\n",
+ "Mapping:\n",
+ "[bowtie2](https://anaconda.org/bioconda/bowtie2)\n",
+ "\n",
+ "Deduplication:\n",
+ "[samtools](https://anaconda.org/bioconda/samtools), [picard](https://anaconda.org/bioconda/picard)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "f2074873-7df9-46e7-9922-f1b1bb26f7c4",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Installed kernelspec ATACtraining in /home/jupyter/.local/share/jupyter/kernels/atactraining\n",
+ "Warning: 'bioconda' already in 'channels' list, moving to the top\n",
+ "Collecting package metadata (current_repodata.json): done\n",
+ "Solving environment: done\n",
+ "\n",
+ "# All requested packages already installed.\n",
+ "\n",
+ "Retrieving notices: ...working... done\n",
+ "Requirement already satisfied: jupyterquiz in /opt/conda/lib/python3.7/site-packages (2.0.1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!python -m ipykernel install --user --name ATACtraining\n",
+ "numthreads=!lscpu | grep '^CPU(s)'| awk '{print $2-1}'\n",
+ "numthreadsint = int(numthreads[0])\n",
+ "!conda config --prepend channels bioconda\n",
+ "#!python -m pip install --user --upgrade cutadapt \n",
+ "!conda install -y -c bioconda fastqc bowtie2 picard multiqc samtools trim-galore cutadapt\n",
+ "!pip install jupyterquiz\n",
+ "from jupyterquiz import display_quiz\n",
+ "from IPython.display import IFrame\n",
+ "from IPython.display import display\n",
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "54f8e28a-543c-4eb9-8c6a-6107ce0613a8",
+ "metadata": {},
+ "source": [
+ "## Setup FileSystem\n",
+ "Now let's create some folders to stay organized and copy over our prepared fastq files. We're going to create a directory called \"Tutorial1\" which we'll use for this module. We'll then create subfolders for our InputFiles and for the files that we'll be creating during this module. We'll also copy over the fasta file for chromosome 4 as well as some bowtie2 index files (don't worry, we'll teach you how to create these index files)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "8a730ba8-82e0-4ab2-a0ef-118a6910cc92",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/jupyter\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/adapterQuiz.json...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/ATACseqWorkflowLesson1.jpg...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/Fig1Published.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/PeaksExample.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/adapterinsert.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/alignmentQuiz.json... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/duplicateQuiz.json... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/mappingquality.json... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/images/fastqformat.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/CTL_R1.fastq.gz... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/CTL_R2.fastq.gz...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/Mutant_R1.fastq.gz... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/Mutant_R2.fastq.gz... \n",
+ "- [4/4 files][124.4 MiB/124.4 MiB] 100% Done \n",
+ "Operation completed over 4 objects/124.4 MiB. \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.1.bt2...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.2.bt2...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.4.bt2... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.fa...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.3.bt2... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.rev.2.bt2... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial1/InputFiles/hg38chr4.rev.1.bt2... \n",
+ "/ [7/7 files][449.4 MiB/449.4 MiB] 100% Done \n",
+ "Operation completed over 7 objects/449.4 MiB. \n"
+ ]
+ }
+ ],
+ "source": [
+ "#These commands create our directory structure.\n",
+ "!cd $HOMEDIR\n",
+ "!mkdir -p Tutorial1\n",
+ "!mkdir -p Tutorial1/InputFiles\n",
+ "!mkdir -p Tutorial1/QC\n",
+ "!mkdir -p Tutorial1/Trimmed\n",
+ "!mkdir -p Tutorial1/Mapped\n",
+ "!mkdir -p Tutorial1/RefGenome\n",
+ "!mkdir -p Tutorial1/LessonImages\n",
+ "!cd ./Tutorial1\n",
+ "!echo $PWD\n",
+ "\n",
+ "#These commands help identify the google cloud storage bucket where the example files are held.\n",
+ "project_id = \"nosi-unmc-seq\"\n",
+ "original_bucket = \"gs://unmc_atac_data_examples/Tutorial1\"\n",
+ "!gsutil -m cp $original_bucket/images/* Tutorial1/LessonImages\n",
+ "#These commands copy our example fastq files to the Tutorial1/InputFiles folder and the reference files to the Tutorial1/RefGenome folder that we created above.\n",
+ "! gsutil -m cp $original_bucket/InputFiles/*fastq.gz Tutorial1/InputFiles\n",
+ "! gsutil -m cp $original_bucket/InputFiles/hg38* Tutorial1/RefGenome\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "51fc4b08-4f02-4f49-9a8b-48b02a122455",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### OK\n",
+ "Let's make sure that the files copied correctly. You should see 4 files after running the following command:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "bce79afb-044f-45da-b7e4-e9a5f93dd1c2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CTL_R1.fastq.gz Mutant_R1.fastq.gz Untitled.ipynb\n",
+ "CTL_R2.fastq.gz Mutant_R2.fastq.gz\n"
+ ]
+ }
+ ],
+ "source": [
+ "!ls Tutorial1/InputFiles\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cf7aea89-8610-4430-a802-6f7d387deb8b",
+ "metadata": {},
+ "source": [
+ "\n",
+ "STEP2: QC\n",
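+ "\n",
+ "\n",
+ "Sequences are typically provided as files in fastq format. This format uses 4 lines per sequence: a header line beginning with @, the sequence itself, a separator line beginning with +, and a quality string with one character per base.\n",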
+ "\n",
+ " "
+ ]
+ },
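+ {
+ "cell_type": "markdown",
+ "id": "sketch-fastq-record",
+ "metadata": {},
+ "source": [
+ "To make the 4-line layout concrete, here is a minimal Python sketch that is not part of the pipeline. The record is made up for illustration, and the decoding assumes the Phred+33 quality encoding (the encoding FastQC labels \"Sanger / Illumina 1.9\").\n",
+ "\n",
+ "```python\n",
+ "# One hypothetical FASTQ record, one list item per line of the file.\n",
+ "record_lines = ['@read1', 'GATTACA', '+', 'IIIII#I']\n",
+ "header, sequence, separator, quality = record_lines\n",
+ "\n",
+ "# Phred+33: each quality character encodes a score as ord(character) - 33.\n",
+ "scores = [ord(character) - 33 for character in quality]\n",
+ "\n",
+ "print(header[1:])  # read name without the leading '@'\n",
+ "print(sequence)\n",
+ "print(scores)\n",
+ "```\n",
+ "\n",
+ "In this toy record the character `I` decodes to a score of 40 and `#` to a score of 2, so the sixth base is the only low-confidence call."
+ ]
+ },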
+ {
+ "cell_type": "markdown",
+ "id": "a39c3c75-9011-4a9e-a60e-bb8c503405bb",
+ "metadata": {},
+ "source": [
+ "Let's take a look at the sequence quality of the raw reads using fastqc:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "f78e9802-ddd1-4abb-9bbf-3cc10e5d5d90",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Sample Filename File type \\\n",
+ "0 CTL_R1 CTL_R1.fastq.gz Conventional base calls \n",
+ "1 CTL_R2 CTL_R2.fastq.gz Conventional base calls \n",
+ "2 Mutant_R1 Mutant_R1.fastq.gz Conventional base calls \n",
+ "3 Mutant_R2 Mutant_R2.fastq.gz Conventional base calls \n",
+ "\n",
+ " Encoding Total Sequences Sequences flagged as poor quality \\\n",
+ "0 Sanger / Illumina 1.9 721311.0 0.0 \n",
+ "1 Sanger / Illumina 1.9 721311.0 0.0 \n",
+ "2 Sanger / Illumina 1.9 848511.0 0.0 \n",
+ "3 Sanger / Illumina 1.9 848511.0 0.0 \n",
+ "\n",
+ " Sequence length %GC total_deduplicated_percentage avg_sequence_length \\\n",
+ "0 50.0 43.0 34.116497 50.0 \n",
+ "1 50.0 42.0 33.459856 50.0 \n",
+ "2 50.0 42.0 41.516085 50.0 \n",
+ "3 50.0 42.0 37.731878 50.0 \n",
+ "\n",
+ " basic_statistics per_base_sequence_quality per_sequence_quality_scores \\\n",
+ "0 pass pass pass \n",
+ "1 pass pass pass \n",
+ "2 pass pass pass \n",
+ "3 pass pass pass \n",
+ "\n",
+ " per_base_sequence_content per_sequence_gc_content per_base_n_content \\\n",
+ "0 fail pass pass \n",
+ "1 fail warn pass \n",
+ "2 fail pass pass \n",
+ "3 fail warn pass \n",
+ "\n",
+ " sequence_length_distribution sequence_duplication_levels \\\n",
+ "0 pass fail \n",
+ "1 pass fail \n",
+ "2 pass fail \n",
+ "3 pass fail \n",
+ "\n",
+ " overrepresented_sequences adapter_content \n",
+ "0 warn pass \n",
+ "1 fail pass \n",
+ "2 warn pass \n",
+ "3 fail pass "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#This command runs fastqc on each fastq.gz file inside our InputFiles directory and stores the output reports in our QC directory.\n",
+ "!fastqc -t $numthreadsint -q -o Tutorial1/QC Tutorial1/InputFiles/*fastq.gz\n",
+ "\n",
+ "#We then use multiqc to summarize the report.\n",
+ "!multiqc -o Tutorial1/QC -f Tutorial1/QC 2> Tutorial1/QC/multiqc_log.txt\n",
+ "\n",
+ "#We'll load this into a pandas table to work in this context, but fastqc also produces an html report that you can browse.\n",
+ "dframe = pd.read_csv(\"Tutorial1/QC/multiqc_data/multiqc_fastqc.txt\", sep='\\t')\n",
+ "display(dframe)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "744e8b64-947b-43f4-9bb0-51bf42972bc3",
+ "metadata": {},
+ "source": [
+ "Alternatively, we can view the fastqc html files:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "f2df8f13-211b-4584-8600-c5ab1edf8138",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#We can display the resulting fastqc results.\n",
+ "IFrame(src='Tutorial1/QC/CTL_R1_fastqc.html', width=2000, height=1500)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "579a6da5-11b8-45c9-a9d2-3f63efb0fed4",
+ "metadata": {},
+ "source": [
+ "Look at the \"Per base sequence content\" section in the above FastQC report. We'll trim the reads to remove some of this effect. For now, think about possible explanations for this result.\n",
+ "\n",
+ "Also look at the \"Sequence Duplication Levels\" section. Sometimes duplicates appear due to the PCR amplification step of library preparation. We'll remove duplicates in a later step. \n",
+ "\n",
+ "Lastly, look at the \"Overrepresented sequences\" section of the report. What are some possible explanations for this result?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b228a0d9-0e88-490f-a53d-58ce20d40b64",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Trimming\n",
+ "\n",
+ "Next let's trim our sequences.\n",
+ "\n",
+ "Why is it particularly important to trim the reads in ATAC-seq? To understand, let's review how ATAC-seq works. Tn5 inserts adapter sequences into accessible regions. \n",
+ "\n",
+ " \n",
+ "\n",
+ "Image source: [Grandi et al., Nature Protocols 2022](https://www.nature.com/articles/s41596-022-00692-9)\n",
+ "\n",
+ "\n",
+ "What would happen if the distance between inserted sites is short? For example, our sequencing length in the example dataset is 50 bp, so what would the sequence look like if our fragment (insert size) is only 30 bp long? "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "6ca0ae04-b409-4c5d-b318-0887edc0f3c8",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial1/LessonImages/adapterQuiz.json\")"
+ ]
+ },
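+ {
+ "cell_type": "markdown",
+ "id": "sketch-adapter-readthrough",
+ "metadata": {},
+ "source": [
+ "The read-through situation from the question above can be sketched in Python. This is an illustration only and not part of the pipeline: the 30 bp insert is invented, and the adapter is the 19 bp Nextera transposase sequence, whose first 12 bases (CTGTCTCTTATA) are what Trim Galore searches for with the --nextera option. A real 50 bp read would continue further into the adapter than this 19 bp string allows.\n",
+ "\n",
+ "```python\n",
+ "# Hypothetical 30 bp genomic fragment between two Tn5 insertion sites.\n",
+ "insert = 'ACGT' * 7 + 'AC'\n",
+ "# Nextera transposase sequence found directly downstream of the insert.\n",
+ "adapter = 'CTGTCTCTTATACACATCT'\n",
+ "read_length = 50\n",
+ "\n",
+ "# The sequencer reads past the end of a short insert into the adapter.\n",
+ "read = (insert + adapter)[:read_length]\n",
+ "\n",
+ "# Trimming removes the start of the adapter and everything after it.\n",
+ "adapter_start = read.find(adapter[:12])\n",
+ "trimmed = read if adapter_start == -1 else read[:adapter_start]\n",
+ "\n",
+ "print(len(read), 'bases sequenced,', len(trimmed), 'bases of genomic sequence kept')\n",
+ "```\n",
+ "\n",
+ "Without trimming, the adapter bases at the 3' end do not match the genome and can keep an otherwise good read from aligning."
+ ]
+ },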
+ {
+ "cell_type": "markdown",
+ "id": "7e910ae5-83af-476d-85a6-942242dece8a",
+ "metadata": {},
+ "source": [
+ "Let's use trim galore to prepare the sequences before mapping."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "7d0a8fb1-c8bc-4a77-b7d0-d545cb2d7aab",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#This will trim off N's as well as nextera adapters present in ATAC-seq library preparation, placing the output in our Trimmed folder.\n",
+ "!trim_galore -j $numthreadsint -o Tutorial1/Trimmed --paired --nextera --trim-n --fastqc --suppress_warn Tutorial1/InputFiles/CTL_R1.fastq.gz Tutorial1/InputFiles/CTL_R2.fastq.gz > Tutorial1/Trimmed/trimgalore_errors.txt 2> Tutorial1/Trimmed/trimgalore_log2.txt\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "adcdef0f-1461-4282-9659-b86c71ea3f77",
+ "metadata": {},
+ "source": [
+ "Let's do this for the other sample as well."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "b6275508-1f8c-42c9-b36e-3c627965478a",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Trim the other sample\n",
+ "!trim_galore -j $numthreadsint -o Tutorial1/Trimmed --paired --nextera --trim-n --fastqc --suppress_warn Tutorial1/InputFiles/Mutant_R1.fastq.gz Tutorial1/InputFiles/Mutant_R2.fastq.gz > Tutorial1/Trimmed/trimgalore_errors.txt 2> Tutorial1/Trimmed/trimgalore_log2.txt\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e589ef3-ad38-4b18-86fa-e8afbc2b1001",
+ "metadata": {},
+ "source": [
+ "Now let's summarize the trimming results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "cedc5d1f-21f7-4bf7-8c8d-0f4209288b8e",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Sample Cutadapt_mqc-generalstats-cutadapt-percent_trimmed \\\n",
+ "0 CTL_R1 1.495896 \n",
+ "1 CTL_R1_val_1 NaN \n",
+ "2 CTL_R2 2.049704 \n",
+ "3 CTL_R2_val_2 NaN \n",
+ "4 Mutant_R1 1.641975 \n",
+ "5 Mutant_R1_val_1 NaN \n",
+ "6 Mutant_R2 2.211361 \n",
+ "7 Mutant_R2_val_2 NaN \n",
+ "\n",
+ " FastQC_mqc-generalstats-fastqc-percent_duplicates \\\n",
+ "0 NaN \n",
+ "1 65.541052 \n",
+ "2 NaN \n",
+ "3 66.321891 \n",
+ "4 NaN \n",
+ "5 58.276934 \n",
+ "6 NaN \n",
+ "7 62.044659 \n",
+ "\n",
+ " FastQC_mqc-generalstats-fastqc-percent_gc \\\n",
+ "0 NaN \n",
+ "1 43.0 \n",
+ "2 NaN \n",
+ "3 42.0 \n",
+ "4 NaN \n",
+ "5 42.0 \n",
+ "6 NaN \n",
+ "7 41.0 \n",
+ "\n",
+ " FastQC_mqc-generalstats-fastqc-avg_sequence_length \\\n",
+ "0 NaN \n",
+ "1 49.263820 \n",
+ "2 NaN \n",
+ "3 49.200137 \n",
+ "4 NaN \n",
+ "5 49.228323 \n",
+ "6 NaN \n",
+ "7 49.193254 \n",
+ "\n",
+ " FastQC_mqc-generalstats-fastqc-percent_fails \\\n",
+ "0 NaN \n",
+ "1 20.0 \n",
+ "2 NaN \n",
+ "3 30.0 \n",
+ "4 NaN \n",
+ "5 20.0 \n",
+ "6 NaN \n",
+ "7 30.0 \n",
+ "\n",
+ " FastQC_mqc-generalstats-fastqc-total_sequences \n",
+ "0 NaN \n",
+ "1 717388.0 \n",
+ "2 NaN \n",
+ "3 717388.0 \n",
+ "4 NaN \n",
+ "5 841811.0 \n",
+ "6 NaN \n",
+ "7 841811.0 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "!multiqc -o Tutorial1/QC -f Tutorial1/Trimmed 2> Tutorial1/QC/multiqc_log.txt\n",
+ "\n",
+ "dframe = pd.read_csv(\"Tutorial1/QC/multiqc_data/multiqc_general_stats.txt\", sep='\\t')\n",
+ "display(dframe)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b9cc8fc5-fa07-4345-8f87-caeb5fccd239",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Step3: Mapping\n",
+ "\n",
+ "Our fastq files include sequences and quality scores for each base, but we want to figure out which genomic location these sequences came from. To do this we will map each sequence to a reference genome using bowtie2. \n",
+ " \n",
+ "\n",
+ "Mapping reads requires a reference genome. Due to time and memory considerations, in this tutorial we prepared that file for you and will only map to chr4. However, in a full analysis, we would map to the entire genome. To do so you would need a fasta file corresponding to the reference genome (e.g. [hg38.fa](https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/)) from which you'd create an index of the genome using bowtie2-build. This can be done with the command: \n",
+ "\n",
+ "`bowtie2-build reference_genome_file.fa outputprefix`\n",
+ "\n",
+ "As mentioned, we've gone ahead and created the index for you, and you copied the index files into the RefGenome directory earlier. These index files end in the bt2 extension. \n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "1c5d8404-a5f5-49e7-9625-7deacdbcefa1",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Tutorial1/RefGenome/hg38chr4.1.bt2 Tutorial1/RefGenome/hg38chr4.4.bt2\n",
+ "Tutorial1/RefGenome/hg38chr4.2.bt2 Tutorial1/RefGenome/hg38chr4.rev.1.bt2\n",
+ "Tutorial1/RefGenome/hg38chr4.3.bt2 Tutorial1/RefGenome/hg38chr4.rev.2.bt2\n"
+ ]
+ }
+ ],
+ "source": [
+ "!ls Tutorial1/RefGenome/*bt2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "afa82cbc-a69d-430f-ac83-e94ca498cdf3",
+ "metadata": {},
+ "source": [
+ "These index files were created from our fasta file:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "id": "c4d3279a-de20-4c18-9c0f-39c884b3d0c3",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Tutorial1/RefGenome/hg38chr4.fa\n"
+ ]
+ }
+ ],
+ "source": [
+ "!ls Tutorial1/RefGenome/*fa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1d77a351-d805-441b-b524-4f6d1d3358d9",
+ "metadata": {},
+ "source": [
+ "Notice that the single fasta file created multiple index files. When we align we'll specify the prefix of the index files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "1c56662b-6cf5-4319-8b8c-f404d928ea67",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "717388 reads; of these:\n",
+ " 717388 (100.00%) were paired; of these:\n",
+ " 245685 (34.25%) aligned concordantly 0 times\n",
+ " 395268 (55.10%) aligned concordantly exactly 1 time\n",
+ " 76435 (10.65%) aligned concordantly >1 times\n",
+ " ----\n",
+ " 245685 pairs aligned concordantly 0 times; of these:\n",
+ " 32486 (13.22%) aligned discordantly 1 time\n",
+ " ----\n",
+ " 213199 pairs aligned 0 times concordantly or discordantly; of these:\n",
+ " 426398 mates make up the pairs; of these:\n",
+ " 391996 (91.93%) aligned 0 times\n",
+ " 19915 (4.67%) aligned exactly 1 time\n",
+ " 14487 (3.40%) aligned >1 times\n",
+ "72.68% overall alignment rate\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Notes: The -p option sets the number of threads. The -x option specifies the prefix of the index. -1 specifies the file with our trimmed R1 (first mate) reads. -2 specifies the file with our trimmed R2 (second mate) reads. -S specifies our output file in sam format.\n",
+ "!bowtie2 -p $numthreadsint -x Tutorial1/RefGenome/hg38chr4 -1 Tutorial1/Trimmed/CTL_R1_val_1.fq.gz -2 Tutorial1/Trimmed/CTL_R2_val_2.fq.gz -S Tutorial1/Mapped/CTL.sam\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 42,
+ "id": "612a3af0-d5b7-4c65-8b6d-be91d30252aa",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "841811 reads; of these:\n",
+ " 841811 (100.00%) were paired; of these:\n",
+ " 198542 (23.59%) aligned concordantly 0 times\n",
+ " 524810 (62.34%) aligned concordantly exactly 1 time\n",
+ " 118459 (14.07%) aligned concordantly >1 times\n",
+ " ----\n",
+ " 198542 pairs aligned concordantly 0 times; of these:\n",
+ " 37182 (18.73%) aligned discordantly 1 time\n",
+ " ----\n",
+ " 161360 pairs aligned 0 times concordantly or discordantly; of these:\n",
+ " 322720 mates make up the pairs; of these:\n",
+ " 284663 (88.21%) aligned 0 times\n",
+ " 17772 (5.51%) aligned exactly 1 time\n",
+ " 20285 (6.29%) aligned >1 times\n",
+ "83.09% overall alignment rate\n"
+ ]
+ }
+ ],
+ "source": [
+ "##Let's do the same thing for our other sample.\n",
+ "!bowtie2 -p $numthreadsint -x Tutorial1/RefGenome/hg38chr4 -1 Tutorial1/Trimmed/Mutant_R1_val_1.fq.gz -2 Tutorial1/Trimmed/Mutant_R2_val_2.fq.gz -S Tutorial1/Mapped/Mutant.sam"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fc919a58-4757-482d-b5c3-9a52716feecb",
+ "metadata": {},
+ "source": [
+ "### Answer the following question only if you are using the example dataset we provided. This question is simply a check to ensure everything was processed correctly."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "id": "990dbd77-7dbd-461c-bf40-b29621089b76",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial1/LessonImages/alignmentQuiz.json\")"
+ ]
+ },
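+ {
+ "cell_type": "markdown",
+ "id": "sketch-alignment-rate",
+ "metadata": {},
+ "source": [
+ "As a check on how to read the bowtie2 summary, the overall alignment rate for the CTL sample can be reproduced from the counts printed above. This is a sketch of the arithmetic, assuming the rate is the number of aligned mates divided by the total number of mates. The variable names are ours, not bowtie2's.\n",
+ "\n",
+ "```python\n",
+ "# Counts copied from the CTL bowtie2 summary above.\n",
+ "total_pairs = 717388\n",
+ "concordant_once = 395268\n",
+ "concordant_multi = 76435\n",
+ "discordant_once = 32486\n",
+ "mates_once = 19915\n",
+ "mates_multi = 14487\n",
+ "\n",
+ "# Each aligned pair contributes two mates; the last two counts are single mates.\n",
+ "aligned_pairs = concordant_once + concordant_multi + discordant_once\n",
+ "aligned_mates = 2 * aligned_pairs + mates_once + mates_multi\n",
+ "overall_rate = 100 * aligned_mates / (2 * total_pairs)\n",
+ "\n",
+ "print(f'{overall_rate:.2f}% overall alignment rate')\n",
+ "```\n",
+ "\n",
+ "The result should agree with the last line of the bowtie2 summary for the CTL sample. Swapping in the Mutant counts reproduces its rate the same way."
+ ]
+ },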
+ {
+ "cell_type": "markdown",
+ "id": "d96b5ade-acd6-4ee7-9352-f654cd78ad85",
+ "metadata": {},
+ "source": [
+ "Bowtie2 writes its output in [sam format](https://samtools.github.io/hts-specs/SAMv1.pdf), which contains the original sequence, quality scores, and the genomic coordinates matching each read. \n",
+ "\n",
+ "In the next commands we'll convert the file to the more compressed [bam format](https://genome.ucsc.edu/goldenPath/help/bam.html) and sort the reads by chromosomal coordinates."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 44,
+ "id": "6dcaff27-b46c-4d50-bdf6-301f914e6671",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#This will convert to bam by using samtools view with the -b option. The h option keeps the header in the output and the S option says that the input is in sam format. We will pipe this to samtools sort. Pay attention to the \"-\" at the end of the sort command, which tells samtools to read from stdin.\n",
+ "!samtools view -q 10 -bhS Tutorial1/Mapped/CTL.sam | samtools sort -o Tutorial1/Mapped/CTL.bam - \n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 45,
+ "id": "e82b8ef3-3b40-4283-8955-326120668070",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Let's do the same thing for our Mutant sample.\n",
+ "!samtools view -q 10 -bhS Tutorial1/Mapped/Mutant.sam | samtools sort -o Tutorial1/Mapped/Mutant.bam - \n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52f5444e-b437-40b8-b222-6304baf38343",
+ "metadata": {},
+ "source": [
+ "You may have noticed the parameters -bhS and -q 10 in the above commands. Briefly, -bhS tells samtools to output a bam file (the b option), to include the header in the output (the h option), and that the input is in sam format (the S option; recent samtools versions detect the input format automatically and ignore it). We also specified -q 10, which removes reads with a mapping quality below 10. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 46,
+ "id": "c7eeca9f-d6d0-43f5-b4f4-7a2cfe44ce5e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial1/LessonImages/mappingquality.json\")"
+ ]
+ },
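+ {
+ "cell_type": "markdown",
+ "id": "sketch-mapq-threshold",
+ "metadata": {},
+ "source": [
+ "For intuition about the -q 10 threshold: the SAM specification defines mapping quality as -10 log10 of the probability that the mapping position is wrong. The sketch below applies that definition to a few example values. Aligners such as bowtie2 assign mapping quality with their own heuristics, so treat the probabilities as approximate.\n",
+ "\n",
+ "```python\n",
+ "def error_probability(mapq):\n",
+ "    # Invert MAPQ = -10 * log10(P), where P is the chance the position is wrong.\n",
+ "    return 10 ** (-mapq / 10)\n",
+ "\n",
+ "threshold = 10  # the same value we passed to samtools view -q\n",
+ "for mapq in (0, 5, 10, 30, 42):\n",
+ "    status = 'kept' if mapq >= threshold else 'removed'\n",
+ "    print(mapq, round(error_probability(mapq), 4), status)\n",
+ "```\n",
+ "\n",
+ "Under this definition a mapping quality of 10 corresponds to a 1 in 10 chance that the read is placed incorrectly, and a mapping quality of 30 to a 1 in 1000 chance."
+ ]
+ },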
+ {
+ "cell_type": "markdown",
+ "id": "947d41c9-3436-47f8-854d-bcf1625919b3",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Step4: Removal of Duplicates\n",
+ "\n",
+ "It's important to remove duplicates from our reads because part of the ATAC-seq method includes a PCR step for library amplification. This can create biases in the data resulting from PCR duplicates. To understand how PCR duplicates can affect the analysis, let's jump ahead a bit. Accessible sites are represented by ATAC-seq \"peaks\" of signal.\n",
+ "\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 47,
+ "id": "38362bd5-120e-4cfe-943c-e1d7c55a0e82",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial1/LessonImages/duplicateQuiz.json\")"
+ ]
+ },
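+ {
+ "cell_type": "markdown",
+ "id": "sketch-duplicate-fragments",
+ "metadata": {},
+ "source": [
+ "A simplified way to picture duplicate detection: fragments that share the same chromosome, start, end, and strand are treated as copies of one original molecule. The coordinates below are made up, and Picard's actual logic is more involved (for example, it works from 5' alignment positions and uses base qualities to choose which copy to keep), so this is only a sketch of the idea.\n",
+ "\n",
+ "```python\n",
+ "from collections import Counter\n",
+ "\n",
+ "# Hypothetical fragments as (chromosome, start, end, strand).\n",
+ "fragments = [\n",
+ "    ('chr4', 1000, 1180, '+'),\n",
+ "    ('chr4', 1000, 1180, '+'),\n",
+ "    ('chr4', 1000, 1180, '+'),\n",
+ "    ('chr4', 1015, 1210, '-'),\n",
+ "    ('chr4', 5400, 5520, '+'),\n",
+ "]\n",
+ "\n",
+ "# Identical coordinates collapse into a single key.\n",
+ "copies = Counter(fragments)\n",
+ "unique_fragments = list(copies)\n",
+ "duplicate_fraction = 1 - len(unique_fragments) / len(fragments)\n",
+ "\n",
+ "print(len(unique_fragments), 'unique fragments')\n",
+ "print(round(duplicate_fraction, 2), 'fraction of fragments that are duplicates')\n",
+ "```\n",
+ "\n",
+ "Here three identical fragments pile up at one location. Left in place, they would inflate the signal there and could look like a peak. Picard also reports duplication as a fraction between 0 and 1 rather than as a percentage."
+ ]
+ },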
+ {
+ "cell_type": "markdown",
+ "id": "e6002033-c000-4b2b-9e23-de065180f2f6",
+ "metadata": {},
+ "source": [
+ "Okay, let's remove these duplicates using Picard."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "id": "783c64d4-e9dc-4a15-bf85-7b1a341748b8",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#this will take the sorted bam file and remove duplicates, saving a new bam file and a summary in a text file.\n",
+ "!picard MarkDuplicates --REMOVE_DUPLICATES TRUE -I Tutorial1/Mapped/CTL.bam -O Tutorial1/Mapped/CTL_dedup.bam --METRICS_FILE Tutorial1/Mapped/CTL_dedup_metrics.txt --QUIET 2> Tutorial1/Mapped/PicardLog.txt\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 49,
+ "id": "0631658b-bd97-4d1c-9e56-2152c6910440",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#We also should do this for the other sample.\n",
+ "!picard MarkDuplicates --REMOVE_DUPLICATES TRUE -I Tutorial1/Mapped/Mutant.bam -O Tutorial1/Mapped/Mutant_dedup.bam --METRICS_FILE Tutorial1/Mapped/Mutant_dedup_metrics.txt --QUIET 2> Tutorial1/Mapped/PicardLog2.txt\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "id": "2308d570-94ca-4116-9492-493a6a87db79",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Sample Picard_mqc-generalstats-picard-PERCENT_DUPLICATION\n",
+ "0 CTL 0.569867 \n",
+ "1 Mutant 0.591115 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#We can use multiqc to summarize the metrics\n",
+ "!multiqc -o Tutorial1/QC -f Tutorial1/Mapped 2> Tutorial1/Mapped/multiqc_log.txt\n",
+ "dframe = pd.read_csv(\"Tutorial1/QC/multiqc_data/multiqc_general_stats.txt\", sep='\\t')\n",
+ "display(dframe)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "12439dc4-b9ad-49fa-9ccb-4383a272bcaf",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Great job! \n",
+ "\n",
+ "We have completed the preprocessing steps and are ready to move on to some downstream analysis. Take a break here or move on to the next tutorial. \n"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m94",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/ATACseq/ATACseq_Tutorial2_PeakDetection.ipynb b/tutorials/notebooks/ATACseq/ATACseq_Tutorial2_PeakDetection.ipynb
new file mode 100644
index 0000000..d7b685c
--- /dev/null
+++ b/tutorials/notebooks/ATACseq/ATACseq_Tutorial2_PeakDetection.ipynb
@@ -0,0 +1,4476 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "f7ded04b-10a3-4eee-b7a2-9e1f8528c239",
+ "metadata": {},
+ "source": [
+ "# ATAC-seq Module2: Visualization and Peak Identification\n",
+ "\n",
+ " \n",
+ "\n",
+ "## Overview & Purpose\n",
+ "In the previous section of this module we performed preprocessing, quality control, mapping, and deduplication. In this section we will visualize the signal, create average plots of signal around transcription start sites (TSSs), and identify peaks. \n",
+ "\n",
+ "### Required Files\n",
+ "In this stage of the module you will use the deduplicated bam files that we prepared in the previous section. Don't worry if you are just jumping in now; we have examples of these files saved and will include a step that copies them for your use. You can also use this module on your own data or any published ATAC-seq dataset, but you should complete the mapping and deduplication steps first.\n",
+ "\n",
+ "\n",
+ "## STEP1: Setup Environment\n",
+ "\n",
+ "\n",
+ "First we configure your Google Cloud environment. In this step we will use conda to install the following packages:\n",
+ "\n",
+ "Visualization:\n",
+ "[samtools](https://anaconda.org/bioconda/samtools), [deeptools](https://anaconda.org/bioconda/deeptools), [IGV](https://anaconda.org/bioconda/igv)\n",
+ "\n",
+ "Peak Identification:\n",
+ "[macs2](https://anaconda.org/bioconda/macs2)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "f2074873-7df9-46e7-9922-f1b1bb26f7c4",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Installed kernelspec ATACtraining in /home/jupyter/.local/share/jupyter/kernels/atactraining\n",
+ "Warning: 'bioconda' already in 'channels' list, moving to the top\n",
+ "Collecting package metadata (current_repodata.json): done\n",
+ "Solving environment: done\n",
+ "\n",
+ "# All requested packages already installed.\n",
+ "\n",
+ "Retrieving notices: ...working... done\n",
+ "Requirement already satisfied: numpy in ./.local/lib/python3.7/site-packages (1.21.6)\n",
+ "Requirement already satisfied: numpydoc in ./.local/lib/python3.7/site-packages (1.4.0)\n",
+ "Requirement already satisfied: Jinja2>=2.10 in /opt/conda/lib/python3.7/site-packages (from numpydoc) (3.1.2)\n",
+ "Requirement already satisfied: sphinx>=3.0 in ./.local/lib/python3.7/site-packages (from numpydoc) (5.1.1)\n",
+ "Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from Jinja2>=2.10->numpydoc) (2.1.1)\n",
+ "Requirement already satisfied: sphinxcontrib-serializinghtml>=1.1.5 in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (1.1.5)\n",
+ "Requirement already satisfied: requests>=2.5.0 in /opt/conda/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (2.28.1)\n",
+ "Requirement already satisfied: sphinxcontrib-qthelp in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (1.0.3)\n",
+ "Requirement already satisfied: sphinxcontrib-devhelp in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (1.0.2)\n",
+ "Requirement already satisfied: alabaster<0.8,>=0.7 in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (0.7.12)\n",
+ "Requirement already satisfied: imagesize in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (1.4.1)\n",
+ "Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (21.3)\n",
+ "Requirement already satisfied: sphinxcontrib-applehelp in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (1.0.2)\n",
+ "Requirement already satisfied: babel>=1.3 in /opt/conda/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (2.10.3)\n",
+ "Requirement already satisfied: sphinxcontrib-jsmath in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (1.0.1)\n",
+ "Requirement already satisfied: sphinxcontrib-htmlhelp>=2.0.0 in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (2.0.0)\n",
+ "Requirement already satisfied: snowballstemmer>=1.1 in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (2.2.0)\n",
+ "Requirement already satisfied: Pygments>=2.0 in /opt/conda/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (2.12.0)\n",
+ "Requirement already satisfied: importlib-metadata>=4.4 in /opt/conda/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (4.11.4)\n",
+ "Requirement already satisfied: docutils<0.20,>=0.14 in ./.local/lib/python3.7/site-packages (from sphinx>=3.0->numpydoc) (0.19)\n",
+ "Requirement already satisfied: pytz>=2015.7 in /opt/conda/lib/python3.7/site-packages (from babel>=1.3->sphinx>=3.0->numpydoc) (2022.1)\n",
+ "Requirement already satisfied: typing-extensions>=3.6.4 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata>=4.4->sphinx>=3.0->numpydoc) (4.2.0)\n",
+ "Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata>=4.4->sphinx>=3.0->numpydoc) (3.8.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests>=2.5.0->sphinx>=3.0->numpydoc) (3.3)\n",
+ "Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.7/site-packages (from requests>=2.5.0->sphinx>=3.0->numpydoc) (2.1.0)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests>=2.5.0->sphinx>=3.0->numpydoc) (2022.6.15)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests>=2.5.0->sphinx>=3.0->numpydoc) (1.26.9)\n",
+ "Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging->sphinx>=3.0->numpydoc) (3.0.9)\n",
+ "Requirement already satisfied: jupyterquiz in /opt/conda/lib/python3.7/site-packages (2.0.1)\n",
+ "Requirement already satisfied: igv-notebook in /opt/conda/lib/python3.7/site-packages (0.3.1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "!python -m ipykernel install --user --name ATACtraining\n",
+ "numthreads=!lscpu | grep '^CPU(s)'| awk '{print $2-1}'\n",
+ "numthreadsint = int(numthreads[0])\n",
+ "!conda config --prepend channels bioconda\n",
+ "#!python -m pip install --user --upgrade pdf2image\n",
+ "#from pdf2image import convert_from_path, convert_from_bytes\n",
+ "!conda install -y -c bioconda samtools deeptools igv macs2\n",
+ "#!python -m pip install --user --upgrade macs3\n",
+ "#!conda install -y -c maximinio macs3 \n",
+ "!python -m pip install --user --upgrade numpy numpydoc\n",
+ "!pip install jupyterquiz\n",
+ "!pip install --user igv-notebook\n",
+ "import igv_notebook\n",
+ "from jupyterquiz import display_quiz\n",
+ "from IPython.display import IFrame\n",
+ "from IPython.display import display\n",
+ "from IPython.display import Image\n",
+ "import pandas as pd"
+ ]
+ },
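+ {
+ "cell_type": "markdown",
+ "id": "sketch-thread-count",
+ "metadata": {},
+ "source": [
+ "The setup cell above derives the thread count from `lscpu` and subtracts one so that a CPU stays free for the notebook. On a single-CPU machine that expression evaluates to 0. As a hedged alternative, the sketch below computes the same quantity in pure Python and never goes below one thread. The name `numthreadsint_alt` is our own and is not used elsewhere in this tutorial.\n",
+ "\n",
+ "```python\n",
+ "import os\n",
+ "\n",
+ "# os.cpu_count() can return None when the count cannot be determined,\n",
+ "# so fall back to 1 in that case.\n",
+ "total_cpus = os.cpu_count() or 1\n",
+ "\n",
+ "# Leave one CPU free for the notebook, but always keep at least one thread.\n",
+ "numthreadsint_alt = max(1, total_cpus - 1)\n",
+ "print(numthreadsint_alt)\n",
+ "```\n",
+ "\n",
+ "If you prefer this version, you could pass `$numthreadsint_alt` to the `-p` options later in the notebook."
+ ]
+ },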
+ {
+ "cell_type": "markdown",
+ "id": "54f8e28a-543c-4eb9-8c6a-6107ce0613a8",
+ "metadata": {},
+ "source": [
+ "## Setup FileSystem\n",
+ "Now let's create some folders to stay organized and copy over our prepared bam files. We're going to create a directory called \"Tutorial2\" which we'll use for this module. We'll then create subfolders for our InputFiles and for the files that we'll be creating during this module. We'll also copy over a gene annotation file for chromosome 4 and the images used in this lesson."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "8a730ba8-82e0-4ab2-a0ef-118a6910cc92",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "/home/jupyter\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/Peak.jpg...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/ATACseqWorkflowLesson2.jpg...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/BPMnorm.json... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/InsertSizeQuiz.json... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/adapterinsert9bp.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/igv.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/samformat.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/images/sizeProfile.jpg... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/Annotations/hg38_genes_chr4.bed...\n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/InputFiles/Mutant_dedup.bam... \n",
+ "Copying gs://unmc_atac_data_examples/Tutorial2/InputFiles/CTL_dedup.bam... \n",
+ "- [2/2 files][ 42.3 MiB/ 42.3 MiB] 100% Done \n",
+ "Operation completed over 2 objects/42.3 MiB. \n"
+ ]
+ }
+ ],
+ "source": [
+ "#These commands create our directory structure.\n",
+ "!cd $HOMEDIR\n",
+ "!mkdir -p Tutorial2\n",
+ "!mkdir -p Tutorial2/InputFiles\n",
+ "!mkdir -p Tutorial2/GenomeAnnotations\n",
+ "!mkdir -p Tutorial2/BigWigFiles\n",
+ "!mkdir -p Tutorial2/Peaks\n",
+ "!mkdir -p Tutorial2/LessonImages\n",
+ "!mkdir -p Tutorial2/Plots\n",
+ "!cd ./Tutorial2\n",
+ "!echo $PWD\n",
+ "\n",
+ "#These commands help identify the google cloud storage bucket where the example files are held.\n",
+ "project_id = \"nosi-unmc-seq\"\n",
+ "original_bucket = \"gs://unmc_atac_data_examples/Tutorial2\"\n",
+ "!gsutil -m cp $original_bucket/images/* Tutorial2/LessonImages\n",
+ "!gsutil -m cp $original_bucket/Annotations/* Tutorial2/GenomeAnnotations\n",
+ "#This command copies our example files to the Tutorial2/InputFiles folder that we created above.\n",
+ "! gsutil -m cp $original_bucket/InputFiles/*bam Tutorial2/InputFiles\n",
+ "\n"
+ ]
+ },
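+ {
+ "cell_type": "markdown",
+ "id": "sketch-make-directories",
+ "metadata": {},
+ "source": [
+ "Each `!` command runs in its own temporary subshell, so `!cd` does not change the notebook's working directory. This is why `echo $PWD` above still prints `/home/jupyter`, and why the remaining commands use paths relative to that starting directory. As a rough sketch of the same folder setup in Python, assuming the folder names from the cell above, one could write the following. The helper `make_tutorial_dirs` is a hypothetical name introduced only for this illustration.\n",
+ "\n",
+ "```python\n",
+ "from pathlib import Path\n",
+ "\n",
+ "# Subfolders used in this tutorial, mirroring the mkdir commands above.\n",
+ "SUBFOLDERS = ['InputFiles', 'GenomeAnnotations', 'BigWigFiles',\n",
+ "              'Peaks', 'LessonImages', 'Plots']\n",
+ "\n",
+ "def make_tutorial_dirs(base='Tutorial2'):\n",
+ "    # Hypothetical helper: create base and its subfolders, like mkdir -p.\n",
+ "    created = []\n",
+ "    for name in SUBFOLDERS:\n",
+ "        path = Path(base) / name\n",
+ "        # parents/exist_ok make this safe to rerun, as with mkdir -p.\n",
+ "        path.mkdir(parents=True, exist_ok=True)\n",
+ "        created.append(path)\n",
+ "    return created\n",
+ "```\n",
+ "\n",
+ "Calling `make_tutorial_dirs()` from the starting directory would create the same six folders as the shell commands, and rerunning it is harmless."
+ ]
+ },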
+ {
+ "cell_type": "markdown",
+ "id": "51fc4b08-4f02-4f49-9a8b-48b02a122455",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### OK\n",
+ "Let's make sure that the files copied correctly. You should see 2 .bam files after running the following command:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "bce79afb-044f-45da-b7e4-e9a5f93dd1c2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CTL_Nucleosomal.bam CTL_shift.bam\t Mutant_dedup.bam.bai\n",
+ "CTL_THSS.bam\t Mutant_Nucleosomal.bam Mutant_shift.bam\n",
+ "CTL_dedup.bam\t Mutant_THSS.bam\n",
+ "CTL_dedup.bam.bai Mutant_dedup.bam\n"
+ ]
+ }
+ ],
+ "source": [
+ "!ls Tutorial2/InputFiles\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cf7aea89-8610-4430-a802-6f7d387deb8b",
+ "metadata": {},
+ "source": [
+ "\n",
+ "## STEP2: Visualization\n",
+ "\n",
+ "Files in sam/bam format contain a lot of information including the original sequence of the reads, quality scores, and their corresponding chromosomal coordinates.\n",
+ "\n",
+ " \n",
+ "\n",
+ "### Please view this [site](https://www.samformat.info/sam-format-flag) for a more complete description of sam format and to see what the various sam flag values mean.\n",
+ "\n",
+ "Let's view the first few lines of one of our bam files:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "a51f1ba8-3bf0-4715-bb97-4a029352a3cd",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "SRR1944627.37127681\t99\tchr4\t39845\t31\t50M\t=\t39881\t86\tATCTTTGTGGCATTCTCTGTATTTCCTGAATTTGAATGTTGGCCTGCCTT\tCCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJIJIIIJJIJJJJJJJJJJJ\tMD:Z:50\tPG:Z:MarkDuplicates\tXG:i:0\tNM:i:0\tXM:i:0\tXN:i:0\tXO:i:0\tAS:i:0\tXS:i:0\tYS:i:0\tYT:Z:CP\n",
+ "SRR1944627.37127681\t147\tchr4\t39881\t31\t50M\t=\t39845\t-86\tTGTTGGCCTGCCTTGCTAGGTTGGGAAAGTTCTCCTGGATAATATCCTGA\tHEJJJIJJJJJJJJHJJJJJJIJJJJJJIJJIJJJJJHHHHHFFFFFCCC\tMD:Z:50\tPG:Z:MarkDuplicates\tXG:i:0\tNM:i:0\tXM:i:0\tXN:i:0\tXO:i:0\tAS:i:0\tXS:i:0\tYS:i:0\tYT:Z:CP\n",
+ "SRR1944627.50776065\t99\tchr4\t98978\t11\t49M\t=\t99304\t376\tGAGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGCACGATCTCGGCT\t@C@FFFFFHHGHGHIIJJJJIGIBHEIICFCHGEHJGIJIHCHIIJGII\tMD:Z:49\tPG:Z:MarkDuplicates\tXG:i:0\tNM:i:0\tXM:i:0\tXN:i:0\tXO:i:0\tAS:i:0\tXS:i:0\tYS:i:-12\tYT:Z:CP\n",
+ "samtools view: writing to standard output failed: Broken pipe\n",
+ "samtools view: error closing standard output: -1\n"
+ ]
+ }
+ ],
+ "source": [
+ "!samtools view Tutorial2/InputFiles/CTL_dedup.bam | head -3"
+ ]
+ },
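+ {
+ "cell_type": "markdown",
+ "id": "sketch-decode-sam-flags",
+ "metadata": {},
+ "source": [
+ "The second column of the output above is the SAM FLAG, a number whose individual bits each carry one yes/no property of the read. As a small illustration, the sketch below decodes the two flag values seen above (99 and 147). The bit meanings follow the SAM format specification, while the helper name `decode_flag` and the wording of the labels are our own.\n",
+ "\n",
+ "```python\n",
+ "# Bit values from the SAM format specification, with short labels.\n",
+ "SAM_FLAG_BITS = [\n",
+ "    (0x1, 'read paired'),\n",
+ "    (0x2, 'read mapped in proper pair'),\n",
+ "    (0x4, 'read unmapped'),\n",
+ "    (0x8, 'mate unmapped'),\n",
+ "    (0x10, 'read on reverse strand'),\n",
+ "    (0x20, 'mate on reverse strand'),\n",
+ "    (0x40, 'first in pair'),\n",
+ "    (0x80, 'second in pair'),\n",
+ "    (0x100, 'not primary alignment'),\n",
+ "    (0x200, 'read fails quality checks'),\n",
+ "    (0x400, 'PCR or optical duplicate'),\n",
+ "    (0x800, 'supplementary alignment'),\n",
+ "]\n",
+ "\n",
+ "def decode_flag(flag):\n",
+ "    # Hypothetical helper: list the label of every bit that is set in flag.\n",
+ "    return [label for bit, label in SAM_FLAG_BITS if flag & bit]\n",
+ "\n",
+ "for flag in (99, 147):\n",
+ "    print(flag, decode_flag(flag))\n",
+ "```\n",
+ "\n",
+ "Decoded this way, 99 and 147 describe the two mates of one properly paired fragment: the first mate on the forward strand and the second mate on the reverse strand."
+ ]
+ },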
+ {
+ "cell_type": "markdown",
+ "id": "a39c3c75-9011-4a9e-a60e-bb8c503405bb",
+ "metadata": {},
+ "source": [
+ "While we can see the coordinates of each read, we will need a better way of visualizing the results. In this step we will create a binary file that summarizes the pileup of reads at each base pair along our genome, in [bigwig](http://genome.ucsc.edu/goldenPath/help/bigWig.html) format. \n",
+ "\n",
+ "To create the bigwig files let's use the command bamCoverage, part of the [deeptools](https://deeptools.readthedocs.io/en/develop/) package."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "f78e9802-ddd1-4abb-9bbf-3cc10e5d5d90",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "# First we need to create an index of our bam file.\n",
+ "!samtools index Tutorial2/InputFiles/CTL_dedup.bam\n",
+ "\n",
+ "# Then we can create a bigwig file of the control sample.\n",
+ "!bamCoverage -b Tutorial2/InputFiles/CTL_dedup.bam -o Tutorial2/BigWigFiles/Control.bw -bs 1 -p $numthreadsint --normalizeUsing BPM 2> Tutorial2/BigWigFiles/bamCovLog_ctl.txt\n",
+ "\n",
+ "# Now let's rerun the commands for our mutant sample.\n",
+ "!samtools index Tutorial2/InputFiles/Mutant_dedup.bam\n",
+ "!bamCoverage -b Tutorial2/InputFiles/Mutant_dedup.bam -o Tutorial2/BigWigFiles/Mutant.bw -bs 1 -p $numthreadsint --normalizeUsing BPM 2> Tutorial2/BigWigFiles/bamCovLog_mut.txt\n",
+ "\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "744e8b64-947b-43f4-9bb0-51bf42972bc3",
+ "metadata": {},
+ "source": [
+ "In the above example we specify the bam file name after -b and the output file name after -o. \n",
+ "\n",
+ "We specified -bs 1, which tells bamCoverage to summarize the reads at every base pair; the default is to summarize at 50 bp resolution, but for ATAC-seq we find it useful to summarize the data at a finer scale. \n",
+ "\n",
+ "We also specified the number of threads to use with -p, which is held in a variable in our notebook.\n",
+ "\n",
+ "Lastly, we specified --normalizeUsing BPM. BPM stands for Bins Per Million mapped reads. What do you think this normalization does?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "f2df8f13-211b-4584-8600-c5ab1edf8138",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ "
"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial2/LessonImages/BPMnorm.json\")"
+ ]
+ },
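+ {
+ "cell_type": "markdown",
+ "id": "sketch-bpm-normalization",
+ "metadata": {},
+ "source": [
+ "To make the BPM idea concrete, here is a minimal sketch under simplifying assumptions. The deeptools documentation describes BPM per bin as the number of reads in that bin divided by the sum of reads over all bins, in millions. The function name `bpm_normalize` and the toy counts are invented for illustration; this is not how bamCoverage is implemented internally.\n",
+ "\n",
+ "```python\n",
+ "def bpm_normalize(bin_counts):\n",
+ "    # Scale raw per-bin read counts so that all bins sum to one million.\n",
+ "    total_in_millions = sum(bin_counts) / 1_000_000\n",
+ "    if total_in_millions == 0:\n",
+ "        # No reads at all: return zeros rather than dividing by zero.\n",
+ "        return [0.0 for _ in bin_counts]\n",
+ "    return [count / total_in_millions for count in bin_counts]\n",
+ "\n",
+ "# Made-up counts: the second sample was sequenced twice as deeply.\n",
+ "shallow = [2, 10, 4, 4]\n",
+ "deep = [4, 20, 8, 8]\n",
+ "\n",
+ "print(bpm_normalize(shallow))\n",
+ "print(bpm_normalize(deep))\n",
+ "```\n",
+ "\n",
+ "Although the raw counts differ by a factor of two, the normalized profiles agree, which is what lets us compare the control and mutant tracks even when they were sequenced to different depths."
+ ]
+ },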
+ {
+ "cell_type": "markdown",
+ "id": "b228a0d9-0e88-490f-a53d-58ce20d40b64",
+ "metadata": {},
+ "source": [
+ "\n",
+ "### Genome Browser\n",
+ "\n",
+ "\n",
+ "Now that we have our bigwig files, we can visualize the signal in a genome browser. We'll use [igv](https://igv.org/) in this example. \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "4611752a-36fe-4b76-8fc1-5764a69e74d8",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/javascript": [
+ "!function (global, factory) {window.igv = factory()}(this,(function(){\"use strict\";\n",
+ "/*!\n",
+ " * jQuery JavaScript Library v3.3.1 -ajax,-ajax/jsonp,-ajax/load,-ajax/parseXML,-ajax/script,-ajax/var/location,-ajax/var/nonce,-ajax/var/rquery,-ajax/xhr,-manipulation/_evalUrl,-event/ajax,-effects,-effects/Tween,-effects/animatedSelector\n",
+ " * https://jquery.com/\n",
+ " *\n",
+ " * Includes Sizzle.js\n",
+ " * https://sizzlejs.com/\n",
+ " *\n",
+ " * Copyright JS Foundation and other contributors\n",
+ " * Released under the MIT license\n",
+ " * https://jquery.org/license\n",
+ " *\n",
+ " * Date: 2018-01-20T17:24Z\n",
+ " */var t=[],e=window.document,n=Object.getPrototypeOf,r=t.slice,i=t.concat,o=t.push,s=t.indexOf,a={},c=a.toString,l=a.hasOwnProperty,h=l.toString,u=h.call(Object),f={},d=function(t){return\"function\"==typeof t&&\"number\"!=typeof t.nodeType},p=function(t){return null!=t&&t===t.window},g={type:!0,src:!0,noModule:!0};function m(t,n,r){var i,o=(n=n||e).createElement(\"script\");if(o.text=t,r)for(i in g)r[i]&&(o[i]=r[i]);n.head.appendChild(o).parentNode.removeChild(o)}function v(t){return null==t?t+\"\":\"object\"==typeof t||\"function\"==typeof t?a[c.call(t)]||\"object\":typeof t}var b=\"3.3.1 -ajax,-ajax/jsonp,-ajax/load,-ajax/parseXML,-ajax/script,-ajax/var/location,-ajax/var/nonce,-ajax/var/rquery,-ajax/xhr,-manipulation/_evalUrl,-event/ajax,-effects,-effects/Tween,-effects/animatedSelector\",w=function(t,e){return new w.fn.init(t,e)},y=/^[\\s\\uFEFF\\xA0]+|[\\s\\uFEFF\\xA0]+$/g;function x(t){var e=!!t&&\"length\"in t&&t.length,n=v(t);return!d(t)&&!p(t)&&(\"array\"===n||0===e||\"number\"==typeof e&&e>0&&e-1 in t)}w.fn=w.prototype={jquery:b,constructor:w,length:0,toArray:function(){return r.call(this)},get:function(t){return null==t?r.call(this):t<0?this[t+this.length]:this[t]},pushStack:function(t){var e=w.merge(this.constructor(),t);return e.prevObject=this,e},each:function(t){return w.each(this,t)},map:function(t){return this.pushStack(w.map(this,(function(e,n){return t.call(e,n,e)})))},slice:function(){return this.pushStack(r.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(t){var e=this.length,n=+t+(t<0?e:0);return this.pushStack(n>=0&&n+~]|[\\\\x20\\\\t\\\\r\\\\n\\\\f])[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\"),q=new RegExp(\"=[\\\\x20\\\\t\\\\r\\\\n\\\\f]*([^\\\\]'\\\"]*?)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\\\\]\",\"g\"),W=new RegExp(z),$=new RegExp(\"^\"+D+\"$\"),G={ID:new RegExp(\"^#(\"+D+\")\"),CLASS:new RegExp(\"^\\\\.(\"+D+\")\"),TAG:new RegExp(\"^(\"+D+\"|[*])\"),ATTR:new RegExp(\"^\"+B),PSEUDO:new 
RegExp(\"^\"+z),CHILD:new RegExp(\"^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\\\([\\\\x20\\\\t\\\\r\\\\n\\\\f]*(even|odd|(([+-]|)(\\\\d*)n|)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(?:([+-]|)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(\\\\d+)|))[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\\\\)|)\",\"i\"),bool:new RegExp(\"^(?:\"+O+\")$\",\"i\"),needsContext:new RegExp(\"^[\\\\x20\\\\t\\\\r\\\\n\\\\f]*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\\\([\\\\x20\\\\t\\\\r\\\\n\\\\f]*((?:-\\\\d)?\\\\d*)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\\\\)|)(?=[^-]|$)\",\"i\")},Z=/^(?:input|select|textarea|button)$/i,X=/^h\\d$/i,Y=/^[^{]+\\{\\s*\\[native \\w/,K=/^(?:#([\\w-]+)|(\\w+)|\\.([\\w-]+))$/,Q=/[+~]/,J=new RegExp(\"\\\\\\\\([\\\\da-f]{1,6}[\\\\x20\\\\t\\\\r\\\\n\\\\f]?|([\\\\x20\\\\t\\\\r\\\\n\\\\f])|.)\",\"ig\"),tt=function(t,e,n){var r=\"0x\"+e-65536;return r!=r||n?e:r<0?String.fromCharCode(r+65536):String.fromCharCode(r>>10|55296,1023&r|56320)},et=/([\\0-\\x1f\\x7f]|^-?\\d)|^-$|[^\\0-\\x1f\\x7f-\\uFFFF\\w-]/g,nt=function(t,e){return e?\"\\0\"===t?\"�\":t.slice(0,-1)+\"\\\\\"+t.charCodeAt(t.length-1).toString(16)+\" \":\"\\\\\"+t},rt=function(){f()},it=vt((function(t){return!0===t.disabled&&(\"form\"in t||\"label\"in t)}),{dir:\"parentNode\",next:\"legend\"});try{I.apply(L=N.call(x.childNodes),x.childNodes),L[x.childNodes.length].nodeType}catch(t){I={apply:L.length?function(t,e){R.apply(t,N.call(e))}:function(t,e){for(var n=t.length,r=0;t[n++]=e[r++];);t.length=n-1}}}function ot(t,e,r,i){var o,a,l,h,u,p,v,b=e&&e.ownerDocument,_=e?e.nodeType:9;if(r=r||[],\"string\"!=typeof t||!t||1!==_&&9!==_&&11!==_)return r;if(!i&&((e?e.ownerDocument||e:x)!==d&&f(e),e=e||d,g)){if(11!==_&&(u=K.exec(t)))if(o=u[1]){if(9===_){if(!(l=e.getElementById(o)))return r;if(l.id===o)return r.push(l),r}else if(b&&(l=b.getElementById(o))&&w(e,l)&&l.id===o)return r.push(l),r}else{if(u[2])return I.apply(r,e.getElementsByTagName(t)),r;if((o=u[3])&&n.getElementsByClassName&&e.getElementsByClassName)return 
I.apply(r,e.getElementsByClassName(o)),r}if(n.qsa&&!E[t+\" \"]&&(!m||!m.test(t))){if(1!==_)b=e,v=t;else if(\"object\"!==e.nodeName.toLowerCase()){for((h=e.getAttribute(\"id\"))?h=h.replace(et,nt):e.setAttribute(\"id\",h=y),a=(p=s(t)).length;a--;)p[a]=\"#\"+h+\" \"+mt(p[a]);v=p.join(\",\"),b=Q.test(t)&&pt(e.parentNode)||e}if(v)try{return I.apply(r,b.querySelectorAll(v)),r}catch(t){}finally{h===y&&e.removeAttribute(\"id\")}}}return c(t.replace(V,\"$1\"),e,r,i)}function st(){var t=[];return function e(n,i){return t.push(n+\" \")>r.cacheLength&&delete e[t.shift()],e[n+\" \"]=i}}function at(t){return t[y]=!0,t}function ct(t){var e=d.createElement(\"fieldset\");try{return!!t(e)}catch(t){return!1}finally{e.parentNode&&e.parentNode.removeChild(e),e=null}}function lt(t,e){var n=e&&t,r=n&&1===t.nodeType&&1===e.nodeType&&t.sourceIndex-e.sourceIndex;if(r)return r;if(n)for(;n=n.nextSibling;)if(n===e)return-1;return t?1:-1}function ht(t){return function(e){return\"input\"===e.nodeName.toLowerCase()&&e.type===t}}function ut(t){return function(e){var n=e.nodeName.toLowerCase();return(\"input\"===n||\"button\"===n)&&e.type===t}}function ft(t){return function(e){return\"form\"in e?e.parentNode&&!1===e.disabled?\"label\"in e?\"label\"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&it(e)===t:e.disabled===t:\"label\"in e&&e.disabled===t}}function dt(t){return at((function(e){return e=+e,at((function(n,r){for(var i,o=t([],n.length,e),s=o.length;s--;)n[i=o[s]]&&(n[i]=!(r[i]=n[i]))}))}))}function pt(t){return t&&void 0!==t.getElementsByTagName&&t}for(e in n=ot.support={},o=ot.isXML=function(t){var e=t&&(t.ownerDocument||t).documentElement;return!!e&&\"HTML\"!==e.nodeName},f=ot.setDocument=function(t){var e,i,s=t?t.ownerDocument||t:x;return 
s!==d&&9===s.nodeType&&s.documentElement?(p=(d=s).documentElement,g=!o(d),x!==d&&(i=d.defaultView)&&i.top!==i&&(i.addEventListener?i.addEventListener(\"unload\",rt,!1):i.attachEvent&&i.attachEvent(\"onunload\",rt)),n.attributes=ct((function(t){return t.className=\"i\",!t.getAttribute(\"className\")})),n.getElementsByTagName=ct((function(t){return t.appendChild(d.createComment(\"\")),!t.getElementsByTagName(\"*\").length})),n.getElementsByClassName=Y.test(d.getElementsByClassName),n.getById=ct((function(t){return p.appendChild(t).id=y,!d.getElementsByName||!d.getElementsByName(y).length})),n.getById?(r.filter.ID=function(t){var e=t.replace(J,tt);return function(t){return t.getAttribute(\"id\")===e}},r.find.ID=function(t,e){if(void 0!==e.getElementById&&g){var n=e.getElementById(t);return n?[n]:[]}}):(r.filter.ID=function(t){var e=t.replace(J,tt);return function(t){var n=void 0!==t.getAttributeNode&&t.getAttributeNode(\"id\");return n&&n.value===e}},r.find.ID=function(t,e){if(void 0!==e.getElementById&&g){var n,r,i,o=e.getElementById(t);if(o){if((n=o.getAttributeNode(\"id\"))&&n.value===t)return[o];for(i=e.getElementsByName(t),r=0;o=i[r++];)if((n=o.getAttributeNode(\"id\"))&&n.value===t)return[o]}return[]}}),r.find.TAG=n.getElementsByTagName?function(t,e){return void 0!==e.getElementsByTagName?e.getElementsByTagName(t):n.qsa?e.querySelectorAll(t):void 0}:function(t,e){var n,r=[],i=0,o=e.getElementsByTagName(t);if(\"*\"===t){for(;n=o[i++];)1===n.nodeType&&r.push(n);return r}return o},r.find.CLASS=n.getElementsByClassName&&function(t,e){if(void 0!==e.getElementsByClassName&&g)return e.getElementsByClassName(t)},v=[],m=[],(n.qsa=Y.test(d.querySelectorAll))&&(ct((function(t){p.appendChild(t).innerHTML=\" 
\",t.querySelectorAll(\"[msallowcapture^='']\").length&&m.push(\"[*^$]=[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(?:''|\\\"\\\")\"),t.querySelectorAll(\"[selected]\").length||m.push(\"\\\\[[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(?:value|\"+O+\")\"),t.querySelectorAll(\"[id~=\"+y+\"-]\").length||m.push(\"~=\"),t.querySelectorAll(\":checked\").length||m.push(\":checked\"),t.querySelectorAll(\"a#\"+y+\"+*\").length||m.push(\".#.+[+~]\")})),ct((function(t){t.innerHTML=\" \";var e=d.createElement(\"input\");e.setAttribute(\"type\",\"hidden\"),t.appendChild(e).setAttribute(\"name\",\"D\"),t.querySelectorAll(\"[name=d]\").length&&m.push(\"name[\\\\x20\\\\t\\\\r\\\\n\\\\f]*[*^$|!~]?=\"),2!==t.querySelectorAll(\":enabled\").length&&m.push(\":enabled\",\":disabled\"),p.appendChild(t).disabled=!0,2!==t.querySelectorAll(\":disabled\").length&&m.push(\":enabled\",\":disabled\"),t.querySelectorAll(\"*,:x\"),m.push(\",.*:\")}))),(n.matchesSelector=Y.test(b=p.matches||p.webkitMatchesSelector||p.mozMatchesSelector||p.oMatchesSelector||p.msMatchesSelector))&&ct((function(t){n.disconnectedMatch=b.call(t,\"*\"),b.call(t,\"[s!='']:x\"),v.push(\"!=\",z)})),m=m.length&&new RegExp(m.join(\"|\")),v=v.length&&new RegExp(v.join(\"|\")),e=Y.test(p.compareDocumentPosition),w=e||Y.test(p.contains)?function(t,e){var n=9===t.nodeType?t.documentElement:t,r=e&&e.parentNode;return t===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):t.compareDocumentPosition&&16&t.compareDocumentPosition(r)))}:function(t,e){if(e)for(;e=e.parentNode;)if(e===t)return!0;return!1},A=e?function(t,e){if(t===e)return u=!0,0;var r=!t.compareDocumentPosition-!e.compareDocumentPosition;return r||(1&(r=(t.ownerDocument||t)===(e.ownerDocument||e)?t.compareDocumentPosition(e):1)||!n.sortDetached&&e.compareDocumentPosition(t)===r?t===d||t.ownerDocument===x&&w(x,t)?-1:e===d||e.ownerDocument===x&&w(x,e)?1:h?P(h,t)-P(h,e):0:4&r?-1:1)}:function(t,e){if(t===e)return u=!0,0;var n,r=0,i=t.parentNode,o=e.parentNode,s=[t],a=[e];if(!i||!o)return 
t===d?-1:e===d?1:i?-1:o?1:h?P(h,t)-P(h,e):0;if(i===o)return lt(t,e);for(n=t;n=n.parentNode;)s.unshift(n);for(n=e;n=n.parentNode;)a.unshift(n);for(;s[r]===a[r];)r++;return r?lt(s[r],a[r]):s[r]===x?-1:a[r]===x?1:0},d):d},ot.matches=function(t,e){return ot(t,null,null,e)},ot.matchesSelector=function(t,e){if((t.ownerDocument||t)!==d&&f(t),e=e.replace(q,\"='$1']\"),n.matchesSelector&&g&&!E[e+\" \"]&&(!v||!v.test(e))&&(!m||!m.test(e)))try{var r=b.call(t,e);if(r||n.disconnectedMatch||t.document&&11!==t.document.nodeType)return r}catch(t){}return ot(e,d,null,[t]).length>0},ot.contains=function(t,e){return(t.ownerDocument||t)!==d&&f(t),w(t,e)},ot.attr=function(t,e){(t.ownerDocument||t)!==d&&f(t);var i=r.attrHandle[e.toLowerCase()],o=i&&T.call(r.attrHandle,e.toLowerCase())?i(t,e,!g):void 0;return void 0!==o?o:n.attributes||!g?t.getAttribute(e):(o=t.getAttributeNode(e))&&o.specified?o.value:null},ot.escape=function(t){return(t+\"\").replace(et,nt)},ot.error=function(t){throw new Error(\"Syntax error, unrecognized expression: \"+t)},ot.uniqueSort=function(t){var e,r=[],i=0,o=0;if(u=!n.detectDuplicates,h=!n.sortStable&&t.slice(0),t.sort(A),u){for(;e=t[o++];)e===t[o]&&(i=r.push(o));for(;i--;)t.splice(r[i],1)}return h=null,t},i=ot.getText=function(t){var e,n=\"\",r=0,o=t.nodeType;if(o){if(1===o||9===o||11===o){if(\"string\"==typeof t.textContent)return t.textContent;for(t=t.firstChild;t;t=t.nextSibling)n+=i(t)}else if(3===o||4===o)return t.nodeValue}else for(;e=t[r++];)n+=i(e);return n},r=ot.selectors={cacheLength:50,createPseudo:at,match:G,attrHandle:{},find:{},relative:{\">\":{dir:\"parentNode\",first:!0},\" \":{dir:\"parentNode\"},\"+\":{dir:\"previousSibling\",first:!0},\"~\":{dir:\"previousSibling\"}},preFilter:{ATTR:function(t){return t[1]=t[1].replace(J,tt),t[3]=(t[3]||t[4]||t[5]||\"\").replace(J,tt),\"~=\"===t[2]&&(t[3]=\" \"+t[3]+\" \"),t.slice(0,4)},CHILD:function(t){return 
t[1]=t[1].toLowerCase(),\"nth\"===t[1].slice(0,3)?(t[3]||ot.error(t[0]),t[4]=+(t[4]?t[5]+(t[6]||1):2*(\"even\"===t[3]||\"odd\"===t[3])),t[5]=+(t[7]+t[8]||\"odd\"===t[3])):t[3]&&ot.error(t[0]),t},PSEUDO:function(t){var e,n=!t[6]&&t[2];return G.CHILD.test(t[0])?null:(t[3]?t[2]=t[4]||t[5]||\"\":n&&W.test(n)&&(e=s(n,!0))&&(e=n.indexOf(\")\",n.length-e)-n.length)&&(t[0]=t[0].slice(0,e),t[2]=n.slice(0,e)),t.slice(0,3))}},filter:{TAG:function(t){var e=t.replace(J,tt).toLowerCase();return\"*\"===t?function(){return!0}:function(t){return t.nodeName&&t.nodeName.toLowerCase()===e}},CLASS:function(t){var e=S[t+\" \"];return e||(e=new RegExp(\"(^|[\\\\x20\\\\t\\\\r\\\\n\\\\f])\"+t+\"(\"+F+\"|$)\"))&&S(t,(function(t){return e.test(\"string\"==typeof t.className&&t.className||void 0!==t.getAttribute&&t.getAttribute(\"class\")||\"\")}))},ATTR:function(t,e,n){return function(r){var i=ot.attr(r,t);return null==i?\"!=\"===e:!e||(i+=\"\",\"=\"===e?i===n:\"!=\"===e?i!==n:\"^=\"===e?n&&0===i.indexOf(n):\"*=\"===e?n&&i.indexOf(n)>-1:\"$=\"===e?n&&i.slice(-n.length)===n:\"~=\"===e?(\" \"+i.replace(H,\" \")+\" \").indexOf(n)>-1:\"|=\"===e&&(i===n||i.slice(0,n.length+1)===n+\"-\"))}},CHILD:function(t,e,n,r,i){var o=\"nth\"!==t.slice(0,3),s=\"last\"!==t.slice(-4),a=\"of-type\"===e;return 1===r&&0===i?function(t){return!!t.parentNode}:function(e,n,c){var l,h,u,f,d,p,g=o!==s?\"nextSibling\":\"previousSibling\",m=e.parentNode,v=a&&e.nodeName.toLowerCase(),b=!c&&!a,w=!1;if(m){if(o){for(;g;){for(f=e;f=f[g];)if(a?f.nodeName.toLowerCase()===v:1===f.nodeType)return!1;p=g=\"only\"===t&&!p&&\"nextSibling\"}return!0}if(p=[s?m.firstChild:m.lastChild],s&&b){for(w=(d=(l=(h=(u=(f=m)[y]||(f[y]={}))[f.uniqueID]||(u[f.uniqueID]={}))[t]||[])[0]===_&&l[1])&&l[2],f=d&&m.childNodes[d];f=++d&&f&&f[g]||(w=d=0)||p.pop();)if(1===f.nodeType&&++w&&f===e){h[t]=[_,d,w];break}}else 
if(b&&(w=d=(l=(h=(u=(f=e)[y]||(f[y]={}))[f.uniqueID]||(u[f.uniqueID]={}))[t]||[])[0]===_&&l[1]),!1===w)for(;(f=++d&&f&&f[g]||(w=d=0)||p.pop())&&((a?f.nodeName.toLowerCase()!==v:1!==f.nodeType)||!++w||(b&&((h=(u=f[y]||(f[y]={}))[f.uniqueID]||(u[f.uniqueID]={}))[t]=[_,w]),f!==e)););return(w-=i)===r||w%r==0&&w/r>=0}}},PSEUDO:function(t,e){var n,i=r.pseudos[t]||r.setFilters[t.toLowerCase()]||ot.error(\"unsupported pseudo: \"+t);return i[y]?i(e):i.length>1?(n=[t,t,\"\",e],r.setFilters.hasOwnProperty(t.toLowerCase())?at((function(t,n){for(var r,o=i(t,e),s=o.length;s--;)t[r=P(t,o[s])]=!(n[r]=o[s])})):function(t){return i(t,0,n)}):i}},pseudos:{not:at((function(t){var e=[],n=[],r=a(t.replace(V,\"$1\"));return r[y]?at((function(t,e,n,i){for(var o,s=r(t,null,i,[]),a=t.length;a--;)(o=s[a])&&(t[a]=!(e[a]=o))})):function(t,i,o){return e[0]=t,r(e,null,o,n),e[0]=null,!n.pop()}})),has:at((function(t){return function(e){return ot(t,e).length>0}})),contains:at((function(t){return t=t.replace(J,tt),function(e){return(e.textContent||e.innerText||i(e)).indexOf(t)>-1}})),lang:at((function(t){return $.test(t||\"\")||ot.error(\"unsupported lang: \"+t),t=t.replace(J,tt).toLowerCase(),function(e){var n;do{if(n=g?e.lang:e.getAttribute(\"xml:lang\")||e.getAttribute(\"lang\"))return(n=n.toLowerCase())===t||0===n.indexOf(t+\"-\")}while((e=e.parentNode)&&1===e.nodeType);return!1}})),target:function(e){var n=t.location&&t.location.hash;return n&&n.slice(1)===e.id},root:function(t){return t===p},focus:function(t){return t===d.activeElement&&(!d.hasFocus||d.hasFocus())&&!!(t.type||t.href||~t.tabIndex)},enabled:ft(!1),disabled:ft(!0),checked:function(t){var e=t.nodeName.toLowerCase();return\"input\"===e&&!!t.checked||\"option\"===e&&!!t.selected},selected:function(t){return t.parentNode&&t.parentNode.selectedIndex,!0===t.selected},empty:function(t){for(t=t.firstChild;t;t=t.nextSibling)if(t.nodeType<6)return!1;return!0},parent:function(t){return!r.pseudos.empty(t)},header:function(t){return 
X.test(t.nodeName)},input:function(t){return Z.test(t.nodeName)},button:function(t){var e=t.nodeName.toLowerCase();return\"input\"===e&&\"button\"===t.type||\"button\"===e},text:function(t){var e;return\"input\"===t.nodeName.toLowerCase()&&\"text\"===t.type&&(null==(e=t.getAttribute(\"type\"))||\"text\"===e.toLowerCase())},first:dt((function(){return[0]})),last:dt((function(t,e){return[e-1]})),eq:dt((function(t,e,n){return[n<0?n+e:n]})),even:dt((function(t,e){for(var n=0;n=0;)t.push(r);return t})),gt:dt((function(t,e,n){for(var r=n<0?n+e:n;++r1?function(e,n,r){for(var i=t.length;i--;)if(!t[i](e,n,r))return!1;return!0}:t[0]}function wt(t,e,n,r,i){for(var o,s=[],a=0,c=t.length,l=null!=e;a-1&&(o[l]=!(s[l]=u))}}else v=wt(v===s?v.splice(p,v.length):v),i?i(null,s,v,c):I.apply(s,v)}))}function xt(t){for(var e,n,i,o=t.length,s=r.relative[t[0].type],a=s||r.relative[\" \"],c=s?1:0,h=vt((function(t){return t===e}),a,!0),u=vt((function(t){return P(e,t)>-1}),a,!0),f=[function(t,n,r){var i=!s&&(r||n!==l)||((e=n).nodeType?h(t,n,r):u(t,n,r));return e=null,i}];c1&&bt(f),c>1&&mt(t.slice(0,c-1).concat({value:\" \"===t[c-2].type?\"*\":\"\"})).replace(V,\"$1\"),n,c0,i=t.length>0,o=function(o,s,a,c,h){var u,p,m,v=0,b=\"0\",w=o&&[],y=[],x=l,k=o||i&&r.find.TAG(\"*\",h),S=_+=null==x?1:Math.random()||.1,C=k.length;for(h&&(l=s===d||s||h);b!==C&&null!=(u=k[b]);b++){if(i&&u){for(p=0,s||u.ownerDocument===d||(f(u),a=!g);m=t[p++];)if(m(u,s||d,a)){c.push(u);break}h&&(_=S)}n&&((u=!m&&u)&&v--,o&&w.push(u))}if(v+=b,n&&b!==v){for(p=0;m=e[p++];)m(w,y,s,a);if(o){if(v>0)for(;b--;)w[b]||y[b]||(y[b]=M.call(c));y=wt(y)}I.apply(c,y),h&&!o&&y.length>0&&v+e.length>1&&ot.uniqueSort(c)}return h&&(_=S,l=x),w};return n?at(o):o}(o,i)),a.selector=t}return a},c=ot.select=function(t,e,n,i){var o,c,l,h,u,f=\"function\"==typeof 
t&&t,d=!i&&s(t=f.selector||t);if(n=n||[],1===d.length){if((c=d[0]=d[0].slice(0)).length>2&&\"ID\"===(l=c[0]).type&&9===e.nodeType&&g&&r.relative[c[1].type]){if(!(e=(r.find.ID(l.matches[0].replace(J,tt),e)||[])[0]))return n;f&&(e=e.parentNode),t=t.slice(c.shift().value.length)}for(o=G.needsContext.test(t)?0:c.length;o--&&(l=c[o],!r.relative[h=l.type]);)if((u=r.find[h])&&(i=u(l.matches[0].replace(J,tt),Q.test(c[0].type)&&pt(e.parentNode)||e))){if(c.splice(o,1),!(t=i.length&&mt(c)))return I.apply(n,i),n;break}}return(f||a(t,d))(i,e,!g,n,!e||Q.test(t)&&pt(e.parentNode)||e),n},n.sortStable=y.split(\"\").sort(A).join(\"\")===y,n.detectDuplicates=!!u,f(),ot}(window);w.find=_,w.expr=_.selectors,w.expr[\":\"]=w.expr.pseudos,w.uniqueSort=w.unique=_.uniqueSort,w.text=_.getText,w.isXMLDoc=_.isXML,w.contains=_.contains,w.escapeSelector=_.escape;var k=function(t,e,n){for(var r=[],i=void 0!==n;(t=t[e])&&9!==t.nodeType;)if(1===t.nodeType){if(i&&w(t).is(n))break;r.push(t)}return r},S=function(t,e){for(var n=[];t;t=t.nextSibling)1===t.nodeType&&t!==e&&n.push(t);return n},C=w.expr.match.needsContext;function E(t,e){return t.nodeName&&t.nodeName.toLowerCase()===e.toLowerCase()}var A=/^<([a-z][^\\/\\0>:\\x20\\t\\r\\n\\f]*)[\\x20\\t\\r\\n\\f]*\\/?>(?:<\\/\\1>|)$/i;function T(t,e,n){return d(e)?w.grep(t,(function(t,r){return!!e.call(t,r,t)!==n})):e.nodeType?w.grep(t,(function(t){return t===e!==n})):\"string\"!=typeof e?w.grep(t,(function(t){return s.call(e,t)>-1!==n})):w.filter(e,t,n)}w.filter=function(t,e,n){var r=e[0];return n&&(t=\":not(\"+t+\")\"),1===e.length&&1===r.nodeType?w.find.matchesSelector(r,t)?[r]:[]:w.find.matches(t,w.grep(e,(function(t){return 1===t.nodeType})))},w.fn.extend({find:function(t){var e,n,r=this.length,i=this;if(\"string\"!=typeof t)return this.pushStack(w(t).filter((function(){for(e=0;e1?w.uniqueSort(n):n},filter:function(t){return this.pushStack(T(this,t||[],!1))},not:function(t){return 
this.pushStack(T(this,t||[],!0))},is:function(t){return!!T(this,\"string\"==typeof t&&C.test(t)?w(t):t||[],!1).length}});var L,M=/^(?:\\s*(<[\\w\\W]+>)[^>]*|#([\\w-]+))$/,R=w.fn.init=function(t,n,r){var i,o;if(!t)return this;if(r=r||L,\"string\"==typeof t){if(!(i=\"<\"===t[0]&&\">\"===t[t.length-1]&&t.length>=3?[null,t,null]:M.exec(t))||!i[1]&&n)return!n||n.jquery?(n||r).find(t):this.constructor(n).find(t);if(i[1]){if(n=n instanceof w?n[0]:n,w.merge(this,w.parseHTML(i[1],n&&n.nodeType?n.ownerDocument||n:e,!0)),A.test(i[1])&&w.isPlainObject(n))for(i in n)d(this[i])?this[i](n[i]):this.attr(i,n[i]);return this}return(o=e.getElementById(i[2]))&&(this[0]=o,this.length=1),this}return t.nodeType?(this[0]=t,this.length=1,this):d(t)?void 0!==r.ready?r.ready(t):t(w):w.makeArray(t,this)};R.prototype=w.fn,L=w(e);var I=/^(?:parents|prev(?:Until|All))/,N={children:!0,contents:!0,next:!0,prev:!0};function P(t,e){for(;(t=t[e])&&1!==t.nodeType;);return t}w.fn.extend({has:function(t){var e=w(t,this),n=e.length;return this.filter((function(){for(var t=0;t-1:1===n.nodeType&&w.find.matchesSelector(n,t))){o.push(n);break}return this.pushStack(o.length>1?w.uniqueSort(o):o)},index:function(t){return t?\"string\"==typeof t?s.call(w(t),this[0]):s.call(this,t.jquery?t[0]:t):this[0]&&this[0].parentNode?this.first().prevAll().length:-1},add:function(t,e){return this.pushStack(w.uniqueSort(w.merge(this.get(),w(t,e))))},addBack:function(t){return this.add(null==t?this.prevObject:this.prevObject.filter(t))}}),w.each({parent:function(t){var e=t.parentNode;return e&&11!==e.nodeType?e:null},parents:function(t){return k(t,\"parentNode\")},parentsUntil:function(t,e,n){return k(t,\"parentNode\",n)},next:function(t){return P(t,\"nextSibling\")},prev:function(t){return P(t,\"previousSibling\")},nextAll:function(t){return k(t,\"nextSibling\")},prevAll:function(t){return k(t,\"previousSibling\")},nextUntil:function(t,e,n){return k(t,\"nextSibling\",n)},prevUntil:function(t,e,n){return 
k(t,\"previousSibling\",n)},siblings:function(t){return S((t.parentNode||{}).firstChild,t)},children:function(t){return S(t.firstChild)},contents:function(t){return E(t,\"iframe\")?t.contentDocument:(E(t,\"template\")&&(t=t.content||t),w.merge([],t.childNodes))}},(function(t,e){w.fn[t]=function(n,r){var i=w.map(this,e,n);return\"Until\"!==t.slice(-5)&&(r=n),r&&\"string\"==typeof r&&(i=w.filter(r,i)),this.length>1&&(N[t]||w.uniqueSort(i),I.test(t)&&i.reverse()),this.pushStack(i)}}));var O=/[^\\x20\\t\\r\\n\\f]+/g;function F(t){return t}function D(t){throw t}function B(t,e,n,r){var i;try{t&&d(i=t.promise)?i.call(t).done(e).fail(n):t&&d(i=t.then)?i.call(t,e,n):e.apply(void 0,[t].slice(r))}catch(t){n.apply(void 0,[t])}}w.Callbacks=function(t){t=\"string\"==typeof t?function(t){var e={};return w.each(t.match(O)||[],(function(t,n){e[n]=!0})),e}(t):w.extend({},t);var e,n,r,i,o=[],s=[],a=-1,c=function(){for(i=i||t.once,r=e=!0;s.length;a=-1)for(n=s.shift();++a-1;)o.splice(n,1),n<=a&&a--})),this},has:function(t){return t?w.inArray(t,o)>-1:o.length>0},empty:function(){return o&&(o=[]),this},disable:function(){return i=s=[],o=n=\"\",this},disabled:function(){return!o},lock:function(){return i=s=[],n||e||(o=n=\"\"),this},locked:function(){return!!i},fireWith:function(t,n){return i||(n=[t,(n=n||[]).slice?n.slice():n],s.push(n),e||c()),this},fire:function(){return l.fireWith(this,arguments),this},fired:function(){return!!r}};return l},w.extend({Deferred:function(t){var e=[[\"notify\",\"progress\",w.Callbacks(\"memory\"),w.Callbacks(\"memory\"),2],[\"resolve\",\"done\",w.Callbacks(\"once memory\"),w.Callbacks(\"once memory\"),0,\"resolved\"],[\"reject\",\"fail\",w.Callbacks(\"once memory\"),w.Callbacks(\"once memory\"),1,\"rejected\"]],n=\"pending\",r={state:function(){return n},always:function(){return i.done(arguments).fail(arguments),this},catch:function(t){return r.then(null,t)},pipe:function(){var t=arguments;return w.Deferred((function(n){w.each(e,(function(e,r){var 
o=d(t[r[4]])&&t[r[4]];i[r[1]]((function(){var t=o&&o.apply(this,arguments);t&&d(t.promise)?t.promise().progress(n.notify).done(n.resolve).fail(n.reject):n[r[0]+\"With\"](this,o?[t]:arguments)}))})),t=null})).promise()},then:function(t,n,r){var i=0;function o(t,e,n,r){return function(){var s=this,a=arguments,c=function(){var c,l;if(!(t=i&&(n!==D&&(s=void 0,a=[r]),e.rejectWith(s,a))}};t?l():(w.Deferred.getStackHook&&(l.stackTrace=w.Deferred.getStackHook()),window.setTimeout(l))}}return w.Deferred((function(i){e[0][3].add(o(0,i,d(r)?r:F,i.notifyWith)),e[1][3].add(o(0,i,d(t)?t:F)),e[2][3].add(o(0,i,d(n)?n:D))})).promise()},promise:function(t){return null!=t?w.extend(t,r):r}},i={};return w.each(e,(function(t,o){var s=o[2],a=o[5];r[o[1]]=s.add,a&&s.add((function(){n=a}),e[3-t][2].disable,e[3-t][3].disable,e[0][2].lock,e[0][3].lock),s.add(o[3].fire),i[o[0]]=function(){return i[o[0]+\"With\"](this===i?void 0:this,arguments),this},i[o[0]+\"With\"]=s.fireWith})),r.promise(i),t&&t.call(i,i),i},when:function(t){var e=arguments.length,n=e,i=Array(n),o=r.call(arguments),s=w.Deferred(),a=function(t){return function(n){i[t]=this,o[t]=arguments.length>1?r.call(arguments):n,--e||s.resolveWith(i,o)}};if(e<=1&&(B(t,s.done(a(n)).resolve,s.reject,!e),\"pending\"===s.state()||d(o[n]&&o[n].then)))return s.then();for(;n--;)B(o[n],a(n),s.reject);return s.promise()}});var z=/^(Eval|Internal|Range|Reference|Syntax|Type|URI)Error$/;w.Deferred.exceptionHook=function(t,e){window.console&&window.console.warn&&t&&z.test(t.name)&&window.console.warn(\"jQuery.Deferred exception: \"+t.message,t.stack,e)},w.readyException=function(t){window.setTimeout((function(){throw t}))};var H=w.Deferred();function V(){e.removeEventListener(\"DOMContentLoaded\",V),window.removeEventListener(\"load\",V),w.ready()}w.fn.ready=function(t){return 
H.then(t).catch((function(t){w.readyException(t)})),this},w.extend({isReady:!1,readyWait:1,ready:function(t){(!0===t?--w.readyWait:w.isReady)||(w.isReady=!0,!0!==t&&--w.readyWait>0||H.resolveWith(e,[w]))}}),w.ready.then=H.then,\"complete\"===e.readyState||\"loading\"!==e.readyState&&!e.documentElement.doScroll?window.setTimeout(w.ready):(e.addEventListener(\"DOMContentLoaded\",V),window.addEventListener(\"load\",V));var j=function(t,e,n,r,i,o,s){var a=0,c=t.length,l=null==n;if(\"object\"===v(n))for(a in i=!0,n)j(t,e,a,n[a],!0,o,s);else if(void 0!==r&&(i=!0,d(r)||(s=!0),l&&(s?(e.call(t,r),e=null):(l=e,e=function(t,e,n){return l.call(w(t),n)})),e))for(;a1,null,!0)},removeData:function(t){return this.each((function(){Y.remove(this,t)}))}}),w.extend({queue:function(t,e,n){var r;if(t)return e=(e||\"fx\")+\"queue\",r=X.get(t,e),n&&(!r||Array.isArray(n)?r=X.access(t,e,w.makeArray(n)):r.push(n)),r||[]},dequeue:function(t,e){e=e||\"fx\";var n=w.queue(t,e),r=n.length,i=n.shift(),o=w._queueHooks(t,e);\"inprogress\"===i&&(i=n.shift(),r--),i&&(\"fx\"===e&&n.unshift(\"inprogress\"),delete o.stop,i.call(t,(function(){w.dequeue(t,e)}),o)),!r&&o&&o.empty.fire()},_queueHooks:function(t,e){var n=e+\"queueHooks\";return X.get(t,n)||X.access(t,n,{empty:w.Callbacks(\"once memory\").add((function(){X.remove(t,[e+\"queue\",n])}))})}}),w.fn.extend({queue:function(t,e){var n=2;return\"string\"!=typeof t&&(e=t,t=\"fx\",n--),arguments.length\\x20\\t\\r\\n\\f]+)/i,ht=/^$|^module$|\\/(?:java|ecma)script/i,ut={option:[1,\"\",\" \"],thead:[1,\"\"],col:[2,\"\"],tr:[2,\"\"],td:[3,\"\"],_default:[0,\"\",\"\"]};function ft(t,e){var n;return n=void 0!==t.getElementsByTagName?t.getElementsByTagName(e||\"*\"):void 0!==t.querySelectorAll?t.querySelectorAll(e||\"*\"):[],void 0===e||e&&E(t,e)?w.merge([t],n):n}function dt(t,e){for(var n=0,r=t.length;n-1)i&&i.push(o);else if(l=w.contains(o.ownerDocument,o),s=ft(u.appendChild(o),\"script\"),l&&dt(s),n)for(h=0;o=s[h++];)ht.test(o.type||\"\")&&n.push(o);return 
u}!function(){var t=e.createDocumentFragment().appendChild(e.createElement(\"div\")),n=e.createElement(\"input\");n.setAttribute(\"type\",\"radio\"),n.setAttribute(\"checked\",\"checked\"),n.setAttribute(\"name\",\"t\"),t.appendChild(n),f.checkClone=t.cloneNode(!0).cloneNode(!0).lastChild.checked,t.innerHTML=\"\",f.noCloneChecked=!!t.cloneNode(!0).lastChild.defaultValue}();var mt=e.documentElement,vt=/^key/,bt=/^(?:mouse|pointer|contextmenu|drag|drop)|click/,wt=/^([^.]*)(?:\\.(.+)|)/;function yt(){return!0}function xt(){return!1}function _t(){try{return e.activeElement}catch(t){}}function kt(t,e,n,r,i,o){var s,a;if(\"object\"==typeof e){for(a in\"string\"!=typeof n&&(r=r||n,n=void 0),e)kt(t,a,n,r,e[a],o);return t}if(null==r&&null==i?(i=n,r=n=void 0):null==i&&(\"string\"==typeof n?(i=r,r=void 0):(i=r,r=n,n=void 0)),!1===i)i=xt;else if(!i)return t;return 1===o&&(s=i,i=function(t){return w().off(t),s.apply(this,arguments)},i.guid=s.guid||(s.guid=w.guid++)),t.each((function(){w.event.add(this,e,i,r,n)}))}w.event={global:{},add:function(t,e,n,r,i){var o,s,a,c,l,h,u,f,d,p,g,m=X.get(t);if(m)for(n.handler&&(n=(o=n).handler,i=o.selector),i&&w.find.matchesSelector(mt,i),n.guid||(n.guid=w.guid++),(c=m.events)||(c=m.events={}),(s=m.handle)||(s=m.handle=function(e){return void 0!==w&&w.event.triggered!==e.type?w.event.dispatch.apply(t,arguments):void 0}),l=(e=(e||\"\").match(O)||[\"\"]).length;l--;)d=g=(a=wt.exec(e[l])||[])[1],p=(a[2]||\"\").split(\".\").sort(),d&&(u=w.event.special[d]||{},d=(i?u.delegateType:u.bindType)||d,u=w.event.special[d]||{},h=w.extend({type:d,origType:g,data:r,handler:n,guid:n.guid,selector:i,needsContext:i&&w.expr.match.needsContext.test(i),namespace:p.join(\".\")},o),(f=c[d])||((f=c[d]=[]).delegateCount=0,u.setup&&!1!==u.setup.call(t,r,p,s)||t.addEventListener&&t.addEventListener(d,s)),u.add&&(u.add.call(t,h),h.handler.guid||(h.handler.guid=n.guid)),i?f.splice(f.delegateCount++,0,h):f.push(h),w.event.global[d]=!0)},remove:function(t,e,n,r,i){var 
o,s,a,c,l,h,u,f,d,p,g,m=X.hasData(t)&&X.get(t);if(m&&(c=m.events)){for(l=(e=(e||\"\").match(O)||[\"\"]).length;l--;)if(d=g=(a=wt.exec(e[l])||[])[1],p=(a[2]||\"\").split(\".\").sort(),d){for(u=w.event.special[d]||{},f=c[d=(r?u.delegateType:u.bindType)||d]||[],a=a[2]&&new RegExp(\"(^|\\\\.)\"+p.join(\"\\\\.(?:.*\\\\.|)\")+\"(\\\\.|$)\"),s=o=f.length;o--;)h=f[o],!i&&g!==h.origType||n&&n.guid!==h.guid||a&&!a.test(h.namespace)||r&&r!==h.selector&&(\"**\"!==r||!h.selector)||(f.splice(o,1),h.selector&&f.delegateCount--,u.remove&&u.remove.call(t,h));s&&!f.length&&(u.teardown&&!1!==u.teardown.call(t,p,m.handle)||w.removeEvent(t,d,m.handle),delete c[d])}else for(d in c)w.event.remove(t,d+e[l],n,r,!0);w.isEmptyObject(c)&&X.remove(t,\"handle events\")}},dispatch:function(t){var e,n,r,i,o,s,a=w.event.fix(t),c=new Array(arguments.length),l=(X.get(this,\"events\")||{})[a.type]||[],h=w.event.special[a.type]||{};for(c[0]=a,e=1;e=1))for(;l!==this;l=l.parentNode||this)if(1===l.nodeType&&(\"click\"!==t.type||!0!==l.disabled)){for(o=[],s={},n=0;n-1:w.find(i,this,null,[l]).length),s[i]&&o.push(r);o.length&&a.push({elem:l,handlers:o})}return l=this,c\\x20\\t\\r\\n\\f]*)[^>]*)\\/>/gi,Ct=/\n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial2/LessonImages/InsertSizeQuiz.json\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a8a9b451-c231-4dc4-9a3b-e8da8b92fcec",
+ "metadata": {},
+ "source": [
+ "With paired-end ATAC-seq data we can separate reads by fragment size into short fragments from Transposase HyperSensitive Sites (THSS) and longer Nucleosomal Fragments. Below, fragments of 10-125 bp are treated as THSS and fragments of 150-250 bp as nucleosomal. Alternatively, some choose to keep the data together as a more general measure of \"accessible\" sites.\n",
+ "\n",
+ "We'll show you how to separate the small and large fragments into different bam files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "69afce06-71ba-40e4-b977-62d69156b4df",
+ "metadata": {},
+ "outputs": [],
+ "source": [
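+ "#Filter by insert size (SAM column 9, TLEN, which is negative for the reverse mate). Header lines starting with @ are kept:\n",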
+ "!samtools view -h Tutorial2/InputFiles/CTL_dedup.bam | awk 'substr($0,1,1)==\"@\" || ($9>= 150 && $9<=250) || ($9<=-150 && $9>=-250)' | samtools view -b > Tutorial2/InputFiles/CTL_Nucleosomal.bam\n",
+ "!samtools view -h Tutorial2/InputFiles/CTL_dedup.bam | awk 'substr($0,1,1)==\"@\" || ($9>= 10 && $9<=125) || ($9<=-10 && $9>=-125)' | samtools view -b > Tutorial2/InputFiles/CTL_THSS.bam\n",
+ "#Do the same for the mutant:\n",
+ "!samtools view -h Tutorial2/InputFiles/Mutant_dedup.bam | awk 'substr($0,1,1)==\"@\" || ($9>= 150 && $9<=250) || ($9<=-150 && $9>=-250)' | samtools view -b > Tutorial2/InputFiles/Mutant_Nucleosomal.bam\n",
+ "!samtools view -h Tutorial2/InputFiles/Mutant_dedup.bam | awk 'substr($0,1,1)==\"@\" || ($9>= 10 && $9<=125) || ($9<=-10 && $9>=-125)' | samtools view -b > Tutorial2/InputFiles/Mutant_THSS.bam"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f598298f-efeb-4df0-a41c-d6ec96fbaf32",
+ "metadata": {},
+ "source": [
+ "For the rest of this tutorial, we'll use the bam files that contain all the reads, since many analyses treat the full set of fragments as a general measurement of \"accessibility\". However, you can use the split bam files to create bigwigs, view them in a genome browser, and create average profiles around features as demonstrated earlier. You can also use them in the downstream analysis in place of the combined file shown in our examples."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e6002033-c000-4b2b-9e23-de065180f2f6",
+ "metadata": {},
+ "source": [
+ "\n",
+ "STEP3: Peak Detection\n",
+ " \n",
+ " \n",
+ "\n",
+ "Accessible sites are loci with a pileup of reads in \"Peaks\". \n",
+ "\n",
+ "### Optional Note:\n",
+ "Tn5 insertion of adapters leaves a 9 bp gap. At the resolution of peak calling this offset is unlikely to change the results much; it matters more for base-pair-resolution analyses such as footprinting. To be safe, we can shift the reads to account for this insertion offset.\n",
+ "\n",
+ " \n",
+ "\n",
+ "Image adjusted from: [Grandi et al., Nature Protocols 2022](https://www.nature.com/articles/s41596-022-00692-9)\n",
+ "\n",
+ "The alignmentSieve command from [deeptools](https://anaconda.org/bioconda/deeptools) shifts the reads accordingly when run with the `--ATACshift` option.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "d5fde7ad-bc2d-4183-889f-391a66467ff0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!alignmentSieve -p $numthreadsint --ATACshift -b Tutorial2/InputFiles/CTL_dedup.bam -o Tutorial2/InputFiles/CTL_shift.bam\n",
+ "!alignmentSieve -p $numthreadsint --ATACshift -b Tutorial2/InputFiles/Mutant_dedup.bam -o Tutorial2/InputFiles/Mutant_shift.bam"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "421549cb-3cf6-4a86-9b2f-34fdf109ff2d",
+ "metadata": {},
+ "source": [
+ "Let's identify Peaks genome-wide using [macs2](https://pypi.org/project/MACS2/)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 40,
+ "id": "2bc4e422-7813-4b9f-b984-04948f40b26d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
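+ "#If your data is single-end (not paired-end), use -f BAM instead.\n",
+ "#--keep-dup all keeps every read because PCR duplicates were already removed from the input bam files.\n",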
+ "!macs2 callpeak -f BAMPE -g hs --keep-dup all --cutoff-analysis -n CTL -t Tutorial2/InputFiles/CTL_shift.bam --outdir Tutorial2/Peaks/ 2> Tutorial2/Peaks/macs2_CTL.log\n",
+ "!macs2 callpeak -f BAMPE -g hs --keep-dup all --cutoff-analysis -n Mutant -t Tutorial2/InputFiles/Mutant_shift.bam --outdir Tutorial2/Peaks/ 2> Tutorial2/Peaks/macs2_Mutant.log"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2b1c921c-fe40-4fd9-8370-1711a317b632",
+ "metadata": {},
+ "source": [
+ "macs2 provides a .narrowPeak file specifying the coordinates of the peaks, an .xls file with additional information, and a .bed file with the summits of the peaks.\n",
+ "\n",
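+ "Let's view the first 10 lines of the .narrowPeak file. Its ten columns are: chromosome, start, end, peak name, score, strand, fold enrichment, -log10(p-value), -log10(q-value), and the summit position relative to the peak start."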
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "43fcb3c4-3341-4255-8db9-ea822dfb62aa",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "chr4\t4098436\t4098780\tCTL_peak_1\t23\t.\t2.90799\t5.28767\t2.33528\t172\n",
+ "chr4\t26975641\t26975876\tCTL_peak_2\t23\t.\t2.90799\t5.28767\t2.33528\t117\n",
+ "chr4\t49751053\t49751289\tCTL_peak_3\t23\t.\t2.90799\t5.28767\t2.33528\t118\n",
+ "chr4\t49771937\t49772236\tCTL_peak_4\t157\t.\t11.172\t19.3961\t15.7605\t148\n",
+ "chr4\t49803060\t49803221\tCTL_peak_5\t20\t.\t2.88586\t4.99951\t2.07827\t80\n",
+ "chr4\t49842974\t49843212\tCTL_peak_6\t23\t.\t2.90799\t5.28767\t2.33528\t119\n",
+ "chr4\t49927479\t49927778\tCTL_peak_7\t36\t.\t3.81875\t6.69129\t3.6285\t200\n",
+ "chr4\t50048359\t50048660\tCTL_peak_8\t44\t.\t4.63347\t7.61679\t4.49353\t209\n",
+ "chr4\t50589614\t50589840\tCTL_peak_9\t23\t.\t2.90799\t5.28767\t2.33528\t113\n",
+ "chr4\t50622209\t50622416\tCTL_peak_10\t18\t.\t2.86406\t4.76454\t1.86719\t103\n"
+ ]
+ }
+ ],
+ "source": [
+ "!head Tutorial2/Peaks/CTL_peaks.narrowPeak"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d05e89eb-5ea0-44c8-8575-876a91a6a3fd",
+ "metadata": {},
+ "source": [
+ "We can also visually inspect the peaks alongside the signal in IGV:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "1d443da3-afeb-4fa3-ae64-1e2c64b6dbc1",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/javascript": [
+ "!function (global, factory) {window.igv = factory()}(this,(function(){\"use strict\";\n",
+ "/*!\n",
+ " * jQuery JavaScript Library v3.3.1 -ajax,-ajax/jsonp,-ajax/load,-ajax/parseXML,-ajax/script,-ajax/var/location,-ajax/var/nonce,-ajax/var/rquery,-ajax/xhr,-manipulation/_evalUrl,-event/ajax,-effects,-effects/Tween,-effects/animatedSelector\n",
+ " * https://jquery.com/\n",
+ " *\n",
+ " * Includes Sizzle.js\n",
+ " * https://sizzlejs.com/\n",
+ " *\n",
+ " * Copyright JS Foundation and other contributors\n",
+ " * Released under the MIT license\n",
+ " * https://jquery.org/license\n",
+ " *\n",
+ " * Date: 2018-01-20T17:24Z\n",
+ " */var t=[],e=window.document,n=Object.getPrototypeOf,r=t.slice,i=t.concat,o=t.push,s=t.indexOf,a={},c=a.toString,l=a.hasOwnProperty,h=l.toString,u=h.call(Object),f={},d=function(t){return\"function\"==typeof t&&\"number\"!=typeof t.nodeType},p=function(t){return null!=t&&t===t.window},g={type:!0,src:!0,noModule:!0};function m(t,n,r){var i,o=(n=n||e).createElement(\"script\");if(o.text=t,r)for(i in g)r[i]&&(o[i]=r[i]);n.head.appendChild(o).parentNode.removeChild(o)}function v(t){return null==t?t+\"\":\"object\"==typeof t||\"function\"==typeof t?a[c.call(t)]||\"object\":typeof t}var b=\"3.3.1 -ajax,-ajax/jsonp,-ajax/load,-ajax/parseXML,-ajax/script,-ajax/var/location,-ajax/var/nonce,-ajax/var/rquery,-ajax/xhr,-manipulation/_evalUrl,-event/ajax,-effects,-effects/Tween,-effects/animatedSelector\",w=function(t,e){return new w.fn.init(t,e)},y=/^[\\s\\uFEFF\\xA0]+|[\\s\\uFEFF\\xA0]+$/g;function x(t){var e=!!t&&\"length\"in t&&t.length,n=v(t);return!d(t)&&!p(t)&&(\"array\"===n||0===e||\"number\"==typeof e&&e>0&&e-1 in t)}w.fn=w.prototype={jquery:b,constructor:w,length:0,toArray:function(){return r.call(this)},get:function(t){return null==t?r.call(this):t<0?this[t+this.length]:this[t]},pushStack:function(t){var e=w.merge(this.constructor(),t);return e.prevObject=this,e},each:function(t){return w.each(this,t)},map:function(t){return this.pushStack(w.map(this,(function(e,n){return t.call(e,n,e)})))},slice:function(){return this.pushStack(r.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(t){var e=this.length,n=+t+(t<0?e:0);return this.pushStack(n>=0&&n+~]|[\\\\x20\\\\t\\\\r\\\\n\\\\f])[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\"),q=new RegExp(\"=[\\\\x20\\\\t\\\\r\\\\n\\\\f]*([^\\\\]'\\\"]*?)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\\\\]\",\"g\"),W=new RegExp(z),$=new RegExp(\"^\"+D+\"$\"),G={ID:new RegExp(\"^#(\"+D+\")\"),CLASS:new RegExp(\"^\\\\.(\"+D+\")\"),TAG:new RegExp(\"^(\"+D+\"|[*])\"),ATTR:new RegExp(\"^\"+B),PSEUDO:new 
RegExp(\"^\"+z),CHILD:new RegExp(\"^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\\\([\\\\x20\\\\t\\\\r\\\\n\\\\f]*(even|odd|(([+-]|)(\\\\d*)n|)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(?:([+-]|)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(\\\\d+)|))[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\\\\)|)\",\"i\"),bool:new RegExp(\"^(?:\"+O+\")$\",\"i\"),needsContext:new RegExp(\"^[\\\\x20\\\\t\\\\r\\\\n\\\\f]*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\\\([\\\\x20\\\\t\\\\r\\\\n\\\\f]*((?:-\\\\d)?\\\\d*)[\\\\x20\\\\t\\\\r\\\\n\\\\f]*\\\\)|)(?=[^-]|$)\",\"i\")},Z=/^(?:input|select|textarea|button)$/i,X=/^h\\d$/i,Y=/^[^{]+\\{\\s*\\[native \\w/,K=/^(?:#([\\w-]+)|(\\w+)|\\.([\\w-]+))$/,Q=/[+~]/,J=new RegExp(\"\\\\\\\\([\\\\da-f]{1,6}[\\\\x20\\\\t\\\\r\\\\n\\\\f]?|([\\\\x20\\\\t\\\\r\\\\n\\\\f])|.)\",\"ig\"),tt=function(t,e,n){var r=\"0x\"+e-65536;return r!=r||n?e:r<0?String.fromCharCode(r+65536):String.fromCharCode(r>>10|55296,1023&r|56320)},et=/([\\0-\\x1f\\x7f]|^-?\\d)|^-$|[^\\0-\\x1f\\x7f-\\uFFFF\\w-]/g,nt=function(t,e){return e?\"\\0\"===t?\"�\":t.slice(0,-1)+\"\\\\\"+t.charCodeAt(t.length-1).toString(16)+\" \":\"\\\\\"+t},rt=function(){f()},it=vt((function(t){return!0===t.disabled&&(\"form\"in t||\"label\"in t)}),{dir:\"parentNode\",next:\"legend\"});try{I.apply(L=N.call(x.childNodes),x.childNodes),L[x.childNodes.length].nodeType}catch(t){I={apply:L.length?function(t,e){R.apply(t,N.call(e))}:function(t,e){for(var n=t.length,r=0;t[n++]=e[r++];);t.length=n-1}}}function ot(t,e,r,i){var o,a,l,h,u,p,v,b=e&&e.ownerDocument,_=e?e.nodeType:9;if(r=r||[],\"string\"!=typeof t||!t||1!==_&&9!==_&&11!==_)return r;if(!i&&((e?e.ownerDocument||e:x)!==d&&f(e),e=e||d,g)){if(11!==_&&(u=K.exec(t)))if(o=u[1]){if(9===_){if(!(l=e.getElementById(o)))return r;if(l.id===o)return r.push(l),r}else if(b&&(l=b.getElementById(o))&&w(e,l)&&l.id===o)return r.push(l),r}else{if(u[2])return I.apply(r,e.getElementsByTagName(t)),r;if((o=u[3])&&n.getElementsByClassName&&e.getElementsByClassName)return 
I.apply(r,e.getElementsByClassName(o)),r}if(n.qsa&&!E[t+\" \"]&&(!m||!m.test(t))){if(1!==_)b=e,v=t;else if(\"object\"!==e.nodeName.toLowerCase()){for((h=e.getAttribute(\"id\"))?h=h.replace(et,nt):e.setAttribute(\"id\",h=y),a=(p=s(t)).length;a--;)p[a]=\"#\"+h+\" \"+mt(p[a]);v=p.join(\",\"),b=Q.test(t)&&pt(e.parentNode)||e}if(v)try{return I.apply(r,b.querySelectorAll(v)),r}catch(t){}finally{h===y&&e.removeAttribute(\"id\")}}}return c(t.replace(V,\"$1\"),e,r,i)}function st(){var t=[];return function e(n,i){return t.push(n+\" \")>r.cacheLength&&delete e[t.shift()],e[n+\" \"]=i}}function at(t){return t[y]=!0,t}function ct(t){var e=d.createElement(\"fieldset\");try{return!!t(e)}catch(t){return!1}finally{e.parentNode&&e.parentNode.removeChild(e),e=null}}function lt(t,e){var n=e&&t,r=n&&1===t.nodeType&&1===e.nodeType&&t.sourceIndex-e.sourceIndex;if(r)return r;if(n)for(;n=n.nextSibling;)if(n===e)return-1;return t?1:-1}function ht(t){return function(e){return\"input\"===e.nodeName.toLowerCase()&&e.type===t}}function ut(t){return function(e){var n=e.nodeName.toLowerCase();return(\"input\"===n||\"button\"===n)&&e.type===t}}function ft(t){return function(e){return\"form\"in e?e.parentNode&&!1===e.disabled?\"label\"in e?\"label\"in e.parentNode?e.parentNode.disabled===t:e.disabled===t:e.isDisabled===t||e.isDisabled!==!t&&it(e)===t:e.disabled===t:\"label\"in e&&e.disabled===t}}function dt(t){return at((function(e){return e=+e,at((function(n,r){for(var i,o=t([],n.length,e),s=o.length;s--;)n[i=o[s]]&&(n[i]=!(r[i]=n[i]))}))}))}function pt(t){return t&&void 0!==t.getElementsByTagName&&t}for(e in n=ot.support={},o=ot.isXML=function(t){var e=t&&(t.ownerDocument||t).documentElement;return!!e&&\"HTML\"!==e.nodeName},f=ot.setDocument=function(t){var e,i,s=t?t.ownerDocument||t:x;return 
s!==d&&9===s.nodeType&&s.documentElement?(p=(d=s).documentElement,g=!o(d),x!==d&&(i=d.defaultView)&&i.top!==i&&(i.addEventListener?i.addEventListener(\"unload\",rt,!1):i.attachEvent&&i.attachEvent(\"onunload\",rt)),n.attributes=ct((function(t){return t.className=\"i\",!t.getAttribute(\"className\")})),n.getElementsByTagName=ct((function(t){return t.appendChild(d.createComment(\"\")),!t.getElementsByTagName(\"*\").length})),n.getElementsByClassName=Y.test(d.getElementsByClassName),n.getById=ct((function(t){return p.appendChild(t).id=y,!d.getElementsByName||!d.getElementsByName(y).length})),n.getById?(r.filter.ID=function(t){var e=t.replace(J,tt);return function(t){return t.getAttribute(\"id\")===e}},r.find.ID=function(t,e){if(void 0!==e.getElementById&&g){var n=e.getElementById(t);return n?[n]:[]}}):(r.filter.ID=function(t){var e=t.replace(J,tt);return function(t){var n=void 0!==t.getAttributeNode&&t.getAttributeNode(\"id\");return n&&n.value===e}},r.find.ID=function(t,e){if(void 0!==e.getElementById&&g){var n,r,i,o=e.getElementById(t);if(o){if((n=o.getAttributeNode(\"id\"))&&n.value===t)return[o];for(i=e.getElementsByName(t),r=0;o=i[r++];)if((n=o.getAttributeNode(\"id\"))&&n.value===t)return[o]}return[]}}),r.find.TAG=n.getElementsByTagName?function(t,e){return void 0!==e.getElementsByTagName?e.getElementsByTagName(t):n.qsa?e.querySelectorAll(t):void 0}:function(t,e){var n,r=[],i=0,o=e.getElementsByTagName(t);if(\"*\"===t){for(;n=o[i++];)1===n.nodeType&&r.push(n);return r}return o},r.find.CLASS=n.getElementsByClassName&&function(t,e){if(void 0!==e.getElementsByClassName&&g)return e.getElementsByClassName(t)},v=[],m=[],(n.qsa=Y.test(d.querySelectorAll))&&(ct((function(t){p.appendChild(t).innerHTML=\" 
\",t.querySelectorAll(\"[msallowcapture^='']\").length&&m.push(\"[*^$]=[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(?:''|\\\"\\\")\"),t.querySelectorAll(\"[selected]\").length||m.push(\"\\\\[[\\\\x20\\\\t\\\\r\\\\n\\\\f]*(?:value|\"+O+\")\"),t.querySelectorAll(\"[id~=\"+y+\"-]\").length||m.push(\"~=\"),t.querySelectorAll(\":checked\").length||m.push(\":checked\"),t.querySelectorAll(\"a#\"+y+\"+*\").length||m.push(\".#.+[+~]\")})),ct((function(t){t.innerHTML=\" \";var e=d.createElement(\"input\");e.setAttribute(\"type\",\"hidden\"),t.appendChild(e).setAttribute(\"name\",\"D\"),t.querySelectorAll(\"[name=d]\").length&&m.push(\"name[\\\\x20\\\\t\\\\r\\\\n\\\\f]*[*^$|!~]?=\"),2!==t.querySelectorAll(\":enabled\").length&&m.push(\":enabled\",\":disabled\"),p.appendChild(t).disabled=!0,2!==t.querySelectorAll(\":disabled\").length&&m.push(\":enabled\",\":disabled\"),t.querySelectorAll(\"*,:x\"),m.push(\",.*:\")}))),(n.matchesSelector=Y.test(b=p.matches||p.webkitMatchesSelector||p.mozMatchesSelector||p.oMatchesSelector||p.msMatchesSelector))&&ct((function(t){n.disconnectedMatch=b.call(t,\"*\"),b.call(t,\"[s!='']:x\"),v.push(\"!=\",z)})),m=m.length&&new RegExp(m.join(\"|\")),v=v.length&&new RegExp(v.join(\"|\")),e=Y.test(p.compareDocumentPosition),w=e||Y.test(p.contains)?function(t,e){var n=9===t.nodeType?t.documentElement:t,r=e&&e.parentNode;return t===r||!(!r||1!==r.nodeType||!(n.contains?n.contains(r):t.compareDocumentPosition&&16&t.compareDocumentPosition(r)))}:function(t,e){if(e)for(;e=e.parentNode;)if(e===t)return!0;return!1},A=e?function(t,e){if(t===e)return u=!0,0;var r=!t.compareDocumentPosition-!e.compareDocumentPosition;return r||(1&(r=(t.ownerDocument||t)===(e.ownerDocument||e)?t.compareDocumentPosition(e):1)||!n.sortDetached&&e.compareDocumentPosition(t)===r?t===d||t.ownerDocument===x&&w(x,t)?-1:e===d||e.ownerDocument===x&&w(x,e)?1:h?P(h,t)-P(h,e):0:4&r?-1:1)}:function(t,e){if(t===e)return u=!0,0;var n,r=0,i=t.parentNode,o=e.parentNode,s=[t],a=[e];if(!i||!o)return 
t===d?-1:e===d?1:i?-1:o?1:h?P(h,t)-P(h,e):0;if(i===o)return lt(t,e);for(n=t;n=n.parentNode;)s.unshift(n);for(n=e;n=n.parentNode;)a.unshift(n);for(;s[r]===a[r];)r++;return r?lt(s[r],a[r]):s[r]===x?-1:a[r]===x?1:0},d):d},ot.matches=function(t,e){return ot(t,null,null,e)},ot.matchesSelector=function(t,e){if((t.ownerDocument||t)!==d&&f(t),e=e.replace(q,\"='$1']\"),n.matchesSelector&&g&&!E[e+\" \"]&&(!v||!v.test(e))&&(!m||!m.test(e)))try{var r=b.call(t,e);if(r||n.disconnectedMatch||t.document&&11!==t.document.nodeType)return r}catch(t){}return ot(e,d,null,[t]).length>0},ot.contains=function(t,e){return(t.ownerDocument||t)!==d&&f(t),w(t,e)},ot.attr=function(t,e){(t.ownerDocument||t)!==d&&f(t);var i=r.attrHandle[e.toLowerCase()],o=i&&T.call(r.attrHandle,e.toLowerCase())?i(t,e,!g):void 0;return void 0!==o?o:n.attributes||!g?t.getAttribute(e):(o=t.getAttributeNode(e))&&o.specified?o.value:null},ot.escape=function(t){return(t+\"\").replace(et,nt)},ot.error=function(t){throw new Error(\"Syntax error, unrecognized expression: \"+t)},ot.uniqueSort=function(t){var e,r=[],i=0,o=0;if(u=!n.detectDuplicates,h=!n.sortStable&&t.slice(0),t.sort(A),u){for(;e=t[o++];)e===t[o]&&(i=r.push(o));for(;i--;)t.splice(r[i],1)}return h=null,t},i=ot.getText=function(t){var e,n=\"\",r=0,o=t.nodeType;if(o){if(1===o||9===o||11===o){if(\"string\"==typeof t.textContent)return t.textContent;for(t=t.firstChild;t;t=t.nextSibling)n+=i(t)}else if(3===o||4===o)return t.nodeValue}else for(;e=t[r++];)n+=i(e);return n},r=ot.selectors={cacheLength:50,createPseudo:at,match:G,attrHandle:{},find:{},relative:{\">\":{dir:\"parentNode\",first:!0},\" \":{dir:\"parentNode\"},\"+\":{dir:\"previousSibling\",first:!0},\"~\":{dir:\"previousSibling\"}},preFilter:{ATTR:function(t){return t[1]=t[1].replace(J,tt),t[3]=(t[3]||t[4]||t[5]||\"\").replace(J,tt),\"~=\"===t[2]&&(t[3]=\" \"+t[3]+\" \"),t.slice(0,4)},CHILD:function(t){return 
t[1]=t[1].toLowerCase(),\"nth\"===t[1].slice(0,3)?(t[3]||ot.error(t[0]),t[4]=+(t[4]?t[5]+(t[6]||1):2*(\"even\"===t[3]||\"odd\"===t[3])),t[5]=+(t[7]+t[8]||\"odd\"===t[3])):t[3]&&ot.error(t[0]),t},PSEUDO:function(t){var e,n=!t[6]&&t[2];return G.CHILD.test(t[0])?null:(t[3]?t[2]=t[4]||t[5]||\"\":n&&W.test(n)&&(e=s(n,!0))&&(e=n.indexOf(\")\",n.length-e)-n.length)&&(t[0]=t[0].slice(0,e),t[2]=n.slice(0,e)),t.slice(0,3))}},filter:{TAG:function(t){var e=t.replace(J,tt).toLowerCase();return\"*\"===t?function(){return!0}:function(t){return t.nodeName&&t.nodeName.toLowerCase()===e}},CLASS:function(t){var e=S[t+\" \"];return e||(e=new RegExp(\"(^|[\\\\x20\\\\t\\\\r\\\\n\\\\f])\"+t+\"(\"+F+\"|$)\"))&&S(t,(function(t){return e.test(\"string\"==typeof t.className&&t.className||void 0!==t.getAttribute&&t.getAttribute(\"class\")||\"\")}))},ATTR:function(t,e,n){return function(r){var i=ot.attr(r,t);return null==i?\"!=\"===e:!e||(i+=\"\",\"=\"===e?i===n:\"!=\"===e?i!==n:\"^=\"===e?n&&0===i.indexOf(n):\"*=\"===e?n&&i.indexOf(n)>-1:\"$=\"===e?n&&i.slice(-n.length)===n:\"~=\"===e?(\" \"+i.replace(H,\" \")+\" \").indexOf(n)>-1:\"|=\"===e&&(i===n||i.slice(0,n.length+1)===n+\"-\"))}},CHILD:function(t,e,n,r,i){var o=\"nth\"!==t.slice(0,3),s=\"last\"!==t.slice(-4),a=\"of-type\"===e;return 1===r&&0===i?function(t){return!!t.parentNode}:function(e,n,c){var l,h,u,f,d,p,g=o!==s?\"nextSibling\":\"previousSibling\",m=e.parentNode,v=a&&e.nodeName.toLowerCase(),b=!c&&!a,w=!1;if(m){if(o){for(;g;){for(f=e;f=f[g];)if(a?f.nodeName.toLowerCase()===v:1===f.nodeType)return!1;p=g=\"only\"===t&&!p&&\"nextSibling\"}return!0}if(p=[s?m.firstChild:m.lastChild],s&&b){for(w=(d=(l=(h=(u=(f=m)[y]||(f[y]={}))[f.uniqueID]||(u[f.uniqueID]={}))[t]||[])[0]===_&&l[1])&&l[2],f=d&&m.childNodes[d];f=++d&&f&&f[g]||(w=d=0)||p.pop();)if(1===f.nodeType&&++w&&f===e){h[t]=[_,d,w];break}}else 
if(b&&(w=d=(l=(h=(u=(f=e)[y]||(f[y]={}))[f.uniqueID]||(u[f.uniqueID]={}))[t]||[])[0]===_&&l[1]),!1===w)for(;(f=++d&&f&&f[g]||(w=d=0)||p.pop())&&((a?f.nodeName.toLowerCase()!==v:1!==f.nodeType)||!++w||(b&&((h=(u=f[y]||(f[y]={}))[f.uniqueID]||(u[f.uniqueID]={}))[t]=[_,w]),f!==e)););return(w-=i)===r||w%r==0&&w/r>=0}}},PSEUDO:function(t,e){var n,i=r.pseudos[t]||r.setFilters[t.toLowerCase()]||ot.error(\"unsupported pseudo: \"+t);return i[y]?i(e):i.length>1?(n=[t,t,\"\",e],r.setFilters.hasOwnProperty(t.toLowerCase())?at((function(t,n){for(var r,o=i(t,e),s=o.length;s--;)t[r=P(t,o[s])]=!(n[r]=o[s])})):function(t){return i(t,0,n)}):i}},pseudos:{not:at((function(t){var e=[],n=[],r=a(t.replace(V,\"$1\"));return r[y]?at((function(t,e,n,i){for(var o,s=r(t,null,i,[]),a=t.length;a--;)(o=s[a])&&(t[a]=!(e[a]=o))})):function(t,i,o){return e[0]=t,r(e,null,o,n),e[0]=null,!n.pop()}})),has:at((function(t){return function(e){return ot(t,e).length>0}})),contains:at((function(t){return t=t.replace(J,tt),function(e){return(e.textContent||e.innerText||i(e)).indexOf(t)>-1}})),lang:at((function(t){return $.test(t||\"\")||ot.error(\"unsupported lang: \"+t),t=t.replace(J,tt).toLowerCase(),function(e){var n;do{if(n=g?e.lang:e.getAttribute(\"xml:lang\")||e.getAttribute(\"lang\"))return(n=n.toLowerCase())===t||0===n.indexOf(t+\"-\")}while((e=e.parentNode)&&1===e.nodeType);return!1}})),target:function(e){var n=t.location&&t.location.hash;return n&&n.slice(1)===e.id},root:function(t){return t===p},focus:function(t){return t===d.activeElement&&(!d.hasFocus||d.hasFocus())&&!!(t.type||t.href||~t.tabIndex)},enabled:ft(!1),disabled:ft(!0),checked:function(t){var e=t.nodeName.toLowerCase();return\"input\"===e&&!!t.checked||\"option\"===e&&!!t.selected},selected:function(t){return t.parentNode&&t.parentNode.selectedIndex,!0===t.selected},empty:function(t){for(t=t.firstChild;t;t=t.nextSibling)if(t.nodeType<6)return!1;return!0},parent:function(t){return!r.pseudos.empty(t)},header:function(t){return 
X.test(t.nodeName)},input:function(t){return Z.test(t.nodeName)},button:function(t){var e=t.nodeName.toLowerCase();return\"input\"===e&&\"button\"===t.type||\"button\"===e},text:function(t){var e;return\"input\"===t.nodeName.toLowerCase()&&\"text\"===t.type&&(null==(e=t.getAttribute(\"type\"))||\"text\"===e.toLowerCase())},first:dt((function(){return[0]})),last:dt((function(t,e){return[e-1]})),eq:dt((function(t,e,n){return[n<0?n+e:n]})),even:dt((function(t,e){for(var n=0;n=0;)t.push(r);return t})),gt:dt((function(t,e,n){for(var r=n<0?n+e:n;++r1?function(e,n,r){for(var i=t.length;i--;)if(!t[i](e,n,r))return!1;return!0}:t[0]}function wt(t,e,n,r,i){for(var o,s=[],a=0,c=t.length,l=null!=e;a-1&&(o[l]=!(s[l]=u))}}else v=wt(v===s?v.splice(p,v.length):v),i?i(null,s,v,c):I.apply(s,v)}))}function xt(t){for(var e,n,i,o=t.length,s=r.relative[t[0].type],a=s||r.relative[\" \"],c=s?1:0,h=vt((function(t){return t===e}),a,!0),u=vt((function(t){return P(e,t)>-1}),a,!0),f=[function(t,n,r){var i=!s&&(r||n!==l)||((e=n).nodeType?h(t,n,r):u(t,n,r));return e=null,i}];c1&&bt(f),c>1&&mt(t.slice(0,c-1).concat({value:\" \"===t[c-2].type?\"*\":\"\"})).replace(V,\"$1\"),n,c0,i=t.length>0,o=function(o,s,a,c,h){var u,p,m,v=0,b=\"0\",w=o&&[],y=[],x=l,k=o||i&&r.find.TAG(\"*\",h),S=_+=null==x?1:Math.random()||.1,C=k.length;for(h&&(l=s===d||s||h);b!==C&&null!=(u=k[b]);b++){if(i&&u){for(p=0,s||u.ownerDocument===d||(f(u),a=!g);m=t[p++];)if(m(u,s||d,a)){c.push(u);break}h&&(_=S)}n&&((u=!m&&u)&&v--,o&&w.push(u))}if(v+=b,n&&b!==v){for(p=0;m=e[p++];)m(w,y,s,a);if(o){if(v>0)for(;b--;)w[b]||y[b]||(y[b]=M.call(c));y=wt(y)}I.apply(c,y),h&&!o&&y.length>0&&v+e.length>1&&ot.uniqueSort(c)}return h&&(_=S,l=x),w};return n?at(o):o}(o,i)),a.selector=t}return a},c=ot.select=function(t,e,n,i){var o,c,l,h,u,f=\"function\"==typeof 
t&&t,d=!i&&s(t=f.selector||t);if(n=n||[],1===d.length){if((c=d[0]=d[0].slice(0)).length>2&&\"ID\"===(l=c[0]).type&&9===e.nodeType&&g&&r.relative[c[1].type]){if(!(e=(r.find.ID(l.matches[0].replace(J,tt),e)||[])[0]))return n;f&&(e=e.parentNode),t=t.slice(c.shift().value.length)}for(o=G.needsContext.test(t)?0:c.length;o--&&(l=c[o],!r.relative[h=l.type]);)if((u=r.find[h])&&(i=u(l.matches[0].replace(J,tt),Q.test(c[0].type)&&pt(e.parentNode)||e))){if(c.splice(o,1),!(t=i.length&&mt(c)))return I.apply(n,i),n;break}}return(f||a(t,d))(i,e,!g,n,!e||Q.test(t)&&pt(e.parentNode)||e),n},n.sortStable=y.split(\"\").sort(A).join(\"\")===y,n.detectDuplicates=!!u,f(),ot}(window);w.find=_,w.expr=_.selectors,w.expr[\":\"]=w.expr.pseudos,w.uniqueSort=w.unique=_.uniqueSort,w.text=_.getText,w.isXMLDoc=_.isXML,w.contains=_.contains,w.escapeSelector=_.escape;var k=function(t,e,n){for(var r=[],i=void 0!==n;(t=t[e])&&9!==t.nodeType;)if(1===t.nodeType){if(i&&w(t).is(n))break;r.push(t)}return r},S=function(t,e){for(var n=[];t;t=t.nextSibling)1===t.nodeType&&t!==e&&n.push(t);return n},C=w.expr.match.needsContext;function E(t,e){return t.nodeName&&t.nodeName.toLowerCase()===e.toLowerCase()}var A=/^<([a-z][^\\/\\0>:\\x20\\t\\r\\n\\f]*)[\\x20\\t\\r\\n\\f]*\\/?>(?:<\\/\\1>|)$/i;function T(t,e,n){return d(e)?w.grep(t,(function(t,r){return!!e.call(t,r,t)!==n})):e.nodeType?w.grep(t,(function(t){return t===e!==n})):\"string\"!=typeof e?w.grep(t,(function(t){return s.call(e,t)>-1!==n})):w.filter(e,t,n)}w.filter=function(t,e,n){var r=e[0];return n&&(t=\":not(\"+t+\")\"),1===e.length&&1===r.nodeType?w.find.matchesSelector(r,t)?[r]:[]:w.find.matches(t,w.grep(e,(function(t){return 1===t.nodeType})))},w.fn.extend({find:function(t){var e,n,r=this.length,i=this;if(\"string\"!=typeof t)return this.pushStack(w(t).filter((function(){for(e=0;e1?w.uniqueSort(n):n},filter:function(t){return this.pushStack(T(this,t||[],!1))},not:function(t){return 
this.pushStack(T(this,t||[],!0))},is:function(t){return!!T(this,\"string\"==typeof t&&C.test(t)?w(t):t||[],!1).length}});var L,M=/^(?:\\s*(<[\\w\\W]+>)[^>]*|#([\\w-]+))$/,R=w.fn.init=function(t,n,r){var i,o;if(!t)return this;if(r=r||L,\"string\"==typeof t){if(!(i=\"<\"===t[0]&&\">\"===t[t.length-1]&&t.length>=3?[null,t,null]:M.exec(t))||!i[1]&&n)return!n||n.jquery?(n||r).find(t):this.constructor(n).find(t);if(i[1]){if(n=n instanceof w?n[0]:n,w.merge(this,w.parseHTML(i[1],n&&n.nodeType?n.ownerDocument||n:e,!0)),A.test(i[1])&&w.isPlainObject(n))for(i in n)d(this[i])?this[i](n[i]):this.attr(i,n[i]);return this}return(o=e.getElementById(i[2]))&&(this[0]=o,this.length=1),this}return t.nodeType?(this[0]=t,this.length=1,this):d(t)?void 0!==r.ready?r.ready(t):t(w):w.makeArray(t,this)};R.prototype=w.fn,L=w(e);var I=/^(?:parents|prev(?:Until|All))/,N={children:!0,contents:!0,next:!0,prev:!0};function P(t,e){for(;(t=t[e])&&1!==t.nodeType;);return t}w.fn.extend({has:function(t){var e=w(t,this),n=e.length;return this.filter((function(){for(var t=0;t-1:1===n.nodeType&&w.find.matchesSelector(n,t))){o.push(n);break}return this.pushStack(o.length>1?w.uniqueSort(o):o)},index:function(t){return t?\"string\"==typeof t?s.call(w(t),this[0]):s.call(this,t.jquery?t[0]:t):this[0]&&this[0].parentNode?this.first().prevAll().length:-1},add:function(t,e){return this.pushStack(w.uniqueSort(w.merge(this.get(),w(t,e))))},addBack:function(t){return this.add(null==t?this.prevObject:this.prevObject.filter(t))}}),w.each({parent:function(t){var e=t.parentNode;return e&&11!==e.nodeType?e:null},parents:function(t){return k(t,\"parentNode\")},parentsUntil:function(t,e,n){return k(t,\"parentNode\",n)},next:function(t){return P(t,\"nextSibling\")},prev:function(t){return P(t,\"previousSibling\")},nextAll:function(t){return k(t,\"nextSibling\")},prevAll:function(t){return k(t,\"previousSibling\")},nextUntil:function(t,e,n){return k(t,\"nextSibling\",n)},prevUntil:function(t,e,n){return 
k(t,\"previousSibling\",n)},siblings:function(t){return S((t.parentNode||{}).firstChild,t)},children:function(t){return S(t.firstChild)},contents:function(t){return E(t,\"iframe\")?t.contentDocument:(E(t,\"template\")&&(t=t.content||t),w.merge([],t.childNodes))}},(function(t,e){w.fn[t]=function(n,r){var i=w.map(this,e,n);return\"Until\"!==t.slice(-5)&&(r=n),r&&\"string\"==typeof r&&(i=w.filter(r,i)),this.length>1&&(N[t]||w.uniqueSort(i),I.test(t)&&i.reverse()),this.pushStack(i)}}));var O=/[^\\x20\\t\\r\\n\\f]+/g;function F(t){return t}function D(t){throw t}function B(t,e,n,r){var i;try{t&&d(i=t.promise)?i.call(t).done(e).fail(n):t&&d(i=t.then)?i.call(t,e,n):e.apply(void 0,[t].slice(r))}catch(t){n.apply(void 0,[t])}}w.Callbacks=function(t){t=\"string\"==typeof t?function(t){var e={};return w.each(t.match(O)||[],(function(t,n){e[n]=!0})),e}(t):w.extend({},t);var e,n,r,i,o=[],s=[],a=-1,c=function(){for(i=i||t.once,r=e=!0;s.length;a=-1)for(n=s.shift();++a-1;)o.splice(n,1),n<=a&&a--})),this},has:function(t){return t?w.inArray(t,o)>-1:o.length>0},empty:function(){return o&&(o=[]),this},disable:function(){return i=s=[],o=n=\"\",this},disabled:function(){return!o},lock:function(){return i=s=[],n||e||(o=n=\"\"),this},locked:function(){return!!i},fireWith:function(t,n){return i||(n=[t,(n=n||[]).slice?n.slice():n],s.push(n),e||c()),this},fire:function(){return l.fireWith(this,arguments),this},fired:function(){return!!r}};return l},w.extend({Deferred:function(t){var e=[[\"notify\",\"progress\",w.Callbacks(\"memory\"),w.Callbacks(\"memory\"),2],[\"resolve\",\"done\",w.Callbacks(\"once memory\"),w.Callbacks(\"once memory\"),0,\"resolved\"],[\"reject\",\"fail\",w.Callbacks(\"once memory\"),w.Callbacks(\"once memory\"),1,\"rejected\"]],n=\"pending\",r={state:function(){return n},always:function(){return i.done(arguments).fail(arguments),this},catch:function(t){return r.then(null,t)},pipe:function(){var t=arguments;return w.Deferred((function(n){w.each(e,(function(e,r){var 
o=d(t[r[4]])&&t[r[4]];i[r[1]]((function(){var t=o&&o.apply(this,arguments);t&&d(t.promise)?t.promise().progress(n.notify).done(n.resolve).fail(n.reject):n[r[0]+\"With\"](this,o?[t]:arguments)}))})),t=null})).promise()},then:function(t,n,r){var i=0;function o(t,e,n,r){return function(){var s=this,a=arguments,c=function(){var c,l;if(!(t=i&&(n!==D&&(s=void 0,a=[r]),e.rejectWith(s,a))}};t?l():(w.Deferred.getStackHook&&(l.stackTrace=w.Deferred.getStackHook()),window.setTimeout(l))}}return w.Deferred((function(i){e[0][3].add(o(0,i,d(r)?r:F,i.notifyWith)),e[1][3].add(o(0,i,d(t)?t:F)),e[2][3].add(o(0,i,d(n)?n:D))})).promise()},promise:function(t){return null!=t?w.extend(t,r):r}},i={};return w.each(e,(function(t,o){var s=o[2],a=o[5];r[o[1]]=s.add,a&&s.add((function(){n=a}),e[3-t][2].disable,e[3-t][3].disable,e[0][2].lock,e[0][3].lock),s.add(o[3].fire),i[o[0]]=function(){return i[o[0]+\"With\"](this===i?void 0:this,arguments),this},i[o[0]+\"With\"]=s.fireWith})),r.promise(i),t&&t.call(i,i),i},when:function(t){var e=arguments.length,n=e,i=Array(n),o=r.call(arguments),s=w.Deferred(),a=function(t){return function(n){i[t]=this,o[t]=arguments.length>1?r.call(arguments):n,--e||s.resolveWith(i,o)}};if(e<=1&&(B(t,s.done(a(n)).resolve,s.reject,!e),\"pending\"===s.state()||d(o[n]&&o[n].then)))return s.then();for(;n--;)B(o[n],a(n),s.reject);return s.promise()}});var z=/^(Eval|Internal|Range|Reference|Syntax|Type|URI)Error$/;w.Deferred.exceptionHook=function(t,e){window.console&&window.console.warn&&t&&z.test(t.name)&&window.console.warn(\"jQuery.Deferred exception: \"+t.message,t.stack,e)},w.readyException=function(t){window.setTimeout((function(){throw t}))};var H=w.Deferred();function V(){e.removeEventListener(\"DOMContentLoaded\",V),window.removeEventListener(\"load\",V),w.ready()}w.fn.ready=function(t){return 
H.then(t).catch((function(t){w.readyException(t)})),this},w.extend({isReady:!1,readyWait:1,ready:function(t){(!0===t?--w.readyWait:w.isReady)||(w.isReady=!0,!0!==t&&--w.readyWait>0||H.resolveWith(e,[w]))}}),w.ready.then=H.then,\"complete\"===e.readyState||\"loading\"!==e.readyState&&!e.documentElement.doScroll?window.setTimeout(w.ready):(e.addEventListener(\"DOMContentLoaded\",V),window.addEventListener(\"load\",V));var j=function(t,e,n,r,i,o,s){var a=0,c=t.length,l=null==n;if(\"object\"===v(n))for(a in i=!0,n)j(t,e,a,n[a],!0,o,s);else if(void 0!==r&&(i=!0,d(r)||(s=!0),l&&(s?(e.call(t,r),e=null):(l=e,e=function(t,e,n){return l.call(w(t),n)})),e))for(;a1,null,!0)},removeData:function(t){return this.each((function(){Y.remove(this,t)}))}}),w.extend({queue:function(t,e,n){var r;if(t)return e=(e||\"fx\")+\"queue\",r=X.get(t,e),n&&(!r||Array.isArray(n)?r=X.access(t,e,w.makeArray(n)):r.push(n)),r||[]},dequeue:function(t,e){e=e||\"fx\";var n=w.queue(t,e),r=n.length,i=n.shift(),o=w._queueHooks(t,e);\"inprogress\"===i&&(i=n.shift(),r--),i&&(\"fx\"===e&&n.unshift(\"inprogress\"),delete o.stop,i.call(t,(function(){w.dequeue(t,e)}),o)),!r&&o&&o.empty.fire()},_queueHooks:function(t,e){var n=e+\"queueHooks\";return X.get(t,n)||X.access(t,n,{empty:w.Callbacks(\"once memory\").add((function(){X.remove(t,[e+\"queue\",n])}))})}}),w.fn.extend({queue:function(t,e){var n=2;return\"string\"!=typeof t&&(e=t,t=\"fx\",n--),arguments.length\\x20\\t\\r\\n\\f]+)/i,ht=/^$|^module$|\\/(?:java|ecma)script/i,ut={option:[1,\"\",\" \"],thead:[1,\"\"],col:[2,\"\"],tr:[2,\"\"],td:[3,\"\"],_default:[0,\"\",\"\"]};function ft(t,e){var n;return n=void 0!==t.getElementsByTagName?t.getElementsByTagName(e||\"*\"):void 0!==t.querySelectorAll?t.querySelectorAll(e||\"*\"):[],void 0===e||e&&E(t,e)?w.merge([t],n):n}function dt(t,e){for(var n=0,r=t.length;n-1)i&&i.push(o);else if(l=w.contains(o.ownerDocument,o),s=ft(u.appendChild(o),\"script\"),l&&dt(s),n)for(h=0;o=s[h++];)ht.test(o.type||\"\")&&n.push(o);return 
u}!function(){var t=e.createDocumentFragment().appendChild(e.createElement(\"div\")),n=e.createElement(\"input\");n.setAttribute(\"type\",\"radio\"),n.setAttribute(\"checked\",\"checked\"),n.setAttribute(\"name\",\"t\"),t.appendChild(n),f.checkClone=t.cloneNode(!0).cloneNode(!0).lastChild.checked,t.innerHTML=\"\",f.noCloneChecked=!!t.cloneNode(!0).lastChild.defaultValue}();var mt=e.documentElement,vt=/^key/,bt=/^(?:mouse|pointer|contextmenu|drag|drop)|click/,wt=/^([^.]*)(?:\\.(.+)|)/;function yt(){return!0}function xt(){return!1}function _t(){try{return e.activeElement}catch(t){}}function kt(t,e,n,r,i,o){var s,a;if(\"object\"==typeof e){for(a in\"string\"!=typeof n&&(r=r||n,n=void 0),e)kt(t,a,n,r,e[a],o);return t}if(null==r&&null==i?(i=n,r=n=void 0):null==i&&(\"string\"==typeof n?(i=r,r=void 0):(i=r,r=n,n=void 0)),!1===i)i=xt;else if(!i)return t;return 1===o&&(s=i,i=function(t){return w().off(t),s.apply(this,arguments)},i.guid=s.guid||(s.guid=w.guid++)),t.each((function(){w.event.add(this,e,i,r,n)}))}w.event={global:{},add:function(t,e,n,r,i){var o,s,a,c,l,h,u,f,d,p,g,m=X.get(t);if(m)for(n.handler&&(n=(o=n).handler,i=o.selector),i&&w.find.matchesSelector(mt,i),n.guid||(n.guid=w.guid++),(c=m.events)||(c=m.events={}),(s=m.handle)||(s=m.handle=function(e){return void 0!==w&&w.event.triggered!==e.type?w.event.dispatch.apply(t,arguments):void 0}),l=(e=(e||\"\").match(O)||[\"\"]).length;l--;)d=g=(a=wt.exec(e[l])||[])[1],p=(a[2]||\"\").split(\".\").sort(),d&&(u=w.event.special[d]||{},d=(i?u.delegateType:u.bindType)||d,u=w.event.special[d]||{},h=w.extend({type:d,origType:g,data:r,handler:n,guid:n.guid,selector:i,needsContext:i&&w.expr.match.needsContext.test(i),namespace:p.join(\".\")},o),(f=c[d])||((f=c[d]=[]).delegateCount=0,u.setup&&!1!==u.setup.call(t,r,p,s)||t.addEventListener&&t.addEventListener(d,s)),u.add&&(u.add.call(t,h),h.handler.guid||(h.handler.guid=n.guid)),i?f.splice(f.delegateCount++,0,h):f.push(h),w.event.global[d]=!0)},remove:function(t,e,n,r,i){var 
o,s,a,c,l,h,u,f,d,p,g,m=X.hasData(t)&&X.get(t);if(m&&(c=m.events)){for(l=(e=(e||\"\").match(O)||[\"\"]).length;l--;)if(d=g=(a=wt.exec(e[l])||[])[1],p=(a[2]||\"\").split(\".\").sort(),d){for(u=w.event.special[d]||{},f=c[d=(r?u.delegateType:u.bindType)||d]||[],a=a[2]&&new RegExp(\"(^|\\\\.)\"+p.join(\"\\\\.(?:.*\\\\.|)\")+\"(\\\\.|$)\"),s=o=f.length;o--;)h=f[o],!i&&g!==h.origType||n&&n.guid!==h.guid||a&&!a.test(h.namespace)||r&&r!==h.selector&&(\"**\"!==r||!h.selector)||(f.splice(o,1),h.selector&&f.delegateCount--,u.remove&&u.remove.call(t,h));s&&!f.length&&(u.teardown&&!1!==u.teardown.call(t,p,m.handle)||w.removeEvent(t,d,m.handle),delete c[d])}else for(d in c)w.event.remove(t,d+e[l],n,r,!0);w.isEmptyObject(c)&&X.remove(t,\"handle events\")}},dispatch:function(t){var e,n,r,i,o,s,a=w.event.fix(t),c=new Array(arguments.length),l=(X.get(this,\"events\")||{})[a.type]||[],h=w.event.special[a.type]||{};for(c[0]=a,e=1;e=1))for(;l!==this;l=l.parentNode||this)if(1===l.nodeType&&(\"click\"!==t.type||!0!==l.disabled)){for(o=[],s={},n=0;n-1:w.find(i,this,null,[l]).length),s[i]&&o.push(r);o.length&&a.push({elem:l,handlers:o})}return l=this,c\\x20\\t\\r\\n\\f]*)[^>]*)\\/>/gi,Ct=/\n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "display_quiz(\"Tutorial3/LessonImages/DiffPeaks.json\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "21691ec7-e2d0-474f-939e-ddc84b046452",
+ "metadata": {},
+ "source": [
+ "### Consider the peak below, which was identified in both the control and mutant samples. A simple intersect would report this peak as unchanged between the two samples. To capture quantitative differences in signal, we will use [manorm](https://anaconda.org/bioconda/manormfast).\n",
+ "\n",
+ " \n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "17cea9d5-2b65-4513-bba3-7e2970c90aa9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#We specify several non-default parameters to better reflect ATAC-seq data\n",
+ "!manorm --p1 Tutorial3/InputFiles/CTL_peaks.narrowPeak --p2 Tutorial3/InputFiles/Mutant_peaks.narrowPeak --r1 Tutorial3/InputFiles/CTL_dedup.bam --r2 Tutorial3/InputFiles/Mutant_dedup.bam --rf bam --n1 CTL --n2 Mutant --pe -w 1000 -o Tutorial3/DiffPeaks --wa 2> Tutorial3/DiffPeaks/log_manorm.txt\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1e61bd4a-26e2-4ce6-a719-6b8c51ed9dca",
+ "metadata": {},
+ "source": [
+ "The above command writes out several files, including the peaks biased toward each sample (the differential peaks) as well as the unbiased (unchanged) peaks."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "a51f1ba8-3bf0-4715-bb97-4a029352a3cd",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "CTL_vs_Mutant_M_above_1.0_biased_peaks.bed CTL_vs_Mutant_unbiased_peaks.bed\n",
+ "CTL_vs_Mutant_M_below_-1.0_biased_peaks.bed\n"
+ ]
+ }
+ ],
+ "source": [
+ "!ls Tutorial3/DiffPeaks/output_filters"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "4d7d7656-9e43-4cce-a173-fee9b0a82dfa",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "chr4\t52059325\t52059732\tCTL_unique\t2.20155\n",
+ "chr4\t52298589\t52298799\tCTL_unique\t1.09775\n",
+ "chr4\t52550105\t52550494\tCTL_unique\t1.29536\n",
+ "chr4\t52698223\t52698464\tCTL_unique\t1.84416\n",
+ "chr4\t52834103\t52834470\tCTL_unique\t1.26119\n",
+ "chr4\t52884232\t52884622\tCTL_unique\t1.09835\n",
+ "chr4\t52968329\t52968671\tCTL_unique\t1.41519\n",
+ "chr4\t52993914\t52994157\tCTL_unique\t1.22576\n",
+ "chr4\t53595301\t53595477\tCTL_unique\t1.20393\n",
+ "chr4\t53702525\t53703113\tCTL_unique\t1.07373\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Let's also check the format of these files\n",
+ "!head Tutorial3/DiffPeaks/output_filters/CTL_vs_Mutant_M_above_1.0_biased_peaks.bed"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "42d1dbc3-7708-4907-8089-a76fee983a3e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " 124 Tutorial3/DiffPeaks/output_filters/CTL_vs_Mutant_M_above_1.0_biased_peaks.bed\n",
+ " 74 Tutorial3/DiffPeaks/output_filters/CTL_vs_Mutant_M_below_-1.0_biased_peaks.bed\n",
+ " 590 Tutorial3/DiffPeaks/output_filters/CTL_vs_Mutant_unbiased_peaks.bed\n",
+ " 788 total\n"
+ ]
+ }
+ ],
+ "source": [
+ "#We can also count how many peaks are in each file.\n",
+ "!wc -l Tutorial3/DiffPeaks/output_filters/*bed"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "ff40bca1-bef8-48c8-8c9b-033ac6fec8c7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "==== Stats ====\n",
+ "Total read pairs of sample 1: 167,920\n",
+ "Total read pairs of sample 2: 219,380\n",
+ "Total peaks of sample 1: 650 (unique: 277 common: 373)\n",
+ "Total peaks of sample 2: 560 (unique: 190 common: 370)\n",
+ "Number of merged common peaks: 369\n",
+ "M-A model: M = -0.04460 * A +0.18904\n",
+ "590 peaks are filtered as unbiased peaks\n",
+ "124 peaks are filtered as sample1-biased peaks\n",
+ "74 peaks are filtered as sample2-biased peaks\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Our log file tells us this information as well\n",
+ "!tail Tutorial3/DiffPeaks/log_manorm.txt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2832435-d335-476e-9bef-1c71797f10c7",
+ "metadata": {},
+ "source": [
+ "## Annotating Peaks\n",
+ "\n",
+ "Let's take the differential peaks, annotate them with nearby genes, and perform gene ontology analysis using [homer](https://anaconda.org/bioconda/homer).\n",
+ "\n",
+ "First we need to reformat the differential peaks file to the format required by homer.\n",
+ "\n",
+ "In an earlier command, we examined the format of manorm's output using head and saw that it is a five-column format. We will convert this to a six-column BED format that includes a unique name for each peak."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "7029964f-05bb-40e1-a007-69387a33485e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "chr4\t52059325\t52059732\tCTL_unique_1\t2.20155\t+\n",
+ "chr4\t52298589\t52298799\tCTL_unique_2\t1.09775\t+\n",
+ "chr4\t52550105\t52550494\tCTL_unique_3\t1.29536\t+\n",
+ "chr4\t52698223\t52698464\tCTL_unique_4\t1.84416\t+\n",
+ "chr4\t52834103\t52834470\tCTL_unique_5\t1.26119\t+\n",
+ "chr4\t52884232\t52884622\tCTL_unique_6\t1.09835\t+\n",
+ "chr4\t52968329\t52968671\tCTL_unique_7\t1.41519\t+\n",
+ "chr4\t52993914\t52994157\tCTL_unique_8\t1.22576\t+\n",
+ "chr4\t53595301\t53595477\tCTL_unique_9\t1.20393\t+\n",
+ "chr4\t53702525\t53703113\tCTL_unique_10\t1.07373\t+\n"
+ ]
+ }
+ ],
+ "source": [
+ "#This command reformats the peaks file, appending the line number (NR) to each peak name and adding a placeholder strand in the 6th column (peaks don't necessarily have a strand, but the format requires this column). awk splits fields on whitespace by default, which works here because no field contains spaces.\n",
+ "!awk '{print $1\"\\t\"$2\"\\t\"$3\"\\t\"$4\"_\"NR\"\\t\"$5\"\\t+\"}' Tutorial3/DiffPeaks/output_filters/CTL_vs_Mutant_M_above_1.0_biased_peaks.bed > Tutorial3/GenomeAnnotation/CTL_specific_peaks.bed\n",
+ "#Let's view the first lines of the reformatted file to compare\n",
+ "!head Tutorial3/GenomeAnnotation/CTL_specific_peaks.bed"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0996284a-06bd-43f2-ae7b-ed6807e5623c",
+ "metadata": {},
+ "source": [
+ "Now let's configure homer to recognize our genome build. We aligned our reads to hg38, so we'll have homer use that."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "32334764-854f-4739-a178-4273997248a2",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "!perl /opt/conda/share/homer/configureHomer.pl -install hg38 2> Tutorial3/DiffPeaks/homer_log1.txt\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b3f00783-28c0-4032-a841-42a5da716536",
+ "metadata": {},
+ "source": [
+ "Let's use that reformatted peak file to get nearby genes and perform gene ontology analysis."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "a79558d5-24a2-4b4c-9795-48af65cf6545",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "\tPeak file = Tutorial3/GenomeAnnotation/CTL_specific_peaks.bed\n",
+ "\tGenome = hg38\n",
+ "\tOrganism = human\n",
+ "\tWill perform Gene Ontology analysis - output to directory = Tutorial3/GenomeAnnotation/CTL_GO\n",
+ "\tPeak/BED file conversion summary:\n",
+ "\t\tBED/Header formatted lines: 124\n",
+ "\t\tpeakfile formatted lines: 0\n",
+ "\t\tDuplicated Peak IDs: 0\n",
+ "\n",
+ "\tPeak File Statistics:\n",
+ "\t\tTotal Peaks: 124\n",
+ "\t\tRedundant Peak IDs: 0\n",
+ "\t\tPeaks lacking information: 0 (need at least 5 columns per peak)\n",
+ "\t\tPeaks with misformatted coordinates: 0 (should be integer)\n",
+ "\t\tPeaks with misformatted strand: 0 (should be either +/- or 0/1)\n",
+ "\n",
+ "\tPeak file looks good!\n",
+ "\n",
+ "\tReading Positions...\n",
+ "\t-----------------------\n",
+ "\tFinding Closest TSS...\n",
+ "\tAnnotating:.\n",
+ "\t\tAnnotation\tNumber of peaks\tTotal size (bp)\tLog2 Ratio (obs/exp)\tLogP enrichment (+values depleted)\n",
+ "\t\t3UTR\t0.0\t1226327\t-0.852\t0.802\n",
+ "\t\tmiRNA\t0.0\t3258\t-0.003\t0.002\n",
+ "\t\tncRNA\t0.0\t315963\t-0.271\t0.206\n",
+ "\t\tTTS\t1.0\t1306500\t0.231\t-0.554\n",
+ "\t\tpseudo\t0.0\t40049\t-0.037\t0.026\n",
+ "\t\tExon\t0.0\t1490268\t-0.985\t0.975\n",
+ "\t\tIntron\t33.0\t73166083\t-0.532\t5.597\n",
+ "\t\tIntergenic\t84.0\t111121736\t0.213\t-3.865\n",
+ "\t\tPromoter\t5.0\t1403974\t2.450\t-6.038\n",
+ "\t\t5UTR\t1.0\t114432\t3.745\t-2.632\n",
+ "\tNOTE: If this part takes more than 2 minutes, there is a good chance\n",
+ "\t\tyour machine ran out of memory: consider hitting ctrl+C and rerunning\n",
+ "\t\tthe command with \"-noann\"\n",
+ "\tAnnotating:.\n",
+ "\t\tAnnotation\tNumber of peaks\tTotal size (bp)\tLog2 Ratio (obs/exp)\tLogP enrichment (+values depleted)\n",
+ "\t\t3UTR\t0.0\t1226327\t-0.852\t0.802\n",
+ "\t\tRetroposon\t0.0\t200838\t-0.178\t0.131\n",
+ "\t\tRC?\t0.0\t2850\t-0.003\t0.002\n",
+ "\t\tRNA\t0.0\t6910\t-0.006\t0.005\n",
+ "\t\tmiRNA\t0.0\t3258\t-0.003\t0.002\n",
+ "\t\tncRNA\t0.0\t315963\t-0.271\t0.206\n",
+ "\t\tTTS\t1.0\t1306500\t0.232\t-0.554\n",
+ "\t\tLINE\t18.0\t45014382\t-0.705\t4.777\n",
+ "\t\tsrpRNA\t0.0\t13945\t-0.013\t0.009\n",
+ "\t\tSINE\t6.0\t18936280\t-1.041\t3.470\n",
+ "\t\tRC\t0.0\t20949\t-0.020\t0.014\n",
+ "\t\ttRNA\t0.0\t2852\t-0.003\t0.002\n",
+ "\t\tDNA?\t0.0\t27318\t-0.025\t0.018\n",
+ "\t\tpseudo\t0.0\t40049\t-0.037\t0.026\n",
+ "\t\tDNA\t1.0\t6724040\t-2.132\t2.750\n",
+ "\t\tExon\t0.0\t1490268\t-0.985\t0.975\n",
+ "\t\tIntron\t25.0\t38474934\t-0.005\t0.605\n",
+ "\t\tIntergenic\t39.0\t48657251\t0.298\t-2.484\n",
+ "\t\tPromoter\t5.0\t1403974\t2.450\t-6.038\n",
+ "\t\t5UTR\t1.0\t114432\t3.745\t-2.632\n",
+ "\t\tLTR?\t0.0\t92183\t-0.084\t0.060\n",
+ "\t\tscRNA\t0.0\t6881\t-0.006\t0.004\n",
+ "\t\tCpG-Island\t0.0\t373419\t-0.315\t0.244\n",
+ "\t\tLow_complexity\t0.0\t365373\t-0.309\t0.238\n",
+ "\t\tLTR\t27.0\t20757981\t0.997\t-7.957\n",
+ "\t\tSimple_repeat\t1.0\t2265635\t-0.563\t0.572\n",
+ "\t\tsnRNA\t0.0\t18664\t-0.017\t0.012\n",
+ "\t\tUnknown\t0.0\t49238\t-0.046\t0.032\n",
+ "\t\tSINE?\t0.0\t130\t-0.000\t0.000\n",
+ "\t\tSatellite\t0.0\t2306654\t-1.335\t1.513\n",
+ "\t\trRNA\t0.0\t7760\t-0.007\t0.005\n",
+ "\tPerforming Gene Ontology Analysis...\n",
+ "rm: cannot remove '0.710786658429669.bg.tmp': No such file or directory\n",
+ "\tCounting Tags in Peaks from each directory...\n",
+ "\tOrganism: human\n",
+ "\tLoading Gene Informaiton...\n",
+ "\tOutputing Annotation File...\n",
+ "\tDone annotating peaks file\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "!annotatePeaks.pl Tutorial3/GenomeAnnotation/CTL_specific_peaks.bed hg38 -go Tutorial3/GenomeAnnotation/CTL_GO -annStats Tutorial3/GenomeAnnotation/CTL_annStats.txt > Tutorial3/GenomeAnnotation/CTL_specific_Annotated.txt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ccb425f8-7be0-4854-9344-48479b57ab70",
+ "metadata": {},
+ "source": [
+ "Let's look at the output files. First, let's look at the first 2 lines of our annotation stats."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "efa56f92-b6be-42dc-a3df-65a03b201e93",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>annotation</th><th>peakcount</th><th>size</th><th>foldenrichment</th><th>log10significance</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>1</th><td>5UTR</td><td>1.0</td><td>114432</td><td>3.745</td><td>-2.632</td></tr>\n",
+ "    <tr><th>14</th><td>Promoter</td><td>5.0</td><td>1403974</td><td>2.450</td><td>-6.038</td></tr>\n",
+ "    <tr><th>11</th><td>LTR</td><td>27.0</td><td>20757981</td><td>0.997</td><td>-7.957</td></tr>\n",
+ "    <tr><th>6</th><td>Intergenic</td><td>39.0</td><td>48657251</td><td>0.298</td><td>-2.484</td></tr>\n",
+ "    <tr><th>24</th><td>TTS</td><td>1.0</td><td>1306500</td><td>0.232</td><td>-0.554</td></tr>\n",
+ "    <tr><th>23</th><td>TTS</td><td>1.0</td><td>1306500</td><td>0.231</td><td>-0.554</td></tr>\n",
+ "    <tr><th>7</th><td>Intergenic</td><td>84.0</td><td>111121736</td><td>0.213</td><td>-3.865</td></tr>\n",
+ "    <tr><th>20</th><td>SINE?</td><td>0.0</td><td>130</td><td>-0.000</td><td>0.000</td></tr>\n",
+ "    <tr><th>33</th><td>tRNA</td><td>0.0</td><td>2852</td><td>-0.003</td><td>0.002</td></tr>\n",
+ "    <tr><th>26</th><td>miRNA</td><td>0.0</td><td>3258</td><td>-0.003</td><td>0.002</td></tr>\n",
+ "    <tr><th>16</th><td>RC?</td><td>0.0</td><td>2850</td><td>-0.003</td><td>0.002</td></tr>\n",
+ "    <tr><th>8</th><td>Intron</td><td>25.0</td><td>38474934</td><td>-0.005</td><td>0.605</td></tr>\n",
+ "    <tr><th>30</th><td>scRNA</td><td>0.0</td><td>6881</td><td>-0.006</td><td>0.004</td></tr>\n",
+ "    <tr><th>17</th><td>RNA</td><td>0.0</td><td>6910</td><td>-0.006</td><td>0.005</td></tr>\n",
+ "    <tr><th>29</th><td>rRNA</td><td>0.0</td><td>7760</td><td>-0.007</td><td>0.005</td></tr>\n",
+ "    <tr><th>32</th><td>srpRNA</td><td>0.0</td><td>13945</td><td>-0.013</td><td>0.009</td></tr>\n",
+ "    <tr><th>31</th><td>snRNA</td><td>0.0</td><td>18664</td><td>-0.017</td><td>0.012</td></tr>\n",
+ "    <tr><th>15</th><td>RC</td><td>0.0</td><td>20949</td><td>-0.020</td><td>0.014</td></tr>\n",
+ "    <tr><th>4</th><td>DNA?</td><td>0.0</td><td>27318</td><td>-0.025</td><td>0.018</td></tr>\n",
+ "    <tr><th>28</th><td>pseudo</td><td>0.0</td><td>40049</td><td>-0.037</td><td>0.026</td></tr>\n",
+ "    <tr><th>25</th><td>Unknown</td><td>0.0</td><td>49238</td><td>-0.046</td><td>0.032</td></tr>\n",
+ "    <tr><th>12</th><td>LTR?</td><td>0.0</td><td>92183</td><td>-0.084</td><td>0.060</td></tr>\n",
+ "    <tr><th>18</th><td>Retroposon</td><td>0.0</td><td>200838</td><td>-0.178</td><td>0.131</td></tr>\n",
+ "    <tr><th>27</th><td>ncRNA</td><td>0.0</td><td>315963</td><td>-0.271</td><td>0.206</td></tr>\n",
+ "    <tr><th>13</th><td>Low_complexity</td><td>0.0</td><td>365373</td><td>-0.309</td><td>0.238</td></tr>\n",
+ "    <tr><th>2</th><td>CpG-Island</td><td>0.0</td><td>373419</td><td>-0.315</td><td>0.244</td></tr>\n",
+ "    <tr><th>9</th><td>Intron</td><td>33.0</td><td>73166083</td><td>-0.532</td><td>5.597</td></tr>\n",
+ "    <tr><th>22</th><td>Simple_repeat</td><td>1.0</td><td>2265635</td><td>-0.563</td><td>0.572</td></tr>\n",
+ "    <tr><th>10</th><td>LINE</td><td>18.0</td><td>45014382</td><td>-0.705</td><td>4.777</td></tr>\n",
+ "    <tr><th>0</th><td>3UTR</td><td>0.0</td><td>1226327</td><td>-0.852</td><td>0.802</td></tr>\n",
+ "    <tr><th>5</th><td>Exon</td><td>0.0</td><td>1490268</td><td>-0.985</td><td>0.975</td></tr>\n",
+ "    <tr><th>19</th><td>SINE</td><td>6.0</td><td>18936280</td><td>-1.041</td><td>3.470</td></tr>\n",
+ "    <tr><th>21</th><td>Satellite</td><td>0.0</td><td>2306654</td><td>-1.335</td><td>1.513</td></tr>\n",
+ "    <tr><th>3</th><td>DNA</td><td>1.0</td><td>6724040</td><td>-2.132</td><td>2.750</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " annotation peakcount size foldenrichment log10significance\n",
+ "1 5UTR 1.0 114432 3.745 -2.632\n",
+ "14 Promoter 5.0 1403974 2.450 -6.038\n",
+ "11 LTR 27.0 20757981 0.997 -7.957\n",
+ "6 Intergenic 39.0 48657251 0.298 -2.484\n",
+ "24 TTS 1.0 1306500 0.232 -0.554\n",
+ "23 TTS 1.0 1306500 0.231 -0.554\n",
+ "7 Intergenic 84.0 111121736 0.213 -3.865\n",
+ "20 SINE? 0.0 130 -0.000 0.000\n",
+ "33 tRNA 0.0 2852 -0.003 0.002\n",
+ "26 miRNA 0.0 3258 -0.003 0.002\n",
+ "16 RC? 0.0 2850 -0.003 0.002\n",
+ "8 Intron 25.0 38474934 -0.005 0.605\n",
+ "30 scRNA 0.0 6881 -0.006 0.004\n",
+ "17 RNA 0.0 6910 -0.006 0.005\n",
+ "29 rRNA 0.0 7760 -0.007 0.005\n",
+ "32 srpRNA 0.0 13945 -0.013 0.009\n",
+ "31 snRNA 0.0 18664 -0.017 0.012\n",
+ "15 RC 0.0 20949 -0.020 0.014\n",
+ "4 DNA? 0.0 27318 -0.025 0.018\n",
+ "28 pseudo 0.0 40049 -0.037 0.026\n",
+ "25 Unknown 0.0 49238 -0.046 0.032\n",
+ "12 LTR? 0.0 92183 -0.084 0.060\n",
+ "18 Retroposon 0.0 200838 -0.178 0.131\n",
+ "27 ncRNA 0.0 315963 -0.271 0.206\n",
+ "13 Low_complexity 0.0 365373 -0.309 0.238\n",
+ "2 CpG-Island 0.0 373419 -0.315 0.244\n",
+ "9 Intron 33.0 73166083 -0.532 5.597\n",
+ "22 Simple_repeat 1.0 2265635 -0.563 0.572\n",
+ "10 LINE 18.0 45014382 -0.705 4.777\n",
+ "0 3UTR 0.0 1226327 -0.852 0.802\n",
+ "5 Exon 0.0 1490268 -0.985 0.975\n",
+ "19 SINE 6.0 18936280 -1.041 3.470\n",
+ "21 Satellite 0.0 2306654 -1.335 1.513\n",
+ "3 DNA 1.0 6724040 -2.132 2.750"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#Clean up duplicate entries\n",
+ "!sort -u Tutorial3/GenomeAnnotation/CTL_annStats.txt | grep -v Annotation > Tutorial3/GenomeAnnotation/CTL_annStats_clean.txt\n",
+ "\n",
+ "#Load results into a pandas table\n",
+ "annstats = pd.read_csv(\"Tutorial3/GenomeAnnotation/CTL_annStats_clean.txt\", sep='\\t', header=None, names=['annotation','peakcount','size','foldenrichment','log10significance'])\n",
+ "\n",
+ "#View entries sorted by enrichment\n",
+ "annstats_sorted = annstats.sort_values(by=[\"foldenrichment\"], ascending=False)\n",
+ "display(annstats_sorted)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "74893643-1429-4894-afeb-e4af6d745a1d",
+ "metadata": {},
+ "source": [
+ "From this we can see that enrichment is highest in 5' UTRs and promoters.\n",
+ "\n",
+ "Let's plot the results as a barplot.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "4fe8630e-b47a-4bb4-b9c1-80c6e93f1644",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "annstats_sorted.plot.bar(x=\"annotation\", y=\"foldenrichment\")"
+ ]
+ },
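+ {
+ "cell_type": "markdown",
+ "id": "sketch-annstats-nonzero-peaks",
+ "metadata": {},
+ "source": [
+ "Many of the categories in the table above contain no peaks at all, which crowds the barplot. As an optional sketch (not part of the original workflow), the code below keeps only categories with at least one peak before plotting. It assumes the cleaned file from the previous step exists, and the name `annstats_nonzero` is just an illustrative choice.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "# Re-load the cleaned annotation statistics written in the previous step\n",
+ "cols = ['annotation', 'peakcount', 'size', 'foldenrichment', 'log10significance']\n",
+ "annstats = pd.read_csv('Tutorial3/GenomeAnnotation/CTL_annStats_clean.txt', sep='\\t', header=None, names=cols)\n",
+ "\n",
+ "# Keep only categories that overlap at least one peak, then sort by enrichment\n",
+ "annstats_nonzero = annstats[annstats['peakcount'] > 0].sort_values(by='foldenrichment', ascending=False)\n",
+ "\n",
+ "# Plot the reduced table\n",
+ "annstats_nonzero.plot.bar(x='annotation', y='foldenrichment', legend=False)\n",
+ "```\n",
+ "\n",
+ "Because this only drops rows, the enrichment values themselves are unchanged."
+ ]
+ },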
+ {
+ "cell_type": "markdown",
+ "id": "ddd988cb-5409-4734-bcf7-5a740a9e9a96",
+ "metadata": {},
+ "source": [
+ "Homer also outputs the nearest annotation for each peak. Let's look at the first few lines of our annotation file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "bff56b0d-72a8-4532-b1f7-fa4f064f2c91",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "PeakID (cmd=annotatePeaks.pl Tutorial3/GenomeAnnotation/CTL_specific_peaks.bed hg38 -go Tutorial3/GenomeAnnotation/CTL_GO -annStats Tutorial3/GenomeAnnotation/CTL_annStats.txt)\tChr\tStart\tEnd\tStrand\tPeak Score\tFocus Ratio/Region Size\tAnnotation\tDetailed Annotation\tDistance to TSS\tNearest PromoterID\tEntrez ID\tNearest Unigene\tNearest Refseq\tNearest Ensembl\tGene Name\tGene Alias\tGene Description\tGene Type\n",
+ "CTL_unique_56\tchr4\t64144792\t64145494\t+\t3.01611\tNA\tIntergenic\tHERVK11-int|LTR|ERVK\t264307\tNM_001010874\t253017\tHs.227752\tNM_001010874\tENSG00000205678\tTECRL\tCPVT3|GPSN2L|SRD5A2L2|TERL\ttrans-2,3-enoyl-CoA reductase like\tprotein-coding\n",
+ "merged_common_90\tchr4\t54545647\t54546360\t+\t2.38644\tNA\tIntergenic\tIntergenic\t61128\tNR_134657\t339978\t\tNR_134657\tENSG00000250456\tLINC02260\t-\tlong intergenic non-protein coding RNA 2260\tncRNA\n",
+ "CTL_unique_51\tchr4\t62835674\t62836120\t+\t2.34453\tNA\tIntergenic\tIntergenic\t-674132\tNR_110595\t101927186\tHs.723269\tNR_110595\t\tADGRL3-AS1\tLPHN3-AS1\tadhesion G protein-coupled receptor L3 antisense RNA 1\tncRNA\n"
+ ]
+ }
+ ],
+ "source": [
+ "!head -4 Tutorial3/GenomeAnnotation/CTL_specific_Annotated.txt"
+ ]
+ },
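+ {
+ "cell_type": "markdown",
+ "id": "sketch-annotated-peaks-pandas",
+ "metadata": {},
+ "source": [
+ "The annotation file looks tab-delimited, so it can also be loaded into pandas for a more readable view. The following is a minimal sketch under that assumption. The column names are taken from the header printed above, and renaming the first column to `PeakID` is our own convenience, not something Homer does.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "from IPython.display import display\n",
+ "\n",
+ "annotated = pd.read_csv('Tutorial3/GenomeAnnotation/CTL_specific_Annotated.txt', sep='\\t')\n",
+ "\n",
+ "# The first header field also contains the full command line, so give it a short name\n",
+ "annotated = annotated.rename(columns={annotated.columns[0]: 'PeakID'})\n",
+ "\n",
+ "# Show a few of the most useful columns\n",
+ "display(annotated[['PeakID', 'Chr', 'Start', 'End', 'Annotation', 'Distance to TSS', 'Gene Name']].head())\n",
+ "\n",
+ "# Count peaks per basic annotation category (the text before any parenthesis)\n",
+ "display(annotated['Annotation'].str.split('(').str[0].str.strip().value_counts())\n",
+ "```\n",
+ "\n",
+ "Working from a DataFrame also makes it easy to filter peaks, for example by `Distance to TSS`."
+ ]
+ },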
+ {
+ "cell_type": "markdown",
+ "id": "7193cee9-f1cc-40b5-84be-a1aae44bd217",
+ "metadata": {},
+ "source": [
+ "Lastly, let's take a look at the gene ontology results."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "59b16ae4-3d55-4363-8d27-4d812a98b115",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "biocyc.txt\t\tinteractions.txt\t prints.txt\n",
+ "biological_process.txt\tinterpro.txt\t\t prosite.txt\n",
+ "cellular_component.txt\tkegg.txt\t\t reactome.txt\n",
+ "chromosome.txt\t\tlipidmaps.txt\t\t smart.txt\n",
+ "cosmic.txt\t\tmolecular_function.txt\t smpdb.txt\n",
+ "gene3d.txt\t\tmsigdb.txt\t\t wikipathways.txt\n",
+ "geneOntology.html\tpathwayInteractionDB.txt\n",
+ "gwas.txt\t\tpfam.txt\n"
+ ]
+ }
+ ],
+ "source": [
+ "#list the files in our GO directory\n",
+ "!ls Tutorial3/GenomeAnnotation/CTL_GO/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b2ea0509-a6a4-4df0-8d69-dec674e7e3f1",
+ "metadata": {},
+ "source": [
+ "Let's view the top terms in the biological_process category."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "98cdaf7b-a83b-40de-bff4-9587642d60fb",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " TermID Term \\\n",
+ "0 GO:0052695 cellular glucuronidation \n",
+ "1 GO:0006063 uronic acid metabolic process \n",
+ "2 GO:0019585 glucuronate metabolic process \n",
+ "3 GO:0010817 regulation of hormone levels \n",
+ "4 GO:0006068 ethanol catabolic process \n",
+ "5 GO:0034310 primary alcohol catabolic process \n",
+ "6 GO:0051923 sulfation \n",
+ "7 GO:0006067 ethanol metabolic process \n",
+ "8 GO:0050427 3'-phosphoadenosine 5'-phosphosulfate metaboli... \n",
+ "9 GO:0034035 purine ribonucleoside bisphosphate metabolic p... \n",
+ "\n",
+ " Enrichment logP Genes in Term Target Genes in Term \\\n",
+ "0 8.476896e-07 -13.980751 18 3 \n",
+ "1 1.833506e-06 -13.209280 23 3 \n",
+ "2 1.833506e-06 -13.209280 23 3 \n",
+ "3 1.370570e-05 -11.197699 528 6 \n",
+ "4 7.141780e-05 -9.546963 12 2 \n",
+ "5 1.134005e-04 -9.084585 15 2 \n",
+ "6 1.295173e-04 -8.951696 16 2 \n",
+ "7 2.259286e-04 -8.395291 21 2 \n",
+ "8 2.963633e-04 -8.123925 24 2 \n",
+ "9 3.219272e-04 -8.041185 25 2 \n",
+ "\n",
+ " Fraction of Targets in Term Total Target Genes Total Genes \\\n",
+ "0 0.15 20 18680 \n",
+ "1 0.15 20 18680 \n",
+ "2 0.15 20 18680 \n",
+ "3 0.30 20 18680 \n",
+ "4 0.10 20 18680 \n",
+ "5 0.10 20 18680 \n",
+ "6 0.10 20 18680 \n",
+ "7 0.10 20 18680 \n",
+ "8 0.10 20 18680 \n",
+ "9 0.10 20 18680 \n",
+ "\n",
+ " Entrez Gene IDs Gene Symbols \n",
+ "0 79799,10941,7364 UGT2A3,UGT2A1,UGT2B7 \n",
+ "1 79799,7364,10941 UGT2A3,UGT2B7,UGT2A1 \n",
+ "2 79799,10941,7364 UGT2A3,UGT2A1,UGT2B7 \n",
+ "3 6783,5978,9575,7364,27284,2044 SULT1E1,REST,CLOCK,UGT2B7,SULT1B1,EPHA5 \n",
+ "4 27284,6783 SULT1B1,SULT1E1 \n",
+ "5 6783,27284 SULT1E1,SULT1B1 \n",
+ "6 27284,6783 SULT1B1,SULT1E1 \n",
+ "7 27284,6783 SULT1B1,SULT1E1 \n",
+ "8 27284,6783 SULT1B1,SULT1E1 \n",
+ "9 6783,27284 SULT1E1,SULT1B1 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "bp_GO = pd.read_csv(\"Tutorial3/GenomeAnnotation/CTL_GO/biological_process.txt\", sep='\\t')\n",
+ "\n",
+ "#keep most significant\n",
+ "bp_GO_top10 = bp_GO.nsmallest(10, \"logP\")\n",
+ "display(bp_GO_top10)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d86df868-6af0-4bf9-880f-62142230bdc4",
+ "metadata": {},
+ "source": [
+ "We can also plot the enrichment scores.\n",
+ "\n",
+ "Note that our results may look a little odd because we have severely downsampled the data to run quickly and focus on a single region of chr4. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "0aec57a7-412d-41fd-8aea-d8b164faa49b",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "bp_GO_top10.plot.bar(x=\"Term\", y=\"Enrichment\")"
+ ]
+ },
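+ {
+ "cell_type": "markdown",
+ "id": "sketch-go-neg-logp-plot",
+ "metadata": {},
+ "source": [
+ "Judging from the table above, Homer's `Enrichment` column appears to hold the raw p-value (for the top term, the natural log of 8.48e-07 is about -13.98, which matches `logP`), so the shortest bars in the plot above are the most significant terms. As an optional sketch, plotting the negated `logP` instead makes the most significant terms the longest bars. The column name `neg_logP` is our own.\n",
+ "\n",
+ "```python\n",
+ "import pandas as pd\n",
+ "\n",
+ "bp_GO = pd.read_csv('Tutorial3/GenomeAnnotation/CTL_GO/biological_process.txt', sep='\\t')\n",
+ "\n",
+ "# Ten most significant terms (most negative logP)\n",
+ "bp_GO_top10 = bp_GO.nsmallest(10, 'logP')\n",
+ "\n",
+ "# Negate logP so that larger values mean stronger significance\n",
+ "bp_GO_top10 = bp_GO_top10.assign(neg_logP=-bp_GO_top10['logP'])\n",
+ "\n",
+ "# Horizontal bars keep the long term names readable\n",
+ "bp_GO_top10.plot.barh(x='Term', y='neg_logP', legend=False)\n",
+ "```\n",
+ "\n",
+ "The ranking of terms is the same as before; only the axis changes."
+ ]
+ },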
+ {
+ "cell_type": "markdown",
+ "id": "a6dcf583-f555-48b6-b330-b9bdfdd295da",
+ "metadata": {},
+ "source": [
+ "Homer also saves an html file where you can navigate through the various categories."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "deac4a6f-882c-42eb-acb5-f74712bbec2a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#View the html results\n",
+ "IFrame(src='Tutorial3/GenomeAnnotation/CTL_GO/geneOntology.html', width=900, height=600)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c1e519a4-f4b8-4641-b03d-2e8d51ad9e6f",
+ "metadata": {},
+ "source": [
+ "In the above html you can click through the different ontology categories to view enriched terms and scores for genes near our differential peaks. Note that there are links to motifs, but these lead to \"pages not found\" because we have yet to do this analysis. We will run motif analysis in the next section using TOBIAS."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ff448764-630e-4898-8321-19bb6f6d357e",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Motif Footprinting\n",
+ "
\n",
+ "\n",
+ "### ATAC-seq can be used to identify accessibility at transcription factor (TF) binding sites. We'll use [tobias](https://anaconda.org/bioconda/tobias).\n",
+ "\n",
+ " \n",
+ "\n",
+ "From: [Bentsen et al., Nat. Comm. 2020](https://www.nature.com/articles/s41467-020-18035-1)\n",
+ "\n",
+ "Tn5 insertion during ATAC-seq has a sequence bias. In our first step, let's correct for that bias."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "bd4b3b34-2e1a-40f3-9f41-ad0bd6ae20f5",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "# TOBIAS 0.13.3 ATACorrect (run started 2022-09-08 17:20:57.460575)\n",
+ "# Working directory: /home/jupyter\n",
+ "# Command line call: TOBIAS ATACorrect --bam Tutorial3/InputFiles/CTL_dedup.bam --genome Tutorial3/InputFiles/chr4.fa --peaks Tutorial3/MotifFootprinting/MasterPeakList.bed --outdir Tutorial3/MotifFootprinting --prefix CTL --cores 3 --verbosity 1\n",
+ "\n",
+ "# ----- Input parameters -----\n",
+ "# bam:\tTutorial3/InputFiles/CTL_dedup.bam\n",
+ "# genome:\tTutorial3/InputFiles/chr4.fa\n",
+ "# peaks:\tTutorial3/MotifFootprinting/MasterPeakList.bed\n",
+ "# regions_in:\tNone\n",
+ "# regions_out:\tNone\n",
+ "# blacklist:\tNone\n",
+ "# extend:\t100\n",
+ "# split_strands:\tFalse\n",
+ "# norm_off:\tFalse\n",
+ "# track_off:\t[]\n",
+ "# k_flank:\t12\n",
+ "# read_shift:\t[4, -5]\n",
+ "# bg_shift:\t100\n",
+ "# window:\t100\n",
+ "# score_mat:\tDWM\n",
+ "# prefix:\tCTL\n",
+ "# outdir:\t/home/jupyter/Tutorial3/MotifFootprinting\n",
+ "# cores:\t3\n",
+ "# split:\t100\n",
+ "# verbosity:\t1\n",
+ "\n",
+ "\n",
+ "# ----- Output files -----\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/CTL_uncorrected.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/CTL_bias.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/CTL_expected.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/CTL_corrected.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/CTL_atacorrect.pdf\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "# TOBIAS 0.13.3 ATACorrect (run started 2022-09-08 17:22:15.527430)\n",
+ "# Working directory: /home/jupyter\n",
+ "# Command line call: TOBIAS ATACorrect --bam Tutorial3/InputFiles/Mutant_dedup.bam --genome Tutorial3/InputFiles/chr4.fa --peaks Tutorial3/MotifFootprinting/MasterPeakList.bed --outdir Tutorial3/MotifFootprinting --prefix Mutant --cores 3 --verbosity 1\n",
+ "\n",
+ "# ----- Input parameters -----\n",
+ "# bam:\tTutorial3/InputFiles/Mutant_dedup.bam\n",
+ "# genome:\tTutorial3/InputFiles/chr4.fa\n",
+ "# peaks:\tTutorial3/MotifFootprinting/MasterPeakList.bed\n",
+ "# regions_in:\tNone\n",
+ "# regions_out:\tNone\n",
+ "# blacklist:\tNone\n",
+ "# extend:\t100\n",
+ "# split_strands:\tFalse\n",
+ "# norm_off:\tFalse\n",
+ "# track_off:\t[]\n",
+ "# k_flank:\t12\n",
+ "# read_shift:\t[4, -5]\n",
+ "# bg_shift:\t100\n",
+ "# window:\t100\n",
+ "# score_mat:\tDWM\n",
+ "# prefix:\tMutant\n",
+ "# outdir:\t/home/jupyter/Tutorial3/MotifFootprinting\n",
+ "# cores:\t3\n",
+ "# split:\t100\n",
+ "# verbosity:\t1\n",
+ "\n",
+ "\n",
+ "# ----- Output files -----\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/Mutant_uncorrected.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/Mutant_bias.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/Mutant_expected.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/Mutant_corrected.bw\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/Mutant_atacorrect.pdf\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Tn5 has an insertion sequence bias, which TOBIAS can correct for. We'll use the master list of peaks produced by MAnorm, but first we need to remove the header and extra columns.\n",
+ "!cat Tutorial3/DiffPeaks/CTL_vs_Mutant_all_MAvalues.xls | cut -f 1-3 | grep -v start > Tutorial3/MotifFootprinting/MasterPeakList.bed\n",
+ "\n",
+ "#Now let's do the signal correction\n",
+ "!TOBIAS ATACorrect --bam Tutorial3/InputFiles/CTL_dedup.bam --genome Tutorial3/InputFiles/chr4.fa --peaks Tutorial3/MotifFootprinting/MasterPeakList.bed --outdir Tutorial3/MotifFootprinting --prefix CTL --cores $numthreadsint --verbosity 1\n",
+ "#Let's also do this for the mutant\n",
+ "!TOBIAS ATACorrect --bam Tutorial3/InputFiles/Mutant_dedup.bam --genome Tutorial3/InputFiles/chr4.fa --peaks Tutorial3/MotifFootprinting/MasterPeakList.bed --outdir Tutorial3/MotifFootprinting --prefix Mutant --cores $numthreadsint --verbosity 1\n",
+ "\n",
+ "print(\"done\")"
+ ]
+ },
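+ {
+ "cell_type": "markdown",
+ "id": "sketch-atacorrect-sample-loop",
+ "metadata": {},
+ "source": [
+ "The two ATACorrect calls above differ only in the sample name. If you adapt this module to more samples, a loop avoids copy-and-paste errors. The sketch below uses Python's `subprocess` with the same TOBIAS options shown above. The `samples` list and the `num_cores` value are placeholders to adjust for your own data.\n",
+ "\n",
+ "```python\n",
+ "import subprocess\n",
+ "\n",
+ "samples = ['CTL', 'Mutant']  # placeholder sample prefixes\n",
+ "num_cores = 3                # placeholder; match your machine\n",
+ "\n",
+ "for sample in samples:\n",
+ "    cmd = [\n",
+ "        'TOBIAS', 'ATACorrect',\n",
+ "        '--bam', f'Tutorial3/InputFiles/{sample}_dedup.bam',\n",
+ "        '--genome', 'Tutorial3/InputFiles/chr4.fa',\n",
+ "        '--peaks', 'Tutorial3/MotifFootprinting/MasterPeakList.bed',\n",
+ "        '--outdir', 'Tutorial3/MotifFootprinting',\n",
+ "        '--prefix', sample,\n",
+ "        '--cores', str(num_cores),\n",
+ "        '--verbosity', '1',\n",
+ "    ]\n",
+ "    # check=True stops the loop if TOBIAS exits with an error\n",
+ "    subprocess.run(cmd, check=True)\n",
+ "```\n",
+ "\n",
+ "Passing the command as a list avoids shell quoting problems if a path contains spaces."
+ ]
+ },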
+ {
+ "cell_type": "markdown",
+ "id": "21da1001-529f-4e62-8e44-2f475bfd35d4",
+ "metadata": {},
+ "source": [
+ "Now let's use the bias-corrected bigwig files to calculate footprint scores around peaks."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "6ca0ae04-b409-4c5d-b318-0887edc0f3c8",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "# TOBIAS 0.13.3 ScoreBigwig (run started 2022-09-08 17:23:22.614510)\n",
+ "# Working directory: /home/jupyter\n",
+ "# Command line call: TOBIAS ScoreBigwig -s Tutorial3/MotifFootprinting/CTL_corrected.bw -r Tutorial3/MotifFootprinting/MasterPeakList.bed -o Tutorial3/MotifFootprinting/CTL_footprintscores.bw --cores 3 --verbosity 1\n",
+ "\n",
+ "# ----- Input parameters -----\n",
+ "# signal:\tTutorial3/MotifFootprinting/CTL_corrected.bw\n",
+ "# output:\tTutorial3/MotifFootprinting/CTL_footprintscores.bw\n",
+ "# regions:\tTutorial3/MotifFootprinting/MasterPeakList.bed\n",
+ "# score:\tfootprint\n",
+ "# absolute:\tFalse\n",
+ "# extend:\t100\n",
+ "# smooth:\t1\n",
+ "# min_limit:\tNone\n",
+ "# max_limit:\tNone\n",
+ "# fp_min:\t20\n",
+ "# fp_max:\t50\n",
+ "# flank_min:\t10\n",
+ "# flank_max:\t30\n",
+ "# window:\t100\n",
+ "# cores:\t3\n",
+ "# split:\t100\n",
+ "# verbosity:\t1\n",
+ "\n",
+ "\n",
+ "# ----- Output files -----\n",
+ "# Tutorial3/MotifFootprinting/CTL_footprintscores.bw\n",
+ "\n",
+ "\n",
+ "\n",
+ "# TOBIAS 0.13.3 ScoreBigwig (run started 2022-09-08 17:23:51.559028)\n",
+ "# Working directory: /home/jupyter\n",
+ "# Command line call: TOBIAS ScoreBigwig -s Tutorial3/MotifFootprinting/Mutant_corrected.bw -r Tutorial3/MotifFootprinting/MasterPeakList.bed -o Tutorial3/MotifFootprinting/Mutant_footprintscores.bw --cores 3 --verbosity 1\n",
+ "\n",
+ "# ----- Input parameters -----\n",
+ "# signal:\tTutorial3/MotifFootprinting/Mutant_corrected.bw\n",
+ "# output:\tTutorial3/MotifFootprinting/Mutant_footprintscores.bw\n",
+ "# regions:\tTutorial3/MotifFootprinting/MasterPeakList.bed\n",
+ "# score:\tfootprint\n",
+ "# absolute:\tFalse\n",
+ "# extend:\t100\n",
+ "# smooth:\t1\n",
+ "# min_limit:\tNone\n",
+ "# max_limit:\tNone\n",
+ "# fp_min:\t20\n",
+ "# fp_max:\t50\n",
+ "# flank_min:\t10\n",
+ "# flank_max:\t30\n",
+ "# window:\t100\n",
+ "# cores:\t3\n",
+ "# split:\t100\n",
+ "# verbosity:\t1\n",
+ "\n",
+ "\n",
+ "# ----- Output files -----\n",
+ "# Tutorial3/MotifFootprinting/Mutant_footprintscores.bw\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "!TOBIAS ScoreBigwig -s Tutorial3/MotifFootprinting/CTL_corrected.bw -r Tutorial3/MotifFootprinting/MasterPeakList.bed -o Tutorial3/MotifFootprinting/CTL_footprintscores.bw --cores $numthreadsint --verbosity 1\n",
+ "\n",
+ "#Let's do the same for our mutant sample\n",
+ "!TOBIAS ScoreBigwig -s Tutorial3/MotifFootprinting/Mutant_corrected.bw -r Tutorial3/MotifFootprinting/MasterPeakList.bed -o Tutorial3/MotifFootprinting/Mutant_footprintscores.bw --cores $numthreadsint --verbosity 1\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "affb7253-ed1b-4b21-8377-47f20f868eab",
+ "metadata": {},
+ "source": [
+ "Now that we have our corrected signal and footprint scores, let's do TF binding site prediction as well as differential footprinting.\n",
+ "\n",
+ "Caution: this step scans the signal at every location that matches a motif in your JASPAR file. Here we use all the motifs in the JASPAR 2022 CORE vertebrates non-redundant set, so it can take several minutes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "688f95c8-2649-4f7b-aa91-43ac8f4cd163",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--2022-09-08 17:24:20-- https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt\n",
+ "Resolving jaspar.genereg.net (jaspar.genereg.net)... 193.60.222.202\n",
+ "Connecting to jaspar.genereg.net (jaspar.genereg.net)|193.60.222.202|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 327864 (320K) [text/plain]\n",
+ "Saving to: ‘Tutorial3/MotifFootprinting/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt.2’\n",
+ "\n",
+ "JASPAR2022_CORE_ver 100%[===================>] 320.18K 819KB/s in 0.4s \n",
+ "\n",
+ "2022-09-08 17:24:21 (819 KB/s) - ‘Tutorial3/MotifFootprinting/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt.2’ saved [327864/327864]\n",
+ "\n",
+ "# TOBIAS 0.13.3 BINDetect (run started 2022-09-08 17:24:29.162477)\n",
+ "# Working directory: /home/jupyter\n",
+ "# Command line call: TOBIAS BINDetect --motifs Tutorial3/MotifFootprinting/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt --signals Tutorial3/MotifFootprinting/CTL_footprintscores.bw Tutorial3/MotifFootprinting/Mutant_footprintscores.bw --genome Tutorial3/InputFiles/chr4.fa --peaks Tutorial3/MotifFootprinting/MasterPeakList.bed --outdir Tutorial3/MotifFootprinting/DiffMotifs --cond_names CTL Mutant --cores 3 --verbosity 1\n",
+ "\n",
+ "# ----- Input parameters -----\n",
+ "# signals:\t['Tutorial3/MotifFootprinting/CTL_footprintscores.bw', 'Tutorial3/MotifFootprinting/Mutant_footprintscores.bw']\n",
+ "# peaks:\tTutorial3/MotifFootprinting/MasterPeakList.bed\n",
+ "# motifs:\t['Tutorial3/MotifFootprinting/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt']\n",
+ "# genome:\tTutorial3/InputFiles/chr4.fa\n",
+ "# cond_names:\t['CTL', 'Mutant']\n",
+ "# peak_header:\tNone\n",
+ "# naming:\tname_id\n",
+ "# motif_pvalue:\t0.0001\n",
+ "# bound_pvalue:\t0.001\n",
+ "# pseudo:\tNone\n",
+ "# time_series:\tFalse\n",
+ "# skip_excel:\tFalse\n",
+ "# output_peaks:\tNone\n",
+ "# norm_off:\tFalse\n",
+ "# prefix:\tbindetect\n",
+ "# outdir:\t/home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs\n",
+ "# cores:\t3\n",
+ "# split:\t100\n",
+ "# debug:\tFalse\n",
+ "# verbosity:\t1\n",
+ "\n",
+ "\n",
+ "# ----- Output files -----\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/beds/*_CTL_bound.bed\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/beds/*_CTL_unbound.bed\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/beds/*_Mutant_bound.bed\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/beds/*_Mutant_unbound.bed\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/beds/*_all.bed\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/plots/*_log2fcs.pdf\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/*_overview.txt\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/*/*_overview.xlsx\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/bindetect_distances.txt\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/bindetect_results.txt\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/bindetect_results.xlsx\n",
+ "# /home/jupyter/Tutorial3/MotifFootprinting/DiffMotifs/bindetect_figures.pdf\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "done\n"
+ ]
+ }
+ ],
+ "source": [
+ "#First, we'll download the current jaspar motifs\n",
+ "!wget https://jaspar.genereg.net/download/data/2022/CORE/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt -P Tutorial3/MotifFootprinting/\n",
+ "\n",
+ "#Next we can calculate statistics for each motif represented in our jaspar motif file. If we list both our CTL and Mutant sample, it will calculate the differential footprint score for us as well.\n",
+ "!TOBIAS BINDetect --motifs Tutorial3/MotifFootprinting/JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt --signals Tutorial3/MotifFootprinting/CTL_footprintscores.bw Tutorial3/MotifFootprinting/Mutant_footprintscores.bw --genome Tutorial3/InputFiles/chr4.fa --peaks Tutorial3/MotifFootprinting/MasterPeakList.bed --outdir Tutorial3/MotifFootprinting/DiffMotifs --cond_names CTL Mutant --cores $numthreadsint --verbosity 1\n",
+ "\n",
+ "print(\"done\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "880bab0f-ee58-4f1b-8e62-e1e6e1811809",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 24,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "#View the html results\n",
+ "IFrame(src='Tutorial3/MotifFootprinting/DiffMotifs/bindetect_CTL_Mutant.html', width=900, height=600)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5180387-42ea-4d87-b1f2-2d320640af2b",
+ "metadata": {},
+ "source": [
+ "In the above html file you can hover over each point to see the motif name and the sequence. This type of plot is a volcano plot showing the differential signal on the x-axis and the significance values on the y-axis.\n",
+ "\n",
+ "For example, the original paper focused on TP63, which is one of our differential dots in the html file. \n",
+ "\n",
+ " \n",
+ "\n",
+ "Let's visualize the average footprint at TP63 motifs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "f9683a4e-48ba-426b-91cb-5befa06b1d32",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "# TOBIAS 0.13.3 PlotAggregate (run started 2022-09-08 17:32:40.206683)\n",
+ "# Working directory: /home/jupyter\n",
+ "# Command line call: TOBIAS PlotAggregate --TFBS Tutorial3/MotifFootprinting/DiffMotifs/TP63_MA0525.2/beds/TP63_MA0525.2_all.bed --signals Tutorial3/MotifFootprinting/CTL_corrected.bw Tutorial3/MotifFootprinting/Mutant_corrected.bw --output Tutorial3/MotifFootprinting/TP63_footprint_compare.png --share_y both --verbosity 1 --plot_boundaries --flank 60 --smooth 2 --signal-on-x\n",
+ "\n",
+ "# ----- Input parameters -----\n",
+ "# TFBS:\t['Tutorial3/MotifFootprinting/DiffMotifs/TP63_MA0525.2/beds/TP63_MA0525.2_all.bed']\n",
+ "# signals:\t['Tutorial3/MotifFootprinting/CTL_corrected.bw', 'Tutorial3/MotifFootprinting/Mutant_corrected.bw']\n",
+ "# regions:\t[]\n",
+ "# whitelist:\t[]\n",
+ "# blacklist:\t[]\n",
+ "# output:\tTutorial3/MotifFootprinting/TP63_footprint_compare.png\n",
+ "# output_txt:\tNone\n",
+ "# title:\tAggregated signals\n",
+ "# flank:\t60\n",
+ "# TFBS_labels:\tNone\n",
+ "# signal_labels:\tNone\n",
+ "# region_labels:\tNone\n",
+ "# share_y:\tboth\n",
+ "# normalize:\tFalse\n",
+ "# negate:\tFalse\n",
+ "# smooth:\t2\n",
+ "# log_transform:\tFalse\n",
+ "# plot_boundaries:\tTrue\n",
+ "# signal_on_x:\tTrue\n",
+ "# remove_outliers:\t1\n",
+ "# verbosity:\t1\n",
+ "\n",
+ "\n",
+ "# ----- Output files -----\n",
+ "# Tutorial3/MotifFootprinting/TP63_footprint_compare.png\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Plot the aggregate bias-corrected signal around TP63 motif sites for both samples\n",
+ "!TOBIAS PlotAggregate --TFBS Tutorial3/MotifFootprinting/DiffMotifs/TP63_MA0525.2/beds/TP63_MA0525.2_all.bed --signals Tutorial3/MotifFootprinting/CTL_corrected.bw Tutorial3/MotifFootprinting/Mutant_corrected.bw --output Tutorial3/MotifFootprinting/TP63_footprint_compare.png --share_y both --verbosity 1 --plot_boundaries --flank 60 --smooth 2 --signal-on-x"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "ad9b80da-1111-4079-a308-d7cb6ddc2a2d",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " \n",
+ " "
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 26,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "IFrame(src='Tutorial3/MotifFootprinting/TP63_footprint_compare.png', width=600, height=400) "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a0b72adc-4b71-4234-9549-6381a56bfb39",
+ "metadata": {},
+ "source": [
+ "We can also get all the motifs that have differential footprints:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "5a06ef84-9a27-4d55-be1d-fed572711daa",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " output_prefix \n",
+ " name \n",
+ " motif_id \n",
+ " cluster \n",
+ " total_tfbs \n",
+ " CTL_mean_score \n",
+ " CTL_bound \n",
+ " Mutant_mean_score \n",
+ " Mutant_bound \n",
+ " CTL_Mutant_change \n",
+ " CTL_Mutant_pvalue \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 0 \n",
+ " Arnt_MA0004.1 \n",
+ " Arnt \n",
+ " MA0004.1 \n",
+ " C_MYC \n",
+ " 38 \n",
+ " 85.96989 \n",
+ " 16 \n",
+ " 108.49836 \n",
+ " 17 \n",
+ " -0.38044 \n",
+ " 6.770010e-46 \n",
+ " \n",
+ " \n",
+ " 1 \n",
+ " AhrArnt_MA0006.1 \n",
+ " Ahr::Arnt \n",
+ " MA0006.1 \n",
+ " C_Ahr::Arnt \n",
+ " 48 \n",
+ " 122.76261 \n",
+ " 24 \n",
+ " 115.51752 \n",
+ " 22 \n",
+ " 0.16911 \n",
+ " 1.449010e-19 \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " Ddit3Cebpa_MA0019.1 \n",
+ " Ddit3::Cebpa \n",
+ " MA0019.1 \n",
+ " C_Ddit3::Cebpa \n",
+ " 62 \n",
+ " 80.02984 \n",
+ " 24 \n",
+ " 80.33066 \n",
+ " 23 \n",
+ " 0.01484 \n",
+ " 3.154620e-01 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " Mecom_MA0029.1 \n",
+ " Mecom \n",
+ " MA0029.1 \n",
+ " C_Mecom \n",
+ " 74 \n",
+ " 61.73375 \n",
+ " 20 \n",
+ " 58.21405 \n",
+ " 21 \n",
+ " 0.04212 \n",
+ " 6.585970e-03 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " FOXF2_MA0030.1 \n",
+ " FOXF2 \n",
+ " MA0030.1 \n",
+ " C_FOXD1 \n",
+ " 70 \n",
+ " 54.91816 \n",
+ " 16 \n",
+ " 57.50246 \n",
+ " 12 \n",
+ " 0.02761 \n",
+ " 8.515360e-02 \n",
+ " \n",
+ " \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " ... \n",
+ " \n",
+ " \n",
+ " 836 \n",
+ " ZNF281_MA1630.2 \n",
+ " ZNF281 \n",
+ " MA1630.2 \n",
+ " C_ZNF281 \n",
+ " 442 \n",
+ " 87.47939 \n",
+ " 177 \n",
+ " 93.39031 \n",
+ " 192 \n",
+ " -0.09821 \n",
+ " 5.581870e-38 \n",
+ " \n",
+ " \n",
+ " 837 \n",
+ " BACH1_MA1633.2 \n",
+ " BACH1 \n",
+ " MA1633.2 \n",
+ " C_JUNB \n",
+ " 292 \n",
+ " 107.34839 \n",
+ " 180 \n",
+ " 93.78244 \n",
+ " 132 \n",
+ " 0.44431 \n",
+ " 1.568740e-86 \n",
+ " \n",
+ " \n",
+ " 838 \n",
+ " Prdm4_MA1647.2 \n",
+ " Prdm4 \n",
+ " MA1647.2 \n",
+ " C_Prdm4 \n",
+ " 90 \n",
+ " 65.87505 \n",
+ " 28 \n",
+ " 60.36949 \n",
+ " 24 \n",
+ " 0.21506 \n",
+ " 1.012680e-39 \n",
+ " \n",
+ " \n",
+ " 839 \n",
+ " THAP1_MA0597.2 \n",
+ " THAP1 \n",
+ " MA0597.2 \n",
+ " C_THAP1 \n",
+ " 236 \n",
+ " 90.56928 \n",
+ " 101 \n",
+ " 82.79266 \n",
+ " 92 \n",
+ " 0.16360 \n",
+ " 6.101670e-48 \n",
+ " \n",
+ " \n",
+ " 840 \n",
+ " NR5A1_MA1540.2 \n",
+ " NR5A1 \n",
+ " MA1540.2 \n",
+ " C_NR5A1 \n",
+ " 88 \n",
+ " 60.87309 \n",
+ " 20 \n",
+ " 63.31646 \n",
+ " 24 \n",
+ " 0.00384 \n",
+ " 2.234170e-01 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
841 rows × 11 columns
\n",
+ "
"
+ ],
+ "text/plain": [
+ " output_prefix name motif_id cluster total_tfbs \\\n",
+ "0 Arnt_MA0004.1 Arnt MA0004.1 C_MYC 38 \n",
+ "1 AhrArnt_MA0006.1 Ahr::Arnt MA0006.1 C_Ahr::Arnt 48 \n",
+ "2 Ddit3Cebpa_MA0019.1 Ddit3::Cebpa MA0019.1 C_Ddit3::Cebpa 62 \n",
+ "3 Mecom_MA0029.1 Mecom MA0029.1 C_Mecom 74 \n",
+ "4 FOXF2_MA0030.1 FOXF2 MA0030.1 C_FOXD1 70 \n",
+ ".. ... ... ... ... ... \n",
+ "836 ZNF281_MA1630.2 ZNF281 MA1630.2 C_ZNF281 442 \n",
+ "837 BACH1_MA1633.2 BACH1 MA1633.2 C_JUNB 292 \n",
+ "838 Prdm4_MA1647.2 Prdm4 MA1647.2 C_Prdm4 90 \n",
+ "839 THAP1_MA0597.2 THAP1 MA0597.2 C_THAP1 236 \n",
+ "840 NR5A1_MA1540.2 NR5A1 MA1540.2 C_NR5A1 88 \n",
+ "\n",
+ " CTL_mean_score CTL_bound Mutant_mean_score Mutant_bound \\\n",
+ "0 85.96989 16 108.49836 17 \n",
+ "1 122.76261 24 115.51752 22 \n",
+ "2 80.02984 24 80.33066 23 \n",
+ "3 61.73375 20 58.21405 21 \n",
+ "4 54.91816 16 57.50246 12 \n",
+ ".. ... ... ... ... \n",
+ "836 87.47939 177 93.39031 192 \n",
+ "837 107.34839 180 93.78244 132 \n",
+ "838 65.87505 28 60.36949 24 \n",
+ "839 90.56928 101 82.79266 92 \n",
+ "840 60.87309 20 63.31646 24 \n",
+ "\n",
+ " CTL_Mutant_change CTL_Mutant_pvalue \n",
+ "0 -0.38044 6.770010e-46 \n",
+ "1 0.16911 1.449010e-19 \n",
+ "2 0.01484 3.154620e-01 \n",
+ "3 0.04212 6.585970e-03 \n",
+ "4 0.02761 8.515360e-02 \n",
+ ".. ... ... \n",
+ "836 -0.09821 5.581870e-38 \n",
+ "837 0.44431 1.568740e-86 \n",
+ "838 0.21506 1.012680e-39 \n",
+ "839 0.16360 6.101670e-48 \n",
+ "840 0.00384 2.234170e-01 \n",
+ "\n",
+ "[841 rows x 11 columns]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Load the BINDetect results into a pandas DataFrame\n",
+ "dframe = pd.read_csv(\"Tutorial3/MotifFootprinting/DiffMotifs/bindetect_results.txt\", sep='\\t')\n",
+ "display(dframe)\n",
+ "# Keep the motifs whose CTL vs. Mutant footprint change has p < 0.05\n",
+ "DiffMotifs = dframe[dframe['CTL_Mutant_pvalue'] < 0.05]\n",
+ "# Write out to a tab-separated file\n",
+ "DiffMotifs.to_csv('Tutorial3/MotifFootprinting/DiffMotifs_p05.txt', sep='\\t')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "12439dc4-b9ad-49fa-9ccb-4383a272bcaf",
+ "metadata": {},
+ "source": [
+ "\n",
+ "Great job! \n",
+ "
\n",
+ "Thank you for completing these tutorials. Feel free to download these notebooks, customize them, and use them to process your own data.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b3de930a-e91c-4d95-8d78-f98689f503ed",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m94",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m94"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/1-d10-run-first.ipynb b/tutorials/notebooks/DL-gwas-gcp-example/1-d10-run-first.ipynb
new file mode 100644
index 0000000..9e1fd1e
--- /dev/null
+++ b/tutorials/notebooks/DL-gwas-gcp-example/1-d10-run-first.ipynb
@@ -0,0 +1,1205 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "ddee0923-844f-42c8-9273-7e32d7178628",
+ "metadata": {
+ "tags": [
+ "skip"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: pip in /usr/local/lib/python3.8/dist-packages (22.0.4)\n",
+ "Collecting pip\n",
+ " Downloading pip-22.3.1-py3-none-any.whl (2.1 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.1/2.1 MB\u001b[0m \u001b[31m29.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0ma \u001b[36m0:00:01\u001b[0m\n",
+ "\u001b[?25hInstalling collected packages: pip\n",
+ "Successfully installed pip-22.3.1\n",
+ "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 22.3.1 is available.\n",
+ "You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n",
+ "\u001b[0mDefaulting to user installation because normal site-packages is not writeable\n",
+ "Collecting pendulum==2.1.2\n",
+ " Downloading pendulum-2.1.2-cp38-cp38-manylinux1_x86_64.whl (155 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m155.7/155.7 kB\u001b[0m \u001b[31m6.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting pytzdata>=2020.1\n",
+ " Downloading pytzdata-2020.1-py2.py3-none-any.whl (489 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m490.0/490.0 kB\u001b[0m \u001b[31m25.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: python-dateutil<3.0,>=2.6 in /usr/local/lib/python3.8/dist-packages (from pendulum==2.1.2) (2.8.2)\n",
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil<3.0,>=2.6->pendulum==2.1.2) (1.16.0)\n",
+ "Installing collected packages: pytzdata, pendulum\n",
+ "Successfully installed pendulum-2.1.2 pytzdata-2020.1\n",
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: pandas==1.4.4 in /usr/local/lib/python3.8/dist-packages (1.4.4)\n",
+ "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.8/dist-packages (from pandas==1.4.4) (2022.5)\n",
+ "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.8/dist-packages (from pandas==1.4.4) (2.8.2)\n",
+ "Requirement already satisfied: numpy>=1.18.5 in /usr/local/lib/python3.8/dist-packages (from pandas==1.4.4) (1.19.5)\n",
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.8.1->pandas==1.4.4) (1.16.0)\n",
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: matplotlib==3.3.4 in /usr/local/lib/python3.8/dist-packages (3.3.4)\n",
+ "Requirement already satisfied: numpy>=1.15 in /usr/local/lib/python3.8/dist-packages (from matplotlib==3.3.4) (1.19.5)\n",
+ "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib==3.3.4) (9.2.0)\n",
+ "Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib==3.3.4) (2.8.2)\n",
+ "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /usr/local/lib/python3.8/dist-packages (from matplotlib==3.3.4) (3.0.9)\n",
+ "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib==3.3.4) (0.11.0)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib==3.3.4) (1.4.4)\n",
+ "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.8/dist-packages (from python-dateutil>=2.1->matplotlib==3.3.4) (1.16.0)\n",
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Collecting tensorflow==2.11.0\n",
+ " Downloading tensorflow-2.11.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588.3 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m588.3/588.3 MB\u001b[0m \u001b[31m4.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (1.14.1)\n",
+ "Collecting tensorboard<2.12,>=2.11\n",
+ " Downloading tensorboard-2.11.0-py3-none-any.whl (6.0 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.0/6.0 MB\u001b[0m \u001b[31m100.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (21.3)\n",
+ "Collecting tensorflow-estimator<2.12,>=2.11.0\n",
+ " Downloading tensorflow_estimator-2.11.0-py2.py3-none-any.whl (439 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m439.2/439.2 kB\u001b[0m \u001b[31m57.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting tensorflow-io-gcs-filesystem>=0.23.1\n",
+ " Downloading tensorflow_io_gcs_filesystem-0.29.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (2.4 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.4/2.4 MB\u001b[0m \u001b[31m92.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting opt-einsum>=2.3.2\n",
+ " Downloading opt_einsum-3.3.0-py3-none-any.whl (65 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m65.5/65.5 kB\u001b[0m \u001b[31m10.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting protobuf<3.20,>=3.9.2\n",
+ " Downloading protobuf-3.19.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.1/1.1 MB\u001b[0m \u001b[31m92.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting google-pasta>=0.1.1\n",
+ " Downloading google_pasta-0.2.0-py3-none-any.whl (57 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m57.5/57.5 kB\u001b[0m \u001b[31m11.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting libclang>=13.0.0\n",
+ " Downloading libclang-14.0.6-py2.py3-none-manylinux2010_x86_64.whl (14.1 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m14.1/14.1 MB\u001b[0m \u001b[31m109.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hCollecting absl-py>=1.0.0\n",
+ " Downloading absl_py-1.3.0-py3-none-any.whl (124 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m124.6/124.6 kB\u001b[0m \u001b[31m26.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting gast<=0.4.0,>=0.2.1\n",
+ " Downloading gast-0.4.0-py3-none-any.whl (9.8 kB)\n",
+ "Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (62.0.0)\n",
+ "Collecting astunparse>=1.6.0\n",
+ " Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)\n",
+ "Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (2.0.1)\n",
+ "Collecting flatbuffers>=2.0\n",
+ " Downloading flatbuffers-23.1.4-py2.py3-none-any.whl (26 kB)\n",
+ "Requirement already satisfied: six>=1.12.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (1.16.0)\n",
+ "Collecting keras<2.12,>=2.11.0\n",
+ " Downloading keras-2.11.0-py2.py3-none-any.whl (1.7 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m76.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting numpy>=1.20\n",
+ " Downloading numpy-1.24.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m17.3/17.3 MB\u001b[0m \u001b[31m108.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hCollecting h5py>=2.9.0\n",
+ " Downloading h5py-3.7.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.5/4.5 MB\u001b[0m \u001b[31m91.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: typing-extensions>=3.6.6 in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (3.10.0.2)\n",
+ "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /usr/local/lib/python3.8/dist-packages (from tensorflow==2.11.0) (1.50.0)\n",
+ "Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse>=1.6.0->tensorflow==2.11.0) (0.30.0)\n",
+ "Requirement already satisfied: requests<3,>=2.21.0 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.12,>=2.11->tensorflow==2.11.0) (2.28.1)\n",
+ "Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.8/dist-packages (from tensorboard<2.12,>=2.11->tensorflow==2.11.0) (1.34.0)\n",
+ "Collecting markdown>=2.6.8\n",
+ " Downloading Markdown-3.4.1-py3-none-any.whl (93 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m93.3/93.3 kB\u001b[0m \u001b[31m23.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting werkzeug>=1.0.1\n",
+ " Downloading Werkzeug-2.2.2-py3-none-any.whl (232 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m232.7/232.7 kB\u001b[0m \u001b[31m37.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting tensorboard-data-server<0.7.0,>=0.6.0\n",
+ " Downloading tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.9/4.9 MB\u001b[0m \u001b[31m114.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hCollecting google-auth-oauthlib<0.5,>=0.4.1\n",
+ " Downloading google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)\n",
+ "Collecting tensorboard-plugin-wit>=1.6.0\n",
+ " Downloading tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m781.3/781.3 kB\u001b[0m \u001b[31m83.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->tensorflow==2.11.0) (3.0.9)\n",
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (0.2.8)\n",
+ "Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (4.2.2)\n",
+ "Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.8/dist-packages (from google-auth<3,>=1.6.3->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (4.9)\n",
+ "Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.8/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (1.3.1)\n",
+ "Requirement already satisfied: importlib-metadata>=4.4 in /usr/local/lib/python3.8/dist-packages (from markdown>=2.6.8->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (5.0.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.21.0->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (3.2)\n",
+ "Requirement already satisfied: charset-normalizer<3,>=2 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.21.0->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (2.1.1)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.21.0->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (2021.5.30)\n",
+ "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests<3,>=2.21.0->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (1.26.12)\n",
+ "Requirement already satisfied: MarkupSafe>=2.1.1 in /usr/local/lib/python3.8/dist-packages (from werkzeug>=1.0.1->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (2.1.1)\n",
+ "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (3.10.0)\n",
+ "Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.8/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (0.4.8)\n",
+ "Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.12,>=2.11->tensorflow==2.11.0) (3.2.2)\n",
+ "Installing collected packages: tensorboard-plugin-wit, libclang, flatbuffers, werkzeug, tensorflow-io-gcs-filesystem, tensorflow-estimator, tensorboard-data-server, protobuf, numpy, keras, google-pasta, gast, astunparse, absl-py, opt-einsum, markdown, h5py, google-auth-oauthlib, tensorboard, tensorflow\n",
+ "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
+ "yellowbrick 1.3 requires numpy<1.20,>=1.16.0, but you have numpy 1.24.1 which is incompatible.\n",
+ "scipy 1.6.3 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.24.1 which is incompatible.\n",
+ "ml-metadata 0.30.0 requires absl-py<0.13,>=0.9, but you have absl-py 1.3.0 which is incompatible.\n",
+ "kserve 0.8.0 requires numpy~=1.19.2, but you have numpy 1.24.1 which is incompatible.\n",
+ "kfp 1.7.post0+41.gb3589b6c6 requires absl-py<=0.11,>=0.9, but you have absl-py 1.3.0 which is incompatible.\u001b[0m\u001b[31m\n",
+ "\u001b[0mSuccessfully installed absl-py-1.3.0 astunparse-1.6.3 flatbuffers-23.1.4 gast-0.4.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 h5py-3.7.0 keras-2.11.0 libclang-14.0.6 markdown-3.4.1 numpy-1.24.1 opt-einsum-3.3.0 protobuf-3.19.6 tensorboard-2.11.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-2.11.0 tensorflow-estimator-2.11.0 tensorflow-io-gcs-filesystem-0.29.0 werkzeug-2.2.2\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "# mark in Kale as skip\n",
+ "! pip3 install --upgrade pip\n",
+ "! pip3 install pendulum==2.1.2\n",
+ "! pip3 install pandas==1.4.4\n",
+ "! pip3 install matplotlib==3.3.4\n",
+ "! pip3 install --upgrade tensorflow==2.11.0\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "7352040d-d4ca-41af-81ca-8df30e7de6cb",
+ "metadata": {
+ "tags": [
+ "skip"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# mark in Kale as skip\n",
+ "import os\n",
+ "\n",
+ "if not os.getenv(\"IS_TESTING\"):\n",
+ " # Automatically restart kernel after installs\n",
+ " import IPython\n",
+ "\n",
+ " app = IPython.Application.instance()\n",
+ " app.kernel.do_shutdown(True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "619de0db-3878-4254-93a4-4404553d0ecf",
+ "metadata": {
+ "tags": [
+ "imports"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "# mark in Kale as imports\n",
+ "import pendulum\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import warnings\n",
+ "import tensorflow as tf\n",
+ "from tensorflow.keras.models import Model\n",
+ "from tensorflow.keras.layers import Dense, Flatten, Conv1D, Dropout, BatchNormalization, Lambda\n",
+ "from tensorflow.keras.regularizers import l1, l2, L1L2\n",
+ "import matplotlib.pyplot as plt\n",
+ "from scipy import stats\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "42103b5f-1c53-4047-99ba-690e5bb7d398",
+ "metadata": {
+ "tags": [
+ "skip"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'2023-01-051310'"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Copy the output of this cell into the `time` pipeline parameter (and into Katib, as a string with a single value)\n",
+ "pendulum.now(tz='America/New_York').format('YYYY-MM-DDHHmm')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "fd604bf6-0d00-4178-9deb-28465c915c29",
+ "metadata": {
+ "tags": [
+ "pipeline-parameters"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# mark in Kale as pipeline parameters\n",
+ "data_file_to_run = \"IMP_height.txt\"\n",
+ "\n",
+ "experiment_description = \"Soy Height GWAS\"\n",
+ "\n",
+ "learning_rate = 0.001 # 0.0001957\n",
+ "conv_1_dropout_rate = 0.50 # Dropout rate for first convolutional layer\n",
+ "conv_1_kernel_l1 = 0.2 # L1 and L2 regularization for the first Conv1D layer's weights\n",
+ "conv_1_kernel_l2 = 0.6\n",
+ "conv_1_bias_l2 = 0.6 # L2 regularization for the first Conv1D layer's bias\n",
+ "conv_1_activity_l2 = 0.00001 # L2 activity regularization for the first Conv1D layer\n",
+ "\n",
+ "conv_x_kernel_l1 = 0.3\n",
+ "conv_x_kernel_l2 = 0.3\n",
+ "conv_x_bias_l2 = 0.6\n",
+ "conv_x_activity_l2 = 0.0001\n",
+ "\n",
+ "dense_x_kernel_l1 = 0.3\n",
+ "dense_x_kernel_l2 = 0.6\n",
+ "dense_x_bias_l2 = 0.0001\n",
+ "dense_x_activity_l2 = 0.0001\n",
+ "\n",
+ "dense_out_kernel_l1 = 0.1\n",
+ "dense_out_kernel_l2 = 0.6\n",
+ "dense_out_bias_l2 = 0.6\n",
+ "dense_out_activity_l2 = 0.00001\n",
+ "\n",
+ "conv_initializer = 'TruncatedNormal' # alternatives: 'glorot_uniform', \"GlorotNormal\", \"HeNormal\", 'random_normal'\n",
+ "dese_initializer = \"GlorotNormal\" # alternatives: 'TruncatedNormal', \"GlorotUniform\"\n",
+ "\n",
+ "dropout_rate = 0.151\n",
+ "num_dense_layers = 3\n",
+ "num_dense_units = 20\n",
+ "\n",
+ "conv_activation = \"elu\" # \"linear\"\n",
+ "activation = \"elu\"\n",
+ "loss = 'huber_loss' # 'mean_squared_error' 'mean_absolute_error'\n",
+ "\n",
+ "final_activation_scale_factor = 3.5\n",
+ "\n",
+ "batch_size = 30\n",
+ "epochs = 2\n",
+ "\n",
+ "time = '2023-01-051310'\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "e06747ae-77b6-4524-b803-fc0d87512fbb",
+ "metadata": {
+ "tags": [
+ "block:preprocessing"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Preprocessing successful\n"
+ ]
+ }
+ ],
+ "source": [
+ "# mark in Kale as pipeline step \"preprocessing\"; depends on: none\n",
+ "\n",
+ "ht = pd.read_csv(data_file_to_run, sep = '\\t')\n",
+ "ht_pd = ht_relevant_cols = ht.drop(columns = ['strain', 'height', 'folds'])\n",
+ "phenotypes_norm = ht_pd.pop(\"norm_phe\")\n",
+ "for col in ht_pd.columns:\n",
+ " ht_pd[col] = ht_pd[col].astype('category')\n",
+ "ohe_height_genotypes = pd.get_dummies(ht_pd)\n",
+ "\n",
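+ "import hashlib\n",
+ "\n",
+ "def train_test_splitting(row, split_ratio):\n",
+ "    # Use a stable digest rather than the built-in hash(), which is salted per\n",
+ "    # interpreter session for strings, so the split is reproducible across runs.\n",
+ "    string_of_row = \"\".join([str(l) for l in list(row.values)])\n",
+ "    digest = hashlib.md5(string_of_row.encode(\"utf-8\")).hexdigest()\n",
+ "    return (int(digest, 16) % 10) / 10 < split_ratio\n",
+ "belongs_in_train_set_index =\\\n",
+ "    np.array([train_test_splitting(ohe_height_genotypes.iloc[i], 0.7)\n",
+ "              for i in np.arange(ht_pd.shape[0])])\n",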
+ "\n",
+ "train_ohe_height_genotypes = ohe_height_genotypes[belongs_in_train_set_index]\n",
+ "val_ohe_height_genotypes = ohe_height_genotypes[~belongs_in_train_set_index]\n",
+ "\n",
+ "train_phenotypes_norm = phenotypes_norm[belongs_in_train_set_index]\n",
+ "val_phenotypes_norm = phenotypes_norm[~belongs_in_train_set_index]\n",
+ "\n",
+ "# Make sure the row counts of the training and validation sets add up to the original row count\n",
+ "assert train_ohe_height_genotypes.shape[0] + val_ohe_height_genotypes.shape[0] == ht_pd.shape[0]\n",
+ "\n",
+ "# Data as a numpy array...\n",
+ "train_ohe_height_genotypes_np = train_ohe_height_genotypes.values\n",
+ "val_ohe_height_genotypes_np = val_ohe_height_genotypes.values\n",
+ "\n",
+ "train_phenotypes_norm_np = train_phenotypes_norm.values\n",
+ "val_phenotypes_norm_np = val_phenotypes_norm.values\n",
+ "\n",
+ "# Reshape to fit the conv1D network. \n",
+ "train_np_ohe_reshaped_for_conv_1_d =\\\n",
+ " train_ohe_height_genotypes_np.reshape((train_ohe_height_genotypes_np.shape[0],\n",
+ " train_ohe_height_genotypes_np.shape[1], 1))\n",
+ "val_np_ohe_reshaped_for_conv_1_d =\\\n",
+ " val_ohe_height_genotypes_np.reshape((val_ohe_height_genotypes_np.shape[0],\n",
+ " val_ohe_height_genotypes_np.shape[1],1))\n",
+ "\n",
+ "np.save('train_data_ready', train_np_ohe_reshaped_for_conv_1_d)\n",
+ "np.save('val_data_ready', val_np_ohe_reshaped_for_conv_1_d)\n",
+ "np.save('train_labels_ready', train_phenotypes_norm_np)\n",
+ "np.save('val_labels_ready',val_phenotypes_norm_np)\n",
+ "\n",
+ "# Since the data was reshaped for the convolutional 1D neural network, we are also saving the \n",
+ "# non-reshaped data to be used for calculating saliency.\n",
+ "\n",
+ "np.save(\"val_snps_for_saliency\", val_ohe_height_genotypes_np)\n",
+ "\n",
+ "print(\"Preprocessing successful\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "dd01e8b2-78ba-4315-9e12-9153b1c57ce5",
+ "metadata": {
+ "tags": [
+ "block:train",
+ "prev:preprocessing"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Epoch 1/2\n",
+ " 37/120 [========>.....................] - ETA: 1:07 - loss: 569.6301 - mean_absolute_error: 2.0097"
+ ]
+ },
+ {
+ "ename": "KeyboardInterrupt",
+ "evalue": "",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn [41], line 132\u001b[0m\n\u001b[1;32m 125\u001b[0m \u001b[38;5;66;03m# qa_data_model = Model(inputs = inputs, outputs = outputs)\u001b[39;00m\n\u001b[1;32m 126\u001b[0m our_data_model\u001b[38;5;241m.\u001b[39mcompile(loss\u001b[38;5;241m=\u001b[39mloss,\n\u001b[1;32m 127\u001b[0m optimizer\u001b[38;5;241m=\u001b[39mtf\u001b[38;5;241m.\u001b[39mkeras\u001b[38;5;241m.\u001b[39moptimizers\u001b[38;5;241m.\u001b[39mAdam(learning_rate\u001b[38;5;241m=\u001b[39mlearning_rate),\n\u001b[1;32m 128\u001b[0m metrics\u001b[38;5;241m=\u001b[39m[tf\u001b[38;5;241m.\u001b[39mkeras\u001b[38;5;241m.\u001b[39mmetrics\u001b[38;5;241m.\u001b[39mMeanAbsoluteError()],\n\u001b[1;32m 129\u001b[0m jit_compile\u001b[38;5;241m=\u001b[39m\u001b[38;5;28;01mTrue\u001b[39;00m)\n\u001b[1;32m 131\u001b[0m history \u001b[38;5;241m=\u001b[39m\\\n\u001b[0;32m--> 132\u001b[0m \u001b[43mour_data_model\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mfit\u001b[49m\u001b[43m(\u001b[49m\u001b[43mx\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mtrain_snps\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 133\u001b[0m \u001b[43m \u001b[49m\u001b[43my\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[43mtrain_phenotypes\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 134\u001b[0m \u001b[43m \u001b[49m\u001b[43mbatch_size\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mbatch_size\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 135\u001b[0m \u001b[43m \u001b[49m\u001b[43mepochs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mepochs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 136\u001b[0m \u001b[43m \u001b[49m\u001b[43mvalidation_data\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mval_snps\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mval_phenotypes\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 137\u001b[0m \u001b[43m 
\u001b[49m\u001b[43mshuffle\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43m \u001b[49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m,\u001b[49m\n\u001b[1;32m 138\u001b[0m \u001b[43m \u001b[49m\u001b[43muse_multiprocessing\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[1;32m 139\u001b[0m \u001b[38;5;66;03m# Requirement 6: save and log your artifact. \u001b[39;00m\n\u001b[1;32m 140\u001b[0m \u001b[38;5;66;03m# I'm adding a random number to the file name as an\u001b[39;00m\n\u001b[1;32m 141\u001b[0m \u001b[38;5;66;03m# extra layer of safety nets against race conditions\u001b[39;00m\n\u001b[1;32m 142\u001b[0m \u001b[38;5;66;03m# / file name conflicts\u001b[39;00m\n\u001b[1;32m 143\u001b[0m \n\u001b[1;32m 144\u001b[0m \u001b[38;5;66;03m# tn = str(int(np.random.random() * 10 ** 12))\u001b[39;00m\n\u001b[1;32m 145\u001b[0m model_folder \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124ma-\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtime\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m-model\u001b[39m\u001b[38;5;124m\"\u001b[39m\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/keras/utils/traceback_utils.py:65\u001b[0m, in \u001b[0;36mfilter_traceback..error_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 63\u001b[0m filtered_tb \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 64\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m---> 65\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 66\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 67\u001b[0m filtered_tb \u001b[38;5;241m=\u001b[39m _process_traceback_frames(e\u001b[38;5;241m.\u001b[39m__traceback__)\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/keras/engine/training.py:1650\u001b[0m, in \u001b[0;36mModel.fit\u001b[0;34m(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)\u001b[0m\n\u001b[1;32m 1642\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m tf\u001b[38;5;241m.\u001b[39mprofiler\u001b[38;5;241m.\u001b[39mexperimental\u001b[38;5;241m.\u001b[39mTrace(\n\u001b[1;32m 1643\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mtrain\u001b[39m\u001b[38;5;124m\"\u001b[39m,\n\u001b[1;32m 1644\u001b[0m epoch_num\u001b[38;5;241m=\u001b[39mepoch,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 1647\u001b[0m _r\u001b[38;5;241m=\u001b[39m\u001b[38;5;241m1\u001b[39m,\n\u001b[1;32m 1648\u001b[0m ):\n\u001b[1;32m 1649\u001b[0m callbacks\u001b[38;5;241m.\u001b[39mon_train_batch_begin(step)\n\u001b[0;32m-> 1650\u001b[0m tmp_logs \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mtrain_function\u001b[49m\u001b[43m(\u001b[49m\u001b[43miterator\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 1651\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m data_handler\u001b[38;5;241m.\u001b[39mshould_sync:\n\u001b[1;32m 1652\u001b[0m context\u001b[38;5;241m.\u001b[39masync_wait()\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/util/traceback_utils.py:150\u001b[0m, in \u001b[0;36mfilter_traceback..error_handler\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 148\u001b[0m filtered_tb \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mNone\u001b[39;00m\n\u001b[1;32m 149\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[0;32m--> 150\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 151\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 152\u001b[0m filtered_tb \u001b[38;5;241m=\u001b[39m _process_traceback_frames(e\u001b[38;5;241m.\u001b[39m__traceback__)\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:880\u001b[0m, in \u001b[0;36mFunction.__call__\u001b[0;34m(self, *args, **kwds)\u001b[0m\n\u001b[1;32m 877\u001b[0m compiler \u001b[38;5;241m=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mxla\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_jit_compile \u001b[38;5;28;01melse\u001b[39;00m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mnonXla\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 879\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m OptionalXlaContext(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_jit_compile):\n\u001b[0;32m--> 880\u001b[0m result \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 882\u001b[0m new_tracing_count \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39mexperimental_get_tracing_count()\n\u001b[1;32m 883\u001b[0m without_tracing \u001b[38;5;241m=\u001b[39m (tracing_count \u001b[38;5;241m==\u001b[39m new_tracing_count)\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/eager/polymorphic_function/polymorphic_function.py:919\u001b[0m, in \u001b[0;36mFunction._call\u001b[0;34m(self, *args, **kwds)\u001b[0m\n\u001b[1;32m 916\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_lock\u001b[38;5;241m.\u001b[39mrelease()\n\u001b[1;32m 917\u001b[0m \u001b[38;5;66;03m# In this case we have not created variables on the first call. So we can\u001b[39;00m\n\u001b[1;32m 918\u001b[0m \u001b[38;5;66;03m# run the first trace but we should fail if variables are created.\u001b[39;00m\n\u001b[0;32m--> 919\u001b[0m results \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_variable_creation_fn\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwds\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 920\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_created_variables \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m ALLOW_DYNAMIC_VARIABLE_CREATION:\n\u001b[1;32m 921\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m \u001b[38;5;167;01mValueError\u001b[39;00m(\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mCreating variables on a non-first call to a function\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 922\u001b[0m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124m decorated with tf.function.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/eager/polymorphic_function/tracing_compiler.py:134\u001b[0m, in \u001b[0;36mTracingCompiler.__call__\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 131\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_lock:\n\u001b[1;32m 132\u001b[0m (concrete_function,\n\u001b[1;32m 133\u001b[0m filtered_flat_args) \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_maybe_define_function(args, kwargs)\n\u001b[0;32m--> 134\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mconcrete_function\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_call_flat\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 135\u001b[0m \u001b[43m \u001b[49m\u001b[43mfiltered_flat_args\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcaptured_inputs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mconcrete_function\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcaptured_inputs\u001b[49m\u001b[43m)\u001b[49m\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:1745\u001b[0m, in \u001b[0;36mConcreteFunction._call_flat\u001b[0;34m(self, args, captured_inputs, cancellation_manager)\u001b[0m\n\u001b[1;32m 1741\u001b[0m possible_gradient_type \u001b[38;5;241m=\u001b[39m gradients_util\u001b[38;5;241m.\u001b[39mPossibleTapeGradientTypes(args)\n\u001b[1;32m 1742\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m (possible_gradient_type \u001b[38;5;241m==\u001b[39m gradients_util\u001b[38;5;241m.\u001b[39mPOSSIBLE_GRADIENT_TYPES_NONE\n\u001b[1;32m 1743\u001b[0m \u001b[38;5;129;01mand\u001b[39;00m executing_eagerly):\n\u001b[1;32m 1744\u001b[0m \u001b[38;5;66;03m# No tape is watching; skip to running the function.\u001b[39;00m\n\u001b[0;32m-> 1745\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_build_call_outputs(\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_inference_function\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mcall\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 1746\u001b[0m \u001b[43m \u001b[49m\u001b[43mctx\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mcancellation_manager\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mcancellation_manager\u001b[49m\u001b[43m)\u001b[49m)\n\u001b[1;32m 1747\u001b[0m forward_backward \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_select_forward_and_backward_functions(\n\u001b[1;32m 1748\u001b[0m args,\n\u001b[1;32m 1749\u001b[0m possible_gradient_type,\n\u001b[1;32m 1750\u001b[0m executing_eagerly)\n\u001b[1;32m 1751\u001b[0m forward_function, args_with_tangents \u001b[38;5;241m=\u001b[39m forward_backward\u001b[38;5;241m.\u001b[39mforward()\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/eager/polymorphic_function/monomorphic_function.py:378\u001b[0m, in \u001b[0;36m_EagerDefinedFunction.call\u001b[0;34m(self, ctx, args, cancellation_manager)\u001b[0m\n\u001b[1;32m 376\u001b[0m \u001b[38;5;28;01mwith\u001b[39;00m _InterpolateFunctionError(\u001b[38;5;28mself\u001b[39m):\n\u001b[1;32m 377\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m cancellation_manager \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n\u001b[0;32m--> 378\u001b[0m outputs \u001b[38;5;241m=\u001b[39m \u001b[43mexecute\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mexecute\u001b[49m\u001b[43m(\u001b[49m\n\u001b[1;32m 379\u001b[0m \u001b[43m \u001b[49m\u001b[38;5;28;43mstr\u001b[39;49m\u001b[43m(\u001b[49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msignature\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mname\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 380\u001b[0m \u001b[43m \u001b[49m\u001b[43mnum_outputs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_num_outputs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 381\u001b[0m \u001b[43m \u001b[49m\u001b[43minputs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 382\u001b[0m \u001b[43m \u001b[49m\u001b[43mattrs\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mattrs\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 383\u001b[0m \u001b[43m \u001b[49m\u001b[43mctx\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[43mctx\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 384\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[1;32m 385\u001b[0m outputs \u001b[38;5;241m=\u001b[39m execute\u001b[38;5;241m.\u001b[39mexecute_with_cancellation(\n\u001b[1;32m 386\u001b[0m 
\u001b[38;5;28mstr\u001b[39m(\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39msignature\u001b[38;5;241m.\u001b[39mname),\n\u001b[1;32m 387\u001b[0m num_outputs\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_num_outputs,\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 390\u001b[0m ctx\u001b[38;5;241m=\u001b[39mctx,\n\u001b[1;32m 391\u001b[0m cancellation_manager\u001b[38;5;241m=\u001b[39mcancellation_manager)\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:52\u001b[0m, in \u001b[0;36mquick_execute\u001b[0;34m(op_name, num_outputs, inputs, attrs, ctx, name)\u001b[0m\n\u001b[1;32m 50\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m 51\u001b[0m ctx\u001b[38;5;241m.\u001b[39mensure_initialized()\n\u001b[0;32m---> 52\u001b[0m tensors \u001b[38;5;241m=\u001b[39m \u001b[43mpywrap_tfe\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mTFE_Py_Execute\u001b[49m\u001b[43m(\u001b[49m\u001b[43mctx\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_handle\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mdevice_name\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mop_name\u001b[49m\u001b[43m,\u001b[49m\n\u001b[1;32m 53\u001b[0m \u001b[43m \u001b[49m\u001b[43minputs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mattrs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mnum_outputs\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 54\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m core\u001b[38;5;241m.\u001b[39m_NotOkStatusException \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[1;32m 55\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m name \u001b[38;5;129;01mis\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m \u001b[38;5;28;01mNone\u001b[39;00m:\n",
+ "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
+ ]
+ }
+ ],
+ "source": [
+ "# mark in Kale as pipeline step \"train\": depends on \"data-preprocessing\"\n",
+ "nb_classes = 3\n",
+ "\n",
+ "data_files = ['./train_data_ready.npy',\n",
+ " \"./val_data_ready.npy\",\n",
+ " \"./train_labels_ready.npy\",\n",
+ " \"./val_labels_ready.npy\"]\n",
+ "# artifact_bucket_root_name = artifacts_bucket.split('/')[-1]\n",
+ "# print(artifact_bucket_root_name)\n",
+ "# storage_client = storage.Client()\n",
+ "# bucket = storage_client.get_bucket(artifact_bucket_root_name)\n",
+ "\n",
+ "ht_np_train = np.load('train_data_ready.npy', allow_pickle=True)\n",
+ "ht_np_val = np.load('val_data_ready.npy', allow_pickle=True)\n",
+ "train_labels_np = np.load('train_labels_ready.npy', allow_pickle=True)\n",
+ "val_labels_np = np.load('val_labels_ready.npy', allow_pickle=True)\n",
+ "\n",
+ "train_snps = ht_np_train\n",
+ "train_phenotypes = train_labels_np\n",
+ "val_snps = ht_np_val\n",
+ "val_phenotypes = val_labels_np\n",
+ "\n",
+ "# print(\"min\")\n",
+ "# print(val_phenotypes.min())\n",
+ "# print(\"max\")\n",
+ "# print(val_phenotypes.max())\n",
+ "\n",
+ "inputs =\\\n",
+ " tf.keras.layers.Input(\n",
+ " shape=(train_snps.shape[1], \n",
+ " train_snps.shape[2])) # train_snps.shape[1] ,nb_classes))\n",
+ "\n",
+ "x = Conv1D(10,\n",
+ " nb_classes,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_1_kernel_l1, l2=conv_1_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_1_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_1_activity_l2)\n",
+ " )(inputs)\n",
+ "\n",
+ " # kernel_initializer = conv_initializer ,\n",
+ " # kernel_regularizer=\"l2\", bias_regularizer = \"l2\")\n",
+ "x = Dropout(conv_1_dropout_rate)(x)\n",
+ "\n",
+ "x = Conv1D(10,\n",
+ " 20,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_x_kernel_l1, l2=conv_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_x_activity_l2)\n",
+ " # kernel_initializer = 'TruncatedNormal',\n",
+ " # kernel_regularizer=\"l2\",\n",
+ " # bias_regularizer=\"l2\"\n",
+ " )(x) # Leaving l1 l2 on head layer only to see if this prevents everything from zeroing out.\n",
+ "\n",
+ "x = Dropout(dropout_rate)(x)\n",
+ "\n",
+ "\n",
+ "shortcut = Conv1D(10,\n",
+ " 4,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_x_kernel_l1, l2=conv_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_x_activity_l2))(inputs)\n",
+ "shortcut = Dropout(dropout_rate)(shortcut)\n",
+ "x = tf.keras.layers.Add()([shortcut,x])\n",
+ "\n",
+ "x = Conv1D(10,\n",
+ " 4,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_x_kernel_l1, l2=conv_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_x_activity_l2)\n",
+ " # kernel_initializer = 'TruncatedNormal', \n",
+ " # kernel_regularizer = \"l2\",\n",
+ " # bias_regularizer = \"l2\"\n",
+ " )(x)\n",
+ "\n",
+ "# x = Dropout(dropout_rate)(x)\n",
+ "\n",
+ "x = Flatten()(x)\n",
+ "# x = Dropout(dropout_rate)(x)\n",
+ "x = BatchNormalization()(x)\n",
+ "\n",
+ "if num_dense_layers > 0:\n",
+ "    y = x\n",
+ "    for i in np.arange(num_dense_layers):\n",
+ "        y = Dense(num_dense_units, \n",
+ "                  activation,\n",
+ "                  kernel_initializer=dese_initializer,\n",
+ "                  kernel_regularizer=tf.keras.regularizers.L1L2(l1=dense_x_kernel_l1, l2=dense_x_kernel_l2),\n",
+ "                  bias_regularizer=tf.keras.regularizers.L2(dense_x_bias_l2),\n",
+ "                  activity_regularizer=tf.keras.regularizers.L2(dense_x_activity_l2)\n",
+ "                  )(y)\n",
+ "        # y = Dropout(dropout_rate)(y)\n",
+ "        y = BatchNormalization()(y)\n",
+ "\n",
+ "    x = tf.keras.layers.Concatenate(axis=1)([x,y])\n",
+ "    x = BatchNormalization()(x)\n",
+ "\n",
+ "outputs_unscaled = Dense(1,\n",
+ " activation=\"softsign\",\n",
+ " kernel_initializer=dese_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=dense_out_kernel_l1, l2=dense_out_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(dense_out_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(dense_out_activity_l2),\n",
+ " # bias_regularizer = \"l2\",\n",
+ " # kernel_initializer = 'TruncatedNormal',\n",
+ " name = 'out')(x) # Should have no activation\n",
+ "# Softsign squashes the output into the open interval (-1, 1). The labels are norm scaled, \n",
+ "# so the range [-2, 2] or [-3, 3] covers most values. Multiplying by a scalar \n",
+ "# stretches the output range to +/- that scalar. There is no telling which scalar is optimal,\n",
+ "# so we'll let the tuner figure that out: \n",
+ "outputs = Lambda(lambda x: x * final_activation_scale_factor)(outputs_unscaled) \n",
+ "\n",
+ "our_data_model = Model(inputs = inputs, outputs = outputs)\n",
+ "# qa_data_model = Model(inputs = inputs, outputs = outputs)\n",
+ "our_data_model.compile(loss=loss,\n",
+ " optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),\n",
+ " metrics=[tf.keras.metrics.MeanAbsoluteError()],\n",
+ " jit_compile=True)\n",
+ "\n",
+ "history =\\\n",
+ " our_data_model.fit(x = train_snps,\n",
+ " y = train_phenotypes,\n",
+ " batch_size=batch_size,\n",
+ " epochs=epochs,\n",
+ " validation_data=(val_snps, val_phenotypes),\n",
+ " shuffle= True,\n",
+ " use_multiprocessing=True)\n",
+ "# Requirement 6: save and log your artifact. \n",
+ "# I'm adding a random number to the file name as an\n",
+ "# extra safety net against race conditions\n",
+ "# / file name conflicts\n",
+ "\n",
+ "# tn = str(int(np.random.random() * 10 ** 12))\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "our_data_model.save(model_folder)\n",
+ "\n",
+ "history_df = pd.DataFrame(history.history)\n",
+ "\n",
+ "history_df[[\"mean_absolute_error\", \"val_mean_absolute_error\"]].plot()\n",
+ "plt.savefig(f'{model_folder}-history.png')\n",
+ "\n",
+ "print(model_folder)\n",
+ "\n",
+ "val_mean_absolute_error = float(history_df['val_mean_absolute_error'].values.min())\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "74954dbc-9133-40d5-9f23-400a39807724",
+ "metadata": {
+ "tags": [
+ "block:saliency_observed",
+ "prev:preprocessing"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# mark in Kale as pipeline step: \"saliency - known\": depends on \"data-preprocessing\"\n",
+ "\n",
+ "# Calculate observed p values: (Depends on data preprocessing)\n",
+ "\n",
+ "val_snps_s = np.load(\"val_snps_for_saliency.npy\", allow_pickle=True)\n",
+ "val_phenotypes_k = np.load('val_labels_ready.npy', allow_pickle=True)\n",
+ "\n",
+ "p_values = []\n",
+ "for i in np.arange(int(val_snps_s.shape[1] / 3)):\n",
+ "    column_index_lower_bound = 3 * i\n",
+ "    column_index_upper_bound = 3 * i + 3\n",
+ "    data = val_snps_s[:,column_index_lower_bound:column_index_upper_bound]\n",
+ "    data_reshaped = np.argmax(data, axis=1)\n",
+ "    slope, intercept, r_value, p_value, std_err = stats.linregress(data_reshaped, val_phenotypes_k)\n",
+ "    p_values.append(p_value)\n",
+ "p_values_observed_np = np.array(p_values)\n",
+ "np.save('p_values_observed_np', \n",
+ " p_values_observed_np, \n",
+ " allow_pickle=True)\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "464ffbb4-eb9e-438e-ac1e-292a2e45a99f",
+ "metadata": {
+ "tags": [
+ "block:manhattan_observed",
+ "prev:saliency_observed"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAikAAAHHCAYAAAB6NchxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAA9hAAAPYQGoP6dpAAEAAElEQVR4nOydd3hTZRvG75O0TfeClrZQWlZpKaPsvZfsIVuGqDhAUBRF9FOGqICKKFtERARFZO+9N5SWPVpaKHTvPZLzfn+kSZs2SbOTts+P61w0Zz5JTs65z7NejjHGQBAEQRAEYWEIzG0AQRAEQRCEMkikEARBEARhkZBIIQiCIAjCIiGRQhAEQRCERUIihSAIgiAIi4RECkEQBEEQFgmJFIIgCIIgLBISKQRBEARBWCQkUgiCIAiCsEhIpJiYBQsWgOM4JCcnm9sUi6JHjx7o0aOHSY715MkT9OvXDy4uLuA4Dnv27DHJcas6r7/+Ovz9/XXe1tHR0bAGETpRna9RHMdhwYIF5jaDKEW1ECl//PEHOI4Dx3G4cOFCueWMMfj6+oLjOAwePNgMFurHmjVr8Mcff5Sbf//+fSxYsADR0dEmt0mGv7+//LPnOA6enp7o2rUrdu/ebZD95+bmYsGCBThz5ozG20yZMgV37tzBN998gy1btqBNmzYGsUUVSUlJ+OCDDxAYGAg7Ozt4enqiXbt2mDt3LrKzs4167LLIfgs3btxQurxHjx5o2rSpSW3SBl2+bwBITEzEZ599hmbNmsHR0RG2trZo2LAhpk6dqnBN+Pfff8FxnNLzs0WLFuA4DqdPny63rG7duujUqVO5+e3atQPHcVi7dq1K2+7cuYNRo0bBz88Ptra2qF27Nvr27YuVK1dq9R4JoipSLUSKDFtbW2zbtq3c/LNnz+LFixcQiURmsEp/1ImUhQsXmlWkAEBISAi2bNmCLVu2YM6cOYiNjcXIkSOxbt06vfedm5uLhQsXanzTysvLw+XLl/Hmm2/i/fffx8SJE1GnTh297VBFamoq2rRpgz///BODBg3CL7/8go8++ggNGzbE2rVrq9TT6oYNG/Do0SOjHkPb7xsArl27huDgYKxYsQKtW7fG0qVLsWrVKowdOxbXrl1D165dce7cOQBAly5dAKDcw0xmZibu3r0LKysrXLx4UWFZTEwMYmJi5NvKePLkCa5fvw5/f39s3bpVqW2XLl1CmzZtEB4ejmnTpmHVqlV46623IBAI8PPPP2v8HgmiqmJlbgNMycCBA7Fjxw788ssvsLIqeevbtm1D69atq9QNw5KoXbs2Jk6cKH89efJkNGzYED/99BPeffddk9qSlJQEAHB1dTXYPnNycuDg4KB02caNG/H8+XNcvHix3JN2ZmYmbGxsDGaHubG2tja3CeVIS0vD8OHDYWVlhbCwMAQGBiosX7x4Mf755x/Y2dkBAHx8fFCvXr1yIuXy5ctgjGH06NHllslelxUpf/31Fzw9PfHjjz9i1KhRiI6OLhcO++abb+Di4oLr16+XOycTExN1fdtVAp7nUVhYCFtbW3ObQpiRauVJGT9+PFJSUnD8+HH5vMLCQvz333+YMGGC0m1++OEHdOrUCTVq1ICdnR1at26N//77r9x6HMfh/fffx549e9C0aVOIRCIEBwfjyJEjSvebnp6O119/Ha6urnBxccHUqVORm5ursM6mTZvQq1cveHp6QiQSoUmTJuXcxv7+/rh37x7Onj0rD6n06NEDf/zxB0aPHg0A6Nmzp3yZ7Al07969GDRoEHx8fCASidCgQQN8/fXXkEgkCvuXuf/v37+Pnj17wt7eHrVr18ayZcvUf9hq8PLyQlBQEKKiotSul5iYiDfffBO1atWCra0tWrRogc2bN8uXR0dHw8PDAwCwcOFC+XtUFVNesGAB/Pz8AACffPIJOI5TuGncunULAwYMgLOzMxwdHdG7d29cuXJFYR+ycMnZs2cxffp0eHp6
qvXEREZGQigUokOHDuWWOTs7l7sA79ixA61bt4adnR1q1qyJiRMn4uXLl/LlmzZtAsdxuHXrVrn9ffvttxAKhQrrG4q//vpLbpe7uzvGjRuHmJgYhXWU5aSkpKRg0qRJcHZ2hqurK6ZMmYLw8HBwHKfU+/fy5UsMHz4cjo6O8PDwwJw5c+TnpLbfNwCsW7cOcXFxWLFiRTmBAkh/t+PHj0fbtm3l87p06YJbt24hLy9PPu/ixYsIDg7GgAEDcOXKFfA8r7CM4zh07txZYd/btm3DqFGjMHjwYLi4uCj14kZGRiI4OFipaPb09FT5vnTl1KlT6Nq1KxwcHODq6ophw4bhwYMHStdNTk7GmDFj4OzsjBo1auCDDz5Afn6+wjrHjx9Hly5d4OrqCkdHRzRu3Biff/65wjoFBQWYP38+GjZsCJFIBF9fX3z66acoKChQWE92Dd26dSuCg4MhEomwf/9+uLu7Y+rUqeXsy8zMhK2tLebMmaP1sQoKCjB79mx4eHjAyckJQ4cOxYsXL7T6LAkTwaoBmzZtYgDY9evXWadOndikSZPky/bs2cMEAgF7+fIl8/PzY4MGDVLYtk6dOmz69Ols1apVbPny5axdu3YMADtw4IDCegBYixYtmLe3N/v666/ZihUrWP369Zm9vT1LTk6Wrzd//nwGgLVs2ZKNHDmSrVmzhr311lsMAPv0008V9tm2bVv2+uuvs59++omtXLmS9evXjwFgq1atkq+ze/duVqdOHRYYGMi2bNnCtmzZwo4dO8YiIyPZrFmzGAD2+eefy5fFx8czxhgbPnw4GzNmDPv+++/Z2rVr2ejRoxkANmfOHAUbunfvznx8fJivry/74IMP2Jo1a1ivXr0YAHbo0KEKP3tln2lhYSGrVasW8/LyUjhO9+7d5a9zc3NZUFAQs7a2ZrNnz2a//PIL69q1KwPAVqxYwRhjLDs7m61du5YBYCNGjJC/x/DwcKW2hIeHs59++okBYOPHj2dbtmxhu3fvZowxdvfuXebg4CD//pYsWcLq1avHRCIRu3LlinwfsnOpSZMmrHv37mzlypVsyZIlKt//t99+ywCwP/74o8LPSrbvtm3bsp9++ol99tlnzM7Ojvn7+7O0tDTGGGOZmZnMzs6Offzxx+W2b9KkCevVq5dGxzhx4gRLSkoqN3Xq1IkFBwcrbLN48WLGcRwbO3YsW7NmDVu4cCGrWbOmgl2MMTZlyhTm5+cnfy2RSFjHjh2ZUChk77//Plu1ahXr27cva9GiBQPANm3apLCtra0tCw4OZm+88QZbu3Yte/XVVxkAtmbNGsaY9t83Y4x17NiR2dnZscLCQrWfS2nWr1/PALDTp0/L5/Xq1Yu9/fbbLCIiggFQOGZISAgLCgpS2MeVK1cYAHb+/HnGGGNvvPEGa9KkSblj9evXjzk5ObE7d+5obJ+uHD9+nFlZWbGAgAC2bNky+ffo5ubGoqKi5OvJrlHNmjVjQ4YMYatWrWITJ05kABSunXfv3mU2NjasTZs27Oeff2br1q1jc+bMYd26dZOvI5FIWL9+/Zi9vT378MMP2fr169n777/PrKys2LBhwxTsA8CCgoKYh4cHW7hwIVu9ejW7desWe+ONN5irqysrKChQWH/z5s3y67q2x5K9nwkTJrBVq1axkSNHsubNmzMAbP78+Qb5vAnDUO1EyqpVq5iTkxPLzc1ljDE2evRo1rNnT8aY8huqbD0ZhYWFrGnTpuVuBgCYjY0Ni4iIkM8LDw9nANjKlSvl82QXgDfeeENh+xEjRrAaNWqoPTZjjPXv35/Vr19fYV5wcLDCDV7Gjh07yl1s1e37nXfeYfb29iw/P18+r3v37gwA+/PPP+XzCgoKmJeXF3v11VfL7aMsfn5+rF+/fvKbYHh4OBs3bhwDwGbOnKlwnNLvYcWKFQwA++uvv+TzCgsLWceOHZmjoyPLzMxkjDGWlJSk1YUlKiqKAWDff/+9wvzhw4czGxsbFhkZ
KZ8XGxvLnJycFC66snOpS5cuTCwWV3i8+Ph45uHhwQCwwMBA9u6777Jt27ax9PR0hfUKCwuZp6cna9q0KcvLy5PPP3DgAAPAvvrqK/m88ePHMx8fHyaRSOTzQkNDy934lSGzX91UWqRER0czoVDIvvnmG4X93Llzh1lZWSnMLytSdu7cqSAqGZPeSGQit6xIAcAWLVqkcJyWLVuy1q1by19r+327ubmxkJCQcvMzMzMVxFl2drZ82b179xgA9vXXXzPGGCsqKmIODg5s8+bNjDHGatWqxVavXi3fj1AoZNOmTVPY//vvv898fX0Zz/OMMcaOHTvGALBbt24prHfs2DEmFAqZUChkHTt2ZJ9++ik7evSoVqJKU0JCQpinpydLSUmRzwsPD2cCgYBNnjxZPk92jRo6dKjC9tOnT1cQaDLBn5SUpPKYW7ZsYQKBQC7WZKxbt44BYBcvXpTPA8AEAgG7d++ewrpHjx5lANj+/fsV5g8cOFDhWqjpscLCwhgANn36dIX1JkyYQCLFAqlW4R4AGDNmDPLy8nDgwAFkZWXhwIEDKkM9AOSxakAa387IyEDXrl0RGhpabt0+ffqgQYMG8tfNmzeHs7Mznj59Wm7dsrkYXbt2RUpKCjIzM5UeOyMjA8nJyejevTuePn2KjIwMzd6wBu8rKysLycnJ6Nq1K3Jzc/Hw4UOFdR0dHRVySmxsbNCuXTul70sZx44dg4eHBzw8PNCiRQvs2LEDkyZNwtKlS1Vuc+jQIXh5eWH8+PHyedbW1pg1axays7Nx9uxZTd9qhUgkEhw7dgzDhw9H/fr15fO9vb0xYcIEXLhwQeF7AYBp06ZBKBRWuO9atWohPDwc7777LtLS0rBu3TpMmDABnp6e+Prrr8EYAwDcuHEDiYmJmD59ukIIaNCgQQgMDMTBgwfl8yZPnozY2FiFKpOtW7fCzs4Or776qkbvefXq1Th+/Hi5qXnz5grr7dq1CzzPY8yYMUhOTpZPXl5eaNSokdJKFxlHjhyBtbU1pk2bJp8nEAgwY8YMldso+11oep4pIzMzU2lp86RJk+TnpIeHB+bOnStfFhQUhBo1ashzTcLDw5GTkyPPKerUqZM8efby5cuQSCQK+ShisRjbt2/H2LFjwXEcAMjDtmUTaPv27YvLly9j6NChCA8Px7Jly9C/f3/Url0b+/bt0/l9lyUuLg5hYWF4/fXX4e7uLp/fvHlz9O3bF4cOHSq3TdnvaebMmQAgX1cWotq7d69C+Ks0O3bsQFBQEAIDAxXOn169egFAufOne/fuaNKkicK8Xr16oWbNmti+fbt8XlpaGo4fP46xY8dqfSyZ/bNmzVI4zocffqj0PRDmpVolzgKAh4cH+vTpg23btiE3NxcSiQSjRo1Suf6BAwewePFihIWFKcQ1ZRef0tStW7fcPDc3N6SlpVW4rpubGwDpj8/Z2RmANNY9f/58XL58uVy+SkZGBlxcXNS8U/Xcu3cP//vf/3Dq1KlyN+CyAqhOnTrl3q+bmxtu376t0bHat2+PxYsXg+M42NvbIygoqMLE1WfPnqFRo0YQCBR1dFBQkHy5oUhKSkJubi4aN25cbllQUBB4nkdMTAyCg4Pl8+vVq6fx/r29vbF27VqsWbMGT548wdGjR7F06VJ89dVX8Pb2xltvvSV/P8psCAwMVEjW7Nu3L7y9vbF161b07t0bPM/j77//xrBhw+Dk5KSRTe3atVNaeu3m5qaQQP7kyRMwxtCoUSOl+1GXLPvs2TN4e3vD3t5eYX7Dhg2Vrm9rayvPOSltj7Lfj6Y4OTkpLfNetGgR3n//fQDSz7M0HMehU6dOOHfuHHiex8WLF+Hp6Sm3u1OnTli1ahUAyMVKaZFy7NgxJCUloV27doiIiJDP79mzJ/7++28sXbpU4bxu27Ytdu3ahcLCQoSHh2P37t346aefMGrUKISFhZW7acvIzs5WeG9CobDc5ydD3fkVFBSEo0ePlksAL/udN2jQ
AAKBQF4tOHbsWPz2229466238Nlnn6F3794YOXIkRo0aJX9/T548wYMHD1TaVTY5WNnvysrKCq+++iq2bduGgoICiEQi7Nq1C0VFRQoiRdNjPXv2DAKBQOGBUtVnQ5ifaidSAGDChAmYNm0a4uPjMWDAAJU3zPPnz2Po0KHo1q0b1qxZA29vb1hbW2PTpk1Kk+BUPVnLnpa1WTcyMhK9e/dGYGAgli9fDl9fX9jY2ODQoUP46aefVD65aEJ6ejq6d+8OZ2dnLFq0CA0aNICtrS1CQ0Mxd+7ccvvW5n0po2bNmujTp4/O9loipT1RmsJxHAICAhAQEIBBgwahUaNG2Lp1K9566y2t9iMUCjFhwgRs2LABa9aswcWLFxEbG6vg7TIUPM+D4zgcPnxY6XlgyAZsmnimtCUwMBDh4eEoKipSEFRlPUZl6dKlC/bv3487d+6Uq8zq1KkTPvnkE7x8+RIXLlyAj4+PggdO5i0ZM2aM0n2fPXsWPXv2LDffxsYGbdu2Rdu2bREQEICpU6dix44dmD9/vtL9/PDDD1i4cKH8tZ+fn1HbDZR9ULGzs8O5c+dw+vRpHDx4EEeOHMH27dvRq1cvHDt2DEKhEDzPo1mzZli+fLnSffr6+pbbpzLGjRuH9evX4/Dhwxg+fDj+/fdfBAYGokWLFvJ1tD0WUTmoliJlxIgReOedd3DlyhUFF2JZdu7cCVtbWxw9elShh8qmTZuMbuP+/ftRUFCAffv2KXhdlLnXlXl11M0/c+YMUlJSsGvXLnTr1k0+v6JqG1Pi5+eH27dvg+d5hadOWShKVqWj6j1qg4eHB+zt7ZX2+Hj48CEEAoHBL3D169eHm5sb4uLiAJS8n0ePHsnd0zIePXokXy5j8uTJ+PHHH7F//34cPnwYHh4e6N+/v0FtBKRPz4wx1KtXDwEBAVpt6+fnh9OnTyM3N1fBm1Lau6At2n7fgwcPxpUrV7B7926VokEZpfulXLx4USEU0Lp1a4hEIpw5cwZXr17FwIED5ctycnKwd+9ejB07VqmHdtasWdi6datSkVIamZdLdn4oY/LkyQoeHHXCufT5VZaHDx+iZs2a5cronzx5ouDZiIiIAM/zChVcAoEAvXv3Ru/evbF8+XJ8++23+OKLL3D69Gl5+Ds8PBy9e/fW67farVs3eHt7Y/v27ejSpQtOnTqFL774QmEdTY/l5+cHnucRGRmp4D0xdo8fQjeqXU4KIH36W7t2LRYsWIAhQ4aoXE8oFILjOIWy3OjoaJO0UZc9VZb2VmRkZCgVSA4ODkhPT1c6H0C5Zcr2XVhYiDVr1uhrtsEYOHAg4uPjFUSkWCzGypUr4ejoiO7duwOA/Oan7P1rilAoRL9+/bB3716FJ9GEhARs27YNXbp0kYfgtOXq1avIyckpN//atWtISUmRXyTbtGkDT09PrFu3TiGsePjwYTx48ACDBg1S2L558+Zo3rw5fvvtN+zcuRPjxo1T6P1jKEaOHAmhUIiFCxeW85wxxpCSkqJy2/79+6OoqAgbNmyQz+N5HqtXr9bZHm2/7/feew+1atXC7Nmz8fjx43LLVXkD27RpA1tbW2zduhUvX75U8KSIRCK0atUKq1evRk5OjoJQ2L17N3JycjBjxgyMGjWq3DR48GDs3LlT/h2fPn1aqQ2yvAl1IYj69eujT58+8qlsCXRpvL29ERISgs2bNyt8dnfv3sWxY8cUhJaMst+TrAPugAEDAEgbFZYlJCQEAOTvb8yYMXj58qXCOSAjLy9P6W9DGQKBAKNGjcL+/fuxZcsWiMVihVCPNseS2f/LL78orLNixQqNbCFMS7X0pADS1ugVMWjQICxfvhyvvPIKJkyYgMTERKxevRoNGzbUOB9DV/r16wcbGxsMGTIE77zzDrKzs7FhwwZ4enqWe7pq3bo11q5di8WLF6Nhw4bw9PREr169EBISAqFQiKVLlyIjIwMikQi9evVCp06d4ObmhilTpmDWrFngOA5b
tmzROHxjCt5++22sX78er7/+Om7evAl/f3/8999/uHjxIlasWCHPvbCzs0OTJk2wfft2BAQEwN3dHU2bNtW6tfvixYvlPR+mT58OKysrrF+/HgUFBXr1hNmyZQu2bt2KESNGoHXr1rCxscGDBw/w+++/w9bWVt5TwtraGkuXLsXUqVPRvXt3jB8/HgkJCfj555/h7++P2bNnl9v35MmT5T0ijBHqAaRPp4sXL8a8efMQHR2N4cOHw8nJCVFRUdi9ezfefvtthT4VpRk+fDjatWuHjz/+GBEREQgMDMS+ffvkNzddnqy1/b7d3d2xe/duDBkyBC1atMC4cePQtm1bWFtbIyYmBjt27ABQPkdMFno5f/48RCIRWrdurbC8U6dO+PHHHwEo5qNs3boVNWrUUNoiHwCGDh2KDRs24ODBgxg5ciRmzpyJ3NxcjBgxAoGBgSgsLMSlS5ewfft2+Pv7K+0Poivff/89BgwYgI4dO+LNN99EXl4eVq5cCRcXF6W9ZqKiojB06FC88soruHz5Mv766y9MmDBBHmJZtGgRzp07h0GDBsHPzw+JiYlYs2YN6tSpI/9MJk2ahH///RfvvvsuTp8+jc6dO0MikeDhw4f4999/cfToUY2HpRg7dixWrlyJ+fPno1mzZvL8NBmaHiskJATjx4/HmjVrkJGRgU6dOuHkyZN6efgII2KWmiITU7oEWR3KSpA3btzIGjVqxEQiEQsMDGSbNm2Sl+iVBgCbMWOG0n1OmTJF/lq2bdmyPZmNpfsV7Nu3jzVv3pzZ2toyf39/tnTpUvb777+XWy8+Pp4NGjSIOTk5MQAKpbwbNmxg9evXZ0KhUKEc+eLFi6xDhw7Mzs6O+fj4yEsfS6/DmLQ0uGzfDMbKl5uqQtlnqoyyJciMMZaQkMCmTp3KatasyWxsbFizZs2UltheunSJtW7dmtnY2FRYQqiqBJkxaRlv//79maOjI7O3t2c9e/Zkly5dUlhH03NJxu3bt9knn3zCWrVqxdzd3ZmVlRXz9vZmo0ePZqGhoeXW3759O2vZsiUTiUTM3d2dvfbaa+zFixdK9x0XF8eEQiELCAjQyBZN7Ff1fe/cuZN16dKFOTg4MAcHBxYYGMhmzJjBHj16JF9H2TmRlJTEJkyYwJycnJiLiwt7/fXX2cWLFxkA9s8//yhs6+DgUO64yn5r2nzfMuLi4tgnn3zCmjRpwuzs7JhIJGL169dnkydPZufOnVO6zbx58xgA1qlTp3LLdu3axQAwJycneSl6QkICs7KyUuglUpbc3Fxmb2/PRowYwRhj7PDhw+yNN95ggYGBzNHRkdnY2LCGDRuymTNnsoSEhArfl7acOHGCde7cmdnZ2TFnZ2c2ZMgQdv/+fYV1ZJ/5/fv32ahRo5iTkxNzc3Nj77//vkJ5/MmTJ9mwYcOYj48Ps7GxYT4+Pmz8+PHs8ePHCvsrLCxkS5cuZcHBwUwkEjE3NzfWunVrtnDhQpaRkSFfT9U1VAbP88zX15cBYIsXL1a6jqbHysvLY7NmzWI1atRgDg4ObMiQISwmJoZKkC0QjjELenwmCEJjkpOT4e3tja+++gpffvmluc3RmD179mDEiBG4cOGC2hAFQRBEtcxJIYiqwB9//AGJRIJJkyaZ2xSVlG4tD0h70qxcuRLOzs5o1aqVmawiCKKyUG1zUgiisnLq1Cncv38f33zzDYYPH15uvBxLYubMmcjLy0PHjh1RUFCAXbt24dKlS/j22291KuMmCKJ6QeEegqhk9OjRA5cuXULnzp3x119/oXbt2uY2SSXbtm3Djz/+iIiICOTn56Nhw4Z477335I3UCIIg1EEihSAIgiAIi4RyUgiCIAiCsEhIpBAEQRAEYZFU+cRZnucRGxsLJycng7RQJwiCIKoujDFkZWXBx8en3ACnhiQ/Px+FhYV678fGxkZh5PSqRpUXKbGxsTSwFEEQBKEVMTExqFOnjlH2nZ+fj3p+johPlFS8cgV4eXkh
KiqqygqVKi9SZO3TY2JidB5/hSAIgqgeZGZmwtfXV37vMAaFhYWIT5Tg2U1/ODvp7q3JzOLh1zoahYWFJFIqK7IQj7OzM4kUgiAIQiNMkR7g6MTB0Un34/Co+ikMlDhLEARBEIRFUuU9KQRBEARhiUgYD4kencokjDecMRYKeVIIgiAIwgzwYHpP2nDu3DkMGTIEPj4+4DgOe/bskS8rKirC3Llz0axZMzg4OMDHxweTJ09GbGysgd+1dpBIIQiCIAgzwBvgnzbk5OSgRYsWWL16dbllubm5CA0NxZdffonQ0FDs2rULjx49wtChQw31dnWCwj0EQRAEUQ0YMGAABgwYoHSZi4sLjh8/rjBv1apVaNeuHZ4/f466deuawsRykEghCIIgCDMgYQwSPYbPk22bmZmpMF8kEkEkEullGwBkZGSA4zi4urrqvS9doXAPQRAEQZgBQ+Wk+Pr6wsXFRT599913etuWn5+PuXPnYvz48WZt30GeFIIgCIKoxJRtVqqvF6WoqAhjxowBYwxr167V1zy9IJFCEARBEGaAB4NEywqdstsDhm1WKhMoz549w6lTp8zeBJVEioFhkjiwnD+B/AMAywWs6oGzfw2wHQKOo4+bIAiCkKJLGXHZ7Q2JTKA8efIEp0+fRo0aNQy6f12gu6YBYUV3wFKnACwPQPHAUUV3wTLmAnmHAbfV4Dhrs9pIEARBVE+ys7MREREhfx0VFYWwsDC4u7vD29sbo0aNQmhoKA4cOACJRIL4+HgAgLu7O2xsbMxiM4kUA8FYEVjau1LviULtevHfhWeBnN8Ax/fMYR5BEARhYRiqukdTbty4gZ49e8pff/TRRwCAKVOmYMGCBdi3bx8AICQkRGG706dPo0ePHjrbqQ8kUgxFwSmAT1KzAgPL/RNwmEZhH4IgCAI8oGU7tvLba0OPHj3A1AgbdcvMBZUgGwhWGIYKNR+fAkjiTGEOQRAEQVR66JHeUHAa6j3yohAEQRAAJHpW9+izbWWBPCkGgrPpDECsbg1AWBcQeJnKJIIgCMKCkTD9p6oOiRRDYdMRsAoAIFSxAgPnMA0cx5nSKoIgCMJC4Q0wVXVIpBgIjuPAua0HhN6yOcX/F4sW+6mA3RhzmEYQBEEQlRKzipRz585hyJAh8PHxAcdx2LNnj3xZUVER5s6di2bNmsHBwQE+Pj6YPHkyYmNjzWdwBXDC2uBqHgTn/C1g0xmwbgHYDgfnvgMC53nkRSEIgiDk8OAg0WPiUfXvKWYVKTk5OWjRogVWr15dbllubi5CQ0Px5ZdfIjQ0FLt27cKjR48wdOhQM1iqOYzPRJ7kOTIl8ciUpCEXDBJq4EYQBEGUgWf6T1Uds5aaDBgwAAMGDFC6zMXFBcePH1eYt2rVKrRr1w7Pnz9H3bp1TWGiVhQVXEFm6mSA5UMWLZSIn6Ag9y/YOy+AneNb5jWQIAiCICoRlaoeNiMjAxzHwdXVVeU6BQUFKCgokL/OzMw0gWUAz6chM3WKgkCRIm2Pn5u5AFbWQbAWdTaJPQRBEIRlIwvb6LN9VafSJM7m5+dj7ty5GD9+vNpRGb/77ju4uLjIJ19fX5PYV5D7r5KW+KURIi97g0lsIQiCICwfffJR9BU4lYVKIVJkIzMyxrB27Vq1686bNw8ZGRnyKSYmxjQ2FlwA1DbWkRSvQxAEQRCEJlh8uEcmUJ49e4ZTp06p9aIAgEgkgkgkMpF1pdEkg6kaZDkRBEEQGsEzDjzT3Ruiz7aVBYv2pMgEypMnT3DixAnUqFHD3CapxMqmPdR/nEJYi9qZyhyCIAjCwqFwT8WY1ZOSnZ2NiIgI+euoqCiEhYXB3d0d3t7eGDVqFEJDQ3HgwAFIJBLEx8cDANzd3WFjY2Mus5Viaz8OeVkrABRCucdEAlsHqu4hCIIgpEgggEQPX4HEgLZYKmb1pNy4cQMtW7ZEy5YtAQAfffQRWrZsia+++govX77Evn378OLFC4SEhMDb21s+Xbp0
yZxmK0Ug9ICT+3pIdV/p1vjSv+0cP4SNbW9zmEYQBEEQlRKzelJ69OgBxlTnaahbZonY2PaBq+dJ5OdsQmH+MTBWBCub1rBzmAprUSdzm0cQBEFYEEzPnBRWDXJSLD5xtrIhtKoPB5ev4eDytblNIQiCICwY6pNSMRadOEsQBEEQRPWFPCkEQRAEYQYkTAAJ0yNxtnJlROgEiRSCIAiCMAM8OPB6BDT4atB7i8I9BEEQBEFYJORJIQiCIAgzQImzFUMihSAIgiDMgP45KRTuIQiCIAiCMAvkSSEIgiAIMyBNnNVjgEEK9xAEQRAEYQx4PcfuqQ7VPSRSCIIgCMIMUE5KxVBOCkEQBEEQFgl5UgiCIAjCDPAQUDO3CiCRQhAEQRBmQMI4SPQYyVifbSsLFO4hCIIgCMIiIU8KQRAEQZgBiZ7VPRIK9xAEQRAEYQx4JgCvR3UPT9U9BEEQBEEQ5oE8KQRBEARhBijcUzEkUgiCIAjCDPDQr0KHN5wpFguFewiCIAiCsEjIk0IQBEEQZkD/Zm5V389AIoUgCIIgzID+Y/eQSCEIgiAIwgjw4MBDn5wU6jhLEARBEARhFsiTQhAEQRBmgMI9FUMihSAIgiDMgP59Uqq+SKn675AgCIIgiEoJeVIIgiAIwgzwjAOvTzM3PbatLJBIIQiCIAgzwOsZ7qkOfVKq/jskCIIgCKJSQp4UgiAIgjADPBOA16NCR59tKwskUgiCIAjCDEjAQaJHQzZ9tq0sVH0ZRhAEQRAEzp07hyFDhsDHxwccx2HPnj0Kyxlj+Oqrr+Dt7Q07Ozv06dMHT548MY+xxZBIIQiCIAgzIAv36DNpQ05ODlq0aIHVq1crXb5s2TL88ssvWLduHa5evQoHBwf0798f+fn5hni7OkHhHoIgCIIwAxLoF7KRaLn+gAEDMGDAAKXLGGNYsWIF/ve//2HYsGEAgD///BO1atXCnj17MG7cOJ3t1AfypBAEQRCEGTCUJyUzM1NhKigo0NqWqKgoxMfHo0+fPvJ5Li4uaN++PS5fvmyw96wtJFIIgiAIohLj6+sLFxcX+fTdd99pvY/4+HgAQK1atRTm16pVS77MHFC4hyAIgiDMgKEGGIyJiYGzs7N8vkgk0ts2S4E8KQRBEARhBhg48HpMrDifxdnZWWHSRaR4eXkBABISEhTmJyQkyJeZAxIpBEEQBFHNqVevHry8vHDy5En5vMzMTFy9ehUdO3Y0m10U7iEIgiAIM2CocI+mZGdnIyIiQv46KioKYWFhcHd3R926dfHhhx9i8eLFaNSoEerVq4cvv/wSPj4+GD58uM426guJFIIgCIIwA6YeBfnGjRvo2bOn/PVHH30EAJgyZQr++OMPfPrpp8jJycHbb7+N9PR0dOnSBUeOHIGtra3ONuoLiRSCIAiCqAb06NEDjDGVyzmOw6JFi7Bo0SITWqUes+akVMYWvQRBEARhCCQQ6D1Vdcz6Ditji16CIAiCMASycI8+U1XHrOGeytiilyAIgiAI02CxviJdW/QWFBSUaxFMEARBEJYGD4HeU1XHYt+hri16v/vuO4X2wL6+vka1kyAIgiB0QcI4vaeqjsWKFF2ZN28eMjIy5FNMTIy5TSIIgiCIclBOSsVYrEjRtUWvSCQq1yKYIAiCIIjKh8WKFEtt0UsQBEEQhoAxAXg9JqZHt9rKglmreypji16CIAiCMAQScJBA95CNPttWFswqUipji16CIAiCIEyDWUVKZWzRSxAEQRCGgGfaj79TdvuqDo3dQxAEQRBmQJZbos/2VZ2q/w4JgiAIgqiUkCeFIAiCIMwADw68Hsmv+mxbWSCRQhAEQRBmQN+usdRxliAIgiAIwkyQJ4UgCIIgzAAlzlYMiRSCIAiCMAM89Bt/h3JSCIIgCIIwCkzPxFlWDURK1fcVEQRBEARRKSFPCkEQBEGYAZ7pGe6h6p7y5OXlITc3V/762bNnWLFiBY4dO2ZQwwiCIAiiKqPPCMj6
Jt0aA2PoA63f4bBhw/Dnn38CANLT09G+fXv8+OOPGDZsGNauXauzIQRBEARBVF6MoQ+0FimhoaHo2rUrAOC///5DrVq18OzZM/z555/45ZdfdDKCIAiCIKobsnCPPpMlYQx9oHVOSm5uLpycnAAAx44dw8iRIyEQCNChQwc8e/ZMJyMIgiAIorpR1driG0MfaO1JadiwIfbs2YOYmBgcPXoU/fr1AwAkJibC2dlZJyMIgiAIgqjcGEMfaC1SvvrqK8yZMwf+/v5o164dOnbsCECqmlq2bKmTEQRBEARR3ahq4R5j6AOtwz2jRo1Cly5dEBcXhxYtWsjn9+7dGyNGjNDJCIIgCIKoblS1EmRj6AOd6pe8vLzg5OSE48ePIy8vDwDQtm1bBAYG6mQEQRAEQRCVH0PrA61FSkpKCnr37o2AgAAMHDgQcXFxAIA333wTH3/8sU5GEARBEER1o6qFe4yhD7QWKbNnz4a1tTWeP38Oe3t7+fyxY8fiyJEjOhlBEARBENWNqiZSjKEPtM5JOXbsGI4ePYo6deoozG/UqBGVIBMEQRCEhjDoV0bMDGeKQTCGPtDak5KTk6OgkGSkpqZCJBLpZARBEARBEJUbY+gDrUVK165d5W1vAYDjOPA8j2XLlqFnz546GUEQBEEQ1Y2qFu4xhj7QOtyzbNky9O7dGzdu3EBhYSE+/fRT3Lt3D6mpqbh48aJORhAEQRBEdaOqlSAbQx9o7Ulp2rQpHj9+jC5dumDYsGHIycnByJEjcevWLTRo0EAnIwiCIAiCqNwYQx9o7UkBABcXF3zxxRc6HZAgCIIgiKrnSQEMrw+0Finnzp1Tu7xbt246G0MQBEEQ1YWqJlKMoQ+0Fik9evQoN4/jSj4oiUSitREEQRAEQVRujKEPtM5JSUtLU5gSExNx5MgRtG3bFseOHdPaAIIgCIKojjDG6T1ZEsbQB1p7UlxcXMrN69u3L2xsbPDRRx/h5s2bOhlCEARBENUJHpxezdz02dYYGEMf6DTAoDJq1aqFR48eGWp3BEEQBEFUAfTRB1p7Um7fvq3wmjGGuLg4LFmyBCEhIToZQRAEQRDVjaqWOGsMfaC1SAkJCQHHcWBMcdSADh064Pfff9fJCIIgCIKobuibV2JpOSnG0Adai5SoqCiF1wKBAB4eHrC1tdXJAIIgCIKojlQ1T4ox9IHWIsXPz0/ngxEEQRAEUTUxhj7QSKT88ssvGu9w1qxZOhtDEARBENWFqhDuMbY+0Eik/PTTTxrtjOM4EikEQRAEoQFMz3CPJYgUY+sDjURK2TgTQRAEQRCEsfWBwfqkEARBEAShOQwAY3pMWh5PIpHgyy+/RL169WBnZ4cGDRrg66+/LleNY0noNAryixcvsG/fPjx//hyFhYUKy5YvX24QwwiCIAiiKsODA2fCjrNLly7F2rVrsXnzZgQHB+PGjRuYOnUqXFxcDJaqYWh9oLVIOXnyJIYOHYr69evj4cOHaNq0KaKjo8EYQ6tWrbQ2gCAIgiAI43Pp0iUMGzYMgwYNAgD4+/vj77//xrVr1wyyf2PoA63DPfPmzcOcOXNw584d2NraYufOnYiJiUH37t0xevRonYwgCIIgCF3JFRcgLi8dueICc5uiFYYaYDAzM1NhKihQ/jl06tQJJ0+exOPHjwEA4eHhuHDhAgYMGGCQ92MMfaC1J+XBgwf4+++/pRtbWSEvLw+Ojo5YtGgRhg0bhvfee08nQwiCIAhCGx6kv8SCO7vxKDMB0gwNDvUda+KrZsMR4m75Pb14xoEzQDM3X19fhfnz58/HggULyq3/2WefITMzE4GBgRAKhZBIJPjmm2/w2muv6WxDaYyhD7T2pDg4OMjjTN7e3oiMjJQvS05O1toAdVTGJB+CIAjC+ISmROG1i+sUBAoAPM1OxtTLv+FswkOz2mdKYmJikJGRIZ/mzZundL1///0XW7duxbZt2xAaGorNmzfjhx9+wObNmw1ihzH0gdaelA4d
OuDChQsICgrCwIED8fHHH+POnTvYtWsXOnTooJMRqjBFkg9BEARR+Zh9cxt4+StFbwQDMDf0X1wZ8JWJrdIOWZWOPtsDgLOzM5ydnStc/5NPPsFnn32GcePGAQCaNWuGZ8+e4bvvvsOUKVN0N6QYY+gDrUXK8uXLkZ2dDQBYuHAhsrOzsX37djRq1MjglT3GTvIhCIIgKh/XkiORUZSHsuKkNPl8EQ69DMfA2i1MZ5iWmLrjbG5uLgQCxQCKUCgEz/MqttAOY+gDrUVK/fr15X87ODhg3bp1Oh1YEzp16oRff/0Vjx8/RkBAgDzJR92bLSgoUEgayszMNJp9BEEQhOm5nvIU6gSKFIaryZEkUkoxZMgQfPPNN6hbty6Cg4Nx69YtLF++HG+88YbONpTGGPpA65yUt956C2fOnNH7wJogc0sFBgbC2toaLVu2xIcffqg2yee7776Di4uLfCqbUEQQBEFUbuyEIg3XszGyJZWLlStXYtSoUZg+fTqCgoIwZ84cvPPOO/j6668Nsn9j6AOtRUpSUhJeeeUV+Pr64pNPPkF4eLhBDSqNLkk+8+bNU0ggiomJMZp9BEEQhOkZ4dtag7U4jPZra3Rb9IEvHrtHn0kbnJycsGLFCjx79gx5eXmIjIzE4sWLYWNjGDFnDH2gtUjZu3cv4uLi8OWXX+L69eto1aoVgoOD8e233yI6Olpvg0pTOsmnWbNmmDRpEmbPno3vvvtO5TYikUieRKRpMhFBEARReXATOaBtDX+obgzP0MDRAw2capnQKu3RqyW+nkm3xsAY+kCnsXvc3Nzw9ttv48yZM3j27Blef/11bNmyBQ0bNtTJCFUYO8mHIAiCMAypBTlY+/Achp5Yg95HVuCdS1txJv6x0VpGrG47GfUdPZUu87R1xh8dpxnluIR6DK0PdBq7R0ZRURFu3LiBq1evIjo6GrVqGVa1GjvJhyAIgtCfxxkJmHJhMzIL88EXezcS8jNxLiECw+u2wDethkHA6Z4gqgwboTV2dpuJE/H38HvkOSTnZ8PVxg6v1euEIXVaQshZ/vi5Um+IPomzBjTGwBhKH+gkUk6fPo1t27Zh586d4HkeI0eOxIEDB9CrVy+djFDFypUr8eWXX2L69OlITEyEj48P3nnnHXz1lWXXvhMEQVQXxDyPdy5vQ1ZRiUABAEnxHXTP83AEu3pjYoP2Bj82x3Ho690Ufb2bGnzfpsDU1T2mwND6QGuRUrt2baSmpuKVV17Br7/+iiFDhkAk0izTWltkST4rVqwwyv4J3eAZw8XESOyMDkNsbgY8bB0xvG4L9PQOgJXA8p9eCIIwHKfjHyE+T3WrBw7AHxFXMKF+O4N7UwjLwhj6QGuRsmDBAowePRqurq56HZionBRKxJh5ZQdOxz8BBw6suB318dhHaFmjDjZ2fg2O1sYRrQRBWB43kp/BihNAzJTnCjIAL3PTkZSfhVp2VMhQGgbVqb+abm9JGEMfaP3YO23aNBIo1Zhld0/gdHwEUEqgyMbNuJXyEp/e2GNW+wiCMC1chU3VtFuvOmGoUZAtBWPoA/LNExqTXVSAbZE3UKLfuTL/M5yIfYS43AzTG0cQhFloW9NPpRcFkF4d6ti7wcPW0XRGEVUGEimExlxPflZ8MVKl3jkAHPY8M16DP4IgLIvuXgGobe8KoYp8EwZgaqOO4CgfpTzMAFMVh0QKoTEvctKhyXgZDzMSTWANQRCWgJVAgHUdJ8DVxl4hpCMTLaP8WmF8vTbmMs+y0TfUY2HhHmOgdeJsQUEBxGIxHBwcjGEPYcH4ObprtB4lxxFE9aKhswcO9JmBndGhOPTiHnLEBWjk7Ilx9dugk0d98qKoQN+usZbWJ8UY+kBjT0pSUhIGDBgAR0dHODs7o0OHDoiIiDCYIYTl06ZmXZUu3RI49PFpbBJ7CIKwHFxt7PBmQGfs7PU2jvSbiZUdxqKzZwMSKNUAY+oDjUXK3LlzERYWhkWLFuGHH35Aeno6
pk2jtsPVCXsrG4z2b6l2HT8HN7StWddEFhEEQVReqkp1jzH1gcbhnuPHj+OPP/5A//79AQCDBw9GUFAQCgoKjNbMjbA8PmveF/fT43E7LVZhPgfAxcYOazuNpScngiAITdA3r8RCRIox9YHGnpTY2Fi0aNFC/rpRo0YQiUSIi4vTywCicmFvZYNt3adgYcuBaOziCQcrG3jbOePdwC442PddNHT2MLeJBEEQhAkxpj7QKnFWKBSWe22sES4Jy8VGaIXx9VtjfP3W5jaFIAii0lKVEmeNpQ80FimMMQQEBCi48rOzs9GyZUsISo3XkpqaqrdRBEEQBFHlqSJ98Y2pDzQWKZs2bdJ65wRBEARBVG2MqQ80FilTpkwxmhEEQRAEUd3Qt0LHUqp7jKkPtG7mRhAEQRCEgbCQkI2lQiKFIAjCBNxMeoktD0NxM+kFbARC9PFthImNW8LX0dXcphGExUIihSAIwsisun0JP4Sdg5DjICmueIi+fw1/PLiBjb1Go4uPv3kNNABZhQXY8/QezsZGQczzCPHwxrhGLeBl72Ru0yyWqhLuMSYkUgiCIIzImZdP8UPYOQCQCxTZ3zyTYNrpnbj46ntwt7U3l4l6cyclHpOOb0dGYT4AaQTjXFwUVt2+jJ+6DMaQekHmNdBSqSLVPcaERkEmCIIwIhvvX1M55hUDkC8pwo6IO6Y1yoBkFOZj4vHtyCwqULjn8oxBzHh8cGE/7qbEm9NEC4YzwFS10ciT8tFHH2m8w+XLl+tsDEEQRFXjakKMggelLAzA1YTneKdpe9MZZUB2Rd5FZmG+yod6Dhw2PriBn7oMNqldhGkwtj7QSKTcunVL4XVoaCjEYjEaN5aOdvv48WMIhUK0bk0dSAmCIEpT1Z91T7+MVLtcwnicfGGYEXGrHFUg3GNsfaCRSDl9+rT87+XLl8PJyQmbN2+Gm5sbACAtLQ1Tp05F165ddTKCIAhCU/LFRXiYlgzGGALdPWBnZW1uk9TS3qsuLsZFq/SmcAA6eFXekcMLJZIK75VFPG8SWyodVUCkGFsfaJ04++OPP+LYsWNyAwDAzc0NixcvRr9+/fDxxx/rZAhBEIQ6CiUSrLh1EZsfhCK7qBAA4GBtg4mBIfi4VReIhJZZB/BWk3Y4FxuldBkHwM7KGqMbNDetUQYkpKYPrie+kIswmRaTpeEIOA4taniZyTrClBhDH2idOJuZmYmkpKRy85OSkpCVlaW1AQRBEBUh4Xm8e2oP1ty+KhcoAJBTVIgNd67jzeO7ILbQp/VuPvXwScvuAKCQQCvgONgIrbCh56tws7Uzl3l681pAiNQhwANMDEAMQMyBFQFMAkh4hteD2pjZSguFcfpPFoQx9IHWImXEiBGYOnUqdu3ahRcvXuDFixfYuXMn3nzzTYwcOVInIwiCINRx/HkETsZEginxb/NgOB8bjUPRj8xgmWbMaNYRewZOxrB6wfBzckMjlxp4N7gDTg2bhs7e/uY2Ty98nVwxul5zQILim6bsxskBPFDb1gW9atc3o4WWi2wUZH0mS8IY+kBr/+i6deswZ84cTJgwAUVFRdKdWFnhzTffxPfff6+TEQRBEOrY9ihcoRFaWQQch20PwzG0vuX24wip6YOQLj7mNsPgJOflYOeTu1CeIswhNjsTWx+GY2owFVZUdYyhD7QWKfb29lizZg2+//57REZKs7obNGgABwcHnQwgCIKoiOjMNLVlvDxjeJaVZkKLCBn/Pr4DCa/+kX7TvZskUpRRBRJnS2MMfaBzM7e4uDjExcWhUaNGcHBwALM0vxNBEFUGd1u7Ckt53USVN6+jMvMgNVFtnTUD8CwrHQUSsclsqjRUsZwUGYbUB1qLlJSUFPTu3RsBAQEYOHAg4uLiAABvvvkmVfYQBGEUXm3YVO1yDsCoRurXIYyDSGgFrgIJyQGw4qjBeVXHGPpA67Nm9uzZsLa2xvPnz2FvXzLWxNixY3HkyBGdjCAIQj3p+Xn4LfwG
Zp44gNknD2H34/vIF1efJ9ORDYNR18lVaXt5IcehtqMzRjVqZgbLiN51G0LCVFdWCTkOvXwbQCggkVIWjuk/WRLG0Ada56QcO3YMR48eRZ06dRTmN2rUCM+ePdPJCIIgVHPyWSRmHNuPAokYHDhwHLD7yX0suXIOWwaPQoB7TXObaHQcrG3w78DxeO/0XoQmxkIATlo8whia1qiFtb2Gw9lGZG4zqyV96zZEPWc3PM9KV5o3xDOGd5u3M4NllYAqlpNiDH2gtUjJyclRUEgyUlNTIRLRRYIgDMmj1CS8c3QvJDxffD1j8gtTcl4OJuz/F2fGvwVHGxuz2mkKvBycsHvwRNxJjseV+BgwxtDeyxctPLzNbVq1xkogwF+vjMHEI/8iKjMNQo6TnqJMWnX1fbcBaOfla24zLRN980osLCfFGPpAa5HStWtX/Pnnn/j6668BABzHged5LFu2DD179tTJCIIglLPx9k0wpqw7CCBhDMl5udjz5D4mBoeY2jSz0aymF5rVpA6mlkQdJxecePVNnHgegRPPI5AvFqNJDU+MCWiGmnZU+VldMIY+0FqkLFu2DL1798aNGzdQWFiITz/9FPfu3UNqaiouXryokxEEQSjnWFSE1IXOAJQN+xf3zToeHVmtRAphmVgJBHjFPwCv+AeY25TKQxUL9xhDH2idydS0aVM8fvwYXbp0wbBhw5CTk4ORI0fi1q1baNCggU5GEAShnAKJpIxAKdXRs3h+Xqk28QRBVCKYASYLwhj6QKcRuVxcXPDFF1/odECCIDSnsXsNhMUlQHo1Kh1/LvW3hcWljcndxAQsu3Qed5MSwTOGxjVqYmbbDujsWxecksofgiBMi6H1gdaelIYNG2LBggV48uSJwYwgCEI5Qe6eAJhUiPAoPzGGqPR08FW8mSJjDAvPnsKQ7X/hfMwzpOXnIaMgH9diX2DS3v8w88gBSCx0gEGCUEkV86QYQx9oLVJmzJiBgwcPonHjxmjbti1+/vlnxMfHG8wggiBKwQCO56T/y14r/M8hKTsH6fl55rTS6Gy/fwd/3L6lcvnBiMfYcOuGCS0iCANQxTrOGkMf6NTM7fr163j48CEGDhyI1atXw9fXF/369cOff/6plzEEQShiIxRK/yh+YpJFNOT/F8+3EghNa5gJ4RnDiquXKlxv3c3rKJJITGARQRDKMIY+0LkFYEBAABYuXIjHjx/j/PnzSEpKwtSpU3XdXaUgOTcXV17E4FZcLF0MCZPgJrKTek1UPTBxUqGSnld1PSnPMtKRkJNToWs7oyAfEWmppjGKIAxAVes4K8OQ+kCnxFkZ165dw7Zt27B9+3ZkZmZi9OjR+uzOYknKycGis6dxOOKJPPbvbmeHd9u0w5stW1HCHmE0Djx+qHbwNhm/3rqOxT37Gt8gM1Akq3DSAF5Ne3bCePCM4WjEE2y5HY5Hycmws7bCoEYBmNyiJWo7O5vbPMulipUgl8ZQ+kBrkfL48WNs3boVf//9N6KiotCrVy8sXboUI0eOhKOjo05GWDJpeXkY9e/fiM3KUkhOTM3Lw7fnzyIhOxtfdOtuRguVE5+dha23b+PIkyfIExehmWctTGwRgk6+viSqKhHPMzI0Wi82K9vIlpiPui4uEAo4pS3XSyPgODRwczeRVYQMnjHMOXoEex49gIDjwDOGtHzg91uh2HrnNv4c8SpaefuY20zCBBhDH2gtUgIDA9G2bVvMmDED48aNQ61atXQ6cGVhQ+gNxGZlqbxAbrx1E+OaNkMDd8u5OIbGxmLy7p3IF4vlwiohOxtHIyPwRstW+KJbdxIqlYQiWcWKuq+LA/xdXExijzmwtbKGi7UIqQX50hnKPgsmHcjO1srapLYRwNbb4djz6AEAKDzISRhDvliMt/btwaU3p9F3Uw0whj7QSqRIJBKsX78eo0aNgpubm94Ht3QYY/j7zm21T3BCjsOO+3fxWZduJrRMNXlFRXhz754SgVJsuoRJy0F+vxWK
prVqYXhgkHkNJTRCCEAClG+TIqP4++1Up67JbDIHYsaXfAalP4tSP00rEt4mhzGGjbduyr+WsvCMIT0/H4eePMHIoCamNs/i4aBfXoklnfHG0gdaJc4KhULMnDkT6enpBjOgIl6+fImJEyeiRo0asLOzQ7NmzXDjhmlKDQskYmQUFKhdhwGIzcoyiT2asO/RQ2QU5IPnGcCXSq4q7k7KAdh486aZrSQ0xdvRueTqXzZ+Xervjr5VV6RkFxYiq6BQ8YKsJJbv6VD1ws2WTnp+Pp5nZKhNjbASCHAz9qXJbKpUmKEE2Vj3VGPpA53a4j99+tSgRqgiLS0NnTt3hrW1NQ4fPoz79+/jxx9/NJkXx0ZoBZFQvbOJA+Bqa2sSezTh+suXECjJ+pY/ePLAvcRE5IuLTG5bdYQxplejNV9nV3lflJKdQkG4cAywt666rvR8sVj+N8dKBgYoOzVyr2EW+6ozQoFmN0kBp3MhKWFAjH1PNYY+0DonZfHixZgzZw6+/vprtG7dGg4OiiNcOhswk3vp0qXw9fXFpk2b5PPq1atnsP1XhIDjMDwwEP/dv6cy5CNhDMMaW07opEgigSzKU/byIXfJMoCzKEdh1eNOfAI2XL+O4xGRKJJIUM/dDZNbtsS45s1gLdS8p4mzSKRQglz6NJRHNyw4w98QuNnawt3OHql5udIZZcM9xSd2Vz9/s9hXnXGyESGwZk08Sk5WeRqKeR6dfH1NalelwcTVPca+pxpDH2gtbwcOHIjw8HAMHToUderUgZubG9zc3ODq6mpwD8e+ffvQpk0bjB49Gp6enmjZsiU2bNigdpuCggJkZmYqTPrwTpu2sLWyglBJvFvAcejpXw+tvL31OoYhEXKcyvQFFM8XQPqETxiH4xEReHXrNhx5/ASFEgkYgKjUNCw8eQrv7NmrVY+dlt7e4CCQt8Uv7TkADwggQAsvb4MnQidkZ+NeQiISs81fNSQUCDCxeQvFp/Ey3iSR0JryrMwAx3F4u3VblfdKIcehtpMz+jZoaFK7Kg0Gaotf9p5XoCJNQZd7qjYYQx9o7Uk5ffq0TgfShadPn2Lt2rX46KOP8Pnnn+P69euYNWsWbGxsMGXKFKXbfPfdd1i4cKHBbPB3dcM/o8Zi1uGDiEpPg4Dj5Df4oY0D8U2vPhZVKeNkI6rQR8IAJOXmwrcKV4SYi8z8fHx44BB4xpSmj5yLisbm0Ft4q20bjfY3OrgpVly+VCx2uHJPTjxjeKNlK4PYDgDhcXH4/twFXHkeI5/Xxa8uPuneDcG1PA12HGW8zMjEqcinKBCLEeBRE138/SAo/m2907otzj17htsJ8QrhM5ko/6FffziLLCfsWp0Y1jgQT1JSsPbGNQg5aam47BrkZmeHTcNHwEpA4R5j4lvGUzV//nwsWLCg3Hq63FO1wRj6gGMW/EhtY2ODNm3a4NKlkpbYs2bNwvXr13H58mWl2xQUFCioyMzMTPj6+iIjI0OvUBRjDNdevsCD5CSIhFbo7l8PPk5OOu/PWPx86RJWXb1aYR7Ejffeg5udnYmsqj5sDr2Fr0+dVuuF9XFywrm339JY3J58+hTvHdgHxpg87Ci7GUxqEYIFPXoaRChff/ECk7f/B0mZPBoBx8FaKMS2caPRwghew/wiMb44egL77kvLWLniXhu1nZ3w05CBaFVb2mMjr6gIG2/dxJbwMCTl5oID0N3fH++1aY+2tWsb3C5CO8Lj47D1zm08SEqCvbU1BjRqhJFBwdKQZSUiMzMTLi4uet8zNDmG/zffQKBHTiOfn4/oL75ATEyMgq0ikQgiJZ+7LvdUc6NTx9nz589j/fr1ePr0KXbs2IHatWtjy5YtqFevHrp06WIw47y9vdGkiWLZWlBQEHbu3KlyG1Vfjr5wHIf2dXzRvo5lx1YHNm6MX65cUblcwHFoV6cOCRQjcS8hEQJOfeOx2KwsZBcWwknD87R3/fo4+Nok/BF2C8ciIlDES9DUsxamhISg
T/0GBhEojDF8fuR4OYECSL01RRIJ/nf0BPa/PknvY5Xlw/0HcSoyqiR6U3z8uKxsTN6+E7snT0CjmjVgZ22N99t1wPS27ZFdWAiRUAiRlV5NswkD0sLLGy28LCf0XSkwUE6Ks7OzRoJKl3uqthhaH2jtg9u5cyf69+8POzs7hIaGyr0WGRkZ+Pbbb7U2QB2dO3fGo0ePFOY9fvwYfn5+Bj1OVaJRjRoYGBAgd5OXRjZnVocOpjWqGmEtFGgkGqy1dH83qlED3/Tug+vvvIuw92bgr1dHoW+DhgYLNd6KjUNUWppKDxzPGB4kJeF+QqJBjifjdlw8TkQ8VXpcmThae+WawnwBx8FZJKqyAkXC8zj66Ane2L4LfdZvwpgt/2BraDhyC6kij9APY99TjaEPtBYpixcvxrp167BhwwZYlyp77Ny5M0JDQ3UyQhWzZ8/GlStX8O233yIiIgLbtm3Dr7/+ihkzZhj0OFWN7/v3x6CAAADSC7osHuxoY4NVgwejPWXaG41eDepDzKseP0bAcWjvWwe2FlYy/CwtXaP1nhu4B8K++w/V5itIGMOhh4+rzYCeBWIx3vlvL97ffQAXo5/jWVo6wl7GYcGxUxi26S8kVJLhD7IKCrD1Zjj+d/g4Fh07jfNPo/Uqxa+yGChxVlOMfU81hj7Q+lHk0aNH6NatfHdVFxcXgzdxadu2LXbv3o158+Zh0aJFqFevHlasWIHXXnvNoMepathaW2PFoEH4sFMnHI2IQHZhIRq6u6N/w4YWd3OsavSoVw8N3N0RnZamNOTDM4Z327Uzg2XqcbLVLPTkbOCeQOn5+RVWmol5HnlFRbAWCpFbWIQzEU+RlpcPH2cndKnvp1VJt6Xz8/nLOB/1DACkDRlRfB/igJj0DHy49yD+njjWfAZqwMknkZi99xDyi8QQFgvQLTfD0NijJn4bOwJeTtR0T4a+Ixlru62x76nG0AdaixQvLy9ERETA399fYf6FCxdQv359nYxQx+DBgzF48GCD77c64O/mhnfatjW3GdUKoUCATa+OxOT//kN0WjqExUmgsrDMgt690K2ev1b7ZIwhOScXEp6Hh6OD/MJvSLr41YWjjQ2yCwtVruNmZ4e2dQyboFpHgzi6k40N7K2t8fvVm/j53GXkFRXJe/7UsLfDgld6o39gI4PaZQ7yioqw7Va4VJwU33zk7WAYIOEYbryIxf2ERDQxcqWVrtyNS8CMnfvlAr20VzEiOQVT/9mJ/W9OomofM2LMe6ox9IHWImXatGn44IMP8Pvvv4PjOMTGxuLy5cuYM2cOvvzyS52MIIiqRG0XZxx5fQqOPYnA8YgI5IvFCKhZE+OaN4OPltUCe+88wK+XruNJcgoAoKaDPSa1CcGbHdvAxoAeBFtra7zfqQOWnDmncp0PO3cyuNfi1WbBWH35qsrlQo7DmBbN8OeNMCw5UWKb7AEyJScPM3cewPqxw9CzoeEfkkzJw8Qk5BQUlRMocoq77V57/sJiRcqy0+fVNr6MSE7F6Yin6BtAfVMA6NzaXmF7C8IY+kBrkfLZZ5+B53n07t0bubm56NatG0QiEebMmYOZM2fqZARBVDWshUIMCmyMQYGNdd7Hz2cvYfWFqwpx5+TsXPx05hKOP4rAP1PGwsaAyaNvtmmNQrEEv1y6DAnPQygQQMLzsBYK8VGXznitZQuDHUuGr6sLZnRqj1WXygsVAcfBy8kJk1qGYOCvf6jdz5IT59CjQT2L6lmkLaxUjoG6btEFYsvMzwl7EYvL0TFqR70TADj+KIJEigwTd5w1NsbQB1pf4TiOwxdffIFPPvkEERERyM7ORpMmTeDoSHFGgjAUjxKTywmU0tyNS8Qb23Zhy6TRBrsxcxyH6R3bY3xIcxx6+BhJOTnwcnLEgMYBcDHi+FS17BzASQAmgEK7e55nGBIQgLCXscgvUn9jjkpJw8PEZATV8jCancamrquL2kaMMqGSX2R5VT4JWdmY9Nd/FQ7LywPI
KxKrX6kaYeqcFGNjDH2g82OYjY0NmjRpgszMTJw4cQKNGzdGUBC1pSYIQ7D91h0IOIBXcxG69vwl/gu7i9Etmxn02G52dkbxmigjIikFC46ckl5sxVAQKRwH/HrpBka3bKrRvp6lplVqkZJZwYjrgPTjKbRAT8rfN2+jQCIp+f5UiRUGeDjam8oswkwYUh9onb00ZswYrFq1CgCQl5eHtm3bYsyYMWjevLlBG8IQRHXmSVIy1FQyy/n2+FnjG2NEtoXelo/yzAEQMOnEoWTE4+vPXmi0r8TsHOMZagIcbGwqXEfAcXAUVbyeqdl3T9otWJPwg5+rq1FtqVSYuATZ2BhDH2gtUs6dO4euXbsCAHbv3g2e55Geno5ffvkFixcv1skIgiAUKXcjkl2QeChcnHIKi5CSk2ta4wzI1egYuRgr/fAt/5sB8ZlZGu2rhkPlfkKv5eSIEB8vpY0YZfCMWWQlU2pOnuINs+wNlJX8X4tKkEtgJSEfXSZLEynG0Adai5SMjAy4u7sDAI4cOYJXX30V9vb2GDRoEJ48eaKTEQRBKPJKYECJKOGLL0gygcKXmpjU61JZyczLB6A2OgCJRAOXEoDmPl6GMcqMzOrWSWXfGAHHYWBQAOrXcDexVRUjKzWWecWU3jyLo0Ehtal1flXFGPpAa5Hi6+uLy5cvIycnB0eOHEG/fv0AAGlpabA1YnIdQVQnGnnUKHlRfNHnIP3ByvJLZUlzNsLK2x7e1tq6wmRRngE9GtZT6WEQcBy6N/CHr6uLUWw0JV3q++GHYQNgZy39Tq0FAgiL3/crgY2wZHB/c5qnErEsT4YHOAkgkAAQA1wRgCJAIAY4HhAJBfBytryBWc1GFQv3GEMfaH11+/DDD/Haa6/B0dERfn5+6NGjBwCpm6dZM8Mm8BFEdWVP+F0FcQKoKEvlAf8aria1zZDUcXHG89R0tetYCQT4ZlBfjNu8HS8zMsuN0FzbxRnfDOprZEtNx5DgQPRqVB+H7j9GVGoqHG1EeCWokUV6UGQIOA68RPq9yM5T2RNw6fNWyKiJmwJVrATZGPpAa5Eyffp0tGvXDjExMejbty8ExZ0D69evTzkpBFGK7PwC3IlNAM8Ymnh7ws1e85GnTzyIBKBeoMj+P/0wEq+2qpwPCL0a1celqOdq1+nSwA8ejg7Y/eYE/B16G//euouknBx4ODhgdEhTTGjd3ODt+s2Ng40NRodoVtVkbgrFYkgkTOrZ46T3TWU+Lw7S8uPzEdHo2tDfpDYSpsEY+kAnP3GbNm3Qpk0bMMbAilt+Dxo0SCcDCKKqUSgW48fjF7D9xh3ki6U9IawEAgxrEYR5r3SHowbj5IgZk4dzKgqHrD1/rdKKFGdbUcnToIo32qle3eJ1bfFOp3Z4p5PljX1UnUnOyVF4oq+oa8/RB09IpBRT1fqkAIbXBzr53jZu3IimTZvC1tYWtra2aNq0KX777TedjSCIqoKE5zHj7/3482qYXKAA0sTC3WH38frmnSjQoJlVg5pS135FF3wGIFnL6h6eZ7gY8QzfHjqDhftPYsfNO8gtNE+DsD1h9yEoVfmBMn8LGBD2PM4sthGaEZeh3cjMltjnhTAchtYHWntSvvrqKyxfvhwzZ85Ex44dAQCXL1/G7Nmz8fz5cyxatEhnYwiisnP2SRTOR0QrXcYzhruxCdgb/gBj2qj3fHSr749LkepbjAPSxVac5s8aCZnZeGfLbjxKSJYP8ia+zmPp4XP4aewgdG3kr/G+DMGTpJSSdvAcStqqFk8MwOPEylu9VB2wsSo1npOqWE8pWvpSdU9VxRj6QGuRsnbtWmzYsAHjx4+Xzxs6dCiaN2+OmTNnkkghqjX/3bwLIceVDLJWJjGOEwDbb9yuUKTciU2U3qS5ir0prf00G5m4SCLBm5t3IjolDYDiCLW5RYWYvnUv/nt3Ahp7ma5rq72NDYBclaWrHMr3jGGM4WF8El6mZcLVzhYt/XyMMjI0oRmNa3nA
wcoKOUXiEpGp7KQt/m77BVlenxezUcUSZ42hD7QWKUVFRWjTpk25+a1bt4ZYTGMyENWbF+mZUoHCpCWXgOL1mkmAZynpFe6nSCKRXu81eDKd06eLRradfvgUkUmpSpcxBjCO4feLN7H01Vc02p8hGBgcgF8vXFeo2CnLgOAA+d9hz2OxaN8pPIxPks/zdHLAR/26YGjLJka1lVCOjVAIdzt75ORnltTHyyg1zAEAeNrbV/qme4akquWkGEMfaP34MWnSJKxdu7bc/F9//RWvvfaaTkYQxIvUDPxy/BI+2X4Ii/aexI2oFyqbWlkyHo720uu0EoEiIzevEAfDH6rdj4+rs3R7VixUlMEAF5EIjTxramTb8QcR8p4bypDwDEfvmbYh4/i2LeAgslHaA0XIcajhaI8RIcEAgNsv4jFl4w48SkhSWC8xKwef7TyKHdfvmMRmQpHk7By8TM8sOe8lJee/QqdkMZCena9RTla1oor0SAGMow808qR89NFH8r85jsNvv/2GY8eOoUOHDgCAq1ev4vnz55g8ebJORhDVm3Wnr2LliUsQcJzUU8xx+OfabbSv74uVrw3RqBrGUhge0gQXnzwvabZW5t4rC2t8vusougb4w9lOeensiJAm2Hw5VLpN6YtR6SdTBrzesbXGtuUVFpWEoVRQIBaD5xkEAsOMrFwRtZwdsfzVAfh091Gk5eaV5MnwPHxcnbH+teFwKv7+lxw8gyI13WeXHj6LwS0CYWdjbRLbCSlZ+YXyG6YsrQh8KaFSCjHP4+zjKPQLppBPVcHY+kAjkXLr1i2F161bSy+MkZHSXg41a9ZEzZo1ce/ePZ2MIKovu27ewy8nLgFAqTwO6f/Xo15g7o4jWD1pmLnM05r+TQLwP8ExFIl51WEaDiiS8NgX9gATO7ZUukqglwf6BTXEsQcR0hmlq16KsbexxpjWmpceN/Bwx5lHT1UKFQ6AXw1XkwmUuy/i8eORC7j6NKZYnAIebvZo7V8bA5s1RvdG9eS5JjGp6QiLUV/lk1tYhFMPIjGoRaAJrCdkeDo5VDhid2mSsir3QJAGpQrkpBhbH2gkUk6fPq3TzglCHTzPsPb0FdXLGZPmUSSmoIFnDZXrWRI2VkKlHhRlRCSmqF2+fPQgfLrrMA7dfVxumZPIBr9NHokaWgx7P7pNM2y4cF3tOq+1D9F4f/oQ+uwl3vhtZ7kxX5LScnAiMwKT27dSSIaNTk7TaL9RGq5HGA4HkQ16NK6HUw+jNFrfw8nByBZVHqpCToqx9YFeKfEvXrzAixeaDaNOEGWJTknDy7RMtesIOA5nHj41kUWGQdNKEztr9WEJK6EAy0cPwunZb2JUq6ZoUccLXRr6YcHg3jjz8TS0qKNdKWcdNxd82r87AJTLAeE4Du3r+WJMm+Za7VMXGGP4ctdxiHm+XMIszxjEEh7zdx9XyEmKS9dgJGQGpFXiEaErM18O6a3RevY21ugeUM/I1hCWgKH0gdYihed5LFq0CC4uLvDz84Ofnx9cXV3x9ddfg+c1G62UIAAgv3QCXankOkggH+FXwHEoqGTNnzydHDVy42paOuzt6ozFw/pi+7Tx+G3SSIxr2xwOZcpyNeX1Tq2wcvwQBPt4yufVdLTHrF4dsX7ScMWeF0bidkw8opLSVFb08IzhUXwyHsQmyuflFhaVb/ZWaiRoGbWrwCCDlZFaTo7o4F+nwvU+G9AdIuvKOyCmwdEnadYCk2eNoQ+0Plu++OILbNy4EUuWLEHnzp0BABcuXMCCBQuQn5+Pb775RidDiOoFYwy3o4tzDMrebEq1SReDR4CXZtUrlkJtN2c8S06XPgIoC/0Uv7+4CrxIxqJPUEP0CWqIjLx8FIolcHewM2mfEU1KsAHgeWoGmtSuBQDSihDZeSGBwsCLgLSfDATle6oQpmH71du49uSF4jDdpeAAfDWkF0ZX0B+oulEVwj2lMYY+0FqkbN68Gb/99huGDh0qn9e8eXPUrl0b06dP
J5FiZJIys/E0MRUiKysE+9aCtdD4T77GYO3Jq1h9/LL06lWqXFehrQKTDvveqq5mHgdLoW09X1x+8rykx4mS6hyOB1Jz88xgXQkuKiqLjI2znWbVWs6lqro8nBxKvCdQkvLDAEig1SCOhO7kFRbhwqNoZOTmw9PFESuPSZPfBbx0ZG5WWqgwwE4oxNAQ6mNT1TGGPtBapKSmpiIwsHz2fGBgIFJTlTeKIvQnISMb3+45hdP3nsrd5O6Odni7d3u81jkEnJr+F5bG85R0rDl+WfqCV+yGXhpZM7ODYQ8wqUsrk9l3PfIF/jx3E9ciX4CBoV0DX0zq2hLtG9bVaPthLQPxy7GLJT1OSn81pW6yXi6Ohja9UtCxQV04imyQXVCoch0XO1u0qVcSPujQoG6FAy0CwN0XCejbjMpbjQVjDFvO38KqY5eQWyAd74kBQKlnJQ7ly4/zeQkuP3mG3sENTWVq5aAKVPeUxhj6QGsfb4sWLbBq1apy81etWoUWLVroZAShnuSsHExY+TdO34sEzzP5iZ2alYcle87g5yMXzW2iVuy6flde5lrae6KsUSUHYN/NByaz7c9zoZi6bgfOPYxCTkEhcguKcP5hFN5cvxO/n7mh0T6c7e3gaGNTMhwNKzUVz7MWCvFK88ZGfCeWi8jaCjN6d1S7zsw+HRXyY3zcnOHt7FThvveF3tfbPkvhUWwS9t98gBN3niA7v8Dc5gAANp29iWX7z8oFijZkWch7sCiqWE6KMfSB1p6UZcuWYdCgQThx4oTCAEIxMTE4dOiQTkYQ6tl4+jqSMrPB8yXeBQDyO/nGk9cxsm1T1K3paiYLteNZcrpC0qS6il0G0+Vu3Hz6Asv2nwUg7b4qQ/b38oPn0aZ+bTSvq76q5vt9Z5GbVwj5uH9K3twH/TubLdxiCUzu3BL5YjFWn7wMiYSHUCCAhOdhJRRgVt9OGN+h/AWttptzhVU+qTnmDaEZgsj4FPxv+1HcjUmQzxNZCTGxWyvMfKWT2cYpys4vwOqjl3Te3se1YpFJVG6MoQ+0Findu3fH48ePsXr1ajx8KG3tPXLkSEyfPh0+Pj46GUGoRsLz2Hn1DnhZYiknbXolQxZOWHvsCr6bYLoxV5Izc3As/AkycvNRp4YL+jRvqHGnTydbEQScABJWqkeGCjgoDoRnLDJy8/H+pr1qFZNQwGHbxXC1IiUjNx97btyTfi8SlE8iLM6r6N64epdhchyHd3q0w9h2zXDkzmMkZeXA08kR/ZsFwNVeuXir7e6CW89iFQRkWWpW8h4cL1IyMHn1duSUCYUViCX4/dR1pGfnYcGYvmax7eTdSKWVdsoenBQoXqaL96WqU9USZ42hD3SqBfPx8aEEWRMhDTmIpb99JRcArjiZ4/LjZyaxR8LzWL7/PP46dwsMDEJOADHP45udNvjfqF4Y3Dqown0MDGmMndfvSl9U0PiMASZJDv5u92lpe281tkh4hrDol2r3czcmQdq6XXbxkQDgiqtPSonMm1EvUc/T3YDvoHLiam+Hce2lXpMiiQSZuQXILxTD1qb8pWl46ybYe1N1OEfAcRjVtqnRbDUFG05eQ05BoVIhxgDsvHYXk7q1QgMv0zc3TM3OhYDjlJeO81BezcZKlr9INU81m0VTxXJSAMPrAypYt3DsbKzlTyoqc2M5ICU7F4ViMWysSr7S+y8ScDj0ETJz81G7hguGtQ1GLVf9kjWX7z+PP8+Gyl+Li70hOQWFmLf1CBxENujZtIHafbRv4ItW/j4Ii45F8YDBKt8bx4CaRh41NSUrF0fCHlUomICKG7XJRKNCFUqxYJFVLEGASpXobGxSsnKx4cRV7L56D3mFRRBwHHo3a4i3+7ZH49oe8vWa1qkFZ5EImfkFSsu6HUTWmNApxKS2G5IiiQQHQh+o9RQJBRz2hz7AhwM1G/nakNRydVLZ24aDtKpHqVApDlPXJ1FeniooUgwNiRQLR+ZF0OSWViTmYWMF5BeK8dlf
h3DqbqT8psoYw5ojlzFzYCe82budTrYkZ+Vg6/lbKpdzAH45dBE9guurvQlzHIfVrw/DnK0Hcenhc/mNXGk/EQ5oYOSL26PYJOmNoZS3QxkcB/RoUl/tvpr61oKASTVK2d3IK5h4IKSCvJbqQlJmNiau+AeJmdnymzPPGE7djcDZ+0+x7p2RaNNAWuWz4/IdZOcUC5QyJa5gQHZ2IRLSs+GiIlxk6eQWFKFQg8aFSZnmGfumZ5P6cBDZlAtFyZANLKik4h42VkK0b+hrZAuJqoh5MrCqKIwxnAx/ghnrd2Pi8r+xcPtxRMarH59FEyoUKAywFgpgL5LmhCz89zjO3JO2kpfwPCTF7cd5xvDzwYvYe023gZ5O3YmQVhepNgMR8SmISkyrcF/Odrb49a1X0b6hr+L7K/tkwQMDWhp3wDihbEC9ss3kytgl4DiM7ag+Q132oCkXXkxajskV5xTJKnyiEqlcHwC+33tWQaDIkPDS9vif/XUYkuKcpO0Xw6XfAw9wYsVJIAGsBBx2XrljjrdhEOxF1hBp0PHX09k8eTd2NtaYO7S79EXxdyAoArgiKPxmuFKTjLd7tzNbwq8lo1D5p+NU1aGzxkDcfhaHrvPW4KNNB3DhQTTuPI/Hrst3MXLJn5ixbjfEaoaYrwg3h+IGVWpOSHtrG3AchxcpGTgU+lClWxYA1h27olZsqCIzt0CjEXKz8jQvNZw1oDME4KQ/ttIfEQ8IOaCehxu6V+C90Jdmdb3kYTUFoVKmzG/hqL4VVlC9SMmQChXZRUSW8Cy7uTJAwAGRCSRS0rLzcDz8icrwBs8YEjOycemhNN/qZamcBoXS7uJ5Ep7hWSUeYNBaKMTQNk1KRLMSJDzD0Dbma4o2om0w2vnVhlACCFiJGBGIofL6NL5Tc7zTu70JraxEaFJiXNFUxdFJpGRkZODRo0d49OgRMjIyDG1TpePRyyRMWfGPNPFSCRceRuOTPw7qvH+JmCnePFHq7+LXecU9CM7ci6zQ9fIyNRORCdp7eOrUdFEbLwekIREfd81LDVv4eWPZxIGwsRJCwAFWHAcrjgMHwL+mOza88yqshMbV0vYiG4zv3AIcp+gBkcEB6BlUH8PbBle4LzsbawVBUu6pkgEQQ6Mn5qrO8+S0Cs8noYDDk/hkAICDrfqW9wKOg3MlL+ue1rsdnGxFKoXKuE4tzJpw/cepG7gZoZg8LhcqRYC1hENArRqo7+GGYa2CsHfOZHwxojflYFUjDK0PtLr6//bbb2jSpAnc3d3RpEkThb83btyotzGWjoTnkV8oVhidFQAW/XMcFVXJnroTgdQs3WLJEolE0dNQVqjwQJFEOjO/UFxuhFtl5BeKK1ynLL2aNlDb0lwo4NA1sB48nLVLzu3fIgCnvnobc4Z0x6CWgRjRrilWvTEMu+ZMgpeJeiu8P6AT+jSTdsO0EkhFkhUn9fC08vfBd69pVt5dz9MNNgJB+QecUt4VACjU4fOvathWMAo0IPWmyEaLHtQqUK2XgWcMA1pW7gZ53m7O+GvmOLTwUyzXtLexxvR+HTBveE8zWSa9/v12/JryhcWCnJcwPH2RgpjYdBy48gA/7TmPzNx8k9pZmahK4R5j6QONE2e///57LFiwALNmzUL//v1Rq5Z04K+EhAQcO3YMH3zwAdLS0jBnzhydjbFUHsQk4JvtJ3H3mbS5klDAoXMTfyyY0BcSnsfd5wkaZbb+ceomPhrWTevjS5Nni9Qmpsl0SUPvGhU+nVoJBDo1frOxssL8MX0w50+pV6i0VhMKODiIbPDJsO5a71fC87h0Pxqnbj7B0/gU2NlYw8nKBo28asLH3Vnr/emCtVCIHycPxvXIF/j9xDU8fJ4EK6EAnQL98b8xvWCj4cit6Tn5EIt5Rc9J6f+LWXPoMvq3bAw/TzdDvYVKRyPvmvBydUJ8BQ3aejSVhvsm92iFfdfvI79IXC6cKRRwaOzjgW5NKn//GT8PN2yeMQZPE1MRGZ8C
kbUV2tSvI885MxdRCanIVuYtLnNuSyv2pDMvPYzG+7/uwR+zxmoUKq526BuysRCRYkx9oLFIWbVqFTZt2oQxY8YozA8KCkKPHj3QokULfPLJJ1VOpJwOj8Ds3/YrzJPwDOfuRqHf/37DoHaaJ3XGp6m/GKuirKtU2U/dwUbqCu8SWA8ezg5IycpVmpciFHDoHxIAFwfd3OL9WgRg7TQb/HLoIu6/SCy2D+gWVA8fD+0GPw/tbrpiCY/P/jiIE2ER8h4MmbkF2Ho6FP9duI3177+KZv6mqYTJyM3Hl5uPICE9Wz5v35V7OHTtAeZP6Ish7SvOBSgoKuUhURI2YsXzGQPWH72KbyeZrgGfpSEQcHi3Xwcs+Pe48uUch0GtAuHtJhWqvjVcsXH6KMz+4wDi07MgFAjAihPCW9evgx8mD6pSyZn1Pd0tqmy3qKK8urKCnAMkEobwqDhcfvQMnYP8jWgdYU6MqQ80FimJiYlo1kz1MNvNmjVDcnKy1gZYMvmFRfjot/0qy1LFEh57r93XqL8GANT1cNXNENl4PWqO4VQcr7cSCrBk4gC8++tucDyv4FURCjjUcnHCx0O19+aUpnOgPzoH+iMmJR2ZufnwcnVGDSfdeplsO3MLJ8MiAEBBVEl4hvxCMWat34uji97S2JOhK1m5+ej7xa8oEpe/EIslPL786yjcnezQuYIn9RrO9uC44p4RZQSK7H/Z7EM3HuCbif2rdbx+ZIemSM3OxarD0nbrXHGjGQnP0KtZA3w1po/C+k3reuHI/97AhQfRuBeTAGsrIboG+SOwtqcZrK9e+Hm4Sc9tZU/vxWHn0ue3rDeQwAo4EvqIRIoyqognxZj6QOMrf9u2bbFkyRJs3LgRVlaKm0kkEixduhRt27bVyQhL5bdj18qPYlua0j/YUk8PqpjQvaVOdtRydUJ6Tn7J+Vi6P0TxVaFBrZIOlG0b+mLrB+Ow4fg1nLwTAZ4x2IusMaJdU0zr2w7ujoZpjuZbwxXQo/ElzzNsPROq8nfGM4a07DycDI/AgDb6lSHn5hdi35X72HvlHlKzcuHl5oRXOzfDK20aw0ooxIRl29Q/KTLg2x2ncXC+epFiLRTC1d4WaVklcXhV/VIYD+QWFMLBVnWeT3XgrT7tMKRNE+y9fg8vUzPhYi/CwFaBKoWHUCBA9+D66B5s3KovQhF7kTU6B/rjwoPokpmym2ypn07Z850XA2nZucY3sBJStlRbl+0tAWPqA63CPf3794eXlxe6deumEHM6d+4cbGxscOzYMZ2MsFQOXHtQ8Vkgu8PykH6aykQNA/w8XHUWByM6NMXSnadLBFPpu3rxxWFER8V24IG1PfHj64NRUCRGTkEhnO1sjV4loy0pWTkKoRVlWAkFCI+K00ukpGTm4M2fduB5cf8WBiA5Iwe3o+Kw8+IdTO7dGjHJFWehv0zKUNmyvTSu9nZIy8qvcEwiAIiITUGL+tV3zKv41Cz8feYWDl17gOz8QtSp6YrR3ZpbVJjDnBRJJIhJTAfHcfD1cDX7b3jJpIF4ZdFvyM4rlHpOmPrnM3l5uBIPJVF1MKY+0FikNG/eHI8fP8Zff/2FK1eu4OlTabMwLy8vLF68GBMmTICzs2mSHE1Fdl6ZrPSyzSCFUKjaYOJS80qLCR7oFqT7U9+w9sH492I4niWWL9kUcBya+3ujZ3FlSllE1lYQGTlUoguMMfx16pYGK0LvhLv/bT6CF8npitqu2AV2NzoeP+w6q/G+7j9PQKuGtVUuZ4whOr5Ur44KTE/KUC/SqjKPYhIxbcV/yC01Vk1kbDK+++cUjlx/iNUzR2o8aGVVo0giwe9Hr+Pv07eQkSO9DtVwtsfEXq0wqU9rk+XepGbm4sDV+3iWmAZHWxH6tQ7AssmDMHPV7hIvctlBNMvAAOTm0+CCSqki4R5j6gOt7l5OTk5477338N577+l0sMqGrbU1svOK5E8MgOJvkYmlM+Tjs0gAVjzyLZPNl0j/33Xh
Nua8qn3lCyB1s/4+cwwWbT+O03ci5RcHoYDDwDaB+HxUL7M/YWnLmv2XsPn4jZIieBUXOTHPo0PjujofJzohFVcePle5nGcM8SlZGvtdb0a8UCtSCsWSkguHBl+JUyVt4a4vPM/w8a/7kVNQqNBYUPZX+NM4rDtwGbNH6pc/VRmR8Dzm/Lof529HldyDGJCSnoufd13AoxeJ+HbqQKPnMu04G47v/z0DnjEIOA4MDH+euClfXtapqwoOGiTdVlOq0ijIxtIHBnvELioqQlxcHOrW1f2GYmm0buSLo9ceKSQ9loOVCBL5fY5XWAwOQF6+tL+KrhcWN0c7/PTmUMSlZuLOs3gIOA4t6/ughplaZOtDalYuNh27UTLEu4qbuVDAoXYNF3Ru4q/zsW5Fxla4jqziRpPQXlEFY6vYWJUaa0ndPosvLk4VNCirqlx+8AwvU1SPisszhp0X7uC9wZ0qDK9VNU7eeoJzt6NKHn7KLD9y7TGa1PXCpD6tjWbD6bAIfPfPKflreVJ7mXO6dMWaqnNdKOCqdam9WqqIJ6Ui9NEHBnv8vn//PurVq/w9CkoztXcbAKofsjkUezrVnCilnzYS0vR37Xu7O6NfywD0CWlUKQUKAJwIfQK+ePRk8CgJo5VOQmbSfi6/vDNML9e2xpJQk4sFB/hWUKFVbvgDVfssPjFszdz7wlzcjY6r8HvNyS9ETFK6aQyyIDYeuaYoUMqcQxyAn/47h32XdBuDqyIYY1h34HL5kcmLhUjZ62FFT/MSnmFUF9WVH0TVRx99UKliBEuWLAHHcfjwww9NcrzAup7ykccroqIkSQ7AhbtRBrGrNI9jknD6VgRCH7+QD8Rm6aRl50LACeS5PFzxYGXyKoHieYV5Epy/rd9n1qphnQrXUWi8purLlgmnCjxhZ8Ijy3cGZkomXupF8a+mCaJSgVLxL0tdh9mqytPYFIUwj9KLCwMW/nkM6Tl5Bj9+Qlo2nrxMViw1Zkr/VJyh4usc3iEYrRqoDpFWe5RdHzSdqgEa+1FbtWqldnlenuF/LKW5fv061q9fj+bNmxv1OMpQe5mUeUG5ip/aCwzYCv3O0zh8t/UkHsUkyed5uDjg/ZFdMLij+QYg0wQvd2e5oCotEDglkZS/TtzEhN4tdfam1PV0Rdem9XDpfrTSTrwCjkP35vVwKuypNPlZ5vpSSD6S/sfx0kHx1HH/eQJshAIU8mWaRgAKrzkA0wa0r7ZdODs28cPq/ZfUruPp6gi/WtUvTCDhmdQ7oS5cyEkXr9l7EZ9P6KNiJd3IK9QuyVXeCbtMXLymswMm926NiT1bVeteQOqoKjkpxtQHGouU+/fvY9y4cSpdNnFxcXj8+LHOhqgjOzsbr732GjZs2IDFixcb5Rj6oslPsHFdD4Mc6150PKb9sAOSMqGFpIwczN90FAWFYrza3fRiTlP6tGyEJf+cQn6BuMJUkKSMHLxMzkRdT1edj7doUn+8/fN/eBKbLG9GJetu27pRHQxq2wSnbz2VurfLXmxLCRQOkHqA1GAtFEpzj2SlmVyp3fBM/l77tgrApN7GyymwdIL9vBDSwAd3ouJUDuMwxYRVLJYEB2n/Rk3u69cexhj8+F5uThBZWyl2T64ADsWFA7x0lG9vdyf0atEIvZo3qLZCvDphTH2gsUhp2rQp2rdvrzJzNywsDBs2bNDJiIqYMWMGBg0ahD59+phcpATUqYnHL9R0ymOlbkIqvCmy3NA2Ab4GsWnFjnOQSHhpMltpn2zxVe2n/85hYIcg2FlovoODrQ0+erUbvt12quKVARUtLjXH1dEOWz4djyM3HmHflXtIzsyBj7szRnZuhp4tGmLWyt0KHg55Qi9K5smWN/RRH57pHOyP3w5flW/DGMB4BqGk1L4AeNjbQyzhYV2NR0P+YdpgvPPzTkTGpchFo1DAQcIzjO7aHB0C6yLiRRLqeLrCtpqWIquFodyDiiGwE1ljaMcm2HXhTnkBqeypgkGeVyYo
fh2XkoWtJ0Lx96lb+OatAejXtnIP/Gg0qkjirDH1gcYipXPnznj06JHK5U5OTujWzfDlgv/88w9CQ0Nx/fp1jdYvKChAQUGB/HVmpuoKAk1YNKU/xn2zVXq3UfZoI7u5cSUvFSIFxcv7tGqklx0SnsfJG0/w17EbuPc8sfjuV3xs+bGkiimvoAinwyIwsH2QXsc0JqO7tcDLpAz8efym2vVqONvDp6aL3scTWVthWMdgDOsYrDD/4fNEXLn3HJwA0lyS4lQJha9a9qXyQEqmerdl8/reaOrvhQfPE6RuezFTWr6+/VQYLt2Nxr+LplRboVLD2QF/z3sNp8IicOTGI2Tk5KOupytcbUU4du0Rdp0MBwDY21pjZLfmeHdYp2qRaGxnbY2cgiIwWeKsmpBPkF8to9gwfWhnXHsYg5jkdIUScQ5S0Q0BV3KtKyPAS8PzDF9sOIxGdWqinrce7amrKFUl3GNMfaCxL/Xnn3/GihUrVC5v0KABTp8+rZMRqoiJicEHH3yArVu3wtZWs34S3333HVxcXOSTr69+3osL4VGAWMWZIP2tyv8u3U9FBgfpvLaNdbdDLOExd81+fL7+IB5EKxcoCjYBSDRAJZGxmTWiK3w9XCBQ4dfmALzWu5VRe8DsOncHxcPFlCsfV6D4u63Icc1xHJa/OxT1vNwBnkGg6nsCEJOYjmWaepMMDGMM8alZiE3OKF+RZEKsrYTo36Yxfnp3KH7/eAwcrazx19GbCudvbn4Rth0PxfTlO7UKQVRW+rVpDI6rQKAU896QjkaxwcXBFps/HYfXerWCY+kyecYgkAAQM3DF16GKWgwxxrD9VLhR7CQsA2PqA4sO+N68eROJiYlo1aoVrKysYGVlhbNnz+KXX36BlZUVJJLymZbz5s1DRkaGfIqJ0T1mK+F5bD95CwJWXH0iZtJgMc+AIgaBmIHJ8gyUVXLwJX8XVtBfQx1bjtzA2bBIAMX9ClTc+GSveZ7Bwc7y+28IBBxWzBgOV0dbBaEii2H3aR2AiUbsBQEAz+JTi78r6YcqFyoSxf9l4rOJf8VPrjVdHLD189cQVNujZJ8qOHT5gdQDZiIYY9h15jZGfvY7hszZgGGfbsSgj3/FpoPXINbjHDUED6IT8M/JW0qX8YzhTmQs9py/Y2KrTM/EPq1gJcvFURMOGNKxiVG9E84Otpj9ajec/OFdHF/2Nt54pS24Yj0ruyZy4jIhZyUwABfvRBnNzkqNPpU9eoaKTF0tqytad0m6ffu20vkcx8HW1hZ169aFSGSYAdN69+6NO3cUL0pTp05FYGAg5s6dC6GwvJtcJBIZ7PjJ6TlIyZQOjCXziJRtjS/kOEhQkrfAiv+XIbtB1fPS7WIilvD4+0So/DqgrPikNLL54qLyNxyeZ7hw+yl2n7mNZ3FpcHa0xSsdAjG4SzAc7cwzyF09L3f8N38K9ly8i8PXHiIrrwD1vNwxqltzdDdB0p2Tva08YVb2IcuEClfawcABzRp5o46GI1lbC4VIyah4ULWCIjHyC8Umyx/66Z+z+Pt4qMK81MxcrN11AXciYvH9zKFmS1bdff6OPCdFGQzAf6fDMbZXS9MaZmLq+9TAz+8Px5x1+5FXUFxpU+pHzwGY0r8NZo3oahJ7rIVC1HB2wIxhnbH9+C25TWWL19RRWdojmBpzhXuMVS1rDH2gtUgJCQlRW05mbW2NsWPHYv369RqHaFTh5OSEpk0VB85zcHBAjRo1ys03BkINwgwcOHi6OiCxeKC8ct4NDqjl5oT2Qbp14o1PzURqpuLNrqLbtoDjkJCapTBPLOHxxdqDOHXzScmNIBG4HxWPrUdvYv1nYwyS+6ELro52eL1/W7ze3/SjaPdrG4DToU+kL3hIk59L6buSfB/gYWQC7kTGolkDzQYE1FRgmWpIgzuRseUEigzGgPPhT3Hs6iMM6GieXKZn8akqBYqMl0kZJrLGvHRs4odjy97G4WsPcfFONBLSs+DqYIcuzeth
VLdmsLEyfRfeIrGkRDRpSRs9wt2EYTFmtawx9IHWV8fdu3ejUaNG+PXXXxEWFoawsDD8+uuvaNy4MbZt24aNGzfi1KlT+N///qftri2OGs72qO9TQ20poITn8f6IznBztCuXWyEQcLASCvH1G6/o7BFQla+hDo5DuXDPHwevyW/GpW8EjAFJadn4ZOU+k4YdLIWeLRvCr5ZbybAGpZIAFbpqQir0Pl25X+OwSNvAii/Mbk52Jkuc3XXmttrmaAKOw3+nwkxiizJcHGwrPN8rQxjTUDjY2mBUt+b4acZQbPviNcwY0hGJ8RlY+c95HL5436B9lzTB2kqo8xAF4/tUbe+Xzpgh3FO6WtbQGEMfaH3GffPNN/j555/Rv39/+bxmzZqhTp06+PLLL3Ht2jU4ODjg448/xg8//KDt7ivkzJkzBt+nKjiOw+sD2uKrjUeULhcKOPh5uWNA+yC0buyLX/dfxqGrD1EkloDjgK7N6uPtIR0QWNdTZxu83J3hXcMZcaXHOZGdmCqu5xKeoXfrAPnrIrEE/xwPVRk6lvAMj58n4XZELFo0Mn1nyPyCIhy69AD7z91FUno2PN2cMKx7U7zSMQgiI4/bYm0lxA8zhmLMl5tVJyOXIjkjB+fCnqJXm4qrtWaO6irNOVGzzusD2mllrz48fZmi1lPBM4aouFST2VOWfu0CcSo0QuVyoYDDgA6WW7FmLGKTMvDud/9KB8IsxdI/T+K7GYPRsbny3hSGhuM4tGxUB5fvRiuEniq6T747rKPRqpAqPQYqQS5bxaoq7UHballtMYY+0PoOcOfOHfj5+ZWb7+fnJ88fCQkJQVxcnLa7tkiSU7OleShClMSFi/+3Fgqx/P2hEAg4eLk74asp/fDp+F5Iy8qFk73IIHkeAgGHif1b4/ttSjKjlSSnCDgOPVs1RH2fkhyYZ3GpyMjOV3scoYDDzYcvTC5S0rPz8N53OxD5oqTRWlJaNu5GxmHnqXCsmTsajvbGzZep71MDLRr64HZEbIUXDCuhAPej4jUSKTWcHfDJhJ5Ypuy7A9CvXWO81s90Dd0c7EXyz1gV9mYc8LBHSAM0rFMTUbHlxZRAwMFOZI3xfdR3tqxqpGflYuy8zUqrmnLzi/DRT3uwaf4EBBYndKdn5WH/ubu4fDsaEgmP5o18MKJXc/h4VBzKfRaXim1HbuLk1cfILxTD38cdo/uEYFDXYFgJBeB5hohnxR2uy157eFmZT8nMGs72mDuhF3q31q/9QlXGUDkpZatY58+fjwULFijMk1XLHj9+XO9UDFUYQx9oHe4JDAzEkiVLUFhYKJ9XVFSEJUuWIDAwEADw8uVL1KpV+ZXzs/g0rP73AgQ8wBVBmrMgG1tGDBTkinHtnmL1kK2NFbxrOBs0EXV0zxCM6CYdoEuZu14oFMjn92jZEAvfeEVhuaa/AXOEe77ZeAzRsSnFx4fC/4+fJ2HpnydNYsdbQzpo1DOOMQYrq4p/NkcvPcBrX/yJH/84Da4QsBEIYGMlhL3IGoF+nlj14Uh8+/YgA1iuOX3aBKh9jwIBh/7tA01nUBmsrIRY89EoNC/O+REIOHkSby03J6z7ZAy8azibzT5zsGjDUbVl1xKeYePeKwCA8McvMeLjjVj97wXcfBCDsMcv8dfhG3j1k99x6OJ9tce5cf85Jv5vC/aduYOsnAIUFUnw5HkSvt14HJ/8tAdisQShj2KQkp6jOHwFk4ZIhUWAoAAQFDAIixiEBQx1nJzQ0gye2epITEyMQlXrvHnzyq2jS7WsthhDH2jtSVm9ejWGDh2KOnXqyDOD79y5A4lEggMHDgAAnj59iunTp2u7a4tj75nbEBQnmXIoU+0B6UPDjhO3MLKncVvQCwQcPp/cB690CMSuM7fxNDYFjvYi9GjZEFZCAeJTM2Fva4PerQMUPCgy/L3d4ewgQmZOgeICmYpn0otdSIBpLyhxyZk4FxqpUkTxPMPxq4/wwbjuqOlq3BGfOzb1x1dT
++Hr34+pXU/CM3Rqpt69vnHPFfy665L8oVIAQJLPg+M4NA30wc+fjDRLA7cBHYPwx8FrSEzLKu+p4KSeitG9Q0xuV2ncne2xYe5YPIhOwOV70RBLeDTxr4WOTf0hFAjAGMOVO9HYeSIcj54lwtbGGr3bBWBk7+bwdHcyq+2GJievQNqnqQLO3YpESno2PvxhN/ILihQeNmSN2L7+9SjqebsjqL5Xue3zC4vw2S/7UVgkUQw/FP9/6XY0th25iRpujgAUG7gBJXlcsm6zsmX3n8Zj5pId+PnTV+FoJ6oWjfi0xkDhHmdnZzg7qxfwulTLaosx9IHWIqVTp06IiorC1q1b5b34R48ejQkTJsDJSXqRmDRpkra7tUgiX6iP4TMGRMeaJobPcRxaN/ZFax2y5K2thBjTuyU27r8ifZIufvop2wn15y1nsfj9Qajr7WYwu9UR/uRlhb9Pnme4GxmHHq0bGt2eoV2bIie/EMu3nVG6XCjgEORfC80aeKvcR3RsKn7dJR04r6zXgjGG0Acx2HPmDkb3CTGQ1ZpjJ7LG2k9HY/bPexAVmwKhUCBPCK7hYo8fZw2Hl4Xc6IP8ayGoTE8axhiW/XESu07dVihV3nzgGrYfC8Uvc19Fs4aaVV5pioTn8fRFCgoKxajr7QZnB+O4yZVx9IrqDp6lYQzYc+Yu8goKVXrKOA7452goFr43sNyyE1cfIyunQGnrBAYAPPDHvmtYNH1gueUqu80Wt0+JeJ6Cwe//Co4DurRsgNeHtUOwmt9PdYNjxU3x9NheU0xRLWsMfaBTVqKTkxPeffddXTatVNjZWsvHFFGFsRM7DcWbQ9rj0fNEnL/1VNoxkpW/sETGJOGdRf9gy7eTULP4qcmYcBoNy6jZQGuGYlyflkhJz8HmQ9flN0LZOeDn5Y7vZw1TW2K35/Ttint9nAgzi0gBgNoeLvhn0WRcf/Ac1+4/g4RnaN7QB91a1IeVhbfnP3D+HnadkvZhKP358jxDfoEYH/+4B/t+nmaQcX4YY9h98jb+2HsFianS9gLWVgL06xSEWRO6w8XJTu9jVMTFW5EarWdrY4WbD56rDeVJeIZLt6OVLjt57bFSgVL675zcQtTxcIGLo21JfpuqRPNShsh+44wBl8Ke4lJ4FL7/aBg6tTBNsi9hegytD3S6w0ZGRmLFihV48OABACA4OBizZs1CgwYNDGaYJdCjdUOcuv5E5XKhgEPvtgEql1sSVlZCfD9zKNbvvIQ/915Tuo6EZ8jIycf2o7cwY5zxG0WFNK5doQgUCgUGfzpWB8dxeH90V/TvEIg9Z+8gKi4VTvYi9G3XGD1aNqjwRh4dV3Gvj5j4NEOarDUCAYf2wX5oH1w+wc2S2XbopsrEX54xZGTn48SVxxjcLbj8Clry63+XsGnPFYV5RWIehy/cx53Hsdi4aAKcjOxVyc4trLCSDwAGdm6CqJcpFe5PVUO1u49j5X/LD1PmHOYALN9yGjPHdMPiUiFRBbPkSWWylv6KRkt4Bo5jmL/mEA6sfKfSPOAZFQOFe3TFGNWyhtYHWifOHj16FE2aNMG1a9fQvHlzNG/eHFeuXEFwcDCOHz+ukxGWSq82jVDH01VpsirHAQKBABNeMV11RmnEYglOXH6Ij5bsxJTPtmDuj3twITRSYTCwsggFAsQnZartlcHzDAfP3TOGyeWo5e6Enm0aqewhI+A4DOgYBHdne5PYA0ifoBljaOTrgU8m9sKaT0Zh6Ywh6NM2QCNPg72tTYW9PmhEX+3JySvE05cpar0FQgGHsEcv9D7W0xfJ5QSKDJ5neJGYjq0Hb+h9nIqoVcNJOjaYqtOpWAzMHNsNzRv6qO3FJBRwSsV+bGKGYq4aY4AsB6/UBADXbz+Di50In0/tK624Kz0eiGy4DhUCpfTus3IKcOaG6oe/6oSsukefyZIwhj7QWsp+9tlnmD17NpYsWVJu/ty5c9G3b1+dDLFEbKytsOazUZj1wy5Ex6YqxPDt
bW2w5P0haFCnpsntysrJx4ff7cT9yHgIBBx4niEyJgnnbkSiYwt/LPl4GGyslX+1aZm5FT7pZ+SoL1c2JJ9P7YOXSel4GJ0of0qWeVeCG3hhzqSeJrHjclgUth24jtAHLwAGNA3wwfhBrdGjrXblk73aNpK6z8tSfHcVchy6myC/pqqhWciPUxuK05QFaw6pXc7zDLtPhuOd0Z0NcjxVDOnWFEcvP1ScWSYW8/rgtnCws8GIXs3x12HVwknCM4ztV76hWnZeQcluSw0YWBZZ54UvVx7Azp+nYUDHIJwJfYKvVh0q6Zki8woI1X8mVkIBnr6o2PNDVD6MoQ+09qQ8ePAAb775Zrn5b7zxBu7fV1/mVhnxquGMf76Zgp8/HoFXezXHkG5N8fnUvjj08zto39Q87vLvfj2Gh1EJAEqy92XC48rtZ1jz93mV23rXdFbrSQEATxPko8iwFVmjuZ8XrCUcWPEAjryYRyMvdyyc9opJ+nZs3nMVHy3dhdAHL8DzDDxjuPskFvOW71P7WSqjR+uG8Pd2V/yMJQyCQkBYCKCA4cSZ+/h69WEkV4KRqi0Fe1sbNKrrAXWnroTn0TpIv/brKek5ePIsqUI3ekZ2vtE7vrZp4oturRoovudiISAA4O/lhkmDpc0AfTxc8NW0/hBwnMK5J/t7ypB2SvNAXiakg2Os3Jhj8iqdUhPHAxIJw96TtyGysYKdtXXJylqELRhjVOkjgxlgsiCMoQ+0FikeHh4ICwsrNz8sLAyenrp3VrVkBAIOHZvXw5yJvTDv9T4Y3qOZyQaEK0tcUgZOX3uiMqzDGMPuE7eRnVugdPmQ7k3VelIEHIcRvSouqS4oLMLV8GjsOXEbS389jtfnbsHrn27Byj/P4EV8ukbvhTGG+T8fxO5j4WBFPKyKAKtCwKoIeP48Fe8v/BdpmRUP0qcPDyLjsW77BQBQ+Exlf2/Zdw037z3XeH9WVkKs+mwUGtaVjoAs5AFhmXuZhGc4cu4+3vx8K1LSc/R8B9WHiQPblE2VkHuoBAIO7i726NVOv8Zhl8OiNOqXYyUUqPRWGgqO4/Dt+4Mxrl8riKyF8tCLkOPQs3UjrP9irEI/pgGdm+D3BRPQt0MgnB1s4WBng7bBfvjp4xGYPrqL0mNs/O8yZKOAy0M7DNJ+UDI7ZP8zgBczhD6Q/h5SM3LL3yyL96XuQ5TwDN1bV638RV2pauEeY+gDrX9l06ZNw9tvv42nT5+iU6dOAICLFy9i6dKl+Oijj3QygtCc0PsxFa5TWCTGvYg4tG/uX25Z04beGNAlCEculG/XLhRw8PVyw6t9WqjcN88zbN51BVv330BuXmG55RHPk/DvoVDMnzUQfTqpbwx2+9FLnLlWHJsuY4xEwpCcnoN/D4XinXHKL7CGYOfxMLXVOEIBhx1Hb6F1sOYDRHq4OWLzwtdw7mYE5n2/D0B5FzoDkJSajfX/XMDn7/Yvtw+iPP07BeLx80RsPXADQh5gkuIwBMcgshPi+w9Uhzk1Jb+wSBp2VLcSA1oF+Rp9hG5A2j7gw9d64K0RHXH7SSwkPI/Gfp4qe8IE1auFhe8O0GjfYrEEkc+TpSEeWbpVsUApnW5Skmci/T/qubTtQv06NaWfvywXpbjRJWMArFXnmXUKqYf6ZgiTWyRmTpw1NMbQB1r/or/88ks4OTnhxx9/lHe18/HxwYIFCzBr1iydjCA0p6J8EhmqPC0cx+F/b/dHbU9X/H34JnKKhYZQwKFPh8b4aFJPOKhpQ79800nsOhpe4XEX/HIIDet6wL9O+eZyMg6duSfNqRGXryRgAHgJw/5Td4wqUu5FxJX/TOUXXQYeDJevP8Wx8w/Qq1NjjUcs5jgOF65XXEJ65Nx9fDS1F7m/NYDjOHRpXh//7Q9VqFThGFCUJ8bKP8/gly9HQaRHYnIDXw9FJ4AydckBrw9rr/MxtIHnGW7ceYYn0YmwsbJCx9b1Dda0TsKzkoaORQCsUSJIlNw8ueKo
TkZGHjKz8xHc0AuO9iKp17bU+gIGMDEDK5tnznFo7O+JhdPL92ohqgbG0AdaixSO4zB79mzMnj0bWVnSAa9kTVqqMzm5BTh+4SGiX6TAztYaPToEoHF9ww8N0Dyg4nJcoVCAQDXHFgoEeGtkR0wc3AYPniagSCxBQ18PuLuor6KJfpGiVqCUhgPw35FbmPOW6pE2k9Ny5AJF6XMXA9IyjBvuKffkzQBOwkp6QHAcxGIJFv5yCAdP38X380Zo/LR+427FYSKxhEdqRi58PF20Nb3aUVgkxuc/7oNEwpQ0ygPuPY7DH7uu6iVqQwJro66XG14kpJWElmT/F8dDQhrXRqsm+uW+aMKDiHh8uXw/4hIzIBBwYAxYsekUenYIwOczXoG9niNCi2ysFLwkMo9IacGh6nd56Ow9jBvUGrMn9cDX649KQw+lVuFkQ4gIACZLrOUZJg1qU61Gsq4IQ43dYykYQx9onZNSGicnJxIoAE5cfIghb63FD78ex+6jYdi69zre+HQLPlr8H3JU5Iboin/tGmjVpI7K5FeBgEPfToFw06Bs19bGGi0D66BdU78KBQoAHDxzr8KkWxkSnuFqeLTadayLq6VUVRNI1zFug7FurRuUlAyXEigcV7pSRPr/jTvP8es/FzXed35BkUbr0UVbM85cfYLM7HyVY0zxjGHX0TAUiXUfg4TjOCx6fxBsbaxhJVCM+wgA1HSxx8IZxvcExMSlYeaC7UhIlo5uy/NM/r7PXnuCecv2GGSsLUHpYYwlxepEjUCRzQstFuCDujXFx1N6Kd23bCgRgUQ6cQzIzDbs9bDSU8USZ0tjKH2g0SNhy5YtNS61Cw0N1cugysbNu8+xYMUB+W9bLClxQ1+//Qz/+3EffvpytEGPOX/GQLy3YDvikjPlFyrZt9PAtyY+mmKcst2k1CytfhPqerYAUi9GsfdcKRwAsZhHUZEE1tbGESvDejfH1oM3UFAgBi/mpR0y1Zzq/x26hWljO2vUiKpOLTdkZKkf7dPe1tok3UsrCy/j07HnaBiuhz8DALRuVhcjXglBHW83PHqaAKFQAIlEeVMyQNqDIzElC7VruepsQ2D9Wtj0zUT8secqTlx+CHGRBCIrK3Ro4Y9ZU3qilgkGOdy65xoKC8VKf0PSENBzhD94gRA9PTr2tjbIyS0EGIOAB1jxY2tFV/uE5Cz536P7tcTOg6F4Hpde4fG8ParXAJHVAWPrA41EyvDhw7XecXXhjx2XwanomsrzDNfCn+FBRByCGhpuvApPdyf88d1E7D11BwfO3EVaZi483Z0wvHdzDOoebLT8BjcXh+KTsWKpIhRwaBWs/gIqEFTcGJ8xhrz8QlhbG+dGXtPNESvmvYqPl+5CTmEBit0oKtcvEksQFZOMwAblB2ory6hXWuLeKvUiZUivZtqaXGU5dekRFi4/IM1HKr45P32ejP8OhuKr2YMqFCgybAwgaP183PHWyI5ISchE6J0YiAvFuHA5AqFhzzFhWFtMHtXBaImzjDEcO/+gJFeq7BMzBwiEHI6df6C3SLG1sUZOToG8xwnHA0yg7tFBuqSsSK/h4qCRSGnZpI5e9lZFLC1koy3G1gcaiZT58+cb1YjKSlZOPkLvqa+2EQoFOH35sUFFCgA4Odhi4pC2mDikrU7bFxQU4dCpu9h3/DaSkrPg5uKAgb2bYkjf5nB0UJ44O6B7E2w/eFOj/Ut4hlGvlG8eJR1npQiJyVm4eD1SGghXIwrsbK1hryaR1xA0D6iNrctex4i311f8CAkgK1uzZne9Ogbgn4M38CgqUelyT3dHvD3WeEnBpoIxhruPYnH28mPk54vhV8cd/Xs0gbMWHqLnsalYuPyANKxRar5MrCz66SBeHVz+fFI0RHq+GGLcqaSULLw7bxsys/IU5ufmFeK3fy4iNSMHH01TnW+lD0ViSUkPlmJNpnAjY9LP/JmeDdEYYygqLFLIS5HOLz6Imt9lUJmcN7/aNRD2UP2Aoc4O
IqOXbVc6mKwcSo/tzYyx9QGdMXqQn19xzgEHIE+D9UxJVk4+PvzqXzx+WtLlNSMrH2v/PIs9R8Kw+tvxqOle/kIf4O+Jfl0CcfziQ5W/DVkH3I/f7K2QOJyWnoNtu69h//Hb0jFJZE+harKihAIOg3s21biiRh/kx6jIbckYGtTVrHzS2kqIlV+NwdJfj+HU5ccKF/BOLevhf9NfgZ2t5l4vsYTH1dAoRMckw87WGp3bNkQtM7vPs7Lz8fmSPQi79wLC4s+Q53ms3XIOn07vh1d6aDaOzq7DYQDU++iuhkZL/ygrbEudjPl5hRCLeb3Dg1t2XUVmVp7Karpdh8Pw6oCW8FNTvaYrNtZWsLe1Rm5uUYk4UVJldOdBLOKTMuGl4zmQkpaDrKwCeTWdzEnKMVTYNbZsYv7gXk2x58TtEuMUGtBJQ6gTh+n2QEVUb0ik6IGrsz3s7WyU9guRIeF5+NVxN6FVFfPzb6cQEZ0EQFGIMwYkJGXi6xUH8fOisUq3/d/0V+DiZIfdx8MhFpe43oVCAZzsRWgf4o/RA1qiSSnPUXJqNt759C8kpWSBlfXWyzzLSjwq7m4OmDzSNKWezo62xW51eVlP+ZUYg5WVAO5aPKk72ovw9YdDMHNSFsIevABjQLPGPlpX89y6G4NFyw8gOTW7uNKDYcWGk3ilZ1N8/G5fswzWxhjD50v24vaDlwCgEIopKpLgm58Po6abI9q08KtwX9duRaktr5fwDEkpWRAyaQNUAADPg+OlIQrZt8UAXL0VhS7tdB96QCLhcejUPbX2CAUcDp+5h3cndtP5OKrgeYaCfLFqgVI8j+cZdh2+hemTu+t0HAnPKwoUBSOY9EGi9O+y+G8BA+KTMhVWb9LIG8P6NMO+Y7elCbiC4p0yBo4D6tWtiZH9K/CEVUOqWnWPMSCRogfW1kIM7dMM/x4MVZkkam0lRP9uTUxsmWrS0nNw4twDlfZKeIbQOzGIjkmBv2/5p0QrKyHeHtsZjf08cO9JPBwdRLC1tkL0s2QUFolRx90ZNVwcFLZZ8esJJCVnSa9xKC5JlF38GKQXRIUnL2kW8vdzh8PdVXFfxkIoFICThRpkF2dAfqGVXajr6Fgq7FnDCf26BOm07eOnCfh44Q5IikVh6e/uyOl7KCgowsJPhuq0b3249ygOYWrCnQIBh807LmskUjS51lpZCYGCYu8CkwoUoNxwNpj37W588cEAvNKzqQZ7LU9eflGFlVk8Y0hKMc6wBpdvRoKXVdpU4Nk7eva+ziKlppsj7GytpZ7eUmko8gZtpX+Xst+DhMnLoctSz9sdAknxAh6QfascGCCuOJeoWqJvhU41ECnG96NXcV4f1RF1fdzKJdHJXn/6Tj+jD+muDY8iEzRqCHf3UWy5eTzP8NtfFzB88hos+ekI9h8Iw9//XsWmbRdx+sIjXLwWiT//vYwx037FgWNS129yajbOXX5S0oOhDNIyRQauiIEr5KVTEQPHM7i5mEagANLSUxcnW+nFtax7CQB4XvqEKDH9VWHzv5chkfBKk7MZYzh18REiiz1jpuTc1SfyEI8yeJ4h7N4LjXJ4Wgb7qk0HEnAcOraqBysrofQhXU2ZLAAsW30MGZl5Kpaqp2weijIYD9iKjPOMd+L8Q3BMffKqDE1CzqoQCgVoFlS7ZEap04tD8VM+D+k5X8pjJeEZQsokwCalZGH1pjNKj8MAPH+Zir/3XNPZVqL6opdIuXjxIgoKqnfdu5ODLdYunoDRA1spDIbXNMAHy/83CgM0jMlrQnZOAf7ZdQ0T392I/qN+wtDXVuGT+Ttwt9jdrgkCDfM7lFUurPztFP7cfhkFBcVJfWUfYSGLmDAsW30UobefIzomRT54GVfqSU3+gmfgigcvExRPHA8IxMC9h+WFkjHJLxBLL84SgBMXCxKegZNIbRQwIDYhw8Q2FeHC1Qi15dxCoQDHzz8woVVS8vOLNMkz1qhfTE03B7V9P3jGMKhnU3w5S9qj
pGzzsLKIJRIcOXNPA+sUYYxh7uJdFY4/A658hYuhePI0oUQwqNPEDPCsqV8fitEDyoRglHyopXsZCQUc/OvUQMumilVFh07eVfuF8DzD7sNhBuntUpWQhyv1mCwVQ+kDvX5lAwYMQFhYGOrXr6+3IZUZZ0dbzHq9J959rStS03NgK7KGqwbN1LQhJTUb78/9G7GlBu/LLxDjWmg0roVGo3WIHxbPGw57e/WNwZo08oaNtRCFReobXpW9CMXGp2PngTI17hXcobbtuoqhr4SUj5vKcj9kT2pM6hKWFSRL4+QM85ftw7Z1b8HHy1X9gQyEWCyRWsAzuV1yk4VSw3gTe1KycwqUelDKkqmj10Af6tWtodCeXhmODiK4adAo8MSZB9K264yXisRizxvjpJ89x3G4/zgOE0d1QHxCBtZuPqt2fwKBANEx2le/3H0Yi+jnKSWPb8qqzxgDx3EQaCTRpMInMysfQqFAZeWcjKOn7+HZ89SSmIu6MmcOGDOkjUY2KOPB4zgsXLpfGubkUBJ+VXU4DnBzdcCSecPL9cWIiU2t8HgZmXnIzStUO+xGtaMKh3sMpQ/08qSQKlbExtoKXh4uBhcoAPDdisOIT0hXnMmkT/rgGW6GRuOT+Tsq3I+jgwhD+7VQ2XxHIODQrUMjeJfJvTh+5j640n3ONGiXcj0sGk6ONoqPYih+ApDfiJjUta1EyDAJw4r1Jyp8T4bCVmQt9+zIChTkPWcl0snBwbTdYV2c7CoMK/y/veuOj6Jo/9/ZS++hJfTeu3SRJigi9q7Yu4KKHfVVsYLKqyIK8loAFQRRQUSkCAjSew+hdwKEkJ5c2Z3fHzuzO9vujgAS+O33w5G73dlpOzPPM08bSum/xsiJuKJ7M0RFRjiSakkiuP7K1qotSRBk5xTg0NFTIAqF5Gfjg/c/VU+RJgGKTdsOYVvmEXw9YXFY9SvLSeWLV+wAoDNJFvA1j1JbDzgRAVnBT9NW47YHxuK6O0eh/20j8diz32Hx0h2O6b8crzJfvO22EhXh2rV9yhZnR5YVvDHsN5SU+ED8VD/1mLvEUvVU5Do1KiKtciIa1KmMJ+7pge8+vR81qqbqVaEU+QUl8EhSyIBekkTOi4F3ecbFdgqyiLPFH7gj5gLAoSOnsHr9Pv0CJ+qmMbBl62G8+vavePu1G4K67T5+b3ccOJKDVev3aS7D/G+jemkYMsh6Ku/J3CL1qHZefhgRBikFdu/NNhjlcUhcms7F9qb8CFE9WFat3XtOI86K6NG5IWbP3aKWL9aF/aUUaFzn7J/HFAyRkR5c3bslfpu9Iagt0VW9zp5aMVwkxEdjyKCr8PYnMyERYlBJSRJB3ZqVcM8tob2zxk9aypht9bdd3xMK7N2XjTHjFgU/AJBBlhX06NLotNoDACdOFOgeLxo/Yux3LnS4ooezIbQsK3j17V+wcs1ew/XtO7Lw+vvT8ch93XH3bZ0N9zZvO4STp4r0cqjaJ5Rzy1o1CKBQRJ5BMLnV6/fh+Il8bf55AhRUpqAS0aQ4HhD8b/gAR+ns/EUZmDR1JXbtOa5KvBxOPgZUNdFlnRqGZFhduDDjjJiUsWPHIi3t3120/z/CYnMSRES4dMUufD1hMR5/sKdjftFREfjwtZuwYt1e/PHXZhw9noeKqfHod3kL9Ohsv5BUrpCo8iaUqm7EoWRwFEhNicPO3cf0OltiPdCguy+VUQFWrNmDbl0ahijwzFE5NUGvop2RLwFOHC+w3jjHuPfWzvhn5S7knCq0ZVQevbtbyF39uUKfbk1QMTUOE6auwNpN6nkuiQkxuL5va9xzc6eQh+B5fQHMnr9V62+n0UChqrSOnyjQ0lGJ2vrPEgK0aFwNrZpVt+QTCrW5Rxsfrw78eLdODVExiCv69D/WWxgUEV9NWIzLOjdEnVpqees27McX3yxU6w9onjVm9aPaXFU9065NndNuH8c/y3daxjihUM+uYqBQkLk7C21b1rI8P37SUoz7Yak+
f6leZ7v3QQjBPTf/O+EELihcBMHcnHC2+IMzYlLuuuuuM67AxQZKKTZsOoBpv6/Hjp1ZiImOQPfLGuO6/m1QqWLZjNwMRqx2qhETfvltLQbc1hmJCc5eRR6PhK4d6qNrh/ph1eHKXs3w9ff/6EWLLsQOlOWm/pfg+PF8w4KvnwMf5i6QUmTuzPpXmJRDh0+xMqHtMMWqA8DBQzkIBOTT2hEGZAVzF2zBxB+Xo7DQi5SUODz2UE9c2qkBKzcH2zKOQPJIaNOqpmWcVExNwNgPB+Dzbxfi7+U7NIlFWuUkPHDHpejf+/yG1m/bohbatqiFomIvSr0BJCfFhhWALzevGBN/WgG/L/SBgARAaYlfs9EglAIBgEbAuFATAsjAgBs6hn2eiIjr+7XBuB+X6W64FsZavfT8Y8GjzX43eXnwgijFx6PmoE/PZsjOKcR3k5apGUcIRJ9AdfkVuocIf6tXNqpkTwcrVu8OK92OXVYmZffeExj3w1LWDKrXi78PxpR4JIKArCAhLhpvPn8tGjcIfZTE/zdczHFSzhZ/4Kp7ziIopRjz9UL89MtqwzkjByYvx8/T1uCj925D8zLs7k43lLTPL2P1un24vHuT0y7LCWmVk9D90oZYvETVp6vSFBtGhar/VaqQiLtv6YQVa/bg97mbrGnCpR+E/CuqHgDYf+CkrpvnxYt/KQBJNcoMF3n5xbj7wa8Mbrh5+SV49c1fkFY5ETVqVMDa9fv18ghB755N8dxTVxqOA6hcMRFvvXgdTuUV4/DRU4iJjkS92pXPyfkx+/ZnY+nynfB6/ahTuzK6dW0U1jtQZIr8vGJAoahYMbhk58jRXDz9wkRk56jqDSKdpg2gAtV1XSHqoXgSmCG2ysR/+Mls/DKx3mmrFw4dOoWYSA9K/QHVkNtmo9u3V/OgAf28vgBy84rtbzIVDgGwafNBbNqsx5lRRAmpAkBWDcoBwHzKFQXFzFkb8fC93ZGYeHohDk7lFiE7uzCsOWgXPmHGrA3weAhkkxE5AUAC6pcKleJw5eXNUa92JfS8tLFri+KizHBHzlnEgr8z8NMvqwEYo2/y82qGvPkzfvr+CcTGOIvAKaU4djwfPm8AaWlJACH4eORsa8CzEPCG4fZ5unjjxWtxzerP1NgMfEHl1JsKcgcKfP7BHfB4JHRuX08N8c2fYckByh7lcn6HxlGKXt0bn/W2mJFfUIL9B7K1305dTShBcYkvpJcGx0NPjHOME3LsRAGOnTCqjyilWLAoA1nH8vDpR3dZJBKpyXFhecuUBSUlPrz3we9YunyXevgjIZBlBUlJsXh9yLVo366u7XOHj5zCN+MXY/E/mZo6qnWrmnjovu5o2cL+QLm33v8Np04VqTtxotpYkBAMV9NG6cjYeUyNWaNAk5QQBRbmMi+/BEuW70LPbuGPnZ27juGFVybD65dViYYEXSXJ6ukhBHeHsLPJOVXkqN4011MEAUBlAB5Anx+wlQgREMgKxey/NuPWG08v3PypXAcGygYVUxMw/vslWL5yFwIBBS2aVce2nVkqg8KlKGKbCEAJRXG+F0/c1+O06hUM2ScLsHdvNiKjPGjWpBqiLhamJ4jqPuznL3JcJG/63OLEiXws+icTBYWlqJqegh7dGiPWRtc+5ZdVmsGnGYpCUVBQivkLM3BNv9a25Sz4OwPfT1yGfftVYhkbE4kWLWogv6BUXdwig7sIiqhbO7zzZU4HkREe3H5TB0yYtIzpygGqUGafolkZIik+GmmVVFG0xyNhyOB+eGP4DPW+QpmIkqrxzSVYw28LqFWjAmpVP/vno5ixbMVuKEpoPpBSiiXLduKqK0JHM83IPILsMkQlVRSKLdsOY/GSTPh9Mpav3AW/T0aDBlVwTb/WqHwOzuuhlOLNd6ZpUh1F2NYXFJTilTd+xhef3oNGDY0i+0OHc/DkM9+juMhrsJfZvOUQnn1xEt5/+xZ07GB0QdyeeRSZO7NMFUBIg+z6daogY3uWpk4I
hc1bD50WkzL+hyXwczf0gCqHpx6iE+OAAhCCn35ZhRcH93PMRz2PycZeJoTdDcBsvmQSlEERsWT5ztNmUiqkxmuBlINVJiUxBkPfmQ6fP6CpGPcfyIaf9wdTVfBI0oD+Cs/WcVsnTxZi5BfzsGTpTm1dTUiIxu23dsJdt5+7k6j/LVzM6p6zBTfibBDIsoKRn8/D7Xd/idFjF2LSjyvwwYhZuPmOzzH3ry2GtD5fADt2HQvqdiVJBJu22IcR/+nnVXjn/RmG3XxJqR+r1u4FFHYUhp+CyEpwYytKUbtmRTRueG70vzf0bwuPcPgYAQyBhTyE4Pr+bQ2RSHt2bYx3X70elSsmAIQJsFlwNHUXbNMWShHhkTB0yPXnpB1mFBaVhiWoIoQ4i/IF7N59DENe/7nM9SEE+OCjWRj+0R/4Z8kOLFuxCz9MWo477vkSf87ZFDqD08S2jCNYvXafbdA4SimoQvHDj1Y7i8/HzEeRiUEBVCZHoRTD/zvLIFUEVENwA+1l7q5atF/biL8UR7Ny9XShQCmKisIPJFVY5MWylbt0wR5UAiAFqPYhVGXK587fqp9SbIOU5DgkcTWJoS1hVISQ8NsI4NSp8KUiHCnJcejcob52oKCd+z8oRWmRH15fwDAmZJlqrtGEvS8iZECghhQoLfLhVG4RzgR5+SUYNPgHLFu207CuFhZ68c24xRj5+dwzyt/FhQGXSQmC0WMX4Lff16mLNKUIsMW2pMSPYR/+gaXLd56Vco4fz8fYr/4GYMN7aAIKdfITEF28ak7MtkadLqlTJqPBcFAhNR5PPdobgHWXJ0kEtWtWxJ024vDuXRph6rePY9SwO9CrWxM9rLnIqCgUkBX1E1Dwzis3oH6dyme1/qWlfhw+cgqnThkX0OrVUh2eMIJSirQqwSUZGRlHMPDp75GXW8YAa1QV+/u8foBSKIwxVWQFiqzgwxGzsHGT85k5ZcHCRRlBQ9zLCsXS5TvhE4jziRP5WLl6j2M0XEqBU6eKsHL1HsN1cfcr7gR1RgU6s0IBMMLI1S6qUWsQRh0ACEHrljWd75tQUFDinB1lUYcD6sdfGsDRo6cc85JlBSVFPuM8Zf72IWclFQk+greRUiQnxwo/KbZuPYyxYxdi5Mi5+G3GOkdG7dH7uyM6KkJnFqnxb+tmNeH3B+w3XWzOUjEUgvi+GDM3c9bGUK0Nip9+XoXjJ/IdXe9nzNyA3XuOn1EZ5x1CXJoyfy5yuEyKA7JPFmD6jHWOY4AQ4Jtxi7VJnJNTCI8UXB2jKNR24fxzzqbgMmBqXNy49EJbHFgaUFUjFBmmsaCiUJw4kY9jx/Isu91guPHaS/Duf25Eg3o6AxEbE4mbrr0Eoz66y9FeQ5IIWreoicu7NTa2hzIpkVeB5KXweCk8fmD4+zMwbfraoCHhw0VubjE+HTkHN940EvfcMxY33zIKTz/9Pdau3QcA6NCuLiqkhj4rKD4uGpd2dj5hl1KK4R/NVHfaZ7CAEBOxMr//SaG8R0KAUoqDB05ix44sFBaWMmIWvL6KQg1nxRw+mhuyHEkiOHTYGI20Xds6zvMKOuMqBnUDgBZNqxnSBbNj8ngI+vQK/2DPlJQ4RETYLIcKVY1BZfU8KSgUkkwxaND3yHA4tsHnD0CWFebSy9oRoJrkwQmacbbZW8kObL6nV0rEiRP5KCgoxbPPTcJTT3+PqT+vwu8z12PkyLm45dZR+Ptv65EJ9epUxqgRA9BI9LihQGJiDAY/0QfRUZ6g70hbkyj0c324bRBbm+bM2+zY1nAwc9bGkMdBzJ5zZmWcb1zMwdzOFlybFAf8s9QoYjSDUjW41KHDp1CzRgV8O/4f9ayXIMxGYkIMeve0LpwHD51ik12xnMVAI0x2/UzpyxdztYpqPQlUQpKelhy0bZRS/D5jPaZMWYGsrDwAQMWKCbj55g645daOQXfUe/eewIzf1mLTpoOQJAk3X9UGPXo0QZMmVREdZoTPQwcFokUpU/1Q
mE8cLiz0YtSoecjMPIqXX+pfZulQbm4xBg6cgGPH8w2L3raMI3jp5cl47dXrcPnlzfDyc/0w5I2fg/IWzzzZJ6inQkbGERzk7RPDjJ9O1XkdqZFGiZve1Wv2aEH4Thd//bUF301YgsPM5ToiwoM69SuH5KkSEqIRLzCg4RgPKwq1hEGvXbMiOrSri7Xr90FhUhKKoOYoIITgistb4LuJy4xu8NaEAIC7b+98Wl5hsTFR6NWjKebNF877UZj0hGfNPhRAcaEXzzz9Pb744j40NKlWY6IjkZwUi7z8EvVZWWVQFNHF2KmdQrMoGLMqzgtRqgNgwZytmD93KxITY1HADkYUj27wlvrxzju/IbVCPFq3MroSN6qfhv+NvBe7957A4SOnEB8fjVbNayAy0oNly4JIiTlxZAyVmYGmAKAQHA2DiXVCICCHPOpBURQcO5Ff5jJcXBhwJSkOKCwoDcvVlO9CF/6doRJAjcAICygTy119ZUvExFgJeWxsJBRZ1k4ZFXfO4mmv2mJgs8vm9yIiPLi8p3MkTEopRo6cg5Ej52gMCqAaqH311UK89+5vjruX339fj4cf+hp//LEB+/ZlY8+e4/h9xjq8/NJkbNhwwLFMM9KqJOsLnEz1hRiwpVRz527BypXhxXWww7jx/+gMiiAmVWQFlAIj/vsnSkp86NS+Hj794E7Ur2dVMVWulIihr16PK3sHj+y6Zdsh7bu209RWbgFBGALtnTrQMwJVnL5s6Q5MmbwCv/6yGkd4jJcQ+OmnlRj2/u8agwKoBGH3rmNBd62SRHDt1W0MDGy9ulVQNT04Q+zxSOhqE+PmtZeuQT2uyhNev0rgKEhAgeRTP0Sm6N61EbZsOahuBELYZF1xeTPcN+CyoPWyQ2pCDFM3qvmbGRQtfg6rrxxQ8Phj4/DkE+M19aEsK1i+fBcq8+B6zKOHApokJhQ3aJacGeZ7QDHUC6xenEEx1Bec6aH45utFjuXVr1sZ3bs2Qrs2tTXGrmWLGiEZYMIYFHMqLmlRAmXf5ns8EmJjg296JElCyjnydPvXQM/C5yKHK0lxQPXqqSFVIIQQpKcl42ROoSreZZEhKfcKYBwGUSg8niDHkfGIkub8YRAsqKBON1Q88UgvJAQJ4rZx4wH8PmO97T1KgUWLtqNHz0z06GGMsZKRcRiffjIbgLo487ooikrs33zjZ/ww8UlUCuNU1g4d6qqbQibOp7zwIOcJ/TZjPToHUbM4oaTEhzlzNjO7DuOCStn/pSU+LFiQgf79W6N1y5r45osHcPxEATJ3HkXAL6Nq1RQ0bpgeliTn4AGjlEhlKIi9JEUBWrWqgc1bD2tSu4gIDyiVQWUlqKifyMCbb/wKSVKj8n7x+V/o0aMJXny5v63nGcAY0f8ttM9SppAAKBLRPT8YJImgerVU3GkK4y5JBA/d3x3vDv/dNk9CgJuuvwQpKVZCkpIchzEj78X4Cf9g0pQVWh9JsqpOEXk7EqDI3HQI6ZUTEeGRVNswAlDzCs0YiBuuueS0JUxZWbn4ZepqFm6FGqxHNAbFcEFH5vajeOjBr/HZqHsw9I1fsXfvCfW0cRYRV7MtoQD8bG3wiJwZRXx8DIqKvRoTKMsKoiM9SE2Nx7HjTFogK4bAbk71sWNUtmw5hJISn+PYMKP/Va3xw4/LoSgOgfZMDK0do0JPU00rBxQsW7YDW7ccBpEI2rSqhVWr9+g2KWxTw8tWiIya1VNVe71zZIN3ruF694SGy6Q44LJLGyIhIRqFhfaGZ5JEcGnnBkjldgxsxwSoaxOEXQQFQGXF8QyMxYsynONyyNQgItbEwapuR2VuZApCgJq1KqJCclxQNcDvM9YbAs3Ztev3GessTMovP7MAdQHZVirg98qYMP4fPP/C1Q4t0ZGcFIsr+7TA3Nmb9LYEWWQUhWLv3hMh87XDyZOFzADVek/j9STg4CHjiblVKieiSuXTjxCclpak520qR6yDJAHVqqXi0w/vQs6pQmzf
kQVJImjRrDpuvHGkcwFMPcYhSj/++ScTBQWl+HDEHbaL9lxRf2/ezRMCyBTRER5UqZaCg4dUZisqKgJ9+7TAww/YBw3r3asZCgu9GD12PvwBGR6PBEVRDc2vv/YSPPZwL8emREZ6sH9/NiPk6nELElNVmAntyZOFmDdni7oZAHSVgw0qVghtW2TG7NmbtPIIVKZNg/gi7UCAvNxiPP3UdygsUGPiKLJ6mjNl7vWcGZcAQKZqPBRCecggDBzcC1WrpWDJ8p3w+WTUr1sZV1zeHPHx0di56ximTluNjev2ITvL/liGcEj0sWN5qBOmIXrFigl487XrMfTd6aAK1RgFj4domxTbMtm44sPr+/H/ICY2Cl27NUK1IMbpO3dm4T+vTEV2dgE8EWpkv4CsgER71HPFAorqAi5uNCjw5ai/cGBfNp57vt+FyaiI0veyPn+Rw2VSHBAVFYEXBl+Ft977DYAx9okkESTER+PxR9UFODU1HqnJ8cjNsXe541MnxiFybGGBg6skL5KXzYNXAapIWlzQKXD4YA7eHjoNV1zZAi8NudaWUdm3LzuohEhRqBanRcTatfvsGRQBs2dtxL33dUPlMIj7s09fiYXztiBAlZCSFAAhRb9OiIuL0upslzvvy/Wr9uKtw78gOTkOva9ogRYta5Rp0WvSuJrKrIqaQqoaXWoH1kkEycnxeP+tmyFJBJUqJuKyLnqfSQSQnciOzSF82i2FYt26fdi86SBatbaet3L4yCnmIaNoUU8BXaIECfB7Axg54i6UlPrh9fqRnpYccvd9/bVt0btXUyxYtB3HjuUhKTEGvXo0RZUQXlCAqlbl70eUoFjaJivIPVkEREqOFFmSCJo1q470MpwI7aQuI1SoUYgxavbmIoBGRMzt4id/c2b26KFTmPbTKhzYrzLLtWpXRHrFRHS+tCEaNkjDqy9eg8ce/daeSXHqNFNdNqzfHzaTAgBduzTEN18+iOkz1mHZCjWYW/Nm1dGvb0u8OuQnY7EKNcwzfv37CUtAKTB29Hz06t0ML7zc32K7lp1dgOefnYiSYh8AGJgg4leASAnEr1hsX3jTZ/2+AfXrV8ENN7YPu20uLhy4TEoQ9OjeBB/ERePbCf9ge+ZRAOpCeFnXRnjs4Z6oVjVFS1vsIHERsW2rvTdAMKhiUwBCbBJ1N23ccQL6rnre3C1o3qIGrr3uEkt+8Qmhxb32h8KF5tgVheL339biwYd7hkwbHR3JCCbsF1iqL3qSh+Dy0/DUEJGcHAfJIcAeBEKxZ/cx7Nl9DJIkYeaM9ejYuT7efPumsI2BOS65pA7S0pJw7ES+FldDYnY3vAYSgA6tajkaOKenp+Awd3ElpvdOg9Mjj0fCX39ttWVSEhNiQGVV+mbWXlBAZVwiVMYuNQxPJxEJCTG4rn+b03oGANLSkkE2H1JVnvwiNe0u2Q2JqhJJauOFQwiBJBE89oiz5CYYzFJObsuhIziDEgwW+xIKTRXI8eMPywzP7N1zAv95ZSquu6Ednn5WPZX8dDzw7OAPnP7ztWpWxNMDr8DTA6/QrokeXmrF9H4y95Ao6ft7YQb8fhlD37nZkOa36WtRUuyzj9GjUFCvzrLb2r8AGPP5Xxcmk2KSsJbp+YscruFsCHRoXxdjRt2LH79/HF+NuR+//vQU3nr9BgODAggLCFtgiV/RPyzOhVOoeic1kAYKo8GgYmVQRBAC/DJ1lS1h7tmzadC1VpIILu9tZQhat64d1oT4e6HV3dEJlkWXtY8EZNVwUlY/8MsoKfSWaZEuKfEF9dIyFa2VsWbVHnz639mnXZ4kEbzx+g2IiYyEBAopoGi7TG2HSYEF87bg/Xen2+bx5JO9jcG8BHePUGSSRza2Q5dLG0BxkFbwulVKTThtxuxMkBgfbWBIVfssfXybVTvET0F8ikVdVa1qCj764I4ynY0FAH2uaGHMk9dJNNAJOozsb1reFzOctZMK2GHG9LVYuXwXAKBx46rOZYQxxjt1
qhcyTTgw2JqYjPiNCamuzlAoqKxgyaLt2LPbGNtkwfytQY22w7G7CAQU7Nl1LIzaly8QIKSLcdDP+W7AvwCXSQkT6WnJaFA/DclJsbb369arrA44ZvgHCIusokaudNqdtm5dC8GCU/GBDBl6sKsgoBQ4eDAHJSU+y72+fVuhQoUEW1WQJBHEx0fj2mutEpibbg4v9LZllxUEqSlqf2iLEKVWvTMAUOCnH5fjvx/8ERbDISI2NgpRUTauqIwYOk1yRaH4a+4WZJ+wtwEIhqZNq+HLL+9HalKcvpDYFLRkUSZ2MAmdiC5dGqJZM6Y24rtURjRDtV6SgHQHj5vDB3NsvTFE5Jws1Owq/g0sWpChq0Qo1eLDWIg4hWZcLskUUqkC4pVBfOrfG69pa3Cx5QEYw0Xz5jWQmBQrcKuMWdIkHzhziuDgDRMK37BAj7fcEmQOMumMU7mJiTGoVevsHJWxfPlOjZF0VD9yGz2qf3jcm9ksyNvG9fvx2otTkHUkL6xyQ/Xbd98tCa8BLi4ouExKCAQCMhbN34bXXpiMJx74GkNfmYoVS3daOP8bb2yvGtsJelkOLk5fvXQnAgGrtXyjRlUNhFpfbIyyQCPXHXoBtnOhjo+PxiefDtCM2DweSfMoqFgxAf/9+C7bE2xbtaqJOnWCL3KSRFCvfpWQ9eLod3Vr3fMYCGpvAQBzZ2/Cti2Hw84fUNvXu08LQyh/KMxLIIT4nlKK1avK5vocFxuJU/ykWScZNYAfv19q+/zHHw/A5b2aqo/zYFkAYqIjglZblimuvrq17b0jh3NVo8QgUBSK7OzTZ8zKivz8EtXQnNntGEa1MBeIaV4RMGYloEBSVMkXAPw9fyvuv/0LXHnZe7jysvdw142f4Q8HbzYRMpfasXLFDYZ6Daa5aaonEDS+EE9nO2tDjMMD+7ORfSIfS/7ejkp2RsGiekyso/DdQ89cXcSxZNF2tdpOEl3BiYDANAUo8PdfW/HHb+vwwlM/YM3K3c79KiAUg0KgMuEXHISQCGX+XORwbVKCoKiwFEOenYTtW49oUt9dmVlYuigTrdvVxvv/vVM7jTM1OUbfXdiAAMg9VYxl/+xA917GOCbNmrFImlRILIqYNSsx9eweSgEEITaSRNCocVXbmCwAUL16BYwb/yjWrNmD9ev3gyoULVrWRJcuDRwX2jUrd+PIvpO29zgUheK6G9o53g8EZCz/Zwcyth6GJElo1KwqKlZMQE5OoXq4X4gJ5/FI+POPDWje0v5kXScMuPtS/LM4E8XFXuZ1Ed7EJgSGMPCngw/em2F8n3agwLq1e21vRUVF4D+v34CHH+mJpUt3orTEh5o1K2LJggzM/zsDTh5Rd9zZGTVr2R/ImJAYE5ZbaDAX9rONihUTcPx4PiRFUEMpVDPspYzCafdsVC+UUmRuO4wnH/gKO3cYRf4njufj0w/+wIqlO/DOB7c71mPf3hPIz9clSLw8Qikon/w+CkRJMBjQ8vrIFHfd0wUTJ6q2JcZNDIVEJEZTQjPHZigBBQNuGgVKqcqLRBA1D6KeVE2ZRJDyYJL8wE5AUzXm5RZjZ+ZRNAmiDpNlBetW78X+vScQExuJzl0boZKNEXzGlkPGPrBUWP3j1MpTJ4swcsSfalJF9UykZT0okEu6FIrjh3Lxzn9+wVXXtEH7TvUuCG8f1wU5NFwmJQhGDJuJzAzV2NVMPzeu3Y93XvsF73ykLnyTJiwNye1LEsG2LYcsTMol7eoiPT0Zx7WAY8bnCAAiEVCvrC/eoscP37mwBV1RVGIVqi4dO9ZHx471Q9QaOJldgDdfmap69xCiGvGaF2pC0PeqlujSxT6WSWbGEbz58k84eaJAjSFBKRSFonrtimjYIB07th0BQhgnyrJSpiiWVaumYORnd+OD4TOxI0NQr4QgGJQC9RuknXZ5uaeKsH7tfvVHsEFBgJLi4Oqx9PQU3HxzB5SU+DBk8ERs23xIff8eCfDA8A5atqiBhx/p6ZhX
j55N8NWXCxzvSxJBk6bVbAnTuUL/69piwreL1cizAGAOVAYAUM+sEo1ORRAA+TnFyA9yVtKKJTvx+7Q1uNbBuFIRpQKaPQoASdLqBQDwKmqcE87Ls7gdBMBtt3dCh071MOXHFVi+fBcUWUFSfDR8RX54vQFAchhuQQk+BZWpgd8lfjWSCyIIkpPj0KFzfcyfu0WvtwMTXup1ZrhXr9iN99+chsKCUm1J+UyajX7XtsGg567SgrxRSnHiWIHKKGkSHKIzzTbqOjMsTWVMhi4mMya4sm8LzJuzBRZQqo0XCqC4yIslizKxeEEGLuvZBK+9fSMiwjwixEX5RblW9wwbNgwdOnRAYmIiqlSpghtuuAGZmZn/StnHsnKxZOF2tvO0m/QUK5buxPZth+H1+sNSQ1BKbVUwkkTwxtCbEB0dYVRLsHu1alfEsGG3GiPS+lVRn+QNwOOV4fHL8PhkSF4ZndrXxWWncUR9KMyasR4Bv6yupSwaqBEETRqm44WXrrHdvRzPysOLT32PHGbfQWVF29Ef3n8S+3ccgxTGDl+SiG1gsHBQp25ljBn7ADp1rAeJMKlNkOifkkRQu04lNG9xelIbQPXMCMvqnqqeS6FQWurH809+h22bD4EbkUp+GaQ0AFLqBynxg5QGsG3dfuTnORPq9KopuPqaNkGPg7n5lg6YNmUlvh0zHzN+Xo38ME58PhNcf2M7VKueqjGuVtUB9L4UGBRiFnUThOzzH7+zV60BaoyhaC551FQ8jOjKgkssmJrJr6gfRQ2CxzcOzZvXwNvv3oLZc1/C44/2QuGpEvh8AX282XW+yOybo9I6MBwEAAlQ5J8sQs0aFULa3xBCULNmBdt7P3y7GK8+9yMKWRh6TVgVkPHn9HUY8Z4erM9wqCBgPI8oDAaFJzO/Ou3sHwFVq6Zg8HNX4aVXrkXdepWtr9fGq0hh72rpou347uvFIWpSDkDPwuciR7lmUhYtWoSBAwdixYoVmDdvHvx+P6688koUFZ3ZEeDhYMPa/YBmqWg37dRV8Ydx/yDgZ3YmIXSElAJt29exvde4SVV8+b8H0bF9PUTIKvMRQwgu79kUI0fdi/Vr9hlLpxSST7EMWEKBNUt24n+fzzvNFjtj3eq9+k6TslD2Pln/BBQcO5zrGEBu2tRVKC3yWWNFsL8+b0DtNptdsghFoeh9RYszaktiYox+2nJAsS+TUsTGRuG1N24ok8jYcMBjiEWkTp3QNjy//bwaO3dkCcRa/SJBdcuV2HeqULw48Dvk5TozFk8/2xf9r20LwlQFXL0XHx+Nyy5tgGH/+QVffjoXP09cji/++yfu6P8xfjK5x55NxMVF464BXZCUEG0IUseh24YIHcmJITGNphCvKpgRdGxsFPpzBk7wxlIlJeEZu37/rUoUTxzPx8RvF+ObMQsN9VXFnDZrBKVAQJ1Lkl/9EJ8M4g0Y3bIpVcWkimJYa3ZlHkV61RTH+efxEFx6WUNUtIkGvWxxJiZ8tUiX5ohlsHIXzN6Et17+CbKsIGPrYU0Npq07MlXnEq9qyJ6yplDzhHogY4AiLjIC3096AtdedwkIIfhk1D2oJNrKURrUu4VSdd6cjiH/+QCh9Iw/FzvKtbpn9myjC+j48eNRpUoVrF27Ft27dz+nZSsKP3AjuLx+26ZDiIuPRuUqSThxPD9IeoqU1Hi066C7AcqyAm+pHzGxUVAUBd+OXoBVizIheSQQCsjeABbO2oRTx/NRoUqyKTuqMSWWRU8i+OWHFbjp1k6oHOJslXCgnXmj6KJVTV8PABKgBJksc//YCCDIQs8XSAWqCsNG/C1JBI2bVEMnB3VSOFg4dwsWzduihpwHI+x+2ag6oRTRkR58+c1DqGpyMw8XjZpURWJCDAoKS2HoMBGsu4b857qQ+U2dtEJlqjgR4YuzKU8CYN/u43j2sXEY9e3DhsMAOSIiPBj8fD8MuKcr/lm8HcVFPlSvkYoDu09g4rf6zjPA1BuBgIKvP/8L
cXFRuOamsxuHYs2KXXjvP7+giMUYIszOwhGMMBOTmrGs8UvMeOCRnti65RB2bD4Mqqj9TZTwy1i9Yg/iYv/GxHH/MPotPMMOBaVaoD+BKWCxa8wMvDajeLvN1xj2783Gq69fjxcGT0QgIBvsYSQPQXJKPAY+3ddSX0opvh49X6ufuRxN+wJVMvHwHV/gtnu7sYeJ4T4UCvhk1R4myDlcanriaJvD1W0NGqYZmNCEhBh89+MT+POPjfhp8gqcPF4AxY6rFVBc7MOuzKNoYRMzyMWFg3ItSTEjLy8PAFChgr3YEgC8Xi/y8/MNn7KgWZhifp9fBiEEN9zSgct81RuGv+qUf3/EHerR9ftPYsRb03Fdt/dxY8/huKX3B3j24W+xlB2pzkWWfLHZtG4/tm48AG2J4oRK2JVpO05AExl/E8T+AACKCr2YN3MDfvpuKRb8uQmlNi7LAFC3fmXDTtZQFtR6NGjobLtRXOgNvrsyRNIVrgu7xc6XNsSwEXeE9qCwAaUUnw6biWGv/wo5oBjqr4rvFUh85+qT4S/2oUIZQqtzREVH4JY7O7HCxYoYf191dSvUsTnMUEQgICOPHV6nMyhBtpAADh/IwR/T1gbNt3KVJNx0S0fcfd9l6NS5AX6etDxo+u++WmTrmVZWzJq+Dq8OnqQxKBrsmF2TakCXKjins0OlysEj4MbGRuGFl/ur5YSQ6tmhqKAEP3yzWA1AJh5mKYCrNiBDixptd26XBmqMCUNgmn+UIuCX0axFDXw+9gF079lEmyPRMZG49rpLMOarB1Elzdr2wwdzcHD/SQuDAkA/DFHWP4f352DalBWsfzhzRQ0HDRIupWR1E8ElPXXrVjKulYb2qte8xda1KDo6Ejfc1B6TfhqEJwb2CYs3lcNQI59XKGfhc5GjXEtSRCiKgsGDB6Nr165o0cJZ5D9s2DC89dZbZ1xe7br8hFZ7jp/fI6BYvigTyxZsUyczMd7nO6iXX78eDRtXxc6MI3jxsQnw+QKaS2BhoReZ263xMjgUhSLr0Ck96qwg5rNTxfNrq1kQKGu1KX7+YTm+G7sQPm8AkodAkVUVx6ODr8TVN7UzpF0yf5sjTeTSFF+JD6WlPsTEWAPTRUZFQDYzQJQtfrxNAqNCZQACIa5SOQFvv3+rQw1C49cfV2LWtHX2GgEutoYonSBYsjADva9qVeYy77ynK45l5eHP3zeIfCoAdWfw8BOX47a7uoTM5/ixPIHhZd3ECajDuKSUYtb0dbjt7kvDquualbvhDSEWzz1VhG2bDqLVJXXCyjMYDuzLxsjhM60SpmBzTQQhVvWPeIyyQxZ33R/8ZOQD+7Lx7MPjLGNC3fXzXw6gFAVmeyBDIDhj2wiva5BYPSIdd1Y4AxFsDtWrXwVPD74K3bo2RkmJF60vqYOqNZw3dIcP5Vg3VYBR3cWbB5Vx27vruD5XWf3NW111DiuqVIX3A4DmLWvgznu7Yuync9W5L3ohiVAoTp0sdKw3ADRrWSMkDxkZ6SmT4fu/iTNV2bjqnnKEgQMHYsuWLViyJHjAnldeeQXPPfec9js/Px81a9Y87fJKS/1GImoHQuAv8WPoC5MhSUSNLOshqvqAqKfJXtajMW4b0AVNmlUHpRTvv/YLvF6/LpINtQByNYtEQKlAZYVFxCwm5n9LHEL1T5+8El9/ptusKMwAraTEh5HDZiIqJgJ9WKyN9av3Iv9UMSgx2QDwhZwZDmasP4Dru72PWnUq4bLLm6JKegraX9oAldOSUbtuJWzfdsTYQrYDgwJLmHN9lwgQiaB/ELdmSinyThWBUiClQrzFhkQOKPjpu6U26hZqOG/EAAlB7TrCgSQRPPdyf/S/vi3mzNyIw4dyEBHpwaWXNcSl3RojJcyw894SlXnQCRanWoLKg78LwsYHITh5GrFOLNIMO1CKNct3ISe7EA0ap6NG7bIHBvt96mqrqRdvg9lzzFQHlVgK741SNYKdIR0M8wQE6Nq9Mfpf1zZo
vT4b/ocWAJFYGCi7i0b4ubs6l/IQqqmnzhUpIQCKCkpRWurD2P/Owdzf1yPAz74hwKW9muKZV69Bcop1vB0/mqczYbzLHeaEtq5QqJ5FRAICsuPaxVVA3C+p/43tMJhJqSYlxUIiTFLMn+cDnF1KSDQGzcw6cgr/zN+GooJSVKtZAd37NEeTZtWwM/MoZBvjYkkiuPLqVkiwORSzXMEkXS3T8xc5LggmZdCgQZg5cyYWL16MGjWCq2Gio6MRHW3VxZcFqg6ZOrrcIqB6vEBR1DgfgOoSF1C0sXPFlS202AQb1+7DETHgUDhcMF80WDkQjOOCLZkUMJ7kylBS7MXXo/4KWuS3o+ajV9+W8HgkzGGBsIwMhmKNhcAWmAN7TmDSHvXEYiIR9O7XCrcO6IJ3X/tFqJxIZKC7HwpiZ75wVaqUhGtvsjIplFLM+W0dpn63FIfYoWzp1VJw092X4tpbO2heVLt3ZuFUTpFVR2WzW+TVIQpFnsNhkaeLxk2qoVHjqpg/axOmTliCzxZk4DMA6dVTcdOALoa62qFKerK+MwXRXzo1httXLzOS4CGQFAUDB3yJmNhIXNqzKa68rq0aUdUG1R28PgDozJxCMeXbf7TLrdrVwfNDb0B6kJNtnbBy6U7j4GX5q+1QAI8pFglDZGwkAiUmiY8oVTLPUYb7H+6Ju+6/LKgR9OGDOdi0bj8AgJiYEY2QBzHObd+5PtYKkksJ6vyjmshEwOnY0Ig2OE5JAAx9bjI2rtlrsEehFFi+aDsO7DmBUd89gtg447qoeTPxMqhpTojqKqFfCZFAA/w8neB143fFKMi9+7bE1k0HdWmS5qDAiyLo068lAFXdOWrYH5g9Yx0kQkAk9RTm0R/9iXuf6IXsEwU4ebJQ8xbkzWjYpCoee0o/b8jFhYtybZNCKcWgQYMwbdo0LFiwAHXr1v3Xyo6OVvk3QmH0AuEEVlaYuJPrX9mH6Ql5kJ7P3p+p5bl35zGj0R8HNf3Wrusn56o7HWbIF+yodAYCWNyZKaUY8uR3ujeSA05mF2DbpoMAVPdhc504QTEsZjagCsWCPzdh9q9r0LZDXUOFiaAKI7LqTUB8ARBfgHk4qN87tK+jeuSY2jH6o1n45J0ZOHxADzCXdTQXoz+chf8OnQ7ukukX28qrqdiE3uf1Yn+XLsiAoijw+wM43VD8Znwzah4+euNX7N+jn1mSdeQURn84Cx++/qtqpO2A+IQY1G1YxRjCX3NTdZCmyRTeYh92bT+KLesP4KtP5+DBGz/Dnp1ZtmW0bFML1Wqkgpi9Q0S7A9MzWzYcwOD7v0ZOGaLT6udcQWeCWBskPr61d0UBv4wYiWDIa9ciPsFIaDXbEcEbRZujFOjVpzkGPNAtJKE/JJ78rbDKUWpUt9jYmERESLj34e54+oV++kXKBS9Us+0gFIAvoHvmhAM+/4OlpxTxcVFYv2qP7fk3ikxxaH82Zv+23nKvCQsiqdZREaSbivZbX9c4s6r+Pl2bHVHy1vuqlqhaLcXWvszjIahQMR792OGonw//A7N/WwvIFIpfgeyVAZmipMCLsR/NxqNP9ML9j/RAtRqpSEiMQf2GaXjmpX74+It7ERvqTLTyAG28nsHnIke5ZlIGDhyIH374AZMmTUJiYiKysrKQlZWFkhLnWBBnC4f2n9QGAIE6kYnMPvwQNMFWwCK2Y79zThRg64YDAFTDLzuCp+0ozPdk1QCP8MMFmdtsOFEGKYCUCsbw9hmbD2H75kOhHwY0/Xppsc8Y30E2MWwhoCgUa5ftRt3alZAUH23bTkLZQXzUevDa7GlrMenrRYb0m9ftx4wpq9R2mvscwF9/bMTKf3YAAGrVqaSGguebY24LE2LhP7jnBG649D1c0/Ed3NrrA3z72TzdgJWhtMSHJfO34c9f12L9yt22zEbG5oOYOmGpta5s0V/4xyYMuutLTJ+0QotTofedgjVLdyL/eCGIad/q5H6paTlEHpgChfkleHXg9/DZ
HHJJCMHz/7kOHo9kdGPlxMgGiqwgL7cIv04MbnBrh5Zta+tMg81p3hKlKpNa4leNmQMKfEU+vPfiVFX0GzAZk4rxg/iHUnS5rCFeeuP6sOpkIGgak0OYDZA69ySfDMkbAPH6AVkGZAWTpj+Nex7sjvRqKYiNiXSQRkB1MQYgBTgz7gfx+QG/HHwuUQSXvBCCgDfg6H7Ms/jTxpC6Vt3KaNW2dtD1y/zRmBgOjZlxqj9FdHQEOnVtqF2KjYvCiDH3oUGjdACqaobHC6pVpzJGjL4PScmxyDpyCn9OX6saGdtlT4GPXp+G2jUrYsJPAzFtzgsYM/4RXHNDO0RFXxBKgjM7XDDI/LyYUK7f5JgxYwAAPXv2NFwfN24c7r///nNa9tHDp5h6xTFMpMBcBMmIAmP/+yc++/4xdOjaULUtsbM4F9VK3P1ZEQahsDtU9fAkqLqHAOhvUpP8fBoHcKVVSwEA+Hx+Xe3F9AqGMs2Lk2WxUlNP+2G5LlKWTGJzgRm0w/f/+xtXXtcWldOSAQAzf14Nj0dyPItEkgh+/2kVOndvjMSkWCQnxSLnZCGgEH136qgno5oqy8sidBbklWDqhKVYOHszPp3wCCpUSsCv3y/D92MWGjyiKqUlYfCb16P9pfqC/NuPK61lCHp/AmDP9iyMzZyFiV8uxHtj7kWj5tVRUuzFm09PxKY1+0A9RJeKE6IaK1ObYWkWzYtFMmPExX9tQ5/+rS33W7apjU/G3o/xXy7EWnYOjirtguP4VmSKP6evxcPPXClUgWLd8t34/aeV2LvjGGLjotDtyhbof0t7jWm+5qZ2WDB7s5axpbbsHdi9ovxTxSAKVMUWt2Xi7utscBJCULdBGt4cflvY3mDNWtVEckqcbovE+1IBiF9mthjQ6ktkGUkV45HK2rRt40GU5JWqYfMB49w0jW9RA0QA0ICitsXm/WlzjxuZmlXOlEKRlaCnCIM6x4hp37EeNq3ao5ZkVsM6gRvL8vrKVK+/2W4NQIOGaVrEWo7KVZLw2TcPYvvWw9iwdh9AgRZtaqFF65oaA7tkQYa994rAJMk+BW8/PxlDP7kTXXo2CVJpFxcqyjWTcqZi9jNBbFwkWyA4oyLcpGBGYeFxsnt3qWL+ymlJ6NOvFf76c5NuLyKoPahfNp4rI+lKFX3BY3+YUa8TvY2Lj8bVNxqZlK0bD+oGfYA980UpPB4J9dkux8dtANhuSVtgRRsS3g6TCoKClUXUdlD+PDMw1p4LAQJg3swNuOuhHgCAPTuygh6WpigUe9mx7Yf3n0TOsXx9EeUMgt3OU2BQ7PLMPl6AL4b/gYZNq2LC5/MtabKP5eP1gT9g+P/uR+sOqmpy1dId1jIEAiVeLiosxWtPfIfxfzyLz96dgS3MRkLtU7UfKWdQ7OoeQqVBJIL1K3fbMikA0LhZdQz77G6cOlmI3NxifPvZPKxassM2LUdhfinkgAxPhAeUUrz/0k/4Z95WQ5p9uxdg2vfLMHTkAKxctB2zflkDEpBVhus0zDMAGGyhqMzsVwBVXcjGRFR0BG6+reNpicIjIjy4++Hu+GLEbD1/xXjOk7mqBSeLsHJxJjp1b4y//tioPuOV1TaxgH6coQymm1Xbwl6q6dwdynbLlMIq92Zj2VvqD/7+mTTDDEVRMP7zvwTVDQUzogkKwuumaFNbZbQ8krWdsoKc4/YMEiEETVvUQFOHcA+H92ebpDu6xMYwdxQFI9+dgU7dGwW17yqXOFOVjavu+f+L+o2qQrOTM+tn+URxsCOBojBxsPoJlPpVtQmAp1+5Bp27NVLTavp3Cshq5FbNJc28IzFZ3fM4BVoS6LejoiIw4n/3I9kUQr6Ye3GInkXmugOQfTKKCtXD1pKS1DyIQAT0AvXdHGQhAqRBX0rsd2WhVC4CCCE4djhX+x1nE6TMDC6+37/nODNo5jp3VvkyLA6KrGDZwgx8N9o5/gylFF8wOyRKqeo5I5ZjNjgW81coCgpKMG3iciye
u8XoAcbA41JYOjWMplCFhnUSbmrFBNStXwUVKllPwzYjLj4aHkaQv/lkroVB4XUrzC/Biw99g6nj/kFRfglIQAYJOKg6wnktVD1sE14/4PWD+GXwSML+Yh/++59fcEPnt/H2sxOxZe2+sDY8LVrVQCR/MWy8G2yvTCAS0YyJT54oYFI6Ni9lHhUWQfNQM2K2WRS6SlmcT/y+AkNsFQ8h6N67KQpyS0IyqKU2cUf+/HmNGlyOwriAcIhrmdmWhlJDkap0SVHVVwGZ/VVV007xl0LB7zPZzlFq35eU4tTxfMz6JXhsoPII7YTzM/icDs7nUTNlhcukOCAuPhodu6rMBNdxE0VRP6JEAcaJK05mTaIgKxh891gU5BUjKjoCDw3sjR69m+pMhqw4ntGhi1Sto5Eb9VJZEQzbKHpe0UyThIjgJzZrTI8dWD02rNoLAKjbME1tEyPyejqjoaKTYWHQaJ1hnkYMAEkCw9WtT/OghpCSRNDjSjWWDvdg0BZRzuzZBZwKk9DTEIT+wN4TOHLgJEq4PY8JofaqS/7aauWfOFPI629uP78fwtYmKjJ84WlcXGhmMJ2pBf2+AH79YZlz+ZT1m6DS4Dtya1qHwvjcEkLD8yMBtDkqJPf7ZCybn4EXHvgao4fNDMqoBPwBvPLYBMilfhDmShzKOJQqFNs2HEBRYSlq1alknAsBTthZg0K8F0cQgmatVEmDxyNBIgQRTHrUtmM9PPhkb90mRJRqBmTApzJw8MsoKSy1ZP3dmAX6c2JdAUMfa3laGBXBAJpXF9CYK75GVnc4lTsUqvEYL3ZlC+BrwQ+jrdJNF0acz6Nmyopyre453xj40tXYumG/YxwJblMHYfcFGIkQ/35g73F8MnQa/KUBrF6yAzymhRoiGpbn1PwESQUgTFKjztog3SFAgybVbOtbsVKCapzJs5Httk8quCFe+0sbYN70tSxvFs5afN6j2gNQE4EIBZ2Bo1YbFRNkWUEv5pIIAFfdcAmmfrcUBXnFFl28JBHExkWj/80dAAAt2tZGXHw0iou8WrlUk/Qwl27tCARYY27YVj50Sw/uy0aVqsl6vnbSMQNh0Rpg731lJpjmocBVVU42GOzZE0dyQ9adY3fGEQNTYZffySw1ovPCPzexuBcOajTzX55Oe0a4ZmcHY+4viCrQ0Nzl75NXom6jdFx9SwfD9VPZhZg6bjF+n7JS27kTQkBZGIGQZzdRikWzNsHDY6KIdeOhCAjR7WccwI27uaSLF9uybS28N+puHDuahzm/r0fWkVwkJceiV9+WaNGmFk5yVQrzUIccUCUtWv3YePfJePe5SXh1xB2aSiT3VLHxXWiLmXOAMWpgZKAfHOUEAvQNEZ/GCfmninQGBcGZewIgN7sQy+ZvRaceTTTpXrnHv6zuOZ9HzZQVriQlCNKrp2LU94+hvem8mEpVkvDcmzegQeOq+sUgokjIFIpPwbJ527D6H13HL3psmJ8zq01UWw7hIEMHREZFONoctOlYz+ipEwRtO6lnDLW4pBYzP2HPyHp9NNG2QbJkgsMkNNBYhxDiYOV2v6I56jXUJUOJSbH46H/3oxIL9e2JkNRFHkByajw++PI+VKysHqYWHROJW+/vaqkToWo8G+JTxdL6gYPBJ73ZrdsJMbFRyDvFjDANgheq2jkpinZ4HXfr5BKe7KO5pvrCmI8oUaGmdHaqPP5dofD5wj9wze+XhbrZ9IsCyCxU/u7tR4Jn5rQLBtS+Fz1E7HVhtsy/o6TOzNQQ4Odx/xikKSeycjHo9s8xfeJylUER8jJLZTRi6Zf1T0D18Pnsrd8w5avF1vSyUAdbyZ0utXjnkztxxTWtERsXBUkiqF6rIp54vh/e+/weREVHomadSnj4qSvwn2G34ukh12geUhUqJyAi0qNLTwLCuDC4CwNL5mzF3b0/1FS5hpOkuaqKjU3HWaCtQWK/mBht8btC0aVH2U5kzxajLTuBUjZ/1Xq8/fQkDLj8Q/w5dXWZ
yvzXQc/C5wwQzlEz5xuuJCUEqteqiPc+vwfHj+bi8MEcxMZFoWHTavB4JBTmFmN35lH1cD3z4mPY4RPDHwvsJqJp90mhLjbULxt3y6a4Ky++cyPibaIslhR5sWDGOtPC7WxoJ7Ey9u86brhFAFXlAZWBIDbNNKSl1FY6YSY4+sFreluIRHDldW0waMg1ludr16uC8dOfwYrFO7BxzV5QULRoUxuX9mqCSJNK444HuyE3pwi/TVqhETsqeI+Iu0bKPRcc+qZt5/pYu3SntbFaBioT2Kh5NeTlCDtV7n2ixdcROs70Dr2FXpCoCBCJQBEOddTSKArg8eixLTgBjPToZ8YQ4SlWHgFQ30HKBgDHDp/CX9PXYf+uYzhy4CROZBdqdpzaDprofySJoH4TlVHfk3k0pNTFCZxRoQFWBoU6xvm4EcZsOJIN0a2ZAurEkQiOHMzBzq2H4S31Iz4hBuM+m4vck0XqWVlOjA4hRps0w32n79Rqe6RQUC65o+ydcXd+AqxYkIFnX78efa9ti68/nIU924/im+EzMfvHFbj/2b7o1LMJcnMKsXnVXvj9Mhq1qI4adStDkiT0u6kdfp+8Un3HvNEOXZ5zvADP3T0WY6c/o18UVdSGSMYwjVM+V2WAe+tw6Ysm9RU6RFbvR0VF2lcG6phbtWg7vKUB1GuSjjad62uSnsioCO3YDYPUTHjHrNKGPHNPFmLk0OkoLvLi5hBHIVwsMJ9RF05Q03CPmjnfcJmUMFGlagqqmE7FveK6tvhu9AL4vAF2arLNxAZ0UQmfbTxdEGIoio5VBoVqbpk0wBY7j2kRlxUc3HUcuALwef3Yve0IZFlBnUbp+P3HFSjMZ3ppWVaJnEhU+ILMiGDALyMqKkJdMMTzOVhAOSoxbx2zS6QJhFLNQ0riRFeSVM8JUxdRwY20aauaeO3D2zS3Yzt4Ijzo1K0RatetBEkiSKtRwdblVJIkPPnS1diyajf2ZGQZ5Yd8ceb9HJBBIzxqGpnZHxECBUC/Wzvggad6487eH+knQ5tVOISgY7dGiI2LRmRkBOITYtSdK3MdthA7QZ0n9gf1BUAjPew4BIHgsnFBAwHVaE6sP38fVL9uxo13W88LopRi0ugF+OELo06fAgCzYyKGiyoUheK6OzsDAE6dKFTLDNe7wqzykRXDa1EUGfBQdYzLNuovp3wUY1/y6cZVm8/cNlpvhsf67ixQmB7FzIRw2D5jX1VCKeBXLHWkFJg5cQUO783G+qU7Dc/s25GFoU9MQFKFeBTllxoMn1t3qofn378FT7x0NVYtysTxQzlh7a737zyObev3G+pgaRszMhcZBFHQRX0y4CG6CticExuDVaunIsYmqFppiQ8j35iGv2dt1CS1ikKRVj0FQ0bciSata6Jjt0ZYyLymNDDnApX5DM6wjh85F31vaocEh0jL5QFn6+we89Evb775JoYOHRr02XCPmjnfcJmUM0BSShxefv9mvDPoB3WwSLpkQRt2ZhN4bZdFjbEPggxUkb8BqKomCjBxs2l9+HPqKkBRMP27pRpTEhnlAYmIYARO3Q7xo+gNZcuqt0VicqzmHdO4ZQ09tovm7sm8ZLS4LjSowphQCirL6NqnJfJOFSMuPhrdrmgOiQDffDpX9YxgzYhLiMadD/fAzfdeGtSd0O8L4Kev/saMH5Yjn8W2qJSejJsf6Ibr7u5ieTb3ZCH2bGOHOCrEdnHWrvkDhn4HVJ5lye8b0OOK5njshasw5oNZNg0lSEiMxlOvXQsAiIj04Lo7O2HK14tN0jabRZ3VQcsKjFHxSIhNioXPF0CFSgmIiPDg6P5sjUExL+CIcIrrA1x5wyW2B87NmrLKwqBodQjIqjutQMT5eLjqxkvQ9fKmatVFVZSZ6bVVFSkAkdQHFcDMJEiAeuxEQGY2SyS4JFKQXpqTGedPENgxKmaGL5gUkmr/Wa8R4lhH/t3MoIjIzykyzlVKsXH5bjx2zacYNv4hdO7RCDMmrgjJ
oHBM+Xox2nauj/XLd1nfj+YF59xnhKWDh2iEUlSlcQ/FB562hqanlOL9Zydh7ZKdmuCMP3viaB6GPPA1Rv08EJdd0RyfvT0DJcXMJtBOHRuEUQn4ZSyavRn9b+vomOa8w2l+nM7zAA4ePIikpCTtcigpyukcNXO+4TIpZ4ADu45hxItT9N0tl5A4SkfUP9rOhMd5EAepEKIagCpW5Z4QfKm1Ec/ySX7yaJ5KcATbBF+pAkRQw2pNhMXOjJQU/aC+uPhoVKySiOyj+cxjgeoicEL000yD7aDZDv+1D2+ziOsv798a+XnFOLz/JBISY1C1ZgUc2HkMG5fvRuWqKahRr7IlOzkg452nfsCaf3YYFsbsrDyMHTYT+3cdw9Nv3WgoaxuL+ksUBRTE0O/mnYwdAQGA4iIvXn90HCTOJHKJC2tj09Y1MeSD21FZOKfkjoe6Y9OqPdi26ZDOVHKKyYgOYc/b9Rn8AQQKSzBj3dvwRHiwY8shPH37aOvui8fa4WNKYBQ8HgnXD+iMhwb3tfalrODHLxdqZZrHLlEoUzESrZx6DdNw492Xovc1rbU+JkRYbM2MNzfqNIiKACiyPic0aYiiXdPUCyC2dTP2V3AmJCSDItY72P2gTIpNJajwJRjhD0aoRGmlQUpLUVJYisG3fAFPTIQ+rsKgebk5hXj5w9vw4DWf6KoTHjvIJEFxBIXK0Hs8xjWFoWnL6uhxVUvLY9s3HsTqxfbxdxSFwu+X8dPXi/H8+7fg5eG3YujTP6hq2tNgUABA8kg4cTQvVCvOLygcJW9hPw8gKSnJwKQ4JqcUTz31FKZNm4a///77Xz1qpqxwDWfLiDWLM/HEdZ+ipIjFwZAVJqUIwqAQqJ4w7BLhk5zrpnmcAR4jgV/jzA/7zY36RCJMCIEW0ZW7C5tjuyhCXQDrDpH9NqtM2nRqYMyXUsMZQsEMX1nl4JGIoz1BUnIcmraqiaP7T+LJaz/FwOtH4tUHvsYjV43AM7eMQsaG/Yb0i2ZtwurFmXByKZ09dTW2rNlnuJadlQdidrNEmMSLgcoKZJ8Mv9evx+go1T/bV+1BaZHR1TMmNgrDv3oAjzzXF8kVEgSmBFp/UZG4K4puSMrGRaA0gGuavYp3nvwOG5bvZvYNArEU1HQkoKhnxPC4IQEF381+Ho8+389WFbZn+1GcPJ4flEhyI2P4ZaRVSsAXU55En2vbaO9zzZIdOMQOldQ7K8SYYONbG7faM0IC/sfOHsSSHw2HNlvraIZNPBDtuqyodQ02x4O5VNucgXRa9QwSVTbgZbHjw+yEtOqpqFazIkb+8LgqdRTqaJHQ8Vgpdow0hR4Xha85bB3bvnY/dmyxHsOxcOaGoJGAFVnB339shCwr6NyzCW554LIyEXJFpkgNI9bP/yecz6NmygqXSSkDCvKK8e7TP0Dh1vtOi4fZfVhLx54LqBFmiV8G4QGpIBBOou5kteBQfAENtlAqpgWW6zzNZYvieei3oKjh3Wf/tBLP3/4FHuw9HNvX7tUWLiKUQYigNgkIq4hZSiPLiATw+ZvT8M6T32H029Oxa6tx8Vo6dwuGPjYeh/YYDXV3bjmMlwaMxbZ1+7RrMyevsB6GJ8DjkTBrijEcfa0GVdTqmOt2uqJW4TkCXafM++WLt3+zPBIVHYmb7+2Kd8fcq15QKPRwnTB5uDjXbdlfWzFp9F+sXHZRCxxoZCB5sDRCqaYOs4O3xBdcOsDrIcsggQByjuZi7s+r1SinAI7sz8bQxybo9TYzJ5RxyWZGgwqMFofZ2JyarivQ2mipn5mwhoCZUQyakHm8sMMJdMmXXT24SsLSttOonKUOJHQ9Kd8FaReCJr/z0Z4AgJTUONRtUEXPw8ygybJ1XAr3eamEqsbARDiQ0uOR8MePKyxlF+QVO24wOAJ+GT42xh4a3BfVaxtjrTi+AzGNRGwlOeUJ2vpxBp/TwZgxY5CXl4eePXuiatWq2mfKlCnnqIVnDlfdUwb8NW2dNoEs
BnXiQi8exBUIGJkLs6uoSLA4+DWFGpM5iaVlG0ZBuycDxKPWlxN4LblxYTqYeRQjF2ZoRnFECM+v1w3CIkHUPAOM8HJbFUoBvwyiKPD6ZMyeshKyosAjSfj9+2XofeMlePb9WwEAnw+dBmqzE6QKhQwFY96ZgVHTngYAHNpzwv78I62pCg6amJ2W7esaPAUooEokQB0lPBbwNgVhEjezc2/sUK9pNYHgUqghI4k+TsSXbMeoUApvkZeVbRpHdlBUrjNY8LkadSsLxo92eSiGcRXwBfDJK1Px3adzMPy7RzHzR/WdqnZHAHh4CtEehXe4iZAa+l0kOGaVE2HSJs0oGNAGCqVAIKAybRGekCoAA3iTCasLASRCIMsKatSuiGo1K2DbugMoPFWk1jqYBxAgvFvVUNxunppZCUM+wcoIxUtRyuYE60eP5Fham871UadhGr77dA5+HD1fVRV6PHr0WV6mqHoDn/LUMD6D9bYsK9i+8aDlelr1Cnwhc3w2ISlWM7glhKBB02o4si/bmEisi02/3f5wd6RULOeSlHAY5VDPn07yMynrPMFlUsqA7RsPGHflItiE4TRcXWDUyU7EAwtFghdKOgKBsDqtcnYiavPzlLLFk2VkMRKkiI3y4OSxfPZTzUc7XNDMSPGdu7Yec3WTUL6w0HHPBP53/vR1qJSegmaX1EFudqFj26lCsWvrYezfmYXaDdMRnxijndJsB0KI1aLfvKiKdj/m2CdOTKDT+1YL1Yiot9SvRbo1pyUEmtifytwlFSHfn0Hkz4PfhRHiHgpFjXqVHG+nVExAnUZp2Ls9C5YxYWJQRJzKLsQr930FypksChAP0YPzGdyAeP1Z3oQaCbmIYNIc0dhbqKPhEM5geZjApV9UVlC1TiUEAjKqVE1BQlwU1v2zA4e3H1WN4UU1Lh9HhFjpA5fmUOhnfon14vMJJgZNbKPddUKgWyYHaaesxnohiqLO2QiPMODVZ5pfUgdvj74HMycux4+f/8Vuqf3K26RLKcz141VRgjIHIiJtzgy68qZ2mPK/v+3bzxj25m1rwu+TtdOM7xt8BRb/ucmQ3MA0CYiOicQdj/bAHUxa5OLChqvuKQPycorsRY2mxUib7NzoS7AdsDxjB5PYXLWpcIjpIO6iHECYPYvTLj0iQkJpkVeNGyGWzxdXLv7VxPfsozhII0LtACnw2/gl1h2SA/j5Pb2ubRv8aHpK0bN/G8O1/ezAQS22CBUC0pkZhHCInEK1YF5qIDj9fJO8HCPDlXuyEGPf+Q23tHndoK4gUN8JNUtSxLqItkWAsGsXt73BcWD3iaD3u1zezH5MmMepYNStBGRkZ+WhuLCUU12NSBOF6meLyFwsDcTGRLB8YLVjUoIwXDxfrR6KroIRmSgx7L5p7Nq+UkUB/AGQgIxje44je382ti7fhZXzt8Hv9et1NdVFZNad1A6aPRlXl8hqWQjIVkNpPk+cXqd2nzrahqhl6muL6u4c0MpEQFHVyv4APB4Jk0b9ZVSd+QNGmxlb5pTqjJgWciH4BG9rCoSZl1MEf6kfN9xzqbH+vK5sPq2ctxUDLn0bqxZmAACq1aqEpFTjWWSAdY19cdgtmLz4Fdz5WK/wJaTnE+I4LevnIofLpJQBp47nG1c9vnj4hQWEUlWMLMYDCbZbtmV6YLNzDDY4Q0xKQtRFUhF0mcKiX6VKovGacEgijxcCwHRuD5wnTBgTqLTEh9wcZymKiOTUeADANXd0QnxiDCSb6K+SR0JajVT0NEXdnfDxHL0+AVPcDVFlonBGDM7vixNJMxgxEkNy55zIx+AbR2LGd0vVg9ZsnrMldJwp0K6biC/0S6FQUmR/rANHlz7N9PqL5WuEURHqoteNENUwWH//Qh4WIqygbsN0fDVzsOoVxNtnl5Y3jlLdowxgTIkM4vWBlHpBSr3WMcaZAc7cKQoioEoIYR7zwjhQBJWqgWmg4lhn9fEHDPOGKAqIP6AyTEJ9VCaY22vIOlPs
D+iSPJZvVJR+mrMt+BrD2+VAgInpu3jmGCjF1rX7sGphBnKz8/XEvK2cUXBaX4T+UfO1eX9ifal68jsA7N1+FG8+9A3u7DAUj181AtO//hs1a1dEcmoc83qz5lGUX4K3Hh+v2aN9/OMTtnOeM4tPDb0eva9ri9gwDiAtN1DOwucih8uklAHZR0/pOwkuEqdQv3t9OpEzLzh2u1UOp900XwjERUkksuZFPRgURbWN8fmAUi+Il31KvSCKgtjYaGExtKm7uEhyIsOr5fUZ22ArFdB3wOL95ApxWlh7WxAgvWYFNGxZHQBQoUoSPvjuUVROTwHAwuIzb4E6DdPw4YRHLQGk9mUe1etHYDT05X2j1ZHq7sJi/bW0wft5z7bD2vev35+JE0fzdOlUOGNCJOC8viK4EXVYoKhWS1f3+Lx+zJ26Ci/e9jke7Pkehtw1Ggd2ZKHZJbXVPuS7fp5/kLbSgIJErlYzSEWokI9O8G99tCdq1quCvje3F56Bzghx8PDzYoBEmY1dTkRDQRtrFAGvH6WFXtUby+cXYgzZjVFqYv4E1SxnmAIyUOo1fgIBQJGN0lK/X51rPp+1LLE9ARm+Er8qHeTriGU8CPYh4eyiefnih83rE1l5AhNmfo46Mg3afc5oAfpaZLe7l2VEx0Ri15ZDePbmz7B60XZQWVb7SVZwaOdRFJ4scFwrqKJG6Z3E1FLVa1fCV388h6ZtaxmSV66ajP98djeuvq2Tc3+4uGDh2qSUAYpCmSGsJMQWoPquy+cHSEAlLhLbVTvtOjjx599Z0DXNoDIQYNFlPerCJgnSDAJ7vb4dxHK0hgjP+QNo2bEO9myzugxqaXm9ePk8X07QuGhZktS6KVT97uTCyHT2/xs6Tf0e4TAcKfDwS/0NAdrqNkrHN3NewNolO5Cx/gCIRNCmc3207FDXIubNOZ6P7Kw8neAB0A4X5G3gxMjcX+ZgpyEYFAD4Y+JydOjZFPmnirD4jw2MQWFlcH0+t03iZbHbFgZFUfS4J1pdqU7MeJ84qNtiE6JRke1mC/KKMeTO0diTcUSjpVkHTmLjsp2IiYsGIZKWFVUQFiPkLTFJacQxIvyOjY9Gl96qxKZhixq6sS6X1nFw41iAjXFeGWKcQ7y/7Mo014dCYBJZ/4WtCRDem5afqV80g17GGNgxEDb2X1oaQuCJ9KBRq5omezfOEPj1dYRDDLJnRsCBkWNtSOUSUyc4GPVrMEmStAjWmkG+oq0PdZtUxaevTIW32CT1ogqoX4HM1whenmmNojKwdlEGCvNLkJAUi2q1K+HjSU8g50QBsg7lICEpFjXrVb4wVDs2OFsRZy9muExKGZBcIR6lBcxwMxjzQQHj4TYUWmArnkZ0K+a7H07ANIkMBRS2c6GCMRzfYYYC5UxV8ImcuX6/cTEWiWUEN4a0WWwt6Vm9eURdceHRiDF0gg3oDJnHYzDKS0yJwxOvX4eufa1nS3g8Ejr2aIKOPZoEbde0bxdBkQPC+yB6/3qEM0jsCAknaoDKlIaBnZtUj4Yj+7MhixIb3g+BABAZKTAeQnn8u6hSIkQ/QoFCrT8Vxo2dESN7Lzc/2I1lp+CFW0bhALPN4YSQv75STkQ8HmZXzfMLTlyPHcxB9fpVcHj3cbW+ERHqGOZ1l1TidfNDvbV65GYXICYuEiWFXiMjpCjCmUQCA0ah7r7NY5JCJ9ZlgeV926TRjER5Gqf5Jsxhyy0T82tjkCwHZLTqVBd1m1TFrCkrNZsrRXMRN809Ls2QJOvxGCEI11dvTQvO2HHYeSlpZcDYVjuGHsDebUewe/NBaB1oZvJNkhftmghZwR8/LMXtT/bBmkUZ+Prd33BgxzFQSiFJBJd0b4yHXrsedcQDXy8UhPG+Qj5/kcNlUsqAhi1q4Ni+E2w3roUctU9stvInUBkVUddtZ99ChR0GoKtWzAsGIcZnIiLsrf7DwI6NB/QFyDL4OSG0WSzNxEOssyLrTInIXInPi4yKqb8+/P7R
M1585v28WlDvEIGYsIp4TESOM3WWNskqcxECkdFqmuiYKD0/gy8n0TwxDFAUlaibbV4UBfBECAytZHrGGGFWV7soaNqmNgDgv89NwoGdWSaJjKnNnBkmRN8Vi+kdbHEuu7Ilpoyaw5hmn/EmI15TP5+LxIRo/DxmAU4cOcXGqSkjTni1/jIRX7sNgWwK20+pQV3liYqEbDc3RSmBKA0wjG2TysuOIIj5mCVihBjvAc6qOkoxa8ISfP7ni6jToDJ+/2E58k8VIa+4FJq7uuSxfw8ioxrEoJzjxJHc8BhuzohIpvEGhC2J+ufPjdAkQraB7gQGRVRpiX0Giokj58Bb7MOPn801PK4oFGv+3o51/+zA2xMeRbvuwTcsLi48uDYpZUDTNkwnGg4XrEkaRJsE0zNiwDb+gZHYWDwt+MfnV+1BuKeJaBMTbh15NbTotnYEjNVblCxo94Lkb2CqTPec3K4Fu5XfJ/wTVt2DodAczExkGrj0KhDQg6L5/PZtIkRluoJA8kia1Kd2ozRUqZ7KyuQfgeBpEgdqvO5oD6TYvxvxHv/Lrvn9MvZnHsWCX9cEV4mY26upMKixDBv8PHquI4/O8/cW+TDmPz/jxOEcVp5NP1qkbuI4s7nGwW0++PgPBLQxJJd67ZvN8+F2IRZbFDGtYvwr5mGaDtaNBLuuKMa/NijIK8aD3d7B6Nd+xsGMw8g7lqerWfmzvG3meoRtoyQ8E+yeKDHh3mtm1VwYKC5kEZgNNlbCC5HFcetcP78NgyJCkRW8++i3qkTwQoKhr8v4ucjhMikhoCgK1izchvcf/xaDr/0v3nn4a0SznXLYg4Qv8KIhIN+laEaoijENh0hADM9T3e1VrAul6kJdXIL7n70Cdz55uTEvBxAee8OcRCSevA5mkbs5vdgvhFgJYIi6iMg9GZ7nTzCkVkkS2qH9ZwQ3evb77TPRpD3OiwMhBBEREq65pysA9fTlAU9foZbHI2XZSU8URSeATkQgWKA+no/gjcXTNGxeHTN/MJ1yKjIzYpt4O8V3zdOL48D0kf1hEC6xznxMOI0BTrB4GWJcIdu8oTKW/N2Z6kdFNQI3HOdMYkDWmWVDnyi6caiWJ6y/OcMrEnQbtRuA4ASeMVmyLyBeNI7bUIyIXR9x1RI3FmaSnMhoB3sWrS42kr4y0EOPBCuDYi5XFhgUO+Jrx5iLayb7XVrsw6IZ60+/kucTLpMSEq66Jwh8Xj/effQbrJ6/DZJHgiIrkDwEy2ZvEk4E5ouSHYUXQKGLSLXF3/QbEBYG9oDBZoUCkvDd7BFkGrDj35uBlLQUm7yhp6dqOZRSJKfGIS+70F4lIMv6oXy8zeKC4zRpPJLQcJv2hkDtRulhpzVDlhX89Pk8FGQX6P2vMLG+YbdLdSYgHAM8WciDPUMIEBMXide/fABVa+khvK+8tSO2rd6DOZOWMbE5tRIxWQluV2FgsABACaueaTUqoEJaMlbO3Wp8N/xkYZ63mUGhwtgSCbcT4Qh2uKQ5LQdnCoikDw+zfZVZVWqXrx1zZYY/AEgmKaE5poqZOQswZoGrWXk/eAQ7Ga0e0CUoooTOIE0JxmQJzJjhuvjdJGkQn+XtF+vC62BrA6IefXH0wClnSSYfAxabFPZfGOMvMioCJw7nBmc4AJ0hMreFwzxf7CRJLN3OzQfR947OIetWbsCE5mf0/EUOV5ISBN++PwNrFqjBhLh3gCILCxsHD3DGoe3YmMujGABNTAMYdgLadXGXZo6JIIiyDfk4LIC5x/OsafguUQv0FEBUBEHzDvXsO0IkXPy7KPo178zNz9oh1O4YAAhw6+O9ne8D8JX6UVrss1ynlOK/z3yP7z+aCS8/9I/v0M11DWV8LNr9cHdOr1eTWBACPPbGDfh+6eto27Wh5fGOzKvFlhDx9yiezSS+L1HCJu7Sw9hBDXznJlBKcSo7nz0D4zsT6yK+Dz7ueDqnna2FeQoThudCjAEuIbK7bifpcUonEl6zUbII
WdYZFAAG2zHDXwf1K+9jMxOstTcIzH1s+G2T1twWMQ8n9SDD0T0noEX+tStXtNUR5zz3HApj/D313i04tj9EoEZzXsGYH5FBMbeftaM0REwgFxceXEmKAwrzijHr+yWwPeuAUpWDFY3UFGHxMgcLC8iC14pNXoBR5Cw+Z94hUqoSSO7GKubhBPPuylw/SuEr9mHZzPVWKYMIM8NkZlxs3WChLoYUNiJ16NIZfk9QYaXVroSsA9mo27S6Jdu5k5dj/LDfcIqF8I9PisV1D/fC3S9cDUmSsGHJDiyctkaou0DgNXfq8BZbx7gxfj8gEbTs1ATX33eZ4+OG+tvtfPl10R3Tyf7ETnJg+h4R6cHT79+KDj2boTCvGAGfELeHZ6mw9vv9RqmQyMhyg1zed2bJw5m4fWrvnOpeYHb3xTpxOEl3zM+LEj/+nF06URJkqQeMUhcF0DyrPKZ0vHN5FgHmUSZJ+o7ZrrpOEgQRoicaJ8qSybBYSyswtEHskOLjolBU5DNJg01wUkHbGe+za1HREXhtzP3IPpwTfvuc1g9zGl4nOygKNi3bETyPcgbXBTk0XCbFAVtW7obfZ7ODEz1vzIuA066PS1bEOCBmgsOv8b9OO15xwbATcYu7DUAgNHyHpRgXFvOEFz1FzPU0lyPuvoMusIrV1VckHISo/WNino7tz8bAy9/HS6PvR88bOwAAfKU+PN7zXRzlofRZ3YryivHjx7Owcs4mjJo3BH9OXKqp6LR05pgyYtwZc7s4zDtVMxRqOaHVjPSaFQzpNQNchVEuM+MXrD/tXG5Z+ladG+DSvi1x+Y3tkJishhAnZgJs3mlzcGaEEz/+XoKNN/MYDqX2MYO3k8f8MecrCaogW8kEdS5T3HGLzL/TWOZtcLrHwaUs3FVaHFdiLCFC9E0Gv09hZf5kUxqnOvCPHYMi/hXzDwZKUXSyQI/BpMU2UhCfFI2ifK++bnHwOcznlLkOioL4pFiMmfsyCnIK8OY9XwavQyiYGSTe5+J9cUxIEk4cPInDe46jer0qZ1b2v4UwpVJBn7/I4TIpDti2Zo/1ojhhZNkaSMtppyYOJHFhchL98+/mBVjMy25wm9UF/JqHnX4snsRsR3zFhY4zSlwCZE4TLoNiXkQtdab2jJ32OMUHT4xD47Z1kFIlCbc2fh6y31m1tGfLQXz77nQc2XPCGOVVsVlYNQIZRv2DYMWcTXj6wzsBqEfMr5izEdvX7oUkSbikZ1O0vqwxpAgJCneD9gnqKb7wi4ttOIyKRJCQmoBGrWuhWfu6uO3J3oiMtE7n+MQY1GteHXu2HjYSVbEPnNorxtYxMwhmlYZTfmaIbSPE6jESGWkch5yBlSTrfHCyYwgmXRHz1oiuycbI/IwsqxsM83zh9TYzgibJghb/RyT44vN8Pppd4c3t4ePYzh5MXC9CMSri5kSMwcRQdNKP6KQ4ePOLrc/yOgSgSkF5yAOW5z3PXgUCisH9PrKuYaHGm/guxQ2f2Kf8t916wvpm5oR/8NhbN9u33cUFB5dJcUDW/mzAHEBJNCYkRBeViwHIOCi17tQNKgfFtMuiuujbDPMkFfMzi3ydJCvi7lS8pqUT8jQzY3aRYMMhpmLdRSM8se3iwu2kwyfA/978BRuXZTozKAKmf70AtRvX0OvP03NiIUpUxEXfXH6YjNip4/koLfZif+ZRvH3fl8jJylUlGJRi6qg5qNmoKnrd2A7zp662eg8pilXCZsegmqFQREVIeO/7x1k2CjYuycTxQzlIrpiAtj2aIjJKzffWx3vjg6e+08ev2c1XlJ7wOpmlJOJ1J8mS+F7EvuQMPWdKKFXbbLBrYs/xdyQ+K44ZkWlyGgeivYmZoPFrovcKbxefx2JZfHxyho3nwW0zeJu008WF/MT8xblvZsrNBDeYNCXYb7tn7PIKw43Yb7btEKU4fFmUFUtcnKL8Ynz+8mT4fab+cqqT+d3wcS/GTeHqwCA2NmJ+cyYuvXCYFK52PZPn
L3K4TIoDPB6+6JgmlVmMzBdacfFUmNujRHQWR1wwzTsCjaGxkZyYF1JzfnyBNOjpYRSf+/3qBcLyFHfIFKDqfyrYQWhiHajCDkkUd3B27bFZELlND2ELOz/mnYjMkJlBsHgkACtmb7TkbQtKIZcEsGfTfm13rB0tL0ixCGMMqKKodfN4QP1+NZ3Ho16zY/occPJoLobc/Il6RgyFLsUBcDDzKHKz85FcMR55R09ZF2m2kze/tlDMUXIF9cDFlXM34YuXfsTxQye1e4mp8Xj4zVvQ9+6u6HH9Jdi99RB+HjVHzxcQ+kQBjYrUNSv8fStUiHRr7AvDezXv3M1jQmSE+PvVDDCNbaL8lGBJYuoS4aYogRQJbTAJBPecYvWhlNkAUBjnLZ8XYp7B4g2x9lDZdBJzhMc+fTCYmR3AngkzjwcxvXldcZLMUGodZzZQZNmoLgUE5s/5uU1LMrF15W5jWVwSZpYA+QPQ1lfOJBo2VcHraDk9nOVfwg3lLwSczjhxev4ih+vd44BqdSsbGBKqKGxhNu26OYSdORXFuOY05kEly6CyAgr1MC0LzAuPDah5MTXQQDX0OSVqfagsq1myBY9SBYRb9LEytBoqbJFRFO3QMSqUQ9liQtkzZiNjyk9Q5v3CpAjEkIcxT8edAbseakpSxnRRWWFtpaaFj7WRqowkgbrYUe6tw4gnv09DMAqgFJFREZj1/RKUFnpBWT2J8KGUouBkEeJNhx6K/URZYDdDX4RAn1s7Ye2CrXjzri8MDAoAFJwqwieDJ2DWd/+AEIL7X77GkC/VO0sdd6WlrM0KiEK1dohePlToO4s4nv81j1GRkIhnyoinDoOPU/WkbcrnEB9vJqaOyrI29hwZf7HufJxqeajzwa6HeT1sz5oS6gAIzJzD/bBhqLOij0ObuDda3/gDxrnDnxXzZL/FeUkDgbCNLalYp3DaRSm2rdqNgF9W3xF/34qibVAoXzd8fmu+fOOiKPq74xsrhWr14XMTdmNAUUDDOSrExQUDl0lxgHYgFl8QS72wRFs1QZ2EAPjunC2MVCS84qLP0hOwbAVmSFuEtcytuyptsmpBrKAt6JRSUOG0VwJoCz/PTmVuCJvwfKHUpT+UCV/4AkF9fp1YAaxvqO6ay8vl9ff71X6QZVA/83QQNkhU6E8ajEHh3WOW8lgSUdaXxkVLY354cRRWF2oRPp/qQSUeFicu9JxQymrA9chICfN/WgGqMCLL3yNLw585sue4heExEHx2aq7GQNjVjV1LSo3HFbd3wmfPTwSCjMnPX5wIb4kPnggP4pNjtb7UF36BIPC6mtQBOuEWGBTO4PBEBkmeiXBIEiRTfBF1pNk1jep1Y8TK4mFHqVFCLtbXrLISyuTP8jK0scJvicTRfI8zOdop4LB9NzQgG9M75GXbZpvrWp0YQw9F0ZzlVJWLrOdHqWEzQWV9YwBA2zSEDXGtCEetQAj83gCTzLJ+8unrgnYqtFhnMxRFl5BAWCvMjJy4ebBIYhUUnCoKv53nFaa5crqfskTYu8DgMikOWPr7en3X6PXZLiIcVBEYAkVV84gTDVSceCbJAfgiLExAFmOFyuw58w6JQ+ZMCAwDlgqLFaCK5CmLw6GfFmpd+MX6aPVlRFdbGLw+lVh52eFwVNFUBij16rFXOFNiyBu2xCGUrpmanndOZ5OWCgTFwLwIfRUkT7FMqigqo+bz6xE8fX74S3zqokipuns07/I4cTAt9JzwW8rni7j5vbN0nggJH894DscP5uDYweyg65QiK5g2dj4AoMf17dlFRX+GByszlaEXq+jh+8WYIaKUQnzWpi8ffO06m36mRgIEqo9NlSpqBEc84VbrM3N9WR9Ts3s9TwcIc8La7+Y5o9dLYCrEe6bf/BoB9LztmBFWVwORNr1nKsugXp861zQir5dHef/wZ2XFUB9NEuT3q/OwpBS0uETN7zQih0VEsvOVHA9VtEdiSpyF8TXMSbsxb07DoR1XYJqvTtJN
oorIpo3967TqfN5wJgxKOGvXRQCXSXFAYV4xDIGZANtFh7JgaIYdOz93RySIMpeQ8O88E1gHG1+EfT6DyJbqBatMhBYlFTrR5aoZUz05YdLEpOKZGSI4UyQyP2LdKGVES1goxG6Rhf7g9dYWFgBU0a/xB+0Ii6EBCgQSECyh/a7a8C64EaPQnqBZipKhgG16f6kPsnhEgTlfjXgIO0ROSDTRAtX7XRSFC+J+QoDoaA9GTH8W1etVwY71e0N3C6WY/b16/tHV93YzMpwcRNB9KOY6CvY8FFZRvKYGZX3l94MWl4AWFqmfkhLUbpiOPrd0FJqqzgmB9dCva8y6kKcgjTL2MXTJoT+gEnaHXbomRTQd6aCNRT4GRRUJvydIGvXxbEe4qa72FPpYm/dUkAqZytHayt+/ORgb7xseaZrPTeG31jIe/t4OYhscQClFbHwUXhr9gNq9FIA5qJ0DYhOiUZhTpKclpg0b+y4ypIb6CP1BGaOq2T9RWNcr+xZg5rd/h5HOxYUAl0lxQPV6VWAbmE2co2JoertEpgVII3JijApx4Q0EjIsLpYDXB1pSajynBzBGglRMNjOGqlDjbkY8M4UKacQ2BfS8LWonS1ND3BOZPPPuSrwWkIXqsPYEAsxWQ+Do+OIcCiKD6LTroFQlCOI7cvK8CLI4Ugrj7pCXJdoT8B2vYRcEndHjhIXn4fODen2MaPlAS0ogeb1o3qYWAl51d31o93Hj6+ZMhemd5Z0sAAAU5RdBOycllP0Lrz8R7wtEXhy/XPVVUqoTWK0PFLxxyyeY+/0/ghpJZ0ZoQJe4GSQpImSFSag4Y071HChTQ4ptNttqiBDVJ7y//H7j+w0I80WWdeZMZHJFSZxQT8MzbBwTQjSGzsCQcNsMlg+VFaPdDk/HmXQz4ysy27xMM8SxKLbPvDaZ5kedxlVx2TVtcelVrQSmAcHtswDcMfgqozTLnFwrkuXJNm3EVg3KmUGe1qSOcwIhKMwrCZ2uPICrD8/kc5HDZVIc0OPG9qodBQfbyVh2WHYwECITAZSFHZth0dMXT+2+mB9XAfkD+gTmO1xA2O3ZqwiM7Qhy3wy7HVmoZ8Q2iAsWJ8xiGvEv10fz0POCTYeehQOjQqmwGxaYAFMbRZsZw2LKD2EzN43vbkMtBqZ22PazuMiKRN5JksQNDhljEfDL2LBoG17oNxx/fLMQcYnRet7+AKjPr0oyGIPDmWKPxwM5IOPjJ8bZM2TmEPiAcWxrHmY2Y5rX0+e39LUGSdIkbJqkjzIGRVBnWphssRyfX5csUgBUYYI5TqwFuwQueTFUwsZbi5dtHuOsPy2ZmPvOrr0iI8aYIP7+tDI5eHu47YZsZVA0RtvSIFO5CmPezJJfrgYzvzNZsH0TcqTCmPP7/FgyfQ1rC2CUgpq7TJX09bqpk7WvOOwiaLM03EhdM1ZXeP/CeaPB55Q/oKu22DsNwUuVH1DlzD8XOVwmxQGFOabTdw0EFvaLMWNkDIRYVAUBxr8G9z7G/JjTADpRZV4gtLhE3UuKu3RxB8ee1VxvxTzDgUZ8bAxLw2ByqHk3aGH2HCAQNMa1lQAAMERJREFUMl0yBJP+X1+wlUAASkkplOISKMUlgohcsWdO+H1GmLhnj+WwRs7I+FSCTwIB24XZ0CbRYNMMkYBqRBba4nw6kAPq86Oe/x4VqySrefj8RuLEq8XGS8O2tTBv0lJk7Tuu19dkaGghelom0Meh2F5AH7/BPGHE9BAYE0BTFVG/n79k++fM3mvsr7ZRsCN8jEGzrZWNGlAkvtp8M8e0Eec1Zx6YYT31+411ENOKaj0Otqkw2Fg4McJc2qT1M2caTOVo54RBJ/zsNgGMjBVnjsW2Cwxk1v4TuLPBYJ3BYsSQcm9E09yCQtG1f2vM/3Gpsa18/NisgbxMzSi71MvaoFjTU5P3I69/gHvwsfyYRPLCYVLomX8ucrhxUhywbeUu9Ys4oQIBIDIC6q7M5iFFASAJJ8xC3ekR
k9TTNPm0y4piCDRGAXv1AyF6bAZCdLMETvQ0GwLhN6Ug5tD0djDUTdh9SpK+oEqSwSOJ1wMAqM8HwtIaSuAME2f2mDGvUHu9D0z1I8IzatUYYRP6hghxOigjeFyd7sQkalIxricX6yDYKFBAbT9gL5aWZT2uinhdaAcRdvmUXaf8uTLA45GwY91e4zuwAwWSU+MxcvB4VfXgEG2WyopxfJjyoIJdkOGN2UmfLHUQ5hCHyTiYctWFrIBE6PFtnFx8KYV6srEsG0P/ixAlKsQ0xlhMDiIwOyQyEpoXCk8rUes74u1h9eMMChGi4lLA5IGkaOsCl9ASojn/q/WA0T1f+woAATXWkTbvzPWhugu3tkbY9Ym4xhBiLylVFOQdy4Mc0PuWBmTAQzUGggKgfJ1jzFVKagLWLtwKS8wnCoOxM6VUlSKxepDISH1jw70B2YnhvO+5TQ8koqm3uVE1b6duZkaNsWtcXNBwJSkOIOZFDVAnCbcrcRKz2RBDc1wCg4RDe4xN0EBAF7uGYSQm2oxwUa3B5gUwiNRZ45zzsxAoYafFYiwYGS7oDJFAfCHs4izgwe78fmNMA7udgdb30O0AAkYGhXsgmWNnUMPz1PhuTB9DmwKChIrDhqHQCJpo2Mn/Cu/FQuTsdt6nCTmgYOeG/YhPiA6ZduHkZVD8pt2pWlH9fZvbqyWhoD6frm7k18V+DZfREvrZ4oXl108Mp16f7m5urhN/RpZVN3F+OYyytWfF+SAyrYGANSKw3w+LPIaPLYHRBWA08DWHwIfKCHLjXpEhJsJfW5jVpnZtM9fPZEBMTe891DuTRVdqXl+zXZzJJuLPbxdCZupoJ+8lTZ3LGEMAGoPC3fe1dyXrkh7RVkdkULQ6Cq7qBIDf678w3JBdm5SQcJkUB7Tt2QxE2ykIBAdQ9f9isDARwuQyEEtTGoNo2Uww7YiXeZExfeeLpjj5DW6w5rKcwJkSQFe5CLtLg4eFuNiJiw//7dAGdSclBHoSF3lhN6Xtsnk/UzYpRcNGs+2O+NfUfu15akpvhl3duVjfvPiaGVAbYm+rKqLUIn0J+YwJEZEelBTaGAjKCjT39EBAX8jMnhasHto1m/ZoY1n0BguTgba0R3z/4nuxI5hB1GCqZ5vRONX2/QhlU0VBlRoVdfUaJ4Bim81qJV6O16fNHa6KtIwtzqgL9wz2HrJqAG2YiyytIT8tvU28pCBqNQODzOvPyzEzTOK8s1tLzC70ZobOAX5fADHx0YY+syBgo/oVGA5DmVRQ0bJ7vD+0eEQ85AEb79QfYOuIIA0vzzCt/WX6XORwmRQHXHnPZYiOjbJ4t2gTj+/8FBtjMrvJbV5ERI8buwEnLjZcZSPcM9hYcKJkhg3BMexW7KDZbDgYBop1AjR1k4XwiG3kdebpuI5eE+WaiIvYNzLTMbO8je6MApNkJhhmA0IxABd/NyY3SMe+40SeGTpq71tUC9gt+iwf445PCc8Oxa5OhtsUTTvWh+yXje0OBKAZONrJF0QCxZ/h9RZD2IsElz9HiBAzxegSb64b98yifr/eDjsDULs2CgbktoaaZsaUG6SLnjKUgnq9oCUlQGkp4PUia9dh03u1Yc742BEJNZfaMOJr6UcxT9FTSIiyqhp1mvrZxFDw79Rntm+h2q7ZdkzY5cc//gBsmQynua0okCTBPoa/hzCYUkKAw7uy9GdNjIqRwTfNR16Gouj9xPteqD9h/a7Zn1AKiyEp20AW2R2Q6OKCg8ukOCC5YiJufvIKS9RRHrZbMzb0+bQ0XAIBQDcu4xMQsC5KfKEz77zFRcW8e+LXRI8Tu8XS7GbICYudGJoRJMXnMzE+wiLOyjbseGRZVQV4WXResZoKcyEWFiAtAJxZEmLeuVE9D4AAihBiW5QSAdbF1InR0/6yD1/4AYu6xiBCFb8zdZwWkZbVQ3u3DkwKlWUofj+oz6d+LEaWRuJjsNWwIST82oGMQ0I99YCBFLqNEv/N+5eIjIidxEfs
K7P3j5aZ+pf6/RrjqfUsU+NZduJie6gekdnCZJqJlvndCO2C+L4URqBKS0FLSlTjclmNiMw/FtG4yJxQm2vmtD4e2dmG2AsE1cJwMwkK5QavnEE2tUOTDPC2W9qqE2lLf7HrBpuWgAxKhSjBdnU3MWUeD0HVOpWNrufiewoCQgiy9grG2YAuhRITUsqcrYTjDzSmAxAD+ZnrrI3RgAyVGdfjT3GVD6Vqm4ovBCZFZNTK9DnfDTj3cJkUBxQXlGDqxzONQb80Dj4Agyks32mxw9FoQNb1+DwOgJi5sPgargHq4m8XaE1kNjiY1Z0qgjZPZEXPyyyeDgRUBoLv6gMB9aPZ2wh1Mi3Ylp0tZ3oE3b0WaI5S1b5AUdT+4Bb8ALhnhLZAB4y6bJ3gsoUnEIBSWqpLMmRdIiGGn7clMHwhEGc0I2raAs7aQUXCTKm1vTxLRZcMEaGfbNNqdkKsb9l74btsxaQGMOzwhTwNdZFl7N92WP3OGWfK3z1/hliYH0MeNsRW40E44RLGksgUUBZDiLs9a15swrjWjg4QI7mKRMm8QLOgciqdEsateOQCV7OyOlvax/tYs5oWygCMbud2EpRwYGZ4hLVBm7u8r5n3icks1lFNycc74Uat2nyk2ngxq5QUr1frY6ooUChPqxjK05gFoT+MdaD4ae9nKpNirhvvX7v5pRVBAVl4lxw+v1EVTLXioMXt0TYugorM3Mfidz53TdNa34NQzJ2w2LGu5QZnxKDYvMeLEC6T4oC/pyyHt8SrTQCNeGjiaztiSKEE2GFxoLokgeoLjKjTF6Ua+lpEdUmGDRFRFzL1YECjPYtiFIMCoIrMrtsQUEqNdh+BAPNCEgyG+YJI7BdXxeez5snTcCLBGBVbjwTzMxojJSzOpj5TuDQG0HejXKpky6Dwd2VHrKETZk4MZUUrXwvEJT6jKFB86vs1xw3h0h4D4VRMbRHrFWDhzxUKpdQLhRlVGvqG7855PgIxlTwSIqOEE7AN+av/EUJ0GydKDaomOxUXVRQopaWqqoQKWYnVpwKR5vUVGD0AwiGbqpcGNWdmkV4J75+p+AxqQL9fD+7H1H9UHC88b1viK0DoS1siZ9cvakOE+4rWJio+o7DMDCoeJsEy5KDXS3yeB7YzR2PlaVXjVbWPFVlW31NJqR7PSWEG2WxMAdDPEOPvyDQGtTGtKEhIikZ8UhyqN0gzjjk+bgLO3miUUsTGs7g9dmkMjIhQF+6uRPV3bWDUhb4y9Ju2YAr9R43jbOeGvbZ1dXFhwXVBdsD+7YcREeFRT/RUrBERHcEZFIUCkkDyeXhxj8AX8onIjyinfPIpADEes05lWXf91S6yicrXNEVhREG4z21QwHb85nYIbsFEUtPqi6QCKASKcI0w92aFERXNRZG1V4urEKS7KKXMkZX95rtworpv60TCyFjoBE7YbWntMLbZcWHj7osASKS6cze6fRolDVSi6sIvEfU9BQIgvI81ZgkC4VZAIiMNZfO6G+yBCAAiaWXxftXE/BEe/brDjqlZpwaIiI7EnHELhfKEXmHjiRCBEfN4dLd0wU1UF6MLMW6oLpmgVGEutESXmFFVHSeZvZ4Eg3JCqNpfIvGiUPOi7KBAIQaH4d2x8WqQLLD+1iQmXN2mCP0qdAGrhLHjAjKoRxKGDJsfAAzj3wlso0DYGV3auGLtIMJGQUtP1eFNqeqaTCWitU/x+QDC6kPYOANncBXVHZfqqjrK1aswesOIr573hcIkeESStPFJtTmrqAwOYxjzSkox4qExqNm0BnucyX+EMUgDMkiER+snkaGL8BB4Ij1qLB8nKFztJcxtWVGdFKgCSnm/snWQevSQDrxdhldhYmh4F0gSZK9VtV3uoCgAgvRXWM9f3HAlKQ6IiYs27v4cOPhgICzKpWbCaF68ONhEI4AWP0InBFS1Z/AH2O6WagSRitIDviNUBAkLz5sKTJZBjE9txNzG3Qz1M9dT8TRTXk9LOxzsSux27Ibf
/LKit4NfF9Rghv5ihMXQfnNEVAEKpfqBjdpFe3WWWm/u1cJ3qVTdGRMC85kiWp1Yew2uqEK9DMHkqE4AbONwBImhwt+v7A/g7ldu0Ntv024xP028zuvG6qq9IxvvEW2Xzg0SeV8wBsU8GYwqF3P9jf1FhbFKWf/oEhgT4yOqLyhUN3StLmLK4ARNfT8BPVCbsKvXp4xdP4rzghNQ/k6pdeyKYPcNzLcYGVhRADmgrRPa/FUEKRXrA8Xv11WhwlgXv+udRHUpKje6Zd5xij+gql5NNmpzJ/yNSe/9AnBmXRyvVGXYlVKv5k0FhUmMS73IO56Hdr2bO49Dsf9E6Yn2V2iLWGeRadXaCqNht6XPnaPjlisITGaZPxc5XCbFAZfd0EHYEQhcf1DOVZ18VFsYGMFyOvWTg08+di4JX5gUygzpOEFmaUWXUcpPITbsbnk2+iS3TFZR589rzye/efCzBZN6hcXJvIPhBJlSyyKqi3EV0IAf1M8NCbVOBVX46cjW/jUfE091xbOWXu8Tu8lLbBkSXm+reka/rpdNBYmJIAoX8+F1kGXNONbILLLvzPYi6C6IwtiPZgQC2L/1ENJqVVKFEgpnjBStjzR7HoFpBKWaeszwHs0xMCDUT+yfAJMUOoxnaiNBsLeBAWOSAvo9Pm+E5zVmRrADU3fiPAlVmQ6904ydCD0d5IBO+LUUjEhyqUUIAkt9XvCxp4e7NzFKpiy0kc7VvOJYEeqrjX/T3KKsL2xdibXvJgZc7HcAhOpjiVKqu0NbYv8AxfklqmRDqK+hTmxdol4fs73T1Tjt+7a21s1QgKBuM0d6NiQT7M2EsaHNv3AItBL8fbq4MHBBMClffPEF6tSpg5iYGHTq1AmrVq0652XGJkQbFi19dyP8DpUJtxMJ+NUPJ3ywrGMaATTsErirrpglNzxUK2GdsIqQO4XBQyfkhOW3tZ2bebVVd1JU9GyhFIoswxiymjFN4qJGjRIbGgjYSua1djhVVSC4op2IgUmggGbcKKvMnl3QI1WyYhXvGw/CM5ap95HODBmIgeitwg1m+fuxkYwEfSeyohtEim1lXhBxSbE4mHkEstevjxleFvfOcspfUfQ0DgGh9HcqjEkTkywyw4os60TXdDSBzqgIxJf3ITeuhSnYG0+jGUXrIdT1PtKDK9qpCQ3pjI2DwpgfjQkQ8rZ7L9Tv03OnVCekZnWDLTPMn1McCbN2zUyAmaE7ADWSrEnNolVKY0R0FZQ2FwHdDsjnheGoBhsofjOzGR6xzztRABBhfTIz8qIRNYG+ngYC6mWzREpR9JgxPDthcxIK3mJfWOnOG8T3XdbPRY5yz6RMmTIFzz33HN58802sW7cOrVu3Rt++fXH8+PFzWu6M0XP0hRYwrHsGCYXpOeOuURCHKwqoz6vuHEXDPTPEddYpmiA7XM0wSCk12mWIBMKsgrCrs3nSGxZSewIPQF90WDpqfk4jmiY1F6XGIG1i3YIuQOaFD9p7opwZMh8r7/fBcTILzI3zQszaolBr3uZdtOVRNobsVDoik2lTLwPDwc4pEQl03/t6YMn0lTD0CdX+YwRRDl4/E4io/zcTU6qYrnPJH0vvEKuH2r1fhwVWDNZlKFssl6c1x9MRiZihCg79q3m+sTZwjzRBGqQxYGIkWp6fxqja5G34rV7TpqfWDucxeVovjT/D+1pQRxnrQ3U1WTj5mfIOBckjYcea3VD8smrPIpZPRVsU45pqsAEzlw3okkmNCUfY3XNox5HwEp4vuBFnQ6LcMykff/wxHnnkETzwwANo1qwZvvzyS8TFxeHbb789p+Wum78ZSoDttFiALFFkTvj5PYqwyw2Hq5Vl3UtBWLyJRFQmgxOJYLZ7AmHVL1Fwt1qDNAX6V8vBf8JzVJZRIT1FvxGWobAY/Au6+yXPlwbPR7NBEIN3aTcVWFYioe5UJPyGxRBQZB4/QbH0k309FCgaYRKMO60prVcYAeJnzwSzZ7Be1hkqw3NEZW5t
F2x2LTI6Alfc0x0n9mc7lCH2VXhi76ZdGlmlZ6xuGoExEz6N2RLHmyDRArQ5YmynmfAJhEubS1abF+3QS7Ec8bcdE+nkWixKfQx1UBkVRfQac2KWFPtxarceGGal7WYB0NU0TLVlHr+OmxvxfSh6Ztrt0ydmkdFGA/CQoBSeCI8qIAlwKTA3KBYZFOIwDij/B0ufAnqAwNMwFlXKuWGptk6dwediR7lmUnw+H9auXYs+ffpo1yRJQp8+fbB8+XLbZ7xeL/Lz8w2fssBAW/muX9CfW/S/FgM+diMIaCAAxe8HoQoefPt2fPL3UEw78U04jxoXawfCYnNRK1Nhf6nfr+1IH3j7Vlx+56UhCtYys1YyzB2XAeIizwmUmJ8Tt0bZHZv3JFbPzrjVPj+2gza916Aw7BCN7tunBUWXcmnEnP1v6Av+IcDrkwfDE+FB084NjfUBZxCFytsQaTPTcEnvFnh/5hA0v7QxJI/JQ4zC2neC5IZSxaGrBMKpeQmptgiKjSpTyFAzILepeOj3aJFEhEdgLWkpN7S1sZMKNa7spHOhdr0Cs+TIVLDrVqNqll5xYshOYy6w+1c92FP7GZZ3IyFo37e1Nv44o0cVdRNARRdkw+sxMbx276IsIEDdlrXPLA8X5x3lmknJzs6GLMtIS0szXE9LS0NWVpbtM8OGDUNycrL2qVmzZpnKvqR3K+NiDei7FLaQcs8IXX8uSFtO49C1bjd1xO0vXofmXRohISUeV9zbPcxaGncjht1JsAVRk37oaSSPhI792mLIhEH4PW88Hh52Jxp2qh+kHTaLpCGeCjVGrQy3LYadtHDdXH/LjtK409WlNNZdsmPZpwUj8aGKorEUhl20Xdh+S1aUGXWqY4t7D4HbeGhSJYomHetjxPw30PmadgCALtd1sNbJrhg5oBqYCl4yXDVGAwG8MfVZxCfF4f1Zr+CqB3shIioidJ78hqgGs2sb+6t7wgjMp+YSbuA2Da7xtmUGG1dmFVpYc9GBCQtShjG9TfsdPNucGW/OWQv1sR2udkxMaLdpY14hxiOAR4YPQIe+rXUGJUQRd75yI/rc3U09v0fMyy6GUbCynRjj00S7Pq0QJUqDyiN4/5T1c4Z9dCGgXDMpZcErr7yCvLw87XPw4MEy5XPtk32d1zZxp8YJPluQJI+ED+b9hyUMvTh6IiTc9uL1hmsvfvskGrWrFz6jA3AaJjwjDGBiCSVlAJEI+j3cG6lpKQCA6Ngo3Pb8tRi97H3UbVnL4Vn7nadt+WG3QVygAC2cuYXA810jX3hDMQI2Jrpi2pDVdBwIxvKEiLJqEab+4OUCDgGvFOO5MQCgKPh8+bsYu+5DTNo3Gp8tfRetujXVHklIiUdy5SRjHhohtLGDUVj+sn5adrX6VRCXEAsAiI2PweDRj2DKoS/Re0A3Y52D9YMjU2zqIz5fNFdjE1EW0wVFaCbC4F59WnAYJ5ZknDEJMv7YteQqSWFVW9sIMYmTIUaImMasZgq5VlCdINqWaax7zSZVERMXg7emvYgH37sDFaumqswKgZEJAZCYGo/HRtyD+4beitj4GNz/9u3O1SCCCs+WYTOtKzaqK/67SacGjsXEJMTgua8ec65HeYHY92X9XOQo10xKpUqV4PF4cOzYMcP1Y8eOIT093faZ6OhoJCUlGT5lQY2GVfHKD8/AEyFBitC7yRMhgRCCJz+9H11v7KBJWyIiPeh1R1fMyJ+ASy5viS9WD0NqegqCMQfRcVF485cX0ahdfcN1Qgi+WDUMr016BqnpqWHWmAoT2nzNRm8OaHXvdnNnPPnpA7a5/m/DR2jcsYFjOwx6a1vG5HQmEedOzAsScM/QW/HkJ/cL0i1hoTMzAk4gvI5WT6Qr7u2Oxh3qo0qtSmjauSHemvYCJu3/Are/dB3S66cFeY32hCkq2oPPlr2D92YOwV2v3mStc5ho1L4eGrVvgLota6FS9Qq2ab7NHGmtk506zgHPffW45VpiagKe+OR+
JFZIgN5nwfILcc9mYU2vWwXJFRMheRyYYAfpjIcFuotLikXdVrWDlIszWMTFNodqt5mwGtO3vbw5xm37BO//8QqaX9qYXQ1H8qFufqQICdFxUY5pGl5SB9NzxuHNX18yGj7b1lUs26aPqQIiAR/NfxMAEBEZgdtfvB4T932BHw+MwS/Hvsbv+d/h683/xRtTn8Pw2a9h8uGxuOXZazSJy03PXI0nPrkPsQkxammsuJQqSXh9ynO4/eUb9Lo4MesGJpvq/cH6u12flhix4E08NuIexCXFGh5v3bMZ/rf+Q1SpWSlIX7i4UEBoOXck79SpEzp27IhRo0YBUA2hatWqhUGDBmHIkCEhn8/Pz0dycjLy8vLKxLAczDyMGaPnYO28TaCUom2vFrhu4FWo0zy0GolSiu2rdmH7qt04tPMwcrPykH34JCpWq4C2l7fA5QO6IT4pLmQ+AX8A00b9iXkTFqHgVCFqNKyKqx/pgxaXNcb4N6YgY8UOFOUXI+9EgXoqbhA079YETTo0QO7xPKRUSkLvu7uj4SX1QtahMK8Ynz46FttW7oDHI6FR+/rodUdXdLm2PUqLvXjn9o+xdu7G8HkSG5vIqNgo+EpMLoOEYODIB3DDoH4A1Pc/atDXmPnlvDALUvHWby+hxWVN8cpV72LH6t3a9YrVK+D5rx5Hh6vahsyjpMSLN6//AOv/2hw0Xfur2uCVH55GUoVE7dqf387HmGfHo6SgVG+aJKHXXV3x9+SlUGyidNZpXhOj132AyMjQIuv92w7g4RbPh0wnIiomEq/88Awuu6mTY5qsfccx5Kp3cXjH0dPKOxSueKAHXvpmEA5mHsawASOxc93eoOlj4qNx56s3IjI6CiX5JajRqCouu6kTomKisGjqcox5bhxOHj4VNA/JI0GxsS8JF8TDI6Fa7yVXSUK/h3pj5cy1yD6Ug4iYCLTo0gS3v3IDGps2IYW5RfCV+lGcX4y3bv4I+7YesmYIQJIIOl/XHq9OfAa+Uj+G3f0ZVs9er5UfHRuFB96/Czc/0197ZvfGfXj92uE4cehkmdpYsVoqRiwcihoNq5XpeRElRaVYNWs98rLzUaVmJbTv2xoRkRGglGL04HGYPurPkHnUaV4TtZvXxLZlmSguKEH1hlVx7eNXos893RERqaok/b4Ati3fAW+xF7WaVkd6nSpnVO8zpRmnU0bvxAGIIDYMaJgIUB/mF0w8p3U93yj3TMqUKVNw3333YezYsejYsSM+/fRT/PTTT9i+fbvFVsUO/8aAK084sjsLOVm5qFY/Dd4SH2aMnoOTR3JQo1E19Ly9K2o1qX5Oy1/62yqMf30KThzKRlR0JDr1b4f4lHj4Sryo3awmut3cCZsXZyD7cA5S0pKRVrsyco/nIblSEppd2giHdhzF35OXojC3CFXrpaHP3d2RVDHRUs7Rvccw+5sF2L5qFyKjI9D1ho644t4e2LhoC4bdNQoFuYWIiPCgY7+2eH7cQCQIzCClFL5SHyKiIuDxeCx5h4Osfcfxv5e+w5YlmfCV+FCxWip63n4pbn3hesTERTs+t+mfDBzMOISaTaqjVfdmAFTGa86EhZg8bDpKCkqQXi8Ng0Y9iEaX1HfMxwkzxs7BD29NRUlBKSpUTcWdr9yIy27qhN0b9iFrzzFsXrIdlFK07tkcV9zTI+zjHnas3Y35E5egpKAYrXu1QI/bumDx1OWYPHw6CnOLUSE9GXVa1EJkVARqNamOFX+uw7q5m7TnI6MiUKt5DXS7qRPuGHKjpd8z1+zGzrV7EBkdgXZXtkZedj5WzlyLiKgINLykHtr0ahG0rrIsY+vSTJzKykWlGhVRUlCCMc+Ox6kTeUhMTcDAzx5C+ytbYe3cjfjrh8U4dSwPabUro+8DvRCXGIM1czfCE+FBQnI8co7l4vjBbGxatA352fmIS4zFVQ9ejusH9UN0XBTG/Wcy/hg7F95SP5IrJODpsY+iS//2p/2uOPKy83Fs3wnEJMag
VuPg81NRFOTnFCI2PhrRsc7jbM/m/diwYAsoBYryCrBlSSYOZh5FUV4xiATUb1MX9791O1KqJGP9/M1QZAVNOzdEk44NHfM82zi86yjmjFuIY/tPIKlCIrrf1gUEwNG9x5FcOQkN29bVVND/Jv5VJiXhrjNnUgonXdT0rdwzKQDw+eef46OPPkJWVhbatGmDzz77DJ06Oe/+RPx/Y1JcuHDhwkXZ4TIp5QsXxAGDgwYNwqBBg853NVy4cOHChYuzBqoooKTsKsj/D3FSLggmxYULFy5cuLjoYPZcKtPzFzfKtXePCxcuXLhw4eL/L1xJigsXLly4cHE+oFA1VlBZ8f9AkuIyKS5cuHDhwsX5AKUAzsCuxGVSXLhw4cKFCxfnAlShoGcgSbkAnHPPGK5NigsXLly4cOGiXMKVpLhw4cKFCxfnA1TBmal7XBdkFy5cuHDhwsU5gKvuCQ1X3ePChQsXLly4KJe46CUpnNPMz88/zzVx4cKFCxflHZxW/BtSigD1npHKJgD/WaxN+cRFz6QUFBQAAGrWDH1qsQsXLly4cAGotCM5Ofmc5B0VFYX09HQsyZp1xnmlp6cjKqrs5/+Ud1wQBwyeCRRFwZEjR5CYmBj2ia92yM/PR82aNXHw4MGL9iCnfxtun559uH16buD269lHee1TSikKCgpQrVo1SNK5s4goLS2Fz+c743yioqIQExNzFmpUPnHRS1IkSUKNGjXOWn5JSUnlakJdDHD79OzD7dNzA7dfzz7KY5+eKwmKiJiYmIuauThbcA1nXbhw4cKFCxflEi6T4sKFCxcuXLgol3CZlDARHR2NN998E9HR0ee7KhcN3D49+3D79NzA7dezD7dPXYSDi95w1oULFy5cuHBxYcKVpLhw4cKFCxcuyiVcJsWFCxcuXLhwUS7hMikuXLhw4cKFi3IJl0lx4cKFCxcuXJRLuExKGPjiiy9Qp04dxMTEoFOnTli1atX5rlK5weLFi3HttdeiWrVqIIRg+vTphvuUUrzxxhuoWrUqYmNj0adPH+zcudOQJicnBwMGDEBSUhJSUlLw0EMPobCw0JBm06ZN6NatG2JiYlCzZk18+OGH57pp5w3Dhg1Dhw4dkJiYiCpVquCGG25AZmamIU1paSkGDhyIihUrIiEhATfffDOOHTtmSHPgwAH0798fcXFxqFKlCl588UUEAgFDmr///huXXHIJoqOj0aBBA4wfP/5cN++8YMyYMWjVqpUWOKxLly74888/tftuf545hg8fDkIIBg8erF1z+9XFGYO6CIrJkyfTqKgo+u2339KtW7fSRx55hKakpNBjx46d76qVC8yaNYu+9tpr9Ndff6UA6LRp0wz3hw8fTpOTk+n06dPpxo0b6XXXXUfr1q1LS0pKtDRXXXUVbd26NV2xYgX9559/aIMGDeidd96p3c/Ly6NpaWl0wIABdMuWLfTHH3+ksbGxdOzYsf9WM/9V9O3bl44bN45u2bKFbtiwgV599dW0Vq1atLCwUEvz+OOP05o1a9L58+fTNWvW0M6dO9NLL71Uux8IBGiLFi1onz596Pr16+msWbNopUqV6CuvvKKl2bNnD42Li6PPPfcc3bZtGx01ahT1eDx09uzZ/2p7/w3MmDGD/vHHH3THjh00MzOTvvrqqzQyMpJu2bKFUur255li1apVtE6dOrRVq1b0mWee0a67/eriTOEyKSHQsWNHOnDgQO23LMu0WrVqdNiwYeexVuUTZiZFURSanp5OP/roI+1abm4ujY6Opj/++COllNJt27ZRAHT16tVamj///JMSQujhw4cppZSOHj2apqamUq/Xq6V5+eWXaePGjc9xi8oHjh8/TgHQRYsWUUrVPoyMjKRTp07V0mRkZFAAdPny5ZRSlXmUJIlmZWVpacaMGUOTkpK0fnzppZdo8+bNDWXdfvvttG/fvue6SeUCqamp9Ouvv3b78wxRUFBAGzZsSOfNm0d79OihMSluv7o4G3DVPUHg
8/mwdu1a9OnTR7smSRL69OmD5cuXn8eaXRjYu3cvsrKyDP2XnJyMTp06af23fPlypKSkoH379lqaPn36QJIkrFy5UkvTvXt3w0mfffv2RWZmJk6dOvUvteb8IS8vDwBQoUIFAMDatWvh9/sN/dqkSRPUqlXL0K8tW7ZEWlqalqZv377Iz8/H1q1btTRiHjzNxT62ZVnG5MmTUVRUhC5durj9eYYYOHAg+vfvb2m7268uzgYu+gMGzwTZ2dmQZdkwgQAgLS0N27dvP0+1unCQlZUFALb9x+9lZWWhSpUqhvsRERGoUKGCIU3dunUtefB7qamp56T+5QGKomDw4MHo2rUrWrRoAUBtc1RUFFJSUgxpzf1q1+/8XrA0+fn5KCkpQWxs7Llo0nnD5s2b0aVLF5SWliIhIQHTpk1Ds2bNsGHDBrc/y4jJkydj3bp1WL16teWeO05dnA24TIoLF+UYAwcOxJYtW7BkyZLzXZULHo0bN8aGDRuQl5eHn3/+Gffddx8WLVp0vqt1weLgwYN45plnMG/ePPc0XxfnDK66JwgqVaoEj8djsUY/duwY0tPTz1OtLhzwPgrWf+np6Th+/LjhfiAQQE5OjiGNXR5iGRcjBg0ahJkzZ2LhwoWoUaOGdj09PR0+nw+5ubmG9OZ+DdVnTmmSkpIuyt1pVFQUGjRogHbt2mHYsGFo3bo1Ro4c6fZnGbF27VocP34cl1xyCSIiIhAREYFFixbhs88+Q0REBNLS0tx+dXHGcJmUIIiKikK7du0wf/587ZqiKJg/fz66dOlyHmt2YaBu3bpIT0839F9+fj5Wrlyp9V+XLl2Qm5uLtWvXamkWLFgARVHQqVMnLc3ixYvh9/u1NPPmzUPjxo0vSlUPpRSDBg3CtGnTsGDBAouqq127doiMjDT0a2ZmJg4cOGDo182bNxsYwHnz5iEpKQnNmjXT0oh58DT/X8a2oijwer1uf5YRvXv3xubNm7Fhwwbt0759ewwYMED77varizPG+bbcLe+YPHkyjY6OpuPHj6fbtm2jjz76KE1JSTFYo/9/RkFBAV2/fj1dv349BUA//vhjun79erp//35KqeqCnJKSQn/77Te6adMmev3119u6ILdt25auXLmSLlmyhDZs2NDggpybm0vT0tLoPffcQ7ds2UInT55M4+LiLloX5CeeeIImJyfTv//+mx49elT7FBcXa2kef/xxWqtWLbpgwQK6Zs0a2qVLF9qlSxftPnftvPLKK+mGDRvo7NmzaeXKlW1dO1988UWakZFBv/jii4vWtXPIkCF00aJFdO/evXTTpk10yJAhlBBC586dSyl1+/NsQfTuodTtVxdnDpdJCQOjRo2itWrVolFRUbRjx450xYoV57tK5QYLFy6kACyf++67j1KquiG//vrrNC0tjUZHR9PevXvTzMxMQx4nT56kd955J01ISKBJSUn0gQceoAUFBYY0GzdupJdddhmNjo6m1atXp8OHD/+3mvivw64/AdBx48ZpaUpKSuiTTz5JU1NTaVxcHL3xxhvp0aNHDfns27eP9uvXj8bGxtJKlSrR559/nvr9fkOahQsX0jZt2tCoqChar149QxkXEx588EFau3ZtGhUVRStXrkx79+6tMSiUuv15tmBmUtx+dXGmIJRSen5kOC5cuHDhwoULF85wbVJcuHDhwoULF+USLpPiwoULFy5cuCiXcJkUFy5cuHDhwkW5hMukuHDhwoULFy7KJVwmxYULFy5cuHBRLuEyKS5cuHDhwoWLcgmXSXHhwoULFy5clEu4TIoLFxcQxo8fbzhVdujQoWjTps1ZLePvv/8GIcRy5ooLFy5c/NtwmRQXLs4hDh48iAcffBDVqlVDVFQUateujWeeeQYnT548K/m/8MILlnNN/g1s3LgR1113HapUqYKYmBjUqVMHt99+u3YGy759+0AIQZUqVVBQUGB4tk2bNhg6dKj2u2fPniCEgBCCmJgYNGvWDKNHj/43m+PC
hYtyCpdJceHiHGHPnj1o3749du7ciR9//BG7du3Cl19+qR1QmZOT4/isz+cLq4yEhARUrFjxbFU5LJw4cQK9e/dGhQoVMGfOHGRkZGDcuHGoVq0aioqKDGkLCgowYsSIkHk+8sgjOHr0KLZt24bbbrsNAwcOxI8//niumuDChYsLBC6T4sLFOcLAgQMRFRWFuXPnokePHqhVqxb69euHv/76C4cPH8Zrr72mpa1Tpw7eeecd3HvvvUhKSsKjjz4KQFXv1KpVC3FxcbjxxhstEhizuuf+++/HDTfcgBEjRqBq1aqoWLEiBg4caDhB+vvvv0f79u2RmJiI9PR03HXXXYZTaENh6dKlyMvLw9dff422bduibt266NWrFz755BPLic1PPfUUPv7445D5x8XFIT09HfXq1cPQoUPRsGFDzJgxI+w6uXDh4uKEy6S4cHEOkJOTgzlz5uDJJ59EbGys4V56ejoGDBiAKVOmQDw6a8SIEWjdujXWr1+P119/HStXrsRDDz2EQYMGYcOGDejVqxfefffdkGUvXLgQu3fvxsKFCzFhwgSMHz8e48eP1+77/X6888472LhxI6ZPn459+/bh/vvvD7tt6enpCAQCmDZtGkId/XXnnXeiQYMGePvtt8POHwBiY2PDlia5cOHi4oXLpLhwcQ6wc+dOUErRtGlT2/tNmzbFqVOncOLECe3a5Zdfjueffx7169dH/fr1MXLkSFx11VV46aWX0KhRIzz99NPo27dvyLJTU1Px+eefo0mTJrjmmmvQv39/g93Kgw8+iH79+qFevXro3LkzPvvsM/z5558oLCwMq22dO3fGq6++irvuuguVKlVCv3798NFHH+HYsWOWtIQQDB8+HP/73/+we/fukHnLsowffvgBmzZtwuWXXx5WfVy4cHHxwmVSXLg4hzidQ8bbt29v+J2RkYFOnToZrnXp0iVkPs2bN4fH49F+V61a1aBuWbt2La699lrUqlULiYmJ6NGjBwDgwIEDYdf1vffeQ1ZWFr788ks0b94cX375JZo0aYLNmzdb0vbt2xeXXXYZXn/9dcf8Ro8ejYSEBMTGxuKRRx7Bs88+iyeeeCLs+rhw4eLihMukuHBxDtCgQQMQQpCRkWF7PyMjA6mpqahcubJ2LT4+/qyUHRkZafhNCIGiKACAoqIi9O3bF0lJSZg4cSJWr16NadOmAQjfWJejYsWKuPXWWzFixAhkZGSgWrVqjkayw4cPx5QpU7B+/Xrb+wMGDMCGDRuwd+9eFBUV4eOPP4YkucuTCxf/3+GuAi5cnANUrFgRV1xxBUaPHo2SkhLDvaysLEycOBG33347CCGOeTRt2hQrV640XFuxYsUZ1Wv79u04efIkhg8fjm7duqFJkyanZTTrhKioKNSvX9/i3cPRsWNH3HTTTRgyZIjt/eTkZDRo0ADVq1d3mRMXLlxocFcDFy7OET7//HN4vV707dsXixcvxsGDBzF79mxcccUVqF69Ot57772gzz/99NOYPXs2RowYgZ07d+Lzzz/H7Nmzz6hOtWrVQlRUFEaNGoU9e/ZgxowZeOedd04rj5kzZ+Luu+/GzJkzsWPHDmRmZmLEiBGYNWsWrr/+esfn3nvvPSxYsACZmZln1AYXLlz8/4HLpLhwcY7QsGFDrFmzBvXq1cNtt92G+vXr49FHH0WvXr2wfPlyVKhQIejznTt3xldffYWRI0eidevWmDt3Lv7zn/+cUZ0qV66M8ePHY+rUqWjWrBmGDx8eVhwTEc2aNUNcXByef/55tGnTBp07d8ZPP/2Er7/+Gvfcc4/jc40aNcKDDz6I0tLSM2qDCxcu/v+A0NOx7HPhwoULFy5cuPiX4EpSXLhw4cKFCxflEi6T4sKFCxcuXLgol3CZFBcuXLhw4cJFuYTLpLhw4cKFCxcuyiVcJsWFCxcuXLhwUS7hMikuXLhw4cKFi3IJl0lx4cKFCxcuXJRLuEyKCxcuXLhw4aJcwmVSXLhw4cKF
CxflEi6T4sKFCxcuXLgol3CZFBcuXLhw4cJFuYTLpLhw4cKFCxcuyiX+DzzStsmLnx6mAAAAAElFTkSuQmCC\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+    "# Mark in Kale as pipeline step manhattan_known; depends on saliency_known.\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "p_values_observed_np = np.load('p_values_observed_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log = -1 * np.log10(p_values_observed_np)\n",
+ "slug = np.arange(p_values_observed_np.shape[0])\n",
+ "\n",
+ "pt = plt.scatter(x = slug, y = neg_log, c=neg_log)\n",
+ "plt.title(f\"Manhattan Plot for {experiment_description} - observed\")\n",
+ "plt.xlabel(\"Ordinal SNP\")\n",
+ "plt.ylabel(\"- log10 observed P values\")\n",
+ "cbar = plt.colorbar(pt)\n",
+ "cbar.set_label(\"- log10 observed P values\")\n",
+ "plt.savefig(f\"{model_folder}-manhattan-observed\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "f18fae63-9e34-4d5b-b5d7-76e08e94142c",
+ "metadata": {
+ "tags": [
+ "block:saliency_predicted",
+ "prev:train"
+ ]
+ },
+ "outputs": [
+ {
+ "ename": "FileNotFoundError",
+ "evalue": "[Errno 2] No such file or directory: 'val_snps_for_saliency.npy'",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn [13], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m val_snps_s \u001b[38;5;241m=\u001b[39m \u001b[43mnp\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mload\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mval_snps_for_saliency.npy\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mallow_pickle\u001b[49m\u001b[38;5;241;43m=\u001b[39;49m\u001b[38;5;28;43;01mTrue\u001b[39;49;00m\u001b[43m)\u001b[49m\n\u001b[1;32m 2\u001b[0m model_folder \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124ma-\u001b[39m\u001b[38;5;132;01m{\u001b[39;00mtime\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m-model\u001b[39m\u001b[38;5;124m\"\u001b[39m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mdef\u001b[39;00m \u001b[38;5;21mget_saved_model\u001b[39m(final_activation_scale_factor: \u001b[38;5;28mfloat\u001b[39m, model_folder: \u001b[38;5;28mstr\u001b[39m):\n",
+ "File \u001b[0;32m~/.local/lib/python3.8/site-packages/numpy/lib/npyio.py:405\u001b[0m, in \u001b[0;36mload\u001b[0;34m(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)\u001b[0m\n\u001b[1;32m 403\u001b[0m own_fid \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mFalse\u001b[39;00m\n\u001b[1;32m 404\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m--> 405\u001b[0m fid \u001b[38;5;241m=\u001b[39m stack\u001b[38;5;241m.\u001b[39menter_context(\u001b[38;5;28;43mopen\u001b[39;49m\u001b[43m(\u001b[49m\u001b[43mos_fspath\u001b[49m\u001b[43m(\u001b[49m\u001b[43mfile\u001b[49m\u001b[43m)\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[38;5;124;43mrb\u001b[39;49m\u001b[38;5;124;43m\"\u001b[39;49m\u001b[43m)\u001b[49m)\n\u001b[1;32m 406\u001b[0m own_fid \u001b[38;5;241m=\u001b[39m \u001b[38;5;28;01mTrue\u001b[39;00m\n\u001b[1;32m 408\u001b[0m \u001b[38;5;66;03m# Code to distinguish from NumPy binary files and pickles.\u001b[39;00m\n",
+ "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'val_snps_for_saliency.npy'"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "\n",
+ "val_snps_s = np.load(\"val_snps_for_saliency.npy\", allow_pickle=True)\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "\n",
+ "def get_saved_model(final_activation_scale_factor: float, model_folder: str):\n",
+ " final_model = tf.keras.models.load_model(model_folder)\n",
+ " for layer in final_model.layers:\n",
+ " layer.trainable=False\n",
+    "    return final_model\n",
+ "\n",
+ "final_model = get_saved_model(final_activation_scale_factor=final_activation_scale_factor, model_folder = model_folder)\n",
+ "val_phenotypes_p = final_model.predict(val_snps_s).flatten()\n",
+ "\n",
+ "p_values_p = []\n",
+ "for i in np.arange(int(val_snps_s.shape[1] / 3)):\n",
+ " column_index_lower_bound = 3 * i\n",
+ " column_index_upper_bound = 3 * i + 3\n",
+ " data = val_snps_s[:,column_index_lower_bound:column_index_upper_bound]\n",
+ " data_reshaped = np.argmax(data, axis=1)\n",
+ " slope, intercept, r_value, p_value, std_err = stats.linregress(data_reshaped, val_phenotypes_p)\n",
+ " p_values_p.append(p_value)\n",
+ "p_values_predicted_np = np.array(p_values_p)\n",
+ "np.save('p_values_predicted_np', \n",
+ " p_values_predicted_np, \n",
+ " allow_pickle=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "id": "043678ab-c0ae-45d7-8fdc-259d3138adba",
+ "metadata": {
+ "tags": [
+     "block:manhattan_predicted",
+ "prev:saliency_predicted"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+    "# Mark in Kale as pipeline step manhattan_predicted; depends on saliency_predicted.\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "p_values_predicted_np = np.load('p_values_predicted_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log = -1 * np.log10(p_values_predicted_np)\n",
+ "slug = np.arange(p_values_predicted_np.shape[0])\n",
+ "\n",
+ "pt = plt.scatter(x = slug, y = neg_log, c=neg_log)\n",
+ "plt.title(f\"Manhattan Plot for {experiment_description} - predicted\")\n",
+ "plt.xlabel(\"Ordinal SNP\")\n",
+ "plt.ylabel(\"- log10 predicted P values\")\n",
+ "cbar = plt.colorbar(pt)\n",
+ "cbar.set_label(\"- log10 predicted P values\")\n",
+ "plt.savefig(f\"{model_folder}-manhattan-predicted\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 38,
+ "id": "bae392ab-a377-49a1-afaf-b5330892d89a",
+ "metadata": {
+ "tags": [
+ "block:qq_plot",
+ "prev:saliency_observed",
+ "prev:saliency_predicted"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "# Mark in Kale as pipeline step: qq_plot\n",
+ "\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "\n",
+ "p_values_observed_np = np.load('p_values_observed_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log_o = -1 * np.log10(p_values_observed_np)\n",
+ "\n",
+ "\n",
+ "p_values_predicted_np = np.load('p_values_predicted_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log_p = -1 * np.log10(p_values_predicted_np)\n",
+ "\n",
+ "pt = plt.scatter(x = neg_log_o, y = neg_log_p, c= neg_log_p - neg_log_o)\n",
+ "plt.title(f\"QQ Plot for {experiment_description}\")\n",
+ "plt.xlabel(\"- log10 observed P values\")\n",
+ "plt.ylabel(\"- log10 predicted P values\")\n",
+ "cbar = plt.colorbar(pt)\n",
+ "cbar.set_label(\"Difference in - log10 P values (predicted - observed)\")\n",
+ "plt.savefig(f\"{model_folder}-qq-plot\")\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "be03ced0-99f2-43fd-b29e-407d0f85212c",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "## There is a second notebook to run after you run this pipeline. Please find the model folder for the best model that Katib found.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 39,
+ "id": "4b240cfb-b264-49dc-ae2c-73b10ef8a69d",
+ "metadata": {
+ "tags": [
+ "pipeline-metrics"
+ ]
+ },
+ "outputs": [
+ {
+ "ename": "NameError",
+ "evalue": "name 'val_mean_absolute_error' is not defined",
+ "output_type": "error",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "Cell \u001b[0;32mIn [39], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[43mval_mean_absolute_error\u001b[49m)\n",
+ "\u001b[0;31mNameError\u001b[0m: name 'val_mean_absolute_error' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "print(val_mean_absolute_error)\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-10.m100",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-10:m100"
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "kubeflow_notebook": {
+ "autosnapshot": true,
+ "deploy_config": {},
+ "docker_image": "",
+ "experiment": {
+ "id": "new",
+ "name": "gwas-7-d"
+ },
+ "experiment_name": "gwas-10-d",
+ "katib_metadata": {
+ "algorithm": {
+ "algorithmName": "bayesianoptimization",
+ "algorithmSettings": [
+ {
+ "name": "random_state",
+ "value": "10"
+ },
+ {
+ "name": "acq_optimizer",
+ "value": "auto"
+ },
+ {
+ "name": "acq_func",
+ "value": "gp_hedge"
+ },
+ {
+ "name": "base_estimator",
+ "value": "GP"
+ }
+ ]
+ },
+ "maxFailedTrialCount": 10,
+ "maxTrialCount": 40,
+ "objective": {
+ "additionalMetricNames": [],
+ "goal": 0.05,
+ "objectiveMetricName": "val-mean-absolute-error",
+ "type": "minimize"
+ },
+ "parallelTrialCount": 2,
+ "parameters": [
+ {
+ "feasibleSpace": {
+ "list": [
+ "IMP_height.txt"
+ ]
+ },
+ "name": "data_file_to_run",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "GWAS on soy height"
+ ]
+ },
+ "name": "experiment_description",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.3",
+ "min": "0.00001",
+ "step": "0.00001"
+ },
+ "name": "learning_rate",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": ".98",
+ "min": "0.0001",
+ "step": "0.0001"
+ },
+ "name": "conv_1_dropout_rate",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": ".0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "TruncatedNormal",
+ "glorot_uniform",
+ "GlorotNormal",
+ "HeNormal",
+ "random_normal"
+ ]
+ },
+ "name": "conv_initializer",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "TruncatedNormal",
+ "glorot_uniform",
+ "GlorotNormal",
+ "HeNormal",
+ "random_normal"
+ ]
+ },
+ "name": "dese_initializer",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.01",
+ "step": "0.1"
+ },
+ "name": "dropout_rate",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "5",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "num_dense_layers",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "max": "20",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "num_dense_units",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "elu",
+ "relu",
+ "gelu",
+ "linear"
+ ]
+ },
+ "name": "conv_activation",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "elu",
+ "relu",
+ "gelu",
+ "linear"
+ ]
+ },
+ "name": "activation",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "huber_loss",
+ "mean_absolute_error",
+ "mse"
+ ]
+ },
+ "name": "loss",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "max": "5",
+ "min": "1.2",
+ "step": "0.01"
+ },
+ "name": "final_activation_scale_factor",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "40",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "batch_size",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "max": "10",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "epochs",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "2023-01-051310"
+ ]
+ },
+ "name": "time",
+ "parameterType": "categorical"
+ }
+ ]
+ },
+ "katib_run": true,
+ "pipeline_description": "gwas-10-d",
+ "pipeline_name": "gwas-10-d",
+ "snapshot_volumes": true,
+ "volumes": [
+ {
+ "annotations": [],
+ "mount_point": "/home/jovyan",
+ "name": "gwas-11-a-workspace-gxcvc",
+ "size": 5,
+ "size_type": "Gi",
+ "snapshot": false,
+ "type": "clone"
+ }
+ ]
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/2-mse-run-second-in-jupyter.ipynb b/tutorials/notebooks/DL-gwas-gcp-example/2-mse-run-second-in-jupyter.ipynb
new file mode 100644
index 0000000..8d260ac
--- /dev/null
+++ b/tutorials/notebooks/DL-gwas-gcp-example/2-mse-run-second-in-jupyter.ipynb
@@ -0,0 +1,1057 @@
+{
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "619de0db-3878-4254-93a4-4404553d0ecf",
+ "metadata": {
+ "tags": [
+ "imports"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "2023-01-05 21:21:20.751703: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA\n",
+ "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
+ "2023-01-05 21:21:20.895255: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
+ "2023-01-05 21:21:20.899185: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory\n",
+ "2023-01-05 21:21:20.899205: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.\n",
+ "2023-01-05 21:21:21.754779: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory\n",
+ "2023-01-05 21:21:21.754858: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory\n",
+ "2023-01-05 21:21:21.754864: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.\n",
+ "/usr/local/lib/python3.8/dist-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.24.1)\n",
+ " warnings.warn(f\"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of \"\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "# Don't use Kale on this one.\n",
+ "import pendulum\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import warnings\n",
+ "import tensorflow as tf\n",
+ "from tensorflow.keras.models import Model\n",
+ "from tensorflow.keras.layers import Dense, Flatten, Conv1D, Dropout, BatchNormalization, Lambda\n",
+ "from tensorflow.keras.regularizers import l1,l2, L1L2\n",
+ "import matplotlib.pyplot as plt\n",
+ "from scipy import stats\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "365d8767-2dd3-4317-b556-2273f1510e19",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "## Copy in the best parameters that Katib found when the first notebook was run with Kale\n",
+ "![assets/x002-final-results-page.png](assets/x002-final-results-page.png)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "fd604bf6-0d00-4178-9deb-28465c915c29",
+ "metadata": {
+ "tags": [
+ "pipeline-parameters"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "# Best parameters that Katib found when the first notebook was run with Kale\n",
+ "data_file_to_run = \"IMP_height.txt\"\n",
+ "\n",
+ "experiment_description = \"Soy Height GWAS\"\n",
+ "\n",
+ "learning_rate = 0.000036 # 0.0001957\n",
+ "conv_1_dropout_rate = 0.60 # Dropout rate for first convolutional layer\n",
+ "conv_1_kernel_l1 = 0.0000007 # L1 regularization for the first Conv1D layer's weights\n",
+ "conv_1_kernel_l2 = 0.00045 # L2 regularization for the first Conv1D layer's weights\n",
+ "conv_1_bias_l2 = 0.000064 # L2 regularization for the first Conv1D layer's bias\n",
+ "conv_1_activity_l2 = 0.004 # L2 activity regularization for the first Conv1D layer\n",
+ "\n",
+ "conv_x_kernel_l1 = 0.7\n",
+ "conv_x_kernel_l2 = 0.14\n",
+ "conv_x_bias_l2 = 0.007\n",
+ "conv_x_activity_l2 = 0.000005\n",
+ "\n",
+ "dense_x_kernel_l1 = 0.018\n",
+ "dense_x_kernel_l2 = 0.0000015\n",
+ "dense_x_bias_l2 = 0.0017\n",
+ "dense_x_activity_l2 = 0.0005\n",
+ "\n",
+ "dense_out_kernel_l1 = 0.0000007\n",
+ "dense_out_kernel_l2 = .0000024\n",
+ "dense_out_bias_l2 = 0.023\n",
+ "dense_out_activity_l2 = 0.024\n",
+ "\n",
+ "conv_initializer = 'glorot_uniform' # Alternatives: 'TruncatedNormal', 'GlorotNormal', 'HeNormal', 'random_normal'\n",
+ "dese_initializer = 'HeNormal' # Alternatives: 'GlorotNormal', 'TruncatedNormal', 'GlorotUniform'\n",
+ "\n",
+ "dropout_rate = 0.055\n",
+ "num_dense_layers = 3\n",
+ "num_dense_units = 13\n",
+ "\n",
+ "conv_activation = \"gelu\" # \"linear\"\n",
+ "activation = \"linear\"\n",
+ "loss = 'mean_squared_error' # Alternatives: 'mean_absolute_error', 'huber_loss'\n",
+ "\n",
+ "final_activation_scale_factor = 2.12\n",
+ "\n",
+ "batch_size = 25\n",
+ "epochs = 15\n",
+ "\n",
+ "time = pendulum.now(tz='America/New_York').strftime('%Y-%m-%d%H%M')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "e06747ae-77b6-4524-b803-fc0d87512fbb",
+ "metadata": {
+ "tags": [
+ "block:preprocessing"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Preprocessing successful\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Don't run in Kale this time\n",
+ "\n",
+ "ht = pd.read_csv(data_file_to_run, sep = '\\t')\n",
+ "ht_pd = ht_relevant_cols = ht.drop(columns = ['strain', 'height', 'folds'])\n",
+ "phenotypes_norm = ht_pd.pop(\"norm_phe\")\n",
+ "for col in ht_pd.columns:\n",
+ " ht_pd[col] = ht_pd[col].astype('category')\n",
+ "ohe_height_genotypes = pd.get_dummies(ht_pd)\n",
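+ "# Note: hash() of a str is salted per Python process unless PYTHONHASHSEED is set, so this split can differ between runs.\n",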
+ "def train_test_splitting(row, split_ratio):\n",
+ " string_of_row = \"\".join([str(l) for l in list(row.values)])\n",
+ " return (abs(hash(string_of_row)) % 10) / 10 < split_ratio\n",
+ "belongs_in_train_set_index =\\\n",
+ " np.array([train_test_splitting(ohe_height_genotypes.loc[i],0.7)\n",
+ " for i in np.arange(ht_pd.shape[0])])\n",
+ "\n",
+ "train_ohe_height_genotypes = ohe_height_genotypes[belongs_in_train_set_index]\n",
+ "val_ohe_height_genotypes = ohe_height_genotypes[~belongs_in_train_set_index]\n",
+ "\n",
+ "train_phenotypes_norm = phenotypes_norm[belongs_in_train_set_index]\n",
+ "val_phenotypes_norm = phenotypes_norm[~belongs_in_train_set_index]\n",
+ "\n",
+ "# Make sure the number of rows in test and train add up to the original rows\n",
+ "assert train_ohe_height_genotypes.shape[0] + val_ohe_height_genotypes.shape[0] == ht_pd.shape[0]\n",
+ "\n",
+ "# Data as a numpy array...\n",
+ "train_ohe_height_genotypes_np = train_ohe_height_genotypes.values\n",
+ "val_ohe_height_genotypes_np = val_ohe_height_genotypes.values\n",
+ "\n",
+ "train_phenotypes_norm_np = train_phenotypes_norm.values\n",
+ "val_phenotypes_norm_np = val_phenotypes_norm.values\n",
+ "\n",
+ "# Reshape to fit the conv1D network. \n",
+ "train_np_ohe_reshaped_for_conv_1_d =\\\n",
+ " train_ohe_height_genotypes_np.reshape((train_ohe_height_genotypes_np.shape[0],\n",
+ " train_ohe_height_genotypes_np.shape[1], 1))\n",
+ "val_np_ohe_reshaped_for_conv_1_d =\\\n",
+ " val_ohe_height_genotypes_np.reshape((val_ohe_height_genotypes_np.shape[0],\n",
+ " val_ohe_height_genotypes_np.shape[1],1))\n",
+ "\n",
+ "np.save('train_data_ready', train_np_ohe_reshaped_for_conv_1_d)\n",
+ "np.save('val_data_ready', val_np_ohe_reshaped_for_conv_1_d)\n",
+ "np.save('train_labels_ready', train_phenotypes_norm_np)\n",
+ "np.save('val_labels_ready',val_phenotypes_norm_np)\n",
+ "\n",
+ "# Since the data was reshaped for the convolutional 1D neural network, we are also saving the \n",
+ "# non-reshaped data to be used for calculating saliency.\n",
+ "\n",
+ "np.save(\"val_snps_for_saliency\", val_ohe_height_genotypes_np)\n",
+ "\n",
+ "print(\"Preprocessing successful\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "dd01e8b2-78ba-4315-9e12-9153b1c57ce5",
+ "metadata": {
+ "tags": [
+ "block:train",
+ "prev:preprocessing"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Epoch 1/15\n",
+ "145/145 [==============================] - 55s 343ms/step - loss: 184.9438 - mean_absolute_error: 1.1132 - val_loss: 145.5629 - val_mean_absolute_error: 0.7159\n",
+ "Epoch 2/15\n",
+ "145/145 [==============================] - 53s 366ms/step - loss: 134.6771 - mean_absolute_error: 1.0567 - val_loss: 127.3722 - val_mean_absolute_error: 0.7289\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op, _update_step_xla while saving (showing 5 of 5). These functions will not be directly callable after loading.\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "INFO:tensorflow:Assets written to: a-2023-01-051641-model/assets\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "INFO:tensorflow:Assets written to: a-2023-01-051641-model/assets\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "a-2023-01-051641-model\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Don't run in Kale this time\n",
+ "nb_classes = 3\n",
+ "\n",
+ "data_files = ['./train_data_ready.npy',\n",
+ " \"./val_data_ready.npy\",\n",
+ " \"./train_labels_ready.npy\",\n",
+ " \"./val_labels_ready.npy\"]\n",
+ "# artifact_bucket_root_name = artifacts_bucket.split('/')[-1]\n",
+ "# print(artifact_bucket_root_name)\n",
+ "# storage_client = storage.Client()\n",
+ "# bucket = storage_client.get_bucket(artifact_bucket_root_name)\n",
+ "\n",
+ "ht_np_train = np.load('train_data_ready.npy', allow_pickle=True)\n",
+ "ht_np_val = np.load('val_data_ready.npy', allow_pickle=True)\n",
+ "train_labels_np = np.load('train_labels_ready.npy', allow_pickle=True)\n",
+ "val_labels_np = np.load('val_labels_ready.npy', allow_pickle=True)\n",
+ "\n",
+ "train_snps = ht_np_train\n",
+ "train_phenotypes = train_labels_np\n",
+ "val_snps = ht_np_val\n",
+ "val_phenotypes = val_labels_np\n",
+ "\n",
+ "# print(\"min\")\n",
+ "# print(val_phenotypes.min())\n",
+ "# print(\"max\")\n",
+ "# print(val_phenotypes.max())\n",
+ "\n",
+ "inputs =\\\n",
+ " tf.keras.layers.Input(\n",
+ " shape=(train_snps.shape[1], \n",
+ " train_snps.shape[2])) # train_snps.shape[1] ,nb_classes))\n",
+ "\n",
+ "x = Conv1D(10,\n",
+ " nb_classes,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_1_kernel_l1, l2=conv_1_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_1_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_1_activity_l2)\n",
+ " )(inputs)\n",
+ "\n",
+ " # kernel_initializer = conv_initializer ,\n",
+ " # kernel_regularizer=\"l2\", bias_regularizer = \"l2\")\n",
+ "x = Dropout(conv_1_dropout_rate)(x)\n",
+ "\n",
+ "x = Conv1D(10,\n",
+ " 20,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_x_kernel_l1, l2=conv_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_x_activity_l2)\n",
+ " # kernel_initializer = 'TruncatedNormal',\n",
+ " # kernel_regularizer=\"l2\",\n",
+ " # bias_regularizer=\"l2\"\n",
+ " )(x) # Leaving l1 l2 on head layer only to see if this prevents everything from zeroing out.\n",
+ "\n",
+ "x = Dropout(dropout_rate)(x)\n",
+ "\n",
+ "\n",
+ "shortcut = Conv1D(10,\n",
+ " 4,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_x_kernel_l1, l2=conv_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_x_activity_l2))(inputs)\n",
+ "shortcut = Dropout(dropout_rate)(shortcut)\n",
+ "x = tf.keras.layers.Add()([shortcut,x])\n",
+ "\n",
+ "x = Conv1D(10,\n",
+ " 4,\n",
+ " padding='same',\n",
+ " activation = conv_activation,\n",
+ " kernel_initializer = conv_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=conv_x_kernel_l1, l2=conv_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(conv_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(conv_x_activity_l2)\n",
+ " # kernel_initializer = 'TruncatedNormal', \n",
+ " # kernel_regularizer = \"l2\",\n",
+ " # bias_regularizer = \"l2\"\n",
+ " )(x)\n",
+ "\n",
+ "# x = Dropout(dropout_rate)(x)\n",
+ "\n",
+ "x = Flatten()(x)\n",
+ "# x = Dropout(dropout_rate)(x)\n",
+ "x = BatchNormalization()(x)\n",
+ "\n",
+ "if num_dense_layers > 0:\n",
+ " y = x\n",
+ " for i in np.arange(num_dense_layers):\n",
+ " y = Dense(num_dense_units, \n",
+ " activation,\n",
+ " kernel_initializer=dese_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=dense_x_kernel_l1, l2=dense_x_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(dense_x_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(dense_x_activity_l2)\n",
+ " )(y)\n",
+ " # y = Dropout(dropout_rate)(y)\n",
+ " y = BatchNormalization()(y)\n",
+ " \n",
+ " x = tf.keras.layers.Concatenate(axis=1)([x,y])\n",
+ " x = BatchNormalization()(x)\n",
+ "\n",
+ "outputs_unscaled = Dense(1,\n",
+ " activation=\"softsign\",\n",
+ " kernel_initializer=dese_initializer,\n",
+ " kernel_regularizer=tf.keras.regularizers.L1L2(l1=dense_out_kernel_l1, l2=dense_out_kernel_l2),\n",
+ " bias_regularizer=tf.keras.regularizers.L2(dense_out_bias_l2),\n",
+ " activity_regularizer=tf.keras.regularizers.L2(dense_out_activity_l2),\n",
+ " # bias_regularizer = \"l2\",\n",
+ " # kernel_initializer = 'TruncatedNormal',\n",
+ " name = 'out')(x) # Softsign output; rescaled by the Lambda layer below\n",
+ "# Softsign bounds the output to the open interval (-1, 1). The labels are standardized,\n",
+ "# so most values fall within [-2, 2] or [-3, 3]. Multiplying by a scalar stretches the\n",
+ "# output range to +/- that scalar. The best scalar is not known in advance,\n",
+ "# so we let the tuner figure it out:\n",
+ "outputs = Lambda(lambda x: x * final_activation_scale_factor)(outputs_unscaled) \n",
+ "\n",
+ "our_data_model = Model(inputs = inputs, outputs = outputs)\n",
+ "# qa_data_model = Model(inputs = inputs, outputs = outputs)\n",
+ "our_data_model.compile(loss=loss,\n",
+ " optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),\n",
+ " metrics=[tf.keras.metrics.MeanAbsoluteError()],\n",
+ " jit_compile=True)\n",
+ "\n",
+ "# We added the early stopping callback and instructed it to restore the best weights. \n",
+ "callbacks = [tf.keras.callbacks.EarlyStopping(restore_best_weights=True)]\n",
+ "\n",
+ "history =\\\n",
+ " our_data_model.fit(x = train_snps,\n",
+ " y = train_phenotypes,\n",
+ " callbacks = callbacks,\n",
+ " batch_size=batch_size,\n",
+ " epochs=epochs,\n",
+ " validation_data=(val_snps, val_phenotypes),\n",
+ " shuffle= True,\n",
+ " use_multiprocessing=True)\n",
+ "# Requirement 6: save and log your artifact.\n",
+ "# The folder name includes the run timestamp as a guard\n",
+ "# against file name conflicts between runs. (An earlier version\n",
+ "# used a random number instead; see the commented-out line below.)\n",
+ "\n",
+ "# tn = str(int(np.random.random() * 10 ** 12))\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "our_data_model.save(model_folder)\n",
+ "\n",
+ "history_df = pd.DataFrame(history.history)\n",
+ "\n",
+ "history_df[[\"mean_absolute_error\", \"val_mean_absolute_error\"]].plot()\n",
+ "plt.savefig(f'{model_folder}-history.png')\n",
+ "\n",
+ "print(model_folder)\n",
+ "\n",
+ "val_mean_absolute_error = float(history_df['val_mean_absolute_error'].values.min())\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "74954dbc-9133-40d5-9f23-400a39807724",
+ "metadata": {
+ "tags": [
+ "block:saliency_observed",
+ "prev:preprocessing"
+ ]
+ },
+ "outputs": [],
+ "source": [
+ "\n",
+ "\n",
+ "# Calculate observed p values: \n",
+ "\n",
+ "val_snps_s = np.load(\"val_snps_for_saliency.npy\", allow_pickle=True)\n",
+ "val_phenotypes_k = np.load('val_labels_ready.npy', allow_pickle=True)\n",
+ "\n",
+ "p_values = []\n",
+ "for i in np.arange(int(val_snps_s.shape[1] / 3)):\n",
+ " column_index_lower_bound = 3 * i\n",
+ " column_index_upper_bound = 3 * i + 3\n",
+ " data = val_snps_s[:,column_index_lower_bound:column_index_upper_bound]\n",
+ " data_reshaped = np.argmax(data, axis=1)\n",
+ " slope, intercept, r_value, p_value, std_err = stats.linregress(data_reshaped, val_phenotypes_k)\n",
+ " p_values.append(p_value)\n",
+ "p_values_observed_np = np.array(p_values)\n",
+ "np.save('p_values_observed_np', \n",
+ " p_values_observed_np, \n",
+ " allow_pickle=True)\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "464ffbb4-eb9e-438e-ac1e-292a2e45a99f",
+ "metadata": {
+ "tags": [
+ "block:manhattan_observed",
+ "prev:saliency_observed"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "p_values_observed_np = np.load('p_values_observed_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log = -1 * np.log10(p_values_observed_np)\n",
+ "slug = np.arange(p_values_observed_np.shape[0])\n",
+ "\n",
+ "pt = plt.scatter(x = slug, y = neg_log, c=neg_log)\n",
+ "plt.title(f\"Manhattan Plot for {experiment_description} - observed\")\n",
+ "plt.xlabel(\"Ordinal SNP\")\n",
+ "plt.ylabel(\"- log10 observed P values\")\n",
+ "cbar = plt.colorbar(pt)\n",
+ "cbar.set_label(\"- log10 observed P values\")\n",
+ "plt.savefig(f\"{model_folder}-manhattan-observed\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "f18fae63-9e34-4d5b-b5d7-76e08e94142c",
+ "metadata": {
+ "tags": [
+ "block:saliency_predicted",
+ "prev:train"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "48/48 [==============================] - 2s 33ms/step\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "\n",
+ "val_snps_s = np.load(\"val_snps_for_saliency.npy\", allow_pickle=True)\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "\n",
+ "def get_saved_model(final_activation_scale_factor: float, model_folder: str):\n",
+ " final_model = tf.keras.models.load_model(model_folder)\n",
+ " for layer in final_model.layers:\n",
+ " layer.trainable=False\n",
+ " return final_model\n",
+ " # print(layer.weights)\n",
+ "\n",
+ "final_model = get_saved_model(final_activation_scale_factor=final_activation_scale_factor, model_folder = model_folder)\n",
+ "val_phenotypes_p = final_model.predict(val_snps_s).flatten()\n",
+ "\n",
+ "p_values_p = []\n",
+ "for i in np.arange(int(val_snps_s.shape[1] / 3)):\n",
+ " column_index_lower_bound = 3 * i\n",
+ " column_index_upper_bound = 3 * i + 3\n",
+ " data = val_snps_s[:,column_index_lower_bound:column_index_upper_bound]\n",
+ " data_reshaped = np.argmax(data, axis=1)\n",
+ " slope, intercept, r_value, p_value, std_err = stats.linregress(data_reshaped, val_phenotypes_p)\n",
+ " p_values_p.append(p_value)\n",
+ "p_values_predicted_np = np.array(p_values_p)\n",
+ "np.save('p_values_predicted_np', \n",
+ " p_values_predicted_np, \n",
+ " allow_pickle=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "043678ab-c0ae-45d7-8fdc-259d3138adba",
+ "metadata": {
+ "tags": [
+ "block:manhattan_predicted",
+ "prev:saliency_predicted"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "# Mark in Kale as pipeline step manhattan_predicted; depends on saliency_predicted.\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "p_values_predicted_np = np.load('p_values_predicted_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log = -1 * np.log10(p_values_predicted_np)\n",
+ "slug = np.arange(p_values_predicted_np.shape[0])\n",
+ "\n",
+ "pt = plt.scatter(x = slug, y = neg_log, c=neg_log)\n",
+ "plt.title(f\"Manhattan Plot for {experiment_description} - predicted\")\n",
+ "plt.xlabel(\"Ordinal SNP\")\n",
+ "plt.ylabel(\"- log10 predicted P values\")\n",
+ "cbar = plt.colorbar(pt)\n",
+ "cbar.set_label(\"- log10 predicted P values\")\n",
+ "plt.savefig(f\"{model_folder}-manhattan-predicted\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "bae392ab-a377-49a1-afaf-b5330892d89a",
+ "metadata": {
+ "tags": [
+ "block:qq_plot",
+ "prev:saliency_observed",
+ "prev:saliency_predicted"
+ ]
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "# Mark in Kale as pipeline step: qq_plot\n",
+ "\n",
+ "model_folder = f\"a-{time}-model\"\n",
+ "\n",
+ "p_values_observed_np = np.load('p_values_observed_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log_o = -1 * np.log10(p_values_observed_np)\n",
+ "\n",
+ "\n",
+ "p_values_predicted_np = np.load('p_values_predicted_np.npy',\n",
+ " allow_pickle=True)\n",
+ "\n",
+ "neg_log_p = -1 * np.log10(p_values_predicted_np)\n",
+ "\n",
+ "pt = plt.scatter(x = neg_log_o, y = neg_log_p, c= neg_log_p - neg_log_o)\n",
+ "plt.title(f\"QQ Plot for {experiment_description}\")\n",
+ "plt.xlabel(\"- log10 observed P values\")\n",
+ "plt.ylabel(\"- log10 predicted P values\")\n",
+ "cbar = plt.colorbar(pt)\n",
+ "cbar.set_label(\"Difference in - log10 P values (predicted - observed)\")\n",
+ "plt.savefig(f\"{model_folder}-qq-plot\")\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "be03ced0-99f2-43fd-b29e-407d0f85212c",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "\n",
+ "## There is a second notebook to run after you run this pipeline. Please find the model folder for the best model that Katib found.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "4b240cfb-b264-49dc-ae2c-73b10ef8a69d",
+ "metadata": {
+ "tags": [
+ "pipeline-metrics"
+ ]
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0.7159170508384705\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "print(val_mean_absolute_error)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8159eb55-1b22-4662-885b-587d90e954b5",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-10.m100",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-10:m100"
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "kubeflow_notebook": {
+ "autosnapshot": true,
+ "deploy_config": {},
+ "docker_image": "gcr.io/arrikto/jupyter-kale-py38@sha256:2e1ce3427b780c0c78e7cfec527ee10c391092fdc4a8344cd76f8b83c61c5234",
+ "experiment": {
+ "id": "new",
+ "name": "gwas-7-d"
+ },
+ "experiment_name": "gwas-10-d",
+ "katib_metadata": {
+ "algorithm": {
+ "algorithmName": "bayesianoptimization",
+ "algorithmSettings": [
+ {
+ "name": "random_state",
+ "value": "10"
+ },
+ {
+ "name": "acq_optimizer",
+ "value": "auto"
+ },
+ {
+ "name": "acq_func",
+ "value": "gp_hedge"
+ },
+ {
+ "name": "base_estimator",
+ "value": "GP"
+ }
+ ]
+ },
+ "maxFailedTrialCount": 10,
+ "maxTrialCount": 40,
+ "objective": {
+ "additionalMetricNames": [],
+ "goal": 0.05,
+ "objectiveMetricName": "val-mean-absolute-error",
+ "type": "minimize"
+ },
+ "parallelTrialCount": 2,
+ "parameters": [
+ {
+ "feasibleSpace": {
+ "list": [
+ "IMP_height.txt"
+ ]
+ },
+ "name": "data_file_to_run",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "GWAS on soy height"
+ ]
+ },
+ "name": "experiment_description",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.3",
+ "min": "0.00001",
+ "step": "0.00001"
+ },
+ "name": "learning_rate",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": ".98",
+ "min": "0.0001",
+ "step": "0.0001"
+ },
+ "name": "conv_1_dropout_rate",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_1_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "conv_x_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_x_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_kernel_l1",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_kernel_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_bias_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": ".0000001",
+ "step": "0.0000001"
+ },
+ "name": "dense_out_activity_l2",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "TruncatedNormal",
+ "glorot_uniform",
+ "GlorotNormal",
+ "HeNormal",
+ "random_normal"
+ ]
+ },
+ "name": "conv_initializer",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "TruncatedNormal",
+ "glorot_uniform",
+ "GlorotNormal",
+ "HeNormal",
+ "random_normal"
+ ]
+ },
+ "name": "dese_initializer",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "max": "0.95",
+ "min": "0.01",
+ "step": "0.1"
+ },
+ "name": "dropout_rate",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "5",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "num_dense_layers",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "max": "20",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "num_dense_units",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "elu",
+ "relu",
+ "gelu",
+ "linear"
+ ]
+ },
+ "name": "conv_activation",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "elu",
+ "relu",
+ "gelu",
+ "linear"
+ ]
+ },
+ "name": "activation",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "huber_loss",
+ "mean_absolute_error",
+ "mse"
+ ]
+ },
+ "name": "loss",
+ "parameterType": "categorical"
+ },
+ {
+ "feasibleSpace": {
+ "max": "5",
+ "min": "1.2",
+ "step": "0.01"
+ },
+ "name": "final_activation_scale_factor",
+ "parameterType": "double"
+ },
+ {
+ "feasibleSpace": {
+ "max": "40",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "batch_size",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "max": "10",
+ "min": "1",
+ "step": "1"
+ },
+ "name": "epochs",
+ "parameterType": "int"
+ },
+ {
+ "feasibleSpace": {
+ "list": [
+ "2023-01-051310"
+ ]
+ },
+ "name": "time",
+ "parameterType": "categorical"
+ }
+ ]
+ },
+ "katib_run": true,
+ "pipeline_description": "gwas-10-d",
+ "pipeline_name": "gwas-10-d",
+ "snapshot_volumes": true,
+ "volumes": [
+ {
+ "annotations": [],
+ "mount_point": "/home/jovyan",
+ "name": "gwas-11-a-workspace-gxcvc",
+ "size": 5,
+ "size_type": "Gi",
+ "snapshot": false,
+ "type": "clone"
+ }
+ ]
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/README.md b/tutorials/notebooks/DL-gwas-gcp-example/README.md
new file mode 100644
index 0000000..ed3a8ce
--- /dev/null
+++ b/tutorials/notebooks/DL-gwas-gcp-example/README.md
@@ -0,0 +1,210 @@
+# Run a deep learning GWAS on soy height on Kubeflow Pipelines with Katib's Bayesian hyperparameter tuning. (Using minikf and the Kale Jupyter plugin)
+
+- David Thrower, NIEHS Office of Scientific Computing, Kelly Government Solutions
+
+## Acknowledgements:
+
+1. God for writing the laws of science we can all spend a lifetime trying to decipher and never fully understand.
+2. Jennifer, my better half, and my stepkids, for staying around through the storm we have all weathered because of my career in science.
+2. Yang Liu at University of Missouri and his colleagues, who developed the study which we are "partially" reproducing, in abbreviated form, to make a minimum viable demonstration. Below are the original paper and the original authors' source code. If you are a subject matter expert, I encourage you to extend and complete this, and to adapt it to your own work. That is what this is here for.
+ - https://www.frontiersin.org/articles/10.3389/fgene.2019.01091/full
+ - https://github.com/kateyliu/DL_gwas
+3. NIEHS Office of Scientific Computing
+ - Frank Day
+ - Kennith Grantham
+ - Greg Stamper
+ - Chris Stone
+ - Elizabeth Black
+ - Adam Burkholder
+ - Jason Baucom
+4. The NIH CIT GCP support community
+ - Danny Chester
+ - Thad Carlson
+ - Kyle O'Connell
+ - Hil Liao
+ - Teja Peddi
+5. GCP support community.
+ - Jeremiah Jenkins
+ - Cheryl Corman
+ - Alex, GCP Mexico City support team
+ - Diego, GCP Mexico City support team
+6. The KF open source community:
+ - Too many to mention
+ - https://www.kubeflow.org/docs/about/community
+7. Arrikto for the easy-to-deploy Kubeflow deployment and the low licensing fee for using this.
+ - Bogdan Kowalczyk
+ - Patrick Gryzan
+ - Constantinos Venetsanopoulos
+ - Amber Graner
+ - Chase Christensen
+ - Jimmy Guerrero
+
+## Summary:
+
+1. This task will deploy a deep learning (1D Convolutional Neural Network - MLP) GWAS experiment using the Kubeflow Pipelines machine learning framework. For a tutorial on exactly how 1D convolutional neural networks work, this is one of many recommended reads [0]. This specific algorithm is not the focus of this tutorial; our focus is how to use Kubeflow to run any machine learning experiment in your research. My intent is to introduce you to the basic operation of the Kubeflow features that are most relevant to a basic research analysis workflow and to empower you to make meaningful use of them. These tools, when used correctly, are user friendly and are, in my humble opinion, underutilized by the life sciences community. This framework is used by CERN in some of their research [1], and CERN is also part of the KF open source community. This experiment will leave you with a template and the skills to build your own workflow relevant to your research.
+2. This experiment should cost you $6 - $12 to run (given pricing at the time of this writing, 2022-12-22), assuming that:
+ 1. You run the experiment on the same data (or similar sized data) without modification
+ 2. You archive your experiment data and you delete the minikf deployment after the experiment has finished running. You can always spin up another deployment of minikf just as quickly for future experiments.
+3. Kubeflow is an open-source resource which runs machine learning experiments on Kubernetes, a distributed container orchestration cluster [2] that can run things reproducibly on any current server or cloud. This has features that we will explore such as:
+ 1. Kubeflow pipelines: Kubeflow pipelines can make complex ML workflows into a GUI-based, parameterized, sharable and reproducible operation. Parameterizing and packaging common bioinformatic and quantitative tasks into a pipeline may create a more ergonomic work environment for the scientific community and could accelerate the cycle of iterative experimentation.
+ 2. Katib: Katib is a hyperparameter tuner which repeatedly deploys pipelines with different parameters and searches for the parameter set that builds the best model. Neural Architecture Search (NAS) features are also available with updated versions of Katib. NAS is basically the pinnacle of automation in model building. This is a bit beyond the scope of the discussion, but if you are interested in NAS, you may read about it here [5].
+ 1. Here, we will use Bayesian hyperparameter tuning with a Gaussian process, which can usually find a reasonably optimized model for an ML workflow with far fewer trials than a grid search or random search. Grid search and random search are also available if your experiment can benefit from them.
+ 2. Trials are automatically executed in parallel in separate container execution environments on the Kubernetes cluster. Each one runs the entire pipeline on a different selection of arguments for the pipeline's parameters.
+ 3. It is also likely that the next generation of Neural Architecture Search algorithms in development will be integrated into the platform in coming years, which may eliminate the need for model building labor altogether for many classification and regression problems. There are already NAS algorithms / autoMLs in development that are outperforming XGBoost and best-realistic-case manually developed models on small data sets without the need to script different models and tune their hyperparameters. Nonetheless, data preparation, sampling, subject matter expertise on the data, and human intuition won't be automated any time soon, so don't worry, you should always have a job.
+ 3. Kubeflow also provides numerous other features that are out of scope for this tutorial, but may be useful for some users. Many KF features, plugins, and 3rd party tools provide services for tracking model provenance, serving predictions as a component of production systems, and monitoring model performance in production (e.g. monitoring for training-serving skew, model drift, etc.). These tools may be useful for those in the community who are trying to deploy production ML systems in a regulated environment, for example, those who are developing AI based medical or diagnostic systems, those developing models that review grant or job candidate applications and prioritize applications for review, and those developing models that participate in other administrative or financial decision making. There are many great resources that you can read up on at your leisure if this is of interest to you. GCP's Vertex AI Platform also provides a stable implementation of these features, but not all of the convenient features for AI research which we will explore here. Here we will focus on the research features and leave you with templates in which you can substitute your own data for the data we analyze here, or even swap out the deep learning training script for an XGBoost training task. See below some images of the actual task that this tutorial will **easily** deploy from a Jupyter notebook (in the special Jupyter environment we will provision).
+
+ ![assets/dl-gwas-headline-1.png](assets/dl-gwas-headline-1.png)
+ ![assets/dl-gwas-headline-2.png](assets/dl-gwas-headline-2.png)
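+
+To make the "far fewer trials than a gridsearch" point above concrete, here is a minimal sketch that counts how many trials an exhaustive grid would need. It assumes the ranges listed later in Step 5 for only six of the parameters (the learning rate and all l1 / l2 ranges are deliberately left out). The names `NUMERIC_RANGES`, `CATEGORICAL_CHOICES`, `grid_points` and `grid_size` are hypothetical helpers for this illustration and are not part of Katib or Kale.
+
+```python
+from decimal import Decimal
+from math import prod
+
+# Search ranges copied from Step 5 of this README: (min, max, step).
+NUMERIC_RANGES = {
+    "dropout_rate": ("0.01", "0.95", "0.1"),
+    "num_dense_layers": ("1", "5", "1"),
+    "batch_size": ("5", "35", "1"),
+    "epochs": ("1", "15", "1"),
+}
+# Categorical choices copied from Step 5 (four activation functions each).
+CATEGORICAL_CHOICES = {
+    "conv_activation": ["elu", "relu", "gelu", "linear"],
+    "activation": ["elu", "relu", "gelu", "linear"],
+}
+
+
+def grid_points(low, high, step):
+    """Count the values low, low + step, ... that do not exceed high."""
+    low, high, step = Decimal(low), Decimal(high), Decimal(step)
+    return int((high - low) // step) + 1
+
+
+def grid_size(numeric, categorical):
+    """Number of trials an exhaustive grid search would need."""
+    numeric_counts = [grid_points(*bounds) for bounds in numeric.values()]
+    categorical_counts = [len(choices) for choices in categorical.values()]
+    return prod(numeric_counts + categorical_counts)
+
+
+total = grid_size(NUMERIC_RANGES, CATEGORICAL_CHOICES)
+print(f"Exhaustive grid over 6 of the parameters: {total:,} trials")
+```
+
+With these ranges the count is 372,000 trials before the learning rate or any regularization range is added, compared with the 35 trials that Step 5 asks the Bayesian tuner to run.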
+
+## Step 1: Make sure that the required APIs are enabled.
+
+On GCP, many services are turned off by default for security reasons, because GCP supports many customers that work in regulated industries such as healthcare, defense, and financial services. Since we are working in a project that is designated for running academic experiments and is segregated from production resources, we can safely enable the services that are necessary for this experiment.
+
+1. Log into GCP with your NIH credentials at `https://cloud.google.com`. After that click *console* in the upper right corner of the page.
+2. Accept any terms of service if prompted.
+3. On the main GCP console menu on the left, find [APIs and services] > library.
+![assets/000-enable-apis.png](assets/000-enable-apis.png)
+4. On the search field that appears on the top of the screen, paste `serviceusage.googleapis.com`.
+5. Click *enable*.
+![assets/service-usage-api.png](assets/service-usage-api.png)
+6. Repeat these steps to find and enable `servicemanagement.googleapis.com`.
+![assets/service-management-api.png](assets/service-management-api.png)
+7. Repeat these steps to find and enable the Compute Engine API.
+![assets/enable-compute-engine.png](assets/enable-compute-engine.png)
+8. When you follow the downstream steps, you may be prompted to enable additional APIs. If so, repeat this as necessary.
+
+## Step 2: Deploy minikf kubeflow:
+
+Minikf is a small deployment of Kubeflow on a minikube Kubernetes instance. These minikube instances are single machine Kubernetes clusters that are meant for development use. For tasks this size, which don't need a true, highly available, distributed HPC cluster to run, this is ideal, as these are easy to deploy when needed and delete when not. At current pricing at the time of this writing, the optimal scale of machine we need to run this job costs about $0.50 an hour.
+
+1. On the main GCP console menu on the left, find the GCP marketplace.
+![ASSETS/001-marketplace.png](assets/001-marketplace.png)
+2. When the marketplace page opens, enter in its search bar "minikf".
+3. Click on *Arrikto minikf*.
+4. Set the configuration for your minikf deployment:
+ 1. Give it any suitable name you want.
+ 2. Also change the machine type to any machine in the N2 family or its successor, having **16 CPU and 64 GB RAM (n2-standard-16 at the time of this writing)**.
+ 3. Don't get sticker shock on the monthly price. This is what you would pay if you left this running continuously, which I don't recommend doing. Here, we will use this for a few hours, make a copy of important data, then delete the instance. The price is pro-rated to the hours used.
+ 4. Provisioning a GPU is not necessary, but a GPU will make the experiment complete much more quickly once it is deployed on the cluster. Unless you are already familiar with the platform, I would recommend experimenting first without the GPU. GPUs are a little expensive and it may take a little time to get the task deployed on the cluster until you are used to the setup. You probably don't want to pay for a GPU that's sitting idle while we fumble around the system.
+ 5. Leave all the other fields as default and click the *deploy* button at the bottom of the page.
+![assets/002-name-minikf.png](assets/002-name-minikf.png)
+![assets/n2-standard-16.png](assets/n2-standard-16.png)
+5. When the page that pops up tells you that the deployment was successful, click the button [ssh].
+ 1. A terminal should pop up. Pay attention to the browser for "pop up blocked" if this does not appear and enable popups from GCP if necessary.
+ ![assets/ssh-link.png](assets/ssh-link.png)
+ 2. When this terminal pops up, type: `gcloud services enable servicemanagement.googleapis.com`. If this throws errors, go back to [APIs and services] > [library] and enable `serviceusage.googleapis.com`. Then close the ssh terminal, open a new one, and retry `gcloud services enable servicemanagement.googleapis.com`. (This isn't a mistake in the instructions. Yes, we have to enable it here too, even though it is not obvious why.)
+ ![assets/enable-service-management.png](assets/enable-service-management.png)
+ 3. Type: `minikf` and press enter.
+ ![assets/run-minikf-startup.png](assets/run-minikf-startup.png)
+ 3. Wait for it to tell you that minikf is ready. If this freezes up with the caption at "exposing services 33% progress" for more than 10 minutes, submit a help ticket on the **NIH** service desk and reference minikf and kubeflow.
+6. When the terminal tells you minikf is ready, you may close the terminal and from the web page on GCP where you deployed this, click the link to the dashboard that the marketplace page on GCP gives you. Log in with the user and password that the page gives you.
+
+## Step 3: Create your Jupyter notebook:
+
+1. Click on *Notebooks* and click the "new" or "+" button.
+![assets/004-notebook.png](assets/004-notebook.png)
+![assets/x001-new-notebook.png](assets/x001-new-notebook.png)
+2. Configure the notebook to:
+ 1. any suitable name
+ 2. For docker image, pick any image having a name that includes both **"tensorflow"** and **"kale"**. Tensorflow is required for the script to run, and Kale will automatically compile and submit your pipeline for you.
+ 3. Provision this notebook with 4.5 CPU and 9 GB of RAM.
+ 4. No GPU.
+ 5. Leave the "Workspace Volume" as its default.
+ 6. For "Data volume", click "+" and leave what comes up as its default.
+ 7. Make sure "Allow access to Kubeflow Pipelines, Allow access to Rok" is selected.
+ 8. Make sure "enable shared memory" is selected. This allows all the workers running the jobs in parallel to use a flexible shared pool of memory.
+ 9. Click *Launch*.
+![assets/00-create-new-notebook1.png](assets/00-create-new-notebook1.png)
+![assets/01-r3-create-new-notebook2.png](assets/01-r3-create-new-notebook2.png)
+![assets/02-create-new-notebook3.png](assets/02-create-new-notebook3.png)
+ 9. When the green checkmark shows that the notebook is ready, click [connect].
+![assets/03-open-notebook.png](assets/03-open-notebook.png)
+
+## Step 4: Run the notebook in the notebooks environment.
+
+If you chose a jupyter tensorflow container above, this should have the Kale notebooks extension. Click on the extension and make sure it's enabled. Also note the control for "Hyperparameter tuning with Katib". Leave this off for now, but we will revisit this one soon:
+
+![assets/05-enable-kale.png](assets/05-enable-kale.png)
+
+1. Upload the notebook "a1gwaskaleversion.ipynb" from this folder and the sequence file "IMP_height.txt" to the notebooks environment. You can do this in one of 2 ways:
+ 1. Open a Jupyter terminal and run `git clone [the url for this repo]`, then `cd ...` into this directory. Note that the url you will use will be [ADD URL], since I was copying the notebook from a development copy of the repo.
+ ![assets/x004-launch-terminal.png](assets/x004-launch-terminal.png)
+ ![assets/x003-git-clone.png](assets/x003-git-clone.png)
+ 2. Download these 2 files, then upload them from the Jupyter files UI this way:
+ ![assets/04-upload-notebook-and-data.png](assets/04-upload-notebook-and-data.png)
+2. The kale notebook will add a selector to the right of each cell. When you click it, a menu opens above the cell. You must make a selection from this for each cell before you ask Kale to submit your job. This will tell Kale what purpose the cell serves in the pipeline [3] and how it should contextualize the cell's content. On the first 2 cells, select "skip".
+![assets/click-set-cell-kind.png](assets/click-set-cell-kind.png)
+![assets/x006-skip.png](assets/x006-skip.png)
+3. Run the first cell. It will install other libraries and the updated version of Tensorflow, which enables XLA and should make the job run much more quickly. Since the options to use XLA are selected, the job will error out if Tensorflow is not updated.
+ - If you are curious what XLA is and want to read on: XLA, or Accelerated Linear Algebra, is a compiler for linear algebra computations rather than a hardware feature. It fuses a chain of mathematical operations into a single compiled kernel, so that intermediate results do not have to be written to memory and read back, instead of the traditional way we do math where we complete the first operation, save the result of it, and then perform the second operation on the saved output of the first one. For example, a multiplication followed by an addition can be executed as one fused step. This is usually much faster than the unfused approach, but the downside is that the computation needs to be compiled for the specific machine it runs on. The newer versions of Tensorflow have an option that lets you tell Tensorflow you want it to "JIT" compile its operations for XLA. JIT compiling can also apply other optimizations, such as "function inlining", which basically means that the compiler does some specific refactoring of the code in the background to make it run faster. If you are curious to read more about this, you can here [4] ...
+4. Run the second cell. **Wait for a popup to appear asking you if you want to restart the kernel and click [ok] before you run the cells after this.** It will restart the Jupyter kernel, so the upgrades become effective. If you run the next cell before clicking OK, the notebook will freeze up, and you will need to restart the kernel yourself.
+![assets/x007-wait.png](assets/x007-wait.png)
+![assets/x008-restart-ok.png](assets/x008-restart-ok.png)
+5. For the remaining cells:
+ - Set the 3rd cell as imports and run it.
+ - Set the 4th cell as "pipeline parameters" and run it.
+ ![assets/updated-pipeline-params.png](assets/updated-pipeline-params.png)
+ - Set the 5th cell as "a pipeline step". In the example, I named it "preprocessing", but you can name it anything you want. Leave the depends on field blank.
+ ![assets/x009-preprocessing.png](assets/x009-preprocessing.png)
+ - For the 6th cell, set it as a pipeline step. In the example, I named it "train". In this one, set the depends on field to the name you assigned to the last cell. The purpose of the depends on field is to set the order of workflow steps. By default, Kubeflow always runs each step in a separate execution environment on the cluster, and in addition to this, unless you tell it to do otherwise, Kubeflow will also try to run all workflow steps in parallel. If one step depends on the output of another step as its input, running them in parallel will fail. That is the case here, because the first step of the workflow is a data preprocessing step. It takes in one tab-delimited sequence file, train-test splits it, reformats it to be amenable to a 1-d convolutional layer, and then saves the train data, train labels, test data, and test labels in separate files. The second step trains and tunes a neural network on the data which the first step prepared. Since the second step needs files that the first step will create, we can't run these in parallel. The data preprocessing must be completed first.
+ ![assets/08-pipeline-step.png](assets/08-pipeline-step.png)
+ - For the last step, set this as "pipeline metrics". This will return the metric "val_mean_absolute_error" which the Katib tuner will try to minimize.
+ ![assets/09-pipeline-metrics.png](assets/09-pipeline-metrics.png)
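+
+The role of the depends on field described above can be sketched in a few lines of plain Python. This is only an illustration of the ordering idea, not how Kale or Kubeflow schedule steps internally. The function `execution_waves` is a hypothetical helper, and the step names are the ones used in this walkthrough.
+
+```python
+def execution_waves(depends_on):
+    """Group steps into waves; steps in the same wave may run in parallel.
+
+    depends_on maps each step name to the set of steps it must wait for.
+    """
+    remaining = {step: set(deps) for step, deps in depends_on.items()}
+    waves = []
+    while remaining:
+        # A step is ready once every step it depends on has finished.
+        ready = sorted(step for step, deps in remaining.items() if not deps)
+        if not ready:
+            raise ValueError("circular dependency between steps")
+        waves.append(ready)
+        for step in ready:
+            del remaining[step]
+        for deps in remaining.values():
+            deps.difference_update(ready)
+    return waves
+
+
+# "train" lists "preprocessing" in its depends on field, so it waits.
+with_dependency = execution_waves(
+    {"preprocessing": set(), "train": {"preprocessing"}}
+)
+# With the field left blank on both steps, nothing forces an order.
+without_dependency = execution_waves({"preprocessing": set(), "train": set()})
+
+print(with_dependency)
+print(without_dependency)
+```
+
+In the second case both steps land in the same wave, which is the situation where "train" would look for files that "preprocessing" has not written yet.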
+
+## Step 5: Configure Katib, compile your job, and submit it. (last step, and Kubeflow has it from here.):
+
+1. In the Kale panel to the left: Enable the hyperparameter tuning with Katib. A menu will open.
+2. Click *Set up Katib Job*.
+ 1. Check the box for each of the pipeline parameters and set the following:
+ - All l1 and l2 regularizations: 0.0000001 to 0.9
+ - learning_rate: min 0.0001 max 0.3 step 0.001 # A small enough learning rate avoids overshooting an optimal solution when adjusting the kernel weights (weights are the "m" in the y = mx + b model). A neural network is just a chain of the form m3(m2(m1x + b1) + b2) + b3, with an activation function nested between each layer that adds non-linearity so the chain can't collapse to a single layer.
+ - dropout_rate: min 0.01 max 0.95 step 0.1 # Prevents overfitting; allows a more complex and robust model to be used without overfitting.
+ - num_dense_layers: min 1 max 5 step 1
+ - conv_activation and activation: elu, relu, gelu, linear (which activation function will be used. As a note, most textbook examples teach you to use relu. Elu is often said, in an out of context way, to be "more computationally expensive". In reality, models trained with elu usually reach convergence within a few epochs, sometimes in as few as 1/20 as many epochs, making the higher per-epoch computational expense a moot problem in many cases, unless the ceiling of available RAM and GPU power is barely able to compute the gradient for your task with relu. Elu usually wins out, but we will let the tuner figure out what works best in this specific case.)
+ - batch_size: min 5 max 35 step 1 # If you run into "out of memory" errors, then reduce the maximum.
+ - epochs: min 1 max 15 step 1
+ ![assets/10-setup-katib.png](assets/10-setup-katib.png)
+ 2. Select the Bayesian > Gaussian tuner. Set it to run 3 trials in parallel and 35 trials in total. (The tuner will probably converge on an optimum well before that, but just in case your run does a little better than mine ...) Leave the tuner's other settings as the defaults, unless you are familiar with them and have recommendations on how we may improve its performance. If you do, open an issue and pull request on Github so we can make this better as we go.
+ 3. Click *Close*.
+ 4. Select advanced options for the storage.
+ - Advanced settings
+ - **Make sure "Docker image" is blank.**
+ - Storage class default
+ - Read write many
+ - Use notebook volume
+ - Take ROK snapshot during each step
+ - 5GB
+ - Second volume: None
+ - Everything else defaults
+
+ ![assets/11-r2-setup-job.png](assets/11-r2-setup-job.png)
+
+ 5. Click [Compile and Run Katib Job]
+ 6. Confirm it submitted successfully.
+ 7. Click on the browser tab where main Kubeflow dashboard is open.
+ - Click on the *Runs* tab.
+ ![assets/xx-0002-pick-a-run.png](assets/xx-0002-pick-a-run.png)
+ - Once you have opened a run, click on a pipeline step.
+ ![assets/xx0003-pick-pipeline-step.png](assets/xx0003-pick-pipeline-step.png)
+ - On a step, click its *logs*.
+ ![assets/xx0006-step-logs.png](assets/xx0006-step-logs.png)
+ - There are a few other useful tabs here, like the inputs and outputs of a step, which list the hyperparameters and metrics which a step returned.
+ ![assets/xx0007-step-io.png](assets/xx0007-step-io.png)
+ - Click on experiments automl (Katib), then click on *your experiment*.
+ ![assets/xx0001-navigate-to-experiment.png](assets/xx0001-navigate-to-experiment.png)
+ - Once the first 3 runs have completed, return here. Notice the parallel coordinates visualization that shows the patterns in the parameters chosen for the trials and which subset of the hyperparameter space is contributing to the best model performance. Once several trials have completed, this will be very informative. The one in this example is a bit cluttered, I admit, as it has so many hyperparameters. For most tasks, however, this visualization will make it clear what is tending to work and what isn't.
+ ![assets/13-parallel-coords.png](assets/13-parallel-coords.png)
+ - Feel free to find something else to do for about 2 - 3 hours. Kubeflow and Katib will continue running the experiment even if you are not logged in. In 2 - 3 hours, you can return to the [experiments AutoML] page, and it will provide the best hyperparameters it found among the trials and the validation mean absolute error for the best trial. You can run the train task with these parameters and get the model to use for further work. Alternatively, you can set the option [make a rok image of each trial]. Visit these pages and see your results. Take screenshots. (In Windows, search for the snipping tool. On a Mac, press "[command] + [shift] + 4"). Write down the best parameters it found. You will need these in the second notebook.
+ ![assets/14-successful-katib-run.png](assets/14-successful-katib-run.png)
+ ![assets/x002-final-results-page.png](assets/x002-final-results-page.png)
+ - Once you are sure you have a copy of everything you want, go back to the page in the deployment manager and delete the deployment, so you don't keep being billed for it. We recommend selecting the option to delete everything "disks, VPC settings, ...". The SSD disks that this uses are a little expensive.
+ ![assets/stop-instance.png](assets/stop-instance.png)
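+
+As a side note on the learning_rate comment in Step 5, which describes a neural network as a chain of y = mx + b layers with activation functions in between, the following minimal sketch shows why the activation matters. The weights and biases are arbitrary illustrative numbers, not values from the model in this tutorial, and the helper names are hypothetical.
+
+```python
+def linear(m, b):
+    """Return the layer x -> m * x + b."""
+    return lambda x: m * x + b
+
+
+def relu(x):
+    """Rectified linear unit, one of the activations offered in Step 5."""
+    return max(0.0, x)
+
+
+# Arbitrary illustrative weights and biases (not taken from the model).
+layer_1 = linear(2.0, -1.0)
+layer_2 = linear(-3.0, 0.5)
+
+
+def stacked_without_activation(x):
+    return layer_2(layer_1(x))
+
+
+def stacked_with_relu(x):
+    return layer_2(relu(layer_1(x)))
+
+
+# Two stacked linear layers equal one linear layer with
+# m = m2 * m1 and b = m2 * b1 + b2.
+collapsed = linear(-3.0 * 2.0, -3.0 * -1.0 + 0.5)
+
+inputs = (-2.0, 0.0, 1.0, 3.0)
+for x in inputs:
+    assert stacked_without_activation(x) == collapsed(x)
+
+# With ReLU in between, the slope changes where layer_1 crosses zero
+# (x = 0.5), so no single y = mx + b line reproduces the outputs.
+print([stacked_with_relu(x) for x in inputs])
+```
+
+Without the activation, adding layers adds no modelling power, because the whole chain is still one straight line. The tuner's choice between elu, relu, gelu and linear therefore changes what the network can represent.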
+
+## References
+
+- [0] https://colab.research.google.com/github/kmkarakaya/ML_tutorials/blob/master/Conv1d_Predict_house_prices.ipynb
+- [1] https://www.epj-conferences.org/articles/epjconf/abs/2021/05/epjconf_chep2021_02067/epjconf_chep2021_02067.html
+- [2] https://www.kubeflow.org/docs/started/introduction/
+- [3] https://docs.arrikto.com/features/pipelines/jupyterlab/cell-types.html#imports-cells
+- [4] https://www.greenend.org.uk/rjk/tech/inline.html
+- [5] https://en.wikipedia.org/wiki/Neural_architecture_search
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/00-create-new-notebook1.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/00-create-new-notebook1.png
new file mode 100644
index 0000000..894f63e
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/00-create-new-notebook1.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/000-enable-apis.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/000-enable-apis.png
new file mode 100644
index 0000000..4c05358
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/000-enable-apis.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/001-marketplace.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/001-marketplace.png
new file mode 100644
index 0000000..fbb045a
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/001-marketplace.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/002-name-minikf.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/002-name-minikf.png
new file mode 100644
index 0000000..8068370
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/002-name-minikf.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/003-pick-machine.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/003-pick-machine.png
new file mode 100644
index 0000000..8395ba6
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/003-pick-machine.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/004-notebook.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/004-notebook.png
new file mode 100644
index 0000000..2995a72
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/004-notebook.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/01-create-new-notebook2.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/01-create-new-notebook2.png
new file mode 100644
index 0000000..5b8ae51
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/01-create-new-notebook2.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/01-r2-create-new-notebook2.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/01-r2-create-new-notebook2.png
new file mode 100644
index 0000000..db6f110
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/01-r2-create-new-notebook2.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/01-r3-create-new-notebook2.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/01-r3-create-new-notebook2.png
new file mode 100644
index 0000000..e812cc8
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/01-r3-create-new-notebook2.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/02-create-new-notebook3.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/02-create-new-notebook3.png
new file mode 100644
index 0000000..06f5dd5
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/02-create-new-notebook3.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/03-open-notebook.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/03-open-notebook.png
new file mode 100644
index 0000000..2ab5d98
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/03-open-notebook.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/04-upload-notebook-and-data.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/04-upload-notebook-and-data.png
new file mode 100644
index 0000000..880532e
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/04-upload-notebook-and-data.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/05-enable-kale.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/05-enable-kale.png
new file mode 100644
index 0000000..29b141f
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/05-enable-kale.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/06-pipeline-parameters.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/06-pipeline-parameters.png
new file mode 100644
index 0000000..5ee06bd
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/06-pipeline-parameters.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/07-pipeline-parameters-katib.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/07-pipeline-parameters-katib.png
new file mode 100644
index 0000000..a2b8dba
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/07-pipeline-parameters-katib.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/08-pipeline-step.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/08-pipeline-step.png
new file mode 100644
index 0000000..441d296
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/08-pipeline-step.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/09-pipeline-metrics.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/09-pipeline-metrics.png
new file mode 100644
index 0000000..22882b8
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/09-pipeline-metrics.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/10-setup-katib.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/10-setup-katib.png
new file mode 100644
index 0000000..7439df1
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/10-setup-katib.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/11-r2-setup-job.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/11-r2-setup-job.png
new file mode 100644
index 0000000..779bd62
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/11-r2-setup-job.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/11-setup-job.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/11-setup-job.png
new file mode 100644
index 0000000..0741436
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/11-setup-job.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/12-compare.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/12-compare.png
new file mode 100644
index 0000000..6cf1500
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/12-compare.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/13-parallel-coords.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/13-parallel-coords.png
new file mode 100644
index 0000000..7c36105
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/13-parallel-coords.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/14-successful-katib-run.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/14-successful-katib-run.png
new file mode 100644
index 0000000..be7b4ed
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/14-successful-katib-run.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/click-set-cell-kind.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/click-set-cell-kind.png
new file mode 100644
index 0000000..5e5add7
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/click-set-cell-kind.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/dl-gwas-headline-1.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/dl-gwas-headline-1.png
new file mode 100644
index 0000000..3c6592e
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/dl-gwas-headline-1.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/dl-gwas-headline-2.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/dl-gwas-headline-2.png
new file mode 100644
index 0000000..c3a300f
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/dl-gwas-headline-2.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/enable-compute-engine.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/enable-compute-engine.png
new file mode 100644
index 0000000..193ab59
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/enable-compute-engine.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/enable-service-management.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/enable-service-management.png
new file mode 100644
index 0000000..4985b25
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/enable-service-management.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/n2-standard-16.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/n2-standard-16.png
new file mode 100644
index 0000000..2776a65
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/n2-standard-16.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/old-11-r2-setup-job.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/old-11-r2-setup-job.png
new file mode 100644
index 0000000..120d88b
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/old-11-r2-setup-job.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/old-x002-final-results-page.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/old-x002-final-results-page.png
new file mode 100644
index 0000000..a21da27
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/old-x002-final-results-page.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/run-minikf-startup.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/run-minikf-startup.png
new file mode 100644
index 0000000..018c4d2
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/run-minikf-startup.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/service-management-api.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/service-management-api.png
new file mode 100644
index 0000000..e4d0ef5
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/service-management-api.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/service-usage-api.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/service-usage-api.png
new file mode 100644
index 0000000..d07c49c
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/service-usage-api.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/ssh-link.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/ssh-link.png
new file mode 100644
index 0000000..f51a3a3
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/ssh-link.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/stop-instance.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/stop-instance.png
new file mode 100644
index 0000000..e039784
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/stop-instance.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/updated-pipeline-params.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/updated-pipeline-params.png
new file mode 100644
index 0000000..f4453bd
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/updated-pipeline-params.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x001-new-notebook.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x001-new-notebook.png
new file mode 100644
index 0000000..951d6d5
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x001-new-notebook.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x002-final-results-page.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x002-final-results-page.png
new file mode 100644
index 0000000..b1fe14d
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x002-final-results-page.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x003-git-clone.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x003-git-clone.png
new file mode 100644
index 0000000..6f5ea89
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x003-git-clone.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x004-launch-terminal.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x004-launch-terminal.png
new file mode 100644
index 0000000..17d4e95
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x004-launch-terminal.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x006-skip.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x006-skip.png
new file mode 100644
index 0000000..015afdb
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x006-skip.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x007-wait.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x007-wait.png
new file mode 100644
index 0000000..0c62041
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x007-wait.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x008-restart-ok.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x008-restart-ok.png
new file mode 100644
index 0000000..2faeff7
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x008-restart-ok.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/x009-preprocessing.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/x009-preprocessing.png
new file mode 100644
index 0000000..49a8839
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/x009-preprocessing.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/xx-0002-pick-a-run.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx-0002-pick-a-run.png
new file mode 100644
index 0000000..888224b
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx-0002-pick-a-run.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0001-navigate-to-experiment.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0001-navigate-to-experiment.png
new file mode 100644
index 0000000..540af7e
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0001-navigate-to-experiment.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0003-pick-pipeline-step.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0003-pick-pipeline-step.png
new file mode 100644
index 0000000..5560c63
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0003-pick-pipeline-step.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0006-step-logs.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0006-step-logs.png
new file mode 100644
index 0000000..e5fb1d2
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0006-step-logs.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0007-step-io.png b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0007-step-io.png
new file mode 100644
index 0000000..42f02b7
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/assets/xx0007-step-io.png differ
diff --git a/tutorials/notebooks/DL-gwas-gcp-example/nb_assets/x002-final-results-page.png b/tutorials/notebooks/DL-gwas-gcp-example/nb_assets/x002-final-results-page.png
new file mode 100644
index 0000000..a21da27
Binary files /dev/null and b/tutorials/notebooks/DL-gwas-gcp-example/nb_assets/x002-final-results-page.png differ
diff --git a/tutorials/notebooks/GWASCoatColor/GWAS_coat_color.ipynb b/tutorials/notebooks/GWASCoatColor/GWAS_coat_color.ipynb
new file mode 100644
index 0000000..3aac3ce
--- /dev/null
+++ b/tutorials/notebooks/GWASCoatColor/GWAS_coat_color.ipynb
@@ -0,0 +1,394 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "7a244bb3",
+ "metadata": {},
+ "source": [
+ "# GWAS in the cloud\n",
+ "We adapted the NIH CFDE tutorial from [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/background/) and fit it to a notebook. We have greatly simplified the instructions, so if you need or want more details, look at the full tutorial to find out more.\n",
+ "Most of this notebook is written in bash but expects a Python kernel. At step 3 (plotting), you will need to switch your kernel to R."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8fbf6304",
+ "metadata": {},
+ "source": [
+ "## 1. Setup\n",
+ "### Download the data\n",
+ "Use `%%bash` at the top of a cell to run the whole cell as bash. You can also prefix a command with `!` to run a single shell command within a Python notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8ec900bd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bash\n",
+ "mkdir -p GWAS\n",
+ "curl -LO https://de.cyverse.org/dl/d/E0A502CC-F806-4857-9C3A-BAEAA0CCC694/pruned_coatColor_maf_geno.vcf.gz\n",
+ "curl -LO https://de.cyverse.org/dl/d/3B5C1853-C092-488C-8C2F-CE6E8526E96B/coatColor.pheno"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4d43ae73",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bash\n",
+ "mv *.gz GWAS\n",
+ "mv *.pheno GWAS\n",
+ "ls GWAS"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28aadbf8",
+ "metadata": {},
+ "source": [
+ "### Install dependencies\n",
+ "Here we install mamba, which is faster than conda. It can be tricky to add mamba to your PATH in a SageMaker notebook, so a later cell appends the install directory to `PATH` from Python. You could also skip this install and just use conda, since that is preinstalled in the kernel."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b3ba3eef",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Miniforge3 bundles mamba; install it to $HOME/mambaforge so later cells find it\n",
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh\n",
+ "! bash Miniforge3-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f22059df-5a9c-4982-9b2f-bd15ce746bb2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Add the mamba install directory to your PATH\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b219074a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! mamba install -y -c conda-forge -c bioconda plink vcftools"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3de2fc4c",
+ "metadata": {},
+ "source": [
+ "## 2. Analyze"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "013d960d",
+ "metadata": {},
+ "source": [
+ "### Make map and ped files from the vcf file to feed into plink"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e91c7a01",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%cd GWAS"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6570875d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --plink --out coatColor"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b9a38761",
+ "metadata": {},
+ "source": [
+ "### Create a list of minor alleles\n",
+ "\n",
+ "For more info on these terms, look at step 2 [here](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud/analyze/)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6c868a67",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Decompress the VCF: --recode writes pruned_coatColor_maf_geno.recode.vcf\n",
+ "! vcftools --gzvcf pruned_coatColor_maf_geno.vcf.gz --recode --out pruned_coatColor_maf_geno"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8e11f991",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Create a list of minor alleles: variant ID (chrom:pos when the ID is missing) and ALT allele\n",
+ "! awk 'BEGIN{FS=\"\\t\";OFS=\"\\t\";}/^#/{next;}{{if($3==\".\")$3=$1\":\"$2;}print $3,$5;}' pruned_coatColor_maf_geno.recode.vcf > minor_alleles"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8cff47e3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! head minor_alleles"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "56d901c7",
+ "metadata": {},
+ "source": [
+ "### Run quality controls"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dafa14a6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Calculate missingness per individual (.imiss) and per locus (.lmiss)\n",
+ "! plink --file coatColor --make-pheno coatColor.pheno \"yellow\" --missing --out miss_stat --noweb --dog --reference-allele minor_alleles --allow-no-sex --adjust"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5cf5f51b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Take a look at .lmiss, which holds the per-locus missingness rates\n",
+ "! head miss_stat.lmiss"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "915bb263",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Peek at .imiss, which holds the per-individual missingness rates\n",
+ "! head miss_stat.imiss"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4c11ca71",
+ "metadata": {},
+ "source": [
+ "### Convert to plink binary format"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3b8f2d7f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! plink --file coatColor --allow-no-sex --dog --make-bed --noweb --out coatColor.binary"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e36f6cd7",
+ "metadata": {},
+ "source": [
+ "### Run a simple association step (the GWAS part!)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f926ef9b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! plink --bfile coatColor.binary --make-pheno coatColor.pheno \"yellow\" --assoc --reference-allele minor_alleles --allow-no-sex --adjust --dog --noweb --out coatColor"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b397d484",
+ "metadata": {},
+ "source": [
+ "### Identify statistical cutoffs\n",
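+ "This code finds the unadjusted p values that correspond to adjusted p values of 0.05 and 0.01 in `coatColor.assoc.adjusted`. For each threshold, it takes the first row whose adjusted p value (column 10) is at least the threshold and stores that row's unadjusted p value (column 3). The negative log10 of these cutoffs gives the heights of the horizontal lines in the Manhattan plot, which help visualize haplotypes that cross the 0.05 and 0.01 statistical thresholds (i.e. have a statistically significant association with yellow coat color)."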
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b94e1e2a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%bash\n",
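+ "unad_cutoff_sug=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.05' | head -n1 | awk '{print $3}')\n",
+ "unad_cutoff_conf=$(tail -n+2 coatColor.assoc.adjusted | awk '$10>=0.01' | head -n1 | awk '{print $3}')\n",
+ "# Print the cutoffs; shell variables do not persist after this cell\n",
+ "echo \"unadjusted p cutoffs: suggestive=$unad_cutoff_sug confident=$unad_cutoff_conf\""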
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1f52e97c",
+ "metadata": {},
+ "source": [
+ "## 3. Plotting\n",
+ "In this tutorial, plotting is done in R, so at this point you can change your kernel to R in the top right. Wait for it to say 'idle' in the bottom left, then continue. You could instead plot with Python packages and keep the Python kernel."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "effb5acd",
+ "metadata": {},
+ "source": [
+ "### Install qqman"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "60feed89",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "install.packages('qqman', repos='https://cloud.r-project.org')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d3f1fcd2",
+ "metadata": {},
+ "source": [
+ "### Run the plotting function"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a7e8cd2b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Make sure you are in the GWAS directory; changing the kernel may reset the working directory to home\n",
+ "if (basename(getwd()) != 'GWAS') setwd('GWAS')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7946a3a7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "library(qqman)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0d28ef2c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "data <- read.table(\"coatColor.assoc\", header=TRUE)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8e5207be",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Drop rows with a missing p value\n",
+ "data <- data[!is.na(data$P),]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6330b1e0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# suggestiveline and genomewideline are heights on the -log10(p) scale\n",
+ "manhattan(data, p = \"P\", col = c(\"blue4\", \"orange3\"),\n",
+ " suggestiveline = 12,\n",
+ " genomewideline = 15,\n",
+ " chrlabs = c(1:38, \"X\"), annotateTop=TRUE, cex = 1.2)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "26787d84",
+ "metadata": {},
+ "source": [
+ "In our graph, haplotypes in four parts of the genome (chromosomes 2, 5, 28, and X) are found to be associated with an increased occurrence of the yellow coat color phenotype.\n",
+ "\n",
+ "The top associated mutation is a nonsense SNP in the gene MC1R, which is known to control pigment production. The MC1R allele encoding yellow coat color contains a single base change (from C to T) at the 916th nucleotide."
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-11.m110",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m110"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/GenAI/GCP_GenAI_Huggingface.ipynb b/tutorials/notebooks/GenAI/GCP_GenAI_Huggingface.ipynb
new file mode 100644
index 0000000..09b3d8d
--- /dev/null
+++ b/tutorials/notebooks/GenAI/GCP_GenAI_Huggingface.ipynb
@@ -0,0 +1,2590 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "92021b22-3fbf-4489-9e88-aea4d73f3529",
+ "metadata": {},
+ "source": [
+ "# Fine-tuning and Deploying Hugging Face Models on Vertex AI"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dfbc2ea5-1aca-4322-a15c-7bcbb3925ef6",
+ "metadata": {},
+ "source": [
+ "For this tutorial, we recommend using 1 GPU to speed up processing. This notebook was run using the machine type n1-highcpu-8 (8 vCPUs, 7.199 GB RAM) with a TensorFlow environment. To set up a notebook that uses GPUs, see [Spinning up a Vertex AI Notebook](../../../docs/vertexai.md).\n",
+ "\n",
+ "This tutorial focuses on Hugging Face, a repository where users share and download machine learning models, datasets, and demos. We will load a model and a dataset from Hugging Face, then train and test the model before deploying it on Vertex AI. The model we will deploy is Flan T5, and the dataset is [ccdv/pubmed-summarization](https://HuggingFace.co/datasets/ccdv/pubmed-summarization). The steps show how to tune a model locally and how to launch a custom training job on Vertex AI Training. They are based on the Keras NLP tutorial for [abstractive summarization](https://keras.io/examples/nlp/t5_hf_summarization/).\n",
+ "\n",
+ "You may be wondering why we are training a pretrained model. We are fine-tuning the pretrained model to get the best performance on a particular application, in our case summarizing scientific documents. This step is no longer always necessary, because newer approaches such as zero-shot learning, which we cover in our next tutorial, can also improve model performance."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6ab3668c-58c0-489a-aa4f-6f7e045b450f",
+ "metadata": {},
+ "source": [
+ "## Install Tools"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "161dea96-601e-4194-a285-51b0c4403e9d",
+ "metadata": {},
+ "source": [
+ "Hugging Face **transformers** is an open-source framework that provides APIs and tools to download pretrained models, set hyperparameters, tokenize datasets, and further tune models to suit your needs. Here we update Vertex AI and install the transformers package along with **datasets**, which gives us access to Hugging Face datasets. As a bonus, we add the S3 feature to help download datasets that may already be in an S3 bucket."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "a6e5884b-ac90-42d4-aafd-d34d5495d24d",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Collecting transformers\n",
+ " Obtaining dependency information for transformers from https://files.pythonhosted.org/packages/c1/bd/f64d67df4d3b05a460f281defe830ffab6d7940b7ca98ec085e94e024781/transformers-4.34.1-py3-none-any.whl.metadata\n",
+ " Downloading transformers-4.34.1-py3-none-any.whl.metadata (121 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m121.5/121.5 kB\u001b[0m \u001b[31m8.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hCollecting datasets\n",
+ " Obtaining dependency information for datasets from https://files.pythonhosted.org/packages/7c/55/b3432f43d6d7fee999bb23a547820d74c48ec540f5f7842e41aa5d8d5f3a/datasets-2.14.6-py3-none-any.whl.metadata\n",
+ " Downloading datasets-2.14.6-py3-none-any.whl.metadata (19 kB)\n",
+ "Collecting rouge_score\n",
+ " Downloading rouge_score-0.1.2.tar.gz (17 kB)\n",
+ " Preparing metadata (setup.py) ... \u001b[?25ldone\n",
+ "\u001b[?25hCollecting evaluate\n",
+ " Obtaining dependency information for evaluate from https://files.pythonhosted.org/packages/70/63/7644a1eb7b0297e585a6adec98ed9e575309bb973c33b394dae66bc35c69/evaluate-0.4.1-py3-none-any.whl.metadata\n",
+ " Downloading evaluate-0.4.1-py3-none-any.whl.metadata (9.4 kB)\n",
+ "Collecting keras_nlp\n",
+ " Obtaining dependency information for keras_nlp from https://files.pythonhosted.org/packages/37/d4/dfd85606db811af2138e97fc480eb7ed709042dd96dd453868bede0929fe/keras_nlp-0.6.2-py3-none-any.whl.metadata\n",
+ " Downloading keras_nlp-0.6.2-py3-none-any.whl.metadata (7.2 kB)\n",
+ "Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from transformers) (3.12.4)\n",
+ "Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)\n",
+ " Obtaining dependency information for huggingface-hub<1.0,>=0.16.4 from https://files.pythonhosted.org/packages/ef/b5/b6107bd65fa4c96fdf00e4733e2fe5729bb9e5e09997f63074bb43d3ab28/huggingface_hub-0.18.0-py3-none-any.whl.metadata\n",
+ " Downloading huggingface_hub-0.18.0-py3-none-any.whl.metadata (13 kB)\n",
+ "Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.10/site-packages (from transformers) (1.23.5)\n",
+ "Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.10/site-packages (from transformers) (23.1)\n",
+ "Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.10/site-packages (from transformers) (6.0.1)\n",
+ "Requirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.10/site-packages (from transformers) (2023.8.8)\n",
+ "Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from transformers) (2.31.0)\n",
+ "Collecting tokenizers<0.15,>=0.14 (from transformers)\n",
+ " Obtaining dependency information for tokenizers<0.15,>=0.14 from https://files.pythonhosted.org/packages/a7/7b/c1f643eb086b6c5c33eef0c3752e37624bd23e4cbc9f1332748f1c6252d1/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)\n",
+ "Collecting safetensors>=0.3.1 (from transformers)\n",
+ " Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/20/4e/878b080dbda92666233ec6f316a53969edcb58eab1aa399a64d0521cf953/safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)\n",
+ "Requirement already satisfied: tqdm>=4.27 in /opt/conda/lib/python3.10/site-packages (from transformers) (4.66.1)\n",
+ "Requirement already satisfied: pyarrow>=8.0.0 in /opt/conda/lib/python3.10/site-packages (from datasets) (9.0.0)\n",
+ "Requirement already satisfied: dill<0.3.8,>=0.3.0 in /opt/conda/lib/python3.10/site-packages (from datasets) (0.3.1.1)\n",
+ "Requirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from datasets) (2.0.3)\n",
+ "Collecting xxhash (from datasets)\n",
+ " Obtaining dependency information for xxhash from https://files.pythonhosted.org/packages/80/8a/1dd41557883b6196f8f092011a5c1f72d4d44cf36d7b67d4a5efe3127949/xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)\n",
+ "Collecting multiprocess (from datasets)\n",
+ " Obtaining dependency information for multiprocess from https://files.pythonhosted.org/packages/35/a8/36d8d7b3e46b377800d8dec47891cdf05842d1a2366909ae4a0c89fbc5e6/multiprocess-0.70.15-py310-none-any.whl.metadata\n",
+ " Downloading multiprocess-0.70.15-py310-none-any.whl.metadata (7.2 kB)\n",
+ "Requirement already satisfied: fsspec[http]<=2023.10.0,>=2023.1.0 in /opt/conda/lib/python3.10/site-packages (from datasets) (2023.9.2)\n",
+ "Requirement already satisfied: aiohttp in /opt/conda/lib/python3.10/site-packages (from datasets) (3.8.5)\n",
+ "Requirement already satisfied: absl-py in /opt/conda/lib/python3.10/site-packages (from rouge_score) (1.4.0)\n",
+ "Collecting nltk (from rouge_score)\n",
+ " Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.5/1.5 MB\u001b[0m \u001b[31m80.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hRequirement already satisfied: six>=1.14.0 in /opt/conda/lib/python3.10/site-packages (from rouge_score) (1.16.0)\n",
+ "Collecting responses<0.19 (from evaluate)\n",
+ " Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
+ "Requirement already satisfied: keras-core in /opt/conda/lib/python3.10/site-packages (from keras_nlp) (0.1.7)\n",
+ "Requirement already satisfied: rich in /opt/conda/lib/python3.10/site-packages (from keras_nlp) (13.5.3)\n",
+ "Requirement already satisfied: dm-tree in /opt/conda/lib/python3.10/site-packages (from keras_nlp) (0.1.8)\n",
+ "Collecting tensorflow-text (from keras_nlp)\n",
+ " Obtaining dependency information for tensorflow-text from https://files.pythonhosted.org/packages/0b/5f/8b301d2d0cea8334c22aaeb8880ce115ec34d7eba20f7b08c64202011a85/tensorflow_text-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading tensorflow_text-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.9 kB)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (23.1.0)\n",
+ "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (3.2.0)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (6.0.4)\n",
+ "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (4.0.3)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (1.9.2)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (1.4.0)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets) (1.3.1)\n",
+ "Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.16.4->transformers) (4.5.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->transformers) (3.4)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->transformers) (1.26.16)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->transformers) (2023.7.22)\n",
+ "Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)\n",
+ " Obtaining dependency information for huggingface-hub<1.0,>=0.16.4 from https://files.pythonhosted.org/packages/aa/f3/3fc97336a0e90516901befd4f500f08d691034d387406fdbde85bea827cc/huggingface_hub-0.17.3-py3-none-any.whl.metadata\n",
+ " Downloading huggingface_hub-0.17.3-py3-none-any.whl.metadata (13 kB)\n",
+ "Requirement already satisfied: namex in /opt/conda/lib/python3.10/site-packages (from keras-core->keras_nlp) (0.0.7)\n",
+ "Requirement already satisfied: h5py in /opt/conda/lib/python3.10/site-packages (from keras-core->keras_nlp) (3.9.0)\n",
+ "Collecting dill<0.3.8,>=0.3.0 (from datasets)\n",
+ " Obtaining dependency information for dill<0.3.8,>=0.3.0 from https://files.pythonhosted.org/packages/f5/3a/74a29b11cf2cdfcd6ba89c0cecd70b37cd1ba7b77978ce611eb7a146a832/dill-0.3.7-py3-none-any.whl.metadata\n",
+ " Downloading dill-0.3.7-py3-none-any.whl.metadata (9.9 kB)\n",
+ "Requirement already satisfied: click in /opt/conda/lib/python3.10/site-packages (from nltk->rouge_score) (8.1.7)\n",
+ "Requirement already satisfied: joblib in /opt/conda/lib/python3.10/site-packages (from nltk->rouge_score) (1.3.2)\n",
+ "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets) (2.8.2)\n",
+ "Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets) (2023.3.post1)\n",
+ "Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas->datasets) (2023.3)\n",
+ "Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.10/site-packages (from rich->keras_nlp) (3.0.0)\n",
+ "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.10/site-packages (from rich->keras_nlp) (2.16.1)\n",
+ "Requirement already satisfied: tensorflow-hub>=0.13.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow-text->keras_nlp) (0.14.0)\n",
+ "Collecting tensorflow<2.15,>=2.14.0 (from tensorflow-text->keras_nlp)\n",
+ " Obtaining dependency information for tensorflow<2.15,>=2.14.0 from https://files.pythonhosted.org/packages/e2/7a/c7762c698fb1ac41a7e3afee51dc72aa3ec74ae8d2f57ce19a9cded3a4af/tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)\n",
+ "Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->keras_nlp) (0.1.2)\n",
+ "Requirement already satisfied: astunparse>=1.6.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (1.6.3)\n",
+ "Requirement already satisfied: flatbuffers>=23.5.26 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (23.5.26)\n",
+ "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.4.0)\n",
+ "Requirement already satisfied: google-pasta>=0.1.1 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.2.0)\n",
+ "Requirement already satisfied: libclang>=13.0.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (16.0.6)\n",
+ "Collecting ml-dtypes==0.2.0 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp)\n",
+ " Obtaining dependency information for ml-dtypes==0.2.0 from https://files.pythonhosted.org/packages/d1/1d/d5cf76e5e40f69dbd273036e3172ae4a614577cb141673427b80cac948df/ml_dtypes-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading ml_dtypes-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (20 kB)\n",
+ "Requirement already satisfied: opt-einsum>=2.3.2 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (3.3.0)\n",
+ "Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (3.20.3)\n",
+ "Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (68.2.2)\n",
+ "Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (2.3.0)\n",
+ "Requirement already satisfied: wrapt<1.15,>=1.11.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (1.14.1)\n",
+ "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.31.0)\n",
+ "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (1.48.0)\n",
+ "Collecting tensorboard<2.15,>=2.14 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp)\n",
+ " Obtaining dependency information for tensorboard<2.15,>=2.14 from https://files.pythonhosted.org/packages/73/a2/66ed644f6ed1562e0285fcd959af17670ea313c8f331c46f79ee77187eb9/tensorboard-2.14.1-py3-none-any.whl.metadata\n",
+ " Downloading tensorboard-2.14.1-py3-none-any.whl.metadata (1.7 kB)\n",
+ "Collecting tensorflow-estimator<2.15,>=2.14.0 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp)\n",
+ " Obtaining dependency information for tensorflow-estimator<2.15,>=2.14.0 from https://files.pythonhosted.org/packages/d1/da/4f264c196325bb6e37a6285caec5b12a03def489b57cc1fdac02bb6272cd/tensorflow_estimator-2.14.0-py2.py3-none-any.whl.metadata\n",
+ " Downloading tensorflow_estimator-2.14.0-py2.py3-none-any.whl.metadata (1.3 kB)\n",
+ "Collecting keras<2.15,>=2.14.0 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp)\n",
+ " Obtaining dependency information for keras<2.15,>=2.14.0 from https://files.pythonhosted.org/packages/fe/58/34d4d8f1aa11120c2d36d7ad27d0526164b1a8ae45990a2fede31d0e59bf/keras-2.14.0-py3-none-any.whl.metadata\n",
+ " Downloading keras-2.14.0-py3-none-any.whl.metadata (2.4 kB)\n",
+ "Requirement already satisfied: wheel<1.0,>=0.23.0 in /opt/conda/lib/python3.10/site-packages (from astunparse>=1.6.0->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.41.2)\n",
+ "Collecting grpcio<2.0,>=1.24.3 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp)\n",
+ " Obtaining dependency information for grpcio<2.0,>=1.24.3 from https://files.pythonhosted.org/packages/29/cc/e6883efbbcaa6570a0d2207ba53c796137f11293e47d11e2696f37b66811/grpcio-1.59.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading grpcio-1.59.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.0 kB)\n",
+ "Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (2.23.0)\n",
+ "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (1.0.0)\n",
+ "Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (3.4.4)\n",
+ "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.7.1)\n",
+ "Requirement already satisfied: werkzeug>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (2.1.2)\n",
+ "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (4.2.4)\n",
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.3.0)\n",
+ "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (4.9)\n",
+ "Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (1.3.1)\n",
+ "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/conda/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (0.5.0)\n",
+ "Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp) (3.2.2)\n",
+ "Downloading transformers-4.34.1-py3-none-any.whl (7.7 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.7/7.7 MB\u001b[0m \u001b[31m98.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading datasets-2.14.6-py3-none-any.whl (493 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m493.7/493.7 kB\u001b[0m \u001b[31m57.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading evaluate-0.4.1-py3-none-any.whl (84 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.1/84.1 kB\u001b[0m \u001b[31m17.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading keras_nlp-0.6.2-py3-none-any.whl (590 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m590.1/590.1 kB\u001b[0m \u001b[31m58.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m84.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.8/3.8 MB\u001b[0m \u001b[31m108.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m295.0/295.0 kB\u001b[0m \u001b[31m46.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading multiprocess-0.70.15-py310-none-any.whl (134 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m25.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading dill-0.3.7-py3-none-any.whl (115 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m23.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading tensorflow_text-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.5/6.5 MB\u001b[0m \u001b[31m108.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m33.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (489.8 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m489.8/489.8 MB\u001b[0m \u001b[31m1.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading ml_dtypes-0.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m69.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading keras-2.14.0-py3-none-any.whl (1.7 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.7/1.7 MB\u001b[0m \u001b[31m89.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hDownloading tensorboard-2.14.1-py3-none-any.whl (5.5 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.5/5.5 MB\u001b[0m \u001b[31m113.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading grpcio-1.59.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m5.3/5.3 MB\u001b[0m \u001b[31m115.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading tensorflow_estimator-2.14.0-py2.py3-none-any.whl (440 kB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m440.7/440.7 kB\u001b[0m \u001b[31m53.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
+ "\u001b[?25hBuilding wheels for collected packages: rouge_score\n",
+ " Building wheel for rouge_score (setup.py) ... \u001b[?25ldone\n",
+ "\u001b[?25h Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24932 sha256=57aa40a32d8d9171d43b9bc47cc3472fac0fb1192aa80eba9defb8e4ffd2352a\n",
+ " Stored in directory: /home/jupyter/.cache/pip/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n",
+ "Successfully built rouge_score\n",
+ "Installing collected packages: xxhash, tensorflow-estimator, safetensors, nltk, ml-dtypes, keras, grpcio, dill, rouge_score, responses, multiprocess, huggingface-hub, tokenizers, transformers, tensorboard, datasets, tensorflow, evaluate, tensorflow-text, keras_nlp\n",
+ " Attempting uninstall: tensorflow-estimator\n",
+ " Found existing installation: tensorflow-estimator 2.12.0\n",
+ " Uninstalling tensorflow-estimator-2.12.0:\n",
+ " Successfully uninstalled tensorflow-estimator-2.12.0\n",
+ " Attempting uninstall: ml-dtypes\n",
+ " Found existing installation: ml-dtypes 0.3.1\n",
+ " Uninstalling ml-dtypes-0.3.1:\n",
+ " Successfully uninstalled ml-dtypes-0.3.1\n",
+ " Attempting uninstall: keras\n",
+ " Found existing installation: keras 2.12.0\n",
+ " Uninstalling keras-2.12.0:\n",
+ " Successfully uninstalled keras-2.12.0\n",
+ " Attempting uninstall: grpcio\n",
+ " Found existing installation: grpcio 1.48.0\n",
+ " Uninstalling grpcio-1.48.0:\n",
+ " Successfully uninstalled grpcio-1.48.0\n",
+ " Attempting uninstall: dill\n",
+ " Found existing installation: dill 0.3.1.1\n",
+ " Uninstalling dill-0.3.1.1:\n",
+ " Successfully uninstalled dill-0.3.1.1\n",
+ " Attempting uninstall: tensorboard\n",
+ " Found existing installation: tensorboard 2.12.3\n",
+ " Uninstalling tensorboard-2.12.3:\n",
+ " Successfully uninstalled tensorboard-2.12.3\n",
+ " Attempting uninstall: tensorflow\n",
+ " Found existing installation: tensorflow 2.12.0\n",
+ " Uninstalling tensorflow-2.12.0:\n",
+ " Successfully uninstalled tensorflow-2.12.0\n",
+ "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
+ "apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.7 which is incompatible.\u001b[0m\u001b[31m\n",
+ "\u001b[0mSuccessfully installed datasets-2.14.6 dill-0.3.7 evaluate-0.4.1 grpcio-1.58.0 huggingface-hub-0.17.3 keras-2.14.0 keras_nlp-0.6.2 ml-dtypes-0.2.0 multiprocess-0.70.15 nltk-3.8.1 responses-0.18.0 rouge_score-0.1.2 safetensors-0.4.0 tensorboard-2.14.1 tensorflow-2.14.0 tensorflow-estimator-2.14.0 tensorflow-text-2.14.0 tokenizers-0.14.1 transformers-4.34.1 xxhash-3.4.1\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install \"transformers\" \"datasets\" \"rouge_score\" \"evaluate\" \"keras_nlp\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e9b287a7-3035-48c1-8520-a7d3de4f925b",
+ "metadata": {},
+ "source": [
+ "## Download your dataset from Hugging Face"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dd58ed15-7e8f-4f24-b68f-4a5b5fcd5cf2",
+ "metadata": {},
+ "source": [
+ "We will download the Hugging Face dataset 'ccdv/pubmed-summarization', which contains full articles and their abstracts and will help train our model to summarize scientific articles. When loading the dataset we split it into train, test, and validation datasets. Because these datasets are large, we will use only the first 5% of each split so that our process runs faster."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 211,
+ "id": "8f59e17e-c006-45ee-be0b-766774f9d420",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "from datasets import load_dataset\n",
+ "\n",
+ "# load dataset\n",
+ "train, test, validation = load_dataset(\"ccdv/pubmed-summarization\", split=[\"train[:5%]\", \"test[:5%]\", \"validation[:5%]\" ])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "51d2756c-d280-4aec-a6fb-416dd91c3ba5",
+ "metadata": {},
+ "source": [
+ "Let's list the features of one of our datasets to determine what we will need to tokenize in a later step. This dataset's features are 'article' and 'abstract'."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 114,
+ "id": "760c9128-793a-4bed-a127-b92ef496e33b",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Dataset({\n",
+ " features: ['article', 'abstract'],\n",
+ " num_rows: 5996\n",
+ "})\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3446d1b6-af0c-4127-93c0-c47fde142546",
+ "metadata": {},
+ "source": [
+ "## Finetuning our Model Locally"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ed6ddff1-2636-4e3b-88ee-e3c86c584245",
+ "metadata": {},
+ "source": [
+ "Now that we have our datasets we can load our model, which will be the small version of Flan T5."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b112574b-0e33-4c31-b12a-f1839024ea44",
+ "metadata": {},
+ "source": [
+ "\n",
+ "**Flan T5** is a text-to-text generation model, an instruction-tuned advancement of the original T5 model, and can be run on both CPUs and GPUs. **Text-to-text** is a method of creating text by using a neural network to generate new text from a given input. These T5 models can be applied zero-shot or fine-tuned for various NLP tasks that we have seen and heard of before: text classification, summarization, translation, and question-answering. In the Hugging Face ecosystem the related term text2text generation is the name of the pipeline task used for sequence-to-sequence models such as T5, so the two terms describe the same idea: every task is framed as turning an input text into an output text, which is what lets T5 models cover such a wide range of NLP tasks.\n",
+ "\n",
+ "Because it is a seq2seq model, we will use the Transformers class **TFAutoModelForSeq2SeqLM** (specifically for TensorFlow models) to find and load our pretrained model architecture. Then we will use an **AutoTokenizer** to preprocess the text of our inputs (the train, test, and validation datasets) into arrays of numbers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 185,
+ "id": "bfd433b3-9790-4a10-ac08-6c90c194d8b0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#model name\n",
+ "CHECKPOINT = \"google/flan-t5-small\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 184,
+ "id": "1988cbcb-4bec-4aa2-a356-a211584ceacb",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "2023-11-03 15:13:42.327557: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
+ "2023-11-03 15:13:42.327603: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
+ "2023-11-03 15:13:42.327636: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
+ "2023-11-03 15:13:42.336037: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
+ "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
+ "2023-11-03 15:13:44.543851: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:44.554372: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:44.557202: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:44.560698: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:44.563540: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:44.566113: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:45.308267: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:45.310177: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:45.311838: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355\n",
+ "2023-11-03 15:13:45.313437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13589 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5\n",
+ "/opt/conda/lib/python3.10/site-packages/keras/src/initializers/initializers.py:120: UserWarning: The initializer RandomNormal is unseeded and being called multiple times, which will return identical values each time (even if the initializer is unseeded). Please update your code to provide a seed to the initializer, or avoid using the same initializer instance more than once.\n",
+ " warnings.warn(\n",
+ "All PyTorch model weights were used when initializing TFT5ForConditionalGeneration.\n",
+ "\n",
+ "All the weights of TFT5ForConditionalGeneration were initialized from the PyTorch model.\n",
+ "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from transformers import TFAutoModelForSeq2SeqLM, AutoTokenizer\n",
+ "\n",
+ "model = TFAutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)\n",
+ "tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f6ca0419-0075-4f62-becf-b859312cea22",
+ "metadata": {},
+ "source": [
+ "Now that we have loaded our model and its tokenizer, we can implement a tokenization function to start processing our datasets.\n",
+ "Since we are using a T5 model we have to prefix the inputs with \"summarize:\" so that the model knows which task to perform. We create a preprocess function that prepends the prefix to each row within the \"article\" column of our dataset and treats the results as inputs. The inputs are then tokenized and truncated to a set max length.\n",
+ "\n",
+ "A similar process is done for the \"abstract\" column within our dataset, except we do not add the prefix and we store the results as **labels**.\n",
+ "\n",
+ "**What is Truncating?**\n",
+ "\n",
+ "The inputs in a batch will usually have different lengths, which makes them hard to convert to fixed-size tensors. As part of fixing this problem, **truncation** removes tokens from longer sequences so that no sequence exceeds a maximum length, which we have set to **1024** tokens for our inputs and **128** tokens for our labels.\n"
+ ]
+ },
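+ {
+ "cell_type": "markdown",
+ "id": "truncation-sketch-plain-python",
+ "metadata": {},
+ "source": [
+ "To make truncation concrete, here is a minimal sketch in plain Python. The token IDs are made-up values rather than real tokenizer output, and the `truncate` helper is a hypothetical illustration, not part of the Transformers library. The real tokenizer also accounts for special tokens such as the end-of-sequence marker, which this sketch ignores.\n",
+ "\n",
+ "```python\n",
+ "# Toy token IDs standing in for real tokenizer output (hypothetical values).\n",
+ "batch = [[5, 8, 2, 9, 4, 1], [7, 3, 1]]\n",
+ "\n",
+ "def truncate(token_ids, max_length):\n",
+ "    # Keep at most max_length tokens; shorter sequences are left unchanged.\n",
+ "    return token_ids[:max_length]\n",
+ "\n",
+ "truncated = [truncate(ids, max_length=4) for ids in batch]\n",
+ "print(truncated)  # [[5, 8, 2, 9], [7, 3, 1]]\n",
+ "```\n",
+ "\n",
+ "Note that the shorter sequence is still shorter than the longer one after truncation, so truncation alone does not give equal lengths."
+ ]
+ },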
+ {
+ "cell_type": "code",
+ "execution_count": 212,
+ "id": "f101c309-f214-4b3f-b77b-d55491e48a59",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prefix = \"summarize: \"\n",
+ "\n",
+ "def preprocess_function(examples):\n",
+ " inputs = [prefix + doc for doc in examples[\"article\"]]\n",
+ " model_inputs = tokenizer(inputs, max_length=1024, truncation=True)\n",
+ "\n",
+ " labels = tokenizer(text_target=\n",
+ " examples[\"abstract\"], max_length=128, truncation=True\n",
+ " )\n",
+ "\n",
+ " model_inputs[\"labels\"] = labels[\"input_ids\"]\n",
+ "\n",
+ " return model_inputs"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "491efc2a-1679-4044-9f53-4aff27329856",
+ "metadata": {},
+ "source": [
+ "Now that we have our tokenization function, the next step is to use the **map** method to apply **preprocess_function** to our loaded datasets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 210,
+ "id": "5e58eb58-a655-4e2b-8665-b4b770bc87a7",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "tokenized_train = train.map(preprocess_function, batched=True)\n",
+ "\n",
+ "#tokenized_test = test.map(preprocess_function, batched=True)\n",
+ "\n",
+ "#tokenized_validation = validation.map(preprocess_function, batched=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "41bc1576-44dd-4604-b5db-6c57b711096a",
+ "metadata": {},
+ "source": [
+ "Let's look at the structure of one of our new tokenized datasets. You should see 3 new features (**'input_ids', 'attention_mask', 'labels'**), making 5 features in total:\n",
+ "\n",
+ "- **input_ids:** When the inputs are tokenized, each text is split into tokens (which can be words or subwords) and every token is mapped to an integer ID from the tokenizer's vocabulary.\n",
+ "- **attention_mask:** Marks which tokens the model should attend to (1) and which it should ignore (0). Masking is needed when sequences of different lengths are padded so that they fit in the same tensor; the padding tokens are the ones that get ignored.\n",
+ "- **labels:** The tokenized abstract column, stored under a new name."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "80a25bc8-00db-4b8d-9b68-d52c5d6ca7fe",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Dataset({\n",
+ " features: ['article', 'abstract', 'input_ids', 'attention_mask', 'labels'],\n",
+ " num_rows: 5996\n",
+ "})\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(tokenized_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "285d972c-6aa3-4072-a405-86e7dd82904e",
+ "metadata": {},
+ "source": [
+ "Data collators are objects that dynamically pad the inputs and the labels in our batches. The reverse of truncation, **padding** adds a special padding token so that shorter sequences have the same length as the longest sequence in the batch (which is at most the maximum length we set in preprocess_function)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "875ef33d-5ef3-4b07-b1de-6d471743a8ad",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import DataCollatorForSeq2Seq\n",
+ "data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=CHECKPOINT, return_tensors=\"tf\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b48b037-d666-417b-b28e-88b715a1083c",
+ "metadata": {},
+ "source": [
+ "The last step is to convert our data into a format suitable for TensorFlow using the method **'prepare_tf_dataset()'**, which automatically inspects the model and keeps only the features it needs. As you can see in the printed dataset below, only two of our input features are left, **input_ids and attention_mask**, alongside the labels tensor."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "6fcac5a8-912f-461f-bfab-990e472c01ca",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.\n"
+ ]
+ }
+ ],
+ "source": [
+ "tf_train_set = model.prepare_tf_dataset(\n",
+ " tokenized_train,\n",
+ " shuffle=True,\n",
+ " batch_size=10,\n",
+ " collate_fn=data_collator,\n",
+ " \n",
+ ")\n",
+ "\n",
+ "tf_test_set = model.prepare_tf_dataset(\n",
+ " tokenized_test,\n",
+ " shuffle=False,\n",
+ " batch_size=10,\n",
+ " collate_fn=data_collator,\n",
+ " \n",
+ ")\n",
+ "\n",
+ "tf_validation_set = model.prepare_tf_dataset(\n",
+ " tokenized_validation,\n",
+ " shuffle=False,\n",
+ " batch_size=10,\n",
+ " collate_fn=data_collator,\n",
+ " \n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "11aaf028-c713-4064-84cc-f699df3151ec",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<_PrefetchDataset element_spec=({'input_ids': TensorSpec(shape=(10, None), dtype=tf.int64, name=None), 'attention_mask': TensorSpec(shape=(10, None), dtype=tf.int64, name=None)}, TensorSpec(shape=(10, None), dtype=tf.int64, name=None))>\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(tf_train_set)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f7ad7839-4967-40fb-aa26-39cea71fa085",
+ "metadata": {},
+ "source": [
+ "**Learning rate** controls how much the model changes in response to the estimated error each time the model weights are updated. A learning rate that is too small can result in a very slow training process that may get stuck, whereas a value that is too large can make training unstable. Setting the **weight decay** helps to avoid overfitting by keeping the weights small, which can also reduce the risk of exploding gradients. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "50ed6068-763a-46a6-8aed-4862f84413a9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers import AdamWeightDecay\n",
+ "optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)\n",
+ "model.compile(optimizer=optimizer)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cd6a770a-e149-4970-a915-1362f544ef40",
+ "metadata": {},
+ "source": [
+ "The function metric_fn calculates the **ROUGE** score between the ground truth and the predictions during training. ROUGE stands for **Recall-Oriented Understudy for Gisting Evaluation**. This metric compares a reference sentence with what our model produces and measures how much they overlap, then calculates precision and recall from that overlap.\n",
+ "\n",
+ "As an example, say our model produced the following sentence:\n",
+ "\n",
+ "**'the cat was found under the bed'**\n",
+ "\n",
+ "but the reference sentence, normally written by a human, is:\n",
+ "\n",
+ "**'the cat was under the bed'**\n",
+ "\n",
+ "Here the longest sequence of words shared in order by both sentences is the full 6-word reference, so recall is 6/6 = 1.0, while precision is 6/7 ≈ 0.86 because the model output has 7 words."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "f9c6b45b-7349-4965-938b-3a334ced3882",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using TensorFlow backend\n"
+ ]
+ }
+ ],
+ "source": [
+ "import keras_nlp\n",
+ "\n",
+ "rouge_l = keras_nlp.metrics.RougeL()\n",
+ "\n",
+ "\n",
+ "def metric_fn(eval_predictions):\n",
+ " predictions, labels = eval_predictions\n",
+ " decoded_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)\n",
+ " for label in labels:\n",
+ " label[label < 0] = tokenizer.pad_token_id # Replace masked label tokens\n",
+ " decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)\n",
+ " result = rouge_l(decoded_labels, decoded_predictions)\n",
+ "    # Return only the F1 score; the result also holds precision and recall\n",
+ " result = {\"RougeL\": result[\"f1_score\"]}\n",
+ "\n",
+ " return result"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b57e9aa9-9d7c-41d0-8ab1-5d79277c969d",
+ "metadata": {},
+ "source": [
+ "We will use the validation dataset to calculate our ROUGE score. While training is running and the ROUGE score is being calculated, it is best to set up a **callback system**. A callback is an object that can perform actions at various stages of training, such as writing logs after every batch to monitor your metrics, periodically saving your model to disk, or stopping training early if need be. Here we are using the Keras callback system."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "06e014b1-e9d2-4d9f-a149-c6c0381f7407",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from transformers.keras_callbacks import KerasMetricCallback\n",
+ "metric_callback = KerasMetricCallback(\n",
+ " metric_fn, eval_dataset=tf_validation_set, predict_with_generate=True, use_xla_generation=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cd1b16ea-0e8a-4c78-875e-e3edea6cf043",
+ "metadata": {},
+ "source": [
+ "Before we start training, the last step is to set how many times the model should pass over the full training set. Each complete pass is called an **epoch**, and we will set ours to 3. Now we can train our model using the method **'fit'** and save our artifacts to a directory. The artifact that holds our model will be a file named **tf_model.h5**. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "b8fd0c64-4d85-4b5e-86fe-538c7dc65da7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Epoch 1/3\n",
+ "599/599 [==============================] - ETA: 0s - loss: 2.5073"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/conda/lib/python3.10/site-packages/tensorflow/python/autograph/impl/api.py:371: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. recommend setting `max_new_tokens` to control the maximum length of the generation.\n",
+ " return py_builtins.overload_of(f)(*args)\n",
+ "2023-11-02 13:09:59.053088: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55faf0d80f50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n",
+ "2023-11-02 13:09:59.053132: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5\n",
+ "2023-11-02 13:10:00.019242: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator shared/assert_less/Assert/Assert\n",
+ "2023-11-02 13:10:00.163195: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.\n",
+ "2023-11-02 13:10:00.714302: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator shared/assert_less_1/Assert/Assert\n",
+ "2023-11-02 13:10:01.396732: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator shared/assert_less/Assert/Assert\n",
+ "2023-11-02 13:10:02.853947: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8900\n",
+ "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'\n",
+ "\n",
+ "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'\n",
+ "\n",
+ "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'\n",
+ "\n",
+ "2023-11-02 13:10:12.362168: I ./tensorflow/compiler/jit/device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.\n",
+ "2023-11-02 13:10:25.996665: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator shared/assert_less/Assert/Assert\n",
+ "2023-11-02 13:10:26.174121: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator shared/assert_less_1/Assert/Assert\n",
+ "2023-11-02 13:10:26.666553: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator shared/assert_less/Assert/Assert\n",
+ "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'\n",
+ "\n",
+ "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'\n",
+ "\n",
+ "warning: Linking two modules of different target triples: 'LLVMDialectModule' is 'nvptx64-nvidia-gpulibs' whereas '' is 'nvptx64-nvidia-cuda'\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "599/599 [==============================] - 731s 1s/step - loss: 2.5073 - val_loss: 2.0886 - RougeL: 0.1196\n",
+ "Epoch 2/3\n",
+ "599/599 [==============================] - 662s 1s/step - loss: 2.3710 - val_loss: 2.0231 - RougeL: 0.1191\n",
+ "Epoch 3/3\n",
+ "599/599 [==============================] - 662s 1s/step - loss: 2.3102 - val_loss: 1.9996 - RougeL: 0.1172\n"
+ ]
+ }
+ ],
+ "source": [
+ "model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[metric_callback])\n",
+ "\n",
+ "model.save_pretrained('saved_model')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1c35de0-4cdf-4c2b-9c0d-c2272b71b362",
+ "metadata": {},
+ "source": [
+ "## Testing the Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31f7f62a-ca17-4dcc-b25d-825572ee1630",
+ "metadata": {},
+ "source": [
+ "Here we will use a sample text that we want our model to summarize."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 98,
+ "id": "980d0053-0c3b-4d0d-91b9-9dd6e6dd3e64",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "text = \"Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a \\\n",
+ "highly transmissible and pathogenic coronavirus that emerged in late 2019 and has \\\n",
+ "caused a pandemic of acute respiratory disease, named ‘coronavirus disease 2019’ (COVID-19), \\\n",
+ "which threatens human health and public safety. In this Review, we describe the basic virology of \\\n",
+ "SARS-CoV-2, including genomic characteristics and receptor use, highlighting its key difference \\\n",
+ "from previously known coronaviruses. We summarize current knowledge of clinical, epidemiological and \\\n",
+ "pathological features of COVID-19, as well as recent progress in animal models and antiviral treatment \\\n",
+ "approaches for SARS-CoV-2 infection. We also discuss the potential wildlife hosts and zoonotic origin \\\n",
+ "of this emerging virus in detail.\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3156ef96-20ab-46ff-ae76-57d6ea54f0ff",
+ "metadata": {},
+ "source": [
+ "To make a prediction, we tokenize the text to get the inputs, then use **generate()** to produce a sequence of token IDs from our model. We then decode the output to turn the token IDs back into text."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb34d7c2-a815-4c9d-bd8b-495aca8b2d02",
+ "metadata": {},
+ "source": [
+ "Below you will see that we pass the paragraph about SARS-CoV-2 as our input. We also specify some parameters that tune generation so that we get a concise summary of the text.\n",
+ "\n",
+ "- **Max_Length:** Maximum number of tokens to generate.\n",
+ "- **Num_Return_Sequences:** Number of different outputs to generate. For our example we want one sequence.\n",
+ "- **Temperature:** Controls randomness. Higher values increase diversity and give more varied responses, while lower values make the output more predictable. Must be a positive number; values between 0 and 1 are typical.\n",
+ "- **Top_p (nucleus):** The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Must be a number from 0 to 1.\n",
+ "- **Top_k**: Sample only from the k most likely next tokens at each step. Lower values of k restrict sampling to higher-probability tokens, filtering out unlikely and often less coherent words."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 101,
+ "id": "fc2206c7-1bbf-41eb-8c63-abb17752d00d",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.\n",
+ "\n",
+ "All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at saved_model.\n",
+ "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "'We describe the basic virology of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and its role in preventing the pandemic of acute respiratory disease, named ‘coronavirus disease 2019’ (COVID-19), which threatens human health and public safety.'"
+ ]
+ },
+ "execution_count": 101,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from transformers import AutoTokenizer\n",
+ "\n",
+ "tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)\n",
+ "inputs = tokenizer.encode(text, return_tensors=\"tf\")\n",
+ "\n",
+ "from transformers import TFAutoModelForSeq2SeqLM\n",
+ "\n",
+ "model = TFAutoModelForSeq2SeqLM.from_pretrained(\"saved_model\")\n",
+ "\n",
+ "outputs = model.generate(inputs, \n",
+ " max_length=1000,\n",
+ " num_return_sequences = 1,\n",
+ " do_sample=True, \n",
+ " temperature = 0.6,\n",
+ " top_k = 50, \n",
+ " top_p = 0.95,)\n",
+ "\n",
+ "tokenizer.decode(outputs[0], skip_special_tokens=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "af0f0064-d9b4-430c-90ae-d2e390d1b78c",
+ "metadata": {},
+ "source": [
+ "### Optional: Summarizing PDF Files"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1abe4b0b-8ee2-48f1-afed-0cd2796717a2",
+ "metadata": {},
+ "source": [
+ "The process of summarizing scientific PDF files is largely the same, except that we first need to extract the text from the PDF. To do so, let's download a PDF file from PubMed Central."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 94,
+ "id": "d69aa008-80c0-4a19-aa0f-8f5798673c47",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--2023-11-02 20:07:00-- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7784226/pdf/12248_2020_Article_532.pdf\n",
+ "Resolving www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)... 130.14.29.110, 2607:f220:41e:4290::110\n",
+ "Connecting to www.ncbi.nlm.nih.gov (www.ncbi.nlm.nih.gov)|130.14.29.110|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 5757370 (5.5M) [application/pdf]\n",
+ "Saving to: ‘12248_2020_Article_532.pdf’\n",
+ "\n",
+ "12248_2020_Article_ 100%[===================>] 5.49M 7.25MB/s in 0.8s \n",
+ "\n",
+ "2023-11-02 20:07:01 (7.25 MB/s) - ‘12248_2020_Article_532.pdf’ saved [5757370/5757370]\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "! wget --user-agent=\"Chrome\" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7784226/pdf/12248_2020_Article_532.pdf"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6555b048-a76c-4f10-b13c-4ac93f436fe8",
+ "metadata": {},
+ "source": [
+ "Next we'll install the tooling that helps us extract only the text from our PDF file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "1347b3cd-5ce0-44c9-864d-a688bcacb1d0",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+ "To disable this warning, you can either:\n",
+ "\t- Avoid using `tokenizers` before the fork if possible\n",
+ "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: fitz in /opt/conda/lib/python3.10/site-packages (0.0.1.dev2)\n",
+ "Collecting PyMuPDF\n",
+ " Obtaining dependency information for PyMuPDF from https://files.pythonhosted.org/packages/41/4a/530017aaf0a554aa6d9abd547932a02c0188962d12122fe611bf7a6d0c26/PyMuPDF-1.23.5-cp310-none-manylinux2014_x86_64.whl.metadata\n",
+ " Downloading PyMuPDF-1.23.5-cp310-none-manylinux2014_x86_64.whl.metadata (3.4 kB)\n",
+ "Requirement already satisfied: configobj in /opt/conda/lib/python3.10/site-packages (from fitz) (5.0.8)\n",
+ "Requirement already satisfied: configparser in /opt/conda/lib/python3.10/site-packages (from fitz) (6.0.0)\n",
+ "Requirement already satisfied: httplib2 in /opt/conda/lib/python3.10/site-packages (from fitz) (0.21.0)\n",
+ "Requirement already satisfied: nibabel in /opt/conda/lib/python3.10/site-packages (from fitz) (5.1.0)\n",
+ "Requirement already satisfied: nipype in /opt/conda/lib/python3.10/site-packages (from fitz) (1.8.6)\n",
+ "Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from fitz) (1.23.5)\n",
+ "Requirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from fitz) (2.0.3)\n",
+ "Requirement already satisfied: pyxnat in /opt/conda/lib/python3.10/site-packages (from fitz) (1.6)\n",
+ "Requirement already satisfied: scipy in /opt/conda/lib/python3.10/site-packages (from fitz) (1.11.2)\n",
+ "Collecting PyMuPDFb==1.23.5 (from PyMuPDF)\n",
+ " Obtaining dependency information for PyMuPDFb==1.23.5 from https://files.pythonhosted.org/packages/cf/14/de59687368ad2c047b038b5b9b04e40bd5d486d5b36c6aef42c18c35ea2c/PyMuPDFb-1.23.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata\n",
+ " Downloading PyMuPDFb-1.23.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)\n",
+ "Requirement already satisfied: six in /opt/conda/lib/python3.10/site-packages (from configobj->fitz) (1.16.0)\n",
+ "Requirement already satisfied: pyparsing!=3.0.0,!=3.0.1,!=3.0.2,!=3.0.3,<4,>=2.4.2 in /opt/conda/lib/python3.10/site-packages (from httplib2->fitz) (3.1.1)\n",
+ "Requirement already satisfied: packaging>=17 in /opt/conda/lib/python3.10/site-packages (from nibabel->fitz) (23.1)\n",
+ "Requirement already satisfied: click>=6.6.0 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (8.1.7)\n",
+ "Requirement already satisfied: networkx>=2.0 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (3.1)\n",
+ "Requirement already satisfied: prov>=1.5.2 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (2.0.0)\n",
+ "Requirement already satisfied: pydot>=1.2.3 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (1.4.2)\n",
+ "Requirement already satisfied: python-dateutil>=2.2 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (2.8.2)\n",
+ "Requirement already satisfied: rdflib>=5.0.0 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (7.0.0)\n",
+ "Requirement already satisfied: simplejson>=3.8.0 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (3.19.2)\n",
+ "Requirement already satisfied: traits!=5.0,<6.4,>=4.6 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (6.3.2)\n",
+ "Requirement already satisfied: filelock>=3.0.0 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (3.12.4)\n",
+ "Requirement already satisfied: etelemetry>=0.2.0 in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (0.3.1)\n",
+ "Requirement already satisfied: looseversion in /opt/conda/lib/python3.10/site-packages (from nipype->fitz) (1.3.0)\n",
+ "Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->fitz) (2023.3.post1)\n",
+ "Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas->fitz) (2023.3)\n",
+ "Requirement already satisfied: future>=0.16 in /opt/conda/lib/python3.10/site-packages (from pyxnat->fitz) (0.18.3)\n",
+ "Requirement already satisfied: lxml>=4.3 in /opt/conda/lib/python3.10/site-packages (from pyxnat->fitz) (4.9.3)\n",
+ "Requirement already satisfied: pathlib>=1.0 in /opt/conda/lib/python3.10/site-packages (from pyxnat->fitz) (1.0.1)\n",
+ "Requirement already satisfied: requests>=2.20 in /opt/conda/lib/python3.10/site-packages (from pyxnat->fitz) (2.31.0)\n",
+ "Requirement already satisfied: ci-info>=0.2 in /opt/conda/lib/python3.10/site-packages (from etelemetry>=0.2.0->nipype->fitz) (0.3.0)\n",
+ "Requirement already satisfied: isodate<0.7.0,>=0.6.0 in /opt/conda/lib/python3.10/site-packages (from rdflib>=5.0.0->nipype->fitz) (0.6.1)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests>=2.20->pyxnat->fitz) (3.2.0)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests>=2.20->pyxnat->fitz) (3.4)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests>=2.20->pyxnat->fitz) (1.26.16)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests>=2.20->pyxnat->fitz) (2023.7.22)\n",
+ "Downloading PyMuPDF-1.23.5-cp310-none-manylinux2014_x86_64.whl (4.3 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.3/4.3 MB\u001b[0m \u001b[31m46.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hDownloading PyMuPDFb-1.23.5-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (30.6 MB)\n",
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m30.6/30.6 MB\u001b[0m \u001b[31m42.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m00:01\u001b[0m\n",
+ "\u001b[?25hInstalling collected packages: PyMuPDFb, PyMuPDF\n",
+ "Successfully installed PyMuPDF-1.23.5 PyMuPDFb-1.23.5\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install \"PyMuPDF\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3331b05f-d00e-4ff2-be24-edea732e4af2",
+ "metadata": {},
+ "source": [
+ "Now we can make a function **extract_text_from_pdf** to extract the text from the pdf and save it as a variable."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 95,
+ "id": "d77b6ffe-90e1-4a01-aa52-9cf93a9c5c85",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import fitz  # PyMuPDF is imported under the name fitz\n",
+ "\n",
+ "def extract_text_from_pdf(pdf_path):\n",
+ "    # Concatenate the plain text of every page in the PDF\n",
+ "    doc = fitz.open(pdf_path)\n",
+ "    text = ''\n",
+ "    for page in doc:\n",
+ "        text += page.get_text()\n",
+ "    return text\n",
+ "\n",
+ "text_pdf = extract_text_from_pdf('12248_2020_Article_532.pdf')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "10e761c6-da93-4ca0-827e-a3715c20eb10",
+ "metadata": {},
+ "source": [
+ "Finally, we'll follow the same steps as before: encode our inputs, pass them to our model, and then decode the output. Notice that this time we set max_length and truncation when encoding, so the long PDF text is cut down to a length the model can handle."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 97,
+ "id": "9a1f5dbd-6a9f-4533-a12e-8a6c4073df74",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "All model checkpoint layers were used when initializing TFT5ForConditionalGeneration.\n",
+ "\n",
+ "All the layers of TFT5ForConditionalGeneration were initialized from the model checkpoint at saved_model.\n",
+ "If your task is similar to the task the model of the checkpoint was trained on, you can already use TFT5ForConditionalGeneration for predictions without further training.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from transformers import AutoTokenizer\n",
+ "\n",
+ "tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)\n",
+ "inputs = tokenizer.encode(text_pdf, max_length=1000, truncation=True, return_tensors=\"tf\")\n",
+ "\n",
+ "from transformers import TFAutoModelForSeq2SeqLM\n",
+ "\n",
+ "model = TFAutoModelForSeq2SeqLM.from_pretrained(\"saved_model\")\n",
+ "\n",
+ "outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)\n",
+ "\n",
+ "tokenizer.decode(outputs[0], skip_special_tokens=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "257fb45f-7752-481d-a1f6-f3eeb7655fac",
+ "metadata": {},
+ "source": [
+ "## Finetuning our Model via Vertex AI Training API"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6ac841f6-c65e-4ebf-8c42-3030e2f92cb0",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Setting up our Datasets for Training "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fff825cf-86fb-4777-885b-9e981be831b7",
+ "metadata": {},
+ "source": [
+ "Although we have our datasets saved locally, in order to use the Vertex AI Training API we need to store them in a Cloud Storage bucket."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 233,
+ "id": "d3f49896-b2c1-47e6-a7cc-aca7753bb6c4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from datasets import load_dataset\n",
+ "\n",
+ "# load dataset\n",
+ "train, test, validation = load_dataset(\"ccdv/pubmed-summarization\", split=[\"train[:5%]\", \"test[:5%]\", \"validation[:5%]\" ])"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "1b91fe7c-5970-45c1-9401-1db3206a8ce9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#load in the storage package and name our bucket\n",
+ "from google.cloud import storage\n",
+ "BUCKET='flan-t5-model-resources'\n",
+ "client = storage.Client()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 105,
+ "id": "0066ad72-e451-41c0-b30a-c3a7dfa5f17c",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ ],
+ "source": [
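+ "#Create the bucket, skipping creation if it already exists (avoids a 409 Conflict error on re-runs)\n",
+ "bucket = client.bucket(BUCKET)\n",
+ "if not bucket.exists():\n",
+ "    bucket.create()"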
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4fc7335e-cfa7-4b0e-8880-85edfa573772",
+ "metadata": {},
+ "source": [
+ "Convert each dataset to CSV in memory and upload it to our bucket in one step!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "id": "1bfbbd92-4b2c-4e5c-95f8-d4e645a6ab24",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "094bf0b9c0bf44b0859f2b9c5f375e8c",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Creating CSV from Arrow format: 0%| | 0/6 [00:00, ?ba/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from io import BytesIO\n",
+ "\n",
+ "#convert train dataset to csv and push to GCS bucket\n",
+ "csv_buffer = BytesIO()\n",
+ "train.to_csv(csv_buffer)\n",
+ "client = storage.Client()\n",
+ "bucket = client.get_bucket(BUCKET)\n",
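+ "#rewind the buffer before uploading and pass the content type by keyword\n",
+ "bucket.blob('train.csv').upload_from_file(csv_buffer, rewind=True, content_type='text/csv')"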
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "id": "dbf3f68f-acc8-4086-9b89-be0d3eacf898",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "e54494e07db241aa8537c6bce84558bd",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Creating CSV from Arrow format: 0%| | 0/1 [00:00, ?ba/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#convert test dataset to csv and push to GCS bucket\n",
+ "csv_buffer = BytesIO()\n",
+ "test.to_csv(csv_buffer)\n",
+ "client = storage.Client()\n",
+ "bucket = client.get_bucket(BUCKET)\n",
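+ "#rewind the buffer before uploading and pass the content type by keyword\n",
+ "bucket.blob('test.csv').upload_from_file(csv_buffer, rewind=True, content_type='text/csv')"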
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "id": "ea1773e1-dfe2-46b5-a63d-782101d79096",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "8b3cd00c2e42453b9c85320fd43360a5",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "Creating CSV from Arrow format: 0%| | 0/1 [00:00, ?ba/s]"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "#convert validation dataset to csv and push to GCS bucket\n",
+ "csv_buffer = BytesIO()\n",
+ "validation.to_csv(csv_buffer)\n",
+ "client = storage.Client()\n",
+ "bucket = client.get_bucket(BUCKET)\n",
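+ "#rewind the buffer before uploading and pass the content type by keyword\n",
+ "bucket.blob('validation.csv').upload_from_file(csv_buffer, rewind=True, content_type='text/csv')"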
+ ]
+ },
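+ {
+ "cell_type": "markdown",
+ "id": "sketch-bytesio-rewind",
+ "metadata": {},
+ "source": [
+ "Why the buffer position matters for these uploads: the sketch below is a minimal, self-contained illustration that does not touch Cloud Storage. Writing into a `BytesIO` leaves its position at the end of the data, so anything that reads from the current position sees nothing until the buffer is rewound. The sample bytes are made up for illustration and stand in for what `to_csv` writes.\n",
+ "\n",
+ "```python\n",
+ "from io import BytesIO\n",
+ "\n",
+ "# Hypothetical stand-in for the CSV bytes that to_csv() writes into the buffer\n",
+ "csv_buffer = BytesIO()\n",
+ "csv_buffer.write(b'article,abstract')\n",
+ "\n",
+ "# The position now sits at the end of the data, so a plain read() returns nothing\n",
+ "unrewound = csv_buffer.read()\n",
+ "\n",
+ "# Rewinding moves the position back to the start, so the full content is readable\n",
+ "csv_buffer.seek(0)\n",
+ "rewound = csv_buffer.read()\n",
+ "\n",
+ "print(len(unrewound), len(rewound))\n",
+ "```\n",
+ "\n",
+ "In the storage client, `upload_from_file` takes a `rewind` argument that performs this seek for you, which is why it is worth passing explicitly."
+ ]
+ },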
+ {
+ "cell_type": "markdown",
+ "id": "f18630a7-109c-4f53-9233-1842f5c27029",
+ "metadata": {},
+ "source": [
+ "Here we save the locations of our datasets, to be used when we launch the training of our model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 257,
+ "id": "ebc1bc39-a554-473b-949a-d9588f6e7fb8",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "# Cloud Storage path of the training dataset\n",
+ "training_input_path = f'gs://{BUCKET}/train.csv'\n",
+ "\n",
+ "# Cloud Storage path of the test dataset\n",
+ "test_input_path = f'gs://{BUCKET}/test.csv'\n",
+ "\n",
+ "# Cloud Storage path of the validation dataset\n",
+ "validation_input_path = f'gs://{BUCKET}/validation.csv'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9204b6dc-8f6e-407e-8c68-a036a6a5b7c9",
+ "metadata": {},
+ "source": [
+ "### Training our Model via Vertex AI Training API"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f873f2f-90b8-4566-96f5-37a23a2294e1",
+ "metadata": {},
+ "source": [
+ "To train our model with the Vertex AI Training API, you must first create a custom job. This is done by creating an autopackaging directory (autopkg) that holds your requirements.txt and task.py files in a specific structure, like so:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8eeafae2-a698-4a52-a4f1-cae550245d0b",
+ "metadata": {},
+ "source": [
+ "```\n",
+ "autopkg-summarizer/\n",
+ "  + requirements.txt\n",
+ "  + trainer/\n",
+ "      + task.py\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 103,
+ "id": "48d49de7-6d86-411e-9e6e-104763ae36e6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Create the package directories and empty files (-p avoids an error if the directories already exist)\n",
+ "!mkdir -p autopkg-summarizer/trainer\n",
+ "!touch autopkg-summarizer/requirements.txt\n",
+ "!touch autopkg-summarizer/trainer/task.py"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7bd953b5-8d21-4c07-adcc-23604d5d0279",
+ "metadata": {},
+ "source": [
+ "Fill in your requirements.txt file with the packages below:\n",
+ "```\n",
+ "nltk\n",
+ "transformers\n",
+ "keras_nlp\n",
+ "datasets\n",
+ "rouge_score\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "443f0e45-dbe6-4ce9-b0ad-96a5fe52a455",
+ "metadata": {},
+ "source": [
+ "To create our training script, we add all the steps that we ran in the 'Finetuning our Model Locally' section of this tutorial to a file named task.py:"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "14ab5a7d-36bf-449d-b2cd-4f7e107c75de",
+ "metadata": {},
+ "source": [
+ "```\n",
+ "import nltk\n",
+ "import argparse\n",
+ "from datasets import load_dataset\n",
+ "#import evaluate\n",
+ "import numpy as np\n",
+ "from transformers import create_optimizer, AdamWeightDecay, TFAutoModelForSeq2SeqLM, AutoTokenizer, DataCollatorForSeq2Seq, set_seed\n",
+ "import tensorflow as tf\n",
+ "from tensorflow import keras\n",
+ "from transformers.keras_callbacks import KerasMetricCallback\n",
+ "import keras_nlp\n",
+ "\n",
+ "def get_args():\n",
+ "    '''Parses args.'''\n",
+ "    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n",
+ "    parser.add_argument(\n",
+ "        '--model_name_or_path',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='name of model or path to load into tokenizer and class')\n",
+ "    parser.add_argument(\n",
+ "        '--train_file',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='train dataset in csv or json format')\n",
+ "    parser.add_argument(\n",
+ "        '--test_file',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='test dataset in csv or json format')\n",
+ "    parser.add_argument(\n",
+ "        '--validation_file',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='validation dataset in csv or json format used to calculate ROUGE score')\n",
+ "    parser.add_argument(\n",
+ "        '--text_column',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='The name of the column in the datasets containing the full texts (for summarization)')\n",
+ "    parser.add_argument(\n",
+ "        '--summary_column',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='The name of the column in the datasets containing the abstracts or summary of the full text')\n",
+ "    parser.add_argument(\n",
+ "        '--num_train_epochs',\n",
+ "        required=False,\n",
+ "        type=int,\n",
+ "        default=3,\n",
+ "        help='number of complete passes through the training dataset')\n",
+ "    parser.add_argument(\n",
+ "        '--source_prefix',\n",
+ "        required=False,\n",
+ "        type=str,\n",
+ "        default='',\n",
+ "        help='A prefix to add before every source text (needed for T5 models)')\n",
+ "    parser.add_argument(\n",
+ "        '--inputs_max_length',\n",
+ "        required=False,\n",
+ "        type=int,\n",
+ "        default=1024,\n",
+ "        help='max token length for model inputs')\n",
+ "    parser.add_argument(\n",
+ "        '--labels_max_length',\n",
+ "        required=False,\n",
+ "        type=int,\n",
+ "        default=128,\n",
+ "        help='max token length for model labels or targets')\n",
+ "    parser.add_argument(\n",
+ "        '--batch_size',\n",
+ "        required=False,\n",
+ "        type=int,\n",
+ "        default=10,\n",
+ "        help='number of examples per batch')\n",
+ "    parser.add_argument(\n",
+ "        '--output_dir',\n",
+ "        required=True,\n",
+ "        type=str,\n",
+ "        help='bucket to store saved model, include gs://')\n",
+ "\n",
+ "    args = parser.parse_args()\n",
+ "    return args\n",
+ "\n",
+ "def main():\n",
+ "\n",
+ "    args = get_args()\n",
+ "\n",
+ "    checkpoint = args.model_name_or_path\n",
+ "\n",
+ "    tokenizer = AutoTokenizer.from_pretrained(checkpoint)\n",
+ "\n",
+ "    text = args.text_column\n",
+ "    summary = args.summary_column\n",
+ "    inputs_max_length = args.inputs_max_length\n",
+ "    labels_max_length = args.labels_max_length\n",
+ "    prefix = args.source_prefix\n",
+ "\n",
+ "    model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)\n",
+ "\n",
+ "    # The file extension (csv or json) selects the dataset loader\n",
+ "    data_files = {'train':args.train_file, 'test':args.test_file, 'validation':args.validation_file}\n",
+ "    extension = args.train_file.split(\".\")[-1]\n",
+ "\n",
+ "    raw_datasets = load_dataset(\n",
+ "        extension,\n",
+ "        data_files=data_files)\n",
+ "\n",
+ "    # Drop rows that have no source text\n",
+ "    raw_datasets = raw_datasets.filter(lambda x: x[text] is not None)\n",
+ "\n",
+ "    train = raw_datasets[\"train\"]\n",
+ "    test = raw_datasets[\"test\"]\n",
+ "    validation = raw_datasets[\"validation\"]\n",
+ "\n",
+ "    def preprocess_function(examples):\n",
+ "\n",
+ "        inputs = [prefix + doc for doc in examples[text]]\n",
+ "        model_inputs = tokenizer(inputs, max_length=inputs_max_length, truncation=True)\n",
+ "\n",
+ "        # labels = tokenizer(text_target=examples[\"abstract\"], max_length=128, truncation=True)\n",
+ "\n",
+ "        labels = tokenizer(\n",
+ "            text_target=examples[summary], max_length=labels_max_length, truncation=True\n",
+ "        )\n",
+ "\n",
+ "        model_inputs[\"labels\"] = labels[\"input_ids\"]\n",
+ "        return model_inputs\n",
+ "\n",
+ "    tokenized_train = train.map(preprocess_function, batched=True)\n",
+ "    tokenized_test = test.map(preprocess_function, batched=True)\n",
+ "    tokenized_validation = validation.map(preprocess_function, batched=True)\n",
+ "\n",
+ "    # Pass the model object (not the checkpoint name) to the collator\n",
+ "    data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model, return_tensors=\"tf\")\n",
+ "\n",
+ "    optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)\n",
+ "    model.compile(optimizer=optimizer)\n",
+ "\n",
+ "    tf_train_set = model.prepare_tf_dataset(\n",
+ "        tokenized_train,\n",
+ "        shuffle=True,\n",
+ "        batch_size=args.batch_size,\n",
+ "        collate_fn=data_collator\n",
+ "    )\n",
+ "\n",
+ "    tf_test_set = model.prepare_tf_dataset(\n",
+ "        tokenized_test,\n",
+ "        shuffle=False,\n",
+ "        batch_size=args.batch_size,\n",
+ "        collate_fn=data_collator\n",
+ "    )\n",
+ "\n",
+ "    tf_validation_set = model.prepare_tf_dataset(\n",
+ "        tokenized_validation,\n",
+ "        shuffle=False,\n",
+ "        batch_size=args.batch_size,\n",
+ "        collate_fn=data_collator\n",
+ "    )\n",
+ "\n",
+ "    def metric_fn(eval_predictions):\n",
+ "        predictions, labels = eval_predictions\n",
+ "        decoded_predictions = tokenizer.batch_decode(predictions, skip_special_tokens=True)\n",
+ "        for label in labels:\n",
+ "            label[label < 0] = tokenizer.pad_token_id  # Replace masked label tokens\n",
+ "        decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)\n",
+ "        result = rouge_l(decoded_labels, decoded_predictions)\n",
+ "        # We will print only the F1 score, you can use other aggregation metrics as well\n",
+ "        result = {\"RougeL\": result[\"f1_score\"]}\n",
+ "\n",
+ "        return result\n",
+ "\n",
+ "    rouge_l = keras_nlp.metrics.RougeL()\n",
+ "\n",
+ "    metric_callback = KerasMetricCallback(\n",
+ "        metric_fn, eval_dataset=tf_validation_set, predict_with_generate=True, use_xla_generation=True)\n",
+ "\n",
+ "\n",
+ "    model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=args.num_train_epochs, callbacks=[metric_callback])\n",
+ "    model.save(f'{args.output_dir}/saved_model_artifacts_tf')\n",
+ "    model.save_pretrained(f'{args.output_dir}/saved_model_hf_tf')\n",
+ "\n",
+ "\n",
+ "if __name__ == \"__main__\":\n",
+ "    main()\n",
+ "```"
+ ]
+ },
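+ {
+ "cell_type": "markdown",
+ "id": "sketch-rouge-l-f1",
+ "metadata": {},
+ "source": [
+ "The metric callback in task.py reports a ROUGE-L F1 score. To make that number easier to interpret, here is a minimal pure-Python sketch of the idea: ROUGE-L is built on the longest common subsequence (LCS) between the reference and the prediction. This is an illustration only, assuming simple whitespace tokenization; `keras_nlp.metrics.RougeL` applies its own tokenization and batching, so its values may differ. The helper names `lcs_length` and `rouge_l_f1` are hypothetical.\n",
+ "\n",
+ "```python\n",
+ "def lcs_length(a, b):\n",
+ "    '''Length of the longest common subsequence of two token lists.'''\n",
+ "    prev = [0] * (len(b) + 1)\n",
+ "    for tok_a in a:\n",
+ "        curr = [0]\n",
+ "        for j, tok_b in enumerate(b, start=1):\n",
+ "            if tok_a == tok_b:\n",
+ "                curr.append(prev[j - 1] + 1)\n",
+ "            else:\n",
+ "                curr.append(max(prev[j], curr[j - 1]))\n",
+ "        prev = curr\n",
+ "    return prev[-1]\n",
+ "\n",
+ "def rouge_l_f1(reference, prediction):\n",
+ "    '''ROUGE-L F1 on whitespace tokens (an assumption; real tokenizers differ).'''\n",
+ "    ref, pred = reference.split(), prediction.split()\n",
+ "    lcs = lcs_length(ref, pred)\n",
+ "    if lcs == 0:\n",
+ "        return 0.0\n",
+ "    precision = lcs / len(pred)  # share of predicted tokens that are in the LCS\n",
+ "    recall = lcs / len(ref)      # share of reference tokens that are in the LCS\n",
+ "    return 2 * precision * recall / (precision + recall)\n",
+ "\n",
+ "score = rouge_l_f1('the cat sat on the mat', 'the cat is on the mat')\n",
+ "print(round(score, 3))\n",
+ "```\n",
+ "\n",
+ "A score near 1 means the generated summary preserves most of the reference in the same word order; a score near 0 means little overlap."
+ ]
+ },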
+ {
+ "cell_type": "markdown",
+ "id": "8775dba1-c47b-4375-9587-6fec561bc5f9",
+ "metadata": {},
+ "source": [
+ "### Hyperparameters (for the training script and custom AI job)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a66d0c47-f6df-4b79-87a6-a637b04ebc87",
+ "metadata": {},
+ "source": [
+ "The first step in training our model, other than setting up our datasets, is to set our **hyperparameters**. Hyperparameters depend on your training script; for this one we need to identify our model, the locations of our train, test, and validation files, and so on. \n",
+ "\n",
+ "The batch_size, inputs_max_length, num_train_epochs, and labels_max_length arguments already have defaults, set to the same values we used in the first section of this tutorial!"
+ ]
+ },
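+ {
+ "cell_type": "markdown",
+ "id": "sketch-argparse-defaults",
+ "metadata": {},
+ "source": [
+ "To see how those defaults behave, the sketch below rebuilds a small subset of the task.py arguments and parses a hand-written argument list instead of the real command line. It is a simplified illustration rather than the full parser; the `build_parser` helper is hypothetical, and the default values are copied from task.py.\n",
+ "\n",
+ "```python\n",
+ "import argparse\n",
+ "\n",
+ "def build_parser():\n",
+ "    '''Subset of the task.py arguments, enough to show how defaults apply.'''\n",
+ "    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n",
+ "    parser.add_argument('--model_name_or_path', required=True, type=str)\n",
+ "    parser.add_argument('--num_train_epochs', type=int, default=3)\n",
+ "    parser.add_argument('--inputs_max_length', type=int, default=1024)\n",
+ "    parser.add_argument('--labels_max_length', type=int, default=128)\n",
+ "    parser.add_argument('--batch_size', type=int, default=10)\n",
+ "    return parser\n",
+ "\n",
+ "# Only the required argument and one override are supplied; the rest fall back to defaults\n",
+ "args = build_parser().parse_args(['--model_name_or_path', 'google/flan-t5-small', '--batch_size', '4'])\n",
+ "print(vars(args))\n",
+ "```\n",
+ "\n",
+ "Any argument you do not pass keeps its default, so the custom job only needs the required arguments plus whatever you want to override."
+ ]
+ },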
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "764980e6-9bc1-4715-b540-9e254b12f1f3",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ ],
+ "source": [
+ "#to view options and defaults you can run the command below\n",
+ "!python autopkg-summarizer/trainer/task.py --help"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 258,
+ "id": "b21c8c79-1709-4052-8522-ae332cfec934",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Parameters for task.py script\n",
+ "CHECKPOINT = \"google/flan-t5-small\"\n",
+ "train_file=training_input_path\n",
+ "test_file=test_input_path\n",
+ "validation_file=validation_input_path\n",
+ "text_column=\"article\"\n",
+ "summary_column=\"abstract\"\n",
+ "source_prefix=\"summarize: \" \n",
+ "output_dir= f'gs://{BUCKET}'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e830bb4a-854e-412d-93ec-3059faf603d6",
+ "metadata": {},
+ "source": [
+ "For the custom job we need to set the machine type, the GPU accelerator type, and the prebuilt Docker image that will run our training. See the list of available containers here: https://cloud.google.com/vertex-ai/docs/training/pre-built-containers."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "09392ddd-aa9d-4358-95a6-3e64fa1692ad",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Parameters for custom AI job\n",
+ "display_name='flan-t5-training-tf'\n",
+ "BASE_GPU_IMAGE_tf='us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-12.py310:latest'\n",
+ "machine_type='n1-standard-4'\n",
+ "accelerator_type='NVIDIA_TESLA_V100'"
+ ]
+ },
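+ {
+ "cell_type": "markdown",
+ "id": "sketch-custom-job-command",
+ "metadata": {},
+ "source": [
+ "To show how these settings fit together, the sketch below assembles (but does not run) a `gcloud ai custom-jobs create` command that uses autopackaging. Treat it as an illustration under assumptions, not the exact command this tutorial submits: the `build_custom_job_command` helper is hypothetical, the region is assumed to be us-central1, and a single replica with one accelerator is assumed.\n",
+ "\n",
+ "```python\n",
+ "def build_custom_job_command(display_name, region, image_uri, machine_type,\n",
+ "                             accelerator_type, package_path, module, script_args):\n",
+ "    '''Assemble (without running) a gcloud command for an autopackaged custom job.'''\n",
+ "    worker_pool_spec = ','.join([\n",
+ "        f'machine-type={machine_type}',\n",
+ "        'replica-count=1',        # assumption: a single worker\n",
+ "        f'accelerator-type={accelerator_type}',\n",
+ "        'accelerator-count=1',    # assumption: one GPU\n",
+ "        f'executor-image-uri={image_uri}',\n",
+ "        f'local-package-path={package_path}',\n",
+ "        f'python-module={module}',\n",
+ "    ])\n",
+ "    command = [\n",
+ "        'gcloud', 'ai', 'custom-jobs', 'create',\n",
+ "        f'--region={region}',\n",
+ "        f'--display-name={display_name}',\n",
+ "        f'--worker-pool-spec={worker_pool_spec}',\n",
+ "    ]\n",
+ "    if script_args:\n",
+ "        # Arguments for task.py are passed as one comma-separated list\n",
+ "        command.append('--args=' + ','.join(script_args))\n",
+ "    return command\n",
+ "\n",
+ "command = build_custom_job_command(\n",
+ "    display_name='flan-t5-training-tf',\n",
+ "    region='us-central1',\n",
+ "    image_uri='us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-12.py310:latest',\n",
+ "    machine_type='n1-standard-4',\n",
+ "    accelerator_type='NVIDIA_TESLA_V100',\n",
+ "    package_path='autopkg-summarizer',\n",
+ "    module='trainer.task',\n",
+ "    script_args=['--model_name_or_path=google/flan-t5-small', '--text_column=article'],\n",
+ ")\n",
+ "print(' '.join(command))\n",
+ "```\n",
+ "\n",
+ "Because `--args` is comma-separated, values that themselves contain commas or spaces (such as a source prefix) need extra care when passed this way."
+ ]
+ },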
+ {
+ "cell_type": "markdown",
+ "id": "7a6a7856-624d-4229-8a5d-cfd263a84033",
+ "metadata": {},
+ "source": [
+ "### Submit Custom AI Training Job"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "16c88abb-2a4f-4475-ad58-1d69ca31c449",
+ "metadata": {},
+ "source": [
+ "Finally, we can submit our training via a custom job! It will first deploy the container that we specified and then submit our model for training. This custom job can take 15-20 minutes using our sample datasets."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 262,
+ "id": "252d8e16-5b3d-409b-bc86-9da0ce996f72",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n",
+ "To disable this warning, you can either:\n",
+ "\t- Avoid using `tokenizers` before the fork if possible\n",
+ "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using endpoint [https://us-central1-aiplatform.googleapis.com/]\n",
+ "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/subprocess.py:935: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used\n",
+ " self.stdin = io.open(p2cwrite, 'wb', bufsize)\n",
+ "/usr/lib/google-cloud-sdk/platform/bundledpythonunix/lib/python3.9/subprocess.py:941: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used\n",
+ " self.stdout = io.open(c2pread, 'rb', bufsize)\n",
+ "Sending build context to Docker daemon 18.99kB\n",
+ "Step 1/10 : FROM us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-12.py310:latest\n",
+ " ---> bd2bbbab7d71\n",
+ "Step 2/10 : RUN mkdir -m 777 -p /usr/app /home\n",
+ " ---> Running in 358dbf3724e8\n",
+ "Removing intermediate container 358dbf3724e8\n",
+ " ---> edf7be7209d7\n",
+ "Step 3/10 : WORKDIR /usr/app\n",
+ " ---> Running in a23be90e59c5\n",
+ "Removing intermediate container a23be90e59c5\n",
+ " ---> c35f2baa964c\n",
+ "Step 4/10 : ENV HOME=/home\n",
+ " ---> Running in 0137537b093b\n",
+ "Removing intermediate container 0137537b093b\n",
+ " ---> 64af9b387e54\n",
+ "Step 5/10 : ENV PYTHONDONTWRITEBYTECODE=1\n",
+ " ---> Running in cc5806ee80a2\n",
+ "Removing intermediate container cc5806ee80a2\n",
+ " ---> dfe914f7ecbc\n",
+ "Step 6/10 : RUN rm -rf /var/sitecustomize\n",
+ " ---> Running in 3e7c5fa57fe2\n",
+ "Removing intermediate container 3e7c5fa57fe2\n",
+ " ---> fa997bc68c88\n",
+ "Step 7/10 : COPY [\"./requirements.txt\", \"./requirements.txt\"]\n",
+ " ---> 7c46da48c940\n",
+ "Step 8/10 : RUN pip3 install --no-cache-dir -r ./requirements.txt\n",
+ " ---> Running in 6502f72390d6\n",
+ "Collecting evaluate (from -r ./requirements.txt (line 1))\n",
+ " Obtaining dependency information for evaluate from https://files.pythonhosted.org/packages/70/63/7644a1eb7b0297e585a6adec98ed9e575309bb973c33b394dae66bc35c69/evaluate-0.4.1-py3-none-any.whl.metadata\n",
+ " Downloading evaluate-0.4.1-py3-none-any.whl.metadata (9.4 kB)\n",
+ "Collecting nltk (from -r ./requirements.txt (line 2))\n",
+ " Downloading nltk-3.8.1-py3-none-any.whl (1.5 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 61.9 MB/s eta 0:00:00\n",
+ "Collecting transformers (from -r ./requirements.txt (line 3))\n",
+ " Obtaining dependency information for transformers from https://files.pythonhosted.org/packages/9a/06/e4ec2a321e57c03b7e9345d709d554a52c33760e5015fdff0919d9459af0/transformers-4.35.0-py3-none-any.whl.metadata\n",
+ " Downloading transformers-4.35.0-py3-none-any.whl.metadata (123 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 123.1/123.1 kB 203.6 MB/s eta 0:00:00\n",
+ "Collecting keras_nlp (from -r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for keras_nlp from https://files.pythonhosted.org/packages/37/d4/dfd85606db811af2138e97fc480eb7ed709042dd96dd453868bede0929fe/keras_nlp-0.6.2-py3-none-any.whl.metadata\n",
+ " Downloading keras_nlp-0.6.2-py3-none-any.whl.metadata (7.2 kB)\n",
+ "Collecting datasets (from -r ./requirements.txt (line 5))\n",
+ " Obtaining dependency information for datasets from https://files.pythonhosted.org/packages/7c/55/b3432f43d6d7fee999bb23a547820d74c48ec540f5f7842e41aa5d8d5f3a/datasets-2.14.6-py3-none-any.whl.metadata\n",
+ " Downloading datasets-2.14.6-py3-none-any.whl.metadata (19 kB)\n",
+ "Collecting rouge_score (from -r ./requirements.txt (line 6))\n",
+ " Downloading rouge_score-0.1.2.tar.gz (17 kB)\n",
+ " Preparing metadata (setup.py): started\n",
+ " Preparing metadata (setup.py): finished with status 'done'\n",
+ "Requirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.10/site-packages (from evaluate->-r ./requirements.txt (line 1)) (1.23.5)\n",
+ "Collecting dill (from evaluate->-r ./requirements.txt (line 1))\n",
+ " Obtaining dependency information for dill from https://files.pythonhosted.org/packages/f5/3a/74a29b11cf2cdfcd6ba89c0cecd70b37cd1ba7b77978ce611eb7a146a832/dill-0.3.7-py3-none-any.whl.metadata\n",
+ " Downloading dill-0.3.7-py3-none-any.whl.metadata (9.9 kB)\n",
+ "Requirement already satisfied: pandas in /opt/conda/lib/python3.10/site-packages (from evaluate->-r ./requirements.txt (line 1)) (2.0.3)\n",
+ "Requirement already satisfied: requests>=2.19.0 in /opt/conda/lib/python3.10/site-packages (from evaluate->-r ./requirements.txt (line 1)) (2.31.0)\n",
+ "Requirement already satisfied: tqdm>=4.62.1 in /opt/conda/lib/python3.10/site-packages (from evaluate->-r ./requirements.txt (line 1)) (4.65.0)\n",
+ "Collecting xxhash (from evaluate->-r ./requirements.txt (line 1))\n",
+ " Obtaining dependency information for xxhash from https://files.pythonhosted.org/packages/80/8a/1dd41557883b6196f8f092011a5c1f72d4d44cf36d7b67d4a5efe3127949/xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)\n",
+ "Collecting multiprocess (from evaluate->-r ./requirements.txt (line 1))\n",
+ " Obtaining dependency information for multiprocess from https://files.pythonhosted.org/packages/35/a8/36d8d7b3e46b377800d8dec47891cdf05842d1a2366909ae4a0c89fbc5e6/multiprocess-0.70.15-py310-none-any.whl.metadata\n",
+ " Downloading multiprocess-0.70.15-py310-none-any.whl.metadata (7.2 kB)\n",
+ "Requirement already satisfied: fsspec[http]>=2021.05.0 in /opt/conda/lib/python3.10/site-packages (from evaluate->-r ./requirements.txt (line 1)) (2023.6.0)\n",
+ "Collecting huggingface-hub>=0.7.0 (from evaluate->-r ./requirements.txt (line 1))\n",
+ " Obtaining dependency information for huggingface-hub>=0.7.0 from https://files.pythonhosted.org/packages/ef/b5/b6107bd65fa4c96fdf00e4733e2fe5729bb9e5e09997f63074bb43d3ab28/huggingface_hub-0.18.0-py3-none-any.whl.metadata\n",
+ " Downloading huggingface_hub-0.18.0-py3-none-any.whl.metadata (13 kB)\n",
+ "Requirement already satisfied: packaging in /opt/conda/lib/python3.10/site-packages (from evaluate->-r ./requirements.txt (line 1)) (23.1)\n",
+ "Collecting responses<0.19 (from evaluate->-r ./requirements.txt (line 1))\n",
+ " Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
+ "Requirement already satisfied: click in /opt/conda/lib/python3.10/site-packages (from nltk->-r ./requirements.txt (line 2)) (8.1.6)\n",
+ "Requirement already satisfied: joblib in /opt/conda/lib/python3.10/site-packages (from nltk->-r ./requirements.txt (line 2)) (1.3.1)\n",
+ "Collecting regex>=2021.8.3 (from nltk->-r ./requirements.txt (line 2))\n",
+ " Obtaining dependency information for regex>=2021.8.3 from https://files.pythonhosted.org/packages/8f/3e/4b8b40eb3c80aeaf360f0361d956d129bb3d23b2a3ecbe3a04a8f3bdd6d3/regex-2023.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading regex-2023.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (40 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 40.9/40.9 kB 178.9 MB/s eta 0:00:00\n",
+ "Requirement already satisfied: filelock in /opt/conda/lib/python3.10/site-packages (from transformers->-r ./requirements.txt (line 3)) (3.12.2)\n",
+ "Requirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.10/site-packages (from transformers->-r ./requirements.txt (line 3)) (6.0.1)\n",
+ "Collecting tokenizers<0.15,>=0.14 (from transformers->-r ./requirements.txt (line 3))\n",
+ " Obtaining dependency information for tokenizers<0.15,>=0.14 from https://files.pythonhosted.org/packages/a7/7b/c1f643eb086b6c5c33eef0c3752e37624bd23e4cbc9f1332748f1c6252d1/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)\n",
+ "Collecting safetensors>=0.3.1 (from transformers->-r ./requirements.txt (line 3))\n",
+ " Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/20/4e/878b080dbda92666233ec6f316a53969edcb58eab1aa399a64d0521cf953/safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)\n",
+ "Collecting keras-core (from keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for keras-core from https://files.pythonhosted.org/packages/95/f7/b8dcff937ea64f822f0d3fe8c6010793406b82d14467cd0e9eecea458a40/keras_core-0.1.7-py3-none-any.whl.metadata\n",
+ " Downloading keras_core-0.1.7-py3-none-any.whl.metadata (4.3 kB)\n",
+ "Requirement already satisfied: absl-py in /opt/conda/lib/python3.10/site-packages (from keras_nlp->-r ./requirements.txt (line 4)) (1.4.0)\n",
+ "Requirement already satisfied: rich in /opt/conda/lib/python3.10/site-packages (from keras_nlp->-r ./requirements.txt (line 4)) (13.5.1)\n",
+ "Requirement already satisfied: dm-tree in /opt/conda/lib/python3.10/site-packages (from keras_nlp->-r ./requirements.txt (line 4)) (0.1.8)\n",
+ "Collecting tensorflow-text (from keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for tensorflow-text from https://files.pythonhosted.org/packages/0b/5f/8b301d2d0cea8334c22aaeb8880ce115ec34d7eba20f7b08c64202011a85/tensorflow_text-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading tensorflow_text-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.9 kB)\n",
+ "Requirement already satisfied: pyarrow>=8.0.0 in /opt/conda/lib/python3.10/site-packages (from datasets->-r ./requirements.txt (line 5)) (12.0.1)\n",
+ "Requirement already satisfied: aiohttp in /opt/conda/lib/python3.10/site-packages (from datasets->-r ./requirements.txt (line 5)) (3.8.5)\n",
+ "Requirement already satisfied: six>=1.14.0 in /opt/conda/lib/python3.10/site-packages (from rouge_score->-r ./requirements.txt (line 6)) (1.16.0)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (23.1.0)\n",
+ "Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (3.2.0)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (6.0.4)\n",
+ "Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (4.0.2)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (1.9.2)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (1.4.0)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.10/site-packages (from aiohttp->datasets->-r ./requirements.txt (line 5)) (1.3.1)\n",
+ "Requirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.10/site-packages (from huggingface-hub>=0.7.0->evaluate->-r ./requirements.txt (line 1)) (4.7.1)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->evaluate->-r ./requirements.txt (line 1)) (3.4)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->evaluate->-r ./requirements.txt (line 1)) (1.26.16)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests>=2.19.0->evaluate->-r ./requirements.txt (line 1)) (2023.7.22)\n",
+ "Collecting huggingface-hub>=0.7.0 (from evaluate->-r ./requirements.txt (line 1))\n",
+ " Obtaining dependency information for huggingface-hub>=0.7.0 from https://files.pythonhosted.org/packages/aa/f3/3fc97336a0e90516901befd4f500f08d691034d387406fdbde85bea827cc/huggingface_hub-0.17.3-py3-none-any.whl.metadata\n",
+ " Downloading huggingface_hub-0.17.3-py3-none-any.whl.metadata (13 kB)\n",
+ "Collecting namex (from keras-core->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Downloading namex-0.0.7-py3-none-any.whl (5.8 kB)\n",
+ "Requirement already satisfied: h5py in /opt/conda/lib/python3.10/site-packages (from keras-core->keras_nlp->-r ./requirements.txt (line 4)) (3.9.0)\n",
+ "Requirement already satisfied: python-dateutil>=2.8.2 in /opt/conda/lib/python3.10/site-packages (from pandas->evaluate->-r ./requirements.txt (line 1)) (2.8.2)\n",
+ "Requirement already satisfied: pytz>=2020.1 in /opt/conda/lib/python3.10/site-packages (from pandas->evaluate->-r ./requirements.txt (line 1)) (2023.3)\n",
+ "Requirement already satisfied: tzdata>=2022.1 in /opt/conda/lib/python3.10/site-packages (from pandas->evaluate->-r ./requirements.txt (line 1)) (2023.3)\n",
+ "Requirement already satisfied: markdown-it-py>=2.2.0 in /opt/conda/lib/python3.10/site-packages (from rich->keras_nlp->-r ./requirements.txt (line 4)) (3.0.0)\n",
+ "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /opt/conda/lib/python3.10/site-packages (from rich->keras_nlp->-r ./requirements.txt (line 4)) (2.15.1)\n",
+ "Collecting tensorflow-hub>=0.13.0 (from tensorflow-text->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for tensorflow-hub>=0.13.0 from https://files.pythonhosted.org/packages/6e/1a/fbae76f4057b9bcdf9468025d7a8ca952dec14bfafb9fc0b1e4244ce212f/tensorflow_hub-0.15.0-py2.py3-none-any.whl.metadata\n",
+ " Downloading tensorflow_hub-0.15.0-py2.py3-none-any.whl.metadata (1.3 kB)\n",
+ "Collecting tensorflow<2.15,>=2.14.0 (from tensorflow-text->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for tensorflow<2.15,>=2.14.0 from https://files.pythonhosted.org/packages/e2/7a/c7762c698fb1ac41a7e3afee51dc72aa3ec74ae8d2f57ce19a9cded3a4af/tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata\n",
+ " Downloading tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.1 kB)\n",
+ "Requirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->keras_nlp->-r ./requirements.txt (line 4)) (0.1.2)\n",
+ "Requirement already satisfied: astunparse>=1.6.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (1.6.3)\n",
+ "Requirement already satisfied: flatbuffers>=23.5.26 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (23.5.26)\n",
+ "Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.4.0)\n",
+ "Requirement already satisfied: google-pasta>=0.1.1 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.2.0)\n",
+ "Requirement already satisfied: libclang>=13.0.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (16.0.6)\n",
+ "Requirement already satisfied: ml-dtypes==0.2.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.2.0)\n",
+ "Requirement already satisfied: opt-einsum>=2.3.2 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (3.3.0)\n",
+ "Collecting protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 from https://files.pythonhosted.org/packages/ae/32/45b1cf0c5d4a3ba881f5164c26af877c0dabfe6de0019d426aa0e5cf6806/protobuf-4.25.0-cp37-abi3-manylinux2014_x86_64.whl.metadata\n",
+ " Downloading protobuf-4.25.0-cp37-abi3-manylinux2014_x86_64.whl.metadata (541 bytes)\n",
+ "Requirement already satisfied: setuptools in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (68.0.0)\n",
+ "Requirement already satisfied: termcolor>=1.1.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (2.3.0)\n",
+ "Requirement already satisfied: wrapt<1.15,>=1.11.0 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (1.14.1)\n",
+ "Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.32.0)\n",
+ "Requirement already satisfied: grpcio<2.0,>=1.24.3 in /opt/conda/lib/python3.10/site-packages (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (1.56.2)\n",
+ "Collecting tensorboard<2.15,>=2.14 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for tensorboard<2.15,>=2.14 from https://files.pythonhosted.org/packages/73/a2/66ed644f6ed1562e0285fcd959af17670ea313c8f331c46f79ee77187eb9/tensorboard-2.14.1-py3-none-any.whl.metadata\n",
+ " Downloading tensorboard-2.14.1-py3-none-any.whl.metadata (1.7 kB)\n",
+ "Collecting tensorflow-estimator<2.15,>=2.14.0 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for tensorflow-estimator<2.15,>=2.14.0 from https://files.pythonhosted.org/packages/d1/da/4f264c196325bb6e37a6285caec5b12a03def489b57cc1fdac02bb6272cd/tensorflow_estimator-2.14.0-py2.py3-none-any.whl.metadata\n",
+ " Downloading tensorflow_estimator-2.14.0-py2.py3-none-any.whl.metadata (1.3 kB)\n",
+ "Collecting keras<2.15,>=2.14.0 (from tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4))\n",
+ " Obtaining dependency information for keras<2.15,>=2.14.0 from https://files.pythonhosted.org/packages/fe/58/34d4d8f1aa11120c2d36d7ad27d0526164b1a8ae45990a2fede31d0e59bf/keras-2.14.0-py3-none-any.whl.metadata\n",
+ " Downloading keras-2.14.0-py3-none-any.whl.metadata (2.4 kB)\n",
+ "Requirement already satisfied: wheel<1.0,>=0.23.0 in /opt/conda/lib/python3.10/site-packages (from astunparse>=1.6.0->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.41.0)\n",
+ "Requirement already satisfied: google-auth<3,>=1.6.3 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (2.22.0)\n",
+ "Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (1.0.0)\n",
+ "Requirement already satisfied: markdown>=2.6.8 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (3.4.4)\n",
+ "Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.7.1)\n",
+ "Requirement already satisfied: werkzeug>=1.0.1 in /opt/conda/lib/python3.10/site-packages (from tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (2.3.6)\n",
+ "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (5.3.1)\n",
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.3.0)\n",
+ "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (4.9)\n",
+ "Requirement already satisfied: requests-oauthlib>=0.7.0 in /opt/conda/lib/python3.10/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (1.3.1)\n",
+ "Requirement already satisfied: MarkupSafe>=2.1.1 in /opt/conda/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (2.1.3)\n",
+ "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/conda/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (0.5.0)\n",
+ "Requirement already satisfied: oauthlib>=3.0.0 in /opt/conda/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard<2.15,>=2.14->tensorflow<2.15,>=2.14.0->tensorflow-text->keras_nlp->-r ./requirements.txt (line 4)) (3.2.2)\n",
+ "Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 198.5 MB/s eta 0:00:00\n",
+ "Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.9/7.9 MB 124.2 MB/s eta 0:00:00\n",
+ "Downloading keras_nlp-0.6.2-py3-none-any.whl (590 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 590.1/590.1 kB 225.3 MB/s eta 0:00:00\n",
+ "Downloading datasets-2.14.6-py3-none-any.whl (493 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 493.7/493.7 kB 214.8 MB/s eta 0:00:00\n",
+ "Downloading dill-0.3.7-py3-none-any.whl (115 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 115.3/115.3 kB 230.1 MB/s eta 0:00:00\n",
+ "Downloading regex-2023.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (773 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 773.9/773.9 kB 233.9 MB/s eta 0:00:00\n",
+ "Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 228.3 MB/s eta 0:00:00\n",
+ "Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 223.3 MB/s eta 0:00:00\n",
+ "Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 295.0/295.0 kB 240.5 MB/s eta 0:00:00\n",
+ "Downloading keras_core-0.1.7-py3-none-any.whl (950 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 950.8/950.8 kB 236.1 MB/s eta 0:00:00\n",
+ "Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 216.1 MB/s eta 0:00:00\n",
+ "Downloading tensorflow_text-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 158.6 MB/s eta 0:00:00\n",
+ "Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 223.4 MB/s eta 0:00:00\n",
+ "Downloading tensorflow-2.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (489.8 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 489.8/489.8 MB 212.8 MB/s eta 0:00:00\n",
+ "Downloading tensorflow_hub-0.15.0-py2.py3-none-any.whl (85 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85.4/85.4 kB 185.0 MB/s eta 0:00:00\n",
+ "Downloading keras-2.14.0-py3-none-any.whl (1.7 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 219.8 MB/s eta 0:00:00\n",
+ "Downloading protobuf-4.25.0-cp37-abi3-manylinux2014_x86_64.whl (294 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 294.4/294.4 kB 217.2 MB/s eta 0:00:00\n",
+ "Downloading tensorboard-2.14.1-py3-none-any.whl (5.5 MB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.5/5.5 MB 212.0 MB/s eta 0:00:00\n",
+ "Downloading tensorflow_estimator-2.14.0-py2.py3-none-any.whl (440 kB)\n",
+ " ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 440.7/440.7 kB 223.9 MB/s eta 0:00:00\n",
+ "Building wheels for collected packages: rouge_score\n",
+ " Building wheel for rouge_score (setup.py): started\n",
+ " Building wheel for rouge_score (setup.py): finished with status 'done'\n",
+ " Created wheel for rouge_score: filename=rouge_score-0.1.2-py3-none-any.whl size=24933 sha256=7fb2b5092b892710a8c128f5633d6f5f22dc260df119b78067900b8c74e972a4\n",
+ " Stored in directory: /tmp/pip-ephem-wheel-cache-sagd5q__/wheels/5f/dd/89/461065a73be61a532ff8599a28e9beef17985c9e9c31e541b4\n",
+ "Successfully built rouge_score\n",
+ "Installing collected packages: namex, xxhash, tensorflow-estimator, safetensors, regex, protobuf, keras, dill, tensorflow-hub, responses, nltk, multiprocess, huggingface-hub, tokenizers, rouge_score, keras-core, transformers, tensorboard, datasets, tensorflow, evaluate, tensorflow-text, keras_nlp\n",
+ " Attempting uninstall: tensorflow-estimator\n",
+ " Found existing installation: tensorflow-estimator 2.12.0\n",
+ " Uninstalling tensorflow-estimator-2.12.0:\n",
+ " Successfully uninstalled tensorflow-estimator-2.12.0\n",
+ " Attempting uninstall: protobuf\n",
+ " Found existing installation: protobuf 3.20.1\n",
+ " Uninstalling protobuf-3.20.1:\n",
+ " Successfully uninstalled protobuf-3.20.1\n",
+ " Attempting uninstall: keras\n",
+ " Found existing installation: keras 2.12.0\n",
+ " Uninstalling keras-2.12.0:\n",
+ " Successfully uninstalled keras-2.12.0\n",
+ " Attempting uninstall: tensorboard\n",
+ " Found existing installation: tensorboard 2.12.3\n",
+ " Uninstalling tensorboard-2.12.3:\n",
+ " Successfully uninstalled tensorboard-2.12.3\n",
+ " Attempting uninstall: tensorflow\n",
+ " Found existing installation: tensorflow 2.12.0\n",
+ " Uninstalling tensorflow-2.12.0:\n",
+ " Successfully uninstalled tensorflow-2.12.0\n",
+ "Successfully installed datasets-2.14.6 dill-0.3.7 evaluate-0.4.1 huggingface-hub-0.17.3 keras-2.14.0 keras-core-0.1.7 keras_nlp-0.6.2 multiprocess-0.70.15 namex-0.0.7 nltk-3.8.1 protobuf-4.25.0 regex-2023.10.3 responses-0.18.0 rouge_score-0.1.2 safetensors-0.4.0 tensorboard-2.14.1 tensorflow-2.14.0 tensorflow-estimator-2.14.0 tensorflow-hub-0.15.0 tensorflow-text-2.14.0 tokenizers-0.14.1 transformers-4.35.0 xxhash-3.4.1\n",
+ "\u001b[91mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
+ "google-cloud-datastore 1.15.5 requires protobuf<4.0.0dev, but you have protobuf 4.25.0 which is incompatible.\n",
+ "\u001b[0m\u001b[91mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\n",
+ "\u001b[0mRemoving intermediate container 6502f72390d6\n",
+ " ---> 97a4b7990a59\n",
+ "Step 9/10 : COPY [\"trainer\", \"trainer\"]\n",
+ " ---> dce93f89c146\n",
+ "Step 10/10 : ENTRYPOINT [\"python3\", \"-m\", \"trainer.task\"]\n",
+ " ---> Running in beccb40ff5ce\n",
+ "Removing intermediate container beccb40ff5ce\n",
+ " ---> 6be133543c75\n",
+ "Successfully built 6be133543c75\n",
+ "Successfully tagged gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3:20231103.17.39.12.779660\n",
+ "\n",
+ "A custom container image is built locally.\n",
+ "\n",
+ "The push refers to repository [gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3]\n",
+ "de565aa0e952: Preparing\n",
+ "8027f564cadd: Preparing\n",
+ "5bd43a783137: Preparing\n",
+ "c2cec13eda62: Preparing\n",
+ "73c814d198fd: Preparing\n",
+ "e42695c7b436: Preparing\n",
+ "e42695c7b436: Preparing\n",
+ "7e34967c8575: Preparing\n",
+ "19c1ff49a1a3: Preparing\n",
+ "724eb7d1e386: Preparing\n",
+ "e7df186da59e: Preparing\n",
+ "e7df186da59e: Preparing\n",
+ "d9e5455afa58: Preparing\n",
+ "a4f1c7b5b5c5: Preparing\n",
+ "1eeca563762d: Preparing\n",
+ "b3f8d9df367e: Preparing\n",
+ "29e2658ae6ea: Preparing\n",
+ "228616cf4f10: Preparing\n",
+ "ae32b7336b96: Preparing\n",
+ "ae32b7336b96: Preparing\n",
+ "ea7b0ccc272e: Preparing\n",
+ "01d4173a3960: Preparing\n",
+ "c235d251a607: Preparing\n",
+ "f2833e4d69b4: Preparing\n",
+ "49fc5a524f1f: Preparing\n",
+ "e175e85d3600: Preparing\n",
+ "55bfb3527de7: Preparing\n",
+ "ee67859f37c6: Preparing\n",
+ "ed7e041f0699: Preparing\n",
+ "0235cf47cbae: Preparing\n",
+ "724eb7d1e386: Waiting\n",
+ "2971cdbb4b45: Preparing\n",
+ "8374b2bc65e7: Preparing\n",
+ "3b93a6feba89: Preparing\n",
+ "b15400eb0fa7: Preparing\n",
+ "29ecaf0c2ae0: Preparing\n",
+ "a4f1c7b5b5c5: Waiting\n",
+ "41e673079fce: Preparing\n",
+ "e7df186da59e: Waiting\n",
+ "1eeca563762d: Waiting\n",
+ "cda9215846ee: Preparing\n",
+ "d9e5455afa58: Waiting\n",
+ "c5eafb4bee8f: Preparing\n",
+ "b3f8d9df367e: Waiting\n",
+ "29e2658ae6ea: Waiting\n",
+ "81182eb0608d: Preparing\n",
+ "f2baf76d88ee: Preparing\n",
+ "228616cf4f10: Waiting\n",
+ "01d4173a3960: Waiting\n",
+ "cdd7c7392317: Preparing\n",
+ "ae32b7336b96: Waiting\n",
+ "c235d251a607: Waiting\n",
+ "ea7b0ccc272e: Waiting\n",
+ "e175e85d3600: Waiting\n",
+ "f2833e4d69b4: Waiting\n",
+ "b15400eb0fa7: Waiting\n",
+ "29ecaf0c2ae0: Waiting\n",
+ "2971cdbb4b45: Waiting\n",
+ "49fc5a524f1f: Waiting\n",
+ "41e673079fce: Waiting\n",
+ "55bfb3527de7: Waiting\n",
+ "ee67859f37c6: Waiting\n",
+ "cda9215846ee: Waiting\n",
+ "3b93a6feba89: Waiting\n",
+ "8374b2bc65e7: Waiting\n",
+ "ed7e041f0699: Waiting\n",
+ "c5eafb4bee8f: Waiting\n",
+ "0235cf47cbae: Waiting\n",
+ "81182eb0608d: Waiting\n",
+ "cdd7c7392317: Waiting\n",
+ "f2baf76d88ee: Waiting\n",
+ "e42695c7b436: Waiting\n",
+ "7e34967c8575: Waiting\n",
+ "19c1ff49a1a3: Waiting\n",
+ "73c814d198fd: Pushed\n",
+ "5bd43a783137: Pushed\n",
+ "c2cec13eda62: Pushed\n",
+ "de565aa0e952: Pushed\n",
+ "e42695c7b436: Layer already exists\n",
+ "7e34967c8575: Layer already exists\n",
+ "19c1ff49a1a3: Layer already exists\n",
+ "e7df186da59e: Layer already exists\n",
+ "724eb7d1e386: Layer already exists\n",
+ "d9e5455afa58: Layer already exists\n",
+ "a4f1c7b5b5c5: Layer already exists\n",
+ "1eeca563762d: Layer already exists\n",
+ "b3f8d9df367e: Layer already exists\n",
+ "228616cf4f10: Layer already exists\n",
+ "29e2658ae6ea: Layer already exists\n",
+ "ae32b7336b96: Layer already exists\n",
+ "ea7b0ccc272e: Layer already exists\n",
+ "01d4173a3960: Layer already exists\n",
+ "c235d251a607: Layer already exists\n",
+ "f2833e4d69b4: Layer already exists\n",
+ "49fc5a524f1f: Layer already exists\n",
+ "e175e85d3600: Layer already exists\n",
+ "55bfb3527de7: Layer already exists\n",
+ "ee67859f37c6: Layer already exists\n",
+ "ed7e041f0699: Layer already exists\n",
+ "0235cf47cbae: Layer already exists\n",
+ "2971cdbb4b45: Layer already exists\n",
+ "8374b2bc65e7: Layer already exists\n",
+ "3b93a6feba89: Layer already exists\n",
+ "b15400eb0fa7: Layer already exists\n",
+ "41e673079fce: Layer already exists\n",
+ "29ecaf0c2ae0: Layer already exists\n",
+ "c5eafb4bee8f: Layer already exists\n",
+ "cda9215846ee: Layer already exists\n",
+ "81182eb0608d: Layer already exists\n",
+ "f2baf76d88ee: Layer already exists\n",
+ "cdd7c7392317: Layer already exists\n",
+ "8027f564cadd: Pushed\n",
+ "20231103.17.39.12.779660: digest: sha256:1240e61185c933e273e7bc6b5112358d85942e1f8bcb2cf076b3a144e5b748eb size: 8901\n",
+ "\n",
+ "Custom container image [gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3:20231103.17.39.12.779660] is created for your custom job.\n",
+ "\n",
+ "CustomJob [projects/144763482491/locations/us-central1/customJobs/6207308081613766656] is submitted successfully.\n",
+ "\n",
+ "Your job is still active. You may view the status of your job with the command\n",
+ "\n",
+ " $ gcloud ai custom-jobs describe projects/144763482491/locations/us-central1/customJobs/6207308081613766656\n",
+ "\n",
+ "or continue streaming the logs with the command\n",
+ "\n",
+ " $ gcloud ai custom-jobs stream-logs projects/144763482491/locations/us-central1/customJobs/6207308081613766656\n"
+ ]
+ }
+ ],
+ "source": [
+ "!gcloud ai custom-jobs create \\\n",
+ "--region=us-central1 \\\n",
+ "--display-name=$display_name \\\n",
+ "--args=--model_name_or_path=$CHECKPOINT \\\n",
+ "--args=--train_file=$train_file \\\n",
+ "--args=--test_file=$test_file \\\n",
+ "--args=--validation_file=$validation_file \\\n",
+ "--args=--text_column=$text_column \\\n",
+ "--args=--summary_column=$summary_column \\\n",
+ "--args=--output_dir=gs://$BUCKET \\\n",
+ "--args=--source_prefix=$source_prefix \\\n",
+ "--worker-pool-spec=machine-type=$machine_type,replica-count=1,accelerator-type=$accelerator_type,executor-image-uri=$BASE_GPU_IMAGE_tf,local-package-path=autopkg-summarizer,python-module=trainer.task"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "403fe77f-990c-4518-ad3a-0aac3d2c8b92",
+ "metadata": {},
+ "source": [
+ "Once training starts, the command-line output shows the command for following its progress: `gcloud ai custom-jobs stream-logs <`. You can also monitor the job and view its logs in the console by going to `Vertex AI > Training > Custom Jobs`,\n",
+ "selecting your custom job, and clicking \"View Logs\"."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dc7b81af-d424-427d-a37a-ac5da197567e",
+ "metadata": {},
+ "source": [
+ "## Deploy the Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d772dc95-95e9-40f2-a5c5-dc782d6f7e14",
+ "metadata": {},
+ "source": [
+ "### Upload the Model to Vertex AI's Model Registry"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4dff127a-4d14-4fa1-a22a-75662eccce02",
+ "metadata": {},
+ "source": [
+ "Once the model has finished training, you should see a model_save.pd file in your bucket. We need this file in order to upload the model to the Model Registry. Here we specify a prebuilt Docker image that will run our predictions, the name of our model, and the directory in our bucket that holds the **model_save.pd** file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "26042230-dc95-4b6c-bd32-bf3596e5de52",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "TF_PREDICTION_IMAGE_URI_RUNTIME = 'us-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-gpu.2-12:latest'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "f5a3ef3b-8080-4e2d-bb8f-7e2f22c59e05",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Creating Model\n",
+ "Create Model backing LRO: projects/144763482491/locations/us-central1/models/3296764669607280640/operations/1237604172191236096\n",
+ "Model created. Resource name: projects/144763482491/locations/us-central1/models/3296764669607280640@1\n",
+ "To use this Model in another session:\n",
+ "model = aiplatform.Model('projects/144763482491/locations/us-central1/models/3296764669607280640@1')\n"
+ ]
+ }
+ ],
+ "source": [
+ "from google.cloud import aiplatform as vertexai\n",
+ "from google.cloud import aiplatform\n",
+ "\n",
+ "#give your model a name\n",
+ "MODEL_DISPLAY_NAME = \"summarizer-tf-runtime\"\n",
+ "MODEL_DESCRIPTION = \"summarizes scientific texts and pdfs\" #optional\n",
+ "\n",
+ "#add your project ID and location\n",
+ "project=''\n",
+ "location=''\n",
+ "\n",
+ "vertexai.init(project=project, location=location, staging_bucket=BUCKET)\n",
+ "\n",
+ "\n",
+ "model = aiplatform.Model.upload(\n",
+ " display_name=MODEL_DISPLAY_NAME,\n",
+ " description=MODEL_DESCRIPTION,\n",
+ " serving_container_image_uri=TF_PREDICTION_IMAGE_URI_RUNTIME,\n",
+ " serving_container_args=[\"--allow_precompilation\", \"--allow_compression\", \"--use_tfrt\"],\n",
+ " artifact_uri=f'gs://{BUCKET}/saved_model_artifacts_tf', #directory where our artifacts are in our bucket\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0d9772a7-f00d-4633-aa87-3861fa5dec79",
+ "metadata": {},
+ "source": [
+ "### Create an Endpoint and Deploy our Model to It"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "26122187-da1c-4d26-b1a0-ec1bd403cb19",
+ "metadata": {},
+ "source": [
+ "An **endpoint** is how a user communicates with the model. A single model endpoint responds by returning a single inference from at least one model. It can take 20 minutes or more to set up an endpoint."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "74a2c3dd-0e34-4049-804b-940c9a440570",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Creating Endpoint\n",
+ "Create Endpoint backing LRO: projects/144763482491/locations/us-central1/endpoints/5468832298092724224/operations/884634551396073472\n",
+ "Endpoint created. Resource name: projects/144763482491/locations/us-central1/endpoints/5468832298092724224\n",
+ "To use this Endpoint in another session:\n",
+ "endpoint = aiplatform.Endpoint('projects/144763482491/locations/us-central1/endpoints/5468832298092724224')\n",
+ "Deploying model to Endpoint : projects/144763482491/locations/us-central1/endpoints/5468832298092724224\n",
+ "Deploy Endpoint model backing LRO: projects/144763482491/locations/us-central1/endpoints/5468832298092724224/operations/5601029261159825408\n",
+ "Endpoint model deployed. Resource name: projects/144763482491/locations/us-central1/endpoints/5468832298092724224\n"
+ ]
+ }
+ ],
+ "source": [
+ "ENDPOINT_DISPLAY_NAME = \"summarizer-endpoint\" \n",
+ "endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)\n",
+ "\n",
+ "model_endpoint = model.deploy(\n",
+ " endpoint=endpoint,\n",
+ " deployed_model_display_name=MODEL_DISPLAY_NAME,\n",
+ " machine_type=\"n1-standard-8\",\n",
+ " accelerator_type=\"NVIDIA_TESLA_V100\",\n",
+ " accelerator_count=1,\n",
+ " traffic_percentage=100,\n",
+ " deploy_request_timeout=1200,\n",
+ " sync=True,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4dee7811-559f-4bdc-b56e-b932b0831c0f",
+ "metadata": {},
+ "source": [
+ "Here we create an endpoint and deploy our model to it. We are deploying with 1 GPU, which can take about 20 minutes to run; feel free to try other machine types that use more GPUs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1544dbe2-8f06-43e8-9b2c-e9d57332e00e",
+ "metadata": {},
+ "source": [
+ "## Delete All Resources"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8b7a6d9a-3d0c-425e-a8f3-cb3160f1ee3b",
+ "metadata": {},
+ "source": [
+ "**Warning:** Once you are done, don't forget to delete your endpoint, model, and buckets, and shut down or delete your Vertex AI notebook to avoid additional charges!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "24d0fea3-fd4a-4735-b6d5-b910239b5ffa",
+ "metadata": {},
+ "source": [
+ "First we will delete our custom job. The command below lists your custom jobs so that you can copy the job ID, which is the number at the end of the field called **'name:projects//locations/us-central1/customJobs/'**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "9721f52d-040f-4dc7-808e-8d1ffb5efb4a",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Using endpoint [https://us-central1-aiplatform.googleapis.com/]\n",
+ "---\n",
+ "createTime: '2023-11-03T17:43:15.502041Z'\n",
+ "displayName: flan-t5-training-tf3\n",
+ "endTime: '2023-11-03T18:03:29Z'\n",
+ "jobSpec:\n",
+ " workerPoolSpecs:\n",
+ " - containerSpec:\n",
+ " args:\n",
+ " - --model_name_or_path=google/flan-t5-small\n",
+ " - --train_file=gs://flan-t5-model-resources/train.csv\n",
+ " - --test_file=gs://flan-t5-model-resources/test.csv\n",
+ " - --validation_file=gs://flan-t5-model-resources/validation.csv\n",
+ " - --text_column=article\n",
+ " - --summary_column=abstract\n",
+ " - --output_dir=gs://flan-t5-model-resources/\n",
+ " - '--source_prefix=summarize:'\n",
+ " imageUri: gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3:20231103.17.39.12.779660\n",
+ " diskSpec:\n",
+ " bootDiskSizeGb: 100\n",
+ " bootDiskType: pd-ssd\n",
+ " machineSpec:\n",
+ " acceleratorCount: 1\n",
+ " acceleratorType: NVIDIA_TESLA_V100\n",
+ " machineType: n1-standard-4\n",
+ " replicaCount: '1'\n",
+ "name: projects/144763482491/locations/us-central1/customJobs/6207308081613766656\n",
+ "startTime: '2023-11-03T17:48:23Z'\n",
+ "state: JOB_STATE_SUCCEEDED\n",
+ "updateTime: '2023-11-03T18:03:44.992454Z'\n",
+ "---\n",
+ "createTime: '2023-11-02T04:29:33.732327Z'\n",
+ "displayName: flan-t5-training-tf3\n",
+ "endTime: '2023-11-02T04:34:24Z'\n",
+ "error:\n",
+ " code: 3\n",
+ " message: 'The replica workerpool0-0 exited with a non-zero status of 1. To find\n",
+ " out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=144763482491&resource=ml_job%2Fjob_id%2F2998009561996066816&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%222998009561996066816%22'\n",
+ "jobSpec:\n",
+ " workerPoolSpecs:\n",
+ " - containerSpec:\n",
+ " args:\n",
+ " - --model_name_or_path=google/flan-t5-small\n",
+ " - --train_file=gs://flan-t5-model-resources/train.csv\n",
+ " - --test_file=gs://flan-t5-model-resources/test.csv\n",
+ " - --validation_file=gs://flan-t5-model-resources/validation.csv\n",
+ " - --text_column=article\n",
+ " - --summary_column=abstract\n",
+ " - --output_dir=gs://flan-t5-model-resources\n",
+ " - '--source_prefix=summarize:'\n",
+ " imageUri: gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3:20231102.04.26.29.256583\n",
+ " diskSpec:\n",
+ " bootDiskSizeGb: 100\n",
+ " bootDiskType: pd-ssd\n",
+ " machineSpec:\n",
+ " acceleratorCount: 1\n",
+ " acceleratorType: NVIDIA_TESLA_V100\n",
+ " machineType: n1-standard-4\n",
+ " replicaCount: '1'\n",
+ "name: projects/144763482491/locations/us-central1/customJobs/2998009561996066816\n",
+ "startTime: '2023-11-02T04:33:54Z'\n",
+ "state: JOB_STATE_FAILED\n",
+ "updateTime: '2023-11-02T04:34:28.106045Z'\n",
+ "---\n",
+ "createTime: '2023-10-30T11:24:17.577560Z'\n",
+ "displayName: flan-t5-training-tf\n",
+ "endTime: '2023-10-30T11:44:47Z'\n",
+ "jobSpec:\n",
+ " workerPoolSpecs:\n",
+ " - containerSpec:\n",
+ " args:\n",
+ " - --job_dir=gs://flan-t5-model-resources\n",
+ " imageUri: gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf:20231030.11.21.47.379363\n",
+ " diskSpec:\n",
+ " bootDiskSizeGb: 100\n",
+ " bootDiskType: pd-ssd\n",
+ " machineSpec:\n",
+ " acceleratorCount: 1\n",
+ " acceleratorType: NVIDIA_TESLA_V100\n",
+ " machineType: n1-standard-4\n",
+ " replicaCount: '1'\n",
+ "name: projects/144763482491/locations/us-central1/customJobs/612998417047617536\n",
+ "startTime: '2023-10-30T11:29:12Z'\n",
+ "state: JOB_STATE_SUCCEEDED\n",
+ "updateTime: '2023-10-30T11:45:16.382233Z'\n",
+ "---\n",
+ "createTime: '2023-10-30T10:53:26.358002Z'\n",
+ "displayName: flan-t5-training-tf\n",
+ "endTime: '2023-10-30T11:12:59Z'\n",
+ "error:\n",
+ " code: 3\n",
+ " message: 'The replica workerpool0-0 exited with a non-zero status of 1. To find\n",
+ " out more about why your job exited please check the logs: https://console.cloud.google.com/logs/viewer?project=144763482491&resource=ml_job%2Fjob_id%2F6864276174814576640&advancedFilter=resource.type%3D%22ml_job%22%0Aresource.labels.job_id%3D%226864276174814576640%22'\n",
+ "jobSpec:\n",
+ " workerPoolSpecs:\n",
+ " - containerSpec:\n",
+ " args:\n",
+ " - --job_dir=gs://flan-t5-model-resources\n",
+ " imageUri: gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf:20231030.10.50.08.545796\n",
+ " diskSpec:\n",
+ " bootDiskSizeGb: 100\n",
+ " bootDiskType: pd-ssd\n",
+ " machineSpec:\n",
+ " acceleratorCount: 1\n",
+ " acceleratorType: NVIDIA_TESLA_V100\n",
+ " machineType: n1-standard-4\n",
+ " replicaCount: '1'\n",
+ "name: projects/144763482491/locations/us-central1/customJobs/6864276174814576640\n",
+ "startTime: '2023-10-30T10:57:55Z'\n",
+ "state: JOB_STATE_FAILED\n",
+ "updateTime: '2023-10-30T11:13:29.896168Z'\n",
+ "---\n",
+ "createTime: '2023-10-26T21:28:18.991136Z'\n",
+ "displayName: flan-t5-training\n",
+ "endTime: '2023-10-26T21:53:59Z'\n",
+ "jobSpec:\n",
+ " workerPoolSpecs:\n",
+ " - containerSpec:\n",
+ " args:\n",
+ " - --per_device_train_batch_size=2\n",
+ " - --per_device_eval_batch_size=4\n",
+ " - --model_name_or_path=google/flan-t5-small\n",
+ " - --train_file=gs://flan-t5-model-resources/datasets/train.csv\n",
+ " - --test_file=gs://flan-t5-model-resources/datasets/test.csv\n",
+ " - --text_column=article\n",
+ " - --summary_column=abstract\n",
+ " - --do_train=True\n",
+ " - --do_eval=False\n",
+ " - --do_predict=True\n",
+ " - --predict_with_generate=True\n",
+ " - --output_dir=gs://flan-t5-model-resources/model_output\n",
+ " - --num_train_epochs=3\n",
+ " - --learning_rate=5e-5\n",
+ " - --seed=7\n",
+ " - --fp16=True\n",
+ " imageUri: gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training:20231026.21.27.25.218708\n",
+ " diskSpec:\n",
+ " bootDiskSizeGb: 100\n",
+ " bootDiskType: pd-ssd\n",
+ " machineSpec:\n",
+ " acceleratorCount: 1\n",
+ " acceleratorType: NVIDIA_TESLA_V100\n",
+ " machineType: n1-standard-4\n",
+ " replicaCount: '1'\n",
+ "name: projects/144763482491/locations/us-central1/customJobs/8666538460460351488\n",
+ "startTime: '2023-10-26T21:33:52Z'\n",
+ "state: JOB_STATE_SUCCEEDED\n",
+ "updateTime: '2023-10-26T21:54:18.730721Z'\n"
+ ]
+ }
+ ],
+ "source": [
+ "!gcloud ai custom-jobs list --project=$project --region=$location"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "f6e10933-3d84-41fd-8785-fa801b97bfb0",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Long running operation: projects/144763482491/locations/us-central1/operations/3654348322228928512\n",
+ "delete_custom_job_response: \n"
+ ]
+ }
+ ],
+ "source": [
+ "from google.cloud import aiplatform\n",
+ "custom_job_id=''\n",
+ "\n",
+ "def delete_custom_job_sample(custom_job_id: str,\n",
+ " project: str = project,\n",
+ " location: str = location,\n",
+ " api_endpoint: str = f'{location}-aiplatform.googleapis.com',\n",
+ " timeout: int = 300,\n",
+ "):\n",
+ " # The AI Platform services require regional API endpoints.\n",
+ " client_options = {\"api_endpoint\": api_endpoint}\n",
+ " # Initialize client that will be used to create and send requests.\n",
+ " # This client only needs to be created once, and can be reused for multiple requests.\n",
+ " client = aiplatform.gapic.JobServiceClient(client_options=client_options)\n",
+ " name = client.custom_job_path(\n",
+ " project=project, location=location, custom_job=custom_job_id\n",
+ " )\n",
+ " response = client.delete_custom_job(name=name)\n",
+ " print(\"Long running operation:\", response.operation.name)\n",
+ " delete_custom_job_response = response.result(timeout=timeout)\n",
+ " print(\"delete_custom_job_response:\", delete_custom_job_response)\n",
+ " \n",
+ "delete_custom_job_sample(custom_job_id)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7a02ba4b-1c8f-4fcc-93b9-3e2a6121b59b",
+ "metadata": {},
+ "source": [
+ "Now we will undeploy our model, delete the endpoint, and finally delete our model!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dfc2276e-8ab2-4c80-9721-26153ea80d63",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "model_endpoint.undeploy_all()\n",
+ "model_endpoint.delete()\n",
+ "model.delete()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8fc98dd1-b4e5-4ab2-a6ce-e46b5a23ab5d",
+ "metadata": {},
+ "source": [
+ "Delete the custom container image stored in Container Registry or Artifact Registry. List the images first to find the tag ID."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 41,
+ "id": "588af684-4c7e-43d5-a1f5-5510157aa40f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Listed 0 items.\n",
+ "DIGEST TAGS TIMESTAMP\n",
+ "1240e61185c9 20231103.17.39.12.779660 2023-11-03T17:42:05\n",
+ "ca99b71c4661 20231103.16.13.42.102563 2023-11-03T16:21:43\n"
+ ]
+ }
+ ],
+ "source": [
+ "#list the containers\n",
+ "!gcloud container images list-tags gcr.io/$project/cloudai-autogenerated/$display_name"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "id": "635cc519-230d-48a0-b9b7-c350c2d62ac4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Save the tag ID\n",
+ "tag_id=''"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "id": "946fa12b-ad77-4d19-a556-c926309a14c4",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\u001b[1;33mWARNING:\u001b[0m Successfully resolved tag to sha256, but it is recommended to use sha256 directly.\n",
+ "Digests:\n",
+ "- gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3@sha256:ca99b71c466168f467152e04791710a9e269e767985b22a6cd1702e4fac2f691\n",
+ " Associated tags:\n",
+ " - 20231103.16.13.42.102563\n",
+ "Tags:\n",
+ "- gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3:20231103.16.13.42.102563\n",
+ "Deleted [gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3:20231103.16.13.42.102563].\n",
+ "Deleted [gcr.io/cit-oconnellka-9999/cloudai-autogenerated/flan-t5-training-tf3@sha256:ca99b71c466168f467152e04791710a9e269e767985b22a6cd1702e4fac2f691].\n"
+ ]
+ }
+ ],
+ "source": [
+ "#delete \n",
+ "!gcloud container images delete gcr.io/$project/cloudai-autogenerated/$display_name:$tag_id --force-delete-tags --quiet"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fd4a6cfd-9567-4da3-8d1b-6a4207442680",
+ "metadata": {},
+ "source": [
+ "And finally delete our bucket"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "85824daa-66a5-4303-8b17-2565863a2844",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gcloud storage rm --recursive gs://$BUCKET/"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-12.m112",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-12:m112"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/GenAI/GCP_Pubmed_chatbot.ipynb b/tutorials/notebooks/GenAI/GCP_Pubmed_chatbot.ipynb
new file mode 100644
index 0000000..9798d2c
--- /dev/null
+++ b/tutorials/notebooks/GenAI/GCP_Pubmed_chatbot.ipynb
@@ -0,0 +1,1151 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "2edc6187-82ae-44e2-852f-2ad2712c93aa",
+ "metadata": {},
+ "source": [
+ "# Creating a PubMed Chatbot on GCP"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ecea2ad-7c65-4367-87e1-b021167c3a1d",
+ "metadata": {},
+ "source": [
+ "For this tutorial we are creating a PubMed chatbot that will answer questions by gathering information from documents we have provided via an index. The model we will be using today is a pretrained 'text-bison@001' model from GCP.\n",
+ "\n",
+ "This tutorial will go over the following topics:\n",
+ "- Introduce langchain\n",
+ "- Explain the differences between zero-shot, one-shot, and few-shot prompting\n",
+ "- Practice using different document retrievers"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4d01e74b-b5b4-4be9-b16e-ec55419318ef",
+ "metadata": {},
+ "source": [
+ "### Optional: Deploy the Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9dbd13e7-afc9-416b-94dc-418a93e14587",
+ "metadata": {},
+ "source": [
+ "In this tutorial we will use the Google PaLM 2 LLM **text-bison@001**, which doesn't need to be deployed. If you would like to use another model, you can choose one from the **Model Garden** in the console, which lets you add a model to your Model Registry, create an endpoint (or use an existing one), and deploy the model all in one step."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4f3e3ab1-5f7e-4028-a66f-9619926a2afd",
+ "metadata": {},
+ "source": [
+ "## PubMed API vs RAG with Vertex AI Vector Search"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a820eea-1538-4f40-86c4-eb14fe09e127",
+ "metadata": {},
+ "source": [
+ "Our chatbot will rely on documents to answer our questions; to make that possible we supply it with a **vector index**. A vector index (or simply index) is a data structure that enables fast and accurate search and retrieval of vector embeddings from a large dataset of objects. We will work with two options for supplying documents: the PubMed API and RAG with Vertex AI Vector Search."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7314b115-9433-460d-b275-78aa50f0a858",
+ "metadata": {},
+ "source": [
+ "**What is the difference?**\n",
+ "\n",
+ "The **PubMed API** can be used for free through langchain to connect your model to more than **35 million citations** for biomedical literature from MEDLINE, life science journals, and online books. The langchain package for PubMed is already a retriever, meaning that simply by using this tool our chatbot will be able to retrieve documents to refer to. \n",
+ "\n",
+ "**Vertex AI Vector Search** (formerly known as Matching Engine) is a vector store from GCP that gives you more **security and control** over which documents you supply to your model. It is a vector store, or database, that stores the **embeddings** of your documents along with their metadata. Because it is not a retriever, we have to turn it into one so that our model returns an output that also tells us which documents it is referencing; this is where RAG comes in. **RAG** stands for **retrieval-augmented generation**. It is a technique that **indexes documents** by first loading them, splitting them into chunks (making it easier for our model to search for relevant splits), embedding the splits, and then storing them in a vector store. The next steps in RAG depend on the question you ask your chatbot. If we ask it \"What is a cell?\", a retriever searches the vector store for splits relevant to our question, thus **retrieving relevant documents**. Finally, our chatbot will **generate an answer** that explains what a cell is, and as part of the answer it will also point out which source documents it used.\n",
+ "\n",
+ "We will be exploring both methods!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bcf1690d-e93d-4cd3-89c6-8d06b5a071a8",
+ "metadata": {},
+ "source": [
+ "## Setting up Vertex AI Vector Search"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c6330ddf-7972-4451-9fcb-98cf83f5d118",
+ "metadata": {},
+ "source": [
+ "If you choose to use the RAG method with Vertex AI Vector Search to supply documents to your model, follow the instructions below:\n",
+ "\n",
+ "Set your project id, location, and bucket variables."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fb694dc4-9e76-4091-9ddf-cd4eca816851",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "project_id=''\n",
+ "location=' (e.g.us-east4)'\n",
+ "bucket = ''"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a7a349bb-2853-4028-972d-af7f3e857867",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "project_id='cit-oconnellka-9999'\n",
+ "location='us-central1'\n",
+ "bucket = 'pubmed-chatbot-resources'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "02053f4d-fad7-44ab-a7c3-cfa1c218240f",
+ "metadata": {},
+ "source": [
+ "### Gathering our Docs For our Vector Store"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1d1c9de7-4a06-4f85-b9ff-c8c9e51f8c70",
+ "metadata": {},
+ "source": [
+ "The AWS Marketplace hosts a PubMed database named **PubMed Central® (PMC)**, a free full-text archive of biomedical and life sciences journal articles at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). We will subset this database to add documents to our Vertex AI Vector Search index. Ensure that you have the correct permissions to allow your environment to connect to buckets and Vertex AI.\n",
+ "\n",
+ "The first step will be to create a bucket that we will later use as our data source for our index."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "99d49432-cf03-4f19-aa82-ef7f8bad5bde",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#make bucket\n",
+ "!gsutil mb -l {location} gs://{bucket}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b6ad30ba-cee8-47f9-bc1e-ece8961ac66a",
+ "metadata": {},
+ "source": [
+ "We will then download the metadata file from the PMC index directory. This file lists all of the articles within the PMC bucket and their paths, and we will use it to subset the database into our own bucket. Here we use curl to connect to the public AWS S3 bucket where the metadata and documents are originally stored."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7b395e34-062d-4f77-afee-3601d471954a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#download the metadata file\n",
+ "!curl -O http://pmc-oa-opendata.s3.amazonaws.com/oa_comm/txt/metadata/csv/oa_comm.filelist.csv"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "93a8595a-767f-4cad-9273-62d8e2cf60d1",
+ "metadata": {},
+ "source": [
+ "We only want the metadata of the first 100 files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c26b0f29-2b07-43a6-800d-4aa5e957fe52",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#import the file as a dataframe\n",
+ "import pandas as pd\n",
+ "\n",
+ "df = pd.read_csv('oa_comm.filelist.csv')\n",
+ "#first 100 files\n",
+ "first_100=df[0:100]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "abd1ae93-450e-4c79-83cc-ea46a1b507c1",
+ "metadata": {},
+ "source": [
+ "Let's look at our metadata! The bucket path to each file is in the **Key** column; this is what we will use to loop through the PMC bucket and copy the first 100 files to our bucket."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ff77b2aa-ed1b-4d27-8163-fdaa7a304582",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "first_100"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "84e5f36a-239c-4c15-80ab-f896d45849d3",
+ "metadata": {},
+ "source": [
+ "The following commands gather the location of each document within the AWS S3 bucket, stream the text of each doc as bytes, and save those bytes to our bucket as a text file in a directory named \"docs\". This is all done using curl."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7d63a7e2-dbf1-49ec-bc84-b8c2c8bde62d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from io import BytesIO\n",
+ "#gather path to files in bucket\n",
+ "for i in first_100['Key']:\n",
+ "    doc_name=i.split(r'/')[-1]\n",
+ "    os.system(f'curl http://pmc-oa-opendata.s3.amazonaws.com/{i} | curl -T - -v -H \"Authorization: Bearer `gcloud auth print-access-token`\" \"https://storage.googleapis.com/{bucket}/docs/{doc_name}\"')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c1b396c8-baa9-44d6-948c-2326dc514839",
+ "metadata": {},
+ "source": [
+ "### Creating an Index"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bb6fa941-bf59-4cae-9aa8-2f2741f3a1b1",
+ "metadata": {},
+ "source": [
+ "To create our vector store index, we start by creating a dummy embeddings file. An index holds a set of records, so our dummy data will be the first record, and later we will add our PubMed docs to the same index. In order for Vector Search to find our dummy embeddings file, it must also be in our bucket, so we will add it to the subdirectory 'init_index'."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6cf5092c-23f3-4f28-9308-f34b8d90c62b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import uuid\n",
+ "import numpy as np\n",
+ "import json\n",
+ "init_embedding = {\"id\": str(uuid.uuid4()), \"embedding\": list(np.zeros(768))}\n",
+ "\n",
+ "# dump embedding to a local file\n",
+ "with open(\"embeddings_0.json\", \"w\") as f:\n",
+ " json.dump(init_embedding, f)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8e8a4c42-dc17-48a3-a0bb-0cbea527ee7f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#copy initial embeddings file to bucket\n",
+ "!gsutil cp embeddings_0.json gs://{bucket}/init_index/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4d1a3cd-4f89-4271-b025-71af2bf25095",
+ "metadata": {},
+ "source": [
+ "Now we can make our index; this can take 30 minutes to 1 hour. \n",
+ "\n",
+ "Please note that the dimensions depend on which text embedding model you are using. For this tutorial we use **Vertex AI's embedding model**, which uses 768 dimensions. If you change your model, choose an embedding model that is compatible with it, or use TensorFlow's Universal Sentence Encoder. For more information see [here](https://python.langchain.com/docs/integrations/vectorstores/matchingengine#using-tensorflow-universal-sentence-encoder-as-an-embedder)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "39aa7bba-3d15-4a3f-86c2-59d2c92a95ef",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from google.cloud import aiplatform\n",
+ "# create Index\n",
+ "index = aiplatform.MatchingEngineIndex.create_tree_ah_index(\n",
+ "    display_name = \"pubmed_vector_index\",\n",
+ " contents_delta_uri = f\"gs://{bucket}/init_index\",\n",
+ " dimensions = 768,\n",
+ " approximate_neighbors_count = 150,\n",
+ " distance_measure_type=\"DOT_PRODUCT_DISTANCE\",\n",
+ " location=location\n",
+ " \n",
+ ")\n",
+ "\n",
+ "#save index id\n",
+ "index_id=index.name"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ce65b0cc-cff3-47d6-af8c-7c39b2418ecb",
+ "metadata": {},
+ "source": [
+ "### Creating an Endpoint and Deploying our Index"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9b3aa4a2-1145-475a-bd04-33bf69551751",
+ "metadata": {},
+ "source": [
+ "We will create a public endpoint for our vector store. You can also create a private one by setting up a VPC and specifying the VPC ID for the 'network' parameter. Documentation for creating a VPC can be found [here](https://python.langchain.com/docs/integrations/vectorstores/matchingengine#imports-constants-and-configs)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "55596202-13b9-4e35-8099-0602a2b13e72",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Create the endpoint\n",
+ "index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(\n",
+ " display_name = \"pubmed_vector_endpoint\",\n",
+ " public_endpoint_enabled = True,\n",
+ " location = location\n",
+ ")\n",
+ "\n",
+ "#save endpoint id\n",
+ "endpoint_id = index_endpoint.name"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "51412f2f-f32b-44a9-93bc-3e2f6185cada",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#deploy our index to our endpoint\n",
+ "deployed_index_id=\"deployed_pubmed_vector_index\"\n",
+ "index_endpoint = index_endpoint.deploy_index(\n",
+ " index=index, deployed_index_id=deployed_index_id\n",
+ ")\n",
+ "index_endpoint.deployed_indexes"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "613cef7d-d0aa-42a8-a46e-7fd1f5c48c3b",
+ "metadata": {},
+ "source": [
+ "### Adding Metadata to Our Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2fa34e7b-99c7-4a2e-b73b-146636a98285",
+ "metadata": {},
+ "source": [
+ "Now that our documents are stored in our bucket, we can load them back. This step may seem redundant, but it is necessary: we need to embed our docs for the vector store, and loading them lets us attach metadata to each document. The first step is to remove the 'Key' column from our metadata, because it no longer holds the location of our documents. Next, we'll convert the remaining columns into a list of dictionaries, one per document."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b9016f15-db02-4073-b4c7-288d919bbb55",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "#Remove the Key column to be replaced later\n",
+ "first_100.pop('Key')\n",
+ "#convert the metadata to dict\n",
+ "first_100_dict = first_100.to_dict('records')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "eb80dce6-dc5b-4a73-8591-572be35c092a",
+ "metadata": {},
+ "source": [
+ "Let's look at our metadata now!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "69ce004e-ab8d-4b9c-91d8-9320e1679fcd",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "first_100_dict"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2a607a48-31b8-4081-a347-bb1528f8e725",
+ "metadata": {},
+ "source": [
+ "Now we can load in our documents, add in the location of our docs in our bucket and the document name to our metadata, and finally attach that metadata to our documents. At the end we should have 100 documents before splitting the data."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "47170e83-3e9e-48e6-ab0f-cabdd39507e1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add metadata\n",
+ "from langchain.document_loaders import GCSDirectoryLoader\n",
+ "print(f\"Processing documents from {bucket}\")\n",
+ "loader = GCSDirectoryLoader(\n",
+ " project_name=project_id, bucket=bucket, prefix='docs'\n",
+ ")\n",
+ "documents = loader.load()\n",
+ "\n",
+ "# loop through docs to add metadata to each one\n",
+ "for i in range(len(documents)):\n",
+ " doc_md = documents[i].metadata\n",
+ " document_name = doc_md[\"source\"].split(\"/\")[-1]\n",
+ " source = f\"{bucket}/docs/{document_name}\"\n",
+ " # Add document name and source to the metadata\n",
+ " documents[i].metadata = {\"source\": source, \"document_name\": document_name}\n",
+ " documents[i].metadata.update(first_100_dict[i])# attached other metadata to doc\n",
+ "print(f\"# of documents loaded (pre-chunking) = {len(documents)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d2cb96ea-dd0c-47b7-9556-05e25c3efb1d",
+ "metadata": {},
+ "source": [
+ "Let's take a look at our metadata!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8673a445-7c2e-4650-91fa-4b0b38196e2c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "documents[0].metadata"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "613b65c0-fa38-456f-acb3-d406803ef204",
+ "metadata": {},
+ "source": [
+ "### Splitting our Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6812ecaf-979f-4537-b420-071022a7b917",
+ "metadata": {},
+ "source": [
+ "Splitting our data into chunks helps our vector store search through our data faster and more efficiently. We'll then add the chunk number to our metadata."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e6503cf-02e5-4352-a6b1-13ef4e01c019",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
+ "# split the documents into chunks\n",
+ "text_splitter = RecursiveCharacterTextSplitter(\n",
+ " chunk_size=1000,\n",
+ " chunk_overlap=50,\n",
+ " separators=[\"\\n\\n\", \"\\n\", \".\", \"!\", \"?\", \",\", \" \", \"\"],\n",
+ ")\n",
+ "doc_splits = text_splitter.split_documents(documents)\n",
+ "\n",
+ "# Add chunk number to metadata\n",
+ "for idx, split in enumerate(doc_splits):\n",
+ " split.metadata[\"chunk\"] = idx\n",
+ "\n",
+ "print(f\"# of documents = {len(doc_splits)}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e1fb202a-6122-4083-81e4-ddcb33499e64",
+ "metadata": {},
+ "source": [
+ "After splitting our data we now have 7620 documents. Looking at our metadata, we can see that the chunk number is the last entry."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f1036b8e-6c7f-43be-83b7-5b9e61628003",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doc_splits[0].metadata"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "055d5b85-950e-4a44-b3fa-a2dcec7df036",
+ "metadata": {},
+ "source": [
+ "### Embedding our Data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a68b14e-0cf5-4973-90c1-0eee0c8bc8c9",
+ "metadata": {},
+ "source": [
+ "Now we can embed our text into **numerical vectors**. Embeddings let us find similar objects, such as documents with similar text or photos with similar content, by comparing the vectors assigned to each object. The embedder you choose has to be compatible with your model. Since we are using a PaLM2 model (text-bison), we can use the embedding model from Vertex AI, which defaults to **'textembedding-gecko'**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "50a4a98c-a332-469f-9a24-ce5abff23b15",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from langchain.vectorstores import MatchingEngine\n",
+ "from langchain.embeddings import VertexAIEmbeddings\n",
+ "embeddings = VertexAIEmbeddings()\n",
+ "\n",
+ "# initialize vector store\n",
+ "vector_store = MatchingEngine.from_components(\n",
+ " project_id=project_id,\n",
+ " region=location,\n",
+ " gcs_bucket_name=bucket,\n",
+ " embedding=embeddings,\n",
+ " index_id=index_id,\n",
+ " endpoint_id=endpoint_id,\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e3bfb5b-a3a6-4156-bca3-394774a94565",
+ "metadata": {},
+ "source": [
+ "Each of our split documents is a LangChain **Document** object that contains **page content** and **metadata**. The code below loops through the split docs, collects each one's page content into the list `texts`, and rewrites each metadata field as a `namespace`/`allow_list` entry in the list `metadatas`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9cda4699-5c46-49bb-97e3-059199254bba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Store docs as embeddings in Matching Engine index\n",
+ "# It may take a while since API is rate limited\n",
+ "texts = [doc.page_content for doc in doc_splits]\n",
+ "metadatas = [\n",
+ " [\n",
+ " {\"namespace\": \"source\", \"allow_list\": [doc.metadata[\"source\"]]},\n",
+ " {\"namespace\": \"document_name\", \"allow_list\": [doc.metadata[\"document_name\"]]},\n",
+ " {\"namespace\": \"ETag\", \"allow_list\": [doc.metadata[\"ETag\"]]},\n",
+ " {\"namespace\": \"Article Citation\", \"allow_list\": [doc.metadata[\"Article Citation\"]]},\n",
+ " {\"namespace\": \"AccessionID\", \"allow_list\": [doc.metadata[\"AccessionID\"]]},\n",
+ " {\"namespace\": \"Last Updated UTC (YYYY-MM-DD HH:MM:SS)\", \"allow_list\": [doc.metadata[\"Last Updated UTC (YYYY-MM-DD HH:MM:SS)\"]]},\n",
+ " {\"namespace\": \"PMID\", \"allow_list\": [str(doc.metadata[\"PMID\"])]},\n",
+ " {\"namespace\": \"License\", \"allow_list\": [doc.metadata[\"License\"]]},\n",
+ " {\"namespace\": \"Retracted\", \"allow_list\": [doc.metadata[\"Retracted\"]]},\n",
+ " {\"namespace\": \"chunk\", \"allow_list\": [str(doc.metadata[\"chunk\"])]}\n",
+ " \n",
+ " ]\n",
+ " for doc in doc_splits\n",
+ "]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "216f6ef6-b488-45d3-ac4c-2aca0d6eab56",
+ "metadata": {},
+ "source": [
+ "Let's look at our Document object!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6c1af269-fdbb-4db5-9c1b-41e21d304b9d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doc_splits[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2aeb0bba-8ccd-4828-a7a2-6f34b03a03b9",
+ "metadata": {},
+ "source": [
+ "Now we can add our split documents and their metadata to our vector store. This is the longest step of the tutorial and can take up to 1 hour to complete. While you wait, you can read the Creating an Inference Script section of this tutorial."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b3c2f21b-06ab-470e-8807-638548d50f77",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doc_ids = vector_store.add_texts(texts=texts, metadatas=metadatas)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03b90f92-f223-42e0-9e5b-accd3fdfbeea",
+ "metadata": {},
+ "source": [
+ "Test whether search from the vector store is working."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b6cd9aab-7f08-4a69-b7e4-9cd1d8f9110f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "results=vector_store.similarity_search_with_score(\"brain\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "07b3bc6b-8c43-476f-a662-abda830dc2da",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Creating an Inference Script"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ba2291e-109e-4120-ad10-5dbfd341a07b",
+ "metadata": {},
+ "source": [
+ "In order to send inputs to and receive outputs from our chatbot smoothly, we need to create an **inference script** that formats inputs in a way the chatbot can understand and formats outputs in a way we can understand. We will also supply instructions to the chatbot through the script.\n",
+ "\n",
+ "Our script will utilize **langchain** tools and packages to enable our model to:\n",
+ "- **Connect to sources of context** (e.g. providing our model with tasks and examples)\n",
+ "- **Rely on reason** (e.g. instruct our model how to answer based on provided context)\n",
+ "\n",
+ "**Warning**: The following tools must be installed via your terminal with `pip install \"langchain\" \"xmltodict\"`, and the overall inference script must be run in the terminal with the command `python YOUR_SCRIPT.py`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ad374085-c4b1-4083-85a5-90cba35846d6",
+ "metadata": {},
+ "source": [
+ "The first part of our script will be to list all the tools that are required. \n",
+ "- **PubMedRetriever:** Utilizes the langchain retriever tool to specifically retrieve PubMed documents from the PubMed API.\n",
+ "- **MatchingEngine:** Connects to Vertex AI Vector Search to be used as a langchain retriever tool to specifically retrieve embedded documents stored in your bucket. \n",
+ "- **ConversationalRetrievalChain:** Allows the user to construct a conversation with the model and retrieves the outputs while sending inputs to the model.\n",
+ "- **PromptTemplate:** Lets the user supply instructions to the model through a prompt; this is the best method for zero-shot and few-shot prompting.\n",
+ "- **VertexAIEmbeddings:** Text embedding model used before to convert text to numerical vectors.\n",
+ "- **VertexAI**: Package used to import Google PaLM2 LLMs models (e.g. text-bison@001, code-bison). \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f0ad48d-c6c8-421a-a48b-88e979d15b57",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "```python\n",
+ "from langchain.retrievers import PubMedRetriever\n",
+ "from langchain.vectorstores import MatchingEngine\n",
+ "#from langchain.llms import VertexAIModelGarden #uncomment if utilizing models from Model Garden\n",
+ "from langchain.chains import ConversationalRetrievalChain\n",
+ "from langchain.prompts import PromptTemplate\n",
+ "from langchain.embeddings import VertexAIEmbeddings\n",
+ "from langchain.llms import VertexAI\n",
+ "import sys\n",
+ "import json\n",
+ "import os\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "900f4c31-71cd-4f39-8bfc-de098bdbaafc",
+ "metadata": {},
+ "source": [
+ "Next, we will define a small class that holds the terminal color codes for our text conversation with the chatbot, which we will use later."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "decbb901-f811-4b8e-a956-4c8c7f914ae2",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "```python\n",
+ "class bcolors:\n",
+ " HEADER = '\\033[95m'\n",
+ " OKBLUE = '\\033[94m'\n",
+ " OKCYAN = '\\033[96m'\n",
+ " OKGREEN = '\\033[92m'\n",
+ " WARNING = '\\033[93m'\n",
+ " FAIL = '\\033[91m'\n",
+ " ENDC = '\\033[0m'\n",
+ " BOLD = '\\033[1m'\n",
+ " UNDERLINE = '\\033[4m'\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ba36d057-5189-4075-a243-18996c6fc932",
+ "metadata": {},
+ "source": [
+ "If you are using Vector Search instead of the PubMed API, we need to create a function that gathers the information necessary to connect to our vector store, which will be the:\n",
+ "- Project ID\n",
+ "- Location of bucket and vector store (they should be in the same location)\n",
+ "- Bucket name\n",
+ "- Vector Store Index ID\n",
+ "- Vector Store Endpoint ID"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3f7a244a-7e71-40d3-ae78-8e166dd3c7ee",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "def build_chain():\n",
+ " PROJECT_ID = os.environ[\"PROJECT_ID\"]\n",
+ " LOCATION_ID = os.environ[\"LOCATION_ID\"]\n",
+ " #ENDPOINT_ID = os.environ[\"ENDPOINT_ID\"] #uncomment if utilizing model from Model Garden\n",
+ " BUCKET = os.environ[\"BUCKET\"]\n",
+ " VC_INDEX_ID = os.environ[\"VC_INDEX_ID\"]\n",
+ " VC_ENDPOINT_ID = os.environ[\"VC_ENDPOINT_ID\"]\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dab1012f-ed20-47b9-9162-924e03e836d5",
+ "metadata": {},
+ "source": [
+ "Now we can define our Google PaLM2 model, `text-bison@001`, along with its parameters:\n",
+ "\n",
+ "- Max Output Tokens: The maximum number of tokens the model can output.\n",
+ "- Temperature: Controls randomness. Higher values increase diversity, giving more varied responses. Must be a number from 0 to 1, with 0 being the least random.\n",
+ "- Top_p (nucleus): The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Must be a number from 0 to 1.\n",
+ "- Top_k: Sample from the k most likely next tokens at each step. Lower values restrict sampling to the most probable tokens, making the output more predictable.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8cadb1af-2c46-4ab1-92f9-6e0861f83324",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "```python\n",
+ "llm = VertexAI(\n",
+ " model_name=\"text-bison@001\",\n",
+ " max_output_tokens=1024,\n",
+ " temperature=0.2,\n",
+ " top_p=0.8,\n",
+ " top_k=40,\n",
+ " verbose=True,\n",
+ ")\n",
+ "\n",
+ "#if using a model from the Model Garden uncomment\n",
+ "#llm = VertexAIModelGarden(project=PROJECT_ID, endpoint_id=ENDPOINT_ID, location=LOCATION_ID)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c44b4f91-0c64-459b-a6e9-8a955c0797c7",
+ "metadata": {},
+ "source": [
+ "Next we specify our retriever. Both the PubMed and Vector Search retrievers are listed below; please use only one per script.\n",
+ "\n",
+ "If using Vector Search, we need to initialize our vector store as we did before when we added our split documents and metadata to it. Then we set the vector store as a **retriever** with the search type **'similarity'**, meaning it will find the texts most similar to the question you ask the model. We also set **'k'** to 3, meaning that our retriever will retrieve the 3 most similar documents."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "21c61724-23d3-4b49-8c72-cbd208bdb5df",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "```python\n",
+ "retriever = PubMedRetriever()\n",
+ "\n",
+ "#only if using Vector Search as a retriever\n",
+ "\n",
+ "embeddings = VertexAIEmbeddings() #Make sure embedding model is compatible with model\n",
+ "\n",
+ "vector_store = MatchingEngine.from_components(\n",
+ "    project_id=PROJECT_ID,\n",
+ "    region=LOCATION_ID,\n",
+ "    gcs_bucket_name=BUCKET,\n",
+ "    embedding=embeddings,\n",
+ "    index_id=VC_INDEX_ID,\n",
+ "    endpoint_id=VC_ENDPOINT_ID\n",
+ ")\n",
+ "retriever = vector_store.as_retriever(\n",
+ "    search_type=\"similarity\",\n",
+ "    search_kwargs={\"k\": 3}\n",
+ ")\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec8e464a-0931-444a-aa58-09ee0c4c9884",
+ "metadata": {},
+ "source": [
+ "Here we construct our **prompt_template**. This is where we can try zero-shot or few-shot prompting. Only add one method per script."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4431051e-0e84-408e-9821-f50a9b88c9c1",
+ "metadata": {},
+ "source": [
+ "#### Zero-shot prompting\n",
+ "\n",
+ "Zero-shot prompting does not require any additional training. Instead, it gives a pre-trained language model a task or query and lets it generate text (our output). The model relies on its general language understanding and the patterns it learned during training to produce relevant output. In our script we have connected our model to a **retriever** to make sure it gathers information from that retriever (this can be the PubMed API or Vector Search).\n",
+ "\n",
+ "See below that the task reads like a set of instructions: it tells our model that it will be asked questions, which it should answer based on the scientific documents returned from the index (this can be the PubMed API or Vector Search index). All of this information is established as a **prompt template** for our model to receive."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c0316dc5-6274-4a5e-92e4-3d266ed6a4df",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "```python\n",
+ "prompt_template = \"\"\"\n",
+ " Ignore everything before.\n",
+ " \n",
+ " Instructions:\n",
+ " I will provide you with research papers on a specific topic in English, and you will create a cumulative summary. \n",
+ " The summary should be concise and should accurately and objectively communicate the takeaway of the papers related to the topic. \n",
+ " You should not include any personal opinions or interpretations in your summary, but rather focus on objectively presenting the information from the papers. \n",
+ " Your summary should be written in your own words and ensure that your summary is clear, concise, and accurately reflects the content of the original papers. First, provide a concise summary then citations at the end.\n",
+ " \n",
+ " {question} Answer \"don't know\" if not present in the document. \n",
+ " {context}\n",
+ " Solution:\"\"\"\n",
+ " PROMPT = PromptTemplate(\n",
+ " template=prompt_template, input_variables=[\"context\", \"question\"],\n",
+ " )\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "edbe7032-8507-4d07-baab-1b3bf0e92074",
+ "metadata": {},
+ "source": [
+ "#### One-shot and Few-shot Prompting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5614ea04-e1f8-4941-ae16-4359f718f98f",
+ "metadata": {},
+ "source": [
+ "One-shot and few-shot prompting are similar to zero-shot prompting. In addition to giving our model a task just like before, we also supply one or more examples of how our model should structure its output.\n",
+ "\n",
+ "See below that we have implemented one-shot prompting to our script. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5ffb9669-5b77-4d9b-9f4e-a0d3a18b0fae",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "prompt_template = \"\"\"\n",
+ " Instructions:\n",
+ " I will provide you with research papers on a specific topic in English, and you will create a cumulative summary. \n",
+ " The summary should be concise and should accurately and objectively communicate the takeaway of the papers related to the topic. \n",
+ " You should not include any personal opinions or interpretations in your summary, but rather focus on objectively presenting the information from the papers. \n",
+ " Your summary should be written in your own words and ensure that your summary is clear, concise, and accurately reflects the content of the original papers. First, provide a concise summary then citations at the end. \n",
+ " Examples:\n",
+ " Question: What is a cell?\n",
+ " Answer: '''\n",
+ " Cell, in biology, the basic membrane-bound unit that contains the fundamental molecules of life and of which all living things are composed. \n",
+ " Sources: \n",
+ " Chow, Christopher , Laskey, Ronald A. , Cooper, John A. , Alberts, Bruce M. , Staehelin, L. Andrew , \n",
+ " Stein, Wilfred D. , Bernfield, Merton R. , Lodish, Harvey F. , Cuffe, Michael and Slack, Jonathan M.W.. \n",
+ " \"cell\". Encyclopedia Britannica, 26 Sep. 2023, https://www.britannica.com/science/cell-biology. Accessed 9 November 2023.\n",
+ " '''\n",
+ " \n",
+ " {question} Answer \"don't know\" if not present in the document. \n",
+ " {context}\n",
+ " \n",
+ "\n",
+ " \n",
+ " Solution:\"\"\"\n",
+ " PROMPT = PromptTemplate(\n",
+ " template=prompt_template, input_variables=[\"context\", \"question\"],\n",
+ " )\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "82c66d53-97b2-46dc-a466-70a3d3bee4a7",
+ "metadata": {},
+ "source": [
+ "The following set of commands controls the chat history, essentially telling the model to expect another question after it finishes answering the previous one. Follow-up questions can contain references to past chat history, so the **ConversationalRetrievalChain** combines the chat history and the follow-up question into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the question to a question-answering chain to return a response.\n",
+ "\n",
+ "All of these pieces, such as our conversational chain, prompt, and chat history, are passed through a function called **run_chain** so that our model can return its response. We have also set the length of our chat history to one, meaning that our model can only refer to the previous exchange."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fda4d33b-60f2-4462-a8e6-bbce7f8a7b07",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "condense_qa_template = \"\"\"\n",
+ " Chat History:\n",
+ " {chat_history}\n",
+ " Here is a new question for you: {question}\n",
+ " Standalone question:\"\"\"\n",
+ " standalone_question_prompt = PromptTemplate.from_template(condense_qa_template)\n",
+ " \n",
+ " qa = ConversationalRetrievalChain.from_llm(\n",
+ " llm=llm, \n",
+ " retriever=retriever, \n",
+ " condense_question_prompt=standalone_question_prompt, \n",
+ " return_source_documents=True, \n",
+ " combine_docs_chain_kwargs={\"prompt\":PROMPT},\n",
+ " )\n",
+ " return qa\n",
+ "\n",
+ "def run_chain(chain, prompt: str, history=None):\n",
+ "    print(prompt)\n",
+ "    # use a fresh list when no history is given (avoids a shared mutable default)\n",
+ "    return chain({\"question\": prompt, \"chat_history\": history or []})\n",
+ "\n",
+ "MAX_HISTORY_LENGTH = 1 #increase to refer to more previous chats\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b8f1ef8d-66fe-4f84-933b-af2d730bd114",
+ "metadata": {},
+ "source": [
+ "The final part of our script uses our color class to add a bit of flair to our conversation with our model. When first initialized, the script greets the user with **\"Hello! How can I help you?\"** and then instructs the user to ask a question or exit the session with **\"Ask a question, start a New search: or CTRL-D to exit.\"**. Each question is passed to the run_chain function to get the model's response, and the question and response are then added to the **chat history**. Starting a question with **New search:** clears the chat history first."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1aa6ef65-ced4-445e-875c-7fee3483b81d",
+ "metadata": {},
+ "source": [
+ "```python\n",
+ "if __name__ == \"__main__\":\n",
+ " chat_history = []\n",
+ " qa = build_chain()\n",
+ " print(bcolors.OKBLUE + \"Hello! How can I help you?\" + bcolors.ENDC)\n",
+ " print(bcolors.OKCYAN + \"Ask a question, start a New search: or CTRL-D to exit.\" + bcolors.ENDC)\n",
+ " print(\">\", end=\" \", flush=True)\n",
+ " for query in sys.stdin:\n",
+ " if (query.strip().lower().startswith(\"new search:\")):\n",
+ " query = query.strip().lower().replace(\"new search:\",\"\")\n",
+ " chat_history = []\n",
+ " elif (len(chat_history) == MAX_HISTORY_LENGTH):\n",
+ " chat_history.pop(0)\n",
+ " result = run_chain(qa, query, chat_history)\n",
+ " chat_history.append((query, result[\"answer\"]))\n",
+ " print(bcolors.OKGREEN + result['answer'] + bcolors.ENDC) \n",
+ " if 'source_documents' in result: \n",
+ " print(bcolors.OKGREEN + 'Sources:')\n",
+ " for idx, ref in enumerate(result[\"source_documents\"]):\n",
+ " print(ref.page_content) #Use this for Vector store\n",
+ " #print(\"PubMed UID: \"+ref.metadata[\"uid\"])#Use this for PubMed retriever\n",
+ " print(bcolors.ENDC)\n",
+ " print(bcolors.OKCYAN + \"Ask a question, start a New search: or CTRL-D to exit.\" + bcolors.ENDC)\n",
+ " print(\">\", end=\" \", flush=True)\n",
+ " print(bcolors.OKBLUE + \"Bye\" + bcolors.ENDC)\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1abcbd48-bb84-4310-b8eb-ad87850a8649",
+ "metadata": {},
+ "source": [
+ "To run our script in the terminal, we first need to export the following environment variables and then run the command `python NAME_OF_SCRIPT.py`. You can also check out our **example inference scripts** for the [PubMed API](/example_scripts/example_langchain_chat_llama_2_zeroshot.py) and [Vertex AI Vector Search](/example_scripts/example_vectorsearch_chat_llama_2_zeroshot.py)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ba97df23-6893-438d-8a67-cb7dbf83e407",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#retrieve our index and endpoint id\n",
+ "print(index_id)\n",
+ "print(endpoint_id)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7eab00a3-54ff-4873-8d25-eaf8bd18a2e6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#enter the environment variables in your terminal\n",
+ "export PROJECT_ID='' \\\n",
+ "export LOCATION_ID='' \\\n",
+ "#export ENDPOINT_ID='' \\ #Uncomment if using model from Model Garden\n",
+ "export BUCKET='' \\\n",
+ "export VC_INDEX_ID='' \\\n",
+ "export VC_ENDPOINT_ID='VECTOR_SEARCH_ENDPOINT_ID'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bbe127e6-c0b1-4e07-ad56-38c30a9bf858",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "You should see similar results on the terminal. In this example we ask the chatbot to explain brain cancer!"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "80c8fb4b-e74f-4e8d-892b-0f913eff747d",
+ "metadata": {},
+ "source": [
+ "![PubMed Chatbot Results](../../../images/GCP_chatbot_results.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a178c1c6-368a-48c5-8beb-278443b685a2",
+ "metadata": {},
+ "source": [
+ "### Clean Up"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7ec06a34-dc47-453f-b519-424804fa2748",
+ "metadata": {},
+ "source": [
+ "**Warning:** Don't forget to delete the resources we just made to avoid accruing additional costs!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c307bb17-757a-4579-a0d8-698eb1bb3f2e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Undeploy index\n",
+ "!gcloud ai index-endpoints undeploy-index {endpoint_id} \\\n",
+ " --deployed-index-id={deployed_index_id} \\\n",
+ " --project={project_id} \\\n",
+ " --region={location}\n",
+ "\n",
+ "\n",
+ "#Delete index and endpoint\n",
+ "!gcloud ai indexes delete {index_id} \\\n",
+ " --project={project_id} \\\n",
+ " --region={location} --quiet\n",
+ "\n",
+ "!gcloud ai index-endpoints delete {endpoint_id} \\\n",
+ " --project={project_id} \\\n",
+ " --region={location} --quiet"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "280cea0a-a8fc-494e-8ce4-afb65847a222",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Delete bucket\n",
+ "!gcloud storage rm --recursive gs://{bucket}/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6928d95d-d7ec-43f6-9135-79fcfc9520d9",
+ "metadata": {},
+ "source": [
+ "If you have imported and deployed a model, don't forget to delete the model from the Model Registry and delete its endpoint."
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m113",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m113"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/GenAI/Gemini_Intro.ipynb b/tutorials/notebooks/GenAI/Gemini_Intro.ipynb
new file mode 100644
index 0000000..94b8d7a
--- /dev/null
+++ b/tutorials/notebooks/GenAI/Gemini_Intro.ipynb
@@ -0,0 +1,853 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "c51ed24c-53dd-4239-8be0-0e0422596ba3",
+ "metadata": {},
+ "source": [
+ "# Intro to GCP's Gemini "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22cc412f-5338-45ba-8b29-9039c19208d9",
+ "metadata": {},
+ "source": [
+ "**Gemini** is a Google multimodal model that can **summarize, chat, and generate text from images or videos**. Gemini comes in two model versions, **Gemini Pro** and **Gemini Pro Vision**. In this tutorial we will use both models via Python packages and GCP's model playground, **Vertex AI Studio**."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "28289191-0813-47d1-be6d-8feb0ae708bd",
+ "metadata": {},
+ "source": [
+ "## Gemini in Python"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b8ec6d40-b5b3-434f-adc4-2838b7f49d1d",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "The first step in order to start using Gemini is to update the google-cloud-aiplatform Python package, if you haven't already."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "ed9781dd-9764-4e9c-88ba-fcd7bb95842a",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Requirement already satisfied: google-cloud-aiplatform in /opt/conda/lib/python3.10/site-packages (1.37.0)\n",
+ "Collecting google-cloud-aiplatform\n",
+ " Downloading google_cloud_aiplatform-1.39.0-py2.py3-none-any.whl.metadata (28 kB)\n",
+ "Requirement already satisfied: google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0 in /opt/conda/lib/python3.10/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.34.0)\n",
+ "Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.0 in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (1.23.0)\n",
+ "Requirement already satisfied: protobuf!=3.20.0,!=3.20.1,!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.19.5 in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (3.20.3)\n",
+ "Requirement already satisfied: packaging>=14.3 in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (23.2)\n",
+ "Requirement already satisfied: google-cloud-storage<3.0.0dev,>=1.32.0 in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (2.13.0)\n",
+ "Requirement already satisfied: google-cloud-bigquery<4.0.0dev,>=1.15.0 in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (3.13.0)\n",
+ "Requirement already satisfied: google-cloud-resource-manager<3.0.0dev,>=1.3.3 in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (1.11.0)\n",
+ "Requirement already satisfied: shapely<3.0.0dev in /opt/conda/lib/python3.10/site-packages (from google-cloud-aiplatform) (2.0.2)\n",
+ "Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.56.2 in /opt/conda/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.62.0)\n",
+ "Requirement already satisfied: google-auth<3.0dev,>=1.25.0 in /opt/conda/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (2.25.2)\n",
+ "Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /opt/conda/lib/python3.10/site-packages (from google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (2.31.0)\n",
+ "Requirement already satisfied: grpcio<2.0dev,>=1.33.2 in /opt/conda/lib/python3.10/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.60.0)\n",
+ "Requirement already satisfied: grpcio-status<2.0dev,>=1.33.2 in /opt/conda/lib/python3.10/site-packages (from google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.48.2)\n",
+ "Requirement already satisfied: google-cloud-core<3.0.0dev,>=1.6.0 in /opt/conda/lib/python3.10/site-packages (from google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.4.1)\n",
+ "Requirement already satisfied: google-resumable-media<3.0dev,>=0.6.0 in /opt/conda/lib/python3.10/site-packages (from google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.6.0)\n",
+ "Requirement already satisfied: python-dateutil<3.0dev,>=2.7.2 in /opt/conda/lib/python3.10/site-packages (from google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (2.8.2)\n",
+ "Requirement already satisfied: grpc-google-iam-v1<1.0.0dev,>=0.12.4 in /opt/conda/lib/python3.10/site-packages (from google-cloud-resource-manager<3.0.0dev,>=1.3.3->google-cloud-aiplatform) (0.13.0)\n",
+ "Requirement already satisfied: google-crc32c<2.0dev,>=1.0 in /opt/conda/lib/python3.10/site-packages (from google-cloud-storage<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.5.0)\n",
+ "Requirement already satisfied: numpy>=1.14 in /opt/conda/lib/python3.10/site-packages (from shapely<3.0.0dev->google-cloud-aiplatform) (1.25.2)\n",
+ "Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/conda/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (5.3.2)\n",
+ "Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (0.3.0)\n",
+ "Requirement already satisfied: rsa<5,>=3.1.4 in /opt/conda/lib/python3.10/site-packages (from google-auth<3.0dev,>=1.25.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (4.9)\n",
+ "Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.10/site-packages (from python-dateutil<3.0dev,>=2.7.2->google-cloud-bigquery<4.0.0dev,>=1.15.0->google-cloud-aiplatform) (1.16.0)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (3.3.2)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (3.6)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (1.26.18)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests<3.0.0dev,>=2.18.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (2023.11.17)\n",
+ "Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in /opt/conda/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0dev,>=1.25.0->google-api-core!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-api-core[grpc]!=2.0.*,!=2.1.*,!=2.2.*,!=2.3.*,!=2.4.*,!=2.5.*,!=2.6.*,!=2.7.*,<3.0.0dev,>=1.32.0->google-cloud-aiplatform) (0.5.1)\n",
+ "Downloading google_cloud_aiplatform-1.39.0-py2.py3-none-any.whl (3.4 MB)\n",
+ "Installing collected packages: google-cloud-aiplatform\n",
+ " Attempting uninstall: google-cloud-aiplatform\n",
+ " Found existing installation: google-cloud-aiplatform 1.37.0\n",
+ " Uninstalling google-cloud-aiplatform-1.37.0:\n",
+ " Successfully uninstalled google-cloud-aiplatform-1.37.0\n",
+ "Successfully installed google-cloud-aiplatform-1.39.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install --upgrade google-cloud-aiplatform"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "id": "54499594-1d7b-41cc-858b-ffe6f6c2770b",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Collecting langchain\n",
+ " Downloading langchain-0.1.0-py3-none-any.whl.metadata (13 kB)\n",
+ "Requirement already satisfied: PyYAML>=5.3 in /opt/conda/lib/python3.10/site-packages (from langchain) (6.0.1)\n",
+ "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /opt/conda/lib/python3.10/site-packages (from langchain) (2.0.23)\n",
+ "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /opt/conda/lib/python3.10/site-packages (from langchain) (3.9.1)\n",
+ "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /opt/conda/lib/python3.10/site-packages (from langchain) (4.0.3)\n",
+ "Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)\n",
+ " Downloading dataclasses_json-0.6.3-py3-none-any.whl.metadata (25 kB)\n",
+ "Requirement already satisfied: jsonpatch<2.0,>=1.33 in /opt/conda/lib/python3.10/site-packages (from langchain) (1.33)\n",
+ "Collecting langchain-community<0.1,>=0.0.9 (from langchain)\n",
+ " Downloading langchain_community-0.0.11-py3-none-any.whl.metadata (7.3 kB)\n",
+ "Collecting langchain-core<0.2,>=0.1.7 (from langchain)\n",
+ " Downloading langchain_core-0.1.9-py3-none-any.whl.metadata (4.0 kB)\n",
+ "Requirement already satisfied: langsmith<0.1.0,>=0.0.77 in /opt/conda/lib/python3.10/site-packages (from langchain) (0.0.77)\n",
+ "Requirement already satisfied: numpy<2,>=1 in /opt/conda/lib/python3.10/site-packages (from langchain) (1.25.2)\n",
+ "Requirement already satisfied: pydantic<3,>=1 in /opt/conda/lib/python3.10/site-packages (from langchain) (1.10.13)\n",
+ "Requirement already satisfied: requests<3,>=2 in /opt/conda/lib/python3.10/site-packages (from langchain) (2.31.0)\n",
+ "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /opt/conda/lib/python3.10/site-packages (from langchain) (8.2.3)\n",
+ "Requirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.1.0)\n",
+ "Requirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.4)\n",
+ "Requirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\n",
+ "Requirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.0)\n",
+ "Requirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.10/site-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
+ "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
+ " Downloading marshmallow-3.20.2-py3-none-any.whl.metadata (7.5 kB)\n",
+ "Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
+ " Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)\n",
+ "Requirement already satisfied: jsonpointer>=1.9 in /opt/conda/lib/python3.10/site-packages (from jsonpatch<2.0,>=1.33->langchain) (2.4)\n",
+ "Requirement already satisfied: anyio<5,>=3 in /opt/conda/lib/python3.10/site-packages (from langchain-core<0.2,>=0.1.7->langchain) (3.7.1)\n",
+ "Requirement already satisfied: packaging<24.0,>=23.2 in /opt/conda/lib/python3.10/site-packages (from langchain-core<0.2,>=0.1.7->langchain) (23.2)\n",
+ "Requirement already satisfied: typing-extensions>=4.2.0 in /opt/conda/lib/python3.10/site-packages (from pydantic<3,>=1->langchain) (4.8.0)\n",
+ "Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2->langchain) (3.3.2)\n",
+ "Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2->langchain) (3.6)\n",
+ "Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2->langchain) (1.26.18)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests<3,>=2->langchain) (2023.11.17)\n",
+ "Requirement already satisfied: greenlet!=0.4.17 in /opt/conda/lib/python3.10/site-packages (from SQLAlchemy<3,>=1.4->langchain) (3.0.2)\n",
+ "Requirement already satisfied: sniffio>=1.1 in /opt/conda/lib/python3.10/site-packages (from anyio<5,>=3->langchain-core<0.2,>=0.1.7->langchain) (1.3.0)\n",
+ "Requirement already satisfied: exceptiongroup in /opt/conda/lib/python3.10/site-packages (from anyio<5,>=3->langchain-core<0.2,>=0.1.7->langchain) (1.2.0)\n",
+ "Collecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain)\n",
+ " Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
+ "Downloading langchain-0.1.0-py3-none-any.whl (797 kB)\n",
+ "Downloading dataclasses_json-0.6.3-py3-none-any.whl (28 kB)\n",
+ "Downloading langchain_community-0.0.11-py3-none-any.whl (1.5 MB)\n",
+ "Downloading langchain_core-0.1.9-py3-none-any.whl (216 kB)\n",
+ "Downloading marshmallow-3.20.2-py3-none-any.whl (49 kB)\n",
+ "Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
+ "Installing collected packages: mypy-extensions, marshmallow, typing-inspect, langchain-core, dataclasses-json, langchain-community, langchain\n",
+ " Attempting uninstall: langchain-core\n",
+ " Found existing installation: langchain-core 0.1.6\n",
+ " Uninstalling langchain-core-0.1.6:\n",
+ " Successfully uninstalled langchain-core-0.1.6\n",
+ "Successfully installed dataclasses-json-0.6.3 langchain-0.1.0 langchain-community-0.0.11 langchain-core-0.1.9 marshmallow-3.20.2 mypy-extensions-1.0.0 typing-inspect-0.9.0\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install langchain"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f7bbbed7-becf-48d0-98c3-6dd9942fd377",
+ "metadata": {},
+ "source": [
+ "Next we initialize the Gemini model by setting our project ID and location. We also import the following classes:\n",
+ "- **GenerativeModel:** Specifies and launches the Gemini model we need (e.g. Gemini Pro, Gemini Pro Vision).\n",
+ "- **ChatSession:** Runs Gemini Pro in chatbot mode.\n",
+ "- **Part:** Loads files from buckets.\n",
+ "- **Image:** Loads local image files.\n",
+ "- **GenerationConfig:** Configures the model's temperature, top p, top k, and max output tokens."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "47dc9232-383f-405b-b1a8-fab64a80492d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "from google.cloud import aiplatform\n",
+ "import vertexai.preview\n",
+ "from vertexai.preview.generative_models import GenerativeModel, ChatSession, Part, Image, GenerationConfig\n",
+ "\n",
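+ "# TODO(developer): Replace the placeholder values below with your own project ID and region\n",
+ "project_id = \"your-project-id\"  # placeholder, not a real project\n",
+ "location = \"your-region\"  # placeholder, for example \"us-central1\"\n",
+ "vertexai.init(project=project_id, location=location)"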
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "022f24ba-4034-4424-91d8-1229682755ab",
+ "metadata": {},
+ "source": [
+ "### Gemini as a Chatbot"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b7dec028-6d30-4078-ac54-62b849ae9ced",
+ "metadata": {},
+ "source": [
+ "For text, code generation, and natural language tasks we can use the **gemini-pro** model. To put the model in **chatbot mode** we call the `start_chat()` function. Below we also define a function named **get_chat_response**, which sends our prompt to the model with the `send_message()` function and returns only the text of the chat's response."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 105,
+ "id": "70bc5b25-c796-4015-82dc-6bc861bb525f",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "model = GenerativeModel(\"gemini-pro\")\n",
+ "chat = model.start_chat()\n",
+ "\n",
+ "def get_chat_response(chat: ChatSession, prompt: str):\n",
+ " response = chat.send_message(prompt)\n",
+ " return response.text"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ae40e87a-76ac-469b-a504-fb903319cbaf",
+ "metadata": {},
+ "source": [
+ "Now that we have our function, let's ask our Gemini chatbot some questions!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9d3277de-ab85-417f-b1d8-b21985a7a21b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt = \"Hello.\"\n",
+ "print(get_chat_response(chat, prompt))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "342a0e3d-fbcb-4562-bb5f-b439a92e80e2",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "**Generative AI use cases that are Life Science or Health Care related:**\n",
+ "\n",
+ "* **Drug discovery and development:** Generative AI can be used to generate new molecules with desired properties, design new drugs, and predict how drugs will interact with biological systems. This can help to accelerate the drug discovery and development process and make it more efficient.\n",
+ "* **Personalized medicine:** Generative AI can be used to generate personalized treatment plans for patients based on their individual genetic and health data. This can help to improve the effectiveness of treatment and reduce side effects.\n",
+ "* **Disease diagnosis and prognosis:** Generative AI can be used to develop new diagnostic tools and methods, and to predict the course of a disease. This can help to improve patient outcomes and reduce healthcare costs.\n",
+ "* **Medical imaging:** Generative AI can be used to generate synthetic medical images, which can be used to train medical students and residents, develop new imaging technologies, and improve the accuracy of diagnosis.\n",
+ "* **Electronic health records (EHRs):** Generative AI can be used to generate synthetic EHRs, which can be used to train machine learning algorithms, develop new clinical decision support tools, and improve the efficiency of healthcare operations.\n",
+ "* **Healthcare chatbots:** Generative AI can be used to develop healthcare chatbots that can provide patients with information and support, answer questions, and schedule appointments. This can help to improve patient access to care and reduce the burden on healthcare providers.\n",
+ "* **Drug repurposing:** Generative AI can be used to identify new uses for existing drugs, which can help to expand treatment options for patients and reduce the cost of drug development.\n",
+ "* **Clinical trial design:** Generative AI can be used to design more efficient and effective clinical trials, which can help to accelerate the development of new treatments and improve patient outcomes.\n",
+ "* **Healthcare fraud detection:** Generative AI can be used to detect fraudulent healthcare claims, which can help to reduce costs and improve the efficiency of healthcare operations.\n",
+ "\n",
+ "These are just a few examples of the many potential use cases for generative AI in the life science and healthcare industries. As generative AI technology continues to develop, we can expect to see even more innovative and groundbreaking applications in the years to come.\n"
+ ]
+ }
+ ],
+ "source": [
+ "prompt = \"List gen ai use cases that are Life Science or Health Care related. \"\n",
+ "print(get_chat_response(chat, prompt))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fcfda59a-c440-4489-8f00-a4316b827292",
+ "metadata": {},
+ "source": [
+ "We can even ask it to **generate code or debug code**!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 106,
+ "id": "f0b917b2-22b5-4011-a9c4-d8a667cf6b1d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Sure, here's a Python script that will replace all null values (empty cells) with zeros within a CSV file:\n",
+ "\n",
+ "\n",
+ "```python\n",
+ "import csv\n",
+ "\n",
+ "# Open the CSV file for reading and writing.\n",
+ "with open('input.csv', 'r+', newline='') as csvfile:\n",
+ " # Create a CSV reader and writer.\n",
+ " reader = csv.reader(csvfile)\n",
+ " writer = csv.writer(csvfile)\n",
+ "\n",
+ " # Read the header row.\n",
+ " header = next(reader)\n",
+ "\n",
+ " # Replace null values with zeros in the remaining rows.\n",
+ " for row in reader:\n",
+ " for i, cell in enumerate(row):\n",
+ " if cell == '':\n",
+ " row[i] = '0'\n",
+ "\n",
+ " # Write the updated row to the CSV file.\n",
+ " writer.writerow(row)\n",
+ "```\n",
+ "\n",
+ "\n",
+ "To use this script, save it as a file (e.g. `replace_nulls.py`) and run it from the command line:\n",
+ "\n",
+ "\n",
+ "```\n",
+ "python replace_nulls.py\n",
+ "```\n",
+ "\n",
+ "\n",
+ "This will replace all null values in the 'input.csv' file with zeros and create a new CSV file called 'output.csv'.\n",
+ "\n",
+ "\n",
+ "**Note:** Make sure to replace `input.csv` with the actual name of your input CSV file. You can also change the output file name by modifying the `output.csv` part of the script.\n"
+ ]
+ }
+ ],
+ "source": [
+ "prompt = \"create a python code that will replace all null values to zero within a csv file\"\n",
+ "print(get_chat_response(chat, prompt))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3a9d57d2-dbe1-434a-9345-fe5ae3315a21",
+ "metadata": {},
+ "source": [
+ "### Gemini as a Summarizer"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4856bd61-3e24-40ea-8da4-1a6edc5f3e1d",
+ "metadata": {},
+ "source": [
+ "Gemini Pro can also generate text such as summaries of articles that we load locally (using LangChain). At the time of writing, Gemini does not support directly loading documents other than images and videos."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b5d7df19-a625-40dc-b4e5-faff5e7ba241",
+ "metadata": {},
+ "source": [
+ "First we load a file using LangChain's `TextLoader`. You can also use LangChain to load files from your bucket by following the instructions [here](https://python.langchain.com/docs/integrations/document_loaders/google_cloud_storage_file)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 71,
+ "id": "3becd6c2-daf0-4287-80e5-06cf419287bd",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "from langchain_community.document_loaders import TextLoader\n",
+ "\n",
+ "loader = TextLoader(\"./PMC10000003.txt\")\n",
+ "ex_file=loader.load()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4e3d6c19-7d68-49e1-85e9-91bcd6bd1775",
+ "metadata": {},
+ "source": [
+ "We can tune the model's output by setting the parameters below:\n",
+ "- **max_output_tokens:** Maximum number of tokens (not words) to generate in the response.\n",
+ "- **temperature:** Controls randomness. Higher values produce more diverse, less predictable responses; lower values produce more deterministic ones. Must be a number from 0 to 1.\n",
+ "- **top_p (nucleus):** The cumulative probability cutoff for token selection. Lower values mean sampling from a smaller, more top-weighted nucleus. Must be a number from 0 to 1.\n",
+ "- **top_k:** Samples from the k most likely next tokens at each step. Lower values restrict the choice to the most probable tokens, which makes the output less varied.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 103,
+ "id": "c4228e44-9639-40da-8f69-343be93b65b0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "generation_config = GenerationConfig(\n",
+ " temperature=0.9,\n",
+ " top_p=1.0,\n",
+ " top_k=32,\n",
+ " candidate_count=1,\n",
+ " max_output_tokens=8192,\n",
+ ")\n",
+ "\n",
+ "def summarizer(file: str) -> str:\n",
+ "    # `file` is the text content of the document, not a file path\n",
+ "    # Query the model\n",
+ "    response = model.generate_content(\n",
+ "        [\n",
+ "            # Instruction for the model, followed by the text to summarize\n",
+ "            \"summarize this file.\",\n",
+ "            file,\n",
+ "        ],\n",
+ "        generation_config=generation_config,\n",
+ "    )\n",
+ "    return response.text"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0bb722d6-ef79-4a0b-9327-04811e7f8ffc",
+ "metadata": {},
+ "source": [
+ "Here we pass only the page content from our document loader."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 92,
+ "id": "97e7ea82-c58e-42ee-b01e-6aa51e324b05",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "The article \"Mechanical Treatment of Inflammation of the Knee Joint\" published in the Chicago Medical Examiner on January 1867, discusses the use of mechanical treatments for inflammation of the knee joint. The author emphasizes the importance of overcoming the reflex contraction of muscles surrounding the joint to prevent or correct deformities. Tenotomy of the flexor tendons may be necessary to achieve this. Additionally, the relief of pressure on the inflamed joint surfaces is crucial for recovery. This can be achieved through various methods such as adhesive strap dressings, application of an air cushion, or evacuation of pus from the joint. The author also introduces a new apparatus for making extension, which allows for optimal counter-extension and can be used in both acute and chronic cases. The advantages of this apparatus include its large counter-extending surface, security, and patient comfort. By utilizing this principle, various instruments can be crafted to address knee deformities.\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(summarizer(ex_file[0].page_content))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "77c10bd8-1570-4dfa-9b2a-999e3f149faf",
+ "metadata": {},
+ "source": [
+ "### Gemini as an Image to Text Generator"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "476052b3-ff64-4e5a-819c-7a18daf7f413",
+ "metadata": {},
+ "source": [
+ "Gemini Pro Vision can generate text from images and videos. This text can be a description of the image or video, or an answer to a question about it. You can load an image from a local file or from your bucket."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "214ee84f-f058-411a-944d-4e149cd0e9bc",
+ "metadata": {},
+ "source": [
+ "Images must be in one of the following formats: \n",
+ "- PNG - image/png\n",
+ "- JPEG - image/jpeg\n",
+ "\n",
+ "The function below takes an image path and a prompt. An `if` statement decides whether to use `Image` to load a local image or `Part` to load one from a bucket."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "501efefe-d52f-43b3-b4eb-3d3fe81f4a3e",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "def img2text(image_path: str, img_prompt: str) -> str:\n",
+ "    multimodal_model = GenerativeModel(\"gemini-pro-vision\")\n",
+ "    # Load from a bucket if the path is a gs:// URI, otherwise from a local file\n",
+ "    if image_path.startswith(\"gs://\"):\n",
+ "        image1 = Part.from_uri(image_path, mime_type=\"image/jpeg\")\n",
+ "    else:\n",
+ "        image1 = Image.load_from_file(image_path)\n",
+ "    responses = multimodal_model.generate_content(\n",
+ "        [image1, img_prompt],\n",
+ "        generation_config={\n",
+ "            \"max_output_tokens\": 2048,\n",
+ "            \"temperature\": 0.4,\n",
+ "            \"top_p\": 1,\n",
+ "            \"top_k\": 32\n",
+ "        },\n",
+ "        stream=True,\n",
+ "    )\n",
+ "    # Join the streamed chunks and return the text, so that print(img2text(...)) does not print None\n",
+ "    return \"\".join(response.text for response in responses)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4cafaaa8-f3c7-472a-8f0b-6595d2112636",
+ "metadata": {},
+ "source": [
+ "Let's look at a local image. First we download it; this is an image of the COVID-19 virus from the [CDC Public Health Image Library](https://phil.cdc.gov/details.aspx?pid=23312)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 146,
+ "id": "b939f105-89c2-4c38-80f8-2cddf8dcb0ca",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "--2024-01-11 05:24:55-- https://phil.cdc.gov//PHIL_Images/23312/23312_lores.jpg\n",
+ "Resolving phil.cdc.gov (phil.cdc.gov)... 198.246.102.26\n",
+ "Connecting to phil.cdc.gov (phil.cdc.gov)|198.246.102.26|:443... connected.\n",
+ "HTTP request sent, awaiting response... 200 OK\n",
+ "Length: 31823 (31K) [image/jpeg]\n",
+ "Saving to: ‘example_image_covid.jpg’\n",
+ "\n",
+ "example_image_covid 100%[===================>] 31.08K --.-KB/s in 0.07s \n",
+ "\n",
+ "2024-01-11 05:24:55 (455 KB/s) - ‘example_image_covid.jpg’ saved [31823/31823]\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "! wget -O example_image_covid.jpg \"https://phil.cdc.gov//PHIL_Images/23312/23312_lores.jpg\" "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "696d9fc7-0ca9-44a7-b431-a9698b1a636c",
+ "metadata": {},
+ "source": [
+ "Now run our function!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 180,
+ "id": "34e81656-4943-439d-9fbe-df439e0e30df",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " The image is a 3D rendering of a coronavirus. The virus is round and has a spiky outer coat. The spikes are made of proteins that help the virus attach to and infect cells. The virus is colored gray and red.None\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(img2text(\"example_image_covid.jpg\", \"describe this image.\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dc152659-7bc4-46bc-9ead-4736b4ad2706",
+ "metadata": {},
+ "source": [
+ "Next we'll look at an image from a bucket."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 181,
+ "id": "81197d53-dd3d-4358-9835-ef513ec11d33",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " The image shows a table with a pink peony bouquet, two cups of coffee, a bowl of blueberries, and a silver spoon with the words \"Let's Jam\" on it. There are also five scones with blueberries on them. The table is covered with a white tablecloth with purple stains.None\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(img2text(\"gs://generativeai-downloads/images/scones.jpg\", \"describe this image.\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "80e3c48f-66e9-40e1-8623-04f73b672507",
+ "metadata": {},
+ "source": [
+ "We can even ask for more details related to the items in our image!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 182,
+ "id": "d8395784-ea68-4a95-a0bb-b3d618f68054",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " Preheat oven to 375 degrees F (190 degrees C). Grease a baking sheet.\n",
+ "\n",
+ "In a large bowl, combine the flour, sugar, baking powder, and salt. Cut in butter until mixture resembles coarse crumbs. Stir in blueberries.\n",
+ "\n",
+ "Turn out onto a lightly floured surface; knead for 10 to 12 times. Pat into a 1/2-in.-thick circle. Cut with a 3-in. floured biscuit cutter. Place 2 in. apart on the prepared baking sheet.\n",
+ "\n",
+ "Bake for 12-15 minutes or until golden brown. Cool for 2 minutes before removing to a wire rack to cool completely.None\n"
+ ]
+ }
+ ],
+ "source": [
+ "img_prompt=\"How do you make whats in this image?\"\n",
+ "image=\"gs://generativeai-downloads/images/scones.jpg\"\n",
+ "print(img2text(image, img_prompt))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "960f7ba6-8236-437a-ba99-ea1d887efd64",
+ "metadata": {},
+ "source": [
+ "### Gemini as a Video to Text Generator"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fbdbbd35-f36f-4467-8a1d-48ccd942d7cc",
+ "metadata": {},
+ "source": [
+ "For videos we use the same model as for images, Gemini Pro Vision. We can load videos locally or from a bucket, just like images. Video files must be in one of the following formats:\n",
+ "- MOV - video/mov\n",
+ "- MPEG - video/mpeg\n",
+ "- MP4 - video/mp4\n",
+ "- MPG - video/mpg\n",
+ "- AVI - video/avi\n",
+ "- WMV - video/wmv\n",
+ "- MPEGPS - video/mpegps\n",
+ "- FLV - video/flv"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a7f94141-b5ba-49bf-a3d7-d0b68ccdd39a",
+ "metadata": {},
+ "source": [
+ "The function below takes the location of a video file in a public bucket and a prompt, then queries the model about the video."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c18561c6-8f8f-46e4-b3ee-0d5fc96f2d31",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "def video2text(video_path: str, video_prompt: str) -> str:\n",
+ "    # Query the model with the video and the prompt\n",
+ "    response = multimodal_model.generate_content(\n",
+ "        [\n",
+ "            # Add the video to analyze\n",
+ "            Part.from_uri(\n",
+ "                video_path, mime_type=\"video/mp4\"\n",
+ "            ),\n",
+ "            # Add the text prompt\n",
+ "            video_prompt,\n",
+ "        ],\n",
+ "        stream=True\n",
+ "    )\n",
+ "    # Join every streamed chunk so the full response is returned as one string\n",
+ "    return \"\".join(chunk.text for chunk in response)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "492f9100-b5a4-446c-a03d-dd89a1b4bbde",
+ "metadata": {},
+ "source": [
+ "Run the function!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 191,
+ "id": "55990074-0365-45f5-9fa6-bedbe93c9932",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " This video is about a messy world. It shows a bunch of different things that are messy, like a messy room, a messy desk, and a messy\n",
+ "None\n"
+ ]
+ }
+ ],
+ "source": [
+ "video_prompt=\"What is this video about in detail?\"\n",
+ "video=\"gs://cloud-samples-data/video/Machine Learning Solving Problems Big, Small, and Prickly.mp4\"\n",
+ "print(video2text(video, video_prompt))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c32253b4-d892-4c0c-896e-320c8df479c7",
+ "metadata": {},
+ "source": [
+ "## Gemini on Vertex AI Studio"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "614883f2-e53a-4ba5-855c-209d755a6e6f",
+ "metadata": {},
+ "source": [
+ "You can also use Gemini Pro and Gemini Pro Vision in Vertex AI's playground, **Vertex AI Studio**. To find it, search for Vertex AI in the console and select Vertex AI Studio from the left-hand menu, as the image below shows. To use Gemini Pro Vision, click **Multimodal**. You can then write your own prompt or explore the preset prompts, such as extracting text from images or image question answering."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8a634e11-3bd5-4048-b6e4-46d5aac6ce34",
+ "metadata": {},
+ "source": [
+ "![Gemini1](../../../images/Gemini_1.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c3716d9d-49af-40c9-acce-757c53be9a12",
+ "metadata": {},
+ "source": [
+ "For this tutorial we will select Open on the **Prompt Design** option. We will upload the COVID image we downloaded before by clicking **INSERT MEDIA** and selecting our file. Then we will ask it a question; here we asked \"Describe treatments for the item in this image\"."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec72f9cd-0d06-47e4-b67c-45039372d967",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "![Gemini3](../../../images/Gemini_3.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4a88f06b-770a-43be-b875-b4054c3ccbf3",
+ "metadata": {},
+ "source": [
+ "To use Gemini Pro, click **Language** in the left-side menu. You can choose between a prompt and a chat, and whether to focus on text or code."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1952cd5c-c93d-428f-8036-ac658eebfba4",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "![Gemini2](../../../images/Gemini_2.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7810c0db-d145-4acd-a2cc-b6ce8231fa14",
+ "metadata": {},
+ "source": [
+ "Here we picked the **TEXT CHAT** option and asked the bot to describe COVID and how it works."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "36b4635d-d8f3-42df-959d-dff92259813c",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "![Gemini4](../../../images/Gemini_4.png)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "acc53b99-e6ed-45f1-a0e1-9b146e18e46c",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m114",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m114"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.13"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/GenAI/VertexAIStudioGCP.ipynb b/tutorials/notebooks/GenAI/VertexAIStudioGCP.ipynb
new file mode 100644
index 0000000..3366fff
--- /dev/null
+++ b/tutorials/notebooks/GenAI/VertexAIStudioGCP.ipynb
@@ -0,0 +1,153 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Hny4I-ODTIS6"
+ },
+ "source": [
+ "# Vertex AI Studio on GCP - Article Summary\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-nLS57E2TO5y"
+ },
+ "source": [
+ "## Overview\n",
+ "\n",
+ "In research you often need to read several papers to understand a new method or new findings, and this can be quite taxing if you only need the gist of an article or want to skim it quickly. In this tutorial we will use generative AI to summarize long documents while still preserving the most important information, using Generative AI Studio's Article Summary model."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "skXAu__iqks_"
+ },
+ "source": [
+ "### Costs\n",
+ "\n",
+ "This tutorial uses billable components of Google Cloud:\n",
+ "- Vertex AI Generative AI Studio\n",
+ "\n",
+ "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing), [Generative AI pricing](https://cloud.google.com/vertex-ai/pricing#generative_ai_models), and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mvKl-BtQTRiQ"
+ },
+ "source": [
+ "## Getting Started"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "x_xMwRLuyDrj"
+ },
+ "source": [
+ "Here you will use an LLM via the API to summarize the extracted text. Please note that LLMs currently have an input text limit, so a very large input might not be accepted. You can read more about quotas and limits [here](https://cloud.google.com/vertex-ai/docs/quotas)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Go to the Generative AI Studio console [here](https://console.cloud.google.com/vertex-ai/generative/language?_ga=2.182664366.923116401.1692009977-1042353744.1691708677).\n",
+ "\n",
+ "Scroll down to **Summarization** and click on the model **Article Summary**. You will see a prompt session where you need to paste in the contents of your article, as the console does not allow you to upload files. For this tutorial we use an article about how gut microbiota affects Alzheimer's disease via the gut-brain-microbiota axis network, available [here](https://www.aging-us.com/article/102930/pdf).\n",
+ "\n",
+ " \n",
+ "\n",
+ "To the left you can adjust the same parameters we have used before. This is a great way to test what each parameter does and how they affect each other. Once you are done, click **submit**; you should see output similar to that below. For explanations of the parameters **temperature, token limit max, top p, and top k**, see the following article [here](https://cloud.google.com/vertex-ai/docs/generative-ai/text/test-text-prompts#generative-ai-test-text-prompt-drest).\n",
+ "\n",
+ " \n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's try increasing the temperature parameter and see if we receive a different output."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As you can see, our output becomes shorter and more to the point. This is because the temperature parameter controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a more deterministic and less creative response, while higher temperatures can lead to more diverse or creative results. A temperature of 0 is deterministic, meaning that the highest probability response is always selected."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "N5aVrDWkJs3Y"
+ },
+ "source": [
+ "### Troubleshooting\n",
+ "\n",
+ "If your model responds with an error, it is generally because the extracted text is too long for the generative model to process. For best results, leave out the abstract and reference sections of your article. If errors still come up, try shortening the article further by removing other sections such as the introduction, or by keeping only the first 30,000 words of the article."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Vtp21WX3T7d_"
+ },
+ "source": [
+ "### Recap\n",
+ "\n",
+ "Although the full text is too large for the model, you have managed to create a concise paragraph of the most important information from a portion of the PDF using the model. This method is the simplest and is ideal for shorter documents, but it can still be used on longer ones if you limit the number of characters you want the model to read. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "summarization_large_documents.ipynb",
+ "toc_visible": true
+ },
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-11.m108",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m108"
+ },
+ "kernelspec": {
+ "display_name": "Python (Local)",
+ "language": "python",
+ "name": "local-base"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/tutorials/notebooks/GenAI/example_scripts/example_langchain_chat_llama_2_zeroshot.py b/tutorials/notebooks/GenAI/example_scripts/example_langchain_chat_llama_2_zeroshot.py
new file mode 100644
index 0000000..1756c90
--- /dev/null
+++ b/tutorials/notebooks/GenAI/example_scripts/example_langchain_chat_llama_2_zeroshot.py
@@ -0,0 +1,101 @@
+from langchain.retrievers import PubMedRetriever
+from langchain.chains import ConversationalRetrievalChain
+from langchain.prompts import PromptTemplate
+#from langchain.llms import VertexAIModelGarden
+from langchain.llms import VertexAI
+import sys
+import json
+import os
+
+
+class bcolors:
+    HEADER = '\033[95m'
+    OKBLUE = '\033[94m'
+    OKCYAN = '\033[96m'
+    OKGREEN = '\033[92m'
+    WARNING = '\033[93m'
+    FAIL = '\033[91m'
+    ENDC = '\033[0m'
+    BOLD = '\033[1m'
+    UNDERLINE = '\033[4m'
+
+MAX_HISTORY_LENGTH = 1
+
+def build_chain():
+    # To use a model from Model Garden instead, uncomment the lines below
+    #PROJECT_ID = os.environ["PROJECT_ID"]
+    #LOCATION_ID = os.environ["LOCATION_ID"]
+    #ENDPOINT_ID = os.environ["ENDPOINT_ID"]
+
+    #llm = VertexAIModelGarden(project=PROJECT_ID, endpoint_id=ENDPOINT_ID, location=LOCATION_ID)
+
+    llm = VertexAI(
+        model_name="text-bison@001",
+        max_output_tokens=1024,
+        temperature=0.2,
+        top_p=0.8,
+        top_k=40,
+        verbose=True,
+    )
+
+    retriever = PubMedRetriever()
+
+    prompt_template = """
+    Ignore everything before.
+    Instructions:
+    I will provide you with research papers on a specific topic in English, and you will create a cumulative summary.
+    The summary should be concise and should accurately and objectively communicate the takeaway of the papers related to the topic.
+    You should not include any personal opinions or interpretations in your summary, but rather focus on objectively presenting the information from the papers.
+    Your summary should be written in your own words and ensure that your summary is clear, concise, and accurately reflects the content of the original papers. First, provide a concise summary then citations at the end.
+    {question} Answer "don't know" if not present in the document.
+    {context}
+    Solution:"""
+
+
+    PROMPT = PromptTemplate(
+        template=prompt_template, input_variables=["context", "question"],
+    )
+
+    condense_qa_template = """
+    Chat History:
+    {chat_history}
+    Here is a new question for you: {question}
+    Standalone question:"""
+    standalone_question_prompt = PromptTemplate.from_template(condense_qa_template)
+
+    qa = ConversationalRetrievalChain.from_llm(
+        llm=llm,
+        retriever=retriever,
+        condense_question_prompt=standalone_question_prompt,
+        return_source_documents=True,
+        combine_docs_chain_kwargs={"prompt":PROMPT},
+    )
+    return qa
+
+def run_chain(chain, prompt: str, history=None):
+    print(prompt)
+    return chain({"question": prompt, "chat_history": history or []})
+
+if __name__ == "__main__":
+    chat_history = []
+    qa = build_chain()
+    print(bcolors.OKBLUE + "Hello! How can I help you?" + bcolors.ENDC)
+    print(bcolors.OKCYAN + "Ask a question, start a New search: or CTRL-D to exit." + bcolors.ENDC)
+    print(">", end=" ", flush=True)
+    for query in sys.stdin:
+        if (query.strip().lower().startswith("new search:")):
+            query = query.strip().lower().replace("new search:","")
+            chat_history = []
+        elif (len(chat_history) == MAX_HISTORY_LENGTH):
+            chat_history.pop(0)
+        result = run_chain(qa, query, chat_history)
+        chat_history.append((query, result["answer"]))
+        print(bcolors.OKGREEN + result['answer'] + bcolors.ENDC)
+        if 'source_documents' in result:
+            print(bcolors.OKGREEN + 'Sources:')
+            for idx, ref in enumerate(result["source_documents"]):
+                print("PubMed UID: "+ref.metadata["uid"])
+        print(bcolors.ENDC)
+        print(bcolors.OKCYAN + "Ask a question, start a New search: or CTRL-D to exit." + bcolors.ENDC)
+        print(">", end=" ", flush=True)
+    print(bcolors.OKBLUE + "Bye" + bcolors.ENDC)
diff --git a/tutorials/notebooks/GenAI/example_scripts/example_vectorsearch_chat_llama_2_zeroshot.py b/tutorials/notebooks/GenAI/example_scripts/example_vectorsearch_chat_llama_2_zeroshot.py
new file mode 100644
index 0000000..bcc8acb
--- /dev/null
+++ b/tutorials/notebooks/GenAI/example_scripts/example_vectorsearch_chat_llama_2_zeroshot.py
@@ -0,0 +1,120 @@
+from langchain.chains import ConversationalRetrievalChain
+from langchain.prompts import PromptTemplate
+#from langchain.llms import VertexAIModelGarden
+from langchain.embeddings import VertexAIEmbeddings
+from langchain.vectorstores import MatchingEngine
+from langchain.llms import VertexAI
+import sys
+import json
+import os
+
+
+class bcolors:
+    HEADER = '\033[95m'
+    OKBLUE = '\033[94m'
+    OKCYAN = '\033[96m'
+    OKGREEN = '\033[92m'
+    WARNING = '\033[93m'
+    FAIL = '\033[91m'
+    ENDC = '\033[0m'
+    BOLD = '\033[1m'
+    UNDERLINE = '\033[4m'
+
+MAX_HISTORY_LENGTH = 1
+
+def build_chain():
+    # To use a model from Model Garden instead, uncomment the ENDPOINT_ID and VertexAIModelGarden lines
+
+    PROJECT_ID = os.environ["PROJECT_ID"]
+    LOCATION_ID = os.environ["LOCATION_ID"]
+    #ENDPOINT_ID = os.environ["ENDPOINT_ID"]
+    BUCKET = os.environ["BUCKET"]
+    VC_INDEX_ID = os.environ["VC_INDEX_ID"]
+    VC_ENDPOINT_ID = os.environ["VC_ENDPOINT_ID"]
+
+
+    #llm = VertexAIModelGarden(project=PROJECT_ID, endpoint_id=ENDPOINT_ID, location=LOCATION_ID)
+    llm = VertexAI(
+        model_name="text-bison@001",
+        max_output_tokens=1024,
+        temperature=0.2,
+        top_p=0.8,
+        top_k=40,
+        verbose=True,
+    )
+    embeddings = VertexAIEmbeddings()
+
+    vector_store = MatchingEngine.from_components(
+        project_id=PROJECT_ID,
+        region=LOCATION_ID,
+        gcs_bucket_name=BUCKET,
+        embedding=embeddings,
+        index_id=VC_INDEX_ID,
+        endpoint_id=VC_ENDPOINT_ID
+    )
+
+    retriever = vector_store.as_retriever(
+        search_type="similarity",
+        search_kwargs={"k":3}
+    )
+
+    prompt_template = """
+    Ignore everything before.
+    Instructions:
+    I will provide you with research papers on a specific topic in English, and you will create a cumulative summary.
+    The summary should be concise and should accurately and objectively communicate the takeaway of the papers related to the topic.
+    You should not include any personal opinions or interpretations in your summary, but rather focus on objectively presenting the information from the papers.
+    Your summary should be written in your own words and ensure that your summary is clear, concise, and accurately reflects the content of the original papers. First, provide a concise summary then citations at the end.
+    {question} Answer "don't know" if not present in the document.
+    {context}
+    Solution:"""
+
+
+    PROMPT = PromptTemplate(
+        template=prompt_template, input_variables=["context", "question"],
+    )
+
+    condense_qa_template = """
+    Chat History:
+    {chat_history}
+    Here is a new question for you: {question}
+    Standalone question:"""
+    standalone_question_prompt = PromptTemplate.from_template(condense_qa_template)
+
+    #RetrievalQA.from_chain_type(llm=llm, chain_type="stuff"
+    qa = ConversationalRetrievalChain.from_llm(
+        llm=llm,
+        retriever=retriever,
+        condense_question_prompt=standalone_question_prompt,
+        return_source_documents=True,
+        combine_docs_chain_kwargs={"prompt":PROMPT},
+    )
+    return qa
+
+def run_chain(chain, prompt: str, history=None):
+    print(prompt)
+    return chain({"question": prompt, "chat_history": history or []})
+
+if __name__ == "__main__":
+    chat_history = []
+    qa = build_chain()
+    print(bcolors.OKBLUE + "Hello! How can I help you?" + bcolors.ENDC)
+    print(bcolors.OKCYAN + "Ask a question, start a New search: or CTRL-D to exit." + bcolors.ENDC)
+    print(">", end=" ", flush=True)
+    for query in sys.stdin:
+        if (query.strip().lower().startswith("new search:")):
+            query = query.strip().lower().replace("new search:","")
+            chat_history = []
+        elif (len(chat_history) == MAX_HISTORY_LENGTH):
+            chat_history.pop(0)
+        result = run_chain(qa, query, chat_history)
+        chat_history.append((query, result["answer"]))
+        print(bcolors.OKGREEN + result['answer'] + bcolors.ENDC)
+        if 'source_documents' in result:
+            print(bcolors.OKGREEN + 'Sources:')
+            for idx, ref in enumerate(result["source_documents"]):
+                print(ref.page_content)
+        print(bcolors.ENDC)
+        print(bcolors.OKCYAN + "Ask a question, start a New search: or CTRL-D to exit." + bcolors.ENDC)
+        print(">", end=" ", flush=True)
+    print(bcolors.OKBLUE + "Bye" + bcolors.ENDC)
diff --git a/tutorials/notebooks/GenAI/langchain_on_vertex.ipynb b/tutorials/notebooks/GenAI/langchain_on_vertex.ipynb
new file mode 100644
index 0000000..ebfec60
--- /dev/null
+++ b/tutorials/notebooks/GenAI/langchain_on_vertex.ipynb
@@ -0,0 +1,583 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "b50f6f80-73aa-4a31-9eb5-ac30d5958fe7",
+ "metadata": {},
+ "source": [
+ "## Background\n",
+ "This tutorial is designed to give you the basics of using LangChain to work with Large Language Models (LLMs) for document summarization and basic chat bot functionality. You could build on what we have here to create a front-end application with something like Streamlit, or iterate on it in other ways."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5ebc47a9-958b-4250-a3bf-688e627f2c6a",
+ "metadata": {},
+ "source": [
+ "**Increase Max Tokens**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d2f0d198-f1cf-40ec-b813-4b1e8d50ab80",
+ "metadata": {},
+ "source": [
+ "### Install packages"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8662e8f8-66ce-4ca6-a121-d087c499390f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip install google-cloud-aiplatform==1.34.0 langchain==0.0.310 pypdf faiss-cpu --user"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cd534fb6-29c8-4c15-b9cc-88f667ec8127",
+ "metadata": {},
+ "source": [
+ "### Import libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "27e6851a-f15d-4881-8173-9b788a009201",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import langchain\n",
+ "from langchain.llms import VertexAI\n",
+ "from langchain.vectorstores import FAISS\n",
+ "from langchain.prompts import PromptTemplate\n",
+ "from langchain.schema import StrOutputParser\n",
+ "from langchain.document_loaders import PyPDFLoader\n",
+ "from langchain.embeddings import VertexAIEmbeddings\n",
+ "from langchain.document_loaders import WebBaseLoader\n",
+ "from langchain.chains.summarize import load_summarize_chain\n",
+ "from langchain.schema.prompt_template import format_document\n",
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "73b34ed1-4ad0-4ab6-8e9f-148c6ef3f575",
+ "metadata": {},
+ "source": [
+ "## Summarize a scientific article using an LLM"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "46d1b6cc-862e-4a67-a755-fbc4f7595c6f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loader = WebBaseLoader(\"https://pubmed.ncbi.nlm.nih.gov/37883540/\")\n",
+ "docs = loader.load()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e34bd138-d852-40ba-87bd-ee559483aa20",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "llm = VertexAI()\n",
+ "print('the LLM and default params are : ', llm)\n",
+ "\n",
+ "chain = load_summarize_chain(llm, chain_type=\"stuff\")\n",
+ "\n",
+ "print('\\n''the LLM chain used is ''\\n', chain)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dee2c20d-7678-4f6d-81c7-0b2a2b62d055",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print('the summary of the document in a single paragraph is: ')\n",
+ "\n",
+ "chain.run(docs)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b1fc982-3d07-4501-8a54-6957100ebaff",
+ "metadata": {},
+ "source": [
+ "**Now try using [a different LLM](https://python.langchain.com/docs/integrations/llms/) and see if you can get the code to run!**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3883dab5-cabd-4eea-bddc-c4c14b2bf5dc",
+ "metadata": {},
+ "source": [
+ "## Ask a general question to an LLM, without the context of a specific source"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0ad234c3-47c4-4aaf-a5b1-a3323555a8a5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "template = \"\"\"Question: {question}\n",
+ "\n",
+ "Answer: Let's think step by step.\"\"\"\n",
+ "\n",
+ "prompt = PromptTemplate.from_template(template)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "126cdbda-6446-4bbb-8018-f24fce5a7216",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "chain = prompt | llm"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7323a512-5826-4498-baa6-65dca1dc6a6f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "question = \"What evidence do we have for chimpanzees going through menopause?\"\n",
+ "\n",
+ "print(chain.invoke({\"question\": question}))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "813a49df-81f4-4afa-acd6-799e4cfba921",
+ "metadata": {},
+ "source": [
+ "## Build a simple Chat Bot to query specific content"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "31c47bde-210d-46d7-ab3a-43ee000d293e",
+ "metadata": {},
+ "source": [
+ "### Load your PDF file"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "8a518c3b-f20c-42df-956c-2f75081c1a6f",
+ "metadata": {},
+ "source": [
+ "Read more about document loaders from langchain [here](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf). Note that we are both loading and splitting our document. You can read more about the default document chunking/splitting procedures [here](https://python.langchain.com/docs/modules/data_connection/document_transformers/#get-started-with-text-splitters)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2c5bcbbb-8e24-424d-931d-c9b6c09fb888",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loader = PyPDFLoader(\"articles/science.add5473.pdf\")\n",
+ "pages = loader.load_and_split()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4a4a30a3-8a71-47ff-b264-83517e2b163a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# you could also load from the web url\n",
+ "# loader = WebBaseLoader(\"https://pubmed.ncbi.nlm.nih.gov/37883540/\")\n",
+ "# docs = loader.load()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "745f2d86-7e59-40f0-bfea-facd6fec226f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pages[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "878675c2-183c-445f-8829-b403ae3d2858",
+ "metadata": {},
+ "source": [
+ "### Create a vector store\n",
+ "One of the usual methods for organizing and searching through unstructured data is to convert it into embedded vectors, which are compact (numerical) representations. These vectors are stored, and when you want to find something similar, you turn your query into an embedded vector as well. A \"vector store\" then manages the stored data and helps you find the most similar vectors to your query. Read more about vector stores in langchain [here](https://python.langchain.com/docs/modules/data_connection/vectorstores/). Here we are going to use a very meta technique using the Facebook AI Similarity Search (FAISS) library. You can explore the various vector store options [here](https://python.langchain.com/docs/integrations/vectorstores/). We use embeddings to narrow down the total information we feed to the LLM downstream. As token limits go up, we will eventually be able to feed a whole document to the LLM, but for now you will usually need this method to downsample. If your document is small enough, just pass it directly to the LLM. Embeddings are also useful when you want to query over many documents (thousands). "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8144b302-d8a5-4c12-9e8a-8bff530c7006",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# index the document using FAISS\n",
+ "faiss_index = FAISS.from_documents(pages, VertexAIEmbeddings())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9d88830a-0681-4f0b-9ea1-b0a112d27091",
+ "metadata": {},
+ "source": [
+ "Define the user query, which will also be converted to embeddings"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f394026e-f70e-4528-9f72-b35b87f1af44",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = 'What evidence is there that chimpanzees go through menopause'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5d8aab69-2042-4b75-8f5d-f06449daf063",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "docs = faiss_index.similarity_search(query, k=5)\n",
+ "docs[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aecee51f-2305-4906-bba4-a362ee1c742d",
+ "metadata": {},
+ "source": [
+ "We now have the five chunks of the article that are most similar to our query. Next we pass these chunks to our LLM to generate a single summary. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a6bc812d-bb3d-4d2a-97fc-344b7c120c4e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doc_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
+ "\n",
+ "chain = (\n",
+ " {\n",
+ " \"content\": lambda docs: \"\\n\\n\".join(\n",
+ " format_document(doc, doc_prompt) for doc in docs\n",
+ " )\n",
+ " }\n",
+ " | PromptTemplate.from_template(\"Summarize the following content in around 200 words:\\n\\n{content}\")\n",
+ " | VertexAI()\n",
+ " | StrOutputParser()\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "79b92df8-33db-45e6-aa67-85e2d8f18f54",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(chain.invoke(docs))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "412cff1c-f70d-4cc6-95ea-047f882de6ec",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doc_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
+ "\n",
+ "chain = (\n",
+ " {\n",
+ " \"content\": lambda docs: \"\\n\\n\".join(\n",
+ " format_document(doc, doc_prompt) for doc in docs\n",
+ " )\n",
+ " }\n",
+ " | PromptTemplate.from_template(\"Summarize the following content:\\n\\n{content}\")\n",
+ " | VertexAI()\n",
+ " | StrOutputParser()\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e7596948-c752-42f5-a4f7-6d96bf93e20b",
+ "metadata": {},
+ "source": [
+ "Here are a few example prompts. Try running them in the template and chain below."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e187e188-07f1-4d67-a958-8b7080d725e6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt_str = \"Instructions: You need to summarize text from several documents. \\\n",
+ " Be professional, factual, and succinct in the response. \\\n",
+ " Your answer is ONLY based on information in the documents below. \\\n",
+ " If you can not answer the question, answer \\\n",
+ " I am sorry, I am unable to answer the question based on the information provided \\\n",
+ " ONLY use information that is based on the documents. \\\n",
+ " \\\n",
+ " Document number: \\\n",
+ " Documents: {content}\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "193f50a7-f5bb-4384-8326-1074750bcb70",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt_str = \"Instructions: You are about to receive text from several documents. \\\n",
+ " Based on the documents, give me five ideas for follow up studies that could be conducted. \\\n",
+ " Be professional, factual, and succinct in your response. \\\n",
+ " Your answer is ONLY based on information in the documents below. \\\n",
+ " If you can not answer the question, answer \\\n",
+ " I am sorry, I am unable to answer the question based on the information provided \\\n",
+ " ONLY use information that is based on the documents. \\\n",
+ " \\\n",
+ " Documents: {content}\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7d1b8228-5af6-400e-b2aa-4448d3334241",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "prompt_str = \"Instructions: You are about to receive text from several documents. \\\n",
+ " Based on the documents, describe to me what materials would be needed to recreate the study in question. \\\n",
+ " Be professional, factual, and succinct in your response. \\\n",
+ " Your answer is ONLY based on information in the documents below. \\\n",
+ " If you can not answer the question, answer \\\n",
+ " I am sorry, I am unable to answer the question based on the information provided \\\n",
+ " ONLY use information that is based on the documents. \\\n",
+ " \\\n",
+ " Documents: {content}\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8f2d6717-d9e2-4365-9607-5b17afd65731",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "doc_prompt = PromptTemplate.from_template(\"{page_content}\")\n",
+ "\n",
+ "chain = (\n",
+ " {\n",
+ " \"content\": lambda docs: \"\\n\\n\".join(\n",
+ " format_document(doc, doc_prompt) for doc in docs\n",
+ " )\n",
+ " }\n",
+ " | PromptTemplate.from_template(prompt_str) \n",
+ " | VertexAI()\n",
+ " | StrOutputParser()\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fde1f59f-4ca6-4ac1-a9d2-6cf9265f7e40",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(chain.invoke(docs))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6e8637cd-36d2-4946-b5f2-051e57a284ea",
+ "metadata": {},
+ "source": [
+ "### Deploy a local Model\n",
+ "If you want to avoid sending data over the internet, you can deploy a model to an endpoint following [these instructions](https://cloud.google.com/vertex-ai/docs/general/deployment)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8f114a47-90db-403f-8b32-0f40dc220877",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#model garden\n",
+ "#https://cloud.google.com/vertex-ai/docs/general/deployment#what_happens_when_you_deploy_a_model\n",
+ "from langchain.llms import VertexAIModelGarden"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2961cf2b-71d9-4ecf-a50b-8a748ba8291c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "llm = VertexAIModelGarden(\n",
+ " project=\"YOUR PROJECT ID\",\n",
+ " endpoint_id=\"YOUR ENDPOINT ID\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "05e6c4c7-e824-417a-a41a-ddd37cb4e393",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(llm(\"What are the greatest questions left to answer in biomedical research?\"))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "58dd7731-fc36-42e9-b387-5354f7b133b9",
+ "metadata": {},
+ "source": [
+ "You can repeat any of the methods shown above, but using the locally deployed LLM."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1c9acefe-3fa7-4408-a0de-94c370d0560b",
+ "metadata": {},
+ "source": [
+ "## Generate Code"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0db1c313-b0e0-4ff2-8496-aaa9ba8e8891",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "llm = VertexAI(model_name=\"code-bison\", max_output_tokens=1000, temperature=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7c527588-a71c-4719-8856-068b2bc3e7ef",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "question = \"Write a python function that checks if a string is a valid email address\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e591a4d2-dabf-4d2c-a2a6-cd97b882a758",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(llm(question))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3541e5ad-faa4-423c-ba09-ce49a7e10f7e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "question = \"Write a Nextflow module from nf-core to run bwa\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e33ba6a9-ff54-4f58-895e-cc7b8ae9b983",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(llm(question))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "682802fa-1de6-479a-a219-e6b784e74a5c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "question = \"Write a Snakemake module from nf-core to run bwa\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "07198a26-2595-44c7-897a-7b7b9dfcd8d3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(llm(question))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8bc7c91c-0403-4568-9e6b-1d1767e905d3",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python (Local)",
+ "language": "python",
+ "name": "local-base"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/GoogleBatch/.gitkeep b/tutorials/notebooks/GoogleBatch/.gitkeep
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/tutorials/notebooks/GoogleBatch/.gitkeep
@@ -0,0 +1 @@
+
diff --git a/tutorials/notebooks/GoogleBatch/nextflow/Part1_GBatch_Nextflow.ipynb b/tutorials/notebooks/GoogleBatch/nextflow/Part1_GBatch_Nextflow.ipynb
new file mode 100644
index 0000000..a2c1d39
--- /dev/null
+++ b/tutorials/notebooks/GoogleBatch/nextflow/Part1_GBatch_Nextflow.ipynb
@@ -0,0 +1,686 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d5cdaee0-cc9d-430a-8d95-6af15b2534a8",
+ "metadata": {},
+ "source": [
+ "# Use Nextflow to run workflows using Google Batch: Part I\n",
+ "\n",
+ "\n",
+ "__What is Google Batch?__ \n",
+ "Batch allows you to schedule, queue, and execute batch processing workloads on VM instances. Batch provisions resources and manages capacity on your behalf, allowing your batch workloads to run at scale. \n",
+ "\n",
+ "__How does Batch differ from Cloud Life Sciences?__ \n",
+ "You don't need to configure and manage third-party job schedulers, provision and deprovision resources, or request resources one zone at a time. To run a job, you specify parameters for the resources required for your workload, then Batch obtains resources and queues the job for execution. Batch provides native integration with other Google Cloud services to aid in the scheduling, execution, storage, and analysis of batch jobs.\n",
+ "\n",
+ " Warning: Google Life Sciences API is deprecated and will no longer be available on GCP by July 8, 2025.\n",
+ "\n",
+ "Here we are going to walk through submitting simple jobs directly to Google Batch, then dive into interacting with Google Batch using Nextflow. We will run some basic Hello World jobs, then move to a more complex [nf-core Methylseq workflow](https://nf-co.re/methylseq). "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0f8f4b85-9459-497d-97ec-5909e8aeacae",
+ "metadata": {
+ "id": "0f8f4b85-9459-497d-97ec-5909e8aeacae",
+ "tags": []
+ },
+ "source": [
+ "## 1. Set up your environment"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2e4a5ca-8a2b-4156-b83e-c89f0c1ffc9c",
+ "metadata": {
+ "id": "f2e4a5ca-8a2b-4156-b83e-c89f0c1ffc9c"
+ },
+ "source": [
+ "### Create a bucket"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "73c79eb1-6010-4d8a-8725-b92144bab944",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#make sure you change this name, it needs to be globally unique\n",
+ "%env BUCKET=gbatch-nextflow"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "44d17e57-86e8-4fce-83fe-3c33c7db9dc8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#will only create the bucket if it doesn't yet exist\n",
+ "! gsutil ls gs://$BUCKET > /dev/null 2>&1 || gsutil mb gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "553761fd-4ce3-4dda-8319-a10cb9cd5314",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#enable versioning on the bucket so overwritten or deleted files are kept as noncurrent versions\n",
+ "! gsutil versioning set on gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f5d588a5-83b2-42ef-a65f-64b2c80bca3f",
+ "metadata": {},
+ "source": [
+ "### Install dependencies"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2acefde5-3f8a-42cb-aa12-46396eaae644",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#First install java\n",
+ "!sudo apt update\n",
+ "!sudo apt-get install default-jdk -y\n",
+ "!java -version"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3d8538e0-49a3-4e61-abf3-a08e1b397fcf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
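+ "#Specify the Nextflow version and platform\n",
+ "#Note: each ! line runs in its own subshell, so these exports do not carry over to later lines\n",
+ "! export NXF_VER=21.10.0\n",
+ "! export NXF_MODE=google\n",
+ "#Install Nextflow, make it executable, and update it\n",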
+ "! curl https://get.nextflow.io | bash\n",
+ "! chmod +x nextflow\n",
+ "! ./nextflow self-update"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "07d1891a-8338-4592-a3a0-eaab55cd8df0",
+ "metadata": {},
+ "source": [
+ "### Ensure you have APIs enabled and IAM permissions\n",
+ "Make sure that Batch, Compute Engine, and Cloud Storage APIs are all enabled.\n",
+ "\n",
+ "You also want to make sure your Compute Engine Default Service Account has the following Roles:\n",
+ "\n",
+ "- Service Account User\n",
+ "- Batch Agent Reporter \n",
+ "- Storage Admin\n",
+ "- Storage Object Admin\n",
+ "- Batch Job Editor \n",
+ "\n",
+ "Your Service Account should already have these roles assigned, but if not, reach out to Support to have your account updated."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a73b5bf4-3e68-44c2-9874-02c637e730bf",
+ "metadata": {},
+ "source": [
+ "## 2. Submit Hello World to Batch Directly"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "33f6045f-3336-46ae-917c-6528b4c0c0db",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "#### 2.1 Submitting a job through the command line"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "10c12fe3-0635-4e38-8153-51b60ff287ef",
+ "metadata": {},
+ "source": [
+ "To submit a Batch job through the command line, you first need to create a __JSON__ config file. You can use the hello world template below for your Batch job. We will name the file hello-world.json.\n",
+ "\n",
+ "```\n",
+ "{\n",
+ " \"taskGroups\": [\n",
+ " {\n",
+ " \"taskSpec\": {\n",
+ " \"runnables\": [\n",
+ " {\n",
+ " \"container\": {\n",
+ " \"imageUri\": \"gcr.io/google-containers/busybox\",\n",
+ " \"entrypoint\": \"/bin/sh\",\n",
+ " \"commands\": [\n",
+ " \"-c\",\n",
+ " \"echo Hello world! This is task ${BATCH_TASK_INDEX}. This job has a total of ${BATCH_TASK_COUNT} tasks >> /mnt/disks/gbatch-nextflow/hello-world.txt\"\n",
+ " ]\n",
+ " }\n",
+ " }\n",
+ " ],\n",
+ " \"volumes\": [\n",
+ " {\n",
+ " \"gcs\": {\n",
+ " \"remotePath\": \"gbatch-nextflow\"\n",
+ " },\n",
+ " \"mountPath\": \"/mnt/disks/gbatch-nextflow\"\n",
+ " }\n",
+ " ],\n",
+ " \n",
+ " \"computeResource\": {\n",
+ " \"cpuMilli\": 2000,\n",
+ " \"memoryMib\": 16\n",
+ " },\n",
+ " \"maxRetryCount\": 2,\n",
+ " \"maxRunDuration\": \"3600s\"\n",
+ " },\n",
+ " \"taskCount\": 4,\n",
+ " \"parallelism\": 2\n",
+ " }\n",
+ " ],\n",
+ " \"allocationPolicy\": {\n",
+ " \"instances\": [\n",
+ " {\n",
+ " \"policy\": { \"machineType\": \"e2-standard-4\" }\n",
+ " }\n",
+ " ]\n",
+ " },\n",
+ " \"labels\": {\n",
+ " \"department\": \"finance\",\n",
+ " \"env\": \"testing\"\n",
+ " },\n",
+ " \"logsPolicy\": {\n",
+ " \"destination\": \"CLOUD_LOGGING\"\n",
+ " }\n",
+ "}\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "be843007-486a-433d-bdaf-91aa2168c03d",
+ "metadata": {},
+ "source": [
+ "Let's break down the script:\n",
+ "- Our image and commands are specified in the block labeled 'container': imageUri is the busybox image, and our commands echo Hello World.\n",
+ "    - You will notice that the command appends its output to a path under the bucket's mount point, so our output file hello-world.txt is stored in our bucket __(do not forget to change the mount path to your bucket name)__\n",
+ "    - The command also uses two variables, BATCH_TASK_INDEX and BATCH_TASK_COUNT. Batch sets these for every task, so they do not need to be defined beforehand; they show which task is currently running and how many tasks the job has in total.\n",
+ "- The 'volumes' block is where we specify our Google bucket and the path used to mount our bucket in the container. __(do not forget to change the mountPath and remotePath to your bucket name)__ \n",
+ "- 'computeResource' is where we define the CPU and memory for each task. 'maxRunDuration' sets how long a task may run, 'taskCount' sets how many tasks the job has, and 'parallelism' sets how many of those tasks run at the same time.\n",
+ "- Under 'instances' in our script is where we can specify our machine type.\n"
+ ]
+ },
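+ {
+ "cell_type": "markdown",
+ "id": "b3a1f0c2-6d47-4e1b-9a55-2f8c1d7e4a01",
+ "metadata": {},
+ "source": [
+ "If you would rather not edit the bucket name by hand, the template can also be generated from Python. The cell below is only a sketch under a few assumptions: `build_hello_world_job` is a hypothetical helper (not part of any Google library), it mirrors the fields of the template above, and it writes hello-world.json to the current directory, so point `--config` at wherever the file actually lands.\n",
+ "\n",
+ "```python\n",
+ "import json\n",
+ "\n",
+ "\n",
+ "def build_hello_world_job(bucket, task_count=4, parallelism=2):\n",
+ "    # Hypothetical helper: mirrors the hello-world.json template for a given bucket.\n",
+ "    mount_path = '/mnt/disks/' + bucket\n",
+ "    # Batch expands the BATCH_TASK_* variables at run time, so keep them literal here.\n",
+ "    command = (\n",
+ "        'echo Hello world! This is task ${BATCH_TASK_INDEX}. '\n",
+ "        'This job has a total of ${BATCH_TASK_COUNT} tasks '\n",
+ "        '>> ' + mount_path + '/hello-world.txt'\n",
+ "    )\n",
+ "    task_spec = {\n",
+ "        'runnables': [{\n",
+ "            'container': {\n",
+ "                'imageUri': 'gcr.io/google-containers/busybox',\n",
+ "                'entrypoint': '/bin/sh',\n",
+ "                'commands': ['-c', command],\n",
+ "            }\n",
+ "        }],\n",
+ "        'volumes': [{'gcs': {'remotePath': bucket}, 'mountPath': mount_path}],\n",
+ "        'computeResource': {'cpuMilli': 2000, 'memoryMib': 16},\n",
+ "        'maxRetryCount': 2,\n",
+ "        'maxRunDuration': '3600s',\n",
+ "    }\n",
+ "    return {\n",
+ "        'taskGroups': [{\n",
+ "            'taskSpec': task_spec,\n",
+ "            'taskCount': task_count,\n",
+ "            'parallelism': parallelism,\n",
+ "        }],\n",
+ "        'allocationPolicy': {\n",
+ "            'instances': [{'policy': {'machineType': 'e2-standard-4'}}]\n",
+ "        },\n",
+ "        'labels': {'department': 'finance', 'env': 'testing'},\n",
+ "        'logsPolicy': {'destination': 'CLOUD_LOGGING'},\n",
+ "    }\n",
+ "\n",
+ "\n",
+ "# Replace the bucket name with your own before writing the file.\n",
+ "job = build_hello_world_job('gbatch-nextflow')\n",
+ "with open('hello-world.json', 'w') as handle:\n",
+ "    json.dump(job, handle, indent=2)\n",
+ "```\n",
+ "\n",
+ "Building the config as a dictionary keeps remotePath, mountPath, and the output path in sync with a single bucket argument."
+ ]
+ },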
+ {
+ "cell_type": "markdown",
+ "id": "67b76fb0-1b4f-4b43-a27f-6046c3052858",
+ "metadata": {},
+ "source": [
+ "Now we can submit our job, specifying the name of the job (hello-world), the location (us-central1), and the path to our JSON file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1bec6a71-f279-41c1-8965-882612a4095c",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gcloud batch jobs submit hello-world \\\n",
+ " --location us-central1 \\\n",
+ " --config ~/hello-world.json"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1b8f7dfd-5a6a-4d04-8325-88e4210cb2c3",
+ "metadata": {
+ "id": "3cb5bd4b-032a-47f0-bee4-299a547c3b48",
+ "outputId": "b0e740fa-dabc-4d45-c95b-15f72d32bffa",
+ "tags": []
+ },
+ "source": [
+ "#### 2.2 Submitting a job through the console"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9cfdaa37-a0d5-44b3-ad4d-dffcd45801f4",
+ "metadata": {},
+ "source": [
+ "Running a batch job through the console allows for a user-friendly view to input data and scripts and view the status of the jobs you created.\n",
+ "\n",
+ "Start by searching for __'Batch'__ in the console search bar. You should see a screen similar to this: \n",
+ " \n",
+ "\n",
+ "Near the upper left corner click "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "82c93eae-5424-4d77-9757-9d6cb342986c",
+ "metadata": {},
+ "source": [
+ "The following should appear on the screen:\n",
+ " \n",
+ " \n",
+ " \n",
+ " This is where you can:\n",
+ " - Label your job\n",
+ " - Select a region and zone to execute your job\n",
+ " - Select your machine type (e.g. e2-medium)\n",
+ " - Specify tasks by adding a script and/or specifying a container to run the task in\n",
+ " - Allocate resources for each task\n",
+ " - Add a storage volume"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c812865a-6ca1-4900-b93f-1239249d952d",
+ "metadata": {},
+ "source": [
+ "Once you have entered the settings for your batch job you can even view the full script that you would submit through the command line by clicking __'EQUIVALENT COMMAND LINE'__ next to __'CREATE'__. Delete the script that is already there and paste the script we had above.\n",
+ "\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a8d633d-88d8-4db5-b834-1ce06e8cd91d",
+ "metadata": {},
+ "source": [
+ "Run your job by clicking __'CREATE'__. \n",
+ "Once the job is listed, click the job name to view more information about the job's settings, the resources applied, and its logs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4892a16-f4d9-4db9-a171-6e9245df2a72",
+ "metadata": {
+ "id": "f4892a16-f4d9-4db9-a171-6e9245df2a72",
+ "tags": []
+ },
+ "source": [
+ "### Check job status\n",
+ "\n",
+ "You can view the status of your job by looking at the __'Job List'__ in the Google Console. Here you will see your job name, status, region, memory per task, machine type, date started and run time.\n",
+ "\n",
+ " \n",
+ " \n",
+ " \n",
+ "To check the job status via the command line, enter the following, changing the job name and location."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "411b0153-aed6-4694-b54d-af6e80db5726",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gcloud batch jobs describe hello-world \\\n",
+ " --location=LOCATION"
+ ]
+ },
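+ {
+ "cell_type": "markdown",
+ "id": "c4d2e1a3-7b58-4f2c-8b66-3a9d2e8f5b02",
+ "metadata": {},
+ "source": [
+ "The same status check can be scripted, for example to wait for a job before reading its output. This is a minimal sketch, assuming that `gcloud` is on your PATH, that the JSON printed with `--format json` carries the job state under `status.state`, and that the helper names `job_state` and `describe_job` are our own and not part of the gcloud tooling.\n",
+ "\n",
+ "```python\n",
+ "import json\n",
+ "import subprocess\n",
+ "\n",
+ "\n",
+ "def job_state(describe_json):\n",
+ "    # Assumption: the state lives under status.state in the describe output.\n",
+ "    job = json.loads(describe_json)\n",
+ "    return job.get('status', {}).get('state', 'UNKNOWN')\n",
+ "\n",
+ "\n",
+ "def describe_job(name, location):\n",
+ "    # Runs the same describe command as the cell above, asking for JSON output.\n",
+ "    result = subprocess.run(\n",
+ "        ['gcloud', 'batch', 'jobs', 'describe', name,\n",
+ "         '--location', location, '--format', 'json'],\n",
+ "        capture_output=True, text=True, check=True,\n",
+ "    )\n",
+ "    return job_state(result.stdout)\n",
+ "\n",
+ "\n",
+ "# Offline illustration with a made-up payload; no call to gcloud is made here.\n",
+ "sample = json.dumps({'status': {'state': 'SUCCEEDED'}})\n",
+ "print(job_state(sample))\n",
+ "```\n",
+ "\n",
+ "To query a real job, call `describe_job('hello-world', 'us-central1')` with your own job name and location."
+ ]
+ },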
+ {
+ "cell_type": "markdown",
+ "id": "9f056585-6c10-41b6-b7b6-0c75bebed811",
+ "metadata": {
+ "id": "9f056585-6c10-41b6-b7b6-0c75bebed811"
+ },
+ "source": [
+ "### View your output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a86e2e14-8efe-4a36-8a5a-9d43407653c1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil ls gs://$BUCKET/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "02faf944-0143-49c7-bf4c-6b8e377fcd81",
+ "metadata": {
+ "id": "02faf944-0143-49c7-bf4c-6b8e377fcd81",
+ "outputId": "251ad4db-dcea-4d72-ff9a-01b3080acc8e"
+ },
+ "outputs": [],
+ "source": [
+ "! gsutil cp gs://$BUCKET/hello-world.txt .\n",
+ "! cat hello-world.txt"
+ ]
+ },
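+ {
+ "cell_type": "markdown",
+ "id": "d5e3f2b4-8c69-4a3d-9c77-4b0e3f9a6c03",
+ "metadata": {},
+ "source": [
+ "To confirm which tasks wrote to the output file, you can parse the lines produced by the echo command in hello-world.json. The sketch below uses a hypothetical helper, `tasks_reported`, and assumes that task indices start at 0 and that each line follows the wording of that echo command.\n",
+ "\n",
+ "```python\n",
+ "import re\n",
+ "\n",
+ "\n",
+ "def tasks_reported(lines):\n",
+ "    # Returns (task indices seen, task indices missing) for an iterable of lines.\n",
+ "    pattern = re.compile('This is task ([0-9]+)[.] This job has a total of ([0-9]+) tasks')\n",
+ "    seen = set()\n",
+ "    total = None\n",
+ "    for line in lines:\n",
+ "        match = pattern.search(line)\n",
+ "        if match:\n",
+ "            seen.add(int(match.group(1)))\n",
+ "            total = int(match.group(2))\n",
+ "    if total is None:\n",
+ "        return [], []\n",
+ "    missing = set(range(total)) - seen\n",
+ "    return sorted(seen), sorted(missing)\n",
+ "\n",
+ "\n",
+ "# Made-up example lines in the same format as the job output.\n",
+ "example = [\n",
+ "    'Hello world! This is task 0. This job has a total of 4 tasks',\n",
+ "    'Hello world! This is task 2. This job has a total of 4 tasks',\n",
+ "]\n",
+ "print(tasks_reported(example))\n",
+ "```\n",
+ "\n",
+ "For the real output, open the downloaded file and pass the file handle to `tasks_reported`."
+ ]
+ },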
+ {
+ "cell_type": "markdown",
+ "id": "33a142e0-bd9a-405d-91f9-827503ff5fb1",
+ "metadata": {
+ "id": "33a142e0-bd9a-405d-91f9-827503ff5fb1"
+ },
+ "source": [
+ "## 3. Run Nextflow Locally"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2457d31d-d8b7-42f1-a0be-0d88c95d4fc3",
+ "metadata": {},
+ "source": [
+ "### Nextflow 101"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b709c718-96d0-4925-99dd-525a7e7b6c76",
+ "metadata": {
+ "id": "b709c718-96d0-4925-99dd-525a7e7b6c76"
+ },
+ "source": [
+ "Nextflow relies on several files to build a working workflow:\n",
+ "\n",
+ "- __Main file__: The main file is a .nf file that holds the processes, the channels, and the workflow block. Each process describes its inputs, outputs, and the shell script of commands to run; the workflow block acts like a recipe that connects the processes. For Snakemake users, a process is roughly equivalent to a 'rule'.\n",
+ "    - __Process__: Contains the input and output declarations and a script, such as bash commands, that can be executed on a Linux server.\n",
+ "    - __Channel__: Channels are how processes communicate with each other. For example, input and output channels point a process to where data is or should be located.\n",
+ "- __Config file__: The .config file contains parameters and one or more profiles. Each profile can set a different executor or software environment (e.g. LS API, conda, docker, etc.), memory or machine type, output directory, working directory and more!\n",
+ "- __Docker file__: Contains the dependencies and environments needed for the Nextflow workflow to run.\n",
+ "- __Schema file__: Schema files are optional, structured JSON files that contain information about the usage and parameters of your workflow. You might have seen this information when you ran a command with the '--help' flag.\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9bea3004-ff40-4918-ac16-83aad9427ad7",
+ "metadata": {
+ "id": "9bea3004-ff40-4918-ac16-83aad9427ad7"
+ },
+ "source": [
+ "### Run a nextflow 'Hello World' process locally"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4715ef92-e3a6-44cf-9b1e-50f247dd0daf",
+ "metadata": {
+ "id": "4715ef92-e3a6-44cf-9b1e-50f247dd0daf"
+ },
+ "source": [
+ "We are going to first run Hello World locally using the script file called hello.nf. \n",
+ "\n",
+ "It should look like this:\n",
+ "\n",
+ "```\n",
+ "#!/usr/bin/env nextflow\n",
+ "nextflow.enable.dsl=2 \n",
+ "\n",
+ "params.str = 'Hello World'\n",
+ "\n",
+ "process sayHello {\n",
+ " input:\n",
+ " val str\n",
+ "\n",
+ " output:\n",
+ " stdout\n",
+ "\n",
+ " \"\"\"\n",
+ " echo $str > hello.txt\n",
+ " cat hello.txt\n",
+ " \"\"\"\n",
+ "}\n",
+ "workflow {\n",
+ " sayHello(params.str) | view\n",
+ "}\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6efad386-185b-4faf-be39-6c5a3f84ffe4",
+ "metadata": {
+ "id": "6efad386-185b-4faf-be39-6c5a3f84ffe4",
+ "outputId": "9554903e-f8d5-43fa-ffe9-f00ce836bf2d"
+ },
+ "outputs": [],
+ "source": [
+ "! ./nextflow run hello.nf --str 'Hello!'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7619875d-7f10-4699-b4d2-120d5d7d4cd7",
+ "metadata": {
+ "id": "7619875d-7f10-4699-b4d2-120d5d7d4cd7",
+ "tags": []
+ },
+ "source": [
+ "## 4. Submit a Nextflow Job to Google Batch\n",
+ "Create or modify your config file to include a 'gbatch' profile block that tells Nextflow to submit the job to Google Batch instead of running locally."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec7abe9b-dca1-4ef6-87d6-39fcdd2e3c9b",
+ "metadata": {
+ "id": "ec7abe9b-dca1-4ef6-87d6-39fcdd2e3c9b"
+ },
+ "source": [
+ "The config file allows Nextflow to use executors like Google Batch. In this tutorial the config file is named __'nextflow.config'__. Make sure you open this file and update the placeholder values that are account specific.\n",
+ "- Make sure that your region is one supported by Google Batch!\n",
+ "- Specify your working directory bucket and output directory bucket\n",
+ "- Specify the machine type you would like to use, ensuring that there is enough memory and CPUs for the workflow\n",
+ "    - Otherwise Google Batch will automatically use 1 CPU\n",
+ "\n",
+ "```\n",
+ "profiles{\n",
+ " gbatch{\n",
+ " process.executor = 'google-batch'\n",
+ " workDir = 'gs:///methyl-seq'\n",
+ " google.location = 'us-central1'\n",
+ " google.region = 'us-central1'\n",
+ " google.project = ''\n",
+ " params.outdir = 'gs://methyl-seq/outdir'\n",
+ " process.machineType = 'c2-standard-30'\n",
+ " }\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "__Note:__ Make sure your working directory and output directory are different! The workflow writes temporary files to the working directory within your bucket, and these take up space, so once your pipeline has completed successfully feel free to delete them."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "340f7300-449a-4a12-bbc5-073547d58cac",
+ "metadata": {
+ "id": "340f7300-449a-4a12-bbc5-073547d58cac",
+ "tags": []
+ },
+ "source": [
+ "### Optional: Listing nf-core tools with docker and viewing their commands\n",
+ "Using the command below you can see all the tools that nf-core provides and their latest releases."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ca1ff164-cee2-446e-ab2e-a3ed984e0dc0",
+ "metadata": {
+ "id": "ca1ff164-cee2-446e-ab2e-a3ed984e0dc0",
+ "outputId": "0530644a-dd9a-4077-dbc8-d1e335788a01",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! docker run nfcore/tools list"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9e46373c-61d0-4c91-b001-e55568d9fa2d",
+ "metadata": {
+ "id": "9e46373c-61d0-4c91-b001-e55568d9fa2d"
+ },
+ "source": [
+ "You can view commands for methylseq (or any other specified nf-core tool) by using the [--help] flag"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "05ea2893-60b3-4934-ae86-b07d4bc59728",
+ "metadata": {
+ "id": "05ea2893-60b3-4934-ae86-b07d4bc59728",
+ "outputId": "1e6de26f-0433-4bbd-8a43-119097bb1f41",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! ./nextflow run nf-core/methylseq -r 1.6.1 --help"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b4dbef59-d619-4444-8870-18c1f0ba3b5c",
+ "metadata": {
+ "id": "b4dbef59-d619-4444-8870-18c1f0ba3b5c",
+ "tags": []
+ },
+ "source": [
+ "### Run Methylseq with the test profile"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7238bd3e-1853-42c3-9d2d-c72e46975ff2",
+ "metadata": {
+ "id": "7238bd3e-1853-42c3-9d2d-c72e46975ff2"
+ },
+ "source": [
+ "The 'test' profile uses a small dataset allowing you to ensure the workflow works with your config file without long runtimes. Ensure you include:\n",
+ "- Version of the nf-core tool [-r]\n",
+ "- Location of the config file [-c]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4b21f170-37fa-4fbc-ab83-3f6b4d386ef9",
+ "metadata": {
+ "id": "4b21f170-37fa-4fbc-ab83-3f6b4d386ef9",
+ "outputId": "0507c847-7f83-40af-ebf0-a1fdef27499b",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! ./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gbatch -c nextflow-methyseq.config"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e386ccb3-aa6d-4a77-8d7d-c20ed0419f84",
+ "metadata": {
+ "id": "e386ccb3-aa6d-4a77-8d7d-c20ed0419f84"
+ },
+ "source": [
+ "In the output above, the text inside the __[ ]__ to the left of each process is a __tag__ that you can search for in Google Batch, and the part before the __/__ corresponds to one of the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed.\n",
+ "\n",
+ "Congrats! You are done with Part I. If you want to keep going and learn how to use the Methylseq workflow with real data, then move to Part II. If not, then feel free to clean up your resources. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "26d5ad01-cc27-4407-8434-b88d324b9e2c",
+ "metadata": {},
+ "source": [
+ "## 5. Troubleshooting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c79edcad-7178-4219-b097-e9b21b98bcc7",
+ "metadata": {},
+ "source": [
+ "Some of the nf-core tools require extra parameters:\n",
+ "- If you receive a __'quota exceeded'__ error, you can increase the boot disk size in the gbatch profile within your config file using the __google.batch.bootDiskSize__ parameter (e.g., google.batch.bootDiskSize = 100.GB)\n",
+ "- If an error says that a tool could not be used or was not installed, or does not clearly explain why the process stopped, you can try increasing the process time in your profile using the __process.time__ parameter (e.g., process.time = '2h')\n",
+ "- If you receive an error like the one below, upgrading to Nextflow v23.04.0 or later should fix it\n",
+ "```\n",
+ "Caused by:\n",
+ " Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@49b3a025[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2e0ceb8c[Wrapped task = TrustedListenableFutureTask@25c1396d[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@2db57b9a]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@aa6214[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]\n",
+ "\n",
+ "```\n",
+ "- Adding the __-log__ parameter on the command line produces a log file that will help troubleshoot other errors, like so: \n",
+ "`./nextflow -log DIRECTORY_NAME/nextflow.log run `"
+ ]
+ },
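+ {
+ "cell_type": "markdown",
+ "id": "e6f4a3c5-9d70-4b4e-8d88-5c1f4a0b7d04",
+ "metadata": {},
+ "source": [
+ "Nextflow log files can be long, so it may help to pull out only the lines that report problems. The sketch below is an illustration, not a Nextflow feature: `problem_lines` is a hypothetical helper, and it assumes each log line includes its level (such as ERROR or WARN) as a separate word. The sample lines are made up.\n",
+ "\n",
+ "```python\n",
+ "def problem_lines(lines, levels=('ERROR', 'WARN')):\n",
+ "    # Keep the lines that contain one of the level names surrounded by spaces.\n",
+ "    found = []\n",
+ "    for line in lines:\n",
+ "        if any(' ' + level + ' ' in line for level in levels):\n",
+ "            found.append(line.rstrip())\n",
+ "    return found\n",
+ "\n",
+ "\n",
+ "# Made-up lines for illustration only.\n",
+ "sample_log = [\n",
+ "    'Jan-01 10:00:00.000 [main] DEBUG nextflow.cli.Launcher - starting',\n",
+ "    'Jan-01 10:05:00.000 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process',\n",
+ "]\n",
+ "for line in problem_lines(sample_log):\n",
+ "    print(line)\n",
+ "```\n",
+ "\n",
+ "For a real run, open the file you passed to -log and hand the file handle to `problem_lines`."
+ ]
+ },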
+ {
+ "cell_type": "markdown",
+ "id": "f7bf5cba-995d-4404-94d1-9bc9c4a04482",
+ "metadata": {},
+ "source": [
+ "## 6. Clean up\n",
+ "If you want to clean up all resources associated with this tutorial then \n",
+ "+ delete your bucket with `gsutil rm -r gs://$BUCKET`\n",
+ "+ delete this VM in either Vertex AI or Compute Engine"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c9345ecf-4545-4772-8815-1e7a595ac2ee",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "Workshop_2_updated.ipynb",
+ "provenance": []
+ },
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m93",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
+ },
+ "kernelspec": {
+ "display_name": "Python (Local)",
+ "language": "python",
+ "name": "local-base"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/GoogleBatch/nextflow/Part2_GBatch_Nextflow.ipynb b/tutorials/notebooks/GoogleBatch/nextflow/Part2_GBatch_Nextflow.ipynb
new file mode 100644
index 0000000..50768e1
--- /dev/null
+++ b/tutorials/notebooks/GoogleBatch/nextflow/Part2_GBatch_Nextflow.ipynb
@@ -0,0 +1,375 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "5384eeed-e730-46e9-9e4d-7df526fb44ee",
+ "metadata": {},
+ "source": [
+ "# Use Nextflow to run workflows using Google Batch: Part II\n",
+ "Here we are going to build on Part I to download some real data using the SRA toolkit and then submit an nf-core Methylseq job to Google Batch."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e30edde9-dcbf-4f74-9d2a-bd4d8c15c9a9",
+ "metadata": {},
+ "source": [
+ "## 1. Optional: Set up the environment\n",
+ "If you did not do Part I, then set up your environment. Otherwise, skip to the next section."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c71c4052-607e-4827-8126-759a6871558c",
+ "metadata": {},
+ "source": [
+ "### Create a bucket"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cf9e21b4-07d3-4e79-9c94-067b70e78ff6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#make sure you change this name, it needs to be globally unique\n",
+ "%env BUCKET=gbatch-api-nextflow"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e47789f2-e38d-46ec-8750-0a235b0c4337",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#will only create the bucket if it doesn't yet exist\n",
+ "! gsutil ls gs://$BUCKET > /dev/null 2>&1 || gsutil mb gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5a4ccc7d-b258-410a-85fb-f261fa0dcade",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#enable versioning on the bucket so overwritten or deleted files are kept as noncurrent versions\n",
+ "! gsutil versioning set on gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "00dc1fb0-784f-46c7-8905-817fa3fbccb1",
+ "metadata": {},
+ "source": [
+ "### Install mambaforge\n",
+ "You can also use the default installed conda, but mamba is so much faster! "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0b97b0e6-d44c-40e4-a32a-33708e4ed596",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "08e99d91-e356-456e-8cde-7f2819114ee2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "80eb36f1-62c1-46f0-b297-6d8bc1d2e034",
+ "metadata": {},
+ "source": [
+ "### Install other dependencies "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2619673e-9645-4460-aa07-323d01bcd9ba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#First install java\n",
+ "!sudo apt update\n",
+ "!sudo apt-get install default-jdk -y\n",
+ "!java -version"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c24cb2d9-6260-45fc-a4c0-f55688c43e11",
+ "metadata": {},
+ "outputs": [],
+ "source": [
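+ "#Specify the Nextflow version and platform\n",
+ "#Note: each ! line runs in its own subshell, so these exports do not carry over to later lines\n",
+ "! export NXF_VER=21.10.0\n",
+ "! export NXF_MODE=google\n",
+ "#Install Nextflow, make it executable, and update it\n",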
+ "! curl https://get.nextflow.io | bash\n",
+ "! chmod +x nextflow\n",
+ "! ./nextflow self-update"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "581f3b6a-b0b8-4309-a54b-ffff14450f41",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Install SRAtools to download data\n",
+ "! mamba install -c bioconda -c conda-forge sra-tools==2.11.0 -y"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1bc477a9-7b3f-431e-93b7-50e96809bfc5",
+ "metadata": {},
+ "source": [
+ "## 2. Download data with SRA tools\n",
+ "If you want to work more with SRA tools, check out our [SRA-focused notebook](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "facdbaea-d6ca-4270-9ddc-7c5d042b7373",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#set up directory structure\n",
+ "!mkdir -p data data/raw_fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0c5eb64-8628-4849-b077-6a3b23aaf934",
+ "metadata": {},
+ "source": [
+ "First bring in the compressed .sra file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b8ca1c0f-3844-42f8-a9e2-546178a6d961",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! prefetch -O data/raw_fastq -f yes SRR067701 --location GCP -v "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6ea6f3fa-1ce5-4c1a-b077-a5cb539e33b2",
+ "metadata": {},
+ "source": [
+ "Now convert the compressed .sra file to fastq. It will take about two minutes, so be patient. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6d7c1f2f-6d52-4f2f-b308-62d42c95ec1b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! fasterq-dump -f -e 8 -m 24G -O data/raw_fastq data/raw_fastq/SRR067701/SRR067701.sra"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "282c31da-0ce9-4e21-ab3e-07e550bc5ceb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#compress the fastq files\n",
+ "! gzip data/raw_fastq/SRR067701.fastq"
+ ]
+ },
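+ {
+ "cell_type": "markdown",
+ "id": "f7a5b4d6-0e81-4c5f-9e99-6d2a5b1c8e05",
+ "metadata": {},
+ "source": [
+ "Before submitting a long-running job, a quick sanity check on the compressed file can catch a truncated download. This is a minimal sketch with a hypothetical helper, `count_fastq_reads`, and it assumes the usual four lines per FASTQ record (no wrapped sequence lines).\n",
+ "\n",
+ "```python\n",
+ "import gzip\n",
+ "\n",
+ "\n",
+ "def count_fastq_reads(path):\n",
+ "    # Assumes four lines per record: header, sequence, separator, qualities.\n",
+ "    with gzip.open(path, 'rt') as handle:\n",
+ "        line_count = sum(1 for _ in handle)\n",
+ "    if line_count % 4 != 0:\n",
+ "        raise ValueError('line count is not a multiple of 4; the file may be truncated')\n",
+ "    return line_count // 4\n",
+ "```\n",
+ "\n",
+ "For example, `count_fastq_reads('data/raw_fastq/SRR067701.fastq.gz')` reads the whole file, so expect it to take a while on a large dataset."
+ ]
+ },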
+ {
+ "cell_type": "markdown",
+ "id": "805294b9-81cc-455c-9181-20fb048fb57c",
+ "metadata": {},
+ "source": [
+ "## 3. Run methylseq with Google Batch"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b7c45f87-e0b6-43c5-aa55-3ee3e33978a3",
+ "metadata": {},
+ "source": [
+ "Ensure you include the following in your command:\n",
+ "- nf-core tool version [-r]\n",
+ "- The fastq.gz input file [--input]\n",
+ "- Reference genome [--genome] (no need to have it on hand; nf-core uses iGenomes and will pull in the correct reference file)\n",
+ "- Config file location [-c]\n",
+ "- Desired profile [-profile]\n",
+ "- Other flags such as:\n",
+ "    - Whether the fastq file is single-end or not\n",
+ "    - The maximum CPUs and memory to use\n",
+ "\n",
+ "You can recycle the nextflow.config from Part I. Since our fastq file is pretty big, it may take some time to finish."
+ ]
+ },
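+ {
+ "cell_type": "markdown",
+ "id": "example-build-methylseq-command",
+ "metadata": {},
+ "source": [
+ "The checklist above can be sketched as a small Python helper that assembles the command as an argument list. This is only an illustration: `build_methylseq_command` is a hypothetical name, the default values mirror the command in the next cell, and the CPU and memory limits are left out for brevity.\n",
+ "\n",
+ "```python\n",
+ "def build_methylseq_command(fastq, profile, config, genome='GRCh38',\n",
+ "                            version='1.6.1', single_end=True):\n",
+ "    # Assemble the arguments in the order of the checklist\n",
+ "    cmd = ['./nextflow', 'run', 'nf-core/methylseq',\n",
+ "           '-r', version,        # nf-core tool version\n",
+ "           '--input', fastq,     # fastq.gz input\n",
+ "           '--genome', genome,   # reference pulled in from iGenomes\n",
+ "           '-c', config,         # config file location\n",
+ "           '-profile', profile]  # profile defined in the config file\n",
+ "    if single_end:\n",
+ "        cmd.append('--single_end')\n",
+ "    return cmd\n",
+ "\n",
+ "cmd = build_methylseq_command('data/raw_fastq/SRR067701.fastq.gz',\n",
+ "                              'gbatch', 'nextflow-methyseq.config')\n",
+ "print(' '.join(cmd))\n",
+ "```\n",
+ "\n",
+ "Printing the joined list lets you compare it against the command you intend to run before submitting anything."
+ ]
+ },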
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a24cab67-be41-4b2d-a545-7e14af554022",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!./nextflow run nf-core/methylseq -r 1.6.1 \\\n",
+ " --input 'data/raw_fastq/SRR067701.fastq.gz' \\\n",
+ " --genome GRCh38 \\\n",
+ " --single_end \\\n",
+ " -c nextflow-methyseq.config \\\n",
+ " -profile gbatch \\\n",
+ " --max_cpus 32 \\\n",
+ " --max_memory '110.GB'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab826f3b-391e-44fb-8197-e81fa8a9a614",
+ "metadata": {},
+ "source": [
+ "#### Check to see if files are in your output directory bucket\n",
+ "If you skipped part one, go run the first cell where you assign your bucket name to a variable. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b7484fdc-eabb-47d6-99ea-609a1574651b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil ls gs://$BUCKET/methyl-seq/outdir"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "438e8f65-e7fe-48e3-a615-e966b138c1e3",
+ "metadata": {},
+ "source": [
+ "__Optional__: View your MultiQC HTML file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "15b74ca3-c7c7-499d-af53-ffaccc9e2157",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil cp -r gs://$BUCKET/methyl-seq/outdir/MultiQC/multiqc_report.html ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "31c89604-b9a4-4f77-aa0c-67273a161130",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from IPython.display import IFrame\n",
+ "\n",
+ "IFrame(src='multiqc_report.html', width=900, height=600)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fb9abdb6-6815-4d6f-89d5-6a291e2928df",
+ "metadata": {},
+ "source": [
+ "## 4. Troubleshooting"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d44785d0-1e98-49b2-b894-79a731cab057",
+ "metadata": {},
+ "source": [
+ "Some of the nf-core tools require extra parameters:\n",
+ "- If you receive a __'quota exceeded'__ error, you can increase the boot disk size in the gbatch profile of your config file using the __google.batch.bootDiskSize__ parameter (e.g., google.batch.bootDiskSize = 100.GB)\n",
+ "- If an error says that a tool could not be used or was not installed, or does not explain why the process stopped, try increasing the process time in your profile using the __process.time__ parameter (e.g., process.time = '2h')\n",
+ "- If you receive an error like the one below, upgrading to Nextflow v23.04.0 or later should fix it\n",
+ "```\n",
+ "Caused by:\n",
+ " Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@49b3a025[Not completed, task = java.util.concurrent.Executors$RunnableAdapter@2e0ceb8c[Wrapped task = TrustedListenableFutureTask@25c1396d[status=PENDING, info=[task=[running=[NOT STARTED YET], com.google.api.gax.rpc.AttemptCallable@2db57b9a]]]]] rejected from java.util.concurrent.ScheduledThreadPoolExecutor@aa6214[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]\n",
+ "\n",
+ "```\n",
+ "- Adding the __-log__ parameter on the command line produces a log file that helps to troubleshoot other errors, like so: \n",
+ "`./nextflow -log DIRECTORY_NAME/nextflow.log run `"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a9d0590-c4dd-48de-822f-3de5b47af4e1",
+ "metadata": {},
+ "source": [
+ "## 5. Clean Up"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4fc86196-86dc-4b8c-a86c-b4354e58d05f",
+ "metadata": {},
+ "source": [
+ "If you want to clean up all resources associated with this tutorial then \n",
+ "+ delete your bucket with `gsutil rm -r gs://$BUCKET`\n",
+ "+ delete this VM in either Vertex AI or Compute Engine"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m93",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
+ },
+ "kernelspec": {
+ "display_name": "Python (Local)",
+ "language": "python",
+ "name": "local-base"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/LifeSciencesAPI/nextflow/Part1_LS_API_Nextflow.ipynb b/tutorials/notebooks/LifeSciencesAPI/nextflow/Part1_LS_API_Nextflow.ipynb
new file mode 100644
index 0000000..cc6a0c2
--- /dev/null
+++ b/tutorials/notebooks/LifeSciencesAPI/nextflow/Part1_LS_API_Nextflow.ipynb
@@ -0,0 +1,509 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "d5cdaee0-cc9d-430a-8d95-6af15b2534a8",
+ "metadata": {},
+ "source": [
+ "# Use Nextflow to run workflows using the Cloud Life Sciences API Part I\n",
+ "Here we are going to walk through submitting simple jobs directly to the Life Sciences API, then dive into interacting with the API using Nextflow. We will run some basic Hello World jobs, then move to a more complex [nf-core Methylseq workflow](https://nf-co.re/methylseq). "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d5aa78f4-b8f7-4fbd-846c-09142ac36891",
+ "metadata": {},
+ "source": [
+ "__Warning:__ The Google Life Sciences API is deprecated and will no longer be available on the platform as of July 8, 2025. Please switch to the Google Batch Nextflow tutorials."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0f8f4b85-9459-497d-97ec-5909e8aeacae",
+ "metadata": {
+ "id": "0f8f4b85-9459-497d-97ec-5909e8aeacae",
+ "tags": []
+ },
+ "source": [
+ "## 1. Setup your environment"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2e4a5ca-8a2b-4156-b83e-c89f0c1ffc9c",
+ "metadata": {
+ "id": "f2e4a5ca-8a2b-4156-b83e-c89f0c1ffc9c"
+ },
+ "source": [
+ "### Create a bucket"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "73c79eb1-6010-4d8a-8725-b92144bab944",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#make sure you change this name, it needs to be globally unique\n",
+ "%env BUCKET=gls-api-nextflow"
+ ]
+ },
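+ {
+ "cell_type": "markdown",
+ "id": "example-bucket-name-check",
+ "metadata": {},
+ "source": [
+ "Besides being globally unique, a bucket name has to follow the Cloud Storage naming rules. Below is a rough pre-check, assuming the commonly documented rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It does not cover every rule, it cannot tell you whether the name is already taken, and `looks_like_valid_bucket_name` is a hypothetical helper.\n",
+ "\n",
+ "```python\n",
+ "import string\n",
+ "\n",
+ "# Characters assumed to be allowed in a bucket name\n",
+ "ALLOWED = set(string.ascii_lowercase + string.digits + '-_.')\n",
+ "\n",
+ "def looks_like_valid_bucket_name(name):\n",
+ "    # Length limit for names without dots\n",
+ "    if not 3 <= len(name) <= 63:\n",
+ "        return False\n",
+ "    # Must start and end with a letter or digit\n",
+ "    if not (name[0].isalnum() and name[-1].isalnum()):\n",
+ "        return False\n",
+ "    # Rejects uppercase letters and any other character\n",
+ "    return all(ch in ALLOWED for ch in name)\n",
+ "\n",
+ "print(looks_like_valid_bucket_name('gls-api-nextflow'))\n",
+ "```"
+ ]
+ },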
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "44d17e57-86e8-4fce-83fe-3c33c7db9dc8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#will only create the bucket if it doesn't yet exist\n",
+ "! gsutil ls gs://$BUCKET >& /dev/null || gsutil mb gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "553761fd-4ce3-4dda-8319-a10cb9cd5314",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#enable object versioning so that overwritten files are kept as older versions\n",
+ "! gsutil versioning set on gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f5d588a5-83b2-42ef-a65f-64b2c80bca3f",
+ "metadata": {},
+ "source": [
+ "### Install dependencies"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2acefde5-3f8a-42cb-aa12-46396eaae644",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#First install java\n",
+ "!sudo apt update\n",
+ "!sudo apt-get install default-jdk -y\n",
+ "!java -version"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3d8538e0-49a3-4e61-abf3-a08e1b397fcf",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Specify the Nextflow version and platform\n",
+ "#(%env is used because variables set with '! export' do not persist beyond that one shell command)\n",
+ "%env NXF_VER=21.10.0\n",
+ "%env NXF_MODE=google\n",
+ "#Install Nextflow, make it executable, and update it\n",
+ "! curl https://get.nextflow.io | bash\n",
+ "! chmod +x nextflow\n",
+ "! ./nextflow self-update"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "07d1891a-8338-4592-a3a0-eaab55cd8df0",
+ "metadata": {},
+ "source": [
+ "### Ensure you have APIs enabled and IAM permissions\n",
+ "Make sure that Cloud Life Sciences, Compute Engine, and Cloud Storage APIs are all enabled.\n",
+ "\n",
+ "You also want to make sure your Compute Engine Default Service Account has the following Roles:\n",
+ "\n",
+ " - lifesciences.workflowsRunner\n",
+ " - iam.serviceAccountUser\n",
+ " - serviceusage.serviceUsageConsumer\n",
+ " - storage.objectAdmin\n",
+ "\n",
+ "Your Service Account should already have these roles assigned, but if not, reach out to Support to have your account updated."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a73b5bf4-3e68-44c2-9874-02c637e730bf",
+ "metadata": {},
+ "source": [
+ "## 2. Submit Hello World to the API"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3cb5bd4b-032a-47f0-bee4-299a547c3b48",
+ "metadata": {
+ "id": "3cb5bd4b-032a-47f0-bee4-299a547c3b48",
+ "outputId": "b0e740fa-dabc-4d45-c95b-15f72d32bffa"
+ },
+ "outputs": [],
+ "source": [
+ "! gcloud beta lifesciences pipelines run \\\n",
+ " --location us-central1 \\\n",
+ " --regions us-east1 \\\n",
+ " --logging gs://$BUCKET/hello_world.log \\\n",
+ " --command-line 'echo \"hello world!\"'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f4892a16-f4d9-4db9-a171-6e9245df2a72",
+ "metadata": {
+ "id": "f4892a16-f4d9-4db9-a171-6e9245df2a72",
+ "tags": []
+ },
+ "source": [
+ "### Check job status\n",
+ "To check the job status, copy the operation ID from the gcloud output, which looks like this:\n",
+ "\n",
+ "`Running [projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID]`\n",
+ "\n",
+ "The describe output is hard to parse: the most recent action is at the top, so read it from the bottom up. If you have an error, it should be near the top. Even for this simple job, it may take a few minutes to finish all operations, so keep checking until it says `done: true` at the top. "
+ ]
+ },
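+ {
+ "cell_type": "markdown",
+ "id": "example-parse-operation-id",
+ "metadata": {},
+ "source": [
+ "The operation ID is the last path segment of the `Running [...]` line. Here is a minimal sketch of pulling it out in Python, assuming you have copied that line into a string. The helper name `operation_id_from`, the project name, and the ID in the sample line are all made up for illustration.\n",
+ "\n",
+ "```python\n",
+ "def operation_id_from(running_line):\n",
+ "    # Keep the text between the square brackets\n",
+ "    path = running_line.split('[', 1)[1].split(']', 1)[0]\n",
+ "    # The ID is whatever follows the final slash\n",
+ "    return path.rsplit('/', 1)[-1]\n",
+ "\n",
+ "# Hypothetical line in the shape that gcloud prints\n",
+ "line = 'Running [projects/my-project/locations/us-central1/operations/1234567890].'\n",
+ "print(operation_id_from(line))\n",
+ "```\n",
+ "\n",
+ "The printed value is what you would assign to `ID` in the next cell."
+ ]
+ },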
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "aa92ba73-13c8-4e90-9c41-61a6fb84bf71",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#set your operation ID here\n",
+ "%env ID=10485099716669037373"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9cba7c4e-4b8c-4e4c-80e4-8de1f11b5790",
+ "metadata": {
+ "id": "9cba7c4e-4b8c-4e4c-80e4-8de1f11b5790",
+ "outputId": "47886ae8-869f-46d3-a6fa-a2b56242be9b",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "!gcloud beta lifesciences operations describe $ID"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9f056585-6c10-41b6-b7b6-0c75bebed811",
+ "metadata": {
+ "id": "9f056585-6c10-41b6-b7b6-0c75bebed811"
+ },
+ "source": [
+ "### View your output"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a86e2e14-8efe-4a36-8a5a-9d43407653c1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil ls gs://$BUCKET/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "02faf944-0143-49c7-bf4c-6b8e377fcd81",
+ "metadata": {
+ "id": "02faf944-0143-49c7-bf4c-6b8e377fcd81",
+ "outputId": "251ad4db-dcea-4d72-ff9a-01b3080acc8e"
+ },
+ "outputs": [],
+ "source": [
+ "! gsutil cp gs://$BUCKET/hello_world.log .\n",
+ "! cat hello_world.log"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "33a142e0-bd9a-405d-91f9-827503ff5fb1",
+ "metadata": {
+ "id": "33a142e0-bd9a-405d-91f9-827503ff5fb1"
+ },
+ "source": [
+ "## 3. Run Nextflow Locally"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2457d31d-d8b7-42f1-a0be-0d88c95d4fc3",
+ "metadata": {},
+ "source": [
+ "### Nextflow 101"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b709c718-96d0-4925-99dd-525a7e7b6c76",
+ "metadata": {
+ "id": "b709c718-96d0-4925-99dd-525a7e7b6c76"
+ },
+ "source": [
+ "A working Nextflow workflow relies on several files:\n",
+ "\n",
+ "- __Main file__: A .nf file that holds the processes and channels describing the inputs, outputs, the shell script of your commands, any conditions, and the workflow block, which acts like a recipe book for Nextflow. For Snakemake users, processes are roughly equivalent to 'rules'.\n",
+ "    - __Process__: Contains channels and scripts that can be executed on a Linux server, such as bash commands.\n",
+ "    - __Channel__: The means by which processes communicate with each other. For example, input and output are channels that point a process to where data is or should be located.\n",
+ "- __Config file__: The .config file contains parameters and one or more profiles. Each profile can set a different executor type (e.g., LS API, conda, docker), memory or machine type, output directory, working directory, and more!\n",
+ "- __Docker file__: Contains the dependencies and environments needed for the Nextflow workflow to run.\n",
+ "- __Schema file__: Schema files are optional, structured JSON files that describe the usage and commands your workflow will execute. You may have seen this information when running a command with the '--help' flag.\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9bea3004-ff40-4918-ac16-83aad9427ad7",
+ "metadata": {
+ "id": "9bea3004-ff40-4918-ac16-83aad9427ad7"
+ },
+ "source": [
+ "### Run a nextflow 'Hello World' process locally"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4715ef92-e3a6-44cf-9b1e-50f247dd0daf",
+ "metadata": {
+ "id": "4715ef92-e3a6-44cf-9b1e-50f247dd0daf"
+ },
+ "source": [
+ "We are going to first run Hello World locally using the main script file called hello.nf. \n",
+ "\n",
+ "It should look like this:\n",
+ "\n",
+ "```\n",
+ "#!/usr/bin/env nextflow\n",
+ "nextflow.enable.dsl=2 \n",
+ "\n",
+ "params.str = 'Hello World'\n",
+ "\n",
+ "process sayHello {\n",
+ " input:\n",
+ " val str\n",
+ "\n",
+ " output:\n",
+ " stdout\n",
+ "\n",
+ " \"\"\"\n",
+ " echo $str > hello.txt\n",
+ " cat hello.txt\n",
+ " \"\"\"\n",
+ "}\n",
+ "workflow {\n",
+ " sayHello(params.str) | view\n",
+ "}\n",
+ "```"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6efad386-185b-4faf-be39-6c5a3f84ffe4",
+ "metadata": {
+ "id": "6efad386-185b-4faf-be39-6c5a3f84ffe4",
+ "outputId": "9554903e-f8d5-43fa-ffe9-f00ce836bf2d"
+ },
+ "outputs": [],
+ "source": [
+ "! ./nextflow run hello.nf --str 'Hello!'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7619875d-7f10-4699-b4d2-120d5d7d4cd7",
+ "metadata": {
+ "id": "7619875d-7f10-4699-b4d2-120d5d7d4cd7",
+ "tags": []
+ },
+ "source": [
+ "## 4. Submit Nextflow Job to the Life Sciences API\n",
+ "Create and modify your own config file to include a 'gls' profile block to tell Nextflow to submit the job to the API instead of running locally"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec7abe9b-dca1-4ef6-87d6-39fcdd2e3c9b",
+ "metadata": {
+ "id": "ec7abe9b-dca1-4ef6-87d6-39fcdd2e3c9b"
+ },
+ "source": [
+ "The config file allows Nextflow to use executors like the Life Sciences API. In this tutorial the config file is named __'nextflow.config'__. Make sure you open this file and update the placeholder values that are account specific.\n",
+ "- Make sure that your region is a region included in the LS API!\n",
+ "- Specify your working directory bucket and output directory bucket\n",
+ "- Specify the machine type you would like to use, ensuring that there is enough memory and cpus for the workflow\n",
+ " - Otherwise LS API will automatically use 1 CPU\n",
+ "\n",
+ "```\n",
+ "profiles{\n",
+ " gls{\n",
+ " process.executor = 'google-lifesciences'\n",
+ " workDir = 'gs:///methyl-seq'\n",
+ " google.location = 'us-central1'\n",
+ " google.region = 'us-central1'\n",
+ " google.project = ''\n",
+ " params.outdir = 'gs://methyl-seq/outdir'\n",
+ " process.machineType = 'c2-standard-30'\n",
+ " }\n",
+ "}\n",
+ "```\n",
+ "\n",
+ "__Note:__ Make sure your working directory and output directory are different! Life Sciences creates temporary files in the working directory within your bucket, and these take up space, so once your pipeline has completed successfully feel free to delete them."
+ ]
+ },
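+ {
+ "cell_type": "markdown",
+ "id": "example-check-work-and-out-dirs",
+ "metadata": {},
+ "source": [
+ "The note above asks for different working and output directories. A small sketch of that check in Python, assuming both values are `gs://` paths; `check_dirs` and the bucket name `my-bucket` are hypothetical.\n",
+ "\n",
+ "```python\n",
+ "def check_dirs(work_dir, out_dir):\n",
+ "    # Both locations must be Cloud Storage paths\n",
+ "    for path in (work_dir, out_dir):\n",
+ "        if not path.startswith('gs://'):\n",
+ "            raise ValueError(f'not a gs:// path: {path}')\n",
+ "    # Ignore trailing slashes so gs://b/x and gs://b/x/ count as the same place\n",
+ "    if work_dir.rstrip('/') == out_dir.rstrip('/'):\n",
+ "        raise ValueError('workDir and params.outdir must differ')\n",
+ "    return True\n",
+ "\n",
+ "check_dirs('gs://my-bucket/methyl-seq', 'gs://my-bucket/methyl-seq/outdir')\n",
+ "```\n",
+ "\n",
+ "Running a check like this before launching avoids finding out about a misconfigured path only after the job has been submitted."
+ ]
+ },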
+ {
+ "cell_type": "markdown",
+ "id": "340f7300-449a-4a12-bbc5-073547d58cac",
+ "metadata": {
+ "id": "340f7300-449a-4a12-bbc5-073547d58cac",
+ "tags": []
+ },
+ "source": [
+ "### Optional: Listing nf-core tools with docker and viewing their commands\n",
+ "Using the command below you can see all the tools that nf-core holds and their versions/latest releases"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ca1ff164-cee2-446e-ab2e-a3ed984e0dc0",
+ "metadata": {
+ "id": "ca1ff164-cee2-446e-ab2e-a3ed984e0dc0",
+ "outputId": "0530644a-dd9a-4077-dbc8-d1e335788a01",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! docker run nfcore/tools list"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9e46373c-61d0-4c91-b001-e55568d9fa2d",
+ "metadata": {
+ "id": "9e46373c-61d0-4c91-b001-e55568d9fa2d"
+ },
+ "source": [
+ "You can view commands for methylseq (or any other specified nf-core tool) by using the [--help] flag"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "05ea2893-60b3-4934-ae86-b07d4bc59728",
+ "metadata": {
+ "id": "05ea2893-60b3-4934-ae86-b07d4bc59728",
+ "outputId": "1e6de26f-0433-4bbd-8a43-119097bb1f41",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! ./nextflow run nf-core/methylseq -r 1.6.1 --help"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b4dbef59-d619-4444-8870-18c1f0ba3b5c",
+ "metadata": {
+ "id": "b4dbef59-d619-4444-8870-18c1f0ba3b5c",
+ "tags": []
+ },
+ "source": [
+ "### Run Methylseq with the test profile"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7238bd3e-1853-42c3-9d2d-c72e46975ff2",
+ "metadata": {
+ "id": "7238bd3e-1853-42c3-9d2d-c72e46975ff2"
+ },
+ "source": [
+ "The 'test' profile uses a small dataset allowing you to ensure the workflow works with your config file without long runtimes. Ensure you include:\n",
+ "- Version of the nf-core tool [-r]\n",
+ "- Location of the config file [-c]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4b21f170-37fa-4fbc-ab83-3f6b4d386ef9",
+ "metadata": {
+ "id": "4b21f170-37fa-4fbc-ab83-3f6b4d386ef9",
+ "outputId": "0507c847-7f83-40af-ebf0-a1fdef27499b",
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! ./nextflow run nf-core/methylseq -r 1.6.1 -profile test,gls -c nextflow-methyseq.config"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e386ccb3-aa6d-4a77-8d7d-c20ed0419f84",
+ "metadata": {
+ "id": "e386ccb3-aa6d-4a77-8d7d-c20ed0419f84"
+ },
+ "source": [
+ "In the output above, the text inside the __[ ]__ to the left of each process is a __tag__ you can search for in Life Sciences, and the text before the __/__ corresponds to the __temporary directories__ within your working directory. Feel free to delete the temporary directories once your workflow has successfully completed.\n",
+ "\n",
+ "Congrats! You are done with Part I. If you want to keep going and learn how to use the Methylseq workflow with real data, then move to Part II. If not, then feel free to clean up your resources. "
+ ]
+ },
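+ {
+ "cell_type": "markdown",
+ "id": "example-tag-to-work-dir",
+ "metadata": {},
+ "source": [
+ "As a sketch of how a tag maps to a temporary directory, assume Nextflow's usual layout, in which a task tagged `[ab/123456]` runs in a directory whose path starts with `WORKDIR/ab/123456`. The helper name `work_dir_prefix`, the tag, and the bucket below are made up for illustration.\n",
+ "\n",
+ "```python\n",
+ "def work_dir_prefix(work_dir, tag):\n",
+ "    # Strip the surrounding brackets, e.g. '[ab/123456]' becomes 'ab/123456'\n",
+ "    short_hash = tag.strip('[]')\n",
+ "    # Join with exactly one slash between the two parts\n",
+ "    return work_dir.rstrip('/') + '/' + short_hash\n",
+ "\n",
+ "print(work_dir_prefix('gs://my-bucket/methyl-seq', '[ab/123456]'))\n",
+ "```\n",
+ "\n",
+ "The result is only a prefix, since the directory name continues with the rest of the task hash."
+ ]
+ },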
+ {
+ "cell_type": "markdown",
+ "id": "f7bf5cba-995d-4404-94d1-9bc9c4a04482",
+ "metadata": {},
+ "source": [
+ "## 5. Clean up\n",
+ "If you want to clean up all resources associated with this tutorial then \n",
+ "+ delete your bucket with `gsutil rm -r gs://$BUCKET`\n",
+ "+ delete this VM in either Vertex AI or Compute Engine"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "name": "Workshop_2_updated.ipynb",
+ "provenance": []
+ },
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m93",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/LifeSciencesAPI/nextflow/Part2_LS_API_Nextflow.ipynb b/tutorials/notebooks/LifeSciencesAPI/nextflow/Part2_LS_API_Nextflow.ipynb
new file mode 100644
index 0000000..3a24206
--- /dev/null
+++ b/tutorials/notebooks/LifeSciencesAPI/nextflow/Part2_LS_API_Nextflow.ipynb
@@ -0,0 +1,357 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "5384eeed-e730-46e9-9e4d-7df526fb44ee",
+ "metadata": {},
+ "source": [
+ "# Use Nextflow to run workflows using the Cloud Life Sciences API Part II\n",
+ "Here we are going to build on Part I to download some real data using the SRA toolkit and then submit an nf-core Methylseq job to the Cloud Life Sciences API."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9eab068b-fed8-4f80-b80f-32d538cb41c7",
+ "metadata": {},
+ "source": [
+ "__Warning:__ The Google Life Sciences API is deprecated and will no longer be available on the platform as of July 8, 2025. Please switch to the Google Batch Nextflow tutorials."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e30edde9-dcbf-4f74-9d2a-bd4d8c15c9a9",
+ "metadata": {},
+ "source": [
+ "## 1. Optional: Setup the environment\n",
+ "If you did not do part 1, then set up your environment. Otherwise, skip to the next section."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c71c4052-607e-4827-8126-759a6871558c",
+ "metadata": {},
+ "source": [
+ "### Create a bucket"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cf9e21b4-07d3-4e79-9c94-067b70e78ff6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#make sure you change this name, it needs to be globally unique\n",
+ "%env BUCKET=gls-api-nextflow"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e47789f2-e38d-46ec-8750-0a235b0c4337",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#will only create the bucket if it doesn't yet exist\n",
+ "! gsutil ls gs://$BUCKET >& /dev/null || gsutil mb gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5a4ccc7d-b258-410a-85fb-f261fa0dcade",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#enable object versioning so that overwritten files are kept as older versions\n",
+ "! gsutil versioning set on gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "00dc1fb0-784f-46c7-8905-817fa3fbccb1",
+ "metadata": {},
+ "source": [
+ "### Install mambaforge\n",
+ "You can also use the default installed conda, but mamba is so much faster! "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0b97b0e6-d44c-40e4-a32a-33708e4ed596",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "08e99d91-e356-456e-8cde-7f2819114ee2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "80eb36f1-62c1-46f0-b297-6d8bc1d2e034",
+ "metadata": {},
+ "source": [
+ "### Install other dependencies "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2619673e-9645-4460-aa07-323d01bcd9ba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#First install java\n",
+ "!sudo apt update\n",
+ "!sudo apt-get install default-jdk -y\n",
+ "!java -version"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c24cb2d9-6260-45fc-a4c0-f55688c43e11",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Specify the Nextflow version and platform\n",
+ "#(%env is used because variables set with '! export' do not persist beyond that one shell command)\n",
+ "%env NXF_VER=21.10.0\n",
+ "%env NXF_MODE=google\n",
+ "#Install Nextflow, make it executable, and update it\n",
+ "! curl https://get.nextflow.io | bash\n",
+ "! chmod +x nextflow\n",
+ "! ./nextflow self-update"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "581f3b6a-b0b8-4309-a54b-ffff14450f41",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#Install SRAtools to download data\n",
+ "! mamba install -c bioconda -c conda-forge sra-tools==2.11.0 -y"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1bc477a9-7b3f-431e-93b7-50e96809bfc5",
+ "metadata": {},
+ "source": [
+ "## 2. Download data with SRA tools\n",
+ "If you want more practice with SRA tools, check out our [SRA-focused notebook](https://github.com/STRIDES/NIHCloudLabGCP/blob/main/tutorials/notebooks/SRADownload/SRA-Download.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "facdbaea-d6ca-4270-9ddc-7c5d042b7373",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#set up directory structure\n",
+ "!mkdir -p data data/fasterqdump"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0c5eb64-8628-4849-b077-6a3b23aaf934",
+ "metadata": {},
+ "source": [
+ "First bring in the compressed .sra file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b8ca1c0f-3844-42f8-a9e2-546178a6d961",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! prefetch -O data/raw_fastq -f yes SRR067701 --location GCP -v "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6ea6f3fa-1ce5-4c1a-b077-a5cb539e33b2",
+ "metadata": {},
+ "source": [
+ "Now convert the compressed .sra file to fastq. It will take about two minutes, so be patient. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6d7c1f2f-6d52-4f2f-b308-62d42c95ec1b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! fasterq-dump -f -e 8 -m 24G -O data/raw_fastq data/raw_fastq/SRR067701/SRR067701.sra"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "282c31da-0ce9-4e21-ab3e-07e550bc5ceb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#compress the fastq files\n",
+ "! gzip data/raw_fastq/SRR067701.fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "805294b9-81cc-455c-9181-20fb048fb57c",
+ "metadata": {},
+ "source": [
+ "## 3. Run methylseq with Life Sciences API"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b7c45f87-e0b6-43c5-aa55-3ee3e33978a3",
+ "metadata": {},
+ "source": [
+ "Ensure you include the following in your command:\n",
+ "- nf-core tool version [-r]\n",
+ "- Add fastq.gz file input [--input]\n",
+ "- Reference genome [--genome] (you do not need a local copy; nf-core uses iGenomes and will pull in the correct reference file)\n",
+ "- Config file location [-c]\n",
+ "- Desired profile [-profile]\n",
+ "- Other flags such as:\n",
+ "    - Whether the fastq file is single-end\n",
+ "    - The maximum CPUs and memory to use\n",
+ "\n",
+ "You can recycle the nextflow.config from Part I. Since our fastq file is pretty big, it may take some time to finish."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a24cab67-be41-4b2d-a545-7e14af554022",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!./nextflow run nf-core/methylseq -r 1.6.1 \\\n",
+ " --input 'data/raw_fastq/SRR067701.fastq.gz' \\\n",
+ " --genome GRCh38 \\\n",
+ " --single_end \\\n",
+ " -c nextflow-methyseq.config \\\n",
+ " -profile gls \\\n",
+ " --max_cpus 32 \\\n",
+ " --max_memory '110.GB'"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab826f3b-391e-44fb-8197-e81fa8a9a614",
+ "metadata": {},
+ "source": [
+ "#### Check to see if files are in your output directory bucket\n",
+ "If you skipped part one, go run the first cell where you assign your bucket name to a variable. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b7484fdc-eabb-47d6-99ea-609a1574651b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil ls gs://$BUCKET/methyl-seq/outdir"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "438e8f65-e7fe-48e3-a615-e966b138c1e3",
+ "metadata": {},
+ "source": [
+ "__Optional__: View your MultiQC HTML file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "15b74ca3-c7c7-499d-af53-ffaccc9e2157",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil cp -r gs://$BUCKET/methyl-seq/outdir/MultiQC/multiqc_report.html ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "31c89604-b9a4-4f77-aa0c-67273a161130",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from IPython.display import IFrame\n",
+ "\n",
+ "IFrame(src='multiqc_report.html', width=900, height=600)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5a9d0590-c4dd-48de-822f-3de5b47af4e1",
+ "metadata": {},
+ "source": [
+ "## 4. Clean Up"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4fc86196-86dc-4b8c-a86c-b4354e58d05f",
+ "metadata": {},
+ "source": [
+ "If you want to clean up all resources associated with this tutorial then \n",
+ "+ delete your bucket with `gsutil rm -r gs://$BUCKET`\n",
+ "+ delete this VM in either Vertex AI or Compute Engine"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m93",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/LifeSciencesAPI/snakemake/LS_API_Snakemake.ipynb b/tutorials/notebooks/LifeSciencesAPI/snakemake/LS_API_Snakemake.ipynb
new file mode 100644
index 0000000..70d0ebf
--- /dev/null
+++ b/tutorials/notebooks/LifeSciencesAPI/snakemake/LS_API_Snakemake.ipynb
@@ -0,0 +1,547 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "99cfa998-06b6-4b42-ae3a-b4e011750d31",
+ "metadata": {},
+ "source": [
+ "# RNA-Seq Analysis using Snakemake and Google Cloud Life Sciences API"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4126cb07-34ee-4780-838f-872015a882b3",
+ "metadata": {},
+ "source": [
+ "## Overview"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f15ea992-faa6-4705-8384-eb5d81f5daff",
+ "metadata": {},
+ "source": [
+ "This short tutorial demonstrates how to run an RNA-Seq workflow using a prokaryotic data set. Steps in the workflow include read trimming, read QC, read mapping, and counting mapped reads per gene to quantitate gene expression. This tutorial uses a popular workflow manager called [Snakemake](https://snakemake.readthedocs.io/en/stable/) run via the [Google Cloud Life Sciences API](https://cloud.google.com/life-sciences/docs/reference/rest).\n",
+ "\n",
+ "__Warning:__ The Google Life Sciences API is deprecated and has been replaced by Google Batch. Snakemake currently supports only the Google Life Sciences API, which will no longer be available as of July 8, 2025. Visit Snakemake Cloud Execution for updates and instructions for using Google Batch."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0b6d0785-2d13-476c-b16a-196f74ea277d",
+ "metadata": {},
+ "source": [
+ "### STEP 1: Create a new GS Bucket to store input and output files\n",
+ "Note that your bucket name has to be globally unique, so make sure you don't just copy the example here or it won't work"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4d3dc88f-fa0c-4e7e-972b-055321d3cdbb",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#change this bucket name\n",
+ "%env BUCKET="
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "17ce680c-4b8c-419c-a6c4-b6caec32d9ba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#will only create the bucket if it doesn't yet exist\n",
+ "! gsutil ls gs://$BUCKET >& /dev/null || gsutil mb gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "91292c6d-d5a4-407d-9816-51ca52876fba",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#enable object versioning so that overwritten files are kept as older versions\n",
+ "! gsutil versioning set on gs://$BUCKET"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dd7ab630-955d-43d1-bc43-c7b3e701ed04",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### STEP 2: Install mambaforge and snakemake\n",
+ "First install mambaforge, then use mamba to install snakemake."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "682ddf88-e1d9-443f-a423-e1f85ff604a2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "de5d0630-1d85-4625-bc04-036aae11ce4a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bd5c221b-45ce-47fb-a8e2-29ceee0e296a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#install snakemake\n",
+ "! mamba install -y -c conda-forge -c bioconda snakemake"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bb0ce8d5-4b96-4e97-88ed-44e8e85f4fc0",
+ "metadata": {},
+ "source": [
+ "### STEP 3: Copy FASTQ Files\n",
+ "In order for this tutorial to run quickly, we will only analyze 50,000 reads from one sample in each of the two sample groups instead of analyzing all the reads from all six samples. These files have been posted on a Google Storage Bucket that we made publicly accessible."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5f3795fd-3e03-476d-9abf-49705a72cc15",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil -m cp -r gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/data/raw_fastqSub/ gs://$BUCKET/data/raw_fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a2bc4563-951a-45d4-8f01-0accd6b80ea8",
+ "metadata": {},
+ "source": [
+ "Create a placeholder file under data/fastqc so that snakemake can write files to that bucket path; otherwise the pipeline crashes."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cd2c03dd-2248-4068-8842-ba130f29adc9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! touch blank.txt\n",
+ "! gsutil cp blank.txt gs://$BUCKET/data/fastqc/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec692c28-f549-43af-bbdf-3c4266fb59ae",
+ "metadata": {},
+ "source": [
+ "### STEP 4: Copy reference files that will be used by Salmon\n",
+ "Salmon is a tool that aligns RNA-Seq reads to a set of transcripts rather than the entire genome."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "290238d6-39e0-4575-87e4-880b316ca1f6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil -m cp -r gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/data/reference/M_chelonae_transcripts.fasta gs://$BUCKET/data/reference/M_chelonae_transcripts.fasta\n",
+ "! gsutil -m cp -r gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/data/reference/decoys.txt gs://$BUCKET/data/reference/decoys.txt\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ea2d17cb-dff6-45d3-9aef-3ec6203508f6",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### STEP 5: Copy data file for Trimmomatic"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0be371d5-382e-4a22-a300-2c5249eff825",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil -m cp gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/config/TruSeq3-PE.fa gs://$BUCKET/TruSeq3-PE.fa"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5ac668db-7851-418e-9b1e-0a2c4abbab6e",
+ "metadata": {},
+ "source": [
+ "### STEP 6: Copy data and config files that will be used in our snakemake environment\n",
+ "\n",
+ "Next, download the data and config files for our snakemake environment."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1dbc460c-50af-4458-8056-c0f6146fff23",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil -m cp -r gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/envs/ .\n",
+ "! gsutil -m cp gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/config.yaml .\n",
+ "! gsutil -m cp gs://nigms-sandbox/me-inbre-rnaseq-pipelinev2/snakefile_ls_api ."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab3739b6-aa4e-439e-bd73-2ea43be1801b",
+ "metadata": {},
+ "source": [
+ "Add the bucket path to the end of your config file. Since this file was written for running snakemake locally, we have to make a few edits before it will run on the Life Sciences (LS) API."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "737c0634-171e-489b-8cfb-e93a025cbd01",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! echo 'bucket:' >> config.yaml"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "4fd4c713-8882-4833-a117-706a4b239374",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! echo ' '$BUCKET >>config.yaml"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e45fe428-b9fa-46cb-a69d-2a0e989292e1",
+ "metadata": {},
+ "source": [
+ "Add bucket path to the snakefile"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3bba277a-704a-41ab-8853-7cf324dde727",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! sed -i 's/print(SAMPLES)/BUCKET=config[\"bucket\"]/' snakefile_ls_api"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cfc635b4-ed50-4a92-8018-f402fdd563b9",
+ "metadata": {},
+ "source": [
+ "### STEP 7: Set up your local environment\n",
+ "You need to generate a [service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) for the compute engine default service account to interact with the Life Sciences API using Snakemake. Download the key and copy it to this VM. Then assign the path of the json file to an environment variable."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0e93d6ca-5afe-4f12-9401-be44ce7ac7d9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%env GOOGLE_APPLICATION_CREDENTIALS=cloud_creds.json"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e629d7a5-4ef7-408b-88ff-3c52201879a1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "os.environ[\"GOOGLE_APPLICATION_CREDENTIALS\"]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a04afc6c-41cc-46c9-b048-08a7677699c8",
+ "metadata": {},
+ "source": [
+ "Set your project (make sure to replace $PROJECT with your project ID)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "10f587c6-9d51-4433-bf13-bec6dabcb9d8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gcloud config set project $PROJECT"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f7d3b561-7213-4b60-9f8f-c511c6fdc067",
+ "metadata": {},
+ "source": [
+ "Initialize a local git repo"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6d56605b-04bd-48fa-94c0-ba7fffdf08a2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! git init"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f56458ef-e475-4a48-b24b-e903f82eb996",
+ "metadata": {},
+ "source": [
+ "Configure conda"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f72b8fce-be20-4158-8bd5-00b2ad122414",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! conda config --set channel_priority strict"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec2c0273-c7f1-4aee-bdf3-43d5773cf2fa",
+ "metadata": {},
+ "source": [
+ "### STEP 8: Run snakemake using the Life Sciences API\n",
+ "\n",
+ "Aside from the .yaml config files, which contain information about software, dependencies, and versions, snakemake uses a snakefile, which contains information about a workflow.\n",
+ "\n",
+ "This can be a powerful tool as it allows one to operate and think in terms of workflows instead of individual steps. You should open the snakefile to look at it further. It is composed of 'rules' we have created. Snakefiles work largely based on inputs and outputs: for a given input/output, there is an associated 'rule' that runs. It may take a while to get a feel for how a snakefile works, but in simplest terms, here we take .fastq files as input, and based on the snakefile rules we created, those fastq files are run through the entire workflow. The rule_all at the top determines which rules are run based on the input files for rule_all (which are outputs from the target rules). Comment out rules you don't want to run. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d8bf3f71-e394-41c6-9694-b5d4b24cb265",
+ "metadata": {},
+ "source": [
+ "Snakemake requires that you have a service account key to authenticate with the Life Sciences API. This actually is not necessary to use the API from within a notebook, but Snakemake does require it since Snakemake is expecting you to run the command from your own terminal using the SDK. To see all the commands you can run with Snakemake via the Life Sciences API, check out the [docs](https://snakemake.readthedocs.io/en/stable/executor_tutorial/google_lifesciences.html)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "67c95e75-1619-4694-9411-95edc9f4cee4",
+ "metadata": {},
+ "source": [
+ "Now we can run the workflow on the Life Sciences API. You will see that each rule is submitted as a separate job. If the pipeline crashes, the way to troubleshoot is by reading the API logs, or the snakemake rule logs (same info). You can find the Life Sciences API logs by pasting in the gcloud command given in yellow.\n",
+ "\n",
+ "For example: \n",
+ "```\n",
+ "gcloud beta lifesciences operations describe \n",
+ "```\n",
+ "Or you can view the logs by finding the path given for logs, and then use gsutil to copy that file locally, or go to the bucket and double click the file. You can get the job ID for the output file in the green section of the rule print out."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bee32318-33df-43b2-98bc-5eb091ceae59",
+ "metadata": {
+ "scrolled": true,
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! snakemake --forceall --snakefile snakefile_ls_api --google-lifesciences --default-remote-prefix $BUCKET --use-conda --google-lifesciences-region us-central1 -j 24 --rerun-incomplete --default-resources \"machine_type=n2-standard\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df9deb0a-1030-4839-aa16-37c3b32a2c87",
+ "metadata": {},
+ "source": [
+ "### STEP 9: Report the top 10 most highly expressed genes in the samples."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d50f9bd2-dbd2-467f-a9b6-313e63ad304b",
+ "metadata": {},
+ "source": [
+ "Top 10 most highly expressed genes in the wild-type sample. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields: \n",
+ "`Name Length EffectiveLength TPM NumReads`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0676cbbc-9392-41d1-ab57-e2b4f3cc9aad",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil rm gs://$BUCKET/data/quants/SRR13349122_quant\n",
+ "! gsutil rm gs://$BUCKET/data/quants/SRR13349128_quant"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b032ce69-f62d-4f5f-90a3-68c2979d9a85",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil ls gs://$BUCKET/data/quants/*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "c98fd827-6829-400d-af8c-969ad196c3d2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil cp -r gs://$BUCKET/data/quants/SRR13349122_quant/ .\n",
+ "! gsutil cp -r gs://$BUCKET/data/quants/SRR13349128_quant/ ."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7776f671-30a0-4ba8-a9cc-e3434d40cc48",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! sort -nrk 5,5 SRR13349122_quant/quant.sf | head -10"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "678efdde-1782-4481-9240-054c34528163",
+ "metadata": {},
+ "source": [
+ "Top 10 most highly expressed genes in the double lysogen sample.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "1ceee200-b741-4954-b950-85edec98eb90",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! sort -nrk 5,5 SRR13349128_quant/quant.sf | head -10"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "50169f62-e707-4d84-b301-ded51a704130",
+ "metadata": {},
+ "source": [
+ "### STEP 10: Report the expression of a putative acyl-ACP desaturase (BB28_RS16545) that was downregulated in the double lysogen relative to wild-type\n",
+ "This gene was reported to be downregulated in the double lysogen, as shown in the table of the top 20 upregulated and downregulated genes from the paper describing the study.\n",
+ "![RNA-Seq workflow](images/table-cushman.png)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5b3794b0-a477-45fa-aa51-4414d7671441",
+ "metadata": {},
+ "source": [
+ "Use `grep` to report the expression in the wild-type sample. The fields in the Salmon `quant.sf` file are as follows. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields: \n",
+ "`Name Length EffectiveLength TPM NumReads`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a3cb9340-682b-4177-837d-7d803a9775a5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! grep 'BB28_RS16545' SRR13349122_quant/quant.sf"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "79ba6401-261d-43e9-b831-ef76122da623",
+ "metadata": {},
+ "source": [
+ "Use `grep` to report the expression in the double lysogen sample. The fields in the Salmon `quant.sf` file are as follows. The level of expression is reported in the Transcripts Per Million (`TPM`) and number of reads (`NumReads`) fields: \n",
+ "`Name Length EffectiveLength TPM NumReads`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "745ea1c5-79d3-481c-9359-6e0a93b9a286",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! grep 'BB28_RS16545' SRR13349128_quant/quant.sf"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-11.m110",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m110"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/README.md b/tutorials/notebooks/README.md
new file mode 100644
index 0000000..415ded9
--- /dev/null
+++ b/tutorials/notebooks/README.md
@@ -0,0 +1,128 @@
+# GCP Tutorial Resources
+
+_We have pulled together a variety of tutorials here from disparate sources. Some use Compute Engine, others use Vertex AI notebooks and others use only managed services. Tutorials are organized by research method, but we try to designate what GCP services are used as well to help you navigate._
+
+---------------------------------
+## Overview of Page Contents
+
++ [Biomedical Workflows on GCP](#bds)
++ [Artificial Intelligence and Machine Learning](#ml)
++ [Medical Imaging](#mi)
++ [Download SRA Data](#sra)
++ [Variant Calling](#vc)
++ [VCF Query](#vcf)
++ [GWAS](#gwas)
++ [Proteomics](#pro)
++ [RNAseq and Transcriptome Assembly](#rna)
++ [scRNAseq](#sc)
++ [ATACseq and scATACseq](#atac)
++ [Methylseq](#ms)
++ [Metagenomics](#meta)
++ [Multiomics and Biomarker Analysis](#mo)
++ [BLAST+](#bl)
++ [Long Read Sequencing Analysis](#long)
++ [Drug Discovery](#atom)
++ [Using Google Batch](#gbatch)
++ [Using the Life Sciences API (deprecated)](#lsapi)
++ [Public Data Sets](#pub)
+
+## **Biomedical Workflows on GCP**
+There are a lot of ways to run workflows on GCP. Here we list a few possibilities each of which may work for different research aims. As you walk through the various tutorials below, think about how you could possibly run that workflow more efficiently using one of the other methods listed here.
+
+- The simplest method is probably to spin up a Compute Engine instance and run your command interactively, inside `screen`, or as a [startup script](https://cloud.google.com/compute/docs/instances/startup-scripts/linux) attached as metadata.
+- You could also run your pipeline via a Vertex AI notebook, either by splitting out each command as a different block, or by running a workflow manager (Nextflow etc.). [Schedule notebooks](https://codelabs.developers.google.com/vertex_notebook_executor#0) to let them run longer.
+You can find a nice tutorial for using managed notebooks [here](https://codelabs.developers.google.com/vertex_notebook_executor#0). Note that there is now a difference between `managed notebooks` and `user managed notebooks`. The `managed notebooks` have more features and can be scheduled, but give you less control over conda environments and installs.
+- You can interact with [Google Batch](https://cloud.google.com/batch/docs/get-started) or the [Google Life Sciences API](https://cloud.google.com/life-sciences/docs/reference/rest) using a workflow manager like [Nextflow](https://cloud.google.com/life-sciences/docs/tutorials/Nextflow), [Snakemake](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/googlebatch.html), or [Cromwell](https://github.com/GoogleCloudPlatform/rad-lab/tree/main/modules/genomics_cromwell). We currently have example notebooks for both [Nextflow and Snakemake that use the Life Sciences API](/notebooks/LifeSciencesAPI/), for [Google Batch with Nextflow](/notebooks/GoogleBatch/Nextflow), and for a [local version of Snakemake run via Pangolin](/notebooks/pangolin).
+- You may find other APIs better suit your needs, such as the [Google Cloud Healthcare Data Engine](https://cloud.google.com/healthcare).
+- Most of the notebooks below require just a few CPUs. Start small (maybe 4 CPUs), then scale up as needed. Likewise, when you need a GPU, start with a smaller or older generation GPU (e.g. T4) for testing, then switch to a newer GPU (A100/V100) once you know things will work or you need more compute power.
+
+## **Artificial Intelligence and Machine Learning**
+Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and models that enable computers to learn from and make predictions or decisions based on data, without being explicitly programmed. Machine learning on GCP generally occurs within Vertex AI. You can learn more about machine learning on GCP at this [Google Crash Course](https://developers.google.com/machine-learning/crash-course). For hands-on examples, try out [this module](https://github.com/NIGMS/COVIDMachineLearningSFSU) developed by San Francisco State University or [this one from the University of Arkansas](https://github.com/NIGMS/MachineLearningUA) developed for the NIGMS Sandbox Project.
+
+Now that the age of **Generative AI** (Gen AI) has arrived, Google has released a host of Gen AI offerings within the Vertex AI suite. Generative AI models can, for example, extract specific information from text, transform speech into text, generate images from descriptions and vice versa, and much more. Vertex AI's [Vertex AI Studio](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/generative-ai-studio) console allows the user to rapidly create, test, and train generative AI models on the cloud in a safe and secure setting; see our overview in [this tutorial](/notebooks/GenAI/VertexAIStudioGCP.ipynb). The studio also has ready-to-use models, all contained within the [Model Garden](https://cloud.google.com/vertex-ai/docs/start/explore-models). These models include foundation models, fine-tunable models, and task-specific solutions.
+- To learn more about Gen AI on GCP take a look at our [GenAI tutorials](/notebooks/GenAI) that go over several GCP products such as [Gemini](/notebooks/GenAI/Gemini_Intro.ipynb) and [Vector Search](/notebooks/GenAI/GCP_Pubmed_chatbot.ipynb) and other tools like [Langchain](/notebooks/GenAI/langchain_on_vertex.ipynb) and [Huggingface](/notebooks/GenAI/GCP_GenAI_Huggingface.ipynb) to deploy, train, prompt, and implement techniques like [Retrieval-Augmented Generation (RAG)](/notebooks/GenAI/GCP_Pubmed_chatbot.ipynb) to GenAI models.
+- Google also provides many generative AI tutorials hosted on [GitHub](https://github.com/GoogleCloudPlatform/generative-ai/tree/main). Some of the examples they provide are under [language here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/language).
+
+## **Medical Image Segmentation**
+Medical image analysis is the application of computational algorithms and techniques to extract meaningful information from medical images for diagnosis, treatment planning, and research purposes. Medical image analysis requires large image files and often elastic storage and accelerated computing.
+- Most medical imaging analyses are done in notebooks, so we recommend downloading the Jupyter Notebook from [here](/notebooks/SpleenLiverSegmentation) and then importing or cloning it into Vertex AI. The tutorial walks through image segmentation using the Monai framework.
+- You can also request early access to the new [Google Medical Imaging Suite](https://cloud.google.com/medical-imaging) to see if it would fit your use case.
+
+## **Download Data From the Sequence Read Archive (SRA)**
+Next Generation genetic sequence data is housed in the NCBI Sequence Read Archive (SRA). You can access these data using the SRA Toolkit. We walk you through this using [this notebook](/notebooks/SRADownload), including how to use BigQuery to generate your list of Accessions. You can also use BigQuery to create a list of accessions for download using [this setup guide](https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery/) and this [query guide](https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery-examples/). Additional example notebooks can be found at this [NCBI repo](https://github.com/ncbi/ASHG-Workshop-2021). In particular, we recommend [this notebook](https://github.com/ncbi/ASHG-Workshop-2021/blob/main/1_Basic_BigQuery_Examples.ipynb), which goes into more detail on using BigQuery to access the results of the SRA Taxonomic Analysis Tool, which often differ from the user input species name due to contamination, error, or samples being metagenomic in nature. Further, [this notebook](https://github.com/ncbi/ASHG-Workshop-2021/blob/main/2_Array_Examples.ipynb) does a deep dive on parsing the BigQuery results and may give you some good ideas on how to search for samples from SRA. The SRA metadata and taxonomy analyses are in separate BigQuery tables; you can learn how to join those two tables using SQL from [this Powerpoint](https://github.com/NCBI-Codeathons/NCGAS-cloud-workshop/blob/main/5_BigQuery.pptx) or from our tutorial [here](/notebooks/ncbi-stat-tutorial/). Finally, NCBI released [this workshop](https://github.com/ncbi/workshop-asm-ngs-2022/wiki) that walks through a wide variety of BigQuery applications with NCBI datasets.
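
As a rough illustration of generating an accession list with BigQuery, the sketch below only builds a parameterized SQL string and de-duplicates accessions from result rows. The table name `nih-sra-datastore.sra.metadata` and the column names (`acc`, `organism`, `assay_type`, `librarylayout`) are assumptions that should be checked against the NCBI setup and query guides linked above, and the helper names `build_accession_query` and `accession_list` are hypothetical.

```python
# Assumed table name; confirm it in the NCBI SRA BigQuery documentation.
SRA_METADATA_TABLE = "nih-sra-datastore.sra.metadata"


def build_accession_query(limit: int = 100) -> str:
    """Return a parameterized SQL string that lists SRA run accessions.

    @organism and @assay_type are named query parameters, so user input
    is supplied separately and never pasted into the SQL text.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT acc, organism, assay_type, librarylayout\n"
        f"FROM `{SRA_METADATA_TABLE}`\n"
        "WHERE organism = @organism\n"
        "  AND assay_type = @assay_type\n"
        f"LIMIT {int(limit)}"
    )


def accession_list(rows) -> list:
    """Collect unique accessions, in first-seen order, from row mappings."""
    seen = set()
    ordered = []
    for row in rows:
        acc = row["acc"]
        if acc not in seen:
            seen.add(acc)
            ordered.append(acc)
    return ordered


if __name__ == "__main__":
    print(build_accession_query(limit=10))
    # Stand-in rows; real rows would come from running the query in BigQuery.
    demo_rows = [{"acc": "SRR13349122"}, {"acc": "SRR13349128"}, {"acc": "SRR13349122"}]
    print(accession_list(demo_rows))
```

The resulting list can be written to a text file, one accession per line, and passed to the SRA Toolkit for download.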
+
+## **Variant Calling**
+Genomic variant calling is the process of identifying and characterizing genetic variations from DNA sequencing data to understand differences in an individual's genetic makeup.
+- This [Google tutorial](https://cloud.google.com/life-sciences/docs/tutorials/gatk) shows you how to run the GATK Best Practices pipeline for genomic variant calling using the Life Sciences API. You can skip the section about increasing your account quotas. You could also run GATK using any of the workflow managers and submitting to the Life Sciences API.
+- One tutorial specific to somatic variant calling comes from the Sheffield Bioinformatics Core [here](https://sbc.shef.ac.uk/somatic-variants/index.nb.html). It runs on Galaxy, but can be adapted to run in GCP. At the very least, the [data](https://drive.google.com/drive/folders/1RhrmfW3vMhPwAiBGdFIKfINWMsdvIG6E) may prove useful to you.
+
+## **Query a VCF file in BigQuery**
+The output of genomic variant calling workflows is a file in the variant call format (VCF). These are often large, structured data files that can be searched using database query tools such as BigQuery.
+- Learn how to use BigQuery to run queries against large VCF files from gnomAD data using [this notebook](https://github.com/GoogleCloudPlatform/rad-lab/blob/main/modules/data_science/scripts/build/notebooks/Exploring_gnomad_on_BigQuery.ipynb). If any cell gives you an error, try running that cell again and it should work; there seems to be some lag time between cells.
+
+## **Genome Wide Association Studies**
+Genome-wide association studies (GWAS) are large-scale investigations that analyze the genomes of many individuals to identify common genetic variants associated with traits, diseases, or other phenotypes.
+- This [NIH CFDE written tutorial](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud) walks you through running a simple GWAS on AWS, so we have rewritten it as a notebook that works on GCP [here](/notebooks/GWASCoatColor). Make sure you select R as your kernel when you spin up your notebook so that you can switch between R and Python (this only applies to 'User Managed Notebooks'). Note that our team experienced conda permission issues with the new Managed Notebooks for this tutorial, so we recommend using the 'User Managed Notebooks'. Also, if the imported notebook has cells already printed out, just go to Kernel > Restart Kernel and Clear all Outputs.
+- [This tutorial](https://github.com/david-thrower-nih/DL-gwas-gcp-example) from NIH NIEHS (credit to David Thrower) builds on a published deep learning method for GWAS of soybeans and uses Kubeflow and AutoML on a Kubernetes instance.
+
+## **Proteomics**
+Proteomics is the study of the entire set of proteins in a cell, tissue, or organism, aiming to understand their structure, function, and interactions to uncover insights into biological processes and diseases. Although most primary proteomic analyses occur in proprietary software platforms, a lot of secondary analysis happens in Jupyter or R notebooks. We give several examples here:
+- Use BigQuery to run a Kruskal Wallis Test on Proteomics data using [these notebooks](https://github.com/isb-cgc/Community-Notebooks/tree/master/FeaturedNotebooks). Clone the repo into Vertex AI, or just drag the notebooks into a Vertex AI Workbench instance. In the notebook titled 'ACM_BCB_2020_POSTER_KruskalWallisTest_ProteinGeneExpression_vs_ClinicalFeatures.ipynb', the first BigQuery cell may throw an error, but ignore this and keep going; the rest of the notebook should run fine. Also, in that first big cell, make sure you add your Project ID. See this [doc](/docs/protein_setup.md) for environment setup instructions.
+- Run AlphaFold in Vertex AI using [this notebook](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/community-content/alphafold_on_workbench/AlphaFold.ipynb). Make sure you have a GPU for your notebook instance, and follow [these instructions](https://cloud.google.com/blog/products/ai-machine-learning/running-alphafold-on-vertexai) for setting up your environment. Namely, under Environment, select `Custom container`, and then for `Docker container image` paste in the following: `west1-docker.pkg.dev/cloud-devrel-public-resources/alphafold/alphafold-on-gcp:latest`.
+- Conduct secondary analysis of Proteomic data using this [NIGMS Sandbox notebook](https://github.com/NIGMS/ProteomicsUAMS), developed by the University of Arkansas for Medical Sciences.
+
+## **RNAseq and Transcriptome Assembly**
+RNA-seq analysis is a high-throughput sequencing method that allows the measurement and characterization of gene expression levels and transcriptome dynamics. Workflows are typically run using workflow managers, and final results can often be visualized in notebooks.
+- You can run this [Nextflow tutorial](https://nf-co.re/rnaseq/3.7) for RNAseq in a variety of ways on GCP. Following the instructions outlined above, you could use Compute Engine, [Life Sciences API](https://cloud.google.com/life-sciences/docs/tutorials/Nextflow), or a Vertex AI notebook.
+- For a notebook version of a complete RNAseq pipeline from Fastq to Salmon quantification go through these tutorials from the [NIGMS Sandbox Project](https://github.com/NIGMS/RNAseqUM) developed by The University of Maine.
+- Likewise, [this multi-omics module](https://github.com/NIGMS/MultiomicsUND) from the University of North Dakota includes an RNAseq component.
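
Pipelines that end in Salmon quantification produce a tab-separated `quant.sf` table with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`. A minimal sketch of ranking transcripts from such a table is shown here; the function name `top_transcripts` is hypothetical and the example values are made up purely for illustration.

```python
import csv
import io


def top_transcripts(quant_text: str, n: int = 10, field: str = "TPM"):
    """Return the n (name, value) pairs with the largest `field`, highest first."""
    reader = csv.DictReader(io.StringIO(quant_text), delimiter="\t")
    rows = [(row["Name"], float(row[field])) for row in reader]
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    # Made-up values; a real table would be read from a quant.sf file.
    example = (
        "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
        "gene_a\t900\t750.0\t120.5\t40\n"
        "gene_b\t1500\t1350.0\t300.0\t180\n"
        "gene_c\t600\t450.0\t0.0\t0\n"
    )
    for name, value in top_transcripts(example, n=2):
        print(name, value)
```

Passing `field="NumReads"` ranks by read count instead of TPM. The two orderings can differ because TPM is normalized for transcript length.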
+
+Transcriptome assembly is the process of reconstructing the complete set of RNA transcripts in a cell or tissue from fragmented sequencing data, providing valuable insights into gene expression and functional analysis.
+- [This module](https://github.com/NIGMS/rnaAssemblyMDI) developed by the MDI Biological Laboratory for the NIGMS Sandbox Project walks you through transcriptome assembly using Nextflow.
+
+## **Single Cell RNAseq**
+Single-cell RNA sequencing (scRNA-seq) is a technique that enables the analysis of gene expression at the individual cell level, providing insights into cellular heterogeneity, identifying rare cell types, and revealing cellular dynamics and functional states within complex biological systems.
+- This [NVIDIA blog](https://developer.nvidia.com/blog/accelerating-single-cell-genomic-analysis-using-rapids/) details how to run an accelerated scRNAseq pipeline using RAPIDS. You can find a link to the GitHub repository that has lots of example notebooks [here](https://github.com/clara-parabricks/rapids-single-cell-examples). For each example use case they show some nice benchmarking data with time and cost for each machine type. You will see that most runs cost less than $1.00 with GPU machines. Pay careful attention to the environment setup as there are a lot of dependencies for these notebooks.
+- The [Scanpy tutorials](https://scanpy.readthedocs.io/en/stable/tutorials.html) page has a lot of good CPU-based examples you could run in Vertex AI. Clone this [GitHub repo](https://github.com/scverse/scanpy-tutorials) to get the notebooks directly.
+- Alternatively, here is a [GitHub repository](https://github.com/mdozmorov/scRNA-seq_notes) with a curated list of scRNAseq resources and tutorials. We did not test these in Cloud Lab, but wanted to make them available in case you needed additional resources.
+
+## **ATACseq and Single Cell ATACseq**
+ATAC-seq is a technique that allows scientists to understand how DNA is packaged in cells by identifying the regions of DNA that are accessible and potentially involved in gene regulation.
+- [This module](https://github.com/NIGMS/atacseqUNMC) walks you through an ATACseq and single-cell ATACseq workflow on Google Cloud. The module was developed by the University of Nebraska Medical Center for the NIGMS Sandbox Project.
+
+## **Methylseq**
+As one of the most abundant and well-studied epigenetic modifications, DNA methylation plays an essential role in normal cell development and has various effects on transcription, genome stability, and DNA packaging within cells. Methylseq is a technique to identify methylated regions of the genome.
+- The University of Hawai'i at Manoa developed [this set of notebooks](https://github.com/NIGMS/MethylSeqUH) that walk you through a Methylseq analysis as part of the NIGMS Sandbox Program.
+
+## **Metagenomics**
+Metagenomics is the study of genetic material collected directly from environmental samples, enabling the exploration of microbial communities, their diversity, and their functional potential, without the need for laboratory culturing.
+- [This module](https://github.com/NIGMS/MetagenomicsUSD) walks you through conducting a metagenomic analysis using command line and Nextflow. The module was developed by the University of South Dakota as part of the NIGMS Sandbox Project.
+
+## **Multiomic Analysis and Biomarker Discovery**
+Multiomic analysis involves integrating data across modalities (e.g., genomic, transcriptomic, phenotypic) to generate additive insights.
+- [This set of notebooks](https://github.com/NIGMS/MultiomicsUND) gives you an example of conducting multiomic analysis in Jupyter notebooks and was developed by the University of North Dakota as part of the NIGMS Sandbox Project.
+
+Biomarker discovery is the process of identifying specific molecules or characteristics that can serve as indicators of biological processes, diseases, or treatment responses, aiding in diagnosis, prognosis, and personalized medicine. Biomarker discovery is typically conducted through comprehensive analysis of various types of data, such as genomics, proteomics, metabolomics, and clinical data, using advanced techniques including high-throughput screening, bioinformatics, and statistical analysis to identify patterns or signatures that differentiate between healthy and diseased individuals, or responders and non-responders to specific treatments.
+- [This module](https://github.com/NIGMS/BiomarkersURI), developed by the University of Rhode Island for the NIGMS Sandbox Project, walks you through conducting some common biomarker discovery analyses in R.
+
+## **BLAST+**
+NCBI BLAST (Basic Local Alignment Search Tool) is a widely used bioinformatics program provided by the National Center for Biotechnology Information (NCBI) that compares nucleotide or protein sequences against a large database to identify similar sequences and infer evolutionary relationships, functional annotations, and structural information.
+- This [Common Fund Data Ecosystem](https://training.nih-cfde.org/en/latest/Cloud-Platforms/Introduction-to-GCP/gcp3/) tutorial explains how to use basic BLAST on GCP.
+- We also rewrote [this ElasticBLAST tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-gcp.html) as a [notebook](/notebooks/elasticBLAST) that will work in Vertex AI.
+
+## **Long Read Sequence Analysis**
+Long read DNA sequence analysis involves analyzing sequencing reads typically longer than 10 thousand base pairs (bp) in length, compared with short read sequencing where reads are about 150 bp in length. Oxford Nanopore offers a fairly complete set of notebook tutorials for handling long read data, covering variant calling, RNAseq, SARS-CoV-2 analysis, and much more. You can find a list and description of notebooks [here](https://labs.epi2me.io/nbindex/), or clone the [GitHub repo](https://github.com/epi2me-labs). Note that these notebooks expect that you are running locally and accessing the epi2me notebook server. To run them in Cloud Lab, skip the first cell that connects to the server; the rest of the notebook should then run correctly, with a few tweaks.
+
+## **Drug Discovery**
+The [Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium](https://atomscience.org/) created a series of [Jupyter notebooks](https://github.com/ATOMScience-org/AMPL/tree/master/atomsci/ddm/examples/tutorials) that walk you through the ATOM approach to Drug Discovery.
+
+These notebooks were created to run in Google Colab, so if you run them in Google Cloud, you will need to make a few modifications. First, we recommend you use a [Google Managed Notebook](https://cloud.google.com/vertex-ai/docs/workbench/managed/introduction) rather than a User-Managed notebook, simply because the Google Managed notebooks already have TensorFlow and other dependencies installed. Be sure to attach a GPU to your instance (T4 is fine). You will also need to comment out `%tensorflow_version 2.x`, since that is a Colab-specific command, and `pip install` a few packages as needed. If you get errors with `deepchem`, try running `pip install --pre deepchem[tensorflow]` and/or `pip install --pre deepchem[torch]`. Some notebooks require a TensorFlow kernel, while others require PyTorch. You may also run into a Pandas error; if so, reach out to the ATOM GitHub developers for the best solution.
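+
+The Colab-specific magic mentioned above can also be commented out programmatically. The following is a minimal sketch, not part of the ATOM tutorials: the helper names `comment_out_colab_magics` and `patch_notebook_file` are hypothetical, and it assumes the notebooks are standard nbformat 4 JSON files.
+
+```python
+import json
+
+# Assumption: this is the only Colab-only magic to neutralize; extend as needed.
+COLAB_ONLY_MAGICS = ("%tensorflow_version",)
+
+
+def comment_out_colab_magics(notebook: dict) -> int:
+    """Comment out Colab-only magics in code cells; return the number of lines changed."""
+    changed = 0
+    for cell in notebook.get("cells", []):
+        if cell.get("cell_type") != "code":
+            continue
+        source = cell.get("source", [])
+        # nbformat allows "source" to be a single string or a list of lines.
+        if isinstance(source, str):
+            lines = source.splitlines(keepends=True)
+        else:
+            lines = list(source)
+        for i, line in enumerate(lines):
+            if line.lstrip().startswith(COLAB_ONLY_MAGICS):
+                lines[i] = "# " + line
+                changed += 1
+        # The source is always written back as a list of lines.
+        cell["source"] = lines
+    return changed
+
+
+def patch_notebook_file(path: str) -> int:
+    """Rewrite the notebook at `path` in place; return the number of lines changed."""
+    with open(path, encoding="utf-8") as handle:
+        notebook = json.load(handle)
+    changed = comment_out_colab_magics(notebook)
+    with open(path, "w", encoding="utf-8") as handle:
+        json.dump(notebook, handle, indent=1)
+    return changed
+```
+
+For example, `patch_notebook_file("some_tutorial.ipynb")` (a placeholder file name) would rewrite that notebook in place and return the number of lines it commented out.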
+
+## **Using Google Batch**
+You can interact with Google Batch directly to submit commands, or more commonly you can interact with it through orchestration engines like [Nextflow](https://www.Nextflow.io/docs/latest/google.html) and [Cromwell](https://cromwell.readthedocs.io/en/latest/backends/GCPBatch/), etc. We have tutorials that utilize Google Batch using [Nextflow](/notebooks/GoogleBatch/Nextflow) where we run the nf-core Methylseq pipeline, as well as several from the NIGMS Sandbox including [transcriptome assembly](https://github.com/NIGMS/rnaAssemblyMDI), [multiomics](https://github.com/NIGMS/MultiomicsUND), [methylseq](https://github.com/NIGMS/MethylSeqUH), and [metagenomics](https://github.com/NIGMS/MetagenomicsUSD).
+
+## **Using the Life Sciences API (deprecated)**
+__The Life Sciences API is deprecated on GCP and will no longer be available on the platform as of July 8, 2025;__ we recommend using Google Batch instead. For now, you can still interact with the Life Sciences API directly to submit commands, or more commonly you can interact with it through orchestration engines like [Snakemake](https://snakemake.readthedocs.io/en/v7.0.0/executor_tutorial/google_lifesciences.html). As of now, this workflow manager only supports the Life Sciences API.
+
+## **Public Data Sets**
+Google has a lot of public datasets available that you can use for your testing. These can be viewed [here](https://cloud.google.com/life-sciences/docs/resources/public-datasets) and can be accessed via [BigQuery](https://cloud.google.com/bigquery/public-data) or directly from the cloud bucket. For example, to view Phase 3 1k Genomes at the command line type `gsutil ls gs://genomics-public-data/1000-genomes-phase-3`.
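+
+The same listing can be done from Python. This is a hedged sketch rather than an official recipe: it assumes the `google-cloud-storage` package is installed, and the helper names `split_gs_uri` and `list_public_objects` are made up for illustration.
+
+```python
+from typing import List, Tuple
+
+
+def split_gs_uri(uri: str) -> Tuple[str, str]:
+    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
+    if not uri.startswith("gs://"):
+        raise ValueError(f"not a gs:// URI: {uri}")
+    bucket, _, prefix = uri[len("gs://"):].partition("/")
+    return bucket, prefix
+
+
+def list_public_objects(uri: str, limit: int = 10) -> List[str]:
+    """Return up to `limit` object names under a public gs:// prefix."""
+    # Imported here so split_gs_uri works even without the client library.
+    from google.cloud import storage
+
+    bucket, prefix = split_gs_uri(uri)
+    # An anonymous client is enough for publicly readable buckets.
+    client = storage.Client.create_anonymous_client()
+    blobs = client.list_blobs(bucket, prefix=prefix, max_results=limit)
+    return [blob.name for blob in blobs]
+```
+
+Calling `list_public_objects("gs://genomics-public-data/1000-genomes-phase-3")` should return the first few object names under that prefix, provided the bucket is still publicly readable.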
diff --git a/tutorials/notebooks/SRADownload/SRA-Download.ipynb b/tutorials/notebooks/SRADownload/SRA-Download.ipynb
new file mode 100644
index 0000000..3e6435e
--- /dev/null
+++ b/tutorials/notebooks/SRADownload/SRA-Download.ipynb
@@ -0,0 +1,455 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "1651316c",
+ "metadata": {},
+ "source": [
+ "# Download sequence data from the NCBI Sequence Read Archive (SRA)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "15022f97",
+ "metadata": {},
+ "source": [
+ "## Overview\n",
+ "\n",
+ "DNA sequence data are typically deposited into the NCBI Sequence Read Archive, and can be accessed through the SRA website, or via a collection of command line tools called SRA Toolkit. Individual sequence entries are assigned an Accession ID, which can be used to find and download a particular file. For example, if you go to the [SRA database](https://www.ncbi.nlm.nih.gov/sra) in a browser window, and search for `SRX15695630`, you should see an entry for _C. elegans_. Alternatively, you can search the SRA metadata dataset in BigQuery to generate a list of accession numbers. Here we are going to generate a list of accessions using BigQuery, use tools from the SRA Toolkit to download a few fastq files, then copy those fastq files to a cloud bucket."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3b500763",
+ "metadata": {},
+ "source": [
+ "### 1) Install Dependencies"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "01213dae",
+ "metadata": {},
+ "source": [
+ "Install dependencies, including mamba (you could also use conda). At the time of writing, the version of SRA tools available with the Anaconda distribution was v.2.11.0. If you want to install the latest version, download and install from [here](https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit). If you do the direct install, you will also need to configure the toolkit interactively following [this guide](https://github.com/ncbi/sra-tools/wiki/05.-Toolkit-Configuration); you can do that by opening a terminal and running the commands there."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2fd7a16a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "!bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "ada33d36-b24a-4a9b-837e-ec42075ac440",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8f7e349b",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! mamba install -c bioconda -c conda-forge sra-tools==2.11.0 -y"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b8c010c2",
+ "metadata": {},
+ "source": [
+ "Test that your install works and that fasterq-dump is available in your path"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6e507538",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!fasterq-dump -h"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7d8d3512-5307-42f7-9405-495fe1ca5be2",
+ "metadata": {},
+ "source": [
+ "### 2) Setup Directory Structure"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e3265f75-ebf7-4d90-8f4d-a486df5cb693",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pwd"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f94e4215",
+ "metadata": {},
+ "source": [
+ "Set up your directory structure for the raw fastq data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9aa93698-a082-4c11-9d48-0abe775fbcc5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! mkdir -p data data/fasterqdump/raw_fastq data/prefetch_fasterqdump/raw_fastq"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b068c9da-7814-4b24-9ff8-12473048bdcf",
+ "metadata": {},
+ "source": [
+ "### 3) Create Accession List using BigQuery"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bc30c134-c903-4605-82d4-babdaaae30c0",
+ "metadata": {},
+ "source": [
+ "Here we use BigQuery to generate a list of accessions. You can also generate a manual list by searching the [SRA Database](https://www.ncbi.nlm.nih.gov/sra) and saving to a file or list."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d2c3ec94-16cd-43b6-a4c9-d56aa593382e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the BigQuery API\n",
+ "from google.cloud import bigquery\n",
+ "import pandas"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "243fb23b-ec7e-423f-a531-9f39a1954087",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Designate the client for the API\n",
+ "client = bigquery.Client(location=\"US\")\n",
+ "print(\"Client created using default project: {}\".format(client.project))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "df380099-d6ee-43b5-8719-66b7a7b41916",
+ "metadata": {},
+ "source": [
+ "Let's download bacterial samples, one of which happens to come from a swab of a sea horse. You can change the SQL query as you like; feel free to take a look at the generated df, and then play with different parameters. For more inspiration, look at this [SRA tutorial](https://www.ncbi.nlm.nih.gov/sra/docs/sra-bigquery-examples/) or our other [BigQuery notebook](https://github.com/STRIDES/NIHCloudLabGCP/tree/main/tutorials/notebooks/SRA_and_BigQuery)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2eb8d3f3-6a18-40a0-b9fe-7ab4c285b7db",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query = \"\"\"\n",
+ "#standardSQL\n",
+ "SELECT *\n",
+ "FROM `nih-sra-datastore.sra.metadata`\n",
+ "WHERE organism = 'Mycobacteroides chelonae' \n",
+ "limit 3\n",
+ "\"\"\"\n",
+ "query_job = client.query(\n",
+ " query,\n",
+ " # Location must match that of the dataset(s) referenced in the query.\n",
+ " location=\"US\",\n",
+ ") # API request - starts the query\n",
+ "\n",
+ "df = query_job.to_dataframe()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e049fdce-7ed7-43f4-a4d5-6811dd590dd5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4dfe9c32-817b-4e9c-901f-d27b98b95db2",
+ "metadata": {},
+ "source": [
+ "As you can see, most of what you need to know is shown in this data frame. If you wanted to show just the accession, you could replace the * with acc in the SELECT command. One other thing to think about is how large these files are, and whether you have space on your VM to download them. You can figure this out by looking at the 'jattr' column, converting the number of bytes to GB, and then adding that up for a few samples to get a ballpark figure. If you need more space, stop the VM, go to Compute Engine, and either [resize your disk](https://cloud.google.com/compute/docs/disks/resize-persistent-disk) or add a disk. You can see the amount of space on your disk from the command line using `!df -h .`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b7ca8b00-a467-4e70-a63e-faf6c53f6b6f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df['jattr'][0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ab7d464a-ea2d-42d8-bb01-ee601e7a68cb",
+ "metadata": {},
+ "source": [
+ "You can also get the same info using `vdb-dump --info ` although that command does not always work as expected. You can also get the path for the sra compressed file in a bucket using `srapath `."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f98a04b7-4bd5-40e5-9860-65c0c0d48283",
+ "metadata": {},
+ "source": [
+ "Save our accession list to a text file"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2693f4f3-4ebf-41d7-95d6-8a23456897d4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "with open('list_of_accessionIDS.txt', 'w') as f:\n",
+ " accs = df['acc'].to_string(header=False, index=False)\n",
+ " f.write(accs)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d5bda88c-1263-497f-bb9c-b949a8ff5272",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "cat list_of_accessionIDS.txt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a53f13c3-6b62-4408-84d8-ebad27c2eedb",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### 4) Download FASTQ Files with fasterq dump"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c5e261a3-54f4-4c30-8aee-4808afdc6251",
+ "metadata": {},
+ "source": [
+ "Fasterq-dump is the replacement for the legacy fastq-dump tool. You can read [this guide](https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump) to see the full details on this tool. You can also run `fasterq-dump -h` to see most of the options."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ea5f7b67",
+ "metadata": {},
+ "source": [
+ "Fasterq-dump doesn't run in batch mode, so one way to run a command on multiple samples is by using a for loop. There are many options you can explore, but here we are running -O for outdir, -e for the number of threads, -m for memory (4GB), and --location for the location we want to retrieve the file from. Depending on the type of cloud storage, it may be faster to select `NCBI` for the location. You may consider running a few tests with one or two of your accession numbers before downloading a whole batch. The default number of threads is 6, so adjust -e based on your machine size. For large files, you may also benefit from a machine type with more memory and/or threads. You may need to stop this VM, resize it, then restart and come back. There are also a bunch of ways to split your fastq files (defined [here](https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump)), but the default of `--split-3` will split into forward, reverse, and unpaired reads."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "96307376",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! for x in `cat list_of_accessionIDS.txt`; do fasterq-dump -f -O data/fasterqdump/raw_fastq -e 8 -m 4G $x ; done"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f2e26c03-eeb9-4340-89e7-1eb82c9e32bb",
+ "metadata": {},
+ "source": [
+ "### 5) Download FASTQ files with prefetch + fasterq dump"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "00ef5310-539e-4d21-bd5d-05bf41986c47",
+ "metadata": {},
+ "source": [
+ "Using the example bacterial data, fasterq-dump takes about 3.5 min to download the files. Under the hood, fasterq-dump is pulling the compressed sra files from the database and converting them on the fly, which is somewhat slow because it has to do a lot over the network. A better, though less advertised, method is to split these functions: use prefetch to pull the compressed files, then fasterq-dump to convert them locally rather than over the network. For this to work, you need to either give the path to the prefetch directories in your text file, or make sure you cd into the raw_fastq dir so that it can find those directories with the .sra files. In this case, --location GCP is a lot faster than NCBI, but feel free to run your own tests with different locations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "81e5fd57-2cb0-42f7-88c5-4f2f7ea4a1ea",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! prefetch --option-file list_of_accessionIDS.txt -O data/prefetch_fasterqdump/raw_fastq/ -f yes"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2dafb175-67ea-4b70-a4f5-5e98737cece2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ls data/prefetch_fasterqdump/raw_fastq/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fe02bb95-c9b0-494a-a187-c1b955f2788e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%%time\n",
+ "! for x in `cat list_of_accessionIDS.txt`; do fasterq-dump -f -O data/prefetch_fasterqdump/raw_fastq/ -e 8 -m 4G data/prefetch_fasterqdump/raw_fastq/$x; done"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1b5b0c8-ffb0-4e44-9178-f80e701d16cb",
+ "metadata": {},
+ "source": [
+ "Comparing the two methods, we can see that fasterq-dump on its own took 3.5 min, whereas prefetch + fasterq-dump takes less than 40 seconds."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22d262cb-d49f-4e33-b461-e4f5b9c778b7",
+ "metadata": {},
+ "source": [
+ "### 6) Copy Files to a Bucket"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1db9e5a6-db09-4923-a9df-feb2cd6d5e13",
+ "metadata": {},
+ "source": [
+ "Create a new bucket, or give the path to an existing bucket"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a6703fae-2752-4717-93be-fd8d3e0b41d6",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil mb gs://cloud-lab-tutorials_sra/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6c8d5b92-80dc-4d99-8f6f-88200eb98815",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "ls data/prefetch_fasterqdump/raw_fastq/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9ec2ea15-b0fb-4ce2-8093-972d348c988c",
+ "metadata": {},
+ "source": [
+ "Using `-m` allows multithreading, `-r` would allow for recursive copy of a directory, but here we are just giving the path to fastq files."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a17ff514-bbac-4974-afeb-9eb847ba857f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil -m cp data/prefetch_fasterqdump/raw_fastq/*.fastq gs://cloud-lab-tutorials_sra/raw_fastq/"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3d97b47b-830d-49ee-8add-c2c9fd4e41d4",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! gsutil ls gs://cloud-lab-tutorials_sra/raw_fastq/"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e8979484-117f-465d-b134-36372a8c8bfc",
+ "metadata": {},
+ "source": [
+ "### 7) Clean up\n",
+ "Make sure you shut down this VM, or delete it if you don't plan to use it further."
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-11.m110",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m110"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/SpleenLiverSegmentation/README.md b/tutorials/notebooks/SpleenLiverSegmentation/README.md
new file mode 100644
index 0000000..976552f
--- /dev/null
+++ b/tutorials/notebooks/SpleenLiverSegmentation/README.md
@@ -0,0 +1,49 @@
+# Spleen Segmentation with Liver Example using NVIDIA Models and MONAI
+_We have put together a training example that segments the Spleen in 3D CT Images. At the end is an example of combining both the Spleen model and the Liver model._
+
+*NVIDIA has changed some of the models used in this tutorial, so it may crash. If you have issues, try commenting out the liver model; we are working on a patch.*
+
+## Introduction
+Two pre-trained models from NVIDIA are used in this training: a Spleen model and a Liver model.
+The Spleen model is additionally retrained on the medical decathlon spleen dataset: [http://medicaldecathlon.com/](http://medicaldecathlon.com/)
+You do not need to download the data before running the notebook; the notebook downloads the data as it runs.
+The notebook uses the Python package [MONAI](https://monai.io/), the Medical Open Network for Artificial Intelligence.
+
+- Spleen Model - [clara_pt_spleen_ct_segmentation_V2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/monaitoolkit/models/monai_spleen_ct_segmentation)
+- Liver Model - [clara_pt_liver_and_tumor_ct_segmentation_V1]()
+
+## Outcomes
+After following along with this notebook the user will be familiar with:
+- Downloading public datasets using MONAI
+- Using MONAI transformations for training
+- Downloading a pretrained NVIDIA Clara model using MONAI
+- Retraining a model using MONAI
+- Visualizing medical images in python/matplotlib
+
+## Installing MONAI
+Please follow the [instructions](https://monai.io/started.html#installation) on MONAI's website for up-to-date installation steps.
+Installing MONAI in a notebook environment can be completed with the commands:
+- `!python -c "import monai" || pip install -q 'monai[all]'`
+- `!python -c "import matplotlib" || pip install -q matplotlib`
+
+## Dependencies
+_It is recommended to use an NVIDIA GPU for training. If you do not have access to an NVIDIA GPU, we recommend skipping the training cells._
+
+The following packages and versions were installed during the testing of this notebook:
+- MONAI version: 0.8.1
+- Numpy version: 1.21.1
+- Pytorch version: 1.9.0
+- Pytorch Ignite version: 0.4.8
+- Nibabel version: 3.2.1
+- scikit-image version: 0.18.2
+- Pillow version: 8.3.1
+- Tensorboard version: 2.5.0
+- gdown version: 3.13.0
+- TorchVision version: 0.10.0+cu111
+- tqdm version: 4.61.2
+- lmdb version: 1.2.1
+- psutil version: 5.8.0
+- pandas version: 1.3.0
+- einops version: 0.3.0
+- transformers version: 4.18.0
+- mlflow version: 1.25.1
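+
+To see how your environment compares with the tested versions above, a small check like the following can help. This is only a sketch: the helper names are hypothetical, the dictionary covers just a few of the packages listed, and the keys assume the usual PyPI distribution names.
+
+```python
+from importlib import metadata
+from typing import Callable, Dict
+
+# A subset of the versions listed above; keys are assumed PyPI distribution names.
+TESTED_VERSIONS = {
+    "monai": "0.8.1",
+    "numpy": "1.21.1",
+    "torch": "1.9.0",
+    "pytorch-ignite": "0.4.8",
+    "nibabel": "3.2.1",
+}
+
+
+def installed_version(name: str) -> str:
+    """Return the installed version of a distribution, or 'not installed'."""
+    try:
+        return metadata.version(name)
+    except metadata.PackageNotFoundError:
+        return "not installed"
+
+
+def version_mismatches(
+    tested: Dict[str, str],
+    lookup: Callable[[str], str] = installed_version,
+) -> Dict[str, str]:
+    """Map each package whose found version differs from the tested one to what was found."""
+    found = {name: lookup(name) for name in tested}
+    return {name: ver for name, ver in found.items() if ver != tested[name]}
+```
+
+Running `version_mismatches(TESTED_VERSIONS)` returns an empty dictionary when everything matches. The comparison is an exact string match, so a build suffix such as `+cu111` will be reported as a mismatch.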
diff --git a/tutorials/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb b/tutorials/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb
new file mode 100644
index 0000000..48b8141
--- /dev/null
+++ b/tutorials/notebooks/SpleenLiverSegmentation/SpleenSeg_Pretrained-4_27.ipynb
@@ -0,0 +1,2002 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "1452463e",
+ "metadata": {},
+ "source": [
+ "## Spleen Model With NVIDIA Pretrain\n",
+ "- Uses Unet architecture\n",
+ "- Pretrained model at: https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_spleen_ct_segmentation"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f59ba435",
+ "metadata": {},
+ "source": [
+ "##### Uncomment below to install all dependencies"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "82db674f",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#!pip install 'monai[all]'\n",
+ "#!pip install matplotlib "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "id": "bb1228b3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "id": "540e5d47",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# MONAI version: 0.6.0+38.gf6ad4ba5\n",
+ "# Numpy version: 1.21.1\n",
+ "# Pytorch version: 1.9.0\n",
+ "# Pytorch Ignite version: 0.4.5\n",
+ "# Nibabel version: 3.2.1\n",
+ "# scikit-image version: 0.18.2\n",
+ "# Pillow version: 8.3.1\n",
+ "# Tensorboard version: 2.5.0\n",
+ "# gdown version: 3.13.0\n",
+ "# TorchVision version: 0.10.0+cu111\n",
+ "# tqdm version: 4.61.2\n",
+ "# lmdb version: 1.2.1\n",
+ "# psutil version: 5.8.0\n",
+ "# pandas version: 1.3.0\n",
+ "# einops version: 0.3.0"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "id": "07510582",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MONAI version: 0.8.1\n",
+ "Numpy version: 1.21.1\n",
+ "Pytorch version: 1.9.0\n",
+ "MONAI flags: HAS_EXT = False, USE_COMPILED = False\n",
+ "MONAI rev id: 71ff399a3ea07aef667b23653620a290364095b1\n",
+ "\n",
+ "Optional dependencies:\n",
+ "Pytorch Ignite version: 0.4.8\n",
+ "Nibabel version: 3.2.1\n",
+ "scikit-image version: 0.18.2\n",
+ "Pillow version: 8.3.1\n",
+ "Tensorboard version: 2.5.0\n",
+ "gdown version: 3.13.0\n",
+ "TorchVision version: 0.10.0+cu111\n",
+ "tqdm version: 4.61.2\n",
+ "lmdb version: 1.2.1\n",
+ "psutil version: 5.8.0\n",
+ "pandas version: 1.3.0\n",
+ "einops version: 0.3.0\n",
+ "transformers version: 4.18.0\n",
+ "mlflow version: 1.25.1\n",
+ "\n",
+ "For details about installing the optional dependencies, please visit:\n",
+ " https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import tempfile\n",
+ "import glob\n",
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "#import plotly.graph_objects as go\n",
+ "import torch\n",
+ "import numpy as np\n",
+ "\n",
+ "from monai.apps import download_and_extract\n",
+ "from monai.networks.nets import UNet\n",
+ "from monai.networks.layers import Norm\n",
+ "from monai.losses import DiceFocalLoss\n",
+ "from monai.metrics import DiceMetric\n",
+ "from monai.inferers import sliding_window_inference\n",
+ "from monai.data import (\n",
+ " LMDBDataset,\n",
+ " DataLoader,\n",
+ " decollate_batch,\n",
+ " ImageDataset,\n",
+ " Dataset\n",
+ ")\n",
+ "from monai.apps import load_from_mmar\n",
+ "from monai.transforms import (\n",
+ " AsDiscrete,\n",
+ " EnsureChannelFirstd,\n",
+ " Compose,\n",
+ " LoadImaged,\n",
+ " ScaleIntensityRanged,\n",
+ " Spacingd,\n",
+ " Orientationd,\n",
+ " CropForegroundd,\n",
+ " RandCropByPosNegLabeld,\n",
+ " RandAffined,\n",
+ " RandRotated,\n",
+ " EnsureType,\n",
+ " EnsureTyped,\n",
+ ")\n",
+ "from monai.utils import first, set_determinism\n",
+ "from monai.apps.mmars import RemoteMMARKeys\n",
+ "from monai.config import print_config\n",
+ "\n",
+ "print_config()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6f523cbf",
+ "metadata": {},
+ "source": [
+ "#### Running a pretrained model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "id": "0be7401d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "PRETRAINED = True"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e9f3e5f3",
+ "metadata": {},
+ "source": [
+ "#### Create the directory for storing data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "id": "311c3282",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "monai_data/\n"
+ ]
+ }
+ ],
+ "source": [
+ "directory = \"monai_data/\"\n",
+ "root_dir = tempfile.mkdtemp() if directory is None else directory\n",
+ "print(root_dir)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38463a18",
+ "metadata": {},
+ "source": [
+ "#### Download the public dataset"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "da7cfede",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2022-04-27 14:49:41,401 - INFO - Verified 'Task09_Spleen.tar', md5: 410d4a301da4e5b2f6f86ec3ddba524e.\n",
+ "2022-04-27 14:49:41,402 - INFO - File exists: monai_data/Task09_Spleen.tar, skipped downloading.\n",
+ "2022-04-27 14:49:41,403 - INFO - Non-empty folder exists in monai_data/Task09_Spleen, skipped extracting.\n"
+ ]
+ }
+ ],
+ "source": [
+ "resource = \"https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar\"\n",
+ "md5 = \"410d4a301da4e5b2f6f86ec3ddba524e\"\n",
+ "\n",
+ "compressed_file = os.path.join(root_dir, \"Task09_Spleen.tar\")\n",
+ "download_and_extract(resource, compressed_file, root_dir, md5)\n",
+ "data_dir = os.path.join(root_dir, \"Task09_Spleen\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fae7c51b",
+ "metadata": {},
+ "source": [
+ "#### Create Data Dictionaries and separate files for training and validation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "id": "2515b177",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_images = sorted(\n",
+ " glob.glob(os.path.join(data_dir, \"imagesTr\", \"*.nii.gz\")))\n",
+ "train_labels = sorted(\n",
+ " glob.glob(os.path.join(data_dir, \"labelsTr\", \"*.nii.gz\")))\n",
+ "data_dicts = [\n",
+ " {\"image\": image_name, \"label\": label_name}\n",
+ " for image_name, label_name in zip(train_images, train_labels)\n",
+ "]\n",
+ "train_files, val_files = data_dicts[:-9], data_dicts[-9:]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "974fc5aa",
+ "metadata": {},
+ "source": [
+ "#### Define your transformations for training and validation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "2357d35d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "train_transforms = Compose( #Transformations for training dataset\n",
+ " [\n",
+ " LoadImaged(keys=[\"image\", \"label\"]), #Load dictionary based images and labels\n",
+ " EnsureChannelFirstd(keys=[\"image\", \"label\"]), #Ensures the first dimension of each image is the channel dimension\n",
+ " Spacingd(keys=[\"image\", \"label\"], pixdim=( #Change spacing of voxels to be same across images\n",
+ " 1.5, 1.5, 2.0), mode=(\"bilinear\", \"nearest\")),\n",
+ " Orientationd(keys=[\"image\", \"label\"], axcodes=\"RAS\"), #Correct the orientation of images (Right, Anterior, Superior)\n",
+ " ScaleIntensityRanged( #Scale intensity of all images (For images only and not labels)\n",
+ " keys=[\"image\"], a_min=-57, a_max=164,\n",
+ " b_min=0.0, b_max=1.0, clip=True,\n",
+ " ),\n",
+ " CropForegroundd(keys=[\"image\", \"label\"], source_key=\"image\"), #Crop foreground of image\n",
+ " RandCropByPosNegLabeld( #Randomly crop fixed sized region\n",
+ " keys=[\"image\", \"label\"],\n",
+ " label_key=\"label\",\n",
+ " spatial_size=(96, 96, 96),\n",
+ " pos=1,\n",
+ " neg=1,\n",
+ " num_samples=4,\n",
+ " image_key=\"image\",\n",
+ " image_threshold=0,\n",
+ " ),\n",
+ " RandAffined( #Do a random affine transformation with some probability\n",
+ " keys=['image', 'label'],\n",
+ " mode=('bilinear', 'nearest'),\n",
+ " prob=0.5,\n",
+ " spatial_size=(96, 96, 96),\n",
+ " rotate_range=(np.pi/18, np.pi/18, np.pi/5),\n",
+ " scale_range=(0.05, 0.05, 0.05)\n",
+ " ),\n",
+ " EnsureTyped(keys=[\"image\", \"label\"]),\n",
+ " ]\n",
+ ")\n",
+ "val_transforms = Compose( #Transformations for validation dataset\n",
+ " [\n",
+ " LoadImaged(keys=[\"image\", \"label\"]),\n",
+ " EnsureChannelFirstd(keys=[\"image\", \"label\"]),\n",
+ " Spacingd(keys=[\"image\", \"label\"], pixdim=(\n",
+ " 1.5, 1.5, 2.0), mode=(\"bilinear\", \"nearest\")),\n",
+ " Orientationd(keys=[\"image\", \"label\"], axcodes=\"RAS\"),\n",
+ " ScaleIntensityRanged(\n",
+ " keys=[\"image\"], a_min=-57, a_max=164,\n",
+ " b_min=0.0, b_max=1.0, clip=True,\n",
+ " ),\n",
+ " RandRotated(\n",
+ " keys=['image', 'label'],\n",
+ " mode=('bilinear', 'nearest'),\n",
+ " range_x=np.pi/18,\n",
+ " range_y=np.pi/18,\n",
+ " range_z=np.pi/5,\n",
+ " prob=1.0,\n",
+ " padding_mode=('reflection', 'reflection'),\n",
+ " ),\n",
+ " CropForegroundd(keys=[\"image\", \"label\"], source_key=\"image\"),\n",
+ " EnsureTyped(keys=[\"image\", \"label\"]),\n",
+ " ]\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "ada5757a",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[{'image': 'monai_data/Task09_Spleen/imagesTr/spleen_56.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_56.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_59.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_59.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_6.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_6.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_60.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_60.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_61.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_61.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_62.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_62.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_63.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_63.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_8.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_8.nii.gz'},\n",
+ " {'image': 'monai_data/Task09_Spleen/imagesTr/spleen_9.nii.gz',\n",
+ " 'label': 'monai_data/Task09_Spleen/labelsTr/spleen_9.nii.gz'}]"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "val_files"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ba3c7695",
+ "metadata": {},
+ "source": [
+ "#### Visualize Image and Label (example)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "689eea4e",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "image shape: torch.Size([239, 239, 113]), label shape: torch.Size([239, 239, 113])\n"
+ ]
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "check_ds = Dataset(data=val_files, transform=val_transforms)\n",
+ "check_loader = DataLoader(check_ds, batch_size=1)\n",
+ "check_data = first(check_loader)\n",
+ "image, label = (check_data[\"image\"][0][0], check_data[\"label\"][0][0])\n",
+ "print(f\"image shape: {image.shape}, label shape: {label.shape}\")\n",
+ "# plot the slice [:, :, 80]\n",
+ "plt.figure(\"check\", (12, 6))\n",
+ "plt.subplot(1, 2, 1)\n",
+ "plt.title(\"image\")\n",
+ "plt.imshow(image[:, :, 80], cmap=\"gray\")\n",
+ "plt.subplot(1, 2, 2)\n",
+ "plt.title(\"label\")\n",
+ "plt.imshow(label[:, :, 80])\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f45ba707",
+ "metadata": {},
+ "source": [
+ "#### Use a DataLoader to load files\n",
+ " - `LMDBDataset` caches the results of the deterministic transforms in an LMDB (Lightning Memory-Mapped Database) file on disk\n",
+ " - The transforms are applied at this stage, to both images and labels"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "id": "fe3285d0",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 32/32 [00:00<00:00, 57113.93it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accessing lmdb file: /home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 32/32 [00:00<00:00, 47679.48it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'map_addr': 0, 'map_size': 1099511627776, 'last_pgno': 941102, 'last_txnid': 100, 'max_readers': 126, 'num_readers': 0, 'size': 32, 'filename': '/home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb'}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 9/9 [00:00<00:00, 10999.05it/s]\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accessing lmdb file: /home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 9/9 [00:00<00:00, 17739.07it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'map_addr': 0, 'map_size': 1099511627776, 'last_pgno': 941102, 'last_txnid': 100, 'max_readers': 126, 'num_readers': 0, 'size': 9, 'filename': '/home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb'}\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "train_ds = LMDBDataset(data=train_files, transform=train_transforms, cache_dir=root_dir)\n",
+ "# initialize cache and print meta information\n",
+ "print(train_ds.info())\n",
+ "\n",
+ "# use batch_size=2 to load images and use RandCropByPosNegLabeld\n",
+ "# to generate 2 x 4 images for network training\n",
+ "train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=2)\n",
+ "\n",
+ "# the validation data loader will be created on the fly to ensure\n",
+ "# a deterministic validation set for demo purposes.\n",
+ "val_ds = LMDBDataset(data=val_files, transform=val_transforms, cache_dir=root_dir)\n",
+ "# initialize cache and print meta information\n",
+ "print(val_ds.info())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "id": "455cbcdc",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "{'map_addr': 0, 'map_size': 1099511627776, 'last_pgno': 941102, 'last_txnid': 100, 'max_readers': 126, 'num_readers': 0, 'size': 32, 'filename': '/home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb'}\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(train_ds.info())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a77e7856",
+ "metadata": {},
+ "source": [
+ "#### Now we want to download the pretrained model from NVIDIA"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "id": "8539fb7d",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mmar = {\n",
+ " RemoteMMARKeys.ID: \"clara_pt_spleen_ct_segmentation_1\",\n",
+ " RemoteMMARKeys.NAME: \"clara_pt_spleen_ct_segmentation\",\n",
+ " RemoteMMARKeys.FILE_TYPE: \"zip\",\n",
+ " RemoteMMARKeys.HASH_TYPE: \"md5\",\n",
+ " RemoteMMARKeys.HASH_VAL: None,\n",
+ " RemoteMMARKeys.MODEL_FILE: os.path.join(\"models\", \"model.pt\"),\n",
+ " RemoteMMARKeys.CONFIG_FILE: os.path.join(\"config\", \"config_train.json\"),\n",
+ " RemoteMMARKeys.VERSION: 2,\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "id": "de7fb262",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'clara_pt_spleen_ct_segmentation'"
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "mmar['name']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "id": "bf96f9f9",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "using a pretrained model.\n",
+ "2022-04-27 14:49:45,704 - INFO - Expected md5 is None, skip md5 check for file monai_data/clara_pt_spleen_ct_segmentation_2.zip.\n",
+ "2022-04-27 14:49:45,705 - INFO - File exists: monai_data/clara_pt_spleen_ct_segmentation_2.zip, skipped downloading.\n",
+ "2022-04-27 14:49:45,706 - INFO - Non-empty folder exists in monai_data/clara_pt_spleen_ct_segmentation, skipped extracting.\n",
+ "2022-04-27 14:49:45,707 - INFO - \n",
+ "*** \"clara_pt_spleen_ct_segmentation\" available at monai_data/clara_pt_spleen_ct_segmentation.\n",
+ "2022-04-27 14:49:49,353 - INFO - *** Model: \n",
+ "2022-04-27 14:49:49,400 - INFO - *** Model params: {'dimensions': 3, 'in_channels': 1, 'out_channels': 2, 'channels': [16, 32, 64, 128, 256], 'strides': [2, 2, 2, 2], 'num_res_units': 2, 'norm': 'batch'}\n",
+ "2022-04-27 14:49:49,411 - INFO - \n",
+ "---\n",
+ "2022-04-27 14:49:49,412 - INFO - For more information, please visit https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_spleen_ct_segmentation\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\") #torch.device(\"cpu\")\n",
+ "if PRETRAINED:\n",
+ "    print(\"using a pretrained model.\")\n",
+ "    try: #MONAI=0.8\n",
+ "        unet_model = load_from_mmar(\n",
+ "            item=mmar['name'],\n",
+ "            mmar_dir=root_dir,\n",
+ "            map_location=device,\n",
+ "            version=mmar['version'],\n",
+ "            pretrained=True)\n",
+ "    except Exception: #MONAI<0.8\n",
+ "        unet_model = load_from_mmar(\n",
+ "            mmar,\n",
+ "            mmar_dir=root_dir,\n",
+ "            map_location=device,\n",
+ "            pretrained=True)\n",
+ "    model = unet_model\n",
+ "else:\n",
+ "    print(\"using a randomly initialized model.\")\n",
+ "    model = UNet(\n",
+ "        dimensions=3,\n",
+ "        in_channels=1,\n",
+ "        out_channels=2,\n",
+ "        channels=(16, 32, 64, 128, 256),\n",
+ "        strides=(2, 2, 2, 2),\n",
+ "        num_res_units=2,\n",
+ "        norm=Norm.BATCH,\n",
+ "    )\n",
+ "\n",
+ "model = model.to(device)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "39910557",
+ "metadata": {},
+ "source": [
+ "### Select a test file to use as a reference\n",
+ " - We use it to see how the initial (pretrained) model performs before any further training"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "4be7eb8f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "100%|██████████| 1/1 [00:00<00:00, 4639.72it/s]"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Accessing lmdb file: /home/jupyter/covid19det-kaggle/kaggle/MonaiTesting/monai_data/monai_cache.lmdb.\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "test_file = data_dicts[20:21]\n",
+ "test_ds = LMDBDataset(data=test_file, transform=None, cache_dir=root_dir)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2544a774",
+ "metadata": {},
+ "source": [
+ "#### We use sliding-window inference to run the model over the whole image in overlapping patches"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "id": "16fd4e94",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_classes=2\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
+ "model.eval()\n",
+ "with torch.no_grad():\n",
+ "    for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
+ "        test_inputs, test_labels = (\n",
+ "            data[\"image\"].to(device),\n",
+ "            data[\"label\"].to(device),\n",
+ "        )\n",
+ "        roi_size = (160, 160, 160)\n",
+ "        sw_batch_size = 4\n",
+ "        test_outputs = sliding_window_inference(\n",
+ "            test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ "        test_outputspre = [post_pred(i) for i in decollate_batch(test_outputs)] # Decollate our results\n",
+ "        test_labelspre = [post_label(i) for i in decollate_batch(test_labels)]"
+ ]
+ },
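+ {
+ "cell_type": "markdown",
+ "id": "sw-inference-sketch",
+ "metadata": {},
+ "source": [
+ "To make the sliding-window step above more concrete, here is a minimal NumPy sketch of the idea: cut the volume into overlapping patches, run a predictor on each patch, and average the predictions where patches overlap. This is only an illustration and not MONAI's implementation. The names `window_starts`, `sliding_window_predict`, and the toy `predictor` are hypothetical, and MONAI's `sliding_window_inference` additionally handles padding, batching of windows (`sw_batch_size`), and optional Gaussian blending.\n",
+ "\n",
+ "```python\n",
+ "import itertools\n",
+ "import numpy as np\n",
+ "\n",
+ "def window_starts(size, roi, overlap):\n",
+ "    # Start indices along one axis so that windows of length roi cover size\n",
+ "    if size <= roi:\n",
+ "        return [0]\n",
+ "    step = max(int(roi * (1 - overlap)), 1)\n",
+ "    starts = list(range(0, size - roi, step))\n",
+ "    starts.append(size - roi)  # last window sits flush with the edge\n",
+ "    return starts\n",
+ "\n",
+ "def sliding_window_predict(volume, roi_size, predictor, overlap=0.5):\n",
+ "    # Sum the patch predictions and count how often each voxel was visited\n",
+ "    total = np.zeros(volume.shape, dtype=np.float64)\n",
+ "    count = np.zeros(volume.shape, dtype=np.float64)\n",
+ "    axes = [window_starts(s, r, overlap) for s, r in zip(volume.shape, roi_size)]\n",
+ "    for corner in itertools.product(*axes):\n",
+ "        window = tuple(slice(c, c + r) for c, r in zip(corner, roi_size))\n",
+ "        total[window] += predictor(volume[window])\n",
+ "        count[window] += 1\n",
+ "    return total / count  # average where windows overlap\n",
+ "\n",
+ "# Toy check: an element-wise predictor must survive the averaging unchanged\n",
+ "demo = np.random.rand(20, 20, 12)\n",
+ "result = sliding_window_predict(demo, (8, 8, 8), lambda patch: patch * 2.0)\n",
+ "```\n",
+ "\n",
+ "With `overlap=0.5`, as in the cell above, neighbouring windows share half of their extent along each axis, which smooths the seams between patches."
+ ]
+ },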
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "id": "9782ec96",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 19,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Actual Spleen')\n",
+ "plt.imshow(test_labelspre[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Actual spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "id": "76cd38e6",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 20,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Pretrained Calculated Spleen')\n",
+ "plt.imshow(test_outputspre[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') #Pretrained model spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "id": "65c68242",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 21,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Differences Between Actual and Model')\n",
+ "pretraineddif = test_labelspre[0].cpu().numpy()[1][:,:,200] - test_outputspre[0].cpu().numpy()[1][:,:,200]\n",
+ "plt.imshow(pretraineddif, cmap='Greys_r') #Differences"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2f60e5b5",
+ "metadata": {},
+ "source": [
+ "#### The pretrained model alone already appears to segment the spleen well\n",
+ " - We can now continue training on our data, starting from the NVIDIA model's initial weights"
+ ]
+ },
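+ {
+ "cell_type": "markdown",
+ "id": "dice-score-sketch",
+ "metadata": {},
+ "source": [
+ "The visual comparison above can also be expressed as a single number with the Dice coefficient, the metric reported as mean dice in the training log. The following is a minimal NumPy sketch on toy masks. The helper `dice_score` and the arrays `truth_mask` and `pred_mask` are hypothetical names used for illustration only and are not part of MONAI.\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def dice_score(pred, truth, eps=1e-8):\n",
+ "    # Dice = 2 * |intersection| / (|pred| + |truth|) for two binary masks\n",
+ "    pred = np.asarray(pred, dtype=bool)\n",
+ "    truth = np.asarray(truth, dtype=bool)\n",
+ "    overlap = np.logical_and(pred, truth).sum()\n",
+ "    return float(2.0 * overlap / (pred.sum() + truth.sum() + eps))\n",
+ "\n",
+ "truth_mask = np.zeros((8, 8), dtype=bool)\n",
+ "truth_mask[2:6, 2:6] = True  # 16 foreground pixels\n",
+ "pred_mask = np.zeros((8, 8), dtype=bool)\n",
+ "pred_mask[3:7, 2:6] = True   # same size, shifted down by one row\n",
+ "score = dice_score(pred_mask, truth_mask)  # 12 shared pixels: 24 / 32\n",
+ "```\n",
+ "\n",
+ "Assuming the variables from the cells above are still in memory, the same helper could be applied to the foreground channel of the two volumes, for example `dice_score(test_outputspre[0].cpu().numpy()[1], test_labelspre[0].cpu().numpy()[1])`."
+ ]
+ },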
+ {
+ "cell_type": "markdown",
+ "id": "c3e40010",
+ "metadata": {},
+ "source": [
+ "## Training\n",
+ "#### Without a GPU, training can take a while\n",
+ "#### If you do not have a GPU, we recommend skipping the next three cells and loading the trained model instead"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "id": "a8ad6aee",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loss_function = DiceFocalLoss(to_onehot_y=True, softmax=True)\n",
+ "optimizer = torch.optim.Adam(model.parameters(), 5e-4)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "id": "d91d340c",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "----------\n",
+ "epoch 1/25\n",
+ "1/16, train_loss: 0.8680\n",
+ "2/16, train_loss: 0.3699\n",
+ "3/16, train_loss: 0.3849\n",
+ "4/16, train_loss: 0.1306\n",
+ "5/16, train_loss: 0.2781\n",
+ "6/16, train_loss: 0.3628\n",
+ "7/16, train_loss: 0.3609\n",
+ "8/16, train_loss: 0.1828\n",
+ "9/16, train_loss: 0.1493\n",
+ "10/16, train_loss: 0.5063\n",
+ "11/16, train_loss: 0.2929\n",
+ "12/16, train_loss: 0.2826\n",
+ "13/16, train_loss: 0.2017\n",
+ "14/16, train_loss: 0.2591\n",
+ "15/16, train_loss: 0.2568\n",
+ "16/16, train_loss: 0.2385\n",
+ "epoch 1 average loss: 0.3203\n",
+ "----------\n",
+ "epoch 2/25\n",
+ "1/16, train_loss: 0.3457\n",
+ "2/16, train_loss: 0.2234\n",
+ "3/16, train_loss: 0.3443\n",
+ "4/16, train_loss: 0.0816\n",
+ "5/16, train_loss: 0.2259\n",
+ "6/16, train_loss: 0.1580\n",
+ "7/16, train_loss: 0.2593\n",
+ "8/16, train_loss: 0.1651\n",
+ "9/16, train_loss: 0.1124\n",
+ "10/16, train_loss: 0.4822\n",
+ "11/16, train_loss: 0.2900\n",
+ "12/16, train_loss: 0.2571\n",
+ "13/16, train_loss: 0.1799\n",
+ "14/16, train_loss: 0.1984\n",
+ "15/16, train_loss: 0.2286\n",
+ "16/16, train_loss: 0.2216\n",
+ "epoch 2 average loss: 0.2359\n",
+ "saved new best metric model\n",
+ "current epoch: 2 current mean dice: 0.8615\n",
+ "best mean dice: 0.8615 at epoch: 2\n",
+ "----------\n",
+ "epoch 3/25\n",
+ "1/16, train_loss: 0.3400\n",
+ "2/16, train_loss: 0.2297\n",
+ "3/16, train_loss: 0.3453\n",
+ "4/16, train_loss: 0.0822\n",
+ "5/16, train_loss: 0.2285\n",
+ "6/16, train_loss: 0.1213\n",
+ "7/16, train_loss: 0.2370\n",
+ "8/16, train_loss: 0.1607\n",
+ "9/16, train_loss: 0.1065\n",
+ "10/16, train_loss: 0.4543\n",
+ "11/16, train_loss: 0.2848\n",
+ "12/16, train_loss: 0.2848\n",
+ "13/16, train_loss: 0.1763\n",
+ "14/16, train_loss: 0.1748\n",
+ "15/16, train_loss: 0.4361\n",
+ "16/16, train_loss: 0.2234\n",
+ "epoch 3 average loss: 0.2429\n",
+ "----------\n",
+ "epoch 4/25\n",
+ "1/16, train_loss: 0.3328\n",
+ "2/16, train_loss: 0.2447\n",
+ "3/16, train_loss: 0.3436\n",
+ "4/16, train_loss: 0.0723\n",
+ "5/16, train_loss: 0.2213\n",
+ "6/16, train_loss: 0.1676\n",
+ "7/16, train_loss: 0.2672\n",
+ "8/16, train_loss: 0.2121\n",
+ "9/16, train_loss: 0.1122\n",
+ "10/16, train_loss: 0.5265\n",
+ "11/16, train_loss: 0.2810\n",
+ "12/16, train_loss: 0.2688\n",
+ "13/16, train_loss: 0.1795\n",
+ "14/16, train_loss: 0.1853\n",
+ "15/16, train_loss: 0.2458\n",
+ "16/16, train_loss: 0.2314\n",
+ "epoch 4 average loss: 0.2433\n",
+ "saved new best metric model\n",
+ "current epoch: 4 current mean dice: 0.8744\n",
+ "best mean dice: 0.8744 at epoch: 4\n",
+ "----------\n",
+ "epoch 5/25\n",
+ "1/16, train_loss: 0.3378\n",
+ "2/16, train_loss: 0.2047\n",
+ "3/16, train_loss: 0.3350\n",
+ "4/16, train_loss: 0.0583\n",
+ "5/16, train_loss: 0.2161\n",
+ "6/16, train_loss: 0.1008\n",
+ "7/16, train_loss: 0.2325\n",
+ "8/16, train_loss: 0.1629\n",
+ "9/16, train_loss: 0.1037\n",
+ "10/16, train_loss: 0.4499\n",
+ "11/16, train_loss: 0.2763\n",
+ "12/16, train_loss: 0.2321\n",
+ "13/16, train_loss: 0.1702\n",
+ "14/16, train_loss: 0.1652\n",
+ "15/16, train_loss: 0.2206\n",
+ "16/16, train_loss: 0.2169\n",
+ "epoch 5 average loss: 0.2177\n",
+ "----------\n",
+ "epoch 6/25\n",
+ "1/16, train_loss: 0.3303\n",
+ "2/16, train_loss: 0.1888\n",
+ "3/16, train_loss: 0.3331\n",
+ "4/16, train_loss: 0.0535\n",
+ "5/16, train_loss: 0.2149\n",
+ "6/16, train_loss: 0.0962\n",
+ "7/16, train_loss: 0.2267\n",
+ "8/16, train_loss: 0.1555\n",
+ "9/16, train_loss: 0.0995\n",
+ "10/16, train_loss: 0.4476\n",
+ "11/16, train_loss: 0.2751\n",
+ "12/16, train_loss: 0.2215\n",
+ "13/16, train_loss: 0.1644\n",
+ "14/16, train_loss: 0.1603\n",
+ "15/16, train_loss: 0.2159\n",
+ "16/16, train_loss: 0.2141\n",
+ "epoch 6 average loss: 0.2123\n",
+ "saved new best metric model\n",
+ "current epoch: 6 current mean dice: 0.8952\n",
+ "best mean dice: 0.8952 at epoch: 6\n",
+ "----------\n",
+ "epoch 7/25\n",
+ "1/16, train_loss: 0.3286\n",
+ "2/16, train_loss: 0.1815\n",
+ "3/16, train_loss: 0.3317\n",
+ "4/16, train_loss: 0.0487\n",
+ "5/16, train_loss: 0.2127\n",
+ "6/16, train_loss: 0.0926\n",
+ "7/16, train_loss: 0.2236\n",
+ "8/16, train_loss: 0.1536\n",
+ "9/16, train_loss: 0.0955\n",
+ "10/16, train_loss: 0.4468\n",
+ "11/16, train_loss: 0.2730\n",
+ "12/16, train_loss: 0.2171\n",
+ "13/16, train_loss: 0.1616\n",
+ "14/16, train_loss: 0.1565\n",
+ "15/16, train_loss: 0.2147\n",
+ "16/16, train_loss: 0.2123\n",
+ "epoch 7 average loss: 0.2094\n",
+ "----------\n",
+ "epoch 8/25\n",
+ "1/16, train_loss: 0.3276\n",
+ "2/16, train_loss: 0.1800\n",
+ "3/16, train_loss: 0.3311\n",
+ "4/16, train_loss: 0.0459\n",
+ "5/16, train_loss: 0.2114\n",
+ "6/16, train_loss: 0.0853\n",
+ "7/16, train_loss: 0.2206\n",
+ "8/16, train_loss: 0.1529\n",
+ "9/16, train_loss: 0.0939\n",
+ "10/16, train_loss: 0.4467\n",
+ "11/16, train_loss: 0.2725\n",
+ "12/16, train_loss: 0.2171\n",
+ "13/16, train_loss: 0.1600\n",
+ "14/16, train_loss: 0.1502\n",
+ "15/16, train_loss: 0.2140\n",
+ "16/16, train_loss: 0.2115\n",
+ "epoch 8 average loss: 0.2075\n",
+ "saved new best metric model\n",
+ "current epoch: 8 current mean dice: 0.8957\n",
+ "best mean dice: 0.8957 at epoch: 8\n",
+ "----------\n",
+ "epoch 9/25\n",
+ "1/16, train_loss: 0.3275\n",
+ "2/16, train_loss: 0.1822\n",
+ "3/16, train_loss: 0.3309\n",
+ "4/16, train_loss: 0.0455\n",
+ "5/16, train_loss: 0.2110\n",
+ "6/16, train_loss: 0.0818\n",
+ "7/16, train_loss: 0.2194\n",
+ "8/16, train_loss: 0.1520\n",
+ "9/16, train_loss: 0.0917\n",
+ "10/16, train_loss: 0.4467\n",
+ "11/16, train_loss: 0.2723\n",
+ "12/16, train_loss: 0.2165\n",
+ "13/16, train_loss: 0.1593\n",
+ "14/16, train_loss: 0.1236\n",
+ "15/16, train_loss: 0.2136\n",
+ "16/16, train_loss: 0.2107\n",
+ "epoch 9 average loss: 0.2053\n",
+ "----------\n",
+ "epoch 10/25\n",
+ "1/16, train_loss: 0.3271\n",
+ "2/16, train_loss: 0.1726\n",
+ "3/16, train_loss: 0.3308\n",
+ "4/16, train_loss: 0.0439\n",
+ "5/16, train_loss: 0.2106\n",
+ "6/16, train_loss: 0.0886\n",
+ "7/16, train_loss: 0.2209\n",
+ "8/16, train_loss: 0.1518\n",
+ "9/16, train_loss: 0.0860\n",
+ "10/16, train_loss: 0.4452\n",
+ "11/16, train_loss: 0.2715\n",
+ "12/16, train_loss: 0.2150\n",
+ "13/16, train_loss: 0.1589\n",
+ "14/16, train_loss: 0.1150\n",
+ "15/16, train_loss: 0.2142\n",
+ "16/16, train_loss: 0.2095\n",
+ "epoch 10 average loss: 0.2038\n",
+ "saved new best metric model\n",
+ "current epoch: 10 current mean dice: 0.8958\n",
+ "best mean dice: 0.8958 at epoch: 10\n",
+ "----------\n",
+ "epoch 11/25\n",
+ "1/16, train_loss: 0.3271\n",
+ "2/16, train_loss: 0.1735\n",
+ "3/16, train_loss: 0.3314\n",
+ "4/16, train_loss: 0.0430\n",
+ "5/16, train_loss: 0.2099\n",
+ "6/16, train_loss: 0.0801\n",
+ "7/16, train_loss: 0.2201\n",
+ "8/16, train_loss: 0.1508\n",
+ "9/16, train_loss: 0.0721\n",
+ "10/16, train_loss: 0.4451\n",
+ "11/16, train_loss: 0.2714\n",
+ "12/16, train_loss: 0.2155\n",
+ "13/16, train_loss: 0.1592\n",
+ "14/16, train_loss: 0.1247\n",
+ "15/16, train_loss: 0.2139\n",
+ "16/16, train_loss: 0.2107\n",
+ "epoch 11 average loss: 0.2030\n",
+ "----------\n",
+ "epoch 12/25\n",
+ "1/16, train_loss: 0.3268\n",
+ "2/16, train_loss: 0.1712\n",
+ "3/16, train_loss: 0.3305\n",
+ "4/16, train_loss: 0.0453\n",
+ "5/16, train_loss: 0.2103\n",
+ "6/16, train_loss: 0.0783\n",
+ "7/16, train_loss: 0.2179\n",
+ "8/16, train_loss: 0.1529\n",
+ "9/16, train_loss: 0.0912\n",
+ "10/16, train_loss: 0.4469\n",
+ "11/16, train_loss: 0.2724\n",
+ "12/16, train_loss: 0.2162\n",
+ "13/16, train_loss: 0.1588\n",
+ "14/16, train_loss: 0.1072\n",
+ "15/16, train_loss: 0.2129\n",
+ "16/16, train_loss: 0.2091\n",
+ "epoch 12 average loss: 0.2030\n",
+ "saved new best metric model\n",
+ "current epoch: 12 current mean dice: 0.9008\n",
+ "best mean dice: 0.9008 at epoch: 12\n",
+ "----------\n",
+ "epoch 13/25\n",
+ "1/16, train_loss: 0.3266\n",
+ "2/16, train_loss: 0.1666\n",
+ "3/16, train_loss: 0.3304\n",
+ "4/16, train_loss: 0.0419\n",
+ "5/16, train_loss: 0.2105\n",
+ "6/16, train_loss: 0.0826\n",
+ "7/16, train_loss: 0.2195\n",
+ "8/16, train_loss: 0.1506\n",
+ "9/16, train_loss: 0.0553\n",
+ "10/16, train_loss: 0.4447\n",
+ "11/16, train_loss: 0.2715\n",
+ "12/16, train_loss: 0.2125\n",
+ "13/16, train_loss: 0.1575\n",
+ "14/16, train_loss: 0.1083\n",
+ "15/16, train_loss: 0.2135\n",
+ "16/16, train_loss: 0.2085\n",
+ "epoch 13 average loss: 0.2000\n",
+ "----------\n",
+ "epoch 14/25\n",
+ "1/16, train_loss: 0.3270\n",
+ "2/16, train_loss: 0.1647\n",
+ "3/16, train_loss: 0.3316\n",
+ "4/16, train_loss: 0.0405\n",
+ "5/16, train_loss: 0.2091\n",
+ "6/16, train_loss: 0.0686\n",
+ "7/16, train_loss: 0.2185\n",
+ "8/16, train_loss: 0.1499\n",
+ "9/16, train_loss: 0.0482\n",
+ "10/16, train_loss: 0.4443\n",
+ "11/16, train_loss: 0.2708\n",
+ "12/16, train_loss: 0.2106\n",
+ "13/16, train_loss: 0.1568\n",
+ "14/16, train_loss: 0.1043\n",
+ "15/16, train_loss: 0.2121\n",
+ "16/16, train_loss: 0.2079\n",
+ "epoch 14 average loss: 0.1978\n",
+ "saved new best metric model\n",
+ "current epoch: 14 current mean dice: 0.9015\n",
+ "best mean dice: 0.9015 at epoch: 14\n",
+ "----------\n",
+ "epoch 15/25\n",
+ "1/16, train_loss: 0.3259\n",
+ "2/16, train_loss: 0.1630\n",
+ "3/16, train_loss: 0.3303\n",
+ "4/16, train_loss: 0.0399\n",
+ "5/16, train_loss: 0.2085\n",
+ "6/16, train_loss: 0.0579\n",
+ "7/16, train_loss: 0.2165\n",
+ "8/16, train_loss: 0.1509\n",
+ "9/16, train_loss: 0.0487\n",
+ "10/16, train_loss: 0.4449\n",
+ "11/16, train_loss: 0.2704\n",
+ "12/16, train_loss: 0.2090\n",
+ "13/16, train_loss: 0.1557\n",
+ "14/16, train_loss: 0.1021\n",
+ "15/16, train_loss: 0.2118\n",
+ "16/16, train_loss: 0.2084\n",
+ "epoch 15 average loss: 0.1965\n",
+ "----------\n",
+ "epoch 16/25\n",
+ "1/16, train_loss: 0.3258\n",
+ "2/16, train_loss: 0.1620\n",
+ "3/16, train_loss: 0.3307\n",
+ "4/16, train_loss: 0.0394\n",
+ "5/16, train_loss: 0.2086\n",
+ "6/16, train_loss: 0.0699\n",
+ "7/16, train_loss: 0.2170\n",
+ "8/16, train_loss: 0.1516\n",
+ "9/16, train_loss: 0.0540\n",
+ "10/16, train_loss: 0.4444\n",
+ "11/16, train_loss: 0.2698\n",
+ "12/16, train_loss: 0.2102\n",
+ "13/16, train_loss: 0.1548\n",
+ "14/16, train_loss: 0.1016\n",
+ "15/16, train_loss: 0.2114\n",
+ "16/16, train_loss: 0.2078\n",
+ "epoch 16 average loss: 0.1974\n",
+ "current epoch: 16 current mean dice: 0.8994\n",
+ "best mean dice: 0.9015 at epoch: 14\n",
+ "----------\n",
+ "epoch 17/25\n",
+ "1/16, train_loss: 0.3255\n",
+ "2/16, train_loss: 0.1636\n",
+ "3/16, train_loss: 0.3300\n",
+ "4/16, train_loss: 0.0399\n",
+ "5/16, train_loss: 0.2085\n",
+ "6/16, train_loss: 0.0483\n",
+ "7/16, train_loss: 0.2150\n",
+ "8/16, train_loss: 0.1506\n",
+ "9/16, train_loss: 0.0446\n",
+ "10/16, train_loss: 0.4445\n",
+ "11/16, train_loss: 0.2692\n",
+ "12/16, train_loss: 0.2077\n",
+ "13/16, train_loss: 0.1515\n",
+ "14/16, train_loss: 0.0980\n",
+ "15/16, train_loss: 0.2110\n",
+ "16/16, train_loss: 0.2076\n",
+ "epoch 17 average loss: 0.1947\n",
+ "----------\n",
+ "epoch 18/25\n",
+ "1/16, train_loss: 0.3255\n",
+ "2/16, train_loss: 0.1614\n",
+ "3/16, train_loss: 0.3297\n",
+ "4/16, train_loss: 0.0381\n",
+ "5/16, train_loss: 0.2081\n",
+ "6/16, train_loss: 0.0422\n",
+ "7/16, train_loss: 0.2152\n",
+ "8/16, train_loss: 0.1485\n",
+ "9/16, train_loss: 0.0415\n",
+ "10/16, train_loss: 0.4442\n",
+ "11/16, train_loss: 0.2690\n",
+ "12/16, train_loss: 0.2070\n",
+ "13/16, train_loss: 0.1515\n",
+ "14/16, train_loss: 0.0980\n",
+ "15/16, train_loss: 0.2112\n",
+ "16/16, train_loss: 0.2068\n",
+ "epoch 18 average loss: 0.1936\n",
+ "current epoch: 18 current mean dice: 0.8991\n",
+ "best mean dice: 0.9015 at epoch: 14\n",
+ "----------\n",
+ "epoch 19/25\n",
+ "1/16, train_loss: 0.3254\n",
+ "2/16, train_loss: 0.1635\n",
+ "3/16, train_loss: 0.3297\n",
+ "4/16, train_loss: 0.0372\n",
+ "5/16, train_loss: 0.2078\n",
+ "6/16, train_loss: 0.0424\n",
+ "7/16, train_loss: 0.2145\n",
+ "8/16, train_loss: 0.1483\n",
+ "9/16, train_loss: 0.0402\n",
+ "10/16, train_loss: 0.4436\n",
+ "11/16, train_loss: 0.2695\n",
+ "12/16, train_loss: 0.2076\n",
+ "13/16, train_loss: 0.1514\n",
+ "14/16, train_loss: 0.1009\n",
+ "15/16, train_loss: 0.2116\n",
+ "16/16, train_loss: 0.2071\n",
+ "epoch 19 average loss: 0.1938\n",
+ "----------\n",
+ "epoch 20/25\n",
+ "1/16, train_loss: 0.3256\n",
+ "2/16, train_loss: 0.1616\n",
+ "3/16, train_loss: 0.3302\n",
+ "4/16, train_loss: 0.0376\n",
+ "5/16, train_loss: 0.2080\n",
+ "6/16, train_loss: 0.0756\n",
+ "7/16, train_loss: 0.2150\n",
+ "8/16, train_loss: 0.1476\n",
+ "9/16, train_loss: 0.0400\n",
+ "10/16, train_loss: 0.4440\n",
+ "11/16, train_loss: 0.2686\n",
+ "12/16, train_loss: 0.2071\n",
+ "13/16, train_loss: 0.1512\n",
+ "14/16, train_loss: 0.0990\n",
+ "15/16, train_loss: 0.2103\n",
+ "16/16, train_loss: 0.2066\n",
+ "epoch 20 average loss: 0.1955\n",
+ "current epoch: 20 current mean dice: 0.8984\n",
+ "best mean dice: 0.9015 at epoch: 14\n",
+ "----------\n",
+ "epoch 21/25\n",
+ "1/16, train_loss: 0.3253\n",
+ "2/16, train_loss: 0.1599\n",
+ "3/16, train_loss: 0.3295\n",
+ "4/16, train_loss: 0.0370\n",
+ "5/16, train_loss: 0.2074\n",
+ "6/16, train_loss: 0.0587\n",
+ "7/16, train_loss: 0.2138\n",
+ "8/16, train_loss: 0.1483\n",
+ "9/16, train_loss: 0.0479\n",
+ "10/16, train_loss: 0.4449\n",
+ "11/16, train_loss: 0.2684\n",
+ "12/16, train_loss: 0.2082\n",
+ "13/16, train_loss: 0.1520\n",
+ "14/16, train_loss: 0.1122\n",
+ "15/16, train_loss: 0.2110\n",
+ "16/16, train_loss: 0.2088\n",
+ "epoch 21 average loss: 0.1958\n",
+ "----------\n",
+ "epoch 22/25\n",
+ "1/16, train_loss: 0.3258\n",
+ "2/16, train_loss: 0.1628\n",
+ "3/16, train_loss: 0.3298\n",
+ "4/16, train_loss: 0.0395\n",
+ "5/16, train_loss: 0.2082\n",
+ "6/16, train_loss: 0.0614\n",
+ "7/16, train_loss: 0.2181\n",
+ "8/16, train_loss: 0.1566\n",
+ "9/16, train_loss: 0.0650\n",
+ "10/16, train_loss: 0.4442\n",
+ "11/16, train_loss: 0.2693\n",
+ "12/16, train_loss: 0.2118\n",
+ "13/16, train_loss: 0.1532\n",
+ "14/16, train_loss: 0.0998\n",
+ "15/16, train_loss: 0.2121\n",
+ "16/16, train_loss: 0.2076\n",
+ "epoch 22 average loss: 0.1978\n",
+ "saved new best metric model\n",
+ "current epoch: 22 current mean dice: 0.9054\n",
+ "best mean dice: 0.9054 at epoch: 22\n",
+ "----------\n",
+ "epoch 23/25\n",
+ "1/16, train_loss: 0.3266\n",
+ "2/16, train_loss: 0.1723\n",
+ "3/16, train_loss: 0.3315\n",
+ "4/16, train_loss: 0.0413\n",
+ "5/16, train_loss: 0.2091\n",
+ "6/16, train_loss: 0.0807\n",
+ "7/16, train_loss: 0.2143\n",
+ "8/16, train_loss: 0.1514\n",
+ "9/16, train_loss: 0.0432\n",
+ "10/16, train_loss: 0.4441\n",
+ "11/16, train_loss: 0.2704\n",
+ "12/16, train_loss: 0.2081\n",
+ "13/16, train_loss: 0.1532\n",
+ "14/16, train_loss: 0.0983\n",
+ "15/16, train_loss: 0.2106\n",
+ "16/16, train_loss: 0.2072\n",
+ "epoch 23 average loss: 0.1976\n",
+ "----------\n",
+ "epoch 24/25\n",
+ "1/16, train_loss: 0.3257\n",
+ "2/16, train_loss: 0.1711\n",
+ "3/16, train_loss: 0.3307\n",
+ "4/16, train_loss: 0.0376\n",
+ "5/16, train_loss: 0.2077\n",
+ "6/16, train_loss: 0.0705\n",
+ "7/16, train_loss: 0.2141\n",
+ "8/16, train_loss: 0.1482\n",
+ "9/16, train_loss: 0.0392\n",
+ "10/16, train_loss: 0.4439\n",
+ "11/16, train_loss: 0.2688\n",
+ "12/16, train_loss: 0.2070\n",
+ "13/16, train_loss: 0.1512\n",
+ "14/16, train_loss: 0.0969\n",
+ "15/16, train_loss: 0.2098\n",
+ "16/16, train_loss: 0.2062\n",
+ "epoch 24 average loss: 0.1955\n",
+ "saved new best metric model\n",
+ "current epoch: 24 current mean dice: 0.9060\n",
+ "best mean dice: 0.9060 at epoch: 24\n",
+ "----------\n",
+ "epoch 25/25\n",
+ "1/16, train_loss: 0.3251\n",
+ "2/16, train_loss: 0.1621\n",
+ "3/16, train_loss: 0.3298\n",
+ "4/16, train_loss: 0.0367\n",
+ "5/16, train_loss: 0.2075\n",
+ "6/16, train_loss: 0.0430\n",
+ "7/16, train_loss: 0.2132\n",
+ "8/16, train_loss: 0.1490\n",
+ "9/16, train_loss: 0.0390\n",
+ "10/16, train_loss: 0.4432\n",
+ "11/16, train_loss: 0.2699\n",
+ "12/16, train_loss: 0.2080\n",
+ "13/16, train_loss: 0.1520\n",
+ "14/16, train_loss: 0.0959\n",
+ "15/16, train_loss: 0.2101\n",
+ "16/16, train_loss: 0.2057\n",
+ "epoch 25 average loss: 0.1931\n",
+ "train completed, best_metric: 0.9060 at epoch: 24\n"
+ ]
+ }
+ ],
+ "source": [
+ "max_epochs = 25\n",
+ "val_interval = 2\n",
+ "num_classes = 2\n",
+ "best_metric = -1\n",
+ "best_metric_epoch = -1\n",
+ "epoch_loss_values = []\n",
+ "metric_values = []\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
+ "dice_metric = DiceMetric(include_background=False, reduction=\"mean\", get_not_nans=False)\n",
+ "\n",
+ "for epoch in range(max_epochs):\n",
+ " print(\"-\" * 10)\n",
+ " print(f\"epoch {epoch + 1}/{max_epochs}\")\n",
+ " model.train()\n",
+ " epoch_loss = 0\n",
+ " step = 0\n",
+ " set_determinism(seed=42)\n",
+ " for batch_data in train_loader:\n",
+ " step += 1\n",
+ " inputs, labels = (\n",
+ " batch_data[\"image\"].to(device),\n",
+ " batch_data[\"label\"].to(device),\n",
+ " )\n",
+ " optimizer.zero_grad()\n",
+ " outputs = model(inputs)\n",
+ " loss = loss_function(outputs, labels)\n",
+ " loss.backward()\n",
+ " optimizer.step()\n",
+ " epoch_loss += loss.item()\n",
+ " print(\n",
+ " f\"{step}/{len(train_ds) // train_loader.batch_size}, \"\n",
+ " f\"train_loss: {loss.item():.4f}\")\n",
+ " epoch_loss /= step\n",
+ " epoch_loss_values.append(epoch_loss)\n",
+ " print(f\"epoch {epoch + 1} average loss: {epoch_loss:.4f}\")\n",
+ "\n",
+ " if (epoch + 1) % val_interval == 0:\n",
+ " model.eval()\n",
+ " with torch.no_grad():\n",
+ " set_determinism(seed=42)\n",
+ " for val_data in DataLoader(val_ds, batch_size=1, num_workers=2):\n",
+ " val_inputs, val_labels = (\n",
+ " val_data[\"image\"].to(device),\n",
+ " val_data[\"label\"].to(device),\n",
+ " )\n",
+ " roi_size = (160, 160, 160)\n",
+ " sw_batch_size = 4\n",
+ " val_outputs = sliding_window_inference(\n",
+ " val_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ " val_outputs = [post_pred(i) for i in decollate_batch(val_outputs)]\n",
+ " val_labels = [post_label(i) for i in decollate_batch(val_labels)]\n",
+ " dice_metric(y_pred=val_outputs, y=val_labels)\n",
+ " metric = dice_metric.aggregate().item()\n",
+ " dice_metric.reset()\n",
+ " metric_values.append(metric)\n",
+ " if metric > best_metric:\n",
+ " best_metric = metric\n",
+ " best_metric_epoch = epoch + 1\n",
+ " torch.save(model.state_dict(), os.path.join(\n",
+ " root_dir, \"Spleen_best_metric_model_pretrained.pth\"))\n",
+ " print(\"saved new best metric model\")\n",
+ " print(\n",
+ " f\"current epoch: {epoch + 1} current mean dice: {metric:.4f}\"\n",
+ " f\"\\nbest mean dice: {best_metric:.4f} \"\n",
+ " f\"at epoch: {best_metric_epoch}\"\n",
+ " )\n",
+ "print(\n",
+ " f\"train completed, best_metric: {best_metric:.4f} \"\n",
+ " f\"at epoch: {best_metric_epoch}\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "id": "5cf1fd04",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(\"train\", (12, 6))\n",
+ "plt.subplot(1, 2, 1)\n",
+ "plt.title(\"Epoch Average Loss\")\n",
+ "x = [i + 1 for i in range(len(epoch_loss_values))]\n",
+ "y = epoch_loss_values\n",
+ "plt.xlabel(\"epoch\")\n",
+ "plt.ylim([0.1, 0.7])\n",
+ "plt.plot(x, y)\n",
+ "plt.subplot(1, 2, 2)\n",
+ "plt.title(\"Val Mean Dice\")\n",
+ "x = [val_interval * (i + 1) for i in range(len(metric_values))]\n",
+ "y = metric_values\n",
+ "plt.xlabel(\"epoch\")\n",
+ "plt.ylim([0, 1.0])\n",
+ "plt.plot(x, y)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4ff0035d",
+ "metadata": {},
+ "source": [
+ "#### The model improves fairly quickly: over just 25 epochs the best validation mean Dice reaches about 0.906"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "0499fa93",
+ "metadata": {},
+ "source": [
+ "## Inference\n",
+ "#### If you do not have a GPU, skip to here and load the previously trained best model (training without a GPU will take a while)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "29441405",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model.load_state_dict(torch.load('monai_data/best_metric_model_pretrained.pth', map_location=device))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "fab5b4b9",
+ "metadata": {},
+ "source": [
+ "#### With the model loaded, let's see if much has changed for our example image"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "94615f38",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_classes = 2\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classes)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classes)])\n",
+ "model.eval()\n",
+ "with torch.no_grad():\n",
+ " for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
+ " test_inputs, test_labels = (\n",
+ " data[\"image\"].to(device),\n",
+ " data[\"label\"].to(device),\n",
+ " )\n",
+ " roi_size = (160, 160, 160)\n",
+ " sw_batch_size = 4\n",
+ " test_outputs = sliding_window_inference(\n",
+ " test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ " test_outputsSpl = [post_pred(i) for i in decollate_batch(test_outputs)]\n",
+ " test_labelsSpl = [post_label(i) for i in decollate_batch(test_labels)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "id": "a3f78dd4",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 27,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Trained Calculated Spleen')\n",
+ "plt.imshow(test_outputsSpl[0].cpu().numpy()[1][:,:,200], cmap='Greys_r') # Trained model spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "a67f89f2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 28,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Differences Between Actual and Model')\n",
+ "traineddif = test_labelsSpl[0].cpu().numpy()[1][:,:,200] - test_outputsSpl[0].cpu().numpy()[1][:,:,200]\n",
+ "plt.imshow(traineddif, cmap='Greys_r') #Differences"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "382c7285",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 29,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Differences Between The Models')\n",
+ "modelsdif = test_outputspre[0].cpu().numpy()[1][:,:,200] - test_outputsSpl[0].cpu().numpy()[1][:,:,200]\n",
+ "plt.imshow(modelsdif, cmap='Greys_r') #Differences"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6606bce2",
+ "metadata": {},
+ "source": [
+ "#### We see not much has changed, which is a good sign for how well the NVIDIA model performs out of the box."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5cfd20c6",
+ "metadata": {},
+ "source": [
+ "#### Here is the final image of our Spleen"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "id": "91e83d40",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 30,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,:,200] == 0, test_outputsSpl[0].cpu().numpy()[1][:,:,200])\n",
+ "fig = plt.figure(frameon=False, figsize=(10,10))\n",
+ "plt.imshow(np.rot90(test_ds[0]['image'][0][:,:,200]), cmap='Greys_r')\n",
+ "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=1.0)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6030d210",
+ "metadata": {},
+ "source": [
+ "#### Feel free to play around in this notebook or download it and use it where a GPU is accessible"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "896388a1",
+ "metadata": {},
+ "source": [
+ "## Additional Exercise: Use liver segmentation in addition to spleen\n",
+ " - We just need to load the liver segmentation model from NVIDIA\n",
+ " - We can't train this model because we don't have liver training data, but we can use its output as a rough estimate"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "id": "657e44a0",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "mmarliver = {\n",
+ " RemoteMMARKeys.ID: \"clara_pt_liver_and_tumor_ct_segmentation_1\",\n",
+ " RemoteMMARKeys.NAME: \"clara_pt_liver_and_tumor_ct_segmentation\",\n",
+ " RemoteMMARKeys.FILE_TYPE: \"zip\",\n",
+ " RemoteMMARKeys.HASH_TYPE: \"md5\",\n",
+ " RemoteMMARKeys.HASH_VAL: None,\n",
+ " RemoteMMARKeys.MODEL_FILE: os.path.join(\"models\", \"model.pt\"),\n",
+ " RemoteMMARKeys.CONFIG_FILE: os.path.join(\"config\", \"config_train.json\"),\n",
+ " RemoteMMARKeys.VERSION: 1,\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "id": "a6fb0da7",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "2022-04-27 15:06:54,404 - INFO - Expected md5 is None, skip md5 check for file monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip.\n",
+ "2022-04-27 15:06:54,405 - INFO - File exists: monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip, skipped downloading.\n",
+ "2022-04-27 15:06:54,425 - INFO - Non-empty folder exists in monai_data/clara_pt_liver_and_tumor_ct_segmentation, skipped extracting.\n",
+ "2022-04-27 15:06:54,426 - INFO - \n",
+ "*** \"clara_pt_liver_and_tumor_ct_segmentation\" available at monai_data/clara_pt_liver_and_tumor_ct_segmentation.\n",
+ "2022-04-27 15:06:54,889 - INFO - *** Model: \n",
+ "2022-04-27 15:06:54,938 - INFO - *** Model params: {'dimensions': 3, 'in_channels': 1, 'out_channels': 3, 'channels': [16, 32, 64, 128, 256], 'strides': [2, 2, 2, 2], 'num_res_units': 2, 'norm': 'batch'}\n",
+ "2022-04-27 15:06:54,950 - INFO - \n",
+ "---\n",
+ "2022-04-27 15:06:54,951 - INFO - For more information, please visit https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_liver_and_tumor_ct_segmentation\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ " try: #MONAI=0.8\n",
+ " unet_model = load_from_mmar(\n",
+ " item = mmarliver['name'], \n",
+ " mmar_dir=root_dir,\n",
+ " map_location=device,\n",
+ " version=mmarliver['version'],\n",
+ " pretrained=True)\n",
+ " except Exception: #MONAI<0.8\n",
+ " unet_model = load_from_mmar(\n",
+ " mmarliver, \n",
+ " mmar_dir=root_dir,\n",
+ " map_location=device,\n",
+ " pretrained=True)\n",
+ " model = unet_model"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "id": "55034354",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "using a pretrained model.\n",
+ "2022-04-27 15:06:55,931 - INFO - Expected md5 is None, skip md5 check for file monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip.\n",
+ "2022-04-27 15:06:55,931 - INFO - File exists: monai_data/clara_pt_liver_and_tumor_ct_segmentation_1.zip, skipped downloading.\n",
+ "2022-04-27 15:06:55,932 - INFO - Non-empty folder exists in monai_data/clara_pt_liver_and_tumor_ct_segmentation, skipped extracting.\n",
+ "2022-04-27 15:06:55,933 - INFO - \n",
+ "*** \"clara_pt_liver_and_tumor_ct_segmentation\" available at monai_data/clara_pt_liver_and_tumor_ct_segmentation.\n",
+ "2022-04-27 15:06:55,962 - INFO - *** Model: \n",
+ "2022-04-27 15:06:56,010 - INFO - *** Model params: {'dimensions': 3, 'in_channels': 1, 'out_channels': 3, 'channels': [16, 32, 64, 128, 256], 'strides': [2, 2, 2, 2], 'num_res_units': 2, 'norm': 'batch'}\n",
+ "2022-04-27 15:06:56,023 - INFO - \n",
+ "---\n",
+ "2022-04-27 15:06:56,024 - INFO - For more information, please visit https://ngc.nvidia.com/catalog/models/nvidia:med:clara_pt_liver_and_tumor_ct_segmentation\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+ "\n",
+ "print(\"using a pretrained model.\")\n",
+ "try: #MONAI=0.8\n",
+ " unet_model = load_from_mmar(\n",
+ " item = mmarliver['name'], \n",
+ " mmar_dir=root_dir,\n",
+ " map_location=device,\n",
+ " version=mmarliver['version'],\n",
+ " pretrained=True)\n",
+ "except: #MONAI<0.8\n",
+ " unet_model = load_from_mmar(\n",
+ " mmarliver, \n",
+ " mmar_dir=root_dir,\n",
+ " map_location=device,\n",
+ " pretrained=True)\n",
+ "model = unet_model.to(device)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "id": "a79c1731",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "num_classesP=3\n",
+ "num_classesL=2\n",
+ "post_pred = Compose([EnsureType(), AsDiscrete(argmax=True, to_onehot=num_classesP)])\n",
+ "post_label = Compose([EnsureType(), AsDiscrete(to_onehot=num_classesL)])\n",
+ "model.eval()\n",
+ "with torch.no_grad():\n",
+ " for data in DataLoader(test_ds, batch_size=1, num_workers=2):\n",
+ " test_inputs, test_labels = (\n",
+ " data[\"image\"].to(device),\n",
+ " data[\"label\"].to(device),\n",
+ " )\n",
+ " roi_size = (160, 160, 160)\n",
+ " sw_batch_size = 4\n",
+ " test_outputs = sliding_window_inference(\n",
+ " test_inputs, roi_size, sw_batch_size, model, overlap=0.5)\n",
+ " test_outputsliv = [post_pred(i) for i in decollate_batch(test_outputs)] # Decollate our results\n",
+ " test_labelsliv = [post_label(i) for i in decollate_batch(test_labels)]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 35,
+ "id": "c0956706",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 35,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sliceval = 215\n",
+ "maskedliv = np.ma.masked_where(test_outputsliv[0].cpu().numpy()[1][:,:,sliceval] == 0, test_outputsliv[0].cpu().numpy()[1][:,:,sliceval])\n",
+ "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,:,sliceval] == 0, test_outputsSpl[0].cpu().numpy()[1][:,:,sliceval])\n",
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Pretrained Calculated Liver and Spleen')\n",
+ "plt.imshow(np.rot90(test_ds[0]['image'][0][:,:,sliceval]), cmap='Greys_r')\n",
+ "plt.imshow(np.rot90(maskedliv), cmap='cividis', alpha=0.75)\n",
+ "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=0.75)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 36,
+ "id": "5bdfdbe9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 36,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sliceval = 110\n",
+ "maskedliv = np.ma.masked_where(test_outputsliv[0].cpu().numpy()[1][:,sliceval,:] == 0, test_outputsliv[0].cpu().numpy()[1][:,sliceval,:])\n",
+ "maskedspleen = np.ma.masked_where(test_outputsSpl[0].cpu().numpy()[1][:,sliceval,:] == 0, test_outputsSpl[0].cpu().numpy()[1][:,sliceval,:])\n",
+ "fig = plt.figure(frameon=False, figsize=(7,7))\n",
+ "plt.title('Pretrained Calculated Liver and Spleen')\n",
+ "plt.imshow(np.rot90(test_ds[0]['image'][0][:,sliceval,:]), cmap='Greys_r')\n",
+ "plt.imshow(np.rot90(maskedliv), cmap='cividis', alpha=0.75)\n",
+ "plt.imshow(np.rot90(maskedspleen), cmap='viridis', alpha=0.75)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "af1169b6",
+ "metadata": {},
+ "source": [
+ "#### Continue including more models found at the NGC Catalog: \n",
+ "#### https://catalog.ngc.nvidia.com/models\n",
+ "##### - Recommend filtering by 'CT' "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "0dce4d55",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e17e6228",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7034135a",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "name": "pytorch-gpu.1-9.m75",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/pytorch-gpu.1-9:m75"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.10"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth b/tutorials/notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth
new file mode 100644
index 0000000..61ed04c
Binary files /dev/null and b/tutorials/notebooks/SpleenLiverSegmentation/monai_data/Spleen_best_metric_model_pretrained.pth differ
diff --git a/tutorials/notebooks/elasticBLAST/run_elastic_blast.ipynb b/tutorials/notebooks/elasticBLAST/run_elastic_blast.ipynb
new file mode 100644
index 0000000..7c63a6e
--- /dev/null
+++ b/tutorials/notebooks/elasticBLAST/run_elastic_blast.ipynb
@@ -0,0 +1,229 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "8c3f3bb2",
+ "metadata": {},
+ "source": [
+ "# Run ElasticBLAST on GCP"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aee3b229",
+ "metadata": {},
+ "source": [
+ "This notebook is based on [this tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-gcp.html). If at any point you get an \"API has not been enabled\" error, go to [this page](https://cloud.google.com/endpoints/docs/openapi/enable-api#console), click `Go to APIs and Services`, then search for your API and click `Enable`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "38dfb579",
+ "metadata": {},
+ "source": [
+ "### 1) Install ElasticBLAST"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d96bb988",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!pip3 install elastic-blast"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "684e79f6",
+ "metadata": {},
+ "source": [
+ "Test your install. It should print the version number followed by the full help menu."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "2aa11ccc",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!elastic-blast --version\n",
+ "!elastic-blast --help"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "58b59cb0",
+ "metadata": {},
+ "source": [
+ "### 2) Optionally, create a bucket for this tutorial if one does not yet exist"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "319ff226",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil ls gs://elasticblast-${USER} >& /dev/null || gsutil mb gs://elasticblast-${USER}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "38267c47-029c-4026-8dc1-6020f978e496",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!gsutil ls gs://elasticblast-jupyter"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "449d7511",
+ "metadata": {},
+ "source": [
+ "### 3) Create a config file that defines the job parameters"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d0017943-bbd3-472f-a3f8-30d88777f70a",
+ "metadata": {},
+ "source": [
+ "Confirm your user name to include in the config"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "e4bd7452-79ea-4c8b-a13e-b46cff6a5564",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! echo ${USER}"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b578c1ea",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!touch BDQA.ini"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "a1b0a866",
+ "metadata": {},
+ "source": [
+ "Open the config file and add the following:\n",
+ "```\n",
+ "[cloud-provider]\n",
+ "gcp-project = YOUR_GCP_PROJECT_ID\n",
+ "gcp-region = us-east4\n",
+ "gcp-zone = us-east4-c\n",
+ "\n",
+ "[cluster]\n",
+ "num-nodes = 6\n",
+ "num-cpus = 30\n",
+ "labels = owner=jupyter\n",
+ "\n",
+ "[blast]\n",
+ "program = blastp\n",
+ "db = refseq_protein\n",
+ "queries = gs://elastic-blast-samples/queries/protein/BDQA01.1.fsa_aa\n",
+ "results = gs://elasticblast-jupyter/results/BDQA\n",
+ "options = -task blastp-fast -evalue 0.01 -outfmt \"7 std sskingdoms ssciname\"\n",
+ "```\n",
+ "Replace _YOUR_GCP_PROJECT_ID_ with your actual project ID. The cluster defaults to 16 CPUs; here we set `num-cpus` to 30 to allow enough CPUs per job.\n",
+ "\n",
+ "You can add additional configuration values from [this guide](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/configuration.html)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a9f8192",
+ "metadata": {},
+ "source": [
+ "### 4) Submit the job"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "398253e8",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!elastic-blast submit --cfg BDQA.ini"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9a8e7716",
+ "metadata": {},
+ "source": [
+ "### 5) Check results and troubleshoot"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "94a43c5e",
+ "metadata": {},
+ "source": [
+ "It should take about 15-20 minutes to spin up your cluster and start submitting jobs. You can check the status of your job by opening a terminal within this instance and pasting in `elastic-blast status --cfg BDQA.ini`. You can also go to Kubernetes Engine and monitor the health of your cluster, or interact with the pods via `kubectl`. For example, in the terminal you can type `kubectl get pods` to see your pods, then use `kubectl describe pods <pod-name>` to get details of a particular pod, and `kubectl logs <pod-name>` to view the logs of a particular pod, where `<pod-name>` is one of the names listed by `kubectl get pods`. You can also monitor the cloud bucket with `!gsutil ls gs://elasticblast-jupyter/` to see if results files are being written."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "74ce84e2-43db-46b3-b3e5-af401587e28a",
+ "metadata": {},
+ "source": [
+ "### 6) Clean up cloud resources"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "3961d577-72b5-4a06-8597-7c724cf278c5",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "!elastic-blast delete --cfg BDQA.ini"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "common-cpu.m93",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/base-cpu:m93"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/ncbi-stat-tutorial/STAT-tutorial.ipynb b/tutorials/notebooks/ncbi-stat-tutorial/STAT-tutorial.ipynb
new file mode 100644
index 0000000..4e77b08
--- /dev/null
+++ b/tutorials/notebooks/ncbi-stat-tutorial/STAT-tutorial.ipynb
@@ -0,0 +1,268 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "1651316c",
+ "metadata": {},
+ "source": [
+ "# Query the NCBI STAT Metadata Tables to Search for Pathogens! "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "15022f97",
+ "metadata": {},
+ "source": [
+ "## Overview\n",
+ "\n",
+ "DNA sequence data are typically deposited into the [NCBI Sequence Read Archive](https://www.ncbi.nlm.nih.gov/sra). Each FASTQ file is assigned a taxon id (usually a species) defined by [NCBI taxonomy](https://www.ncbi.nlm.nih.gov/taxonomy). So, if you search SRA for a particular species, such as _[Mus musculus](https://www.ncbi.nlm.nih.gov/sra/?term=Mus+musculus)_, you will find the files associated with this taxon as defined by the sequence submitter. There are three possible issues with this approach. First, sometimes people make mistakes about the taxon id of their sequence. They may have said the sequence came from a mouse when it actually came from a dog, and you won't know until you have analyzed that sequence. Second, most FASTQ files contain mixed DNA sequence due to some level of contamination. If the mouse DNA came from a tail tip, the FASTQ will likely be full of microbial sequence as well as mouse DNA! Finally, many samples in SRA are metagenomic, so you really have no idea what DNA is in there until you analyze it.\n",
+ "\n",
+ "To address these issues, NCBI came up with a tool called the [SRA Taxonomy Analysis Tool](https://www.ncbi.nlm.nih.gov/sra/docs/sra-taxonomy-analysis-tool/#:~:text=The%20NCBI%20SRA%20Taxonomy%20Analysis,from%20next%20generation%20sequencing%20runs), or STAT. STAT maps sequencing reads against a precomputed k-mer dictionary and assigns each read to the lowest unambiguous taxonomic node (it is based on a known phylogeny). STAT is run for all SRA submissions, and the results are stored in cloud-based metadata tables that can be queried using BigQuery. These tables can then be joined to the SRA metadata tables to get robust information on each accession. Here we walk through a basic STAT query for MPOX virus and teach you how to create your own queries. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b068c9da-7814-4b24-9ff8-12473048bdcf",
+ "metadata": {},
+ "source": [
+ "### 1) Import libraries"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "d2c3ec94-16cd-43b6-a4c9-d56aa593382e",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Import the BigQuery API\n",
+ "from google.cloud import bigquery\n",
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "243fb23b-ec7e-423f-a531-9f39a1954087",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Designate the client for the API\n",
+ "client = bigquery.Client(location=\"US\")\n",
+ "print(\"Client created using default project: {}\".format(client.project))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "7ec384ad-732e-4cb9-b0ff-4b6addb6c2a7",
+ "metadata": {},
+ "source": [
+ "### 2) Define and run our query\n",
+ "Note that we are doing a few things here. First, we are going to query the STAT metadata table (nih-sra-datastore.sra_tax_analysis_tool.tax_analysis) and only get accessions that include Monkeypox virus (tax id = 10244). You could also cast a wider net and filter to Orthopox (10242), since many reads will not map unambiguously to Monkeypox and will be assigned to Orthopox. Second, we are going to JOIN this table with the SRA metadata table (nih-sra-datastore.sra.metadata) on the accession number (acc), which gives us more information about each record. Finally, we are going to keep only samples with at least 100 reads (total_count > 99) assigned to the target tax id, plus any sample whose submitter-assigned organism is 'wastewater metagenome' regardless of read count. This means that at least 100 reads need to be assigned to Monkeypox or daughter nodes in the phylogeny. We also restrict the results to records released in the past 10 years (the INTERVAL parameter, set to 120 months)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "7fb2102a-4f91-4002-a521-db31f7271045",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Define the query\n",
+ "query = \"\"\"\n",
+ "#standardSQL\n",
+ "WITH\n",
+ " span AS (\n",
+ " SELECT\n",
+ " acc,\n",
+ " ileft AS rileft,\n",
+ " iright AS riright\n",
+ " FROM\n",
+ " nih-sra-datastore.sra_tax_analysis_tool.tax_analysis\n",
+ " WHERE\n",
+ " tax_id = 10244 )\n",
+ "SELECT\n",
+ " acc,\n",
+ " name,\n",
+ " total_count,\n",
+ " self_count,\n",
+ " center_name,\n",
+ " librarysource,\n",
+ " organism,\n",
+ " releasedate,\n",
+ " geo_loc_name_sam,\n",
+ " attributes\n",
+ "FROM\n",
+ " nih-sra-datastore.sra_tax_analysis_tool.tax_analysis\n",
+ "JOIN\n",
+ " nih-sra-datastore.sra.metadata\n",
+ "USING\n",
+ " (acc)\n",
+ "JOIN\n",
+ " span\n",
+ "USING\n",
+ " (acc)\n",
+ "WHERE\n",
+ " (ileft>=rileft\n",
+ " AND iright<=riright)\n",
+ " AND (total_count>99\n",
+ " OR organism='wastewater metagenome')\n",
+ " AND CAST(releasedate AS date) > DATE_SUB(CURRENT_DATE(), INTERVAL 120 month)\n",
+ "ORDER BY\n",
+ " releasedate DESC,\n",
+ " acc,\n",
+ " total_count DESC\n",
+ "\"\"\""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "214bdba1-173f-4ea8-9189-b06711de90c3",
+ "metadata": {},
+ "source": [
+ "Execute the query"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "9722d4d4-3150-47cf-81fd-141de19488ea",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "query_job = client.query(\n",
+ " query,\n",
+ " # Location must match that of the dataset(s) referenced in the query.\n",
+ " location=\"US\",\n",
+ ") # API request - starts the query\n",
+ "\n",
+ "df = query_job.to_dataframe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f0d15f46-1f59-479a-9ba7-6b1f412188eb",
+ "metadata": {},
+ "source": [
+ "See how many unique accessions are in the df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "fcd6b33a-f4ef-452f-b9e8-3b54b86d98b9",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "len(df['acc'].unique())"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "410acf31-3fc1-4394-98eb-92e0993ca5d9",
+ "metadata": {},
+ "source": [
+ "Now view the data frame. You will see that we first have the STAT metadata and then the SRA metadata, such as submitter-assigned organism name, library source, release date, etc. In the STAT metadata there are a few things to note. First, each tax id (name) with reads assigned from that accession is listed with the accession. So if the accession SRR12345 had reads assigned to both Monkeypox virus and variola virus, you would see two records for that accession, one for each virus listed under name. Next we have the number of reads (in NCBI parlance, 'spots') assigned to that taxon id. We see two counts: total_count and self_count. Total count refers to the number of reads assigned to that node in the phylogeny (taxon) and all daughter nodes (descendants in the phylogeny). Self count refers to the number of reads assigned to that particular taxon. For example, if we had filtered more broadly to Orthopox (genus) instead of Monkeypox, Orthopox might have a total count of, say, 100 reads, which includes all reads assigned to any taxon within Orthopox, with a self count of, say, 50. Then Monkeypox could have a total count of 50 and a self count of 50 (assuming it is the terminal taxon in the tree with no daughter taxa). These 50 reads would be included in the total count of Orthopox, but not the self count of Orthopox. You can see all the STAT metadata fields [here](https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-taxonomy-analysis-table/)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "28f5f9ef-2d19-470d-abaf-ccb4e008b626",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# look at the df\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "197596ef-6bad-439f-bf8d-f9e3f2ae481b",
+ "metadata": {},
+ "source": [
+ "Now we have a pandas data frame and you can filter and manipulate as desired."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "b2a264f3-329c-428b-bb58-28826abf444a",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(df['organism'].unique())"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "dd4578f8-354d-44b6-9afb-adb6b6278302",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# for example we can look at just wastewater samples \n",
+ "df_filt = df[df['organism'] == 'wastewater metagenome']\n",
+ "df_filt"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cb2f0320-54e4-499c-8925-57b539d20e34",
+ "metadata": {},
+ "source": [
+ "You can also write to an outfile."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "bb8698e0-21bd-4282-b2cb-04160a33b6f2",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Specify the columns to be written to the text file\n",
+ "columns_to_write = [\"acc\",\"name\",\"total_count\",\"self_count\",\"center_name\",\"librarysource\",\"organism\",\"releasedate\",\"geo_loc_name_sam\"]\n",
+ "\n",
+ "# Write the specified columns to a text file\n",
+ "output_file = 'stat_results_mpox.txt'\n",
+ "df[columns_to_write].to_csv(output_file, sep='\\t', index=False)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "04eabbd9-24dc-4035-8dee-dd6f049a3270",
+ "metadata": {},
+ "source": [
+ "If you want to experiment a bit, rerun the query with a different tax id, a different total_count threshold, or a different time interval and see how your results change. You can also run a few more example queries from the [NCBI STAT page](https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud-based-taxonomy-analysis-table/). "
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-11.m109",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m109"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.11"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/tutorials/notebooks/pangolin/pangolin_pipeline.ipynb b/tutorials/notebooks/pangolin/pangolin_pipeline.ipynb
new file mode 100644
index 0000000..b46421c
--- /dev/null
+++ b/tutorials/notebooks/pangolin/pangolin_pipeline.ipynb
@@ -0,0 +1,424 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "31e8c3cd",
+ "metadata": {},
+ "source": [
+ "# Pangolin SARS-CoV-2 Pipeline Notebook "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "56a29212",
+ "metadata": {},
+ "source": [
+ "We are going to run a standard SARS-CoV-2 bioinformatics pipeline using the Pangolin workflow: https://cov-lineages.org/resources/pangolin/usage.html"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "03541941",
+ "metadata": {},
+ "source": [
+ "### Required software"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "f994b990",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "env: CPU=4\n"
+ ]
+ }
+ ],
+ "source": [
+ "#change this depending on how many threads are available in your notebook\n",
+ "%env CPU=4"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f421805e",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#install biopython to import packages below\n",
+ "! pip install biopython"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "87c08f11-494f-4aa9-9fe9-61790b68567f",
+ "metadata": {},
+ "source": [
+ "### Install mambaforge\n",
+ "You can also use the default installed conda, but mamba is so much faster! "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f9d1e2be-9c4a-4fc0-be33-1f3db8f1dce1",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh\n",
+ "! bash Mambaforge-$(uname)-$(uname -m).sh -b -p $HOME/mambaforge"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "id": "cd06e08b-6ba3-49e2-a118-fde1533f89f3",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#add to your path\n",
+ "import os\n",
+ "os.environ[\"PATH\"] += os.pathsep + os.environ[\"HOME\"]+\"/mambaforge/bin\""
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "id": "fd936fd6",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ " __ __ __ __\n",
+ " / \\ / \\ / \\ / \\\n",
+ " / \\/ \\/ \\/ \\\n",
+ "███████████████/ /██/ /██/ /██/ /████████████████████████\n",
+ " / / \\ / \\ / \\ / \\ \\____\n",
+ " / / \\_/ \\_/ \\_/ \\ o \\__,\n",
+ " / _/ \\_____/ `\n",
+ " |/\n",
+ " ███╗ ███╗ █████╗ ███╗ ███╗██████╗ █████╗\n",
+ " ████╗ ████║██╔══██╗████╗ ████║██╔══██╗██╔══██╗\n",
+ " ██╔████╔██║███████║██╔████╔██║██████╔╝███████║\n",
+ " ██║╚██╔╝██║██╔══██║██║╚██╔╝██║██╔══██╗██╔══██║\n",
+ " ██║ ╚═╝ ██║██║ ██║██║ ╚═╝ ██║██████╔╝██║ ██║\n",
+ " ╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝\n",
+ "\n",
+ " mamba (1.4.2) supported by @QuantStack\n",
+ "\n",
+ " GitHub: https://github.com/mamba-org/mamba\n",
+ " Twitter: https://twitter.com/QuantStack\n",
+ "\n",
+ "█████████████████████████████████████████████████████████████\n",
+ "\n",
+ "\n",
+ "Looking for: ['sra-tools', 'pangolin', 'ete3', 'minimap2']\n",
+ "\n",
+ "\u001b[?25l\u001b[2K\u001b[0G[+] 0.0s\n",
+ "\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.1s\n",
+ "conda-forge/linux-64 \u001b[90m━━━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\n",
+ "conda-forge/noarch \u001b[33m━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\n",
+ "bioconda/linux-64 \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\n",
+ "bioconda/noarch \u001b[90m━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\n",
+ "etetoolkit/linux-64 \u001b[90m━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.2s\n",
+ "conda-forge/linux-64 \u001b[33m━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.2s\n",
+ "conda-forge/noarch \u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.2s\n",
+ "bioconda/linux-64 \u001b[33m━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.2s\n",
+ "bioconda/noarch \u001b[90m━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.2s\n",
+ "etetoolkit/linux-64 \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.2s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0Gconda-forge/linux-64 No change\n",
+ "bioconda/noarch No change\n",
+ "[+] 0.3s\n",
+ "conda-forge/noarch \u001b[90m╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.3s\n",
+ "bioconda/linux-64 \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.3s\n",
+ "etetoolkit/linux-64 \u001b[90m━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.3s\n",
+ "etetoolkit/noarch \u001b[90m━━━━━━━━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\n",
+ "pkgs/r/linux-64 \u001b[90m╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0Gconda-forge/noarch No change\n",
+ "bioconda/linux-64 No change\n",
+ "etetoolkit/noarch No change\n",
+ "etetoolkit/linux-64 No change\n",
+ "pkgs/r/noarch No change\n",
+ "pkgs/main/noarch No change\n",
+ "pkgs/r/linux-64 No change\n",
+ "[+] 0.4s\n",
+ "pkgs/main/linux-64 \u001b[90m━━━━╸\u001b[0m\u001b[33m━━━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━\u001b[0m 0.0 B / ??.?MB @ ??.?MB/s 0.1s\u001b[2K\u001b[1A\u001b[2K\u001b[0Gpkgs/main/linux-64 No change\n",
+ "\u001b[?25h\n",
+ "Pinned packages:\n",
+ " - python 3.10.*\n",
+ "\n",
+ "\n",
+ "Transaction\n",
+ "\n",
+ " Prefix: /opt/conda\n",
+ "\n",
+ " Updating specs:\n",
+ "\n",
+ " - sra-tools\n",
+ " - pangolin\n",
+ " - ete3\n",
+ " - minimap2\n",
+ " - ca-certificates\n",
+ " - certifi\n",
+ " - openssl\n",
+ "\n",
+ "\n",
+ " Package Version Build Channel Size\n",
+ "───────────────────────────────────────────────────────────────\n",
+ " Install:\n",
+ "───────────────────────────────────────────────────────────────\n",
+ "\n",
+ " \u001b[32m+ k8 \u001b[0m 0.2.5 hdcf5f25_4 bioconda/linux-64 2MB\n",
+ " \u001b[32m+ minimap2\u001b[0m 2.26 he4a0461_2 bioconda/linux-64 1MB\n",
+ "\n",
+ " Summary:\n",
+ "\n",
+ " Install: 2 packages\n",
+ "\n",
+ " Total download: 3MB\n",
+ "\n",
+ "───────────────────────────────────────────────────────────────\n",
+ "\n",
+ "\n",
+ "\u001b[?25l\u001b[2K\u001b[0G[+] 0.0s\n",
+ "Downloading \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m 0.0 B 0.0s\n",
+ "Extracting \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m 0 0.0s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.1s\n",
+ "Downloading (2) \u001b[33m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m 0.0 B k8 0.0s\n",
+ "Extracting \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m 0 0.0s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0Gk8 1.8MB @ 11.4MB/s 0.2s\n",
+ "minimap2 1.1MB @ 6.7MB/s 0.2s\n",
+ "[+] 0.2s\n",
+ "Downloading ━━━━━━━━━━━━━━━━━━━━━━━ 2.9MB 0.1s\n",
+ "Extracting (2) \u001b[33m━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━━━\u001b[0m 0 k8 0.0s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.3s\n",
+ "Downloading ━━━━━━━━━━━━━━━━━━━━━━━ 2.9MB 0.1s\n",
+ "Extracting (2) \u001b[33m━━━━━━━━━━━━━╸\u001b[0m\u001b[90m━━━━━━━━━\u001b[0m 0 k8 0.1s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G[+] 0.4s\n",
+ "Downloading ━━━━━━━━━━━━━━━━━━━━━━━ 2.9MB 0.1s\n",
+ "Extracting (1) ━━━━━━━━━━╸\u001b[33m━━━━━━━━━━━━\u001b[0m 1 k8 0.2s\u001b[2K\u001b[1A\u001b[2K\u001b[1A\u001b[2K\u001b[0G\u001b[?25h\n",
+ "Downloading and Extracting Packages\n",
+ "\n",
+ "Preparing transaction: done\n",
+ "Verifying transaction: done\n",
+ "Executing transaction: done\n"
+ ]
+ }
+ ],
+ "source": [
+ "! mamba install -y -c conda-forge -c bioconda -c etetoolkit sra-tools pangolin ete3 minimap2"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "5a99cf0d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#import libraries\n",
+ "import os\n",
+ "from Bio import SeqIO\n",
+ "from Bio import Entrez"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "dc694629",
+ "metadata": {},
+ "source": [
+ "### Set up your directory structure and remove files from previous runs if they exist"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8f831fca",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "if not os.path.exists('pangolin_analysis'):\n",
+ " os.mkdir('pangolin_analysis')\n",
+ "os.chdir('pangolin_analysis')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "6423ca5d",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "if os.path.exists('sarscov2_sequences.fasta'):\n",
+ " os.remove('sarscov2_sequences.fasta')\n",
+ "!rm -f sarscov2_*\n",
+ "!rm -f lineage_report.csv"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9d7015e6",
+ "metadata": {},
+ "source": [
+ "### Fetch viral sequences using a list of accession IDs"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "16824bcf",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#give a list of accession numbers for SARS-CoV-2 sequences\n",
+ "acc_nums=['NC_045512','LR757995','LR757996','OL698718','OL677199','OL672836','MZ914912','MZ916499','MZ908464','MW580573','MW580574','MW580576','MW991906','MW931310','MW932027','MW424864','MW453109','MW453110']\n",
+ "print('the number of sequences we will analyze = ',len(acc_nums))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9e382d33",
+ "metadata": {},
+ "source": [
+ "Let this block run without going to the next until it finishes, otherwise you may get an error about too many requests. If that happens, reset your kernel and just rerun everything (except installing software)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "a28a7122",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#use the Bio.Entrez module within Biopython to download each accession\n",
+ "#and save the sequences to a single fasta file\n",
+ "Entrez.email = \"email@example.com\" # Tell NCBI who you are\n",
+ "filename = \"sarscov2_seqs.fasta\"\n",
+ "if not os.path.isfile(filename):\n",
+ "    # Download each record and append it to the fasta file\n",
+ "    with open(filename, \"a\") as out_handle:\n",
+ "        for acc in acc_nums:\n",
+ "            net_handle = Entrez.efetch(\n",
+ "                db=\"nucleotide\", id=acc, rettype=\"fasta\", retmode=\"text\"\n",
+ "            )\n",
+ "            out_handle.write(net_handle.read())\n",
+ "            net_handle.close()\n",
+ "            print(\"Saved\", acc)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "56acb7cc",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#make sure our fasta file has the same number of seqs as the acc_nums list\n",
+ "print('the number of seqs in our fasta file: ')\n",
+ "! grep '>' sarscov2_seqs.fasta | wc -l"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "8606c352",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "#let's peek at our new fasta file\n",
+ "! head sarscov2_seqs.fasta"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2db37b4e",
+ "metadata": {
+ "tags": []
+ },
+ "source": [
+ "### Run pangolin to identify lineages and output alignment\n",
+ "Here we call pangolin, give it our input sequences and the number of threads. We also tell it to output the alignment. The full list of pangolin parameters can be found in the [docs](https://cov-lineages.org/resources/pangolin/usage.html)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f277c6c4-a286-4020-8e1c-3941f3da9035",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "! pangolin --help"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "f1a17a74",
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [],
+ "source": [
+ "! pangolin sarscov2_seqs.fasta --alignment --threads $CPU"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b0e56a4b",
+ "metadata": {},
+ "source": [
+ "You can view pangolin's output file, lineage_report.csv (in the pangolin_analysis folder), by double-clicking it or by right-clicking and downloading it. Which lineages are present in the dataset? Is Omicron among them?"
+ ]
+ }
+ ],
+ "metadata": {
+ "environment": {
+ "kernel": "python3",
+ "name": "tf2-gpu.2-11.m110",
+ "type": "gcloud",
+ "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-11:m110"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}