Merge pull request #52 from STRIDES/kao_link_check
Added link check workflow and fixed identified broken links
petersonjdNIH authored Dec 18, 2023
2 parents 30c3d15 + 2b404de commit 68a3797
Showing 55 changed files with 108 additions and 5 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/check-jupyter.yml
@@ -0,0 +1,30 @@
name: Test Notebook

on:
  push:
    branches:
      - "*"
  pull_request:
    branches:
      - "*"

jobs:
  test-notebook:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"

      - name: Install dependencies
        run: |
          pip install jupyter
      - name: Test notebook
        run: |
          jupyter nbconvert --to notebook --execute tutorials/notebooks/LifeSciencesAPI/nextflow/*.ipynb
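The `Test notebook` step hands a shell glob to `jupyter nbconvert --execute`. As a rough illustration of what that step does, here is a minimal Python sketch, assuming it runs from the repository root (where `actions/checkout` places the files) and that `jupyter` is on `PATH`. The helper names `find_notebooks`, `execute_notebooks`, and `NOTEBOOK_GLOB` are hypothetical, and the `runner` parameter exists only so the subprocess call can be swapped out in tests. One deliberate difference: bash passes an unmatched glob through literally, whereas this sketch raises an error when no notebook matches.

```python
import glob
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical constant: the workflow's notebook glob, written relative to the repo root.
NOTEBOOK_GLOB = "tutorials/notebooks/LifeSciencesAPI/nextflow/*.ipynb"


def find_notebooks(repo_root: str, pattern: str = NOTEBOOK_GLOB) -> List[Path]:
    """Expand the glob under repo_root; fail loudly instead of passing it through literally."""
    matches = sorted(Path(p) for p in glob.glob(str(Path(repo_root) / pattern)))
    if not matches:
        raise FileNotFoundError(f"no notebooks match {pattern!r} under {repo_root!r}")
    return matches


def execute_notebooks(
    notebooks: Sequence[Path],
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> List[Path]:
    """Run `jupyter nbconvert --to notebook --execute` per notebook; return the failures."""
    failed = []
    for nb in notebooks:
        # Same nbconvert flags as the workflow step, one notebook per invocation.
        cmd = ["jupyter", "nbconvert", "--to", "notebook", "--execute", str(nb)]
        if runner(cmd).returncode != 0:
            failed.append(nb)
    return failed
```

Usage, from the repository root: `failures = execute_notebooks(find_notebooks("."))`, then exit non-zero if `failures` is non-empty so CI marks the job as failed. Running one `nbconvert` call per notebook (rather than the workflow's single call) is a design choice so that each failing notebook is reported by name.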
28 changes: 28 additions & 0 deletions .github/workflows/check_links.yml
@@ -0,0 +1,28 @@
name: Check Links

on:
  push:
    branches:
      - '*'
  pull_request:
    branches:
      - '*'
jobs:
  check-links:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Set up Node.js
        uses: actions/setup-node@v3
        with:
          node-version: 16

      - name: Install dependencies
        run: |
          npm install -g markdown-link-check
      - name: Check links in Markdown files
        run: find . -name '*.md' -print0 | xargs -0 -n1 markdown-link-check -q -c .markdown-link-check.json
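The link-check step pipes `find . -name '*.md' -print0` into `xargs -0 -n1`, so `markdown-link-check` runs once per Markdown file and paths containing spaces are handled safely. A minimal Python sketch of the same loop, assuming it runs from the repository root with `markdown-link-check` installed globally; `find_markdown_files` and `check_links` are hypothetical helper names, and `runner` is injectable for testing. Like `find`, the walk also descends into hidden directories such as `.github` (this sketch only collects regular files).

```python
import os
import subprocess
from pathlib import Path
from typing import Callable, List

CONFIG = ".markdown-link-check.json"  # config path used by the workflow step


def find_markdown_files(root: str) -> List[Path]:
    """Roughly `find root -name '*.md'`, restricted to regular files."""
    found = []
    # os.walk visits every subdirectory, including hidden ones like .github
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".md"):
                found.append(Path(dirpath) / name)
    return sorted(found)


def check_links(
    root: str,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = subprocess.run,
) -> List[Path]:
    """Call markdown-link-check once per file (like `xargs -n1`); return failing files."""
    failing = []
    for md in find_markdown_files(root):
        # -q: quiet output, -c: config file; xargs appends the file name last
        result = runner(["markdown-link-check", "-q", "-c", CONFIG, str(md)])
        if result.returncode != 0:
            failing.append(md)
    return failing
```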
43 changes: 43 additions & 0 deletions .markdown-link-check.json
@@ -0,0 +1,43 @@
{
  "ignorePatterns": [
    {
      "pattern": "^#"
    },
    {
      "pattern": "^https://iam.nih.gov"
    },
    {
      "pattern": "^https://github.com/conda-forge/miniforge/releases/"
    },
    {
      "pattern": "^https://github.com/david-thrower-nih/DL-gwas-gcp-example"
    }
  ],
  "replacementPatterns": [
    {
      "pattern": "^/docs",
      "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/tree/main/docs"
    },
    {
      "pattern": "^/tutorials",
      "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/tree/main/tutorials"
    },
    {
      "pattern": "^/images",
      "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/tree/main/images"
    },
    {
      "pattern": "^/issues",
      "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/issues"
    },
    {
      "pattern": "^/assets",
      "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/tree/main/tutorials/notebooks/DL-gwas-gcp-example/assets"
    }
  ],
  "timeout": "20s",
  "retryOn429": true,
  "retryCount": 5,
  "fallbackRetryDelay": "30s",
  "aliveStatusCodes": [200, 206]
}
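To make the two pattern lists concrete: links matching an `ignorePatterns` entry (such as in-page `#anchors`) are skipped, repository-relative links like `/tutorials/...` are rewritten to absolute GitHub URLs through `replacementPatterns`, and only responses whose status appears in `aliveStatusCodes` count as working. The Python sketch below approximates that behavior using a subset of the config; it is not markdown-link-check's own implementation. It assumes ignore patterns are tested against the link as written, before the replacements are applied in list order (first match of each pattern only). `resolve_link` and `is_alive` are hypothetical names.

```python
import json
import re
from typing import Optional

# Subset of the .markdown-link-check.json above, parsed the same way a tool would read it.
CONFIG = json.loads("""
{
  "ignorePatterns": [
    {"pattern": "^#"},
    {"pattern": "^https://iam.nih.gov"}
  ],
  "replacementPatterns": [
    {"pattern": "^/docs", "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/tree/main/docs"},
    {"pattern": "^/tutorials", "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/tree/main/tutorials"},
    {"pattern": "^/issues", "replacement": "https://github.com/STRIDES/NIHCloudLabGCP/issues"}
  ],
  "aliveStatusCodes": [200, 206]
}
""")


def resolve_link(link: str, config: dict = CONFIG) -> Optional[str]:
    """Return the URL that would be checked, or None if the link is ignored."""
    # Assumption: ignore patterns are tested against the link as written.
    if any(re.search(p["pattern"], link) for p in config["ignorePatterns"]):
        return None
    # Assumption: every replacement pattern is applied in order, first match only.
    for rule in config["replacementPatterns"]:
        link = re.sub(rule["pattern"], rule["replacement"], link, count=1)
    return link


def is_alive(status: int, config: dict = CONFIG) -> bool:
    """A response counts as working only if its status code is listed."""
    return status in config["aliveStatusCodes"]
```

Under these assumptions, the misspelled `/tutorials/notebooks/GooogleBatch/nextflow` link in `tutorials/README.md` resolves to a GitHub URL for a directory that does not exist, so it would be reported as broken rather than skipped.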
6 changes: 3 additions & 3 deletions tutorials/README.md
@@ -31,14 +31,14 @@ There are a lot of ways to run workflows on GCP. Here we list a few possibilitie
- The simplest method is probably to spin up a Compute Engine instance and run your command interactively, inside `screen`, or as a [startup script](https://cloud.google.com/compute/docs/instances/startup-scripts/linux) attached as metadata.
- You could also run your pipeline via a Vertex AI notebook, either by splitting each command into a separate cell or by running a workflow manager (Nextflow, etc.). [Schedule notebooks](https://codelabs.developers.google.com/vertex_notebook_executor#0) to let them run longer.
You can find a nice tutorial for using managed notebooks [here](https://codelabs.developers.google.com/vertex_notebook_executor#0). Note that there is now a difference between `managed notebooks` and `user-managed notebooks`. The `managed notebooks` have more features and can be scheduled, but give you less control over conda environments and package installation.
-- You can interact with [Google Batch](https://cloud.google.com/batch/docs/get-started) or the [Google Life Sciences API](https://cloud.google.com/life-sciences/docs/reference/rest) using a workflow manager like [Nextflow](https://cloud.google.com/life-sciences/docs/tutorials/nextflow), [Snakemake](https://snakemake.readthedocs.io/en/stable/executing/cloud.html), or [Cromwell](https://github.com/GoogleCloudPlatform/rad-lab/tree/main/modules/genomics_cromwell). We currently have example notebooks for [Nextflow and Snakemake that use the Life Sciences API](/tutorials/notebooks/LifeSciencesAPI/), [Google Batch with Nextflow](/tutorials/notebooks/GooogleBatch/nextflow), and a [local version of Snakemake run via Pangolin](/tutorials/notebooks/pangolin).
+- You can interact with [Google Batch](https://cloud.google.com/batch/docs/get-started) or the [Google Life Sciences API](https://cloud.google.com/life-sciences/docs/reference/rest) using a workflow manager like [Nextflow](https://cloud.google.com/life-sciences/docs/tutorials/nextflow), [Snakemake](https://snakemake.readthedocs.io/en/stable/executing/cloud.html), or [Cromwell](https://github.com/GoogleCloudPlatform/rad-lab/tree/main/modules/genomics_cromwell). We currently have example notebooks for [Nextflow and Snakemake that use the Life Sciences API](/tutorials/notebooks/LifeSciencesAPI/), [Google Batch with Nextflow](/tutorials/notebooks/GoogleBatch/nextflow), and a [local version of Snakemake run via Pangolin](/tutorials/notebooks/pangolin).
- You may find that other APIs better suit your needs, such as the [Google Cloud Healthcare Data Engine](https://cloud.google.com/healthcare).
- Most of the notebooks below require just a few CPUs. Start small (maybe 4 CPUs), then scale up as needed. Likewise, when you need a GPU, start with a smaller or older generation GPU (e.g. T4) for testing, then switch to a newer GPU (A100/V100) once you know things will work or you need more horsepower.

## **Artificial Intelligence and Machine Learning** <a name='ml'></a>
Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and models that enable computers to learn from and make predictions or decisions based on data, without being explicitly programmed. Machine learning on GCP generally occurs within Vertex AI. You can learn more about machine learning in this [Google Crash Course](https://developers.google.com/machine-learning/crash-course). For hands-on examples, try out [this module](https://github.com/NIGMS/COVIDMachineLearningSFSU) developed by San Francisco State University or [this one from the University of Arkansas](https://github.com/NIGMS/MachineLearningUA), both developed for the NIGMS Sandbox Project.

-Now that the age of **Generative AI** (Gen AI) has arrived, Google has released a host of Gen AI offerings within the Vertex AI suite. Generative AI models can, for example, extract desired information from text, transcribe speech into text, generate images from descriptions (and vice versa), and much more. Vertex AI's [Generative AI Studio](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/generative-ai-studio) console allows the user to rapidly create, test, and train generative AI models on the cloud in a safe and secure setting. See our overview in [this tutorial](/tutorials/notebooks/GenAI/GenAIStudioGCP.ipynb). The studio also has ready-to-use models, all contained within the [Model Garden](https://cloud.google.com/vertex-ai/docs/start/explore-models). These range from foundation models to fine-tunable models and task-specific solutions. You can also use these models within a Jupyter notebook, and Google provides many generative AI tutorials hosted on [GitHub](https://github.com/GoogleCloudPlatform/generative-ai/tree/main). Some examples they provide are [document summarization](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/examples/document-summarization/summarization_with_documentai.ipynb) and [Q&A](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/examples/question_answering.ipynb).
+Now that the age of **Generative AI** (Gen AI) has arrived, Google has released a host of Gen AI offerings within the Vertex AI suite. Generative AI models can, for example, extract desired information from text, transcribe speech into text, generate images from descriptions (and vice versa), and much more. Vertex AI's [Generative AI Studio](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/generative-ai-studio) console allows the user to rapidly create, test, and train generative AI models on the cloud in a safe and secure setting. See our overview in [this tutorial](/tutorials/notebooks/GenAI/GenAIStudioGCP.ipynb). The studio also has ready-to-use models, all contained within the [Model Garden](https://cloud.google.com/vertex-ai/docs/start/explore-models). These range from foundation models to fine-tunable models and task-specific solutions. You can also use these models within a Jupyter notebook, and Google provides many generative AI tutorials hosted on [GitHub](https://github.com/GoogleCloudPlatform/generative-ai/tree/main). Some examples they provide are listed under [language here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/language).

We created [this tutorial](/tutorials/notebooks/GenAI/langchain_on_vertex.ipynb) that uses [LangChain with Vertex AI](https://python.langchain.com/docs/integrations/llms/google_vertex_ai_palm) to walk you through various Gen AI use cases, including programmatically querying an LLM, creating a custom chatbot that queries a scientific article about menopause in :monkey: (sorry, no emoji for chimps), and generating code according to a user-specified prompt.

@@ -111,7 +111,7 @@ NCBI BLAST (Basic Local Alignment Search Tool) is a widely used bioinformatics p
- We also rewrote [this ElasticBLAST tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-gcp.html) as a [notebook](/tutorials/notebooks/elasticBLAST) that will work in Vertex AI.

## **Long Read Sequence Analysis** <a name="long"></a>
-Long read DNA sequence analysis involves analyzing sequencing reads typically longer than 10 thousand base pairs (bp), compared with short read sequencing, where reads are about 150 bp long. Oxford Nanopore has a fairly complete offering of notebook tutorials for handling long read data for a variety of tasks, including variant calling, RNA-seq, SARS-CoV-2 analysis, and much more. You can find a list and description of notebooks [here](https://labs.epi2me.io/nbindex/), or clone the [GitHub repo](https://github.com/epi2me-labs/tutorials/tree/master/tutorials). Note that these notebooks assume you are running locally and accessing the epi2me notebook server. To run them in Cloud Lab, skip the first cell that connects to the server; the rest of the notebook should then run correctly with a few tweaks. If you are just looking to try out notebooks, don't start with these. If you are interested in long read sequence analysis, some troubleshooting may be needed to adapt these notebooks to the Cloud Lab environment, and you may even need to rewrite them in a fresh notebook by adapting the commands.
+Long read DNA sequence analysis involves analyzing sequencing reads typically longer than 10 thousand base pairs (bp), compared with short read sequencing, where reads are about 150 bp long. Oxford Nanopore has a fairly complete offering of notebook tutorials for handling long read data for a variety of tasks, including variant calling, RNA-seq, SARS-CoV-2 analysis, and much more. You can find a list and description of notebooks [here](https://labs.epi2me.io/nbindex/), or clone the [GitHub repo](https://github.com/epi2me-labs). Note that these notebooks assume you are running locally and accessing the epi2me notebook server. To run them in Cloud Lab, skip the first cell that connects to the server; the rest of the notebook should then run correctly with a few tweaks. If you are just looking to try out notebooks, don't start with these. If you are interested in long read sequence analysis, some troubleshooting may be needed to adapt these notebooks to the Cloud Lab environment, and you may even need to rewrite them in a fresh notebook by adapting the commands.

## **Drug Discovery** <a name="atom"></a>
The [Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium](https://atomscience.org/) created a series of [Jupyter notebooks](https://github.com/ATOMScience-org/AMPL/tree/master/atomsci/ddm/examples/tutorials) that walk you through the ATOM approach to Drug Discovery.
6 changes: 4 additions & 2 deletions tutorials/notebooks/SpleenLiverSegmentation/README.md
@@ -1,14 +1,16 @@
# Spleen Segmentation with Liver Example using NVIDIA Models and MONAI
_We have put together a training example that segments the Spleen in 3D CT Images. At the end is an example of combining both the Spleen model and the Liver model._

*NVIDIA has changed some of the models used in this tutorial, so the notebook may crash. If you have issues, try commenting out the liver model; we are working on a patch.*

## Introduction
Two pre-trained models from NVIDIA are used in this training: a Spleen model and a Liver model.
The Spleen model is additionally retrained on the Medical Decathlon spleen dataset: [http://medicaldecathlon.com/](http://medicaldecathlon.com/)
You do not need to download the data before running the notebook; the notebook downloads it during its run.
The notebook uses the Python package [MONAI](https://monai.io/), the Medical Open Network for Artificial Intelligence.

-- Spleen Model - [clara_pt_spleen_ct_segmentation_V2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/med/models/clara_pt_spleen_ct_segmentation)
-- Liver Model - [clara_pt_liver_and_tumor_ct_segmentation_V1](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/med/models/clara_pt_liver_and_tumor_ct_segmentation)
+- Spleen Model - [clara_pt_spleen_ct_segmentation_V2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/monaitoolkit/models/monai_spleen_ct_segmentation)
+- Liver Model - [clara_pt_liver_and_tumor_ct_segmentation_V1]()

## Outcomes
After following along with this notebook the user will be familiar with: