Merge pull request #54 from STRIDES/kao_fix_broken_links
Kao fix broken links
Showing 4 changed files with 10 additions and 9 deletions.
@@ -26,7 +26,7 @@ There are a lot of ways to run workflows on AWS. Here we list a few possibilities
- The simplest approach is probably to spin up an EC2 instance and run your command interactively, with `screen`, or as a [startup script](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html) attached as metadata. See the [GWAS tutorial](https://training.nih-cfde.org/en/latest/Bioinformatic-Analyses/GWAS-in-the-cloud) below for more info on how to run a pipeline using EC2.
- You could also run your pipeline via a SageMaker notebook, either by splitting out each command as a different block or by running a workflow manager (e.g., Nextflow). See [here](https://aws.amazon.com/blogs/machine-learning/scheduling-jupyter-notebooks-on-sagemaker-ephemeral-instances/) for how to schedule a notebook to let it run longer. You can find some example notebooks in the [tutorials below](/tutorials/notebooks/).
- If you are running bioinformatic workflows, you can leverage the serverless functionality of AWS using [Amazon HealthOmics](https://aws.amazon.com/healthomics/). Read [this blog](https://aws.amazon.com/blogs/industries/automated-end-to-end-genomics-data-storage-and-analysis-using-amazon-omics/) for more detailed information, and check whether any newer blogs have come out. If you want to get some hands-on experience with HealthOmics using Cloud Lab, follow [this on-demand workshop](https://catalog.workshops.aws/amazon-omics-end-to-end/en-US/001-getting-started/010-self-directed-workshop) from Amazon. Since you already have an account set up, skip directly to the _Workshop_ section, then decide whether to complete the tutorial via the console, the CLI, or notebooks. If you go the notebook route, just spin up a notebook via [SageMaker](/docs/Jupyter_notebook.md). If you want to create a private workflow using Nextflow, you will need to migrate your containers to a private Amazon Elastic Container Registry (ECR). You can follow [this workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/76d4a4ff-fe6f-436a-a1c2-f7ce44bc5d17/en-US) to learn how that process works.
- If you are using a workflow manager other than WDL, Nextflow, or CWL (e.g., Snakemake), use the [AWS Genomics CLI](https://aws.amazon.com/genomics-cli/), which is a wrapper for genomics workflow managers and AWS Batch (a serverless computing cluster). See our [docs](/docs/agc.md) on how to set up the AGC CLI for Cloud Lab. You can also just run Snakemake locally within a VM; see our [Pangolin tutorial](/tutorials/notebooks/pangolin) for one example.
- Finally, one benefit of the cloud is access to GPUs for workflow acceleration. While much of the attention on GPUs focuses on AI/ML workflows, NVIDIA offers software called Parabricks that accelerates genomic workflows at fairly low cost. See the full list of command line options [here](https://docs.nvidia.com/clara/parabricks/3.7.0/index.html) to see if your specific workflow is accelerated. The easiest way to run Parabricks right now is via AWS HealthOmics [Ready2Run workflows](https://docs.aws.amazon.com/omics/latest/dev/service-workflows.html), but to run it via EC2, see our [guide](/docs/parabricks.md).
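The startup-script route mentioned in the first bullet can be sketched as a user-data script passed to `aws ec2 run-instances`. This is a minimal, hypothetical bootstrap fragment: the bucket name, container image, and tool choices are placeholders for illustration, not part of any tutorial above.

```shell
#!/bin/bash
# Hypothetical EC2 user-data (startup) script: everything below runs once,
# as root, when the instance first boots.
set -euo pipefail

# Install and start Docker (Amazon Linux)
yum install -y docker
systemctl start docker

# Fetch inputs from S3 (bucket and prefix are placeholders)
aws s3 cp s3://my-example-bucket/inputs/ /data/inputs/ --recursive

# Run the pipeline container, push results back, then shut down to stop billing
docker run -v /data:/data myorg/my-pipeline:latest /data/inputs /data/out
aws s3 cp /data/out s3://my-example-bucket/results/ --recursive
shutdown -h now
```

You would attach this at launch with something like `aws ec2 run-instances --image-id <ami> --instance-type <type> --user-data file://startup.sh` (the `--user-data file://` flag is the standard way to pass a startup script); the instance also needs an IAM role with S3 access for the `aws s3 cp` calls to work.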
@@ -49,7 +49,6 @@ Genome-wide association studies (GWAS) are large-scale investigations that analy
Medical imaging analysis involves analyzing large image files and often requires elastic storage and accelerated computing.
- Most medical imaging analyses are done using notebooks, so we recommend accessing this [Jupyter Notebook](/tutorials/notebooks/SpleenLiverSegmentation) and cloning it into SageMaker. The tutorial walks through image segmentation.
- [This SageMaker Studio on-demand workshop](https://catalog.workshops.aws/hcls-aiml/en-US/chest-xrays-object-detection) has a nice section on building a model on medical imaging data.
- AWS has a nice intro to machine learning in a SageMaker notebook that predicts breast cancer from features extracted from image data. It walks you through both image analysis and some of the ML functionality of SageMaker; the notebook is found [here](https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_applying_machine_learning/breast_cancer_prediction/Breast%20Cancer%20Prediction.ipynb).
- You can also view this [AWS blog](https://aws.amazon.com/blogs/machine-learning/annotate-dicom-images-and-build-an-ml-model-using-the-monai-framework-on-amazon-sagemaker/) on how to annotate DICOM images and build a custom AI model with the data.
- You can learn to de-identify medical images by following this AWS [tutorial](https://aws.amazon.com/blogs/machine-learning/de-identify-medical-images-with-the-help-of-amazon-comprehend-medical-and-amazon-rekognition/).
@@ -68,7 +67,7 @@ Single-cell RNA sequencing (scRNA-seq) is a technique that enables the analysis
NCBI BLAST (Basic Local Alignment Search Tool) is a widely used bioinformatics program provided by the National Center for Biotechnology Information (NCBI) that compares nucleotide or protein sequences against a large database to identify similar sequences and infer evolutionary relationships, functional annotations, and structural information. The NCBI team has written a version of BLAST for the cloud called ElasticBLAST, and you can read all about it [here](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/index.html). Essentially, ElasticBLAST helps you submit BLAST jobs to AWS Batch and write the results back to S3. Feel free to experiment with the example tutorial in Cloud Shell, or try our [notebook version](/tutorials/notebooks/ElasticBLAST/run_elastic_blast.ipynb).
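An ElasticBLAST run is driven by a small ini-style config file. The sketch below is based on the ElasticBLAST documentation; the section and key names should match, but the region, database, and S3 paths are placeholders you would swap for your own.

```ini
; Hypothetical ElasticBLAST configuration (config.ini)
[cloud-provider]
aws-region = us-east-1

[cluster]
num-nodes = 2

[blast]
program = blastp
db = refseq_protein
queries = s3://my-example-bucket/queries.fa
results = s3://my-example-bucket/elastic-blast-results
```

You would then submit with `elastic-blast submit --cfg config.ini`, poll with `elastic-blast status --cfg config.ini`, and, importantly for Cloud Lab budgets, tear everything down afterwards with `elastic-blast delete --cfg config.ini`.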
## **Protein Folding** <a name="af"></a>
You can run several protein folding algorithms, including AlphaFold, on AWS. Because the databases are so large, the setup is normally pretty difficult, but AWS has created a CloudFormation stack that automates spinning up all the resources necessary for running AlphaFold and other protein folding algorithms. You can read about the AWS resources [here](https://aws.amazon.com/solutions/guidance/protein-folding-on-aws/) and view the GitHub page [here](https://github.com/aws-solutions-library-samples/aws-batch-arch-for-protein-folding). To get this to work, you will need to modify your security groups following [these instructions](https://docs.aws.amazon.com/fsx/latest/LustreGuide/limit-access-security-groups.html). You will also likely have to [grant additional permissions to the role](/docs/update_sagemaker_role.md) that CloudFormation is using. If you get stuck, reach out to [email protected]. You can also run ESMFold [using this tutorial](https://catalog.workshops.aws/hcls-aiml/en-US/protein-analysis/esmfold).
## **Long Read Sequence Analysis** <a name="long"></a> | ||
Long read DNA sequence analysis involves analyzing sequencing reads typically longer than 10,000 base pairs (bp), compared with short-read sequencing, where reads are about 150 bp in length.
@@ -77,7 +76,7 @@ Oxford Nanopore has a pretty complete offering of notebook tutorials for handlin
## **Drug Discovery** <a name="atom"></a>
The [Accelerating Therapeutics for Opportunities in Medicine (ATOM) Consortium](https://atomscience.org/) created a series of [Jupyter notebooks](https://github.com/ATOMScience-org/AMPL/tree/master/atomsci/ddm/examples/tutorials) that walk you through the ATOM approach to Drug Discovery.
These notebooks were created to run in Google Colab, so if you run them in AWS, you will need to make a few modifications. First, we recommend you use a [SageMaker Studio Notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated.html) rather than a user-managed notebook, simply because it will have TensorFlow and other dependencies installed. Be sure to attach a GPU to your instance (a T4 is fine). Also, you will need to comment out `%tensorflow_version 2.x`, since that is a Colab-specific command. You will also need to `pip install` a few packages as needed. If you get errors with `deepchem`, try running `pip install --pre deepchem[tensorflow]` and/or `pip install --pre deepchem[torch]`. Also, some notebooks require a TensorFlow kernel, while others require PyTorch. If you run into a Pandas error, reach out to the ATOM GitHub developers for the best solution to this issue.
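The Colab-to-SageMaker adjustments above boil down to a short setup fragment. The commands are taken directly from the tips in this section; run them in a terminal, or prefix each with `!` inside a notebook cell.

```shell
# In each notebook, comment out the Colab-only magic before running:
#   %tensorflow_version 2.x

# Install deepchem with the extras matching your kernel
pip install --pre "deepchem[tensorflow]"   # notebooks that need a TensorFlow kernel
pip install --pre "deepchem[torch]"        # notebooks that need a PyTorch kernel
```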
## **Artificial Intelligence** <a name="ai"></a>
Machine learning is a subfield of artificial intelligence that focuses on the development of algorithms and models that enable computers to learn from and make predictions or decisions based on data, without being explicitly programmed. Artificial intelligence and machine learning algorithms are being applied to a variety of biomedical research questions, ranging from image classification to genomic variant calling. AWS has a long list of AI/ML tutorials available, and we have compiled a list here. Most recent development focuses on generative AI, including use cases such as extracting information from text, transforming speech to text, and generating images from text. SageMaker Studio allows the user to rapidly create, test, and train generative AI models, and has ready-to-use models all contained within [JumpStart](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html). These range from foundation models to fine-tunable models and task-specific solutions.