update Nextflow documentation before testing #33

Merged
merged 1 commit into from
Apr 27, 2024
Binary file modified .DS_Store
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/17-workspace_faq.md
@@ -87,7 +87,7 @@
> STRIDES program benefits include:
>
> * Cost discounts on AWS services (e.g., compute, storage, and egress fees)
> * Premium-level [AWS Enterprise Support][AWS Enterprise Support]
> * [AWS Enterprise Support][AWS Enterprise Support]
> * [Training and education programs][STRIDES training]
> * and more! See the [STRIDES program benefits][STRIDES benefits] page for more information

Binary file added docs/img/base-image-URLs.png
Binary file added docs/img/nextflow-config.png
Binary file modified docs/img/scout-quickview.png
Binary file added docs/img/test-docker-pull.png
70 changes: 31 additions & 39 deletions docs/nextflow-create-docker.md
@@ -2,118 +2,110 @@

# **Create a Dockerfile**

### Overview
### **Overview**

This guide is for users who want to build Docker containers for use in Gen3 workspaces.

### Prerequisites
### **Prerequisites**

- [Docker](https://www.docker.com/get-started/) installed on your local machine
- Clone or download the [`bio-nextflow` repo](https://github.com/uc-cdis/bio-nextflow/tree/master)

## Start with a security-validated base image
## **Start with a security-validated base image**

Gen3 offers a collection of FedRAMP security-compliant base images. Building on these base images makes it easier for your customized Docker image to pass the security scanning.
Gen3 offers a collection of FedRAMP security-compliant base images, which we regularly re-assess for security compliance. Building on these base images makes it easier for your customized Docker image to pass Gen3 security scanning.

You can access these images on Quay.io, a repository site for Docker images:
The URLs for pulling these images with Docker are listed here:

[https://quay.io/repository/cdis/containers?tab=tags](https://quay.io/repository/cdis/containers?tab=tags)
[https://github.com/uc-cdis/containers/blob/eec9789a57c5bb196a91f035e4cb069cfaa5abcd/nextflow-base-images/allowed_base_images.txt](https://github.com/uc-cdis/containers/blob/eec9789a57c5bb196a91f035e4cb069cfaa5abcd/nextflow-base-images/allowed_base_images.txt)

### How to choose your base image
![Screenshot of URLs for secure base images](img/base-image-URLs.png)
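If you prefer the command line, the list can also be downloaded directly. The sketch below assumes the standard `raw.githubusercontent.com` form of the GitHub link above (same commit and path); `fetch_allowed_images` is a hypothetical helper name, not part of any Gen3 tooling.

```shell
# Raw-file form of the GitHub link above (same commit SHA and path).
ALLOWED_IMAGES_URL="https://raw.githubusercontent.com/uc-cdis/containers/eec9789a57c5bb196a91f035e4cb069cfaa5abcd/nextflow-base-images/allowed_base_images.txt"

# Hypothetical helper: save the list of security-validated base image URLs.
# $1: output file (defaults to allowed_base_images.txt in the current directory)
fetch_allowed_images() {
  local out="${1:-allowed_base_images.txt}"
  # -f: fail on HTTP errors; -sS: no progress bar, but still show errors;
  # -L: follow redirects
  curl -fsSL -o "$out" "$ALLOWED_IMAGES_URL"
}
```

For example, `fetch_allowed_images && cat allowed_base_images.txt` saves the list and prints it in your terminal.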

### **How to choose your base image**

**GPU vs. CPU**

*Not sure what these are? [Here's a nice overview.](https://blogs.nvidia.com/blog/whats-the-difference-between-a-cpu-and-a-gpu/)*

Some tools you may be using in your workflow can take advantage of GPU capacity for parallel processing. If so - use one of our GPU images. If your workflow is not designed for GPU, use our CPU image.
If your workflow requires a GPU (e.g., deep learning or other AI/ML models), please use one of the GPU images; otherwise, use the CPU image.

**GPU images**

We have 2 images in our current selection that offer [CUDA](https://www.turing.com/kb/understanding-nvidia-cuda) support for running on GPUs -- these have "cuda" in the image name, followed by the CUDA version. When possible, please choose the latest version of CUDA compatible with your tools.
We have 3 images in our current selection that offer [CUDA](https://www.turing.com/kb/understanding-nvidia-cuda) support for running on GPUs -- these have "cuda" in the image name, followed by the CUDA version. When possible, please choose the latest version of CUDA compatible with your tools.

> gen3-cuda-12.3-ubuntu22.04-openssl *(preferred)*
>
> gen3-cuda-12.3-torch2.2-ubuntu22.04-openssl *(also preferred)*
>
> gen3-cuda-11.8-ubuntu22.04-openssl *(only use if your tools require a lower version of CUDA)*
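Because the GPU images carry "cuda" in their names, a local copy of the allowed list can be split into GPU and CPU candidates with `grep`. This is a minimal sketch that assumes the file has one image URL per line; `list_gpu_images` and `list_cpu_images` are hypothetical helper names.

```shell
# Hypothetical helpers; assume one image URL per line in the list file.
# $1: path to the list (defaults to allowed_base_images.txt)

# GPU images: entries whose name contains "cuda" (case-insensitive)
list_gpu_images() {
  grep -i 'cuda' "${1:-allowed_base_images.txt}"
}

# CPU images: all other non-blank entries
list_cpu_images() {
  grep -iv 'cuda' "${1:-allowed_base_images.txt}" | grep -v '^[[:space:]]*$'
}
```

For example, `list_gpu_images allowed_base_images.txt` prints only the CUDA images, so you can compare their CUDA versions against what your tools support.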

**CPU images**

We have one image that is available for running workflows on CPUs.

> gen3-amazonlinux-base-AL2023fix

### Fetch the tag to pull the Docker image

![Fetch tag button](img/quay-fetch-tag.png)

To get a command that will pull your desired Docker image for use in your Dockerfile, click on the Fetch Tag button (labeled #1 above). This will open a pop-up window:

![Fetch tag window](img/quay-fetch-tag2.png)

Click on the dropdown for image format (#2), then select `Docker pull (by tag)` (#3). Click the "Copy Command" button to copy the command to fetch this Docker image. This command will be the first line in your Dockerfile.
> amazonlinux-base

### Test the fetch command
### **Test pulling the Docker image**

Before you proceed with this command in your Dockerfile, you want to make sure you can pull the image. You can verify this by running the fetch tag commmand in your terminal while Docker is running.
Before you use this URL in your Dockerfile, make sure you can pull the image. You can verify this by running the `docker pull` command in your terminal while Docker is running.

First, open your Docker Desktop application (just to be sure Docker is running).

Next, open your terminal. Paste the fetch tag command you copied from Quay. If it's working, you will see language that it is pulling (see below). When it's complete (and successfully pulled), there will be a line that says `Status: Downloaded <image>` (see yellow highlight below). If you see this, you know that all the steps necessary to pull your image work. If you don't see this, reach out to us on Slack.
Next, open your terminal. Run `docker pull <image URL>`, where the image URL is the full line as displayed in the [file of security-validated base images](https://github.com/uc-cdis/containers/blob/eec9789a57c5bb196a91f035e4cb069cfaa5abcd/nextflow-base-images/allowed_base_images.txt). If it's working, you will see output showing that the image is being pulled (see below). When the pull completes successfully, there will be a line that says `Status: Downloaded newer image for <image>` (see yellow highlight below). If you see this, you know that all the steps necessary to pull your image work. If you don't see this, reach out to us on Slack.

![Test fetch tag command in terminal](img/test-fetch-tag.png)
![Test docker pull command in terminal](img/test-docker-pull.png)
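The same check can be scripted. `docker pull` exits with a non-zero status when the pull fails, so a script can test the exit code instead of looking for the `Status:` line, and `docker info` fails when the Docker daemon cannot be reached (for example, when Docker Desktop is not running). `test_pull` is a hypothetical helper name.

```shell
# Hypothetical helper: check that Docker is running, then try to pull an image.
# $1: full image URL from the list of security-validated base images
test_pull() {
  local image="$1"
  if [ -z "$image" ]; then
    echo "usage: test_pull <image URL>" >&2
    return 2
  fi
  # `docker info` fails if the Docker daemon cannot be reached
  if ! docker info >/dev/null 2>&1; then
    echo "Docker does not appear to be running; start Docker Desktop first." >&2
    return 1
  fi
  if docker pull "$image"; then
    echo "OK: pulled $image"
  else
    echo "FAILED: could not pull $image" >&2
    return 1
  fi
}
```

Run it as `test_pull <image URL>`; if it prints a FAILED message, reach out to us on Slack as described above.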

### Test using Docker Scout to evaluate image vulnerabilities
### **Test using Docker Scout to evaluate image vulnerabilities**

At the end of your test pull, Docker suggests using Docker Scout to examine your image for vulnerabilities (see red box above). We have already evaluated our base images for security compliance, so this step is not necessary here. However, since you will want to use Docker Scout to evaluate your custom build later, now is a convenient time to test this tool and make sure you are fully set up to run Docker Scout.

#### Run Docker Scout
#### **Run Docker Scout**

To run Docker Scout, you must:

* have Docker running (for example, the desktop application open)
* be signed in to Docker (in the desktop application, there is a Sign In button in the upper right corner)
* have created a Docker account (when you sign in for the first time, you will be asked to create an account).

Once you are signed in to Docker, you can run the command they suggest after pulling an image (for example, see the command in blue text in the red box above, `docker scout quickview...`). If the command runs successfully, you should see output similar to the screenshot below. This is a summary of the vulnerabilities in your image.
Once you are signed in to Docker, you can run the command Docker suggests after pulling an image (for example, see the command in blue text in the red box above, `docker scout quickview <image URL>`). If the command runs successfully, you should see output similar to the screenshot below, which summarizes the vulnerabilities in your image.

![Test Docker Scout command in terminal](img/scout-quickview.png)

You can run the next suggested command (shown in the red box above, `docker scout cves...`) to see the full list of vulnerabilities.

Images should be able to pass Gen3 security scanning if there are no Critical vulnerabilities.
**Images should be able to pass Gen3 security scanning if there are no Critical vulnerabilities.**
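To turn that rule into a quick local check, `docker scout cves` can be limited to Critical findings with `--only-severity critical` and made to return a non-zero exit code when any are found with `--exit-code`. The sketch below assumes you are signed in to Docker as described above; `scout_check` is a hypothetical helper name, and passing it locally does not guarantee that an image will pass Gen3 security scanning.

```shell
# Hypothetical helper: summarize an image with Docker Scout, then fail if
# any Critical CVEs are reported.
# $1: image reference (a base image URL, or a local tag such as my_docker)
scout_check() {
  local image="$1"
  # Overview of vulnerabilities by severity
  docker scout quickview "$image" || return 1
  # --only-severity critical: list only Critical CVEs
  # --exit-code: return a non-zero status if any such CVEs are found
  if docker scout cves --only-severity critical --exit-code "$image"; then
    echo "PASS: no Critical vulnerabilities reported for $image"
  else
    echo "FAIL: Critical vulnerabilities found in $image, or the scan did not complete" >&2
    return 1
  fi
}
```

For example, `scout_check <image URL>` checks a base image now, and `scout_check my_docker` can check your own image after you build it.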

*Want to know more about Docker Scout? [Check out the documentation](https://docs.docker.com/scout/quickstart/).*

## Build your image locally on top of the base image
## **Build your image locally on top of the base image**

To build your own image, you need to create a Dockerfile. To build on top of a base image, the first line of the Dockerfile should reference that base image. The Dockerfile typically lives in the Git repository alongside your code, which makes it easier to copy that code into your container.

### Unfamiliar with creating Dockerfiles?
### **Unfamiliar with creating Dockerfiles?**

If you are unfamiliar with creating Dockerfiles, we encourage you to explore the [excellent tutorial here](https://medium.com/@anshita.bhasin/a-step-by-step-guide-to-create-dockerfile-9e3744d38d11), as well as review the [Dockerfile documentation here](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/), before you proceed. We have identified only key highlights below.

One reminder: Dockerfiles are typically named just "Dockerfile" - with a capital D and no file extension.

### Example: Build an image with a Dockerfile and a requirements.txt
### **Example: Build an image with a Dockerfile and a requirements.txt**

In our example here, we will have you build your image using a `requirements.txt` to identify the software tools you want to add to the base image, as well as a Dockerfile that pulls in the base image, adds the software tools specified in the requirements file, copies relevant code files, and establishes some setup parameters.
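As an illustration of that structure, the hypothetical sketch below writes a Dockerfile with those pieces in order. It is not the Dockerfile from the bio-nextflow repository: the script name `my_script.py` and the `/app` working directory are placeholders, and it assumes the chosen base image already provides `pip`.

```shell
# Hypothetical helper: write a minimal Dockerfile.
# $1: base image URL (from the list of security-validated base images)
# $2: output path (defaults to ./Dockerfile; an existing file is overwritten)
write_dockerfile() {
  local base_image="$1" out="${2:-Dockerfile}"
  # Unquoted EOF, so ${base_image} is expanded by the shell
  cat > "$out" <<EOF
FROM ${base_image}

# Working directory inside the container (placeholder)
WORKDIR /app

# Install the software tools listed in requirements.txt (assumes pip exists)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the workflow code into the image (placeholder file name)
COPY my_script.py .
EOF
}
```

For example, `write_dockerfile <image URL>` creates `./Dockerfile` with the base image on its first line, ready for `docker build . -t my_docker`.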

Our example will use the files in the [torch_cuda_test directory](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_gpu_workflows/torch_cuda_test) of the bio-nextflow repository. You can review the `readme` file in this directory for more information. It is a simple example that builds on our base image by adding PyTorch. The Nextflow script ultimately runs a Python script that checks the CUDA version on the GPU instance and whether it is compatible with the versions of PyTorch and CUDA available in the container.

First, in the terminal, navigate to the directory where you cloned the `bio-nextflow` repository (see [Prerequisites section](#prerequisites)). Next, navigate to the directory containing the downloaded Dockerfile and requirements.txt:

>>>
cd bio-nextflow/nextflow_notebooks/containerized_gpu_workflows/torch_cuda_test
>>>
`cd bio-nextflow/nextflow_notebooks/containerized_gpu_workflows/torch_cuda_test`

> Note that the first line of the Dockerfile references the fetch tag for one of our GPU base images. This is always how you will reference a base image -- with `FROM` and the Dockertag.
> If you open the Dockerfile, note that the first line of the Dockerfile references the URL for one of our GPU base images. This is always how you will reference a base image -- with `FROM` and the URL.
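Since the `FROM` line is what ties your image to an approved base, it can be worth checking it before building. This hypothetical pre-build check compares the image named in the first `FROM` instruction against a local copy of the allowed list (assumed to have one image URL per line). An exact text match is a simplifying assumption, so a differently written reference to the same image would be flagged.

```shell
# Hypothetical helper: verify the first FROM image is on the allowed list.
# $1: Dockerfile path (default: Dockerfile)
# $2: allowed list path (default: allowed_base_images.txt)
check_base_image() {
  local dockerfile="${1:-Dockerfile}" allowed="${2:-allowed_base_images.txt}"
  local base
  # First FROM instruction (case-insensitive); skip flags such as --platform=...
  base=$(awk 'toupper($1) == "FROM" { for (i = 2; i <= NF; i++) if ($i !~ /^--/) { print $i; exit } }' "$dockerfile")
  if [ -z "$base" ]; then
    echo "No FROM line found in $dockerfile" >&2
    return 1
  fi
  # -x: whole-line match; -F: treat the image URL as a fixed string
  if grep -qxF "$base" "$allowed"; then
    echo "OK: $base is on the allowed list"
  else
    echo "WARNING: $base is not on the allowed list" >&2
    return 1
  fi
}
```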

Then, run the Docker `build` command. For example:

`docker build . -t my_docker`

will build a Docker image from the Dockerfile in the current directory (`.`) and tag the resulting image `my_docker`.
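A scripted version of this step can also confirm that the tag now exists locally, since `docker image inspect` exits with a non-zero status when an image is not found. `build_local_image` is a hypothetical helper name.

```shell
# Hypothetical helper: build and tag an image, then confirm the tag exists.
# $1: tag for the new image (e.g. my_docker)
# $2: build context directory containing the Dockerfile (default: .)
build_local_image() {
  local tag="$1" context="${2:-.}"
  docker build -t "$tag" "$context" || return 1
  # Prints the image ID; fails if no image with this tag exists
  docker image inspect --format '{{.Id}}' "$tag"
}
```

For example, `build_local_image my_docker .` prints the new image ID if the build succeeded.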

### Example: Examine built image for vulnerabilities
### **Example: Examine built image for vulnerabilities**

You now have a new Docker image built upon our security-compliant base image. To more rapidly identify and address any security concerns in your customized image, we encourage all users to locally scan their image for vulnerabilities using Docker Scout, as described [in our test above](#test-using-docker-scout-to-evaluate-image-vulnerabilities). Here, we have tagged our new image with `my_docker`. So, we would run the Docker Scout `quickview` command on the image using this command:

@@ -123,8 +115,8 @@ And to identify the specific vulnerabilities and recommendations, you would run:

`docker scout cves my_docker`

### My image passes the local security scanning
### **My image passes the local security scanning**

Once your custom image is security-compliant based on the analysis from Docker Scout, you are ready to submit your Docker image for Gen3 security scanning.
Once your custom image is security-compliant based on the analysis from Docker Scout, you are ready to request credentials to submit your Docker image for Gen3 security scanning.

[*Continue to Request Credentials*](./nextflow-request-creds.md)
28 changes: 10 additions & 18 deletions docs/nextflow-getting-started.md
@@ -2,10 +2,12 @@

# **Getting started with workflows on Gen3**

## **Background**

[![Gen3 logo](img/Gen3-Logo-01-blue.png){: style="height:75px"}](https://gen3.org/)

*Please note: Nextflow features are only available to users with a Direct Pay workspace account. [See our documentation for persistent paymodels](13-workspace_accounts.md) to learn more about getting a Direct Pay workspace account.*

## **Background**

### *What is Gen3?*

The Gen3 platform consists of open-source software services that make up data commons and data ecosystems (also called meshes or fabrics). A data commons is a platform that co-locates both data and compute resources so researchers can bring algorithms to the data. Data ecosystems or meshes are systems that researchers can use to search and query across multiple data commons in one location.
@@ -40,29 +42,23 @@ To run workflows in Gen3, you will need the following:

#### 1) Request access to the BRH workspace

The BRH exposes a [computational workspace](https://brh.data-commons.org/workspace) that researchers can use to run simple Jupyter notebooks and submit workflows. To submit workflow jobs, you need access to the BRH workspace -- specifically in our QA environment (QA-BRH).
The BRH exposes a [computational workspace](https://brh.data-commons.org/workspace) that researchers can use to run simple Jupyter notebooks and submit workflows. To submit workflow jobs, you need access to the BRH workspace.

Follow [these instructions](https://uc-cdis.github.io/BRH-documentation/05-workspace_registration/#requesting-temporary-trial-access-to-brh-workspace) to request temporary trial access to the QA-BRH workspace (but **note**: request access from the QA site: [https://qa-brh.planx-pla.net/login](https://qa-brh.planx-pla.net/login) and not the production BRH site that is linked in the documentation). After you have submitted your request, please ping `@Sara Volk de Garcia` in Slack to alert her to look for your request and approve it.

**Some notes about QA-BRH**

* QA-BRH is scheduled to spin itself up at 6am CDT (12pm UTC) on weekdays, and shuts down at 9:20pm CDT (02:20am UTC). If QA-BRH is not up, you will not be able to log in or access the workspace to run anything.
* Be sure that your workflow is complete and your data saved by the time QA-BRH is scheduled to shut down for the night; otherwise, you could lose all your unsaved work.
* If QA-BRH is expected to be up, but you are getting a "Bad Gateway" message (or any other trouble) when trying to open the page, reach out to Sara on Slack.
Follow [these instructions](https://uc-cdis.github.io/BRH-documentation/05-workspace_registration/#requesting-temporary-trial-access-to-brh-workspace) to request trial access to the BRH workspace. After you have submitted your request, please ping `@Sara Volk de Garcia` in Slack to alert her to look for your request and approve it.

#### 2) Establish a workspace account with a persistent pay model in BRH

When you initially are granted workspace access in BRH, it is a trial access that is free for the user (paid by CTDS). However, the trial access paymodel does not permit access to the Nextflow image. To gain access to the Nextflow image needed for testing, you must request a workspace account with a persistent paymodel, so that the cost of compute jobs in your project can accrue to the right account. BRH currently supports several persistent pay models such as NIH STRIDES (payment through grant funds) and Direct Pay (credit card payment). If you're curious, see [here](https://uc-cdis.github.io/BRH-documentation/13-workspace_accounts/) for more information about pay models.
When you are first granted workspace access in BRH, it is trial access that is free for the user (paid by CTDS). However, the trial access paymodel does not permit access to the Nextflow image. To gain access to the Nextflow image needed for testing, you must request a workspace account with a persistent paymodel, so that the cost of compute jobs in your project accrues to the right account. BRH currently supports several persistent pay models, such as NIH STRIDES (payment through grant funds) and Direct Pay (credit card payment). If you're curious, [see here](https://uc-cdis.github.io/BRH-documentation/13-workspace_accounts/) for more information about pay models.

For MIDRC, we have already established a Direct-Pay-type* of workspace account for testing. When you have workspace access, please contact Ao Liu (`@Ao`) over Slack to request the team to add a Direct Pay account to your workspace (please share with Ao the email you use to log in).
For MIDRC, we have already established a Direct-Pay-type* workspace account for testing. When you receive workspace access, Sara will work with the Nextflow team to add a Direct Pay account to your workspace.

*\* Note about this Direct-Pay-type of account: It is not an ACTUAL Direct Pay account, and it does not go through the normal Direct Pay account route, nor through OCC, at all. It is funded with MIDRC contract funds, but will be labeled Direct Pay in your workspace.*

#### 3) Launch a workspace with the persistent paymodel

Once you have been notified that you have a workspace account provisioned with persistent paymodel funds, you can proceed.

* Log in to QA-BRH and open the workspace page.
* Log in to BRH and open the workspace page.
* In the "Account" dropdown in the top left, select "Direct Pay" as your paymodel (#1 in screenshot below).
* Once you select the Direct Pay workspace account, you should see a new option for workspace image: "(Beta) Nextflow with CPU instances"
* Click the Launch button for this Nextflow workspace image (#2).
@@ -83,12 +79,8 @@ Store all files you want to keep after the workspace closes in the `/pd` directo

![Screenshot of /pd and data folders](img/workspace_pd_folder_080422.png){: style="height:400px"}

#### Automated shutdown for the night

**QA-BRH shuts down at 9:20pm CDT** (02:20am UTC). Be sure that your workflow is complete and your data saved by the time QA-BRH is scheduled to shut down for the night; otherwise, you could lose all your unsaved work.

#### Automated shutdown for idle workspaces

Workspaces will automatically be shut down (and all workflows terminated) after **90 minutes of idle time**.

[*Continue to General Workflows*](./nextflow-tutorial-1.md)
[*Continue to Overview of Containers in Gen3*](./nextflow-overview-containers.md)