diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 554d3c2..f328359 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -13,6 +13,7 @@ repos: rev: v4.0.1 hooks: - id: trailing-whitespace + args: [--markdown-linebreak-ext=md] - id: end-of-file-fixer - id: no-commit-to-branch args: [--branch, develop, --branch, master, --pattern, release/.*] diff --git a/.python-version b/.python-version new file mode 100644 index 0000000..511e4ec --- /dev/null +++ b/.python-version @@ -0,0 +1 @@ +ve-3.11.5 diff --git a/docs/img/Gen3-Logo-01-blue.png b/docs/img/Gen3-Logo-01-blue.png new file mode 100644 index 0000000..3868637 Binary files /dev/null and b/docs/img/Gen3-Logo-01-blue.png differ diff --git a/docs/img/MIDRC-credentials.png b/docs/img/MIDRC-credentials.png new file mode 100644 index 0000000..34aa49a Binary files /dev/null and b/docs/img/MIDRC-credentials.png differ diff --git a/docs/img/container-development.png b/docs/img/container-development.png new file mode 100644 index 0000000..0909f83 Binary files /dev/null and b/docs/img/container-development.png differ diff --git a/docs/img/launch-workspace.png b/docs/img/launch-workspace.png new file mode 100644 index 0000000..fb52aa1 Binary files /dev/null and b/docs/img/launch-workspace.png differ diff --git a/docs/img/nextflow.svg b/docs/img/nextflow.svg new file mode 100644 index 0000000..21d1902 --- /dev/null +++ b/docs/img/nextflow.svg @@ -0,0 +1,21 @@ + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/img/quay-fetch-tag.png b/docs/img/quay-fetch-tag.png new file mode 100644 index 0000000..5f43ead Binary files /dev/null and b/docs/img/quay-fetch-tag.png differ diff --git a/docs/img/quay-fetch-tag2.png b/docs/img/quay-fetch-tag2.png new file mode 100644 index 0000000..72301d8 Binary files /dev/null and b/docs/img/quay-fetch-tag2.png differ diff --git a/docs/img/scout-quickview.png b/docs/img/scout-quickview.png new file mode 100644 index 0000000..daad29f Binary files /dev/null and 
b/docs/img/scout-quickview.png differ diff --git a/docs/img/test-fetch-tag.png b/docs/img/test-fetch-tag.png new file mode 100644 index 0000000..82425a1 Binary files /dev/null and b/docs/img/test-fetch-tag.png differ diff --git a/docs/nextflow-create-docker.md b/docs/nextflow-create-docker.md new file mode 100644 index 0000000..17cd650 --- /dev/null +++ b/docs/nextflow-create-docker.md @@ -0,0 +1,125 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Create a Dockerfile** + +### Overview + +This guide is for users who want to build Docker containers for use in Gen3 workspaces. + +### Prerequisites + +- [Docker](https://www.docker.com/get-started/) installed on your local machine +- A clone or download of the [`bio-nextflow` repo](https://github.com/uc-cdis/bio-nextflow/tree/master) + +## Start with a security-validated base image + +Gen3 offers a collection of FedRAMP security-compliant base images. Building on these base images makes it easier for your customized Docker image to pass the security scanning. + +You can access these images on Quay.io, a repository site for Docker images: + +[https://quay.io/repository/cdis/containers?tab=tags&tag=latest](https://quay.io/repository/cdis/containers?tab=tags&tag=latest) + +### How to choose your base image + +**GPU vs. CPU** + +*Not sure what these are? [Here's a nice overview.](https://blogs.nvidia.com/blog/whats-the-difference-between-a-cpu-and-a-gpu/)* + +Some tools you may be using in your workflow can take advantage of GPU capacity for parallel processing. If so, use one of our GPU images. If your workflow is not designed for GPU, use our CPU image. + +**GPU images** + +We have 2 images in our current selection that offer [CUDA](https://www.turing.com/kb/understanding-nvidia-cuda) support for running on GPUs -- these have "cuda" in the image name, followed by the CUDA version. When possible, please choose the latest version of CUDA compatible with your tools. 
+ +> gen3-cuda-12.3-ubuntu22.04-openssl *(preferred)* +> gen3-cuda-11.8-ubuntu22.04-openssl *(only use if your tools require a lower version of CUDA)* + +**CPU images** + +We have one image that is available for running workflows on CPUs. + +> gen3-amazonlinux-base-AL2023fix + +### Fetch the tag to pull the Docker image + +![Fetch tag button](img/quay-fetch-tag.png) + +To get a command that will pull your desired Docker image for use in your Dockerfile, click on the Fetch Tag button (labeled #1 above). This will open a pop-up window: + +![Fetch tag window](img/quay-fetch-tag2.png) + +Click on the dropdown for image format (#2), then select `Docker pull (by tag)` (#3). Click the "Copy Command" button to copy the command to fetch this Docker image. The image reference in this command is what the first line of your Dockerfile (the `FROM` line) will point to. + +### Test the fetch command + +Before you proceed with this command in your Dockerfile, you want to make sure you can pull the image. You can verify this by running the fetch tag command in your terminal while Docker is running. + +First, open your Docker Desktop application (just to be sure Docker is running). + +Next, open your terminal. Paste the fetch tag command you copied from Quay. If it's working, you will see output indicating that the image is being pulled (see below). When it's complete (and successfully pulled), there will be a line that says `Status: Downloaded ` (see yellow highlight below). If you see this, you know that all the steps necessary to pull your image work. If you don't see this, reach out to us on Slack. + +![Test fetch tag command in terminal](img/test-fetch-tag.png) + +### Test using Docker Scout to evaluate image vulnerabilities + +At the end of your test fetch, Docker offers a suggestion to use Docker Scout to examine your image for vulnerabilities (see red box above). We have already evaluated the security compliance for our image, so it's not necessary here. 
However, since you will want to use Docker Scout to evaluate your custom build later, now is a convenient time to test this tool and make sure you are fully set up to run Docker Scout. + +#### Run Docker Scout + +To run Docker Scout, you must: + +* have Docker running (for example, the desktop application open) +* be signed in to Docker (in the desktop application, there is a Sign In button in the upper right corner) +* have created a Docker account (when you sign in for the first time, you will be asked to create an account). + +Once you are signed in to Docker, you can run the command they suggest after pulling an image (for example, see the command in blue text in the red box above, `docker scout quickview...`). If the command runs successfully, you should see output similar to the screenshot below. This is a summary of the vulnerabilities in your image. + +![Test Docker Scout command in terminal](img/scout-quickview.png) + +You can run the next suggested command (shown in red box above, `docker scout cves...`) to see the full list of vulnerabilities. + +Images will be able to pass Gen3 security scanning if there are no Critical or High vulnerabilities, and **[add something about CVSS?]** + +*Want to know more about Docker Scout? [Check out the documentation](https://docs.docker.com/scout/quickstart/).* + +## Build your image locally on top of the base image + +To build your own image, you need to create a Dockerfile. To build your image on a base image, the first line of the Dockerfile should reference the base image tag. The Dockerfile you create typically lives in the Git repository where you have your code to make it easier to copy into your container. + +### Unfamiliar with creating Dockerfiles? 
+ +If you are unfamiliar with creating Dockerfiles, we encourage you to explore the [excellent tutorial here](https://medium.com/@anshita.bhasin/a-step-by-step-guide-to-create-dockerfile-9e3744d38d11), as well as review the [Dockerfile documentation here](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/), before you proceed. We have identified only key highlights below. + +One reminder: Dockerfiles are typically named just "Dockerfile" - with a capital D and no file extension. + +### Example: Build an image with a Dockerfile and a requirements.txt + +In our example here, we will have you build your image using a `requirements.txt` to identify the software tools you want to add to the base image, as well as a Dockerfile that pulls in the base image, adds the software tools specified in the requirements file, copies relevant code files, and establishes some setup parameters. + +Our example will use the files in the [torch_cuda_test directory](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_gpu_workflows/torch_cuda_test) of the bio-nextflow repository. You can review the `readme` file in this directory for more information. It is a simple example that will build up from our base image by adding PyTorch. The Nextflow script will ultimately use a python script that checks the version of CUDA in the GPU instance and checks whether it is compatible with the version of PyTorch and CUDA available in the container. + +First, in the terminal, navigate to the directory where the downloaded Dockerfile and requirements.txt are located. + +> Note that the first line of the Dockerfile references the fetch tag for one of our GPU base images. This is always how you will reference a base image -- with `FROM` and the Dockertag. + +Then, run the Docker `build` command. For example: + +`docker build . 
-t my_docker` + +will tag the Dockerfile in the directory with the tag `my_docker`, and build a Docker image using the Dockerfile. + +### Example: Examine built image for vulnerabilities + +You now have a new Docker image built upon our security-compliant base image. To more rapidly identify and address any security concerns in your customized image, we encourage all users to locally scan their image for vulnerabilities using Docker Scout, as described [in our test above](#test-using-docker-scout-to-evaluate-image-vulnerabilities). Here, we have tagged our new image with `my_docker`. So, we would run the Docker Scout `quickview` command on the image using this command: + +`docker scout quickview my_docker` + +And to identify the specific vulnerabilities and recommendations, you would run: + +`docker scout cves my_docker` + +### My image passes the local security scanning + +Once your custom image is security-compliant based on the analysis from Docker Scout, you are ready to submit your Docker image for Gen3 security scanning. + +[*Continue to Request Credentials*](./nextflow-request-creds.md) diff --git a/docs/nextflow-getting-started.md b/docs/nextflow-getting-started.md new file mode 100644 index 0000000..9e3bbf0 --- /dev/null +++ b/docs/nextflow-getting-started.md @@ -0,0 +1,94 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Getting started with workflows on Gen3** + +## **Background** + +[![Gen3 logo](img/Gen3-Logo-01-blue.png){: style="height:75px"}](https://gen3.org/) + +### *What is Gen3?* + +The Gen3 platform consists of open-source software services that make up data commons and data ecosystems (also called meshes or fabrics). A data commons is a platform that co-locates both data and compute resources so researchers can bring algorithms to the data. Data ecosystems or meshes are systems that researchers can use to search and query across multiple data commons in one location. 
+ +More information about Gen3 can be found [here](https://gen3.org/). A list of data platforms using the Gen3 technology can be found [here](https://stats.gen3.org/). + +### *What are workflows?* + +A workflow is a computational pipeline that consists of a series of steps to be executed. It could run using a software container that is a standalone, self-contained piece of software containing all the executables needed for the workflow. + +Many workflow languages have been developed in recent years. Common examples include [Common Workflow Language (CWL)](https://www.commonwl.org/), [Open Workflow Description Language (WDL)](https://openwdl.org/), and [Nextflow](https://www.nextflow.io/). We will be using Nextflow for our exercises. + +### *Workflow execution in Gen3* + +Gen3 is based on kubernetes and is container-based. A container is a standalone, self-contained collection of software that contains specific software you may need for your application (e.g., Pydicom/DICOM, Numpy, SciPy). We are testing a new workflow execution system in Gen3 that researchers can use to run containers on the cloud for various applications in a secure and isolated manner. We developed an isolation process so that each user’s workflow is separate from each other, from the Gen3 core system, and from Gen3 data, except when approved and required for the specific task. The testing and development of workflows is currently underway in the [Biomedical Research Hub (BRH)](https://brh.data-commons.org/), one of the first data ecosystems (or meshes) built at CTDS. + +### *What is Nextflow? What is AWS Batch?* + +The workflow execution in Gen3 is powered using [Nextflow](https://www.nextflow.io/), a framework for writing data-driven computational pipelines using software containers. It is a very popular and convenient framework for specifying containers, inputs and outputs, and running jobs on the cloud. 
Researchers have used Nextflow for several years, and 2023 has continued to see a rapid gain in its popularity per a [recent survey](https://seqera.io/blog/the-state-of-the-workflow-2023-community-survey-results/). The scalability of workflows in Gen3 comes from [AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/what-is-batch.html), an AWS service capable of running compute jobs over large datasets on the Cloud. + +## **Steps to run workflows in Gen3** + +To run workflows in Gen3, you will need the following: + +* Access to the BRH workspace (covered on this page) +* A funded workspace account (covered on this page) +* A Docker image uploaded to an ECR created for you (start with [Create Dockerfile](./nextflow-create-docker.md)) + +*Depending on your specific workflows, you may also need additional tools, resources, or access.* + +### **Get access to the BRH workspace and set up a funded account** + +#### 1) Request access to the BRH workspace + +The BRH exposes a [computational workspace](https://brh.data-commons.org/workspace) that researchers can use to run simple Jupyter notebooks and submit workflows. To submit workflow jobs, you need access to the BRH workspace -- specifically in our QA environment (QA-BRH). + +Follow [these instructions](https://uc-cdis.github.io/BRH-documentation/05-workspace_registration/#requesting-temporary-trial-access-to-brh-workspace) to request temporary trial access to the QA-BRH workspace (but **note**: request access from the QA site: [https://qa-brh.planx-pla.net/login](https://qa-brh.planx-pla.net/login) and not the production BRH site that is linked in the documentation). After you have submitted your request, please ping `@Sara Volk de Garcia` in Slack to alert her to look for your request and approve it. + +**Some notes about QA-BRH** + +* QA-BRH is scheduled to spin itself up at 6am CDT (12pm UTC) on weekdays, and shuts down at 9:20pm CDT (02:20am UTC). 
If QA-BRH is not up, you will not be able to log in or access the workspace to run anything. +* Be sure that your workflow is complete and your data saved by the time QA-BRH is scheduled to shut down for the night; otherwise, you could lose all your unsaved work. +* If QA-BRH is expected to be up, but you are getting a "Bad Gateway" message (or any other trouble) when trying to open the page, reach out to Sara on Slack. + +#### 2) Establish a workspace account with a persistent pay model in BRH + +When you initially are granted workspace access in BRH, it is a trial access that is free for the user (paid by CTDS). However, the trial access paymodel does not permit access to the Nextflow image. To gain access to the Nextflow image needed for testing, you must request a workspace account with a persistent paymodel, so that the cost of compute jobs in your project can accrue to the right account. BRH currently supports several persistent pay models such as NIH STRIDES (payment through grant funds) and Direct Pay (credit card payment). If you're curious, see [here](https://uc-cdis.github.io/BRH-documentation/13-workspace_accounts/) for more information about pay models. + +For MIDRC, we have already established a Direct-Pay-type* of workspace account for testing. When you have workspace access, please contact Ao Liu (`@Ao`) over Slack to request the team to add a Direct Pay account to your workspace (please share with Ao the email you use to log in). + +*\* Note about this Direct-Pay-type of account: It is not an ACTUAL Direct Pay account, and it does not go through the normal Direct Pay account route, nor through OCC, at all. It is funded with MIDRC contract funds, but will be labeled Direct Pay in your workspace.* + +### **3) Launch a workspace with the persistent paymodel** + +Once you have been notified that you have a workspace account provisioned with persistent paymodel funds, you can proceed. + +* Log in to QA-BRH and open the workspace page. 
+* In the dropdown under "Account" in the top left, select "Direct Pay" as your paymodel (#1 in screenshot below). +* Once you select the Direct Pay workspace account, you should see a new option for workspace image: "(Beta) Nextflow with CPU instances" +* Click the Launch button for this Nextflow workspace image (#2). +* When you click the button, the workspace will begin to launch. This can take 5-10 minutes. You will know you successfully started the launch because you will see 3 animated dots rippling in the Launch button (see yellow highlight). +> *If it takes longer than 10 minutes, try refreshing the screen and re-trying the launch. If it seems to stall out (longer than 10 min again), or if you get an error, reach out to CTDS staff through the Slack channel (but don't close the tab with the launch).* + +![Screenshot of workspace launch screen](img/launch-workspace.png) + +### **Quick orientation to the workspace** + +Before using the workspace, we strongly encourage you to [review the BRH Workspace documentation](https://uc-cdis.github.io/BRH-documentation/09-workspace_page/#guideline-to-get-started-in-workspaces). + +There are several key points we want you to be aware of: + +#### Store all data in the persistent directory (/pd) + +Store all files you want to keep after the workspace closes in the `/pd` directory; **only files saved in the /pd directory will persist**. Any personal files in the folder `data` will be lost. + +![Screenshot of /pd and data folders](img/workspace_pd_folder_080422.png){: style="height:400px"} + +#### Automated shutdown for the night + +**QA-BRH shuts down at 9:20pm CDT** (02:20am UTC). Be sure that your workflow is complete and your data saved by the time QA-BRH is scheduled to shut down for the night; otherwise, you could lose all your unsaved work. + +#### Automated shutdown for idle workspaces + +Workspaces will automatically be shut down (and all workflows terminated) after **90 minutes of idle time**. 
+ +[*Continue to General Workflows*](./nextflow-tutorial-1.md) diff --git a/docs/nextflow-overview-containers.md b/docs/nextflow-overview-containers.md new file mode 100644 index 0000000..bd94a2d --- /dev/null +++ b/docs/nextflow-overview-containers.md @@ -0,0 +1,22 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Overview: Developing and Deploying Containers in Gen3** + +![Overview of steps in developing a container and making it available for use in workflows](./img/container-development.png) + +**Locally build and test container:** +Gen3 provides several FedRAMP security-compliant base images that users can pull and customize. + +**Request credentials and push container to Gen3 staging:** +Users can email Gen3 to request short-term credentials that permit them to authenticate Docker in their terminal to upload the local Docker image to a Gen3 staging repo for security review. + +**Container is security-scanned; Gen3 sends approved container URI:** +Gen3 completes the security scan within minutes. If it is compliant, the image is moved to an ECR repo ("approved") from where the container can be run, and Gen3 staff will send a container URI to the user. + +If there are problems that make the image non-compliant with security requirements, a report of the vulnerabilities is provided for remediation and resubmission. + +**Run workflow using approved container URI:** +In the BRH workspace, use a Nextflow Jupyter notebook to run Nextflow workflows in the approved container using the approved container URI. 
+ +--- +[*Continue to Create Dockerfile*](./nextflow-create-docker.md) diff --git a/docs/nextflow-request-creds.md b/docs/nextflow-request-creds.md new file mode 100644 index 0000000..550bf26 --- /dev/null +++ b/docs/nextflow-request-creds.md @@ -0,0 +1,30 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Request Credentials for Uploading a Docker Container** + +*Please copy and paste the email template below into a new email and send to [brhsupport@datacommons.io](mailto:brhsupport@datacommons.io). Please be sure to add the relevant information to the bolded fields.* + +------ +Hello, User Services, + +Please create new temporary AWS credentials to permit me to upload a Nextflow container. + +The email address I use to log in to BRH is: **[BRH login email here]** + +I understand that these credentials will last for 1 hour, once created. If I continue to need access to upload after they expire, I will request new credentials. + +Since the credentials will ONLY last 1 hour after creation, you may prefer we send them at a certain time of day. Please **delete which of these do NOT apply**: + +* Please generate and send my credentials tomorrow morning +* Please generate and send my credentials in the afternoon +* Please generate and send my credentials ASAP + +To ensure prompt attention, I will also ping `@Sara Volk de Garcia` on the Slack channel after I have sent my email. + +Thanks! 
+ +**[your name]** + +------ + +[*Continue to Upload Docker Image*](./nextflow-upload-docker.md) diff --git a/docs/nextflow-tutorial-1.md b/docs/nextflow-tutorial-1.md new file mode 100644 index 0000000..c36b12b --- /dev/null +++ b/docs/nextflow-tutorial-1.md @@ -0,0 +1,13 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Tutorial Nextflow workflows - no containers** + +## Example Nextflow notebooks + +We have a collection of notebooks using Nextflow in Gen3 here: [https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks) + +For this section, you will be most interested in the general Nextflow notebooks that do not use containers at all. You can find those here: + +* [non_containerized_nextflow_workflows](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/non_containerized_nextflow_workflows) + +[*Continue to Overview of Containers in Gen3*](./nextflow-overview-containers.md) diff --git a/docs/nextflow-tutorial-workflows.md b/docs/nextflow-tutorial-workflows.md new file mode 100644 index 0000000..197fa19 --- /dev/null +++ b/docs/nextflow-tutorial-workflows.md @@ -0,0 +1,50 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Tutorial Nextflow Workflows** + +## Get set: Download necessary credentials and software + +Be ready to execute the tutorial workflows below by gathering credentials and installing necessary software. + +### **MIDRC credentials** + +You need to generate MIDRC credentials on the profile page of the [MIDRC portal](https://data.midrc.org/) to download GUIDs in the workspace. For this, please go to [data.midrc.org](https://data.midrc.org/), click on the user icon in the right corner (#1), and open the Profile page (#2). Click on Create API Key (#3). A pop-up window will appear with the key. 
If you scroll down slightly, you can see the button to download the credentials as a JSON. Credentials are valid for 1 month. + +![Screenshot showing how to find and save MIDRC credentials](./img/MIDRC-credentials.png) + +### Get and replace placeholder values from the Nextflow config + +You can find the values to replace the placeholders in the `queue`, `jobRole` and `workDir` fields in the `nextflow.config` file in your Nextflow workspace. Directions for finding this file are at the bottom of the "Welcome to Nextflow" page that opens when your Nextflow workspace first opens. These placeholder values will need to be replaced in each of the various tutorial Nextflow notebooks. + +Note that you should only copy/paste the value to replace `placeholder` for each field; do not copy/paste larger sections of the nextflow config, or there could be indentation problems that interfere with the code. + +## Example Nextflow notebooks + +We have a collection of notebooks using Nextflow in Gen3 here: [https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks) + +For this section, you will be most interested in: + +* containerized_cpu workflows +* containerized_gpu workflows + +### Containerized gpu workflow example 1 + +Link to notebook here: [https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_gpu_workflows/covid_challenge_container](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_gpu_workflows/covid_challenge_container) + +For container building, see the Dockerfile and requirements file, the Python code to containerize, and a README that explains how to obtain the open-source models and code and build the container. 
For running workflows, there is a Python notebook (midrc_gpu_batch_template.ipynb) with an additional description in the README. + +### Containerized gpu workflow example 2 + +Link to notebook here: [https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_gpu_workflows/torch_cuda_test](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_gpu_workflows/torch_cuda_test) + +For container building, see the Dockerfile and requirements file, the Python code to containerize, and a README similar to the one above. For running workflows, there is a Python notebook (torch_cuda_batch_template.ipynb) with an additional description in the README. + +### Containerized cpu workflow example + +Link to notebook here: [https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_cpu_workflows/midrc_batch_demo](https://github.com/uc-cdis/bio-nextflow/tree/master/nextflow_notebooks/containerized_cpu_workflows/midrc_batch_demo) + +This directory has the same kinds of files as the examples above, plus a README [here](https://github.com/uc-cdis/bio-nextflow/blob/master/nextflow_notebooks/containerized_cpu_workflows/README) that describes two workflows: one local download workflow, and one batch workflow. + +## Tutorial 1: Test running Nextflow and AWS Batch workflow in existing Docker container to get MIDRC image files, convert them to PNG, and extract the metadata + +Please see the code snippet below, which shows an example of how to run two basic processes on DICOM files on AWS Batch: i) convert to PNG, ii) extract metadata. 
Note that to run this, you first need to download open-access DICOM files to your workspace using the Gen3 SDK (PART 1); you can then stage the files on AWS Batch and run the workflow (PART 2). diff --git a/docs/nextflow-upload-docker.md b/docs/nextflow-upload-docker.md new file mode 100644 index 0000000..5ac77d7 --- /dev/null +++ b/docs/nextflow-upload-docker.md @@ -0,0 +1,89 @@ +[![Nextflow logo](img/nextflow.svg){: style="height:75px"}](https://www.nextflow.io/) + +# **Pushing Docker Images to AWS ECR** + +### Overview + +This guide is for users who have received temporary credentials granting access to push container images to a specific AWS Elastic Container Registry (ECR) repository. + +### Prerequisites + +- [Docker](https://www.docker.com/get-started/) installed on your local machine. +- [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) installed locally +- Temporary AWS credentials provided by the User Services team. +- The URI of the ECR repository you have been given access to (shared in the AWS credentials) +- An image you want to push, or a Dockerfile to build your image. + +### A note about timing + +Your temporary AWS credentials only last for 1 hour from when they were created; User Services should have provided an expiration time when sharing the credentials with you. You must fully complete the push to ECR before they expire, or you will need to request new credentials from User Services. + +## Set AWS environment variables: + +*The commands in this section are valid if you are using Linux or macOS. If you are using Windows, we will provide a separate set of commands for you to set the AWS environment variables.* + +Before you can push your Docker image to the ECR repository, you need to configure the AWS CLI with the temporary credentials you received. 
In the credentials sent to you, the commands needed to do this appear below the line "Please run the following commands to set your AWS credentials:". Copy those (they will look similar to the block below) and run them in the terminal. + + export AWS_ACCESS_KEY_ID= + export AWS_SECRET_ACCESS_KEY= + export AWS_SESSION_TOKEN= + +> Note: the variables are set only as long as the terminal is open; export the variables again if you close and open a new terminal. + +### Verify configuration: + +Run `aws sts get-caller-identity` to verify that your CLI is using the temporary credentials. + +### Authenticate Docker to ECR + +Next, use the AWS CLI to retrieve an authentication token and authenticate your Docker client to your registry. In the credentials, there is a command below the line "After setting credentials you will need to log in to your docker registry. Please run the following command:". Copy that (it will look similar to the command below) and run it in the terminal. + + aws ecr get-login-password --region | docker login --username AWS --password-stdin + +## Preparing to push your Docker image + +The specific steps you use to prepare to push your image depend on whether you already have an image built or need to build one from a Dockerfile. + +### If you already built a local Docker image: Tag your Docker image + +If you already have a locally-built Docker image, you will not need to run the `docker build` command included in the credentials. However, you do need to tag it with the ECR repository URI and the image tag you want to use. This command is not in the credentials file. + + docker tag < local-image >:< local-tag > < repositoryUri >:< image-tag > + +Replace `< local-image >` with the name of your local Docker image and `< local-tag >` with the tag you want to push. +> If you're not sure what your local image and tag names are, you can run `docker images` in your terminal. It will provide a list of all your saved docker images. The column called `REPOSITORY` is the local image name. 
The column called `TAG` is the local tag name for this image. + +Replace `< repositoryUri >` with the ECR repository URI provided at the top of the credentials file, and `< image-tag >` with the image tag name you want to use in your ECR. +> Important note: If you do not want the most-recently-pushed image to replace an earlier version with the same tag in your ECR, image-tags should be unique. +> For example, you create an image with an image-tag `batch-poc`. If you later push another image to `< repositoryUri >:batch-poc`, it will overwrite the previous version of the image in your ECR (you will only have 1 container with the image tag "batch-poc"). +> If you do not want to overwrite, you can use versioned image-tags. For example: `batch-poc-1.0`, and then `batch-poc-1.1`. +> If you want to replace previous versions of your container, you can use the same image-tag. + +### If you need to build your Docker image: + +If you haven't already built your Docker image, you can use the "build" command that is included in your credentials, similar to what is shown below. Note that you should run this command from the directory holding your Dockerfile. You will need to replace the `< tag >` in your command with the image tag name you want to use in your ECR. (Read more about image tags in the previous section.) + + docker build -t : + +> If you use this `docker build` command from your credentials, you do not need to use the `docker tag` command (described in the previous section). + +## Push the Docker image to the ECR + +Push the tagged image to the ECR repository. The `docker push` command is also in the credentials - you just need to specify the image tag you selected when either tagging or building the image in the previous section. + + docker push : + +If the **push is successful**, you will see various "layer 1" "layer 2" etc outputs, and it will indicate progress in pushing. This can take minutes, depending on how large your container is. 
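
As a rough illustration of the tag and push steps above for an image that is already built, the sketch below assembles the full image reference from its parts and prints the two commands as a dry run. Every value in it is a hypothetical placeholder (the account number, region, repository name, tag, and local image name are not taken from any real credentials file), so treat it as an assumption-laden sketch and not as the exact commands you will receive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder values; substitute the repository URI from your
# credentials file and the names of your own local image and tag.
REPOSITORY_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-repo"
IMAGE_TAG="batch-poc-1.0"        # versioned tag, so earlier pushes are not overwritten
LOCAL_IMAGE="my_docker:latest"   # local image name and tag, as listed by `docker images`

# The full reference that both `docker tag` and `docker push` expect.
TARGET_IMAGE="${REPOSITORY_URI}:${IMAGE_TAG}"

# Dry run: print the commands so they can be checked before running them.
echo "docker tag ${LOCAL_IMAGE} ${TARGET_IMAGE}"
echo "docker push ${TARGET_IMAGE}"
```

Printing the commands first makes it easy to spot a typo in the repository URI or tag before anything is pushed; once they look right, run the two printed commands directly in the same terminal where the AWS environment variables are set.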
+ +If the **push fails**, you will get a persistent message about "Waiting for layer". This usually means it cannot find the repository, so double-check that there is no typo, and that you have set your [AWS environment variables](#set-aws-environment-variables) since you opened the terminal most recently. + +### Completion + +Once the push completes, your Docker image will be available in the ECR repository (although you will not be able to see it). It will be scanned, and if it passes the security scanning, CTDS will move it to the nextflow-approved repo. When it's available in nextflow-approved, User Services will share a docker URI that looks something like this: +`143731057154.dkr.ecr.us-east-1.amazonaws.com/nextflow-approved/< your username >:< image-tag >`. You can then use this new URI to run Nextflow workflows with your container in the BRH workspace. + +## Support + +If you encounter any issues or require assistance, please reach out to the User Services team that provided you with the temporary credentials, or brhsupport@datacommons.io, or reach out on Slack. (Slack is likely to result in the quickest reply.) + +[*Continue to Tutorial Workflows*](./nextflow-tutorial-workflows.md) diff --git a/mkdocs.yml b/mkdocs.yml index 9212d9d..ea8c81b 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -22,6 +22,14 @@ nav: - FAIR Data: 02-types_of_shared_data.md - Data Management and Repositories: 03-data_and_repos.md - Contact: 12-contact.md + - '': + - Nextflow - Getting Started: nextflow-getting-started.md + - Nextflow - General Workflows: nextflow-tutorial-1.md + - Nextflow - Containers in Gen3: nextflow-overview-containers.md + - Nextflow - Create Dockerfile: nextflow-create-docker.md + - Nextflow - Request Credentials: nextflow-request-creds.md + - Nextflow - Upload Docker Image: nextflow-upload-docker.md + - Nextflow - Tutorials with Containers: nextflow-tutorial-workflows.md theme: favicon: img/favicon.ico