A Quick Tutorial: Working with GA4GH TES API

Venkat Malladi edited this page Sep 18, 2024 · 1 revision

The GA4GH Task Execution Service (TES) API is a standardized interface for executing and managing computational tasks on various computing environments. This tutorial provides a simple guide to help you get started with the TES API, including how to create, submit, and manage tasks.

Prerequisites

Before you begin, ensure you have the following:

  • A running TES server.
  • Basic knowledge of RESTful APIs and JSON.
  • A tool like curl or a REST client (Postman, Insomnia, etc.) to interact with the API.

Overview of TES API

The TES API allows you to submit, track, and manage computational tasks. Each task typically defines inputs, execution resources, commands, and outputs. The API endpoints you'll interact with include:

  • POST /v1/tasks: Submit a new task.
  • GET /v1/tasks/{id}: Retrieve information about a specific task.
  • GET /v1/tasks: List tasks.
  • POST /v1/tasks/{id}:cancel: Cancel a task.

1. Submitting a Task

The first step is submitting a task to the TES API. A task typically includes metadata, input files, the commands to execute, and where to store the outputs.

Here’s an example of a task submission using curl:

curl -X POST https://tes.example.com/v1/tasks \
-H "Content-Type: application/json" \
-d '{
  "name": "Example Task",
  "inputs": [
    {
      "url": "/myContainer/input.txt",
      "path": "/data/input.txt",
      "type": "FILE"
    }
  ],
  "outputs": [
    {
      "path": "/data/output.txt",
      "url": "/myContainer/output.txt",
      "type": "FILE"
    }
  ],
  "resources": {
    "cpu_cores": 4,
    "ram_gb": 16,
    "preemptible": true
  },
  "executors": [
    {
      "image": "ubuntu:latest",
      "command": [
        "bash", "-c", "cat input.txt > output.txt"
      ],
      "workdir": "/data"
    }
  ]
}'

This task runs a simple cat command inside an Ubuntu container to copy an input file to an output file.

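  • inputs: Specifies the files needed for the task. The url is where the file is fetched from in storage, and the path is where it appears inside the container.
  • outputs: Defines where the task’s outputs should be stored. The path is the file produced inside the container, and the url is the storage location it is copied to.
  • executors: Contains the details of the command(s) to be executed in a container, including the Docker image and working directory.
  • resources: Specifies the CPU cores and RAM (in GB) required for the task, and whether it may run on preemptible compute.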
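
Hand-written JSON is easy to get subtly wrong (a stray trailing comma is enough for the server to reject the request). As a minimal sketch, the same task can be assembled in Python and serialized with the standard json module. The build_task helper and its parameter names are illustrative assumptions, not part of the TES API; only the field names come from the example above.

```python
import json


def build_task(name, input_url, input_path, output_url, output_path,
               image, command, workdir="/data", cpu_cores=1, ram_gb=1.0,
               preemptible=False):
    """Assemble a TES task document as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        # url: where the file lives in storage; path: where it is mounted
        # inside the container.
        "inputs": [{"url": input_url, "path": input_path, "type": "FILE"}],
        "outputs": [{"url": output_url, "path": output_path, "type": "FILE"}],
        "resources": {
            "cpu_cores": cpu_cores,
            "ram_gb": ram_gb,
            "preemptible": preemptible,
        },
        "executors": [
            {"image": image, "command": list(command), "workdir": workdir}
        ],
    }


task = build_task(
    name="Example Task",
    input_url="/myContainer/input.txt",
    input_path="/data/input.txt",
    output_url="/myContainer/output.txt",
    output_path="/data/output.txt",
    image="ubuntu:latest",
    command=["bash", "-c", "cat input.txt > output.txt"],
    cpu_cores=4,
    ram_gb=16,
    preemptible=True,
)

# json.dumps always emits syntactically valid JSON, so formatting slips
# cannot reach the request body.
body = json.dumps(task, indent=2)
print(body)
```

The printed document can be saved to a file and passed to curl with -d @task.json instead of an inline string.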

Example Response

A successful task submission will return a task ID:

{
  "id": "12345"
}

You can now use this ID to track the task’s progress.
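
For scripted submissions, the curl call can be mirrored with Python's standard library. The sketch below is one possible shape, assuming an unauthenticated server at a base URL you supply; the function names are hypothetical, and many deployments will also need an Authorization header that is omitted here.

```python
import json
import urllib.request


def make_submit_request(base_url, task):
    """Build (but do not send) the POST /v1/tasks request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit_task(base_url, task, timeout=30):
    """Send the task and return the ID reported by the server."""
    request = make_submit_request(base_url, task)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["id"]
```

Splitting request construction from sending keeps the network call in one place, which makes the payload easy to inspect or log before anything is submitted.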

2. Checking Task Status

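Once the task is submitted, you can retrieve its status using the task ID. By default the server returns a minimal view containing only the ID and state; add view=FULL to include the logs:

curl -X GET "https://tes.example.com/v1/tasks/12345?view=FULL"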

Example response:

{
  "id": "12345",
  "state": "RUNNING",
  "logs": [
    {
      "start_time": "2024-09-17T10:00:00Z",
      "end_time": "",
      "system_logs": [],
      "outputs": []
    }
  ]
}

  • state: Indicates the current status of the task (e.g., QUEUED, INITIALIZING, RUNNING, COMPLETE, EXECUTOR_ERROR, SYSTEM_ERROR, CANCELED).
  • logs: Includes information about when the task started and finished, along with any system logs and output details.
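
In practice a client usually polls the status endpoint until the task stops changing. The following is a minimal sketch of such a loop. The set of terminal states is an assumption based on the TES state enum (newer specification versions add states such as PREEMPTED, so adjust it for your server), and fetch_task is a hypothetical callable you provide that performs the GET request and returns the parsed JSON.

```python
import time

# Assumed terminal states: once a task reaches one of these it no longer
# changes. Extend this set if your server reports additional states.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_task(fetch_task, task_id, interval=5.0, max_polls=120,
                  sleep=time.sleep):
    """Poll until the task reaches a terminal state, then return it.

    fetch_task is any callable that takes a task ID and returns the parsed
    JSON body of GET /v1/tasks/{id} as a dict.
    """
    for _ in range(max_polls):
        task = fetch_task(task_id)
        if task.get("state") in TERMINAL_STATES:
            return task
        sleep(interval)
    raise TimeoutError(f"task {task_id} not finished after {max_polls} polls")
```

Passing the fetch function in keeps the loop independent of any HTTP library, and the max_polls bound prevents a stuck task from blocking the caller forever.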

3. Listing All Tasks

You can list all the tasks submitted to the TES server using the following endpoint:

curl -X GET https://tes.example.com/v1/tasks

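This returns the tasks with their current states. Results are paginated: when more tasks remain, the response includes a next_page_token that you pass back as the page_token query parameter (optionally with page_size) to fetch the next page: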

{
  "tasks": [
    {
      "id": "12345",
      "state": "RUNNING"
    },
    {
      "id": "12346",
      "state": "COMPLETE"
    }
  ]
}
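
Because the list endpoint is paginated in the TES specification, a client that wants every task has to follow next_page_token across responses. A minimal sketch of that loop is shown here; fetch_page is a hypothetical callable you provide that issues GET /v1/tasks with the given page_token (or none for the first page) and returns the parsed JSON.

```python
def iter_tasks(fetch_page):
    """Yield every task, following next_page_token until it is absent.

    fetch_page is any callable that takes a page token (None for the first
    page) and returns the parsed JSON body of GET /v1/tasks as a dict.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("tasks", [])
        token = page.get("next_page_token")
        if not token:
            break
```

Writing this as a generator lets the caller stop early, for example after finding the first task in a given state, without fetching the remaining pages.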

4. Canceling a Task

To cancel a task that is queued or running, send a POST request to the task's cancel endpoint:

curl -X POST https://tes.example.com/v1/tasks/12345:cancel

If the request is accepted, the server responds with an empty JSON object:

{}

The task's state then changes to CANCELED. The TES API does not define an endpoint for deleting a task record.
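
The TES specification defines cancellation as a POST to the task's :cancel path. As a hedged sketch, the request could be built in Python as follows; the function name is hypothetical, the base URL is whatever your server uses, and authentication is again omitted.

```python
import urllib.request
from urllib.parse import quote


def make_cancel_request(base_url, task_id):
    """Build (but do not send) the POST /v1/tasks/{id}:cancel request."""
    # Percent-encode the ID so unusual characters cannot alter the path.
    url = f"{base_url.rstrip('/')}/v1/tasks/{quote(task_id, safe='')}:cancel"
    return urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen should yield an empty JSON object on success; checking the task's state afterwards confirms that the cancellation took effect.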

Example Use Case: Running a Bioinformatics Workflow

In a bioinformatics context, the TES API can be used to run tasks such as:

  • Aligning sequencing data
  • Variant calling
  • Data preprocessing for machine learning

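You could configure the inputs as large sequencing files (FASTQ/BAM), choose a Docker container with the relevant bioinformatics tool (e.g., BWA, GATK), and submit the task to the TES server. The outputs are the processed results, which are saved back to cloud storage.

The example below aligns reads with bwa mem. Because bwa mem needs an indexed reference and writes SAM records to standard output, this sketch assumes a pre-built BWA index staged as a DIRECTORY input (the /myContainer/inputs/reference location and the genome.fa prefix are placeholders) and uses the executor's stdout field to capture the alignment in a file:

curl -X POST https://tes.example.com/v1/tasks \
-H "Content-Type: application/json" \
-d '{
  "name": "BWA Alignment",
  "inputs": [
    {
      "url": "/myContainer/inputs/input.fastq",
      "path": "/data/input.fastq",
      "type": "FILE"
    },
    {
      "url": "/myContainer/inputs/reference",
      "path": "/data/reference",
      "type": "DIRECTORY"
    }
  ],
  "outputs": [
    {
      "path": "/data/aligned.sam",
      "url": "/myContainer/outputs/aligned.sam",
      "type": "FILE"
    }
  ],
  "executors": [
    {
      "image": "biocontainers/bwa:v0.7.17_cv1",
      "command": [
        "bwa", "mem", "reference/genome.fa", "input.fastq"
      ],
      "workdir": "/data",
      "stdout": "/data/aligned.sam"
    }
  ]
}'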
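
Long-running tasks like this one are expensive to resubmit, so it can help to catch obvious mistakes before sending the request. The sketch below is a hypothetical client-side check, not something the TES API provides. It assumes the task is held as a Python dict and tests only a few rules: at least one executor with an image and a command, absolute container paths, and a storage URL (or inline content, for inputs) on every file entry.

```python
def find_task_problems(task):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    executors = task.get("executors") or []
    if not executors:
        problems.append("task has no executors")
    for i, executor in enumerate(executors):
        if not executor.get("image"):
            problems.append(f"executors[{i}] has no image")
        if not executor.get("command"):
            problems.append(f"executors[{i}] has no command")
    for kind in ("inputs", "outputs"):
        for i, item in enumerate(task.get(kind) or []):
            path = item.get("path", "")
            # Container paths are expected to be absolute.
            if not path.startswith("/"):
                problems.append(f"{kind}[{i}].path is not absolute: {path!r}")
            # Inputs may carry inline content instead of a storage URL.
            has_source = item.get("url") or (
                kind == "inputs" and item.get("content")
            )
            if not has_source:
                problems.append(f"{kind}[{i}] has no url")
    return problems
```

A caller would typically refuse to submit when the returned list is non-empty and print the messages instead. The server remains the authority on what is valid, so this check complements its validation and does not replace it.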

Conclusion

The GA4GH TES API provides a standardized and flexible way to submit, manage, and track computational tasks across various environments. By using the endpoints described above, you can automate task submissions, monitor their progress, and manage large-scale computational pipelines with ease.

For more details, check out the GA4GH TES API specification.
