A Quick Tutorial: Working with GA4GH TES API
The GA4GH Task Execution Service (TES) API is a standardized interface for executing and managing computational tasks on various computing environments. This tutorial provides a simple guide to help you get started with the TES API, including how to create, submit, and manage tasks.
Before you begin, ensure you have the following:
- A running TES server.
- Basic knowledge of RESTful APIs and JSON.
- A tool like curl or a REST client (Postman, Insomnia, etc.) to interact with the API.
The TES API allows you to submit, track, and manage computational tasks. Each task typically defines inputs, execution resources, commands, and outputs. The API endpoints you'll interact with include:
- POST /v1/tasks: Submit a new task.
- GET /v1/tasks/{id}: Retrieve information about a specific task.
- GET /v1/tasks: List tasks.
- POST /v1/tasks/{id}:cancel: Cancel a task.
The first step is submitting a task to the TES API. A task typically includes metadata, input files, the commands to execute, and where to store the outputs.
Here’s an example of a task submission using curl:
curl -X POST https://tes.example.com/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"name": "Example Task",
"inputs": [
{
"url": "/myContainer/input.txt",
"path": "/data/input.txt",
"type": "FILE"
}
],
"outputs": [
{
"path": "/data/output.txt",
"url": "/myContainer/output.txt",
"type": "FILE"
}
],
"resources": {
"cpu_cores": 4,
"ram_gb": 16,
"preemptible": true
},
"executors": [
{
"image": "ubuntu:latest",
"command": [
"bash", "-c", "cat input.txt > output.txt"
],
"workdir": "/data"
}
]
}'
This task runs a simple cat command inside an Ubuntu container to copy an input file to an output file.
- inputs: Specifies the files needed for the task.
- outputs: Defines where the task’s outputs should be stored.
- executors: Contains the details of the command(s) to be executed in a container, including the Docker image.
- resources: Specifies the CPU cores and RAM (in GB) required for the task, and whether it may run on preemptible instances.
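When submitting several similar tasks, it can be convenient to generate the request body instead of editing JSON by hand. The following is a minimal sketch, not part of the TES API itself: `make_task_json` is a hypothetical helper name, the resource values are arbitrary, and the arguments are assumed to contain no double quotes or backslashes (they are inserted into the JSON without escaping).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a single-executor TES task document.
# $1 = task name, $2 = container image, $3 = shell command line to run.
# Assumption: the arguments contain no double quotes or backslashes.
make_task_json() {
  cat <<EOF
{
  "name": "$1",
  "resources": { "cpu_cores": 1, "ram_gb": 1 },
  "executors": [
    {
      "image": "$2",
      "command": ["bash", "-c", "$3"],
      "workdir": "/data"
    }
  ]
}
EOF
}

# Print a task document to the terminal for inspection.
make_task_json "Example Task" "ubuntu:latest" "echo hello"

# To submit it, pipe the document into curl, which reads the body from
# standard input when given "-d @-":
#   make_task_json "Example Task" "ubuntu:latest" "echo hello" |
#     curl -X POST https://tes.example.com/v1/tasks \
#       -H "Content-Type: application/json" -d @-
```

Generating the body this way also avoids small hand-editing mistakes, such as a trailing comma, that make the JSON invalid.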
A successful task submission will return a task ID:
{
"id": "12345"
}
You can now use this ID to track the task’s progress.
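In a script, you will usually want to capture the returned ID in a shell variable. The sketch below shows one way to do this with standard tools only. `extract_task_id` is a hypothetical helper name, and the `sed` pattern assumes the `id` value contains no escaped quotes; a JSON-aware tool such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a TES "create task" response on stdin and
# print the value of its "id" field.
extract_task_id() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example using the response shown above. In practice, pipe the output of
# the curl -X POST command into extract_task_id instead.
response='{ "id": "12345" }'
task_id=$(printf '%s\n' "$response" | extract_task_id)
echo "Submitted task: $task_id"
```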
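Once the task is submitted, you can retrieve its status using the task ID. By default the server returns a minimal view containing only the ID and state; adding view=FULL requests the full task record, including logs:
curl -X GET "https://tes.example.com/v1/tasks/12345?view=FULL"
Example response (abbreviated):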
{
"id": "12345",
"state": "RUNNING",
"logs": [
{
"start_time": "2024-09-17T10:00:00Z",
"end_time": "",
"system_logs": [],
"outputs": []
}
]
}
- state: Indicates the current status of the task (e.g., QUEUED, INITIALIZING, RUNNING, COMPLETE, EXECUTOR_ERROR, SYSTEM_ERROR, CANCELED).
- logs: Includes information about when the task started and finished, along with any system logs.
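Checking the status by hand works for a single task, but scripts typically poll until the task finishes. The sketch below is one possible approach rather than a definitive implementation. The function names and the `TES_URL` variable are assumptions made for illustration. The terminal states listed (`COMPLETE`, `EXECUTOR_ERROR`, `SYSTEM_ERROR`, `CANCELED`) are those named in the TES specification; adjust the list if your server version defines others.

```shell
#!/usr/bin/env bash
# Assumed base URL; replace with your own TES server.
TES_URL="${TES_URL:-https://tes.example.com}"

# Succeeds (exit status 0) when the given state is a terminal one.
is_terminal_state() {
  case "$1" in
    COMPLETE|EXECUTOR_ERROR|SYSTEM_ERROR|CANCELED) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the "state" field of task $1 (simple sed-based extraction).
get_task_state() {
  curl -s "$TES_URL/v1/tasks/$1" |
    sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll task $1 every $2 seconds (default 10) until it reaches a terminal
# state, then print that state. Progress messages go to stderr.
wait_for_task() {
  local id="$1" interval="${2:-10}" state
  while true; do
    state=$(get_task_state "$id")
    echo "Task $id: ${state:-unknown}" >&2
    if is_terminal_state "$state"; then
      echo "$state"
      return 0
    fi
    sleep "$interval"
  done
}

# Usage:
#   final_state=$(wait_for_task 12345 30)
```

Note that this loop never gives up if the server is unreachable; a production script would likely add a timeout or a retry limit.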
You can list all the tasks submitted to the TES server using the following endpoint:
curl -X GET https://tes.example.com/v1/tasks
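This returns the tasks with their current states. Large result sets are paginated: if the response contains a next_page_token, pass it back as the page_token query parameter to fetch the next page. Example response: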
{
"tasks": [
{
"id": "12345",
"state": "RUNNING"
},
{
"id": "12346",
"state": "COMPLETE"
}
]
}
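On a server with many tasks, the list is returned one page at a time. The following sketch walks through all pages using the `page_size`, `page_token`, and `view` query parameters and the `next_page_token` response field from the TES specification. The function names and `TES_URL` are assumptions for illustration, and the page token is assumed to be URL-safe (it is appended without encoding).

```shell
#!/usr/bin/env bash
# Assumed base URL; replace with your own TES server.
TES_URL="${TES_URL:-https://tes.example.com}"

# Build the list URL for one page.
# $1 = page size, $2 = page token (optional, empty for the first page).
list_tasks_url() {
  local url="$TES_URL/v1/tasks?view=MINIMAL&page_size=$1"
  if [ -n "$2" ]; then
    url="$url&page_token=$2"
  fi
  echo "$url"
}

# Fetch every page and print each response body on its own.
# $1 = page size (default 100).
list_all_tasks() {
  local token="" page
  while true; do
    page=$(curl -s "$(list_tasks_url "${1:-100}" "$token")")
    printf '%s\n' "$page"
    # An absent or empty next_page_token means this was the last page.
    token=$(printf '%s\n' "$page" |
      sed -n 's/.*"next_page_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
      head -n 1)
    [ -n "$token" ] || break
  done
}

# Usage:
#   list_all_tasks 50
```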
To cancel a task that is running or queued, send a POST request to the task's cancel endpoint:
curl -X POST https://tes.example.com/v1/tasks/12345:cancel
If the task was successfully canceled, the response is an empty JSON object, and retrieving the task afterwards shows the state CANCELED:
{}
In a bioinformatics context, the TES API can be used to run tasks such as:
- Aligning sequencing data
- Variant calling
- Data preprocessing for machine learning
You could configure the inputs as large sequencing files (BAM/FASTQ), define the appropriate Docker container with the relevant bioinformatics tool (e.g., BWA, GATK), and submit the task to the TES server. Outputs could be the processed results, which are saved back to a cloud bucket.
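For example, the following request aligns reads with BWA. It assumes that a pre-built BWA index for a reference named genome.fa (a placeholder) has been uploaded under /myContainer/inputs/reference/. Because bwa mem writes SAM to standard output, the command redirects it to a file:
curl -X POST https://tes.example.com/v1/tasks \
-H "Content-Type: application/json" \
-d '{
"name": "BWA Alignment",
"inputs": [
{
"url": "/myContainer/inputs/reference/",
"path": "/data/reference/",
"type": "DIRECTORY"
},
{
"url": "/myContainer/inputs/input.fastq",
"path": "/data/input.fastq",
"type": "FILE"
}
],
"outputs": [
{
"path": "/data/aligned.sam",
"url": "/myContainer/outputs/aligned.sam",
"type": "FILE"
}
],
"executors": [
{
"image": "biocontainers/bwa:v0.7.17_cv1",
"command": [
"bash", "-c", "bwa mem reference/genome.fa input.fastq > aligned.sam"
],
"workdir": "/data"
}
]
}'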
The GA4GH TES API provides a standardized and flexible way to submit, manage, and track computational tasks across various environments. By using the endpoints described above, you can automate task submissions, monitor their progress, and manage large-scale computational pipelines with ease.
For more details, check out the GA4GH TES API specification.