Introduce a status API that reports on run status, errors and logs.
Job Stories
When I run a DAG in AirCan I want to
know its status (e.g. running, success, failed)
(?) know other info (e.g. how long it's been running)
get detailed errors on failure, e.g. if it failed ...
get results (or a pointer to results) returned on success
so that I can report on this to users and empower them to resolve errors
(?) get real-time output (cf. GitLab Runner)
get notified rather than poll for updates (push notifications rather than pull)
Acceptance
An API exists like /api/3/action/aircan_submit?dag_id=... that runs a DAG and returns the run ID
An API exists like /api/3/action/aircan_status?run_id=... which reports on status of a run e.g. PENDING | RUNNING | PAUSED | FAILED | SUCCESS and provides error information
When a DAG fails, error information is available, including access to the full logs (either via the previous API or a new one)
Logging: logs are enabled on Composer and can be consumed via an API. Note: there is no standard logging format yet
Failed end-to-end run test: on a CKAN instance with ckanext-aircan-connector, upload a CSV file and have a DAG triggered on GCP; the CKAN instance must know that something went wrong.
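A minimal sketch of how the proposed `aircan_status` action might translate an Airflow DAG run state into the status values listed above. It assumes the Airflow run state arrives as a lowercase string such as `queued`, `running`, `success` or `failed`, and that PAUSED is derived from the DAG's paused flag rather than from the run itself; the response keys `status`, `error` and `logs_url` are hypothetical.

```python
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    """Status values AirCan reports to CKAN (from the acceptance criteria)."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    PAUSED = "PAUSED"
    FAILED = "FAILED"
    SUCCESS = "SUCCESS"


# Hypothetical mapping from Airflow run states to AirCan statuses;
# unknown or missing states are reported as PENDING.
_AIRFLOW_TO_AIRCAN = {
    "queued": RunStatus.PENDING,
    "scheduled": RunStatus.PENDING,
    "running": RunStatus.RUNNING,
    "success": RunStatus.SUCCESS,
    "failed": RunStatus.FAILED,
}


def to_aircan_status(airflow_state: Optional[str], dag_is_paused: bool = False) -> RunStatus:
    """Translate an Airflow run state into an AirCan status."""
    state = (airflow_state or "").lower()
    # An unfinished run on a paused DAG will not progress, so report PAUSED.
    if dag_is_paused and state not in ("success", "failed"):
        return RunStatus.PAUSED
    return _AIRFLOW_TO_AIRCAN.get(state, RunStatus.PENDING)


def build_status_response(run_id: str, airflow_state: Optional[str],
                          dag_is_paused: bool = False,
                          error: Optional[str] = None,
                          logs_url: Optional[str] = None) -> dict:
    """Assemble a hypothetical aircan_status payload for CKAN."""
    status = to_aircan_status(airflow_state, dag_is_paused)
    response = {"run_id": run_id, "status": status.value}
    if status is RunStatus.FAILED:
        # Error details and a pointer to the full logs are only added on failure.
        response["error"] = error or "unknown error"
        response["logs_url"] = logs_url
    return response
```

Keeping the mapping in one table means a new Airflow state only needs one extra entry, and anything unrecognised degrades to PENDING instead of raising.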
FUTURE
Callbacks from AirCan to CKAN, so that rather than polling we have live status (this would be part of having "Run/Job" objects in CKAN; this is a future item)
Tasks
[ ]
Analysis
Client flow
Consider a user working with a CKAN instance. A run of a DAG is triggered by the CKAN instance.
The user knows the name of the DAG they will trigger (at the moment it is specified in an environment variable in the .env file; this can change later)
They would then call the following endpoints to get the status of the DAG run:
GET http://ckan:5000/api/3/action/dag_run/<dag_id> # returns all recent runs of that DAG
GET http://ckan:5000/api/3/action/dag_run/<dag_id>/<run_id> # returns a single run of that DAG
They would see a page with the execution dates for the dag_id.
The response to this request must include the run_id.
What do we do with this run ID? [For now we can assume the client keeps the run ID and it's up to them. Longer term we will have "Run/Job" objects in CKAN.] We would need to persist it in a database; otherwise it will be lost.
Our customized response should include access to the GCP logs.
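The client side of this flow (keep the run ID, then poll until the run finishes) can be sketched with the standard library only. `fetch_status` stands in for an HTTP call to the proposed `aircan_status` action, and the `aircan_runs` table and its columns are hypothetical; SQLite is used purely to show that the run ID and its latest status survive between polls.

```python
import sqlite3
import time
from typing import Callable, Optional

TERMINAL = {"SUCCESS", "FAILED"}


class RunStore:
    """Persists AirCan run IDs so they are not lost (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS aircan_runs ("
            " run_id TEXT PRIMARY KEY, dag_id TEXT NOT NULL, status TEXT NOT NULL)"
        )

    def save(self, run_id: str, dag_id: str, status: str = "PENDING") -> None:
        # INSERT OR REPLACE keeps one row per run holding its latest status;
        # the connection context manager commits the transaction.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO aircan_runs (run_id, dag_id, status) VALUES (?, ?, ?)",
                (run_id, dag_id, status),
            )

    def status(self, run_id: str) -> Optional[str]:
        row = self.conn.execute(
            "SELECT status FROM aircan_runs WHERE run_id = ?", (run_id,)
        ).fetchone()
        return row[0] if row else None


def poll_until_done(fetch_status: Callable[[str], str], store: RunStore,
                    run_id: str, dag_id: str, interval: float = 5.0,
                    max_polls: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll the status endpoint until the run reaches a terminal state."""
    status = "PENDING"
    for _ in range(max_polls):
        status = fetch_status(run_id)
        store.save(run_id, dag_id, status)  # persist every observed status
        if status in TERMINAL:
            break
        sleep(interval)
    return status
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a running CKAN or Airflow; the push-based callbacks discussed under FUTURE would replace this loop entirely.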
Questions to discuss
Error handling
Specify in the DAG where it fails. Returning "success: False" works for the logs, but we need to trigger the Fail action on the task (this is not being done right now)
Handle all corner cases of failing tasks
Shall we implement retries?
Create a default error set to be used both in the connector and in the AirCan DAGs
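The error-handling items above can be sketched together in plain Python, so the example runs without Airflow installed. In Airflow, a task is marked as failed when its callable raises an exception, whereas returning a value such as `{"success": False}` leaves the task successful; `AirCanTaskError` below stands in for whatever exception the DAGs would raise (in a real DAG, a subclass of `airflow.exceptions.AirflowException` would be a natural choice). The error codes in `AirCanError` and the `load_csv_task` helper are hypothetical; `retries` and `retry_delay` are standard Airflow operator arguments that can be supplied through a DAG's `default_args`.

```python
from datetime import timedelta
from enum import Enum
from typing import Callable


class AirCanError(str, Enum):
    """Hypothetical default error set shared by the connector and the DAGs."""
    FILE_NOT_FOUND = "FILE_NOT_FOUND"
    INVALID_SCHEMA = "INVALID_SCHEMA"
    DATASTORE_UNAVAILABLE = "DATASTORE_UNAVAILABLE"
    UNKNOWN = "UNKNOWN"


class AirCanTaskError(Exception):
    """Raised by a task so that the task itself is marked as failed."""

    def __init__(self, code: AirCanError, detail: str) -> None:
        super().__init__(f"{code.value}: {detail}")
        self.code = code
        self.detail = detail


def load_csv_task(path: str, exists: Callable[[str], bool]) -> dict:
    """Task body that raises on error instead of returning {"success": False}."""
    if not exists(path):
        raise AirCanTaskError(AirCanError.FILE_NOT_FOUND, f"no file at {path}")
    return {"success": True, "path": path}


# Retry settings passed to every operator of a DAG via default_args;
# the values are illustrative only.
default_args = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}
```

Because the connector can import the same `AirCanError` values, CKAN could show a stable error code alongside the free-text detail.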
Logs
We are planning to create the job_status page. Is that correct? What should this page show besides the task_id info and the logs info?
When displaying task status, obtain combined info from the Airflow status API and the GCloud logs. Sounds good?
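Combining the two sources for a job_status page amounts to a join on `task_id`. A minimal sketch, assuming task states arrive as a `task_id -> state` mapping (as could be built from the Airflow status API) and that each GCloud log entry can be reduced to a dict with `task_id` and `message` keys; how the log entries are actually labelled is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskView:
    """One row of the hypothetical job_status page."""
    task_id: str
    state: str
    log_lines: List[str] = field(default_factory=list)


def combine_status_and_logs(task_states: Dict[str, str],
                            log_entries: List[dict]) -> List[TaskView]:
    """Attach log messages to their task, keeping both orders stable."""
    views = {tid: TaskView(tid, state) for tid, state in task_states.items()}
    for entry in log_entries:
        view = views.get(entry.get("task_id"))
        if view is not None:  # ignore logs for tasks outside this run
            view.log_lines.append(entry.get("message", ""))
    # Tasks keep the order reported by the status API.
    return list(views.values())
```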
Other questions
What are the endpoints (on CKAN) that will trigger the DAG? Right now we have datastore_create and aircan_submit. Are there any other triggering endpoints?
What is the best way to organize the docs? I find the README on AirCan extensive, potentially with lots of non-useful information, e.g. are people going to use AirCan standalone?
Response from Airflow:
The flow we'd need:
On CKAN you hit:
Response:
They'd get the result of the Airflow API for DAG status
https://airflow.apache.org/docs/stable/rest-api-ref.html
Ideally combined with GCP logs
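The Airflow REST API linked above is the experimental API of Airflow 1.10, which exposes, among others, `GET /api/experimental/dags/<DAG_ID>/dag_runs` (list the runs of a DAG) and `GET /api/experimental/dags/<DAG_ID>/dag_runs/<execution_date>` (state of one run). A minimal sketch of building those URLs and picking the latest run from the list response, assuming each run object carries `execution_date` and `run_id` keys; authentication (e.g. in front of Composer) is left out, and the paths should be checked against the Airflow version in use.

```python
from typing import List, Optional
from urllib.parse import quote


def dag_runs_url(base_url: str, dag_id: str) -> str:
    """URL listing the runs of one DAG (Airflow 1.10 experimental API)."""
    return f"{base_url.rstrip('/')}/api/experimental/dags/{quote(dag_id, safe='')}/dag_runs"


def dag_run_state_url(base_url: str, dag_id: str, execution_date: str) -> str:
    """URL returning the state of the run with the given execution date."""
    # Keep ':' readable but percent-encode '+' in timezone offsets.
    return f"{dag_runs_url(base_url, dag_id)}/{quote(execution_date, safe=':')}"


def latest_run(runs: List[dict]) -> Optional[dict]:
    """Pick the most recent run, assuming ISO 8601 dates in a single format."""
    if not runs:
        return None
    # ISO 8601 strings with the same layout sort chronologically as text.
    return max(runs, key=lambda r: r["execution_date"])
```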
FAQs
Callbacks [Rufus: this should be later]
Another path to consider (or support both): having an endpoint set up on CKAN ready to receive a POST from AirCan.
E.g. a task fails while a DAG is running; AirCan sends a notification by hitting an endpoint on CKAN.
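The receiving side of such a callback can be sketched independently of any web framework. This is a minimal sketch under stated assumptions: the payload keys (`run_id`, `dag_id`, `status`, `task_id`, `error`) are hypothetical, `runs` is a plain dict standing in for CKAN's future "Run/Job" objects, and the returned dict mirrors the `success` flag that CKAN action responses carry.

```python
import json
from typing import Dict

REQUIRED_KEYS = ("run_id", "dag_id", "status")
VALID_STATUSES = {"PENDING", "RUNNING", "PAUSED", "FAILED", "SUCCESS"}


def handle_aircan_notification(body: bytes, runs: Dict[str, dict]) -> dict:
    """Validate a POST body from AirCan and record the run's new status."""
    try:
        payload = json.loads(body)
    except ValueError:  # includes json.JSONDecodeError
        return {"success": False, "error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return {"success": False, "error": "expected a JSON object"}
    missing = [k for k in REQUIRED_KEYS if k not in payload]
    if missing:
        return {"success": False, "error": f"missing keys: {', '.join(missing)}"}
    if payload["status"] not in VALID_STATUSES:
        return {"success": False, "error": f"unknown status {payload['status']!r}"}
    # Store the latest status plus failure details, if AirCan sent any.
    runs[payload["run_id"]] = {
        "dag_id": payload["dag_id"],
        "status": payload["status"],
        "failed_task": payload.get("task_id"),
        "error": payload.get("error"),
    }
    return {"success": True}
```

With this in place CKAN learns about a failed task as soon as AirCan reports it, instead of on the next poll.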