Towards Tracking Server API for auto-resume of previously failed app executions #441

Open
rgareev opened this issue Nov 28, 2024 · 0 comments


rgareev commented Nov 28, 2024

Is your feature request related to a problem? Please describe.

I'd like to have a helper script for quick prototyping (faster iterations / ad-hoc hypothesis testing) of multi-step Burr FSM-based workflows that contain a few heavy (time- and compute-consuming) steps/actions.
It should accelerate the following use case:

  • A developer works on a Burr workflow A->B->C, where B takes, say, 20 minutes.
  • The developer runs a 'main' script with the Burr app inside, waits 20 minutes, and sees a failure at step C.
  • The developer fixes the bug in C and re-runs the script on the same inputs.
  • The developer waits 20 minutes again for B to be re-computed... and then it finally gets to C, and then... it depends 🤷
  • Instead, the developer wants to resume execution from C, since A and B did not change and the inputs did not change.

It is very similar to what is shown in this notebook: https://github.com/DAGWorks-Inc/burr/blob/main/examples/multi-modal-chatbot/burr_demo.ipynb (see the usage of initialize_from and with_identifiers), but without the need to manually deal with application_id and sequence_ids.

From one point of view it is kind of a caching problem (one of the 2 oldest, right?), but we have a Burr tracking server that solves this problem 😎
From another point of view, I am not sure that it should be part of the Burr "core", since it is more about serving, or something like a "high-level" Burr application script dealing with a bunch of Burr-interfaced services such as the tracking server.

So that's why I am looking for the missing tracking server API operations that would make the following possible:
a script (a kind of Burr app/graph runner) that

  1. Imports a Burr graph definition from some project module.

  2. Checks for the script flag --no-resume:
    a. If it is present, the script just runs the Burr app for the given inputs, entrypoint and halt config, passing them through from the script arguments to the Burr app builder.
    b. If it is absent (the default), the script connects to the Burr tracking server instance whose URL is given in the BURR_TRACKING_SERVER_URL env variable.

  3. Takes the Burr project name from the BURR_PROJECT_NAME env variable.

  4. Uses the tracking server API to fetch the latest trace for the configured project name and the same inputs.

  5. If such a trace is found and it is in a failed state, tries to resume execution, initializing state from this trace using the state right before the failure.
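
To make the request more concrete, the decision logic of such a runner could be sketched roughly as below. This is only an illustration under stated assumptions: TrackingClient, latest_trace, TraceSummary, RunPlan, inputs_key and the "failed" status value are all hypothetical names invented here, not existing Burr or tracking server API. They stand in for the operations this issue is asking for.

```python
import argparse
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Protocol


@dataclass(frozen=True)
class TraceSummary:
    """Hypothetical summary of one tracked run (not an existing Burr type)."""
    app_id: str
    status: str                 # assumed values: "completed" or "failed"
    last_good_sequence_id: int  # last step that finished before the failure


class TrackingClient(Protocol):
    """Hypothetical client: the one operation the runner would need."""

    def latest_trace(self, project: str, inputs_key: str) -> Optional[TraceSummary]:
        ...


@dataclass(frozen=True)
class RunPlan:
    """What the runner decided: start fresh, or resume from a stored step."""
    resume: bool
    app_id: Optional[str] = None
    sequence_id: Optional[int] = None


def inputs_key(inputs: Mapping[str, Any]) -> str:
    """Stable fingerprint used to match 'the same inputs' across runs."""
    canonical = json.dumps(inputs, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Burr graph runner (sketch)")
    parser.add_argument("--no-resume", action="store_true",
                        help="always start a fresh run")
    return parser


def plan_run(
    no_resume: bool,
    env: Mapping[str, str],
    inputs: Mapping[str, Any],
    client_factory: Callable[[str], TrackingClient],
) -> RunPlan:
    """Decide whether to resume, following the steps listed above."""
    if no_resume:
        # The tracking server is not contacted at all in this case.
        return RunPlan(resume=False)
    client = client_factory(env["BURR_TRACKING_SERVER_URL"])
    trace = client.latest_trace(env["BURR_PROJECT_NAME"], inputs_key(inputs))
    if trace is None or trace.status != "failed":
        # No previous run for these inputs, or it did not fail: start fresh.
        return RunPlan(resume=False)
    return RunPlan(resume=True, app_id=trace.app_id,
                   sequence_id=trace.last_good_sequence_id)
```

With a plan in hand, the runner would either build the app as usual or pass app_id and sequence_id on to the builder, which is the part that initialize_from and with_identifiers already cover in the notebook. The piece that is missing today is a documented equivalent of latest_trace.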

Describe the solution you'd like
A documented API for the Burr tracking server(s), with the minimal set of operations required to make the aforementioned script possible.

Describe alternatives you've considered
TODO

Additional context
None
