Skip to content

[Feature request] Add graceful shutdown hook for functions #983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jlizen opened this issue May 6, 2025 · 0 comments · May be fixed by #982
Open

[Feature request] Add graceful shutdown hook for functions #983

jlizen opened this issue May 6, 2025 · 0 comments · May be fixed by #982

Comments

@jlizen
Copy link

jlizen commented May 6, 2025

Context:

It would be nice to have a simple turnkey API to enable users to handle graceful shutdowns properly in their functions.

Essentially we want to make it really easy to do something like the below, without reasoning about extensions, selecting across signals, etc. Just move into a closure whatever resources you need to clean up and invoke whatever logic.

    let log_guard = tracing_appender::NonBlocking::new(std::io::stdout);
    tokio::spawn(async move {
        let mut sigint = signal(SignalKind::interrupt()).unwrap();
        let mut sigterm = signal(SignalKind::terminate()).unwrap();
        tokio::select! {
            _sigint = sigint.recv() => {
                tracing::info!("[runtime] SIGINT received");
                tracing::info!("[runtime] Graceful shutdown in progress ...");
                std::mem::drop(log_guard);

                eprintln!("[runtime] Graceful shutdown completed");
                std::process::exit(0);
            },
            _sigterm = sigterm.recv()=> {
                tracing::info!("[runtime] SIGTERM received");
                tracing::info!("[runtime] Graceful shutdown in progress ...");
                std::mem::drop(log_guard);

                eprintln!("[runtime] Graceful shutdown completed");
                std::process::exit(0);
            },
        }
    });

This requires two parts:

  • An API that accepts an async closure that can be invoked in case a SIGTERM or SIGINT is caught
  • Under the hood handling to ensure that graceful shutdown signals are sent (meaning, an extension is registered)

A few questions to answer with some initial thoughts:

Should the user-specified closure be able to differentiate between SIGTERM or SIGINT / have some sort of input parameter?
I think probably no, we should keep it on rails and avoid surfacing types, maybe throw in an implicit tracing::debug log if tracing feature is enabled. Anyway the caller can always specify their signal handling directly if they need more customizability.

Should we just always specify a no-op internal extension, or do we attempt to avoid it based on detection of other registered extensions, caller specifying, etc
I think probably we just always implicitly add a no-op extension and keep the API simple. We can get away with this because of two things:

  1. we can simply not subscribe to any events in the no-op extension
  2. the next event extension API takes the form of a long poll that just hangs indefinitely if no subscribed events are enabled (ie, the extension client connects to lambda orchestrator, and then just waits for data that never comes)

This means that the no-op extension is very cheap - we can just spawn a task that drives its future, but is never woken after initial connect (or very rarely woken, if for some reason the long-running connection is interrupted).

As long as we document some special cased extension name that the caller shouldn't collide with, I don't see much downside to just always initializing it. We could probably do some smart detection of whether other extensions are registered, right before starting the function handler, but it strikes me as needless complexity. Same thing goes for explicit 'opt out of no-op extension' api facing users.

Can we assume we are in a tokio runtime?
Not sure... This matters because the previously mentioned approach involves spawning a background task. Tokio with rt feature looks like a hard dependency of the runtime currently. If that's the case, I guess we can also just document that it will panic if called outside a tokio runtime (and that would happen on startup so it would be obvious). It would be more awkward if we had to bring in a tokio feature flag or something. I need to poke around more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant