Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature proposal: Signal client disconnect using Request.signal #3033

Open
brettwillis opened this issue Oct 30, 2024 · 24 comments
Open

Feature proposal: Signal client disconnect using Request.signal #3033

brettwillis opened this issue Oct 30, 2024 · 24 comments
Labels
feature request Request for Workers team to add a feature

Comments

@brettwillis
Copy link

As I understand it, when a client disconnects, the context/"thread" for a request is aborted as is. And if ctx.waitUntil() was called, then those promises are allowed some time to resolve.

When you are running async callbacks using ctx.waitUntil(promise), there is no easy way to know that the client has disconnected.

Conceptually, if you responded with a ReadableStream, then controller.enqueue(chunk) should throw or writer.write(chunk) of the writable side should reject on the next write. This would require keeping the context alive with ctx.waitUntil(writable.closed) or something like that, otherwise the context would just die immediately anyway. However I've found that these don't seem to reliably throw/reject when the client disconnects. Perhaps that is a separate issue.

Much of what I'm describing above is from experience/testing or examining workerd rather than documentation, so feel free to correct me.

Proposal

The Request.signal API is an existing standard API which reflects the AbortSignal passed when constructing the request. However the signal property of the incoming request currently does nothing.

I propose that the Workers runtime utilise this existing API by "aborting" the signal of the incoming request when a client disconnects. Any scripts that are interested in reacting when the client has disconnected can utilise this signal.

If an abort listener has been registered using request.signal.addEventListener('abort', ...) then the runtime will call those handler(s) when the client disconnects, before aborting the context. If the handler(s) synchronously call ctx.waitUntil(promise) then that will keep the context alive for a further period of time.

Credit to uri2 here for this inspiration of this idea, as other proposals I had entertained were new non-standard APIs such as ctx.addEventListener('end', ...) etc.

I'm happy to contribute if this proposal makes it through.

Use cases

The use-cases I require this for are scenarios when the Worker is generating a long-running response stream on the fly, for example Server-sent events.

When the client disconnects from the stream, it is useful to react to perform logging, or do cleanup such as signalling user presence in a database.

Currently the request will either die immediately, or if ctx.waitUntil(writable.closed) was called then it will continue running unknowingly until it gets terminated anyway 30s or so later.

@jasnell jasnell added the feature request Request for Workers team to add a feature label Nov 3, 2024
@kentonv
Copy link
Member

kentonv commented Nov 11, 2024

FWIW, I agree this is the right API.

(But I don't currently know when we'll be able to implement it.)

@brettwillis
Copy link
Author

(But I don't currently know when we'll be able to implement it.)

I'd be open to contributing this. But I wouldn't want to get started until the proposal has been through the appropriate reviews etc? Let me know.

@ricardopolo
Copy link

👍 to this feature, cloudflare should support abort requests

in the meantime, anyone have any workaround in mind?

@kentonv
Copy link
Member

kentonv commented Nov 12, 2024

@mikenomitch @irvinebroque thoughts?

@irvinebroque
Copy link
Collaborator

We want to support Request.signal, and agree that registering the event listener on the incoming request makes sense. Matter of when not if in my mind (particularly now that the cache property of Request has some support in Workers cloudflare/cloudflare-docs#17926)

Defer to others on if we think this is a viable thing for someone externally to contribute but in principle really appreciate that you're interested in contributing @brettwillis.

@jasnell FYI re: standards

@jasnell
Copy link
Member

jasnell commented Nov 13, 2024

To be clear, we already support request.signal on the client side of a fetch, just not on the server side. We've had folks request the server-side for a while but it needs to come with the caveat that we likely cannot guarantee that it will fire in every case. Detecting that the connection has been aborted reliably can be a bit tricky but definitely doable.

One key question to determine is under what conditions would the request.signal be triggered on the server-side? Only when the runtime detects that the client has disconnected? Should it be called under other circumstances such as if the runtime cancels the request due to an error?

And just to make sure we're on the same page, this is what we're talking about, correct?

export default {
  fetch(request) {
    request.signal.addEventListener('abort', () => {
      // If this is called, it means the incoming request was canceled and we should stop trying to do stuff for it.
    });
  }
}

@brettwillis
Copy link
Author

And just to make sure we're on the same page, this is what we're talking about, correct?

Yes correct. I'd add that the work in that callback may be async, and therefore involve ctx.waitUntil():

export default {
  fetch(request, env, ctx) {
    request.signal.addEventListener('abort', () => {
      // Do some logging or async work to signal other parts of the system
      const promise = ...;
      ctx.waitUntil(promise);
    });

    // Response may be a long-running readable stream
    const readable = ...;
    return readable;
  }
}

One key question to determine is under what conditions would the request.signal be triggered on the server-side? Only when the runtime detects that the client has disconnected? Should it be called under other circumstances such as if the runtime cancels the request due to an error?

Good point, needs some careful consideration. Here's a train of thought from my side:

Is it correct to say that there are currently two ways for a request to end: (1) end "naturally" by emptying the event loop, or (2) "aborted" by the runtime by cancelling any pending callbacks in the event loop.

We could then say that request.signal is aborted whenever the runtime itself aborts a request as in (2) without it ending naturally? Worded in another way: "abort callbacks are guaranteed to be called when the runtime [knowingly] aborts a request before it naturally ends".

While this sounds clean, it may break down a little when reasoning about (3) and (4) below...

Conditions I can think of where the runtime currently aborts a request (cancelling event loop callbacks) are:

  1. Platform detects a client disconnect (we see this as "Request cancelled" error in the logs). I think you are saying that the platform does not always reliably detect this, but when it does detect this the runtime does abort the request.
  2. Internal reasons like server maintenance.
  3. Timeout limits like callbacks in ctx.waitUntil() that take too long to resolve.
  4. Resource limits like CPU or memory.

In the cases of (3) and (4), the script has already reached it's limits and therefore you might not want to give it more life with the abort callbacks?

There's also another case: I think "dangling" timers and async callbacks are cancelled from the event loop if they were not wrapped in ctx.waitUntil() once the response has been sent or an unhandled exception is thrown. In such cases the script has already yielded naturally under it's own control flow. It is fully capable of detecting it's own natural end or catching the exception, so I'd it does not need a signal from the runtime. We want the abort callback for when the runtime aborts the script outside of the script's control.

@kentonv
Copy link
Member

kentonv commented Nov 18, 2024

I think only case 1 makes sense to support via this API. AFAIK the platform does reliably detect client disconnects, so that's not an issue. It should be straightforward to implement.

I think cases 2-4 raise a lot more questions and concerns. I'm not sure we can really support them at all, but if we did, I don't think request.signal is the right API for them. These cases are not really about the request, they are about the isolate.

One gotcha regarding request.signal that we'll need to work out: What happens in the common case that the request is passed along to an outgoing fetch()? If the response from that fetch is immediately returned by the worker, then it makes sense that if the client disconnects, the outgoing fetch should be canceled. But what if the response of that fetch is going to be used in some other logic, maybe involving a waitUntil() that writes to cache, etc.? Then, if we cancel the subrequest based on the client disconnecting, we might break that logic.

I think we might have to say: If the signal on an outgoing fetch() is the exact same signal from the original incoming request that started the Worker, it is ignored. If that feels wrong, then we would have to introduce a compat flag to fix it without breaking anyone. But I think it's fine really... if the client disconnects and there are no waitUntil()s, then all subrequests are canceled implicitly. So this only makes a difference if there are waitUntil()s, in which case you probably don't actually want to cancel the subrequest?

@jasnell
Copy link
Member

jasnell commented Nov 18, 2024

I think only case 1 makes sense to support via this API. AFAIK the platform does reliably detect client disconnects, so that's not an issue. It should be straightforward to implement.

I think this potentially becomes a bit wonky in deferred proxy cases. Specifically, if I have a case like...

export default {
  fetch(request) {
    request.signal.addEventListener('abort', () => {
      // should I expect this to be called?
    });
    return fetch('http://example/subrequest');
  }
}

In this case, because of deferred proxying, if the client disconnects after we've returned the response but before the response has been fully completed, would we expect the abort handler to be called? Do we care at this point?

@kentonv
Copy link
Member

kentonv commented Nov 18, 2024

In this case, because of deferred proxying, if the client disconnects after we've returned the response but before the response has been fully completed, would we expect the abort handler to be called? Do we care at this point?

I would say no, don't deliver any notification at that point. It would break the optimization, and I can't see why any app would need it.

@jasnell
Copy link
Member

jasnell commented Nov 18, 2024

I would say no, don't deliver any notification at that point. It would break the optimization, and I can't see why any app would need it.

We'll need to document that carefully then to avoid confusion. It would mean that the handler would be called in some cases after a response is sent but not called in others, depending on whether the deferred proxy optimization is in use or not... for instance, in a case like the following I would still expect the abort handler to be called since we're still running JavaScript after the response is returned. It just might be a bit confusing for folks if they aren't aware of when we apply the optimization or not.

export default {
  fetch(request) {
    request.signal.addEventListener('abort', () => {  /* ... */ });
    const rs = new ReadableStream({
      // ...
    });
    return new Response(rs);
  }
}

To be clear, I'm not disagreeing with you, just pointing out that we'll need to document this bit carefully.

@kentonv
Copy link
Member

kentonv commented Nov 18, 2024

Maybe, but the same issue already exists today. Like if you use setInterval(), and then return a response eligible for deferred proxying, the interval will stop firing while the response streams through. Basically, once you've set up deferred proxying, JavaScript execution is canceled (unless you have waitUntil()).

@jasnell
Copy link
Member

jasnell commented Nov 18, 2024

That's fair. And fwiw, this is what I meant earlier when I said, "we likely cannot guarantee that it will fire in every case." Specifically, there will be some cases (like deferred proxy) where we say that the abort listener will not be invoked even if the request has not been fully processed... simply because we might have passed the point at which we're actually running JavaScript. As long as that's understood this should be fairly straightforward to implement.

@ricardopolo
Copy link

Hey, any progress on this? We need to able to cancel requests in CF

@brettwillis
Copy link
Author

I think we might have to say: If the signal on an outgoing fetch() is the exact same signal from the original incoming request that started the Worker, it is ignored. If that feels wrong, then we would have to introduce a compat flag to fix it without breaking anyone. But I think it's fine really... if the client disconnects and there are no waitUntil()s, then all subrequests are canceled implicitly. So this only makes a difference if there are waitUntil()s, in which case you probably don't actually want to cancel the subrequest?

That's a really good point I hadn't thought of. It feels like most people would find it pretty unexpected for their subrequest to be cancelled because of the incoming requests signal when using waitUntil()s to follow up on the responses, and it would break a lot of workers (I know it would break some of mine), so would definitely have to be behind a flag. Even if I knew about this behaviour, it would probably keep catching me out in the future.

And personally it also feels a little convoluted that signal would be ignored if it's the same instance. I mean I can't actually think of a real circumstance when this would be unexpected, but it just smells a bit and makes me question now if request.signal is the right API... it seemed perfect until you mentioned this. Because aborting the incoming request.signal would be a new behaviour anyway, it wouldn't be breaking anything.

All that to say that of those two options my vote would be to implicitly ignore the same signal instance on subrequests. Either that or a non-standard API that lives beside ctx.waitUntil() like ctx.addEventListener('end', ...) or ctx.whenCancelled(() => {...}) or similar. Are there any standards proposals?

Maybe, but the same issue already exists today. Like if you use setInterval(), and then return a response eligible for deferred proxying, the interval will stop firing while the response streams through. Basically, once you've set up deferred proxying, JavaScript execution is canceled (unless you have waitUntil()).

Would we want request.signal.addEventListener('abort', ...) to behave the same as ctx.waitUntil(), i.e. in that adding the abort callback keeps JS alive can prevents the deferred proxy optimisation?

@kentonv
Copy link
Member

kentonv commented Dec 2, 2024

A benefit of ctx.addEventListener('end', ...) is that it could work for non-fetch events, including RPC. But it would be non-standard, and being non-standard also means we will need to argue about exactly how the API should look. (Honestly the best thing about following standards... no bikeshedding.)

Would we want request.signal.addEventListener('abort', ...) to behave the same as ctx.waitUntil(), i.e. in that adding the abort callback keeps JS alive can prevents the deferred proxy optimisation?

No I don't think so. I doubt most people would actually want this to prevent deferred proxying. We just won't deliver the abort event to JS if we've already gone into deferred proxying before the client disconnects. That's probably fine because this implies the JS context is already over and so all resources it was holding open have been closed implicitly (except for the proxy stream), so what else would it care to do?

@jasnell
Copy link
Member

jasnell commented Dec 2, 2024

Why not ctx.signal.addEventListener('abort') ... that is, drop the AbortSignal off of the ctx? That would at least give us a standard API even if it is exposed via a non-standard property (ctx).

No I don't think so. I doubt most people would actually want this to prevent deferred proxying. We just won't deliver the abort event to JS if we've already gone into deferred proxying before the client disconnects. That's probably fine because this implies the JS context is already over and so all resources it was holding open have been closed implicitly (except for the proxy stream), so what else would it care to do?

+1 to this.

@kentonv
Copy link
Member

kentonv commented Dec 2, 2024

Why not ctx.signal.addEventListener('abort')

Good point, that also makes it easy to pass on the signal to other APIs expecting an AbortSignal. And it limits bikeshedding. :)

@brettwillis
Copy link
Author

Why not ctx.signal.addEventListener('abort')

Great idea, I like it too.

@brettwillis
Copy link
Author

What's the next step here?

@jasnell
Copy link
Member

jasnell commented Dec 4, 2024

Really comes down to prioritizing and scheduling the change here. It's on our roadmap. Difficult to say at the moment exactly when we'll be able to work on this.

@brettwillis
Copy link
Author

Absolutely. Sorry I wasn't meaning "when", I was more asking, because I've offered to contribute to this, are there more reviews that need to take place, or is this a feature you'd prefer to keep in house?

@jasnell
Copy link
Member

jasnell commented Dec 4, 2024

... or is this a feature you'd prefer to keep in house?

Certainly open to a contribution here. I just know there's likely some internal testing and possible an internal PR that would also be needed before we could pick this up in production so I think it at least warrants making sure we have someone on the runtime team engaged to work on this also.

@brettwillis
Copy link
Author

Ok sounds great, then I'll wait to hear back before doing anything. Thanks 🙏🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Request for Workers team to add a feature
Projects
None yet
Development

No branches or pull requests

5 participants