-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify destroy() methods and behavior around context loss and error reporting #744
Conversation
Based on @mingmingtasd's work in the Chromium prototype implementation. For webmachinelearning#477
Some points for discussion:
@mingmingtasd and @reillyeon could you do an initial review? |
Thanks! @inexorabletash @reillyeon
I will check context lost for all of the synchronous and asynchronous actions depending on the context in my chromium CL. |
Something this PR doesn't do is specify the behavior around rejecting in-flight asynchronous operations. |
Thoughts on how to do that and to what level of detail?
One generic approach for both of those would be to replace: "Queue an ML task with global to resolve promise with ..." with:
... which I think covers the script-observable behavior, but not that the async steps internally should fail. |
There are two separate considerations: what happens to the promise returned by a method and what happens to an asynchronous operation itself. Operations like |
I suppose the following step of execute graph could handle the device lost error:
A question is if the error is device lost, should it run the "context-lost" steps? Like the Chromium prototype does in
Or we could just say "If that returns an error, then queue an ML task with global to reject promise with an 'OperationError' DOMException" similar to
The new steps may not run because these steps may already be aborted (by "abort these steps") if previous steps fail . |
+1. Script timeline:
Device/queue timeline:
Note: impl. is always free to abort pending async ops immediately (and release buffers). |
I think this would be script observable - i.e. what order do the |
@reillyeon @huningxin @bbernhar I think the DirectML backend here has considered this, |
I think we've settled on the idea that destroying an Destroying an |
I have submitted a CL to expose |
(not actually ready for review, but I wanted to force an updated preview to be generated) |
The <dfn method for=MLGraph>destroy()</dfn> method steps are: | ||
</summary> | ||
1. If [=this=].{{MLGraph/[[isDestroyed]]}} is true, then abort these steps. | ||
1. Set [=this=].{{MLGraph/[[isDestroyed]]}} to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MLGraph may own device (e.g. GPU/NPU) resources. The destroy steps may need to release them on the context timeline. Worth adding a note? (Like MLTensor.destory() does)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 0fc4151 but it's a very generic note, it doesn't mention the timeline. Should it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can be a bit more explicit and queue a task on the context timeline to "mark resources owned by this graph as freeable".
index.bs
Outdated
@@ -1080,16 +1097,17 @@ Reads back the {{MLTensor/[[data]]}} of an {{MLTensor}} from the {{MLContext}}.{ | |||
1. If |tensor|.{{MLTensor/[[isDestroyed]]}} is true, then return [=a new promise=] [=rejected=] with a {{TypeError}}. | |||
1. If |tensor|.{{MLTensor/[[descriptor]]}}.{{MLTensorDescriptor/readable}} is false, then return [=a new promise=] [=rejected=] with a {{TypeError}}. | |||
1. Let |promise| be [=a new promise=]. | |||
1. Enqueue the following steps to |tensor|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}: | |||
1. Enqueue the following steps to |tensor|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}, which [=/abort when=] [=this=] [=MLContext/is lost=]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is |tensor|.{{MLGraph/[[context]]}}
a typo (an old issue)? Should it be |tensor|.{{MLTensor/[[context]]}}
or just [=this=].{{MLContext/[[timeline]]}}
? If it is a typo, the line 1096 also needs to be fixed. For example,
1. Enqueue the following steps to |tensor|.{{MLGraph/[[context]]}}.{{MLContext/[[timeline]]}}, which [=/abort when=] [=this=] [=MLContext/is lost=]: | |
1. Enqueue the following steps to [=this=].{{MLContext/[[timeline]]}}, which [=/abort when=] [=this=] [=MLContext/is lost=]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another question is whether it should check context lost earlier? Like createTensor
steps do, e.g.
1. If [=this=] [=MLContext/is lost=], then return [=a new promise=] [=rejected=] with an "{{InvalidStateError}}" {{DOMException}}.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the context loss will destroy every associated tensor, I don't think that is needed. I'm not opposed to adding it explicitly though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the "type confusion" I added a lint tool test and caught several more places. It's not perfect, but it is progress!
The <dfn method for=MLGraph>destroy()</dfn> method steps are: | ||
</summary> | ||
1. If [=this=].{{MLGraph/[[isDestroyed]]}} is true, then abort these steps. | ||
1. Set [=this=].{{MLGraph/[[isDestroyed]]}} to true. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can be a bit more explicit and queue a task on the context timeline to "mark resources owned by this graph as freeable".
Co-authored-by: Reilly Grant <[email protected]>
…ntext timeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I scanned it, and it looks fine (not as close scrutiny given Reilly and Ningxin already looked). I'll delay merging a bit though because Rafael wanted to look at it tomorrow. ⏳
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @inexorabletash , thank you very much for putting this together.
Also looks good from WebKit's side @inexorabletash - informal approval from discussion in today's call |
Based on @mingmingtasd's work in the Chromium prototype implementation.
MLContext
gains adestroy()
method and alost
promise attribute. Callingdestroy()
releases resources for the context itself and any associatedMLGraph
s andMLTensor
s, and any outstandingbuild()
requests for an associatedMLGraphBuilder
are aborted. Thelost
attribute then resolves, and any future use of the context fails. If the context is lost for other reasons, the same behavior occurs (releasing associated resources), and thelost
promise provides an implementation-defined message explaining the reason.MLGraph
similarly gains adestroy()
attribute. When destroyed - either by an explicit call, or because the associated context is lost - then any outstandingdispatch()
requests using the graph are aborted.This also modifies the omnipresent "has built" tests on
MLGraphBuilder
methods to be a "can built" test which also checks that the builder's context is not lost.For #477
Preview | Diff