API lacks handling for async ML device errors on the context #477
Comments
@anssiko this issue is for non-interop and still needs a follow-up proposal (low priority atm).
Or, should we add plain error event(s) to MLContext?
A promise has slightly better ergonomics because (1) the transition to "lost" only happens once for a device, and (2) it works even if your code runs after the state has changed; you aren't forced to add an event listener immediately.
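The ergonomics argued for above can be sketched with a plain one-shot promise. This is a minimal illustration only; `FakeContext`, `simulateDeviceLoss`, and the shape of the resolution value are all hypothetical stand-ins, not the WebNN API:

```javascript
// Sketch of one-shot "lost" semantics via a promise.
// FakeContext is a hypothetical stand-in, not the real MLContext.
class FakeContext {
  constructor() {
    // The promise settles exactly once, mirroring the one-time
    // transition of a device into the "lost" state.
    this.lost = new Promise((resolve) => {
      this._resolveLost = resolve;
    });
  }

  // Simulate a fatal device error (e.g. GPU device removal).
  simulateDeviceLoss(message) {
    this._resolveLost({ message });
  }
}

const context = new FakeContext();
context.simulateDeviceLoss("GPU device removed");

// Unlike an event listener, attaching *after* the loss still works:
context.lost.then((info) => {
  console.log(`Context lost: ${info.message}`);
});
```

Because a promise retains its settled value, late subscribers observe the loss without any race, which is the ergonomic point being made above.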
@anssiko SGTM to move the discussion there.
…reporting (#744)

* Specify behavior around context loss and error reporting. Based on @mingmingtasd's work in the Chromium prototype implementation. For #477
* Add link for 'settled'
* Add 'is lost' check to build()
* Add 'is not lost' dfn alias
* Add "can (not) build" defn, fix throwing vs. rejecting for methods
* Make build() and compute() explicitly invoke context lost steps
* Reword to reduce some indenting
* Restore missing algorithm class
* Specify various destroy() methods
* Lint/wording fix
* Whitespace changes
* Move destroy of dependencies to 'lost' steps and other feedback
* Abort steps / reject promises when context is lost
* Separate out lost/lose steps, make dispatch() use abort-when
* Incorporate more feedback, another lint rule
* Include tensor creation in MLContext/destroy() note
* Move async 'abort when'/'if aborted' into substeps
* Update index.bs
* Add suggested wording by @reillyeon for freeing graph resources on context timeline

Co-authored-by: Reilly Grant <[email protected]>
What happens if a WebNN operation dispatched through MLContext encounters some internal error which causes the GPU device to get removed? I would expect WebNN to specify how fatal (device) errors are handled so the WebNN developer could respond appropriately. If we want to do more with MLContext (e.g. create buffers), I believe we'll need a more robust error mechanism like WebGPU's [1].

[1] https://www.w3.org/TR/webgpu/#errors-and-debugging
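For context, the WebGPU model referenced above pairs a device-loss promise with error scopes for recoverable errors. The scope-stack idea can be simulated in a few lines; `ScopedDevice` and its methods here are a hypothetical illustration, not WebGPU's actual interface (see the linked spec for the real model):

```javascript
// Rough simulation of WebGPU-style error scopes: an error is captured
// by the innermost scope whose filter matches, instead of surfacing
// globally. ScopedDevice is hypothetical, for illustration only.
class ScopedDevice {
  constructor() {
    this._scopes = [];
  }

  pushErrorScope(filter) {
    this._scopes.push({ filter, error: null });
  }

  // Resolves with the first captured error in the popped scope, or null.
  popErrorScope() {
    const scope = this._scopes.pop();
    return Promise.resolve(scope.error);
  }

  // Called internally when an operation fails.
  _reportError(type, message) {
    for (let i = this._scopes.length - 1; i >= 0; i--) {
      if (this._scopes[i].filter === type) {
        // Only the first error in a scope is retained.
        if (!this._scopes[i].error) this._scopes[i].error = { type, message };
        return;
      }
    }
    // No matching scope: would surface as an uncaptured-error event.
  }
}

const device = new ScopedDevice();
device.pushErrorScope("validation");
device._reportError("validation", "invalid buffer size");
device.popErrorScope().then((err) => {
  console.log(err ? `Captured: ${err.message}` : "No errors");
});
```

The design point is that callers opt in to handling recoverable errors around specific operations, while unrecoverable device loss remains a separate, one-shot signal.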