ORT dist - Use dispatch() for SD 1.5, SD Turbo and Whisper Base #56

Merged: 1 commit, Nov 12, 2024

Conversation

Contributor

@ibelem ibelem commented Nov 12, 2024

compute() will be deprecated and removed in favor of dispatch().

Update to the latest dev versions of the ORT dists, which support dispatch(), for the SD 1.5, SD Turbo, and Whisper Base demos.

@fdwr PTAL

Collaborator

@fdwr fdwr left a comment

Thanks Belem. The dev change makes sense, but stable being empty?

assets/js/common_utils.js (review thread resolved)
@fdwr fdwr merged commit 6d0fed2 into microsoft:main Nov 12, 2024
1 check passed
@ibelem ibelem deleted the ort-dists-dispatch branch November 13, 2024 01:37

eyaler commented Nov 29, 2024

@ibelem @fdwr I am seeing a considerable slowdown in sd-turbo performance, with UNet inference times going from 100 ms to 1000 ms. This happens when changing from 1.20.0-dev.20240927-b81e76b9a6 (or 1.20.0-dev.20240919-bd60add8ce, used here before this commit) to 1.20.0-dev.20240928-1bda91fc57 or later, including 1.21.0-dev.20241109-d3ad76b2cf from this commit and the latest 1.21.0-dev.20241127-b930b4ab5b. The main change seems to be the dispatch() change in ORT. I did not change anything in the code other than the ORT version.
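The version strings in the report above already bracket the regression to two consecutive daily builds. A minimal sketch of that bookkeeping follows. The helper names `parseDevVersion` and `regressionWindow` are hypothetical, and the assumed layout (release, build date as YYYYMMDD, trailing hex string that is presumably a short commit hash) is inferred only from the strings quoted in this thread, not from ORT documentation.

```javascript
// Parse a dev version string such as "1.20.0-dev.20240927-b81e76b9a6".
// Assumption: <release>-dev.<YYYYMMDD>-<hex>, inferred from this thread only.
function parseDevVersion(version) {
  const match = /^(\d+\.\d+\.\d+)-dev\.(\d{8})-([0-9a-f]+)$/.exec(version);
  if (!match) {
    throw new Error(`Unrecognized dev version: ${version}`);
  }
  const [, release, date, commit] = match;
  return { version, release, date, commit };
}

// Given builds observed as fast and builds observed as slow, return the
// newest fast build and the oldest slow build. The change that caused the
// slowdown should have landed between those two.
function regressionWindow(fastVersions, slowVersions) {
  // YYYYMMDD strings sort chronologically when compared as text.
  const byDate = (a, b) => a.date.localeCompare(b.date);
  const fast = fastVersions.map(parseDevVersion).sort(byDate);
  const slow = slowVersions.map(parseDevVersion).sort(byDate);
  return { lastFast: fast[fast.length - 1], firstSlow: slow[0] };
}

// Versions as reported in the comment above.
const range = regressionWindow(
  ["1.20.0-dev.20240919-bd60add8ce", "1.20.0-dev.20240927-b81e76b9a6"],
  [
    "1.21.0-dev.20241127-b930b4ab5b",
    "1.21.0-dev.20241109-d3ad76b2cf",
    "1.20.0-dev.20240928-1bda91fc57",
  ]
);
console.log(range.lastFast.version, "->", range.firstSlow.version);
```

With the reported versions, the window is the 20240927 build to the 20240928 build, which matches the pair named in the comment.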

Contributor Author

ibelem commented Dec 2, 2024

@eyaler Thanks for the report! We looked at the performance gap between compute() and dispatch(), but didn't see a 10x performance drop when comparing the daily performance test reports for UNet models. Did you clear the cache/memory when comparing the performance? Also, what is your detailed test environment?

  • CPU model
  • GPU model
  • GPU driver version
  • Memory


eyaler commented Dec 2, 2024

@ibelem

AMD Ryzen 7 6800H
RTX 3070 Ti Laptop (8 GB)
Nvidia Studio driver 566.14 (latest)
64GB RAM
Chrome 131.0.6778.86

Not sure about clearing cache/memory. What is the recommended procedure? I did switch ORT versions back and forth multiple times and could consistently see the performance difference correlated with the compute/dispatch change for the UNet (as well as the VAE encoder in my im2im fork).
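When switching ORT versions back and forth like this, it can help to record every run and keep the first (warm-up) run separate from the later ones, so that one-off costs are not mistaken for steady-state inference time. The following is a minimal sketch under that assumption. `timeRuns`, `summarize`, and the `runInference` callback are hypothetical names, not part of the demo or of ORT, and this does not address clearing browser caches.

```javascript
// Time repeated calls of an async function, in milliseconds per run.
// `runInference` is a hypothetical stand-in for one UNet pass in the demo.
async function timeRuns(runInference, runs = 5) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await runInference();
    timings.push(performance.now() - start);
  }
  return timings;
}

// Report the first run on its own and the median of the remaining runs.
function summarize(timings) {
  if (timings.length < 2) {
    throw new Error("Need at least two runs: one warm-up and one measured");
  }
  const [first, ...rest] = timings;
  const sorted = [...rest].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { first, median };
}

// Example with a dummy 5 ms workload; replace it with the real inference call.
timeRuns(() => new Promise((resolve) => setTimeout(resolve, 5)), 4).then(
  (timings) => console.log(summarize(timings))
);
```

Running the same harness once per ORT version, in the same browser session state, gives two `{ first, median }` pairs that can be compared directly.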

Contributor Author

ibelem commented Dec 3, 2024

Thanks @eyaler. I just wanted to check whether you are testing in a clean environment (e.g. no other tabs open and no heavy background applications running), since there is a known memory growth issue in the tab (Blink process).

Screenshot 2024-12-03 093920

We don't have an RTX 3070 Ti Laptop, only an RTX 4070S. On the 4070S, the first and second inference times are:

Screenshot 2024-12-03 093650
Screenshot 2024-12-03 093726

Inference (ms)   1        2       3       4
First            167.30   26.60   24.80   23.10
Second            93.80   25.90   25.20   23.80

ONNX Runtime Web: 1.21.0-dev.20241122-a2ba3cb547 dispatch()

Considering the performance gap between the 3070 Ti Laptop and the 4070S, ~26 ms is the expected result on the 4070S.

Processor: 12th Gen Intel(R) Core(TM) i9-12900K, 3.20 GHz
Installed RAM: 32.0 GB (31.7 GB usable)
GPU: RTX 4070 SUPER
GPU driver: 32.0.15.6603
Edition: Windows 11 Enterprise
Version: 23H2
OS build: 22631.4460
Browser: Chrome Canary 133.0.6873.0

Have you tried on https://microsoft.github.io/webnn-developer-preview/demos/sd-turbo/ directly? Thanks!
