Feature request: Provide support for concurrent HTTP requests #75
Hello, @fullymiddleaged :) The onnxruntime-server currently operates using a thread pool, at least for the TCP server. After reviewing your question, I noticed that the HTTP/HTTPS server does not seem to utilize the thread pool. I appreciate you bringing this to our attention. I’m currently working on addressing this issue. It seems that a thread-per-client model might be a better fit for handling concurrent HTTP requests effectively. I’ll explore potential solutions and get back to you with an update soon. Thank you for your patience!
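For readers unfamiliar with the term, a thread-per-client model hands each accepted connection to its own thread, so one slow request does not block the accept loop. The following is a minimal Python sketch of that idea only; it is not the onnxruntime-server implementation (which is C++), and `handle_client`, `serve`, and `demo` are hypothetical names chosen for illustration.

```python
import socket
import threading


def handle_client(conn):
    """Serve one client: read a small message and reply in upper case."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def serve(listener, max_clients):
    """Accept max_clients connections, starting one thread per connection."""
    workers = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()
        worker = threading.Thread(target=handle_client, args=(conn,))
        worker.start()  # the accept loop continues without waiting for this client
        workers.append(worker)
    for worker in workers:
        worker.join()


def demo(n_clients=3):
    """Start the toy server on a free local port and send n_clients requests."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a port
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve, args=(listener, n_clients))
    server.start()
    replies = []
    for i in range(n_clients):
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(f"req{i}".encode())
            replies.append(client.recv(1024).decode())
    server.join()
    listener.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

A thread pool differs mainly in that a fixed set of worker threads pulls connections from a queue instead of a new thread being created for each one.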
Hi, @fullymiddleaged. I have resolved this issue. I usually align my releases with the ONNX Runtime release cycle. Can you wait until a version after 1.20.1 is released? If it’s urgent, I can release a 1.20.1a version.
Hey @kibae! Happy new year! And no worries, I'm glad I raised it and that you have resolved it. I can probably wait until the next release for this, but I'm looking forward to testing it then!
Howdy! :)
I appreciate this could be a challenge in C++, but as onnxruntime supports multi-threading, it would be great if the HTTP/TCP server side of this app did too and created a new thread per request. This would better utilise the CPU as well as provide an improved client experience. At the moment, if I test concurrent/overlapping HTTP requests, I see the results coming back serially. This leads to a sharp increase in response times for each concurrent user.
E.g., if my model takes around 500ms to respond, I am seeing average times of 2500ms when tested with 5 concurrent users.
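Those numbers are what one would expect if requests are handled strictly one at a time: with 5 users who each send a new request as soon as the previous one returns, every request queues behind roughly four others, so latency approaches 5 × 500ms. Below is a small Python simulation of that effect. It does not talk to onnxruntime-server; `run_load`, the lock standing in for a single-threaded server, and the 20ms sleep standing in for inference time are all assumptions made for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_load(serialized, users=5, rounds=4, service_s=0.02):
    """Return the mean per-request latency (seconds) under a closed-loop load.

    serialized=True forces all requests through one lock, mimicking a server
    that processes a single request at a time.
    """
    server_lock = threading.Lock()

    def handle_request():
        if serialized:
            with server_lock:  # only one request is "inferred" at a time
                time.sleep(service_s)
        else:
            time.sleep(service_s)  # requests overlap freely

    def user():
        latencies = []
        for _ in range(rounds):
            start = time.perf_counter()
            handle_request()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(user) for _ in range(users)]
        all_latencies = [lat for f in futures for lat in f.result()]
    return mean(all_latencies)


if __name__ == "__main__":
    print(f"serialized: {run_load(True) * 1000:.0f} ms per request")
    print(f"concurrent: {run_load(False) * 1000:.0f} ms per request")
```

In the serialized case the mean latency should come out at several times the service time, while in the concurrent case it should stay close to the service time itself.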
I'm not sure whether the TCP API behaves differently and perhaps supports concurrent requests where HTTP does not.
Let me know what you think!