-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to fix GCP autoscaler threading issue #49440
Conversation
Thank you for your contribution! Looks good. Would you mind converting it to a formal PR by signing off all commits and fixing the lint CI error? I will use your PR to run some experiments to see if your changes applied to 2.37.0 can resolve the issue. I’ll post the results here afterward. Update: Experiment PR is here. #49442 |
b9d9c8f
to
cd26d3d
Compare
I signed-off and formatted (with |
As per https://googleapis.github.io/google-api-python-client/docs/thread_safety.html Signed-off-by: Kristian Hartikainen <[email protected]>
cd26d3d
to
376c73b
Compare
I think it should be fine. First, the two files you modified are only related to GCP, so components other than GCP won't be affected. Secondly, the functions you changed are all related to creating resources. As I mentioned in #49440 (comment), I manually built your changes into a wheel and tested deploying it on GCP. From my observations, the compute engines were created correctly. As for cc @rynewang Do you have any other concerns about this? |
Looks good! |
Merged. Thanks for your contribution! |
This is an attempt to fix #46451. I believe that the issue is caused by incorrect handling of
google-api-python-client
-library, which is thread-unsafe as per the documentation.The above-mentioned documentation also describes how the thread-unsafety can be worked around by instantiating separate
httplib2.Http()
object for each thread. Here, I've just slapped the request build pattern onto GCP service creation code. I have not yet fully verified whether this works or not as debugging this is slow and quite a pain in the ass. What I have verified though is that the SSL errors that I mentioned in #46451 (comment) and which consistently occurred in/tmp/ray/session_latest/logs/monitor.out
are now gone.I'm not sure when I have time to get back to this but I'll leave this here in case someone else wants to take a look.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.