-
-
Notifications
You must be signed in to change notification settings - Fork 393
api: Add random exponential backoff to do_api_query. #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Made a PR for do_api_query. But base and cap need to be decided. Also if this is done, then I can change class RandomExponentialBackoff accordingly. |
zulip/zulip/__init__.py
Outdated
time.sleep(1) | ||
delay_cap = 10 | ||
delay_base = 0.5 | ||
delay_time = random.random() * min(delay_cap, delay_base * (2 ** query_state["failures"])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orientor Thanks for working on this! The code looks nice and clean, but I have one concern.
The expression delay_base * (2 ** query_state{'failures'])
will start overwhelming delay_cap
by the fifth failure, and then we'll be computing exponential factors of 2 for no reason, and they can get big, well, exponentially.
I think the better strategy here is to tweak it so that after N failures, we just using the delay_cap
value. I also think there is some point after which we should simply quit, but that can be for a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@showell changed the PR. Now it doesn't compute after 5th power of 2. Regarding stopping do_api_query stops after 10 failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@showell should delay_cap and delay_base be added as function parameters so that it can be decided by user?
a0312fe
to
541df63
Compare
zulip/zulip/__init__.py
Outdated
message = "Sleeping for %ss [max %s] before retrying." % (delay, delay_scale * 2) | ||
delay_base = 0.5 | ||
delay = random.random() * self.delay_cap | ||
if math.log2(self.delay_cap) > self.number_of_retries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, I think my prior suggestion was a bit misleading. I didn't want us to actually calculate the logarithm at run time; I was just explaining mathematically that after log2N delays, we're always gonna get delay_cap
here instead of 2 ** retries
, but I didn't mean to actually code it this way.
It's my fault here for probably prematurely optimizing. (Well, I wasn't prematurely optimizing, I was more mistakenly worried about overflow bugs.) I think JS can handle large exponents of 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@showell changed the code accordingly. The optimization could be added later if needed.
3064c35
to
4724c3c
Compare
Can you provide terminal output showing you've tested the error handling behaves correctly? |
I have tested the condition when retry is required. But in the original code retry happens only in two cases:
Hence in ALL other cases (with no retry) my code works same as the original code.(Tested it and also an error gets raised in these cases so changing loop doesn't effect them) For the first 2 cases the output will be like that shown in the image. |
91cd715
to
967487a
Compare
I have thoroughly checked most of the cases. Code working fine in all. |
@orientor can you rework your commits here to follow our commit style? I think what makes sense are:
That approach will be reviewable, biseactable, and mergable. |
@timabbott Now going to update it to 4. |
@timabbott removed delay_base completely as it changed only one iteration(it was set to 1/2 so it would basically reduce power of 2 by 1) and I think was not required. Reworked changes into 3 commits.
|
The first commit adds the Zulip's development model is around every commit in every PR being mergable incrementally, which is important for bisecting as well as efficient code review. Can you fix the first commit to fully implement delay_cap? |
@timabbott Improved the commits. |
@@ -526,9 +523,6 @@ def error_retry(error_string): | |||
sys.stdout.write(".") | |||
sys.stdout.flush() | |||
query_state["request"]["dont_block"] = json.dumps(True) | |||
time.sleep(1) | |||
query_state["failures"] += 1 | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just have this function do the backoff.fail()
and return backoff.keep_going()
? It's deduplicate code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice idea. Implemented this and one other change and now my do_api_query commit is way more readable and has minimal changes.
zulip/zulip/__init__.py
Outdated
@@ -611,6 +610,7 @@ def end_error_retry(succeeded): | |||
end_error_retry(False) | |||
return {'msg': "Unexpected error from the server", "result": "http-error", | |||
"status_code": res.status_code} | |||
return {'msg': "Unexpected error from the server", "result": "unexpected-error"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to add this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed after previous changes.
I merged the first commit and posted a couple comments. I'll also note that this doesn't replace the |
4fc8183
to
62f6f7c
Compare
@timabbott Changed the code to be more readable and have minimal changes. I was not aware of call_on_each_event. Will make the necessary commits soon. Also can you shed some light on invalid authentication? According to me the only way of having invalid authentication is wrong API key. Are there any other methods? I will make the changes accordingly. |
Added exponential backoff to call_on_each_event. But the function was originally intended to keep retrying to connect the server until successful. So an infinite loop. @timabbott @showell Is that intended or should I add a max number of retries? |
Heads up @orientor, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the |
Set delay time on failure in do_api_query according to random exponential backoff.
Fixes #537 .