api: Add random exponential backoff to do_api_query. #538

orientor · 2020-02-23T10:47:03Z

Set the delay time after a failure in do_api_query using random exponential backoff.

Fixes #537.

orientor · 2020-02-23T10:52:30Z

I've made a PR for do_api_query, but the values of base and cap still need to be decided. If this approach is accepted, I can change the RandomExponentialBackoff class accordingly.

showell · 2020-02-23T11:57:48Z

zulip/zulip/__init__.py

-            time.sleep(1)
+            delay_cap = 10
+            delay_base = 0.5
+            delay_time = random.random() * min(delay_cap, delay_base * (2 ** query_state["failures"]))


@orientor Thanks for working on this! The code looks nice and clean, but I have one concern.

The expression delay_base * (2 ** query_state['failures']) will start overwhelming delay_cap by the fifth failure, and after that we'll be computing exponential factors of 2 for no reason, and they can get big, well, exponentially.

I think the better strategy here is to tweak it so that after N failures, we just use the delay_cap value. I also think there is some point after which we should simply quit, but that can be for a future PR.

See https://chat.zulip.org/#narrow/stream/92-learning/topic/exponential.20backoff.20--.20with.20jitter/near/820056 for more discussion.
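A minimal sketch of the "just use delay_cap after N failures" idea, assuming the delay_base = 0.5 and delay_cap = 10 values from the diff above. The function names delay_ceiling and capped_jitter_delay are hypothetical, not part of the zulip package. Instead of evaluating base * 2 ** failures and then clamping, it doubles step by step and returns the cap as soon as it is reached, so no large power of 2 is ever computed.

```python
import random


def delay_ceiling(failures: int, base: float = 0.5, cap: float = 10.0) -> float:
    """Return min(cap, base * 2 ** failures) without computing a large power.

    Doubles one step at a time and stops as soon as the cap is reached, so at
    most about log2(cap / base) doublings happen regardless of `failures`.
    """
    ceiling = min(base, cap)
    for _ in range(failures):
        ceiling *= 2
        if ceiling >= cap:
            return cap
    return ceiling


def capped_jitter_delay(failures: int, base: float = 0.5, cap: float = 10.0) -> float:
    # "Full jitter", as in the diff above: a uniform random fraction of the ceiling.
    return random.random() * delay_ceiling(failures, base, cap)
```

With these defaults the ceiling goes 0.5, 1, 2, 4, 8 and is pinned at 10 from the fifth failure on, which matches the observation that the cap takes over at that point.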

@showell I changed the PR. It no longer computes powers of 2 beyond the 5th. As for quitting, do_api_query already stops after 10 failures.

@showell Should delay_cap and delay_base be added as function parameters so that they can be chosen by the user?

showell · 2020-02-25T12:04:56Z

zulip/zulip/__init__.py

-        message = "Sleeping for %ss [max %s] before retrying." % (delay, delay_scale * 2)
+        delay_base = 0.5
+        delay = random.random() * self.delay_cap
+        if math.log2(self.delay_cap) > self.number_of_retries:


Ugh, I think my prior suggestion was a bit misleading. I didn't want us to actually calculate the logarithm at run time; I was just explaining mathematically that after log2N delays, we're always gonna get delay_cap here instead of 2 ** retries, but I didn't mean to actually code it this way.
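Written out, the bound described here looks like this, reading N as the ratio delay_cap / delay_base (an interpretation; the comment does not define N):

$$
\text{delay\_base} \cdot 2^{k} \ge \text{delay\_cap}
\iff k \ge \log_2 \frac{\text{delay\_cap}}{\text{delay\_base}} = \log_2 N
$$

So for every failure count $k \ge \lceil \log_2 N \rceil$, the term $\min(\text{delay\_cap},\ \text{delay\_base} \cdot 2^{k})$ is simply delay_cap. With the values from the first diff ($\text{delay\_base} = 0.5$, $\text{delay\_cap} = 10$), $N = 20$ and $\log_2 20 \approx 4.32$, so the cap takes over from the fifth failure onward.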

It's my fault here for probably prematurely optimizing. (Well, I wasn't prematurely optimizing; I was more mistakenly worried about overflow bugs.) I think Python can handle large exponents of 2.
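A small check of the overflow question, in Python since that is the language of zulip/zulip/__init__.py. Python ints are arbitrary-precision, so 2 ** n itself never overflows. The one place Python does raise OverflowError is when a huge int is converted to float, which is what happens when it is multiplied by a float delay_base. The helper names raw_ceiling and overflows are hypothetical, for illustration only.

```python
def raw_ceiling(failures: int, base: float = 0.5) -> float:
    # Uncapped ceiling: 2 ** failures is an exact int, but multiplying it by a
    # float base forces an int -> float conversion.
    return base * (2 ** failures)


def overflows(failures: int) -> bool:
    # True if the int -> float conversion above raises OverflowError.
    try:
        raw_ceiling(failures)
    except OverflowError:
        return True
    return False


print((2 ** 2000).bit_length())        # 2001: the int itself is fine
print(raw_ceiling(10))                 # 512.0
print(overflows(10), overflows(2000))  # False True
```

Since do_api_query gives up after 10 failures, the exponent stays far below the roughly 2 ** 1024 limit of a float.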

@showell I changed the code accordingly. The optimization can be added later if needed.

timabbott · 2020-02-25T19:32:01Z

Can you provide terminal output showing you've tested the error handling behaves correctly?

orientor · 2020-02-27T01:59:40Z

I have tested the cases where a retry is required. In the original code, a retry happens in only two cases:

When we receive a 50x status code.
When we have connected to the server before but cannot connect now.

Hence, in all other cases (where no retry happens), my code behaves the same as the original code. I tested this: an error is raised in those cases, so changing the loop doesn't affect them.

For the two retry cases, the output looks like the one shown in the image.
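The two retry conditions listed above can be sketched as a single predicate. This is a hypothetical helper, not how the zulip client is structured (the original code checks these conditions inline); it assumes "50x" means any 5xx status and that the caller tracks whether an earlier connection succeeded.

```python
from typing import Optional


def should_retry(status_code: Optional[int], connected_before: bool,
                 connection_error: bool) -> bool:
    """Hypothetical predicate mirroring the two retry cases described above."""
    # Case 1: the server answered, but with a 5xx status.
    if status_code is not None and 500 <= status_code < 600:
        return True
    # Case 2: we reached the server earlier, but this attempt could not connect.
    if connection_error and connected_before:
        return True
    # Every other outcome is reported as an error without retrying.
    return False
```

For example, should_retry(502, False, False) is True, while a 404 or a first-ever connection failure is not retried.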

orientor · 2020-03-02T21:05:48Z

I also checked that it behaves correctly under normal conditions.

orientor · 2020-03-02T21:10:46Z

Some of the other errors:

orientor · 2020-03-02T21:13:19Z

I have thoroughly checked most of the cases, and the code works fine in all of them.

timabbott · 2020-03-04T21:12:06Z

@orientor can you rework your commits here to follow our commit style? I think what makes sense are:

A first commit that adjusts the behavior of RandomExponentialBackoff to add a delay_cap, with a default value of 90 (10 is too short).
A second commit that does the delay_base refactor (nonfunctional).
A third commit that changes the randomization approach in RandomExponentialBackoff, which we might choose to skip.
A next commit that migrates the error_retry logic to use RandomExponentialBackoff.
Etc.

That approach will be reviewable, bisectable, and mergeable.

orientor · 2020-03-04T21:46:43Z

@timabbott
I have reworked my changes into two commits.
In the first commit I am improving the Random Exponential Backoff class.
In the second commit I am updating the do_api_query method to use Random Exponential Backoff.

Next, I'm going to split it into four commits.

orientor · 2020-03-04T23:02:11Z

@timabbott I removed delay_base completely. It was set to 1/2, so it only reduced the power of 2 by one (shifting the schedule by a single iteration), and I don't think it was needed.
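The arithmetic behind dropping delay_base: with $\text{delay\_base} = \tfrac{1}{2}$,

$$
\tfrac{1}{2} \cdot 2^{k} = 2^{k-1},
\qquad
\min\!\left(\text{delay\_cap},\ \tfrac{1}{2} \cdot 2^{k}\right) = \min\!\left(\text{delay\_cap},\ 2^{k-1}\right),
$$

so the capped schedule is the base-1 schedule shifted by one failure, and both settle at the same delay_cap.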

Reworked changes into 3 commits.

Added delay_cap as a class variable to the CountingBackoff class, the superclass of RandomExponentialBackoff, and changed the default delay_cap to 90.
Changed the algorithm for Random Exponential Backoff.
Changed do_api_query to use Random Exponential Backoff.
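A minimal sketch of the class layout described above, assuming full jitter capped at delay_cap (the scheme from the first diff in this thread). The names keep_going(), fail(), succeed(), number_of_retries, and delay_cap appear in this thread; the maximum_retries default, the injectable sleep parameter, and next_delay() are hypothetical, and this is not the zulip package's actual implementation.

```python
import random
import time
from typing import Callable


class CountingBackoff:
    """Counts consecutive failures and decides whether to keep retrying."""

    def __init__(self, maximum_retries: int = 10, delay_cap: float = 90.0) -> None:
        self.number_of_retries = 0
        self.maximum_retries = maximum_retries
        self.delay_cap = delay_cap

    def keep_going(self) -> bool:
        return self.number_of_retries < self.maximum_retries

    def succeed(self) -> None:
        self.number_of_retries = 0

    def fail(self) -> None:
        self.number_of_retries = min(self.number_of_retries + 1, self.maximum_retries)


class RandomExponentialBackoff(CountingBackoff):
    """After each failure, sleeps a random delay capped at delay_cap."""

    def __init__(self, maximum_retries: int = 10, delay_cap: float = 90.0,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        super().__init__(maximum_retries, delay_cap)
        self._sleep = sleep  # injectable so tests can record delays instead of sleeping

    def next_delay(self) -> float:
        # Uniform in [0, min(delay_cap, 2 ** number_of_retries)], doubling step
        # by step so the exponent stops growing once the cap is reached.
        ceiling = min(1.0, self.delay_cap)
        for _ in range(self.number_of_retries):
            ceiling *= 2
            if ceiling >= self.delay_cap:
                return random.uniform(0, self.delay_cap)
        return random.uniform(0, ceiling)

    def fail(self) -> None:
        super().fail()
        self._sleep(self.next_delay())
```

Keeping delay_cap on the superclass lets other CountingBackoff subclasses reuse it, while only RandomExponentialBackoff decides how the delay grows.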

timabbott · 2020-03-05T00:43:08Z

The first commit adds the delay_cap field but doesn't have it do anything. That isn't a coherent commit; it would just be confusing if we were to merge only that.

Zulip's development model is built around every commit in every PR being incrementally mergeable, which is important for bisecting as well as for efficient code review.

Can you fix the first commit to fully implement delay_cap?

orientor · 2020-03-05T16:26:53Z

@timabbott Improved the commits.
The first commit now adds delay_cap properly, with Random Exponential Backoff actually using delay_cap.
The second commit improves Random Exponential Backoff algorithm.
The third commit adds Random Exponential Backoff to do_api_query method.

timabbott · 2020-03-05T21:40:52Z

zulip/zulip/__init__.py

@@ -526,9 +523,6 @@ def error_retry(error_string):
                    sys.stdout.write(".")
                sys.stdout.flush()
            query_state["request"]["dont_block"] = json.dumps(True)
-            time.sleep(1)
-            query_state["failures"] += 1
-            return True


Why not just have this function do the backoff.fail() and return backoff.keep_going()? It would deduplicate code.

This is a really nice idea. I implemented it along with one other change, and now my do_api_query commit is much more readable and has minimal changes.
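The suggested shape of error_retry can be sketched as follows. Only the fail()/keep_going() calls come from the review comment; SimpleBackoff, do_query, the send callable, the log parameter, and the give-up return value are hypothetical stand-ins so the example runs on its own.

```python
from typing import Callable, Optional


class SimpleBackoff:
    """Hypothetical stand-in exposing the fail()/keep_going() interface."""

    def __init__(self, maximum_retries: int = 10) -> None:
        self.failures = 0
        self.maximum_retries = maximum_retries

    def fail(self) -> None:
        self.failures += 1  # a real backoff would also sleep here

    def keep_going(self) -> bool:
        return self.failures < self.maximum_retries


def do_query(send: Callable[[], Optional[dict]], backoff: SimpleBackoff,
             log: Callable[[str], None] = print) -> dict:
    def error_retry(error_string: str) -> bool:
        # All retry bookkeeping lives in the backoff object: record the
        # failure, then report whether another attempt is allowed.
        log(error_string)
        backoff.fail()
        return backoff.keep_going()

    while True:
        result = send()  # None stands in for a retryable failure
        if result is not None:
            return result
        if not error_retry("server unavailable"):
            return {"result": "error", "msg": "Giving up after repeated failures"}
```

Because error_retry only delegates, the counting and sleeping policy can change inside the backoff class without touching the query loop.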

timabbott · 2020-03-05T21:41:15Z

zulip/zulip/__init__.py

@@ -611,6 +610,7 @@ def end_error_retry(succeeded):
            end_error_retry(False)
            return {'msg': "Unexpected error from the server", "result": "http-error",
                    "status_code": res.status_code}
+        return {'msg': "Unexpected error from the server", "result": "unexpected-error"}


Why do we need to add this?

Removed after previous changes.

timabbott · 2020-03-05T21:42:28Z

I merged the first commit and posted a couple comments. I'll also note that this doesn't replace the time.sleep(1) in call_on_each_event, nor does it add a change to detect "invalid authentication" failures and not retry those, which I think we should do.

orientor · 2020-03-06T19:01:28Z

@timabbott I changed the code to be more readable and to have minimal changes. I was not aware of call_on_each_event; I will make the necessary commits soon. Also, can you shed some light on invalid authentication? As far as I know, the only way to get invalid authentication is a wrong API key. Are there any other ways? I will make the changes accordingly.

Fixes zulip#537.

orientor · 2020-03-15T11:18:42Z

Added exponential backoff to call_on_each_event. However, the function was originally intended to keep retrying to connect to the server until it succeeds, so it's an infinite loop. @timabbott @showell Is that intended, or should I add a maximum number of retries?
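One way to support both answers to this question is an optional retry limit that defaults to the current infinite behavior. This is a hypothetical sketch, not the zulip client's call_on_each_event: the function name, the get_events/callback signatures, and "None means a failed poll" are all assumptions. The backoff counter resets after every successful poll, so the limit applies only to consecutive failures.

```python
import random
import time
from typing import Callable, List, Optional


def call_on_each_event_sketch(
    get_events: Callable[[], Optional[List[dict]]],
    callback: Callable[[dict], None],
    max_retries: Optional[int] = None,
    delay_cap: float = 90.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll forever; back off with capped random delays after failures.

    max_retries=None keeps retrying indefinitely; an int makes the loop
    return after that many consecutive failures.
    """
    failures = 0
    while True:
        events = get_events()
        if events is None:  # hypothetical signal for a failed poll
            failures += 1
            if max_retries is not None and failures > max_retries:
                return
            # min(failures, 30) keeps the float exponent small; 2 ** 30 is
            # already far above any reasonable delay_cap.
            sleep(random.uniform(0, min(delay_cap, 2.0 ** min(failures, 30))))
            continue
        failures = 0  # a successful poll resets the backoff
        for event in events:
            callback(event)
```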

zulipbot · 2020-04-19T03:36:36Z

Heads up @orientor, we just merged some commits that conflict with the changes you made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the upstream/master branch and resolve your pull request's merge conflicts accordingly.

zulipbot added the size: S label Feb 23, 2020

showell reviewed Feb 23, 2020


orientor changed the title ~~API: Added random exponential backoff to do_api_query~~ API: Add random exponential backoff to do_api_query. Feb 24, 2020

orientor force-pushed the Exponential branch from a0312fe to 541df63 February 24, 2020 01:01

zulipbot added size: M and removed size: S labels Feb 25, 2020

showell reviewed Feb 25, 2020


orientor force-pushed the Exponential branch from 3064c35 to 4724c3c February 25, 2020 14:59

orientor force-pushed the Exponential branch 3 times, most recently from 91cd715 to 967487a March 1, 2020 09:24

orientor changed the title ~~API: Add random exponential backoff to do_api_query.~~ api: Add random exponential backoff to do_api_query. Mar 1, 2020

orientor force-pushed the Exponential branch from 967487a to 59ac76b March 4, 2020 21:42

orientor force-pushed the Exponential branch from 59ac76b to ee74fc9 March 4, 2020 22:55

orientor force-pushed the Exponential branch from ee74fc9 to b01d523 March 5, 2020 04:23

zulipbot added the has conflicts label Mar 5, 2020

timabbott reviewed Mar 5, 2020


zulipbot removed the has conflicts label Mar 6, 2020

orientor force-pushed the Exponential branch 2 times, most recently from 4fc8183 to 62f6f7c March 6, 2020 18:51

zulipbot added size: S and removed size: M labels Mar 6, 2020

orientor added 2 commits March 15, 2020 16:11

api: Improve and optimize Random Exponential Backoff algorithm.

5a28a6b

Fixes zulip#537.

api: Use Random Exponential Backoff in do_api_query method.

fd07c4e

Fixes zulip#537.

orientor force-pushed the Exponential branch from 62f6f7c to fd07c4e March 15, 2020 10:43

api: Use Random Exponential Backoff in call_on_each_event method.

d333376

zulipbot added the has conflicts label Apr 19, 2020
