Consider consecutive estimator internal errors #3178
Conversation
/// Defines how many consecutive errors are allowed before the cache starts
/// returning the error to the user without trying to fetch the price from the
/// estimator.
const ACCUMULATIVE_ERRORS_THRESHOLD: u32 = 5;
Wouldn't it be simpler and more robust to define a custom NoLiquidity response on the solver's side? Then if we receive NoLiquidity, there is no liquidity, and any other answer is some kind of error, like the server being down, a rate limit, etc.
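To illustrate the suggested split, a minimal sketch (these names are hypothetical, not actual types from this repo):

// Hypothetical sketch of the suggested distinction.
enum EstimateOutcome {
    /// A definitive answer from the solver: no route exists for this pair.
    /// Safe to cache and re-query at the normal pace.
    NoLiquidity,
    /// Anything else is a transient failure (server down, rate limit, ...)
    /// and hints that the solver itself may be unhealthy.
    TransientError(String),
}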
We already have this error: https://github.com/cowprotocol/services/pull/3178/files#diff-50d12b254f11a969c7dc675e6f3181c39a3785e541dac0963f869d2d3ec9bacaR231
Not sure I am following the idea. The purpose of this PR is to detect whether the solver is offline.
Sorry, I didn't explain myself properly. The idea would be to have known errors, like NoLiquidity, for which we could keep requesting the prices at a "normal" pace, and then a set of known (e.g., rate limit) and unknown (e.g., server down) errors for which we would:
- request at a greater time interval, so it is not so spammy
- apply an exponential backoff with a maximum cap, so the spam is reduced significantly

That way we would still keep requesting the token in case the price becomes available, just with a significantly reduced load.
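As a rough sketch of that backoff (a hypothetical helper, not code from this PR):

use std::time::Duration;

/// Doubles the previous polling interval on every consecutive error,
/// capped at a configurable maximum.
fn next_interval(previous: Duration, max_cap: Duration) -> Duration {
    (previous * 2).min(max_cap)
}

With, say, a 2-minute base interval and a 30-minute cap, the retry intervals would be 2, 4, 8, 16, 30, 30, ... minutes.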
If I got it correctly, that means a custom cache entry expiration time depending on the cached value type. Will think about that. At first glance, it requires many more changes.
that means a custom cache entry expiration time depending on the cached value type
We know time.now(), and we can derive what the previous expiration was from requested_at (I may be wrong here; we may need a new timestamp field). Then we apply exponential backoff to the previous timeout, with a (preferably configurable) maximum cap value. Pretty simple, maybe I am missing something 🤔
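A sketch of that derivation (the field names are assumptions about the cache entry; the real struct may differ):

use std::time::{Duration, Instant};

struct CacheEntry {
    requested_at: Instant, // when the cached value was fetched
    expires_at: Instant,   // when it stops being served
}

/// Recovers the previous TTL from the entry's timestamps, then doubles it
/// up to a configurable cap to get the next expiration.
fn backed_off_expiration(entry: &CacheEntry, max_cap: Duration) -> Instant {
    let previous_ttl = entry.expires_at.duration_since(entry.requested_at);
    Instant::now() + (previous_ttl * 2).min(max_cap)
}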
Looks correct to me.
Nice test. 👍
A bit unfortunate that this component accumulates complexity. Hopefully this can be addressed when we build the separate pricing service.
}
None
}
}
}

fn get_ready_to_use_cached_price(
This function doesn't feel like it's carrying its own weight. I think inlining it in the 2 places where it's used is fine and similarly expressive.
I just wanted to hide the filtering details there and avoid using a non-filtered cache anywhere else.
Description
From the original issue:
This PR addresses it by introducing a counter of consecutive internal errors from the estimator. The threshold is hardcoded to 5 (still subject to consideration). While the threshold has not been reached, requests keep being sent to the solver. Once it is reached, the cache serves the cached value until it expires via the existing timeout or until another result gets cached. Received recoverable errors do not reset the consecutive-error counter.
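A minimal sketch of the mechanism (names other than ACCUMULATIVE_ERRORS_THRESHOLD are hypothetical; the PR's actual types differ):

const ACCUMULATIVE_ERRORS_THRESHOLD: u32 = 5;

struct CachedEstimate {
    consecutive_errors: u32,
}

impl CachedEstimate {
    /// Keep querying the estimator until the threshold is reached; after
    /// that, serve the cached value until it expires or is replaced.
    fn should_query_estimator(&self) -> bool {
        self.consecutive_errors < ACCUMULATIVE_ERRORS_THRESHOLD
    }

    /// Internal estimator errors bump the counter; recoverable errors
    /// leave it untouched (they do not reset it).
    fn record_internal_error(&mut self) {
        self.consecutive_errors += 1;
    }

    /// Only a successful estimate resets the counter.
    fn record_success(&mut self) {
        self.consecutive_errors = 0;
    }
}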
How to test
Added a unit test.
Related Issues
Fixes #3159