
Consider consecutive estimator internal errors #3178

Merged
merged 12 commits into from
Jan 2, 2025

Conversation

@squadgazzz (Contributor) commented Dec 20, 2024

Description

From the original issue:

Native prices are an essential part of the auction. That's why the autopilot needs to have a valid native price for all tokens (sold and bought) in an auction. The NativePriceCache has a background task that continuously updates missing native prices.
It obviously caches valid native prices, but it also caches responses indicating that no solver has enough liquidity for the requested token. This is done to avoid fetching native prices over and over for tokens that no solver supports.
However, when a solver reports a different error (the best example being rate-limiting errors), we assume it is just intermittent and that re-requesting the native price might work in the future.

The problem is that this logic does not consider the case where a solver always returns an error for a given token while no other solver is able to produce a good result.
This happened recently when one solver kept returning 404 errors because it had been shut down on staging. Since no solver was able to produce a price, and the logic assumed these errors were intermittent, we queried the native price for a few tokens over and over again, which resulted in far more requests than usual.

This PR addresses the issue by introducing a counter of consecutive estimator internal errors. The consecutive error threshold is hardcoded at 5 (still subject to consideration). Until the threshold is reached, requests continue to be sent to the solver. Once it is reached, the cache serves the cached value until it expires via the existing timeout or until another result gets cached. Receiving recoverable errors does not reset the accumulated error counter.
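The gating described above can be sketched roughly as follows. This is a minimal, hypothetical model for illustration: the real cache in `native_price_cache.rs` tracks more state (timestamps, recoverable vs. internal errors), and the `CachedResult`/`CacheEntry` names here are invented.

```rust
/// Defines how many consecutive errors are allowed before the cache starts
/// returning the error to the user without trying to fetch the price from
/// the estimator (value taken from the PR; still subject to consideration).
const ACCUMULATIVE_ERRORS_THRESHOLD: u32 = 5;

#[derive(Clone, Debug, PartialEq)]
enum CachedResult {
    /// A valid native price was fetched.
    Price(f64),
    /// The estimator returned an internal error this many times in a row.
    Error { consecutive_errors: u32 },
}

struct CacheEntry {
    result: CachedResult,
}

impl CacheEntry {
    /// Records a fetch outcome. A success overwrites any error streak;
    /// another internal error extends the streak.
    fn update(&mut self, outcome: Result<f64, ()>) {
        // Read the current streak length before overwriting the entry.
        let streak = match &self.result {
            CachedResult::Error { consecutive_errors } => *consecutive_errors,
            _ => 0,
        };
        self.result = match outcome {
            Ok(price) => CachedResult::Price(price),
            Err(()) => CachedResult::Error {
                consecutive_errors: streak.saturating_add(1),
            },
        };
    }

    /// While the threshold is not reached, requests keep being sent to the
    /// estimator; afterwards the cached error is served until the entry
    /// expires via the regular timeout or another result gets cached.
    fn should_query_estimator(&self) -> bool {
        match self.result {
            CachedResult::Error { consecutive_errors } => {
                consecutive_errors < ACCUMULATIVE_ERRORS_THRESHOLD
            }
            CachedResult::Price(_) => true,
        }
    }
}
```

With this model, a token whose estimator keeps failing stops being queried after the fifth consecutive internal error, while a single success clears the streak.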

How to test

Added a unit test.

Related Issues

Fixes #3159

@squadgazzz squadgazzz marked this pull request as ready for review December 24, 2024 20:10
@squadgazzz squadgazzz requested a review from a team as a code owner December 24, 2024 20:10
/// Defines how many consecutive errors are allowed before the cache starts
/// returning the error to the user without trying to fetch the price from the
/// estimator.
const ACCUMULATIVE_ERRORS_THRESHOLD: u32 = 5;
@m-lord-renkse (Contributor) commented Dec 30, 2024

Wouldn't it be simpler and more robust to define a custom NoLiquidity response on the solver side? Then, if we receive NoLiquidity, there is no liquidity, and any other answer is some kind of error, like the server being down, rate limiting, etc.

@squadgazzz (Contributor, Author) commented:
We already have this error: https://github.com/cowprotocol/services/pull/3178/files#diff-50d12b254f11a969c7dc675e6f3181c39a3785e541dac0963f869d2d3ec9bacaR231
Not sure I am following the idea. The purpose of this PR is to detect whether the solver is offline.

@m-lord-renkse (Contributor) commented:
Sorry, I didn't explain myself properly. The idea would be to have known errors, like NoLiquidity, for which we could keep requesting the prices at a "normal" pace, and then a set of known (e.g., rate limiting) and unknown (e.g., server down) errors for which we would:

  • request at a greater time interval, so it is not so spammy
  • have an exponential backoff method with a maximum cap, so the spam is reduced significantly,
    but at the same time we keep requesting the token in case the price becomes available; just the load is significantly reduced.
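The backoff suggested in these bullets could look roughly like this. This is a hypothetical helper, not code from the PR; `base` and `cap` stand in for the configurable refresh interval and maximum cap mentioned above:

```rust
use std::time::Duration;

/// Returns how long to wait before re-requesting a price that keeps
/// failing: the base refresh interval is doubled per consecutive error,
/// capped so the token still gets retried occasionally.
fn backoff_interval(base: Duration, consecutive_errors: u32, cap: Duration) -> Duration {
    // Clamp the exponent so the multiplier cannot overflow u32.
    let factor = 2u32.saturating_pow(consecutive_errors.min(16));
    base.saturating_mul(factor).min(cap)
}
```

For example, with a 1-second base and a 5-minute cap, the interval grows 1s, 2s, 4s, 8s, ... and then plateaus at the cap, so persistently failing tokens generate only a trickle of requests.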

@squadgazzz (Contributor, Author) commented:

If I got it correctly, that means a custom cache entry expiration time depending on the cached value type. I will think about that. At first glance, it requires many more changes.

@m-lord-renkse (Contributor) commented Dec 30, 2024

that means a custom cache entry expiration time depending on the cached value type

We know time.now(), and we can derive how long the previous expiration was from requested_at (I may be wrong here, and we may need a new timestamp field). Then apply exponential backoff to the previous timeout with a (preferably configurable) maximum cap value. Pretty simple, maybe I am missing something 🤔

@MartinquaXD (Contributor) left a comment

Looks correct to me.
Nice test. 👍
A bit unfortunate that this component accumulates complexity. Hopefully this can be addressed when we build the separate pricing service.

crates/shared/src/price_estimation/native_price_cache.rs Outdated Show resolved Hide resolved
crates/shared/src/price_estimation/native_price_cache.rs Outdated Show resolved Hide resolved
}
None
}
}
}

fn get_ready_to_use_cached_price(
Contributor:

This function doesn't feel like it's carrying its own weight. I think inlining it in the 2 places where it's used is fine and similarly expressive.

@squadgazzz (Contributor, Author):

I just wanted to hide the filtering details there and avoid using a non-filtered cache anywhere else.

crates/shared/src/price_estimation/native_price_cache.rs Outdated Show resolved Hide resolved
@squadgazzz squadgazzz enabled auto-merge (squash) January 2, 2025 11:34
@squadgazzz squadgazzz merged commit 1294738 into main Jan 2, 2025
11 checks passed
@squadgazzz squadgazzz deleted the fix/3159 branch January 2, 2025 11:39
@github-actions github-actions bot locked and limited conversation to collaborators Jan 2, 2025
Successfully merging this pull request may close these issues.

chore: Cache persistent errors when computing native prices
3 participants