
Metrics-based bad token detector #3172

Merged: 39 commits merged into main on Dec 23, 2024

Conversation

@squadgazzz (Contributor) commented Dec 18, 2024:

Description

Follow-up to #3156. This PR introduces an in-memory, ratio-based bad token detection strategy that complements the existing heuristics. Instead of relying on consecutive failures, it keeps track of both successful and failed settlement attempts for each token. The logic is as follows:

  1. When a settlement encoding fails, every token involved in that attempt has its statistics updated: both total attempts and failed attempts are incremented.
  2. Otherwise, every token involved has its total attempts incremented but not its failures.
  3. Before marking a token as unsupported, the detector requires at least 20 recorded attempts. Once this threshold is met, if the token's failure ratio (fails / attempts) is at least 90%, it is considered unsupported.

This approach is more resilient than just counting consecutive failures. A highly utilized and generally reliable token that occasionally appears in failing trades with problematic tokens won't be prematurely flagged as unsupported because its overall success ratio remains high.

Due to the nature of the implementation, all statistics are discarded on every restart. Implementing a persistence layer might make sense in the future, but bad-token problems are usually transient.
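For illustration, here is a minimal sketch of the strategy described above, assuming a simplified stand-in for eth::TokenAddress and invented names (MetricsDetector, TokenStatistics); this is not the PR's actual code:

```rust
use std::collections::HashMap;

// Simplified stand-in for eth::TokenAddress (hypothetical).
type TokenAddress = [u8; 20];

// Thresholds from the PR description.
const MIN_MEASUREMENTS: u64 = 20;
const FAILURE_RATIO: f64 = 0.9;

#[derive(Default)]
struct TokenStatistics {
    attempts: u64,
    fails: u64,
}

#[derive(Default)]
struct MetricsDetector {
    stats: HashMap<TokenAddress, TokenStatistics>,
}

impl MetricsDetector {
    /// Records the outcome of one settlement encoding for every involved token:
    /// attempts always increase, fails only when the encoding failed.
    fn update_tokens(&mut self, tokens: &[TokenAddress], failure: bool) {
        for token in tokens {
            let stats = self.stats.entry(*token).or_default();
            stats.attempts += 1;
            stats.fails += u64::from(failure);
        }
    }

    /// A token counts as unsupported once it has at least 20 recorded
    /// attempts and a failure ratio (fails / attempts) of at least 90%.
    fn is_unsupported(&self, token: &TokenAddress) -> bool {
        self.stats
            .get(token)
            .map(|s| {
                s.attempts >= MIN_MEASUREMENTS
                    && (s.fails as f64) / (s.attempts as f64) >= FAILURE_RATIO
            })
            .unwrap_or(false)
    }
}
```

A caller would invoke update_tokens with every token involved in a settlement attempt after each encoding, and consult is_unsupported before including a token in future solutions.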

How to test

A forked e2e test for the simulation-based detector is required first; it is expected to be implemented in a separate PR.

@squadgazzz squadgazzz marked this pull request as ready for review December 18, 2024 14:26
@squadgazzz squadgazzz requested a review from a team as a code owner December 18, 2024 14:26
@squadgazzz squadgazzz marked this pull request as draft December 18, 2024 15:39
@squadgazzz squadgazzz marked this pull request as ready for review December 18, 2024 17:29
Comment on lines 22 to 24:

```rust
/// Defines the threshold for the number of consecutive unsupported
/// quality detections before a token is considered unsupported.
const UNSUPPORTED_THRESHOLD: u8 = 5;
```
Contributor:

I think we need to be less blunt with this heuristic. Say there is only one market order USDC <> SHITCOIN; the solver will try to solve it over and over, and this detector will end up marking USDC as unsupported too.
A better approach might be to count how often a token could be encoded vs. how often it couldn't. That way the heuristic shouldn't flag USDC as unsupported, because it will be involved in a ton of trades with "normal" tokens, which increases its "goodness" ratio.
I believe the simplest implementation of this strategy needs just 2 variables:

  • the minimum number of measurements required per token
  • the ratio of failing encodings a token must have

E.g. at least 20 measurements and at least 90% of good measurements.

@squadgazzz (Contributor Author):

True, thanks.

@squadgazzz (Contributor Author):

> at least 90% of good measurements

Or 90% of failures?

@mstrug (Contributor) commented Dec 19, 2024:

> the detector requires at least 20 recorded attempts

Is it an arbitrarily chosen number, or is there a reason for the value of 20 attempts?

```rust
/// `failure` indicates whether the settlement was successful or not.
pub fn update_tokens(
    &self,
    token_pairs: Vec<(eth::TokenAddress, eth::TokenAddress)>,
```
Contributor:

The Rust convention is to pass slice references instead of vectors, so I propose changing this to &[(eth::TokenAddress, eth::TokenAddress)].

@squadgazzz (Contributor Author):

I wanted to avoid redundant cloning when calling the Map::entry function, which requires an owned key. If that isn't reason enough, we can adhere to the idiomatic approach, even though it unnecessarily forces copying values.

Contributor:

In the current solution, with Vec<> as the function argument, a particular token is still copied in the call to Map::entry. It is copied because ownership of a vector item cannot be taken from the Vec and moved to a function or variable: a vector stores its data in a contiguous heap buffer, and passing any vector item by value to a function requires copying that item from the heap to the stack.
So changing the argument to a slice will not introduce any additional copy.

@squadgazzz (Contributor Author) commented Dec 20, 2024:

> a particular token is still copied in the call to Map::entry. It is copied because ownership of a vector item cannot be taken from the Vec and moved to a function or variable

Wait, what? Ownership of vector items can be moved from a Vec to a function or variable when the Vec is consumed (e.g., via .into_iter()), and moving items does not involve copying unless explicitly cloned. Map::entry works with owned values, so no copying occurs when items are moved directly.

The suggested idiomatic approach works with references, so when passing a value to Map::entry, it needs to be cloned/copied first. So I assume the redundant copy still exists.

Contributor:

Yes, you are right: into_iter() can move all items by iterating directly over the vector's data buffer. I had in mind the case of moving a single item out of the vector (using the index operator, for instance), which is not possible.

I played a bit with sample code to check how pass-by-value and pass-by-reference look after compilation, to see in which cases memcpy is used. Without optimizations, function parameters are passed in registers (if they fit) or copied with memcpy (if they are too large for registers), and this holds for both pass-by-value and pass-by-reference; the code for both functions looks similar. With release-level optimizations enabled there is no memcpy for either the smaller or the larger data size, and direct memory access is used for iteration. The code for pass-by-value and pass-by-reference also looks very similar.
I think the above holds for types that implement the Copy trait; for other types (String, for instance) it probably looks different.
If you want to look at the asm, here is the code I was using: https://godbolt.org/z/ejGbYjcP4

Summarizing: in my opinion we should stick with the idiomatic approach, and the compiler will produce the optimal final result without any memory copy in this case.
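For illustration, a minimal sketch of the trade-off discussed in this thread, using a simplified stand-in for eth::TokenAddress (not the PR's actual code):

```rust
use std::collections::HashMap;

// Simplified stand-in for eth::TokenAddress (hypothetical).
type TokenAddress = [u8; 20];

// Consuming the Vec moves each token into the map; `HashMap::entry`
// takes an owned key, so no clone is needed.
fn update_owned(map: &mut HashMap<TokenAddress, u64>, tokens: Vec<(TokenAddress, TokenAddress)>) {
    for (a, b) in tokens {
        *map.entry(a).or_default() += 1;
        *map.entry(b).or_default() += 1;
    }
}

// Taking a slice only yields references, so each key must be copied (`*a`)
// before it can be handed to `entry`.
fn update_borrowed(map: &mut HashMap<TokenAddress, u64>, tokens: &[(TokenAddress, TokenAddress)]) {
    for (a, b) in tokens {
        *map.entry(*a).or_default() += 1;
        *map.entry(*b).or_default() += 1;
    }
}
```

Since an address is a small Copy type, the extra copy in the slice version is trivial, which lines up with the conclusion above that the idiomatic signature costs nothing here.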

@squadgazzz (Contributor Author):

> Is it an arbitrarily chosen number, or is there a reason for the value of 20 attempts?

Followed this suggestion: #3172 (comment)

Base automatically changed from kill-bad-tokens-1 to main December 20, 2024 15:03
```
# Conflicts:
#	crates/driver/src/domain/competition/bad_tokens/cache.rs
#	crates/driver/src/domain/competition/bad_tokens/metrics.rs
#	crates/driver/src/domain/competition/bad_tokens/mod.rs
#	crates/driver/src/domain/competition/bad_tokens/simulation.rs
#	crates/driver/src/domain/competition/mod.rs
#	crates/driver/src/infra/api/mod.rs
#	crates/driver/src/infra/config/file/load.rs
#	crates/driver/src/infra/config/file/mod.rs
#	crates/driver/src/infra/solver/mod.rs
```
@sunce86 (Contributor) left a comment:

Code LG.

Marking all tokens in a solution as bad when perhaps only one trade was bad can be problematic, but I assume this is how it worked before?

@squadgazzz (Contributor Author):

> Marking all tokens in a solution as bad when perhaps only one trade was bad can be problematic, but I assume this is how it worked before?

This is the first implementation of this logic. Previously, a Grafana metric was used to analyze the data manually.

@squadgazzz squadgazzz merged commit 7e52015 into main Dec 23, 2024
11 checks passed
@squadgazzz squadgazzz deleted the bad-token/metrics branch December 23, 2024 12:26
@github-actions github-actions bot locked and limited conversation to collaborators Dec 23, 2024