
Metrics-based bad token detector #3172

Merged: 39 commits merged into main on Dec 23, 2024

Conversation

@squadgazzz (Contributor) commented Dec 18, 2024:

Description

Follow-up to #3156. This PR introduces an in-memory, ratio-based bad token detection strategy that complements the existing heuristics. Instead of relying on consecutive failures, it keeps track of both successful and failed settlement attempts for each token. The logic is as follows:

  1. When a settlement encoding fails, every token involved in that attempt has its statistics updated: both total attempts and failed attempts are incremented.
  2. Otherwise, every token involved has its total attempts incremented but not its failures.
  3. Before marking a token as unsupported, the detector requires at least 20 recorded attempts. Once this threshold is met, if the token's failure ratio (fails / attempts) is at least 90%, it is considered unsupported.

This approach is more resilient than just counting consecutive failures. A highly utilized and generally reliable token that occasionally appears in failing trades with problematic tokens won't be prematurely flagged as unsupported because its overall success ratio remains high.

Due to the nature of the implementation, all statistics are discarded on every restart. Implementing a persistence layer might make sense in the future, but bad-token problems are usually transient.
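For illustration, here is a minimal sketch of the strategy described above, assuming a simplified stand-in for eth::TokenAddress and invented names (MetricsDetector, TokenStatistics); this is not the PR's actual code:

```rust
use std::collections::HashMap;

// Simplified stand-in for eth::TokenAddress (hypothetical).
type TokenAddress = [u8; 20];

// Thresholds from the PR description.
const MIN_MEASUREMENTS: u64 = 20;
const FAILURE_RATIO: f64 = 0.9;

#[derive(Default)]
struct TokenStatistics {
    attempts: u64,
    fails: u64,
}

#[derive(Default)]
struct MetricsDetector {
    stats: HashMap<TokenAddress, TokenStatistics>,
}

impl MetricsDetector {
    /// Records the outcome of one settlement encoding for every involved token:
    /// attempts always increase, fails only when the encoding failed.
    fn update_tokens(&mut self, tokens: &[TokenAddress], failure: bool) {
        for token in tokens {
            let stats = self.stats.entry(*token).or_default();
            stats.attempts += 1;
            stats.fails += u64::from(failure);
        }
    }

    /// A token counts as unsupported once it has at least 20 recorded
    /// attempts and a failure ratio (fails / attempts) of at least 90%.
    fn is_unsupported(&self, token: &TokenAddress) -> bool {
        self.stats
            .get(token)
            .map(|s| {
                s.attempts >= MIN_MEASUREMENTS
                    && (s.fails as f64) / (s.attempts as f64) >= FAILURE_RATIO
            })
            .unwrap_or(false)
    }
}
```

A caller would invoke update_tokens with every token involved in a settlement attempt after each encoding, and consult is_unsupported before including a token in future solutions.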

How to test

A forked e2e test for the simulation-based detector is required first; it is expected to be implemented in a separate PR.

@squadgazzz squadgazzz marked this pull request as ready for review December 18, 2024 14:26
@squadgazzz squadgazzz requested a review from a team as a code owner December 18, 2024 14:26
@squadgazzz squadgazzz marked this pull request as draft December 18, 2024 15:39
@squadgazzz squadgazzz marked this pull request as ready for review December 18, 2024 17:29
Comment on lines 22 to 24:

```rust
/// Defines the threshold for the number of consecutive unsupported
/// quality detections before a token is considered unsupported.
const UNSUPPORTED_THRESHOLD: u8 = 5;
```
Contributor:

I think we need to be less blunt with this heuristic. Say there is only one market order USDC <> SHITCOIN; the solver will try to solve it over and over, and this detector will end up marking USDC as unsupported too.
A better approach might be to count how often a token could be encoded vs. how often it couldn't. That way the heuristic shouldn't flag USDC as unsupported, because it will be involved in a ton of trades with "normal" tokens, which increases its "goodness" ratio.
I believe the simplest implementation of this strategy needs just 2 variables:

  • the minimum number of measurements required per token
  • the ratio of failing encodings a token must have

E.g. at least 20 measurements and at least 90% of good measurements.

@squadgazzz (Contributor Author):

True, thanks.

@squadgazzz (Contributor Author):

> at least 90% of good measurements

Or 90% of failures?

@mstrug (Contributor) commented Dec 19, 2024:

> the detector requires at least 20 recorded attempts

Is it an arbitrarily chosen number, or is there a reason for the value of 20 attempts?

```rust
/// `failure` indicates whether the settlement was successful or not.
pub fn update_tokens(
    &self,
    token_pairs: Vec<(eth::TokenAddress, eth::TokenAddress)>,
```
Contributor:

The Rust convention is to pass slice references instead of vectors, so I propose changing this to &[(eth::TokenAddress, eth::TokenAddress)].

@squadgazzz (Contributor Author):

I wanted to avoid redundant cloning when calling the Map::entry function, which requires an owned key. If that isn't reason enough, we can adhere to the idiomatic approach, even though it unnecessarily forces copying values.

Contributor:

In the current solution, with Vec<> as the function argument, a particular token is still copied in the call to Map::entry. It is copied because ownership of a vector item cannot be taken from the Vec and moved to a function or variable: a vector stores its data in a contiguous heap buffer, and passing any vector item by value to a function requires copying that item from the heap to the stack.
So changing the argument to a slice will not introduce any additional copy.

@squadgazzz (Contributor Author) commented Dec 20, 2024:

> a particular token is still copied in the call to Map::entry. It is copied because ownership of a vector item cannot be taken from the Vec and moved to a function or variable

Wait, what? Ownership of vector items can be moved from a Vec to a function or variable when the Vec is consumed (e.g., via .into_iter()), and moving items does not involve copying unless explicitly cloned. Map::entry works with owned values, so no copying occurs when items are moved directly.

The suggested idiomatic approach works with references, so when passing a value to Map::entry, it needs to be cloned/copied first. So I assume the redundant copy still exists.

Contributor:

Yes, you are right: into_iter() can move all items by iterating directly over the vector's data buffer. I had in mind the case of moving a single item out of the vector (using the index operator, for instance), which is not possible.

I played a bit with sample code to check how pass-by-value and pass-by-reference look after compilation, to see in which cases memcpy is used. Without optimizations, function parameters are passed in registers (if they fit) or copied with memcpy (if they are too large for registers), and this holds for both pass-by-value and pass-by-reference; the code for both functions looks similar. With release-level optimizations enabled there is no memcpy for either the smaller or the larger data size, and direct memory access is used for iteration. The code for pass-by-value and pass-by-reference also looks very similar.
I think the above holds for types that implement the Copy trait; for other types (String, for instance) it probably looks different.
If you want to look at the asm, here is the code I was using: https://godbolt.org/z/ejGbYjcP4

Summarizing: in my opinion we should stick with the idiomatic approach, and the compiler will produce the optimal final result without any memory copy in this case.
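For illustration, a minimal sketch of the trade-off discussed in this thread, using a simplified stand-in for eth::TokenAddress (not the PR's actual code):

```rust
use std::collections::HashMap;

// Simplified stand-in for eth::TokenAddress (hypothetical).
type TokenAddress = [u8; 20];

// Consuming the Vec moves each token into the map; `HashMap::entry`
// takes an owned key, so no clone is needed.
fn update_owned(map: &mut HashMap<TokenAddress, u64>, tokens: Vec<(TokenAddress, TokenAddress)>) {
    for (a, b) in tokens {
        *map.entry(a).or_default() += 1;
        *map.entry(b).or_default() += 1;
    }
}

// Taking a slice only yields references, so each key must be copied (`*a`)
// before it can be handed to `entry`.
fn update_borrowed(map: &mut HashMap<TokenAddress, u64>, tokens: &[(TokenAddress, TokenAddress)]) {
    for (a, b) in tokens {
        *map.entry(*a).or_default() += 1;
        *map.entry(*b).or_default() += 1;
    }
}
```

Since an address is a small Copy type, the extra copy in the slice version is trivial, which lines up with the conclusion above that the idiomatic signature costs nothing here.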

@squadgazzz (Contributor Author):

> Is it an arbitrarily chosen number, or is there a reason for the value of 20 attempts?

Followed this suggestion: #3172 (comment)

Base automatically changed from kill-bad-tokens-1 to main December 20, 2024 15:03
```
# Conflicts:
#	crates/driver/src/domain/competition/bad_tokens/cache.rs
#	crates/driver/src/domain/competition/bad_tokens/metrics.rs
#	crates/driver/src/domain/competition/bad_tokens/mod.rs
#	crates/driver/src/domain/competition/bad_tokens/simulation.rs
#	crates/driver/src/domain/competition/mod.rs
#	crates/driver/src/infra/api/mod.rs
#	crates/driver/src/infra/config/file/load.rs
#	crates/driver/src/infra/config/file/mod.rs
#	crates/driver/src/infra/solver/mod.rs
```
@sunce86 (Contributor) left a comment:

Code LG.

Marking all tokens in a solution as bad when perhaps only one trade was bad can be problematic, but I assume this is how it worked before?

@squadgazzz (Contributor Author):

> Marking all tokens in a solution as bad when perhaps only one trade was bad can be problematic, but I assume this is how it worked before?

This is the first implementation of this logic. Previously, a Grafana metric was used to analyze the data manually.

@squadgazzz squadgazzz merged commit 7e52015 into main Dec 23, 2024
11 checks passed
@squadgazzz squadgazzz deleted the bad-token/metrics branch December 23, 2024 12:26
@github-actions github-actions bot locked and limited conversation to collaborators Dec 23, 2024