The method CTokenListCategory::updateOrderedCommonTokenIds is known to be potentially inefficient. It contains this comment:
// The algorithm used here is technically O(N^3 * log(N)), but has these
// redeeming features:
// 1. It does not allocate any memory.
// 2. It is order O(N * log(N)) in the (likely) case of no change required.
// 3. In the case where the nested loops run many times it will be reducing
// the value of N for subsequent calls, thus making those subsequent
// calls much faster. For example, suppose we start off with 100 ordered
// tokens, and one call to this method runs the outer loop 50 times to
// reduce the ordered set to 50 tokens. Then the next call will start
// with only 50 ordered tokens.
// There's a trade-off here, because reducing the complexity would mean
// changing the data structures, which would add complexity to state
// persistence, or allocating temporary storage, which would increase the
// runtime a lot in the common case where the previously ordered common
// tokens are present in the same order in the new tokens.
Usually we get away with this, but recently we have seen some data sets where the inefficient algorithm causes problems, namely when there are many tokens in each message.
When m_CommonUniqueTokenIds contains more than a certain number of values, we should take the time at the beginning of CTokenListCategory::updateOrderedCommonTokenIds to build a more efficient data structure to work with. For small numbers of tokens the time taken to build a new data structure (in particular the memory allocations) is likely to outweigh the cost of the looping we currently do, so some experimentation will be needed to find the point at which it becomes worthwhile. For large numbers of tokens, however, we should see a performance improvement.
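One way to picture the suggestion is a size-gated switch: below a threshold, keep the existing allocation-free scanning; above it, spend one allocation at the start of the call on a hash-based lookup so that each membership check becomes O(1) expected rather than a linear scan. The sketch below is purely illustrative; the function signature, element types and threshold value are assumptions, not the actual ml-cpp implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical sketch only: names, types and the threshold are placeholders,
// not the real CTokenListCategory code.
constexpr std::size_t HASH_LOOKUP_THRESHOLD{100}; // to be found by experiment

void updateOrderedCommonTokenIdsSketch(const std::vector<std::size_t>& newTokenIds,
                                       std::vector<std::size_t>& commonUniqueTokenIds) {
    if (commonUniqueTokenIds.size() <= HASH_LOOKUP_THRESHOLD) {
        // Small case: stick with allocation-free linear scans, as the current
        // implementation does, because building a hash set would cost more
        // than it saves.
        commonUniqueTokenIds.erase(
            std::remove_if(commonUniqueTokenIds.begin(), commonUniqueTokenIds.end(),
                           [&newTokenIds](std::size_t id) {
                               return std::find(newTokenIds.begin(), newTokenIds.end(), id) ==
                                      newTokenIds.end();
                           }),
            commonUniqueTokenIds.end());
        return;
    }

    // Large case: one pass (and one allocation) to build the lookup, after
    // which each "is this common token still present?" check is O(1) expected.
    std::unordered_set<std::size_t> newTokenSet(newTokenIds.begin(), newTokenIds.end());
    commonUniqueTokenIds.erase(
        std::remove_if(commonUniqueTokenIds.begin(), commonUniqueTokenIds.end(),
                       [&newTokenSet](std::size_t id) { return newTokenSet.count(id) == 0; }),
        commonUniqueTokenIds.end());
}
```

The threshold is exactly the tuning point the issue describes: below it, the allocations for the std::unordered_set outweigh the saving; above it, the cheaper per-check cost dominates.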
Categorization of strings that break down into a huge number of tokens can cause the C++ backend process to choke; see elastic/ml-cpp#2403.
This PR adds a limit filter to the default categorization analyzer, capping the number of tokens passed to the backend at 100.
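For reference, Elasticsearch's limit token filter caps the number of tokens an analyzer emits via its max_token_count parameter. The snippet below only illustrates how such a filter can be combined with the ml_standard tokenizer in a job's categorization_analyzer; the job name, field names and surrounding settings are made up, and this is not the exact default analyzer definition added by this PR.

```
PUT _ml/anomaly_detectors/example-categorization-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "categorization_field_name": "message",
    "categorization_analyzer": {
      "tokenizer": "ml_standard",
      "filter": [
        { "type": "limit", "max_token_count": 100 }
      ]
    },
    "detectors": [
      { "function": "count", "by_field_name": "mlcategory" }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```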
Unfortunately this isn't a panacea for all the issues surrounding categorization of large messages with many tokens, as verification checks on the frontend can also fail when calls to the datafeed _preview API return an excessive amount of data.