
Use of RPCManager with auto-switching over node configs #98

Merged
16 commits merged into dev on Jan 27, 2025

Conversation

@xmariachi (Collaborator) commented Dec 7, 2024

Purpose of Changes and their Description

  • Creation of an RPCManager that controls the RPC endpoints.
  • An RPC (client) is defined by a NodeConfig. The RPCManager holds a number of preconfigured NodeConfigs.
  • Initialization and parameterisation of calls.
  • Compliant with the current separation of responsibilities: usecase keeps node management (previously initialised once and never changed; now integrated as an RPCManager), while client-internal operations fall under lib (i.e. NodeConfig-based operations).
  • Definition of the situations that require a switch of node: for now, error 429 and "Mempool is full", possibly to be expanded.
  • RPCManager defined as an interface, with a testing mock (a hypothetical sketch follows this list).
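For orientation, here is a minimal sketch of what such a manager could look like. The names (NodeConfig, RPCManager, requiresNodeSwitch) and the error checks are assumptions drawn from the bullet points above, not the PR's actual code:

```go
package usecase

import "strings"

// NodeConfig defines a single RPC client endpoint (hypothetical shape).
type NodeConfig struct {
	URL string
	// ... client handles, credentials, etc.
}

// RPCManager holds a set of preconfigured NodeConfigs and switches
// between them when a node-attributable error occurs.
type RPCManager interface {
	// GetCurrentNode returns the NodeConfig currently in use.
	GetCurrentNode() *NodeConfig
	// SwitchToNextNode advances to the next preconfigured node and
	// returns it, or an error if no node is available.
	SwitchToNextNode() (*NodeConfig, error)
}

// requiresNodeSwitch reports whether an error should trigger a node
// switch: per the description above, HTTP 429 and "Mempool is full".
func requiresNodeSwitch(err error) bool {
	if err == nil {
		return false
	}
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "429") || strings.Contains(msg, "mempool is full")
}
```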

Also:

  • Custom errors and error refactoring.
  • Fix the extra whitelist check.
  • Improve the gas-checking routine.

Link(s) to Ticket(s) or Issue(s) resolved by this PR

Are these changes tested and documented?

  • If tested, please describe how. If not, why tests are not needed. -- Being tested against the chain.
  • If documented, please describe where. If not, describe why docs are not needed. -- Added a section in the README.
  • Added to Unreleased section of CHANGELOG.md?

Still Left Todo

  • Testing with several workers under load, and adding unit tests.
  • Reviewing current usage flows, especially registration, where several calls are made.
  • Potentially more refactoring.

@xmariachi xmariachi requested review from kpeluso and amimart December 7, 2024 00:18
@xmariachi xmariachi marked this pull request as ready for review December 11, 2024 00:42
@amimart left a comment

That's good stuff, well done! I've left some remarks I'll let you consider.

usecase/rpc_manager.go (review thread, outdated, resolved)
usecase/rpc_manager.go (review thread, resolved)
@kpeluso (Collaborator) left a comment

I'd just like to talk about the algorithm being employed. Because the RPCManager is initialized at startup and reused across all processes, under load we're guaranteed to hit the rate limit, though with a much higher chance of recovering (because we cycle through RPCs upon a 429). With a randomized algorithm, however, we'd avoid hitting a rate limit for longer (because we're not necessarily reusing the last-used RPC) while maintaining a high chance of recovery upon a 429 (because we still cycle through RPCs with some probability). Obviously this depends on the number of topics, their epochs, the rate limits of each RPC, and the number of RPCs involved, but as far as I can tell the randomized algorithm will work better for a wider range of those parameters than this serial approach.
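To make the proposal concrete, here is a hedged sketch of the randomized variant (illustrative only; it reuses the hypothetical NodeConfig and requiresNodeSwitch names from the earlier sketch, in the same package):

```go
package usecase

import (
	"errors"
	"math/rand"
)

// callWithRandomizedNodes starts from a random node instead of the
// last-used one, but still cycles through the list on retriable
// errors, preserving the recovery behavior of the serial approach.
func callWithRandomizedNodes(nodes []*NodeConfig, call func(*NodeConfig) error) error {
	n := len(nodes)
	if n == 0 {
		return errors.New("no nodes configured")
	}
	start := rand.Intn(n) // random starting point spreads load across RPCs
	var err error
	for attempts := 0; attempts < n; attempts++ {
		node := nodes[(start+attempts)%n]
		if err = call(node); err == nil {
			return nil
		}
		if !requiresNodeSwitch(err) {
			return err // not node-attributable: switching will not help
		}
	}
	return err // all nodes exhausted
}
```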

Excerpt under review:

```go
}
totalNodes := len(nodes)

for attempts := 0; attempts < totalNodes; attempts++ {
```
A collaborator commented:

Not going to block on this, but you could add a parameter that meters how often the full list is looped through, or simply how many RPCs are attempted, with replacement (so pre-existing ones can be retried).
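One minimal way to express that knob (hypothetical parameter and function names, continuing the package from the sketches above):

```go
// callWithRetryBudget caps the total number of attempts at
// maxAttempts; indices wrap modulo len(nodes), so once the list is
// exhausted, earlier nodes are retried (selection "with replacement").
func callWithRetryBudget(nodes []*NodeConfig, maxAttempts int, call func(*NodeConfig) error) error {
	if len(nodes) == 0 {
		return errors.New("no nodes configured")
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		node := nodes[attempt%len(nodes)] // wraps around the full list
		if err = call(node); err == nil {
			return nil
		}
	}
	return err
}
```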

The author (xmariachi) replied:

I want to add a count of "bad failures" (failures attributable to RPC instability, not chain values) per node, and a "jail" where nodes can spend some time out of the valid set.

We can also set Prometheus metrics for some of these failures.
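A rough sketch of that jail idea (described as future work, not part of this PR; names are illustrative):

```go
package usecase

import "time"

// nodeHealth tracks "bad failures" (failures attributable to RPC
// instability rather than chain values) and jails a node for a
// cooldown once a threshold is crossed.
type nodeHealth struct {
	badFailures int
	jailedUntil time.Time
}

// recordBadFailure increments the counter and jails the node for
// cooldown once threshold is reached.
func (h *nodeHealth) recordBadFailure(threshold int, cooldown time.Duration) {
	h.badFailures++
	if h.badFailures >= threshold {
		h.jailedUntil = time.Now().Add(cooldown) // out of the valid set for a while
		h.badFailures = 0
	}
}

// isJailed reports whether the node is currently excluded.
func (h *nodeHealth) isJailed() bool {
	return time.Now().Before(h.jailedUntil)
}
```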

A contributor commented:

Maybe a skill issue, but my experience with that sort of thing is: at first it looks like the smart thing to do, but complexity and tuning quickly become burdensome. I think just having good metrics/logging will get us to a perfectly acceptable place.

The author (xmariachi) replied:

+1 on avoiding early optimization. The improvements (if understood as such!) are listed as future work; sorry that wasn't explicit.

But metrics are needed here first, yes. There is a ticket for this, since it needs some rework of the way metrics are currently defined and of the new places where we want to use them. Probably the client should optionally receive a Metrics object to configure metrics internally; the current metrics implementation was, I think, a quick addition just to get things running, not carefully thought through in terms of app structure.

usecase/rpc_manager.go (review thread, resolved)
@xmariachi (Collaborator, Author) commented Dec 12, 2024

Pending: jailing bad nodes --> in a separate PR.

@xmariachi xmariachi force-pushed the diego/proto-2682-rpc-mannager-n-rpcs branch from f5b9780 to 8c000de on December 24, 2024 15:28
xmariachi added a commit that referenced this pull request Jan 8, 2025

## Purpose of Changes and their Description
* Adds context-based timeouts to all RPC operations, providing a generic utility for running a function with a timeout.
* Timeout values are configurable in the wallet config.

## Link(s) to Ticket(s) or Issue(s) resolved by this PR

## Are these changes tested and documented?

- [x] If tested, please describe how. If not, why tests are not needed. -- Tested against the current chain.
- [x] If documented, please describe where. If not, describe why docs are not needed. -- In the README, new section.
- [x] Added to `Unreleased` section of `CHANGELOG.md`?

Note: part of this will conflict with #98.
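For reference, a minimal sketch of such a generic run-with-timeout utility (hypothetical name, package, and signature; the commit's actual helper may differ):

```go
package lib

import (
	"context"
	"time"
)

// ExecuteWithTimeout runs fn under a context that is cancelled after
// timeout (a value that would come from the wallet config) and returns
// either fn's result or the context error, whichever comes first.
func ExecuteWithTimeout[T any](parent context.Context, timeout time.Duration, fn func(context.Context) (T, error)) (T, error) {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	type result struct {
		val T
		err error
	}
	ch := make(chan result, 1) // buffered so the goroutine never blocks forever
	go func() {
		v, err := fn(ctx)
		ch <- result{v, err}
	}()

	select {
	case r := <-ch:
		return r.val, r.err
	case <-ctx.Done():
		var zero T
		return zero, ctx.Err()
	}
}
```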
@xmariachi xmariachi force-pushed the diego/proto-2682-rpc-mannager-n-rpcs branch from 8c000de to a4b1362 on January 14, 2025 00:23
@xmariachi (Collaborator, Author) commented Jan 14, 2025

Rebased onto dev (timeouts, logging, and minor things).

@xmariachi xmariachi force-pushed the diego/proto-2682-rpc-mannager-n-rpcs branch from a4b1362 to c3f22b4 on January 17, 2025 11:16
@xmariachi (Collaborator, Author) commented:

Rebased again onto v0.8.1. Re-tested against testnet.

@amimart left a comment

LGTM :)

@guilherme-brandao left a comment

Preemptively approving, because I believe that if you consider the suggestions I made, they will be addressed in your subsequent PR. Additionally, changing the client (which I know you plan to do in your next PR) might enforce some of these changes.

lib/errors.go (review thread, outdated, resolved)
lib/errors.go (review thread, outdated, resolved)
lib/errors.go (review thread, resolved)
lib/repo_tx_utils.go (review thread, outdated, resolved)
lib/errors.go (review thread, resolved)
lib/errors.go (code scanning alert, fixed)
@xmariachi xmariachi merged commit 18b2378 into dev Jan 27, 2025
5 checks passed
@xmariachi xmariachi deleted the diego/proto-2682-rpc-mannager-n-rpcs branch January 27, 2025 18:19