Use of RPCManager with auto switching over node configs #98
Conversation
That's good stuff, well done! I've left some remarks that I'll let you consider.
I would just want to talk about the algorithm being employed. Because the RPCManager is initialized at startup time and reused across all processes, under load we're guaranteed to hit the rate limit, although with a much higher chance of recovering (because we cycle through RPCs upon 429). However, with a randomized algorithm we'd avoid hitting a rate limit for longer (because we're not necessarily using the last-used RPC) while maintaining a high chance of recovery upon 429 (because we still cycle through RPCs with some probability). Obviously this depends on the number of topics, their epochs, the rate limits of each RPC, and the number of RPCs involved, but afaict the randomized algorithm will work better for a wider set of those arguments than this serial approach.
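To make the suggestion concrete, here is a minimal sketch of such randomized selection in Go; `NodeConfig`, `errRateLimited`, and `callWithRandomStart` are illustrative stand-ins, not names from this PR:

```go
package rpcsketch

import (
	"errors"
	"fmt"
	"math/rand"
)

// errRateLimited stands in for the node's 429 response (illustrative only).
var errRateLimited = errors.New("error 429")

// NodeConfig is a stand-in for the preconfigured node entries this PR manages.
type NodeConfig struct{ URL string }

// callWithRandomStart starts at a random node and cycles through the rest on
// 429, so repeated calls don't all start from the same last-used RPC.
func callWithRandomStart(nodes []NodeConfig, do func(NodeConfig) error) error {
	start := rand.Intn(len(nodes))
	var lastErr error
	for i := 0; i < len(nodes); i++ {
		node := nodes[(start+i)%len(nodes)]
		err := do(node)
		if err == nil {
			return nil
		}
		lastErr = err
		if !errors.Is(err, errRateLimited) {
			return err // only rate-limit errors trigger cycling here
		}
	}
	return fmt.Errorf("all %d nodes rate-limited: %w", len(nodes), lastErr)
}
```

The random start is only for spreading load across processes; the recovery behaviour upon 429 stays the same as in the serial loop.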
```go
}

totalNodes := len(nodes)

for attempts := 0; attempts < totalNodes; attempts++ {
```
Not going to block on this, but you could add a parameter that meters how many times one loops through the full list, or simply how many RPCs are attempted, with replacement (so you can retry previously tried ones).
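As a hypothetical illustration (the parameter name is invented, and it reuses the `NodeConfig` stand-in from the sketch above), metering attempts with replacement might look like:

```go
// callWithBudget caps total tries at maxAttempts, independent of len(nodes);
// nodes are sampled with replacement, so a previously failed RPC can be retried.
func callWithBudget(nodes []NodeConfig, maxAttempts int, do func(NodeConfig) error) error {
	var lastErr error
	for attempts := 0; attempts < maxAttempts; attempts++ {
		node := nodes[rand.Intn(len(nodes))]
		if lastErr = do(node); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}
```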
I want to add a count of "bad failures" (failures attributable to RPC instability, not chain values) per node, and a "jail" where nodes can spend some time out of the valid set.
We can also set Prometheus metrics for some of these failures.
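A rough sketch of that bookkeeping, assuming hypothetical names and thresholds (this is future work, not code in this PR; requires the `time` import):

```go
// nodeHealth is hypothetical per-node bookkeeping for the "jail" idea.
type nodeHealth struct {
	badFailures int       // failures attributable to RPC instability, not chain values
	jailedUntil time.Time // zero value: not jailed
}

// recordBadFailure jails the node once it crosses a failure threshold.
func (h *nodeHealth) recordBadFailure(threshold int, jailFor time.Duration) {
	h.badFailures++
	if h.badFailures >= threshold {
		h.jailedUntil = time.Now().Add(jailFor)
		h.badFailures = 0
	}
}

// inValidSet reports whether the node may currently be selected.
func (h *nodeHealth) inValidSet() bool {
	return time.Now().After(h.jailedUntil)
}
```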
Maybe a skill issue, but my experience with that sort of thing is: at first it looks like the smart thing to do, but complexity and tuning quickly become burdensome. I think just having good metrics/logging will get us to a perfectly acceptable place.
+1 on avoiding early optimization. The improvements (if understood as such!) are expressed as future work; sorry that wasn't explicit.
But metrics here are needed first, yes. There is a ticket for this, since it needs some rework of the way metrics are currently defined and of the new places where we want to use them. Probably the client should optionally receive a Metrics object to configure it internally; I think the current metrics implementation was a quick addition just to get things running, not very carefully thought out in terms of app structure.
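As a sketch of that idea (the interface, option, and client names are assumptions, not the current implementation), the client could take a Metrics object via a functional option and default to a no-op:

```go
// Metrics is a hypothetical interface the client would accept; a real
// implementation might wrap Prometheus counters.
type Metrics interface {
	IncNodeFailure(nodeURL string)
}

type noopMetrics struct{}

func (noopMetrics) IncNodeFailure(string) {}

type Client struct {
	metrics Metrics
}

type Option func(*Client)

// WithMetrics lets callers wire in their own Metrics implementation.
func WithMetrics(m Metrics) Option {
	return func(c *Client) { c.metrics = m }
}

// NewClient defaults to no-op metrics when no option is supplied.
func NewClient(opts ...Option) *Client {
	c := &Client{metrics: noopMetrics{}}
	for _, opt := range opts {
		opt(c)
	}
	return c
}
```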
Force-pushed from f5b9780 to 8c000de
## Purpose of Changes and their Description

* Adds context-based timeouts to all RPC operations, providing a generic utility for running a function with a timeout.
* Timeout values are configurable in the wallet config.

## Link(s) to Ticket(s) or Issue(s) resolved by this PR

## Are these changes tested and documented?

- [x] If tested, please describe how. If not, why tests are not needed. -- Tested against current chain.
- [x] If documented, please describe where. If not, describe why docs are not needed. -- In readme, new section.
- [x] Added to `Unreleased` section of `CHANGELOG.md`?

Note: part of this will conflict with #98.
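A minimal sketch of what such a generic run-with-timeout utility could look like (illustrative only; the actual helper in that PR may differ; requires `context` and `time`):

```go
// withTimeout runs f under a context-based timeout. f should respect ctx so
// its goroutine exits once the deadline fires.
func withTimeout[T any](parent context.Context, d time.Duration, f func(context.Context) (T, error)) (T, error) {
	ctx, cancel := context.WithTimeout(parent, d)
	defer cancel()

	type result struct {
		val T
		err error
	}
	ch := make(chan result, 1) // buffered so the goroutine never blocks
	go func() {
		v, err := f(ctx)
		ch <- result{v, err}
	}()

	select {
	case r := <-ch:
		return r.val, r.err
	case <-ctx.Done():
		var zero T
		return zero, ctx.Err()
	}
}
```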
Force-pushed from 8c000de to a4b1362
Rebased to dev (timeouts, logging and minor things).
Force-pushed ("…witching over node configs") from a4b1362 to c3f22b4
Rebased again with v0.8.1. Re-tested against testnet.
LGTM :)
Preemptively approving, because I believe that if you consider the suggestions I made, they will be addressed in your subsequent PR. Additionally, changing the client (which I know you plan to do in your next PR) might enforce some of these changes.
## Purpose of Changes and their Description

* Use of an RPCManager with auto switching over `NodeConfig`s. The RPCManager holds a number of NodeConfigs, preconfigured.
* `usecase` keeps node management (before, initialised once and never changed; now integrated as an RPCManager), while client-internal operations fall under `lib` (i.e. `NodeConfig`-based operations).
* Switchover triggers on `error 429` and `Mempool is full`, maybe to be expanded (see the sketch after this section).

Also:

## Link(s) to Ticket(s) or Issue(s) resolved by this PR

## Are these changes tested and documented?

- Added to `Unreleased` section of `CHANGELOG.md`?

## Still Left Todo
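Returning to the switchover triggers listed in the description above (`error 429`, `Mempool is full`): a hedged sketch of how they might be recognized, assuming substring matching (the PR's actual detection may differ; requires `strings`):

```go
// shouldSwitchNode is a hypothetical classifier for errors that warrant
// switching to another node config.
func shouldSwitchNode(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "error 429") ||
		strings.Contains(msg, "Mempool is full")
}
```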