Very High 95% latency #40
@anshal-savla do you know if something with our deployment infra could be causing this? When I run the relayer locally it never takes more than 10 seconds, even though that involves making calls to remote RPCs, whereas the relayer deployed on GCP should have RPCs within the same availability zone.
@anthony-near Nothing in our infra setup should be causing such high latency.
Any idea what might be causing it then @anthony-near?
Could we maybe get some tracing and metrics on the Relayer to track the latency?
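If it helps, here's a minimal sketch of the kind of per-call timing I'd start with, assuming the relayer is async Rust and already pulls in the `tracing` crate (the helper name, `target`, and call-site labels are made up for illustration):

```rust
use std::future::Future;
use std::time::Instant;
use tracing::info;

/// Wraps an RPC future and logs its wall-clock duration and outcome.
/// `label` is a free-form tag for the call site (hypothetical helper).
async fn timed_rpc<T, E, F>(label: &str, fut: F) -> Result<T, E>
where
    F: Future<Output = Result<T, E>>,
{
    let start = Instant::now();
    let result = fut.await;
    info!(
        target: "relayer_rpc",
        label,
        elapsed_ms = start.elapsed().as_millis() as u64,
        ok = result.is_ok(),
        "rpc call finished"
    );
    result
}
```

Each RPC call site would be wrapped, e.g. `timed_rpc("send_tx", client.call(request)).await` (names here are placeholders), so the slow calls show up directly in the logs and can be scraped into metrics later.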
I ran a profiler locally and 70%+ of CPU time (as distinct from wall time, which includes network requests) is spent making RPC requests and formatting the responses. The only network calls made are RPC requests; locally they were the source of latency and bottlenecks, so I also suspect it is RPC request latency.
I will try to do more profiling to include network calls and go through the logs on prod to verify for sure that it is RPC.
How long does this test take to run on the wall clock?
The test took 4.207 seconds of total wall-clock time. There's latency in the call to the RPC locally, but that's accounted for in the wall-clock time and is nothing like what's in the 99th percentile in prod. I'm not sure how the grafana request latency metrics are collected, and I'm not entirely sure the methodology used there is accurate.
I also just ran a load test where I made 100 serial (non-parallelized) requests hitting the …
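For reference, that kind of serial load test is roughly this shape (a sketch only; the endpoint URL, payload, and use of reqwest/tokio are my assumptions, not the actual test):

```rust
// Rough sketch of a serial (non-parallel) load test: 100 sequential requests,
// recording per-request wall time and reporting the p95 and max.
use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let mut timings_ms: Vec<u128> = Vec::with_capacity(100);

    for _ in 0..100 {
        let start = Instant::now();
        let resp = client
            .post("http://localhost:3030/relay") // placeholder endpoint
            .header("content-type", "application/json")
            .body("{}") // placeholder payload
            .send()
            .await?;
        let _body = resp.bytes().await?; // drain the body so timing covers the full response
        timings_ms.push(start.elapsed().as_millis());
    }

    timings_ms.sort_unstable();
    let p95 = timings_ms[timings_ms.len() * 95 / 100];
    println!("p95 = {} ms, max = {} ms", p95, timings_ms.last().unwrap());
    Ok(())
}
```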
Hmm, then it doesn't seem to be capturing the latency we've observed through grafana and the MPC service's metrics. Would it be possible to add logging so we can see which requests are taking so long?
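If the relayer's HTTP layer is axum (an assumption on my part, using the 0.7 middleware signature), per-request latency logging could be as small as something like this:

```rust
use axum::{extract::Request, middleware::Next, response::Response};
use std::time::Instant;

/// Logs method, path, status, and wall-clock latency for every request.
/// Attach with `.layer(axum::middleware::from_fn(log_latency))` on the Router.
async fn log_latency(req: Request, next: Next) -> Response {
    let method = req.method().clone();
    let path = req.uri().path().to_owned();
    let start = Instant::now();

    let response = next.run(req).await;

    tracing::info!(
        %method,
        path = %path,
        status = response.status().as_u16(),
        elapsed_ms = start.elapsed().as_millis() as u64,
        "request finished"
    );
    response
}
```

That would make it easy to separate the slow FastAuth requests from the fast ones in the logs without touching any handler code.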
Using …
Did we test this out with the new release cut on devnet? Is this still an ongoing issue?
This appears to be an ongoing issue. We're still seeing a 95th percentile of 20+ seconds. Observing the logs, it seems that most requests from FastAuth take 15-40 seconds, and the quicker requests are the other endpoints, which come in at a few ms (not quite sure what those other endpoints are doing). That's reflected in the grafana dashboard, which shows the 50th percentile of requests at 5 ms, which is too fast to do anything on chain.
I was under the impression that some of the requests hit … Have you guys also tried testing this on the testnet relayer? That one runs the latest with the changes. cc: @anthony-near
Not 100% sure if this is related to the relayer, but in case it is, I wanted to share a sample video of trying to log in via FastAuth and sitting on a "recovering account" screen for 15 seconds. https://drive.google.com/file/d/10d6YgLzkU0I-mAcLTviUPyI8eCmcFGMz/view
We've been observing quite high relayer latency when creating and running add-key on FastAuth.
This is reflected in this grafana dashboard, which shows a fairly consistent 95th percentile latency of 20 seconds, higher than I'd expect.
Do we know what's causing this and what could be done to improve it?