Update Message to support larger number of peers #114
Conversation
The long-running streams seem to reset eventually.
I'd say a 5x increase in the maximum number of clients we can spin up in our test is pretty awesome progress! Still lots of digging to do, of course, but I think this change makes a lot of sense.
The link to the run with 50 nodes seems to show a certain level of success, but I don't think the stats made their way into InfluxDB. I can't see any data in Grafana, and there are some errors about timeouts in the logs?
50 nodes is right around the point where we see things break down, so I may have linked a run that didn't properly succeed. Here's a run with 40 participants.
I think the chat example I linked is incorrect. Here's an example where the stream is closed after sending a message.
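In case that link goes stale, here's a rough sketch of the pattern being described: the stream is closed right after a single message is written. It assumes the current go-libp2p core package layout; the function name and protocol ID are made up for illustration and are not taken from this PR.

```go
package messaging

import (
	"context"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

const msgProtocol = protocol.ID("/nitro/msg/1.0.0") // illustrative protocol ID

// sendMessage opens a fresh stream, writes one newline-delimited message,
// and closes the stream immediately instead of keeping it open for reuse.
func sendMessage(ctx context.Context, h host.Host, to peer.ID, msg []byte) error {
	s, err := h.NewStream(ctx, to, msgProtocol)
	if err != nil {
		return err
	}
	defer s.Close() // close right after the single write

	_, err = s.Write(append(msg, '\n'))
	return err
}
```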
Fixes #111
Fixes #29
It looks like our previous implementation of a libp2p message service had some limitations. Whenever we had a large amount of load on the message service we would get "stream reset" errors. However, we didn't notice this because we silently close the stream when getting a "stream reset".
Changes

Avoids "stream reset" errors by not having long-running streams; a sketch of the receiving side under this approach follows below.
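A minimal sketch of the receiving side, assuming the current go-libp2p core package layout: each inbound stream carries exactly one message and is closed as soon as that message is read, so there is no long-running stream left open to be reset. The handler and protocol names are illustrative, not taken from this PR.

```go
package messaging

import (
	"bufio"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/protocol"
)

const perMsgProtocol = protocol.ID("/nitro/msg/1.0.0") // illustrative

// registerHandler reads a single newline-delimited message per inbound
// stream and closes the stream as soon as that message has been consumed.
func registerHandler(h host.Host, deliver func([]byte)) {
	h.SetStreamHandler(perMsgProtocol, func(s network.Stream) {
		msg, err := bufio.NewReader(s).ReadBytes('\n')
		if err != nil {
			s.Reset() // surface the failure to the sender instead of hiding it
			return
		}
		s.Close()
		deliver(msg)
	})
}
```

The sending side mirrors this: each message gets its own short-lived stream that is closed right after the write, as in the sketch earlier in the thread.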
Effects

After these changes I've been able to successfully run a test with 50 participants. I've also been able to run a 3-participant test with 10,000 concurrency.
There still appear to be two failure modes when you add enough participants (>50):

We might be reaching limitations of the cloud VM?
A client may just not have the computing power to handle so many TCP requests in a timely manner?
I'm still going to close #111 and #29.
I will create a new issue for investigating the performance after this change. See #115.

Performance Comparison
Simple Scenario
A simple scenario with 3 participants, concurrency 1, run for 2 minutes.
Before: 13.7 ms
http://34.168.92.245:3000/d/5OBBeW37k/time-to-first-payment?orgId=1&from=1664473900255&to=1664474074277&var-runId=ccqtme8nr2gk3i239hmg&var-jobCount=1&var-testDuration=2m0s&var-hubs=1&var-payees=1&var-jitter=0&var-latency=0&var-payers=1&var-payeepayers=0&var-nitroVersion=v0.0.0-20220922174011-3e33cafaa1f3
After: 14.0 ms
http://34.168.92.245:3000/d/5OBBeW37k/time-to-first-payment?orgId=1&from=1664474132546&to=1664474342866&var-runId=ccqtofgnr2gk3i239hn0&var-jobCount=1&var-testDuration=2m0s&var-hubs=1&var-payees=1&var-jitter=0&var-latency=0&var-payers=1&var-payeepayers=0&var-nitroVersion=v0.0.0-20220922174011-3e33cafaa1f3
Benchmark Scenario
Our established "benchmark" scenario.
Before: 4.20 s
http://34.168.92.245:3000/d/5OBBeW37k/time-to-first-payment?orgId=1&from=1664475776074&to=1664475886730&var-runId=ccqu5a0nr2gk3i239hq0&var-jobCount=10&var-testDuration=30s&var-hubs=1&var-payees=1&var-jitter=2&var-latency=15&var-payers=10&var-payeepayers=0&var-nitroVersion=v0.0.0-20220922174011-3e33cafaa1f3
After: 4.66 s
http://34.168.92.245:3000/d/5OBBeW37k/time-to-first-payment?orgId=1&from=1664475882755&to=1664475952310&var-runId=ccqu5h0nr2gk3i239hqg&var-jobCount=10&var-testDuration=30s&var-hubs=1&var-payees=1&var-jitter=2&var-latency=15&var-payers=10&var-payeepayers=0&var-nitroVersion=v0.0.0-20220922174011-3e33cafaa1f3
Long Benchmark Scenario
Our established "benchmark" scenario but run for 2 minutes.
Before: 7.56 s
http://34.168.92.245:3000/d/5OBBeW37k/time-to-first-payment?orgId=1&from=1664476483036&to=1664476752826&var-runId=ccquap0nr2gk3i239ht0&var-jobCount=10&var-testDuration=2m0s&var-hubs=1&var-payees=1&var-jitter=2&var-latency=15&var-payers=10&var-payeepayers=0&var-nitroVersion=v0.0.0-20220922174011-3e33cafaa1f3
After: 7.12 s
http://34.168.92.245:3000/d/5OBBeW37k/time-to-first-payment?orgId=1&from=1664476878409&to=1664477152413&var-runId=ccqueb8nr2gk3i239hug&var-jobCount=10&var-testDuration=2m0s&var-hubs=1&var-payees=1&var-jitter=2&var-latency=15&var-payers=10&var-payeepayers=0&var-nitroVersion=v0.0.0-20220922174011-3e33cafaa1f3