Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf measurements #2

Open
tekjar opened this issue Jun 29, 2020 · 9 comments
Open

Perf measurements #2

tekjar opened this issue Jun 29, 2020 · 9 comments
Labels
help wanted Extra attention is needed

Comments

@tekjar
Copy link

tekjar commented Jun 29, 2020

Hi. Thanks a lot. This is exactly what I'm looking for and I have a perfect use case to test the performance. I'll report here soon.

@tekjar
Copy link
Author

tekjar commented Jun 29, 2020

Surprisingly throughput went down drastically. Maybe I'm doing something wrong?

tekjar/numbers@2c9fced#diff-f0bee5e53cdb498867700a698edbb6e5
https://github.com/tekjar/numbers/tree/master/minimqtt

I'll read up more. Please let me know if you find any red flags.

@mzabaluev
Copy link
Owner

Surprisingly throughput went down drastically. Maybe I'm doing something wrong?

tekjar/numbers@2c9fced#diff-f0bee5e53cdb498867700a698edbb6e5

This uses ChunkedBytes as an additional buffer on top of BytesMut that Framed uses internally.
To get a performance gain, you would need to replace the Framed sink with something similar to the implementation in the example.

I'm thinking of providing a replacement for tokio_util::codec that would use ChunkedBytes as the buffer for the writing half, but I haven't gotten up to it yet.

@mzabaluev mzabaluev added invalid This doesn't seem right and removed invalid This doesn't seem right labels Jun 29, 2020
@mzabaluev
Copy link
Owner

Wait, no, you have hacked into the Framed to get at the output object directly.
I'd like to isolate this as a benchmarkable test case.

In general, if the messages are predominantly smaller than the pre-allocated buffer and the output usually consumes the entire accumulated buffer, there won't be any benefits in using ChunkedBytes against BytesMut or Vec.

@mzabaluev mzabaluev added the help wanted Extra attention is needed label Jun 29, 2020
@mzabaluev
Copy link
Owner

mzabaluev commented Jun 29, 2020

I have commented on the commit (1 2) about the potential pitfalls.

@tekjar
Copy link
Author

tekjar commented Jun 29, 2020

I've performed more experiments now and I think the problem isn't with my code. Went through the source code here but I couldn't find anything obvious. Will recheck this when I get time again. But I can support you with more experiments if you like :)

update: I think this advance is wrong

https://github.com/tekjar/numbers/blob/master/minimqtt/vectored/src/bin/tokio.rs#L74

@mzabaluev
Copy link
Owner

@tekjar Thank you! Can you submit a minimized benchmark added to benches/ in a PR? I'd like to profile it to see where the performance penalty is coming from.

@mzabaluev
Copy link
Owner

update: I think this advance is wrong

https://github.com/tekjar/numbers/blob/master/minimqtt/vectored/src/bin/tokio.rs#L74

write_buf already calls it internally, indeed. Please check that the code still works as intended, that's more important than performance :)

@mzabaluev
Copy link
Owner

Replacing put_bytes with put_slice might do the trick, if the Bytes slices tend to be small.

@mzabaluev
Copy link
Owner

I have added a benchmark in bf1c712 that may illustrate the penalty observed here, and documented the tradeoff of put_bytes in 27c0846.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants