Perf measurements #2

tekjar · 2020-06-29T12:46:38Z

Hi. Thanks a lot. This is exactly what I'm looking for and I have a perfect use case to test the performance. I'll report here soon.

tekjar · 2020-06-29T14:20:34Z

Surprisingly throughput went down drastically. Maybe I'm doing something wrong?

tekjar/numbers@2c9fced#diff-f0bee5e53cdb498867700a698edbb6e5
https://github.com/tekjar/numbers/tree/master/minimqtt

I'll read up more. Please let me know if you find any red flags.

mzabaluev · 2020-06-29T14:46:03Z

> Surprisingly throughput went down drastically. Maybe I'm doing something wrong?
>
> tekjar/numbers@2c9fced#diff-f0bee5e53cdb498867700a698edbb6e5

This uses ChunkedBytes as an additional buffer on top of BytesMut that Framed uses internally.
To get a performance gain, you would need to replace the Framed sink with something similar to the implementation in the example.

I'm thinking of providing a replacement for tokio_util::codec that would use ChunkedBytes as the buffer for the writing half, but I haven't gotten around to it yet.

mzabaluev · 2020-06-29T14:55:31Z

Wait, no, you have hacked into the Framed to get at the output object directly.
I'd like to isolate this as a benchmarkable test case.

In general, if the messages are predominantly smaller than the pre-allocated buffer and the output usually consumes the entire accumulated buffer, there is no benefit in using ChunkedBytes over BytesMut or Vec.

mzabaluev · 2020-06-29T15:16:07Z

I have commented on the commit (1 2) about the potential pitfalls.

tekjar · 2020-06-29T19:20:42Z

I've run more experiments now, and I think the problem isn't with my code. I went through the source code here but couldn't find anything obvious. I'll recheck this when I get time again, but I can support you with more experiments if you like :)

update: I think this advance is wrong

https://github.com/tekjar/numbers/blob/master/minimqtt/vectored/src/bin/tokio.rs#L74

mzabaluev · 2020-06-29T21:17:29Z

@tekjar Thank you! Can you submit a minimized benchmark added to benches/ in a PR? I'd like to profile it to see where the performance penalty is coming from.

mzabaluev · 2020-06-29T21:31:44Z

> update: I think this advance is wrong
>
> https://github.com/tekjar/numbers/blob/master/minimqtt/vectored/src/bin/tokio.rs#L74

write_buf already calls it internally, indeed. Please check that the code still works as intended, that's more important than performance :)
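To illustrate why the extra advance matters for correctness, here is a minimal std-only sketch. `Pending`, `write_buf`, and `send_all` are hypothetical stand-ins written for this example, not the tokio or bytes APIs; the only assumption carried over from the thread is that the write helper advances the buffer by the number of bytes it wrote.

```rust
/// Hypothetical stand-in for a `Buf`-style cursor over pending output.
struct Pending {
    data: Vec<u8>,
    pos: usize,
}

impl Pending {
    fn new(data: &[u8]) -> Self {
        Pending { data: data.to_vec(), pos: 0 }
    }

    /// Bytes that have not been consumed yet.
    fn remaining(&self) -> &[u8] {
        &self.data[self.pos..]
    }

    /// Marks `n` bytes as consumed.
    fn advance(&mut self, n: usize) {
        assert!(n <= self.data.len() - self.pos, "advance past end");
        self.pos += n;
    }
}

/// Hypothetical write helper: writes at most `limit` bytes to `out`
/// and advances the cursor itself, as the thread says write_buf does.
fn write_buf(out: &mut Vec<u8>, buf: &mut Pending, limit: usize) -> usize {
    let n = buf.remaining().len().min(limit);
    out.extend_from_slice(&buf.remaining()[..n]);
    buf.advance(n);
    n
}

/// Sends the whole payload in writes of at most `limit` bytes.
/// With `extra_advance` set, the caller advances a second time after each write.
fn send_all(payload: &[u8], limit: usize, extra_advance: bool) -> Vec<u8> {
    let mut buf = Pending::new(payload);
    let mut out = Vec::new();
    while !buf.remaining().is_empty() {
        let n = write_buf(&mut out, &mut buf, limit);
        if extra_advance {
            // Redundant advance: skips bytes that were never written.
            // Clamped so the sketch does not panic at the end of the data.
            let skip = n.min(buf.remaining().len());
            buf.advance(skip);
        }
    }
    out
}

fn main() {
    let payload = b"abcdefgh";
    let correct = send_all(payload, 2, false);
    let buggy = send_all(payload, 2, true);
    println!("single advance: {}", String::from_utf8_lossy(&correct));
    println!("double advance: {}", String::from_utf8_lossy(&buggy));
}
```

With the redundant advance, every other group of bytes is silently dropped, so any throughput number measured that way would not be comparable to a correct run.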

mzabaluev · 2020-06-29T22:02:40Z

Replacing put_bytes with put_slice might do the trick, if the Bytes slices tend to be small.
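The tradeoff hinted at here can be sketched with a toy chunk queue. This is a std-only illustration under stated assumptions, not the crate's implementation: `ChunkQueue`, `put_chunk` (standing in for put_bytes), and `queue_messages` are hypothetical names, and the sketch assumes that an owned slice is queued as its own chunk while a copied slice is appended to a contiguous staging buffer.

```rust
/// Toy model of a chunked output buffer (hypothetical, std-only).
struct ChunkQueue {
    /// Completed chunks; a vectored write would pass one I/O slice per chunk.
    chunks: Vec<Vec<u8>>,
    /// Contiguous staging area that copied data is appended to.
    staging: Vec<u8>,
}

impl ChunkQueue {
    fn new() -> Self {
        ChunkQueue { chunks: Vec::new(), staging: Vec::new() }
    }

    /// Copying path: appends the bytes to the staging buffer.
    fn put_slice(&mut self, src: &[u8]) {
        self.staging.extend_from_slice(src);
    }

    /// Zero-copy path: queues the owned buffer as a separate chunk.
    /// Staged data is flushed first so that byte order is preserved.
    fn put_chunk(&mut self, chunk: Vec<u8>) {
        self.flush_staging();
        self.chunks.push(chunk);
    }

    /// Moves any staged bytes into the chunk list as one chunk.
    fn flush_staging(&mut self) {
        if !self.staging.is_empty() {
            self.chunks.push(std::mem::take(&mut self.staging));
        }
    }

    /// Number of chunks a vectored write would have to describe.
    fn chunk_count(&mut self) -> usize {
        self.flush_staging();
        self.chunks.len()
    }
}

/// Queues `count` messages of `len` bytes each and returns the chunk count.
fn queue_messages(count: usize, len: usize, zero_copy: bool) -> usize {
    let mut queue = ChunkQueue::new();
    for i in 0..count {
        let msg = vec![i as u8; len];
        if zero_copy {
            queue.put_chunk(msg);
        } else {
            queue.put_slice(&msg);
        }
    }
    queue.chunk_count()
}

fn main() {
    println!("zero-copy: {} chunks", queue_messages(100, 16, true));
    println!("copying:   {} chunks", queue_messages(100, 16, false));
}
```

In this model, a hundred 16-byte messages become a hundred chunks on the zero-copy path and a single chunk on the copying path. For small slices the per-chunk bookkeeping can plausibly outweigh the cost of the copy, which would be consistent with the suggestion above; for large slices the balance may tip the other way.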

mzabaluev · 2020-06-30T02:08:00Z

I have added a benchmark in bf1c712 that may illustrate the penalty observed here, and documented the tradeoff of put_bytes in 27c0846.

mzabaluev added and removed the "invalid" label on Jun 29, 2020

mzabaluev added the "help wanted" label on Jun 29, 2020
