Optimize memory usage and reduce GC by eliminating memory copies 3x #74

Open
wants to merge 1 commit into
base: k8s-spark-3.1

@ambud ambud commented Aug 4, 2022

The current shuffle service implementation creates multiple copies of the shuffle write data, which increases GC activity and, under heavy load, reduces performance because of GC throughput limitations. These copies can be eliminated by passing a reference to the original Netty ByteBuf through to the storage layer, which removes the need for any Buf-to-array conversion or rewrapping.

This PR is an initial draft of these changes, validated with the unit tests as well as basic long-running tests.

Original flow: network buf copied to custom buf -> shuffleDataWrapper converted to byte array (copy) -> byte array wrapped back into a buf -> written to file through an output stream (copy to byte array)
New flow: network buf added to composite buf (no copy) -> handler pass-through -> materialized as ByteBuffer -> written to file (via channel using direct buf)

This therefore eliminates the 3 copies of the data in memory.
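
For readers who want to see the last step of the new flow in isolation, here is a minimal sketch of writing direct buffers to a file through a channel without an intermediate byte array. It is not the code in this PR: the class and method names (`ZeroCopyShuffleWriteSketch`, `writeChunks`, `directBufferOf`) are hypothetical, and it uses only `java.nio` so that it runs without Netty on the classpath. The `ByteBuffer[]` stands in for the NIO views that a Netty composite buffer can expose (for example via `ByteBuf.nioBuffers()`); that mapping is an assumption about how the handler would hand data to the storage layer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyShuffleWriteSketch {

    /**
     * Appends all chunks to the file using gathering writes on a FileChannel.
     * No byte[] is allocated here; the channel reads straight from the buffers.
     * Returns the number of bytes written.
     */
    static long writeChunks(Path file, ByteBuffer[] chunks) throws IOException {
        long total = 0;
        for (ByteBuffer chunk : chunks) {
            total += chunk.remaining();
        }
        long written = 0;
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            // A single write call may be partial, so loop until everything is flushed.
            while (written < total) {
                written += channel.write(chunks);
            }
        }
        return written;
    }

    /** Hypothetical helper: simulates a network buffer by filling a direct ByteBuffer. */
    static ByteBuffer directBufferOf(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip(); // switch from filling to draining
        return buffer;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shuffle-sketch", ".data");
        // Two chunks play the role of the components of a composite buffer.
        ByteBuffer[] chunks = {
            directBufferOf("record-1;"),
            directBufferOf("record-2;")
        };
        long written = writeChunks(file, chunks);
        System.out.println("wrote " + written + " bytes to " + file);
        Files.delete(file);
    }
}
```

The gathering write is the point of the sketch: the chunks are never merged into one array before reaching the file, which is the property the new flow relies on. Reference counting and release of the underlying Netty buffers are outside its scope.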

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ambud Sharma seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.

@ambud ambud changed the title from "Optimize memory usage and reduce GC by eliminating memory copies" to "Optimize memory usage and reduce GC by eliminating memory copies 3x" on Aug 4, 2022