Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up and compress some pre-join packets #15881

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

sfan5
Copy link
Collaborator

@sfan5 sfan5 commented Mar 8, 2025

TOCLIENT_ANNOUNCE_MEDIA:

  • string list is compressed by zstd
  • hashes no longer base64 encoded (lol)

TOCLIENT_MEDIA:

  • file contents are compressed by zstd

TOCLIENT_ITEMDEF and NODEDEF:

  • compressed by zstd instead of zlib

Why do these matter?
All of them need to happen before the client can join a game session (minus media transfer if cached), and our network layer isn't that good at bulk data transfer.
So if we reduce the time spent here it will lead to faster join times.

What about media compression?
Taking my ~400MB cache/media folder I have amassed, zstd takes two(!) seconds to compress it all and manages a reduction to 65% of original size.
Zstd will efficiently skip compression if it deems the file to be incompressible.
In case we are ever too concerned about the compression time the server could very well just (pre-)cache all media files.

To do

This PR is Ready for Review.

How to test

  1. test combination of old and new clients and servers

std::istringstream iss(data, std::ios::binary);
std::ostringstream oss(std::ios::binary);
decompressZstd(iss, oss);
data = oss.str();
Copy link
Member

@SmallJoker SmallJoker Mar 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it's time for a nice wrapper function? We're effectively copying the data 4 times:

  1. readLongString (compressed)
  2. istringstream constructor (compressed)
  3. is.read(input_buffer, bufsize); (compressed)
  4. os.write(output_buffer, output.pos); (decompressed)
  5. data assignment (decompressed)

Could be simplified down to:

  1. (no copy) Read raw data from NetworkPacket (compressed)
  2. Write data to an output stream (decompressed)
  3. data assignment (decompressed)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zero-copy string extraction/input for streams was only invented in C++20: https://en.cppreference.com/w/cpp/io/basic_stringstream/str (or C++26 for string views)
we could implement this ourselves like described here: http://videocortex.io/2017/custom-stream-buffers/

readLongString could be avoided if the function would return a string_view

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also: #14086

@sfan5
Copy link
Collaborator Author

sfan5 commented Mar 8, 2025

TOCLIENT_ITEMDEF and NODEDEF: compressed by zstd instead of zlib

During my research for #15885 I also kept a copy of definitions of every server so I have some nice data for this, too.
data basis: uncompressed itemdef and nodedef of 294 servers
total size: 2.4 GB
smallest size: 7 B for nodedef (huh?), 868 B for itemdef
biggest size: 26M for nodedef, 14M for itemdef
average size: 5M for nodedef, 2M for itemdef

anyway the important part:

  • zlib reduces this pile to only 129M in 21 seconds
  • zstd gets down to 116M but takes 3 seconds

conclusion: (without further tuning) zstd isn't the compression wonder, but it is a speed wonder. 😄

Copy link
Member

@Desour Desour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks fine. Haven't tested.

Do we maybe want to have a flag per file for whether a media file is compressed? The server could then decide if it wants to compress.

In TOCLIENT_MEDIA, the names could also be stored separately, and compressed together.

Is the doc in networkprotocol.h supposed to only document the newest protocol version?

@sfan5
Copy link
Collaborator Author

sfan5 commented Mar 10, 2025

Do we maybe want to have a flag per file for whether a media file is compressed? The server could then decide if it wants to compress.

Data that zstd doesn't want to compress seems to be passed through with 10 bytes added. It's also still fast at doing this.
So I don't see a benefit either in data size or speed.

If speed is really really a problem the Zstd frame format is not complicated and you can just paste the right bytes in front of uncompressed data to turn it into valid zstd.

In TOCLIENT_MEDIA, the names could also be stored separately, and compressed together.

Yeah, but there isn't that many per bunch actually.

Is the doc in networkprotocol.h supposed to only document the newest protocol version?

I think so. We have git for the history.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants