Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add binary option to WebSockets #1598

Closed
wants to merge 4 commits into from

Conversation

MCJack123
Copy link
Contributor

Currently, handling of UTF-8 codes in WebSocket text messages is handled entirely on the Java side. This is understandable for most uses, but some users may want to interact with the bytes directly (e.g. with utf8) without Java meddling with codepoints outside CC's range.

This PR adds a binary flag to http.websocket that disables UTF-8 decoding of text messages on the Java side. When enabled, the binary flag of each message remains intact, but the data received will always be interpreted as binary data - no more ?s replacing out-of-range codes. Sending text messages on a binary handle will cause Java to reinterpret the bytes as UTF-8 before sending, preserving manually-encoded codepoints. (This is an implementation detail, since TextWebSocketFrame takes a UTF-16 String, so we need to convert it from UTF-8 ourselves.) If that fails for some reason, the sender will fall back to normal bytewise strings.

The syntax for this flag matches http.get's argument list, so there should be no confusion between the two lists, and no compatibility issues should arise - WebSockets still default to normal/text handles. I've added a test suite as well (which duplicates the original WS test) that ensures that UTF-8 bytes are preserved through the socket.

image

@SquidDev SquidDev added the area-Core This affects CC's core (the Lua runtime, APIs, computer internals). label Oct 3, 2023
@MCJack123
Copy link
Contributor Author

This should be passing tests, but it seems the in-game tests are randomly failing in areas I didn't touch - please disregard the failures.

@SquidDev
Copy link
Member

SquidDev commented Oct 7, 2023

Thanks for the PR! Issues like this are always a pain, because the current behaviour is incorrect, and really sending/receiving raw UTF-8 is what we should have done from the start.

I have mentioned this in a couple of places (#1246, #1233) and I am tempted to just remove text mode across the whole mod, always dealing with the raw bytes. Websockets are a little different, as they mandate UTF-8, but we can verify that when sending. Obviously this would break things, but I don't know how much. Maybe an experiment to run on SwitchCraft :).

This should be passing tests, but it seems the in-game tests are randomly failing in areas I didn't touch.

Yeah, game tests are notoriously flaky in CI. I'm not quite sure why the windows one timed out - normally that's pretty reliable. I've kick-started both.

var buf = LuaValues.encode(text);
var bytes = new byte[buf.capacity()];
buf.get(bytes);
data = new String(bytes, StandardCharsets.UTF_8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to use a CharsetDecoder here instead, and then fail on CharacterCodingException errors, instead of replacing invalid characters with U+FFFD. Not sure, leave as is, will have a think about it.

@SquidDev SquidDev mentioned this pull request Oct 25, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Core This affects CC's core (the Lua runtime, APIs, computer internals).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants