Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom parsing of APC, SOS and PM sequences. #115

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

boazy
Copy link

@boazy boazy commented Aug 4, 2024

Fixes #109.

This attempts the same goal at John-Toohey's PR #110, but since this PR was was not accepted, I wanted to try another approach and see whether it is acceptable.

Rationale for support

APC sequences are rarely supported by general-purpose terminals (if we put aside Kermit clients, tmux and screen), but there is one exception: The Kitty Terminal Graphics Protocol. While the Kitty Image Protocol is not as ubiquitous as Sixel, it is more robust, accurate and powerful and it's implemented by several prominent terminal emulators:

  • Kitty (obviously)
  • Konsole
  • Wezterm
  • Wayst
  • Ghostty

It's also supported by a large number of programs and libraries, some of them are mentioned here.

Design

Design Goals

  • Support SOS, PM and APC sequences (just like the original PR)
  • Support streaming parsing of SOS, PM and APC sequence payloads.
  • Avoid adding new conditional branches in existing code paths. This should minimize performance impact on existing code.
  • Avoid re-purposing OSC state variables (in order to reduce complexity)
  • Avoid adding any new stored state variable
  • No embedded parsers - use existing state change tables for parsing

Design Choices

Reorder actions into packable and non-packable actions

Currently, resulting state and actions are packed into an 8-bit value in the state change table (with a 4-bit nibble allocated for each). All values from 0 to 15 for both state and actions are used, so I could not add new states and action that would be packable. Unfortunately, I needed to support a state change with the OpaquePut action. The best way I've found is to separate the integer values of actions that do not need to be packed into the state change table (such as Hook, UnhookandClear`) into a value higher than 15 and reserve lower values for actions that need to be packed

I'm not sure if my current solution is acceptable or not but I have no other idea on how to resolve this situation without repurposing the OscString state and actions and re-introducing an embedded parser.

Action matching order

This should have minimal impact, but in order to avoid any performance impact for programs which do not use APC sequences, I made sure that when matching APC/SOS/PM-specific actions, they are always matched last.

Streaming

Unlike #110, this PR does not collect the sequence payload inside an array. Instead, the sequence payload is streamed directly to the Perform trait, one byte at a time. I believe this approach is more flexible and would lead to better performance overall. The key gains here are:

  • Avoiding extra vector allocation for the payload (I don't want to re-use osc_raw and in any case we would have to increase its default size to 4096 to be compatible with the max payload size).
  • Avoiding unnecessary copying when. The payload often just needs to be parsed directly. This is especially useful when sending images directly, since the image data would often just be forwarded to an image format parser anyway.

Names

I've changed SosPmApcString to OpaqueString, since I wanted to have a general name for all these types of sequences where the VTE parser has no understanding of the internal structure of the sequence payload. This is difference from OSC and CSI sequences where the VTE parser is aware of the high-level structure of the sequence (if not the semantics).

@boazy boazy force-pushed the parserless-apc-support branch from 46e444a to 04fbc6e Compare August 4, 2024 14:39
@boazy boazy force-pushed the parserless-apc-support branch from 04fbc6e to 255c2fe Compare August 4, 2024 14:41
Copy link
Member

@chrisduerr chrisduerr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give a detailed review later today/tomorrow, but assuming that this was benchmarked it seems like a reasonable approach.

Generally I think the VTE crate needs a little rewrite, but I don't want to hold up other changes for that since it will likely take a long time before I get around to that.

@boazy
Copy link
Author

boazy commented Aug 5, 2024

I'll give a detailed review later today/tomorrow, but assuming that this was benchmarked it seems like a reasonable approach.

I ran cargo bench but I'm a bit unsure on how to run the full vtebench suite. Is there a guide for that?

I don't expect any performance impact since I've added no new state and I've only added tailed branches to an already-long match expression, but you never know until you test...

@kchibisov
Copy link
Member

You can open a PR against alacritty with your changes, so we can use our bot for it. But generally you just build alacritty with your changes and run the vtebench.

@boazy
Copy link
Author

boazy commented Aug 5, 2024

You can open a PR against alacritty with your changes, so we can use our bot for it. But generally you just build alacritty with your changes and run the vtebench.

Thanks! Here are the results:

Master Branch:

  dense_cells (625 samples @ 1 MiB):
    15.47ms avg (90% < 16ms) +-1.93ms

  scrolling (199 samples @ 1 MiB):
    37.19ms avg (90% < 42ms) +-4.62ms

  scrolling_bottom_region (361 samples @ 1 MiB):
    27.25ms avg (90% < 28ms) +-2.28ms

  scrolling_bottom_small_region (369 samples @ 1 MiB):
    26.69ms avg (90% < 27ms) +-2.3ms

  scrolling_fullscreen (185 samples @ 1 MiB):
    41.15ms avg (90% < 45ms) +-2.28ms

  scrolling_top_region (247 samples @ 1 MiB):
    40.09ms avg (90% < 42ms) +-2.92ms

  scrolling_top_small_region (364 samples @ 1 MiB):
    27.06ms avg (90% < 28ms) +-2.27ms

  unicode (611 samples @ 1.06 MiB):
    15.86ms avg (90% < 17ms) +-1.55ms

This PR:

  dense_cells (606 samples @ 1 MiB):
    16ms avg (90% < 17ms) +-1.26ms

  scrolling (198 samples @ 1 MiB):
    37.39ms avg (90% < 42ms) +-5.06ms

  scrolling_bottom_region (361 samples @ 1 MiB):
    27.25ms avg (90% < 28ms) +-2.12ms

  scrolling_bottom_small_region (362 samples @ 1 MiB):
    27.19ms avg (90% < 28ms) +-2.25ms

  scrolling_fullscreen (182 samples @ 1 MiB):
    41.63ms avg (90% < 45ms) +-3.56ms

  scrolling_top_region (250 samples @ 1 MiB):
    39.6ms avg (90% < 41ms) +-2.85ms

  scrolling_top_small_region (362 samples @ 1 MiB):
    27.16ms avg (90% < 28ms) +-1.28ms

  unicode (654 samples @ 1.06 MiB):
    14.79ms avg (90% < 16ms) +-2.14ms

The scrolling_* benchmarks are unstable and give me different lead every time. Unicode seems to be consistently slightly faster on this PR while dense_cells is slightly faster on master branch, but the difference is quite small and the variance is higher.

@kchibisov
Copy link
Member

@boazy just open a PR against alacritty's repo, it should better for us to review that way, since we can run the bot.

@boazy
Copy link
Author

boazy commented Aug 7, 2024

@kchibisov I've added a pull request. I'm not sure how to trigger the benchmarks bot.

@chrisduerr
Copy link
Member

I'm not sure how to trigger the benchmarks bot.

You can't. I did.

src/definitions.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
src/lib.rs Outdated Show resolved Hide resolved
boazy and others added 2 commits August 14, 2024 12:44
Fix typo (case)

Co-authored-by: Christian Duerr <[email protected]>
Fix typo (period)

Co-authored-by: Christian Duerr <[email protected]>
@chrisduerr
Copy link
Member

@boazy This has stalled out for a while now, are you still interested in working on this? I'm slightly concerned because the longer we wait, the higher the chance that a vte rewrite will make all your work worthless.

@boazy
Copy link
Author

boazy commented Oct 1, 2024

I would expect three sets of functions. How that is done is an implementation detail irrelevant to the consumer (as long as it's fast).

@boazy This has stalled out for a while now, are you still interested in working on this? I'm slightly concerned because the longer we wait, the higher the chance that a vte rewrite will make all your work worthless.

I didn't have time for a while, but I've had some time again today and I've checked your suggestion. Unfortunately, I don't think it would be possible without completely re-architecting the way state works. As I've mentioned before, we only have 4 bits for actions that change the state, but we also only have 4 bits for the state itself, and here all possible values (0-15) are used.

If we want to support three sets of functions, we would need to track three separate states for each type of string (SOS, PM, APC). This is not possible without either:

  1. Adding a new state variable for tracking the type of opaque string. (with possible performance impact)
  2. Modifying the state variable to a u16 or better yet something like struct ResultingState(State, Action).
    Would this impact performance? I don't know, but it's also a big change I'm a little bit uneasy with...

The only thing I can do without any major change is to have 3 different functions instead of opaque_start, but opaue_end and opaque_put would have to remain a single function in this case.

@chrisduerr
Copy link
Member

Just to recap because this is quickly fading into obscurity and I don't want to forget about it entirely:

Splitting opaque_start up would be no problem, since we have access to the three states anyway, but splitting opaque_put and opaque_end is a problem because it means we'd have to keep track of the state we're currently in?

But couldn't you just store that on Parser (the size of which is irrelevant), then branch inside Action::OpaquePut => and Action::OpaqueEnd => since doing so should leave performance of the other branches unaffected?

I believe that's kinda your first suggestion? Why do you think this would impact performance?

@boazy
Copy link
Author

boazy commented Oct 16, 2024

But couldn't you just store that on Parser (the size of which is irrelevant), then branch inside Action::OpaquePut => and Action::OpaqueEnd => since doing so should leave performance of the other branches unaffected?

I believe that's kinda your first suggestion? Why do you think this would impact performance?

I honestly don't know. I assumed the size is important for performance since we want to be able to have all state transitions in one table (which only has 256 entries?). If we go back to the start, I created this PR trying to answer the comments to the original PR that tried to add APC support (#110), particularly this comment:

This basically feels like an embedded parser. Most of our parser is structured in a way to stream through bytes and process them as they're coming in, however rather than creating state transitions you're storing a state transition to later then match on it instead. VTE is already too slow and I'm afraid this will just amplify that problem.

#110 (comment)

I realize I can add additional state that doesn't affect OSC sequences, but seeing this comment I was a bit worried. Do you believe adding an additional field to Parser (let's say opaque_sequence_kind) should be ok?

@chrisduerr
Copy link
Member

I honestly don't know. I assumed the size is important for performance since we want to be able to have all state transitions in one table (which only has 256 entries?). If we go back to the start, I created this PR trying to answer the comments to the original PR that tried to add APC support (#110), particularly this comment:

Yeah that's fair. This would make this a slow sloppy implementation and it might not be worth adding it for that reason. But that applies to your current solution too and the only alternative is a bigger rework that integrates this as a core part of our parser.

I realize I can add additional state that doesn't affect OSC sequences, but seeing this comment I was a bit worried. Do you believe adding an additional field to Parser (let's say opaque_sequence_kind) should be ok?

I think we should either do this, or go for something that properly integrates everything into our standard processing pipeline, which of course is difficult without affecting performance. I'm very hesitant to accept this sort of "second rate escape sequence", but I would prefer that over the current implementation.

@boazy
Copy link
Author

boazy commented Oct 17, 2024

I think we should either do this, or go for something that properly integrates everything into our standard processing pipeline, which of course is difficult without affecting performance. I'm very hesitant to accept this sort of "second rate escape sequence", but I would prefer that over the current implementation.

Ok, I think I've understood you now. I'll modify the pull request along these lines.

@boazy boazy force-pushed the parserless-apc-support branch from 9038add to cbbdde3 Compare October 17, 2024 06:18
Split the `opaque_[start|put|end]()` set of functions into 3 different
sets of functions for SOS, PM and APC sequences.
@boazy boazy force-pushed the parserless-apc-support branch from cbbdde3 to 45061fd Compare October 17, 2024 06:20
Nightly cargo fmt seems to have added new rules since the last commit,
so the code (including code that I didn't touch in this PR) doesn't
past `cargo fmt +nightly -- --check` anymore.
@boazy boazy requested a review from chrisduerr October 17, 2024 13:11
src/lib.rs Outdated Show resolved Hide resolved
@boazy boazy requested a review from chrisduerr October 21, 2024 08:33
@boazy boazy force-pushed the parserless-apc-support branch from d511cdd to 4839d4d Compare October 21, 2024 08:33
@chrisduerr
Copy link
Member

Going it tenatively approve the current revision, I don't think there's any other changes necessary from your end.

While I'm not entirely convinced I'll give this a closer look and if there are no performance regressions it should be fine.

@chrisduerr
Copy link
Member

@boazy Just a heads-up: Once #118 is merged, it might make some sense to reinvestigate this. There's certainly been some changes and we do have a lot of extra free states/actions that could be made use of without affecting performance.

@kchibisov
Copy link
Member

We've reworked vte a bit, so probably would be easier to add things without hitting perf, since all the code that caused problem here is gone now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Support for APC?
3 participants