Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring resizing into state machine. #2175

Merged

Conversation

Nexarian
Copy link
Contributor

@Nexarian Nexarian commented Mar 15, 2022

Switch to a state machine that is processed at its fastest once per XRDP main loop cycle. This allows for for more flexibility in how resizing is managed, and also allows for preparation for resizing to work with the EGFX changes that require this.

This also is a stability enhancement in that by queueing up the resize changes, they are processed one at a time, and if multiple resizes were to come in at once, or duplicates were to come in, this handles them elegantly. With the older version, XRDP was more likely to crash or cause a client disconnect, especially if there was a duplicate resize that came in as it sometimes does from MSTSC.

Fixes: #1928

@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch 4 times, most recently from 7548f3b to 03dff23 Compare March 30, 2022 06:00
@Nexarian Nexarian changed the title [Work-In-Progress] Refactoring resizing into state machine. Refactoring resizing into state machine. Mar 30, 2022
@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch from 03dff23 to e9da565 Compare March 30, 2022 21:43
@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch 4 times, most recently from e09a814 to 233904c Compare March 31, 2022 22:07
@Nexarian
Copy link
Contributor Author

Nexarian commented Apr 1, 2022

@matt335672 @metalefty @aquesnel Please take a look!

@matt335672
Copy link
Member

Good stuff - thanks @Nexarian.

I'll take a detailed look after the weekend.

Copy link
Contributor

@aquesnel aquesnel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nexarian thanks for this stability fix. There are a few things I didn't understand about the design choices in this change, and I've left comments for the things that I don't understand. Can you please help me to understand the choices made in this change.

xrdp/xrdp_mm.c Outdated
dynamic_description = (struct dynamic_monitor_description *)
g_malloc(sizeof(struct dynamic_monitor_description), 1);

error = libxrdp_process_monitor_stream(s, &(dynamic_description->description), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this error code being ignored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't anymore! Good catch :)

Comment on lines +1178 to +1209
case WMRZ_ENCODER_DELETE:
// Disable the encoder until the resize is complete.
if (mm->encoder != NULL)
{
xrdp_encoder_delete(mm->encoder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the module manager that uses the encoder code, so just looking at this code deleting the encoder feels like we are putting the module manager into an invalid or inconsistent state. Can you please explain why a single step of the state machine doesn't move the system completely from the current screen size to the new screen size?

Copy link
Contributor Author

@Nexarian Nexarian Apr 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several reasons:

  1. We may or may not have an encoder in the module manager. Some types of XRDP rendering don't use it. Therefore, having no encoder is a valid state. You could argue that deleting an encoder when there was one previously IS invalid, however. But I don't have a ton of choice here because
  2. This is engineering for the future when the EGFX code for hardware acceleration gets merged in. For this specific change, yes, I could delete and re-create the encoder in one step, but the way XRDP is architected right now, we can't do that for EGFX, which I have a prototype branch of here: https://github.com/Nexarian/xrdp/tree/mainline_merge/xrdp
  3. After extensive testing, the EGFX resizing setup requires that you delete the EGFX connection and reconstitute it. The way XRDP is architected right now, if a frame comes into the encoder, it processes it. Also, the encoder has the width and height of the screen(s?) hard coded at creation, and when it comes to H264, I'm not sure that's dynamically changeable. Deleting the EGFX connection requires a state machine, as you can't re-create one until the old one has been shut down (and the Microsoft Clients get REALLY angry if you try to break this order).

What I could do is architect a way to not delete the encoder, put some sort of pause or disable flag in it so it stops processing new frames, and then delete and re-create it in one go before invalidate is sent to xorgxrdp. However, I'm not sure if a "disabled" encoder is better than a "deleted/NULL" encoder...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that having a disabled encoder is much safer since you have said that different encoders have different workflows for being cleaned up and deleted. My concern is that deleting the encoder without synchronizing with the code that uses the encoder can leaver the other code in an invalid state and potentially in a race condition or use-after-free situation.

To me this feels like there is an interface that is missing for delegating to the encoder to clean itself up. I'm thinking of an interface like encoder.try_cleanup() that can be called repeatedly until it succeeds. For example:

resize_display() {
// ...
switch(resize_sate):
    case WMRZ_DELETE_ENCODER:
        if (encoder.try_cleanup() == SUCCESS) {
            advance_resize_state(resize_sate, WMRZ_SERVER_MONITOR_RESIZE);
        } else {
            LOG_DEVEL(LOG_LEVEL_INFO, "encoder cleanup still in progress, waiting before trying the encoder clean up again");
        }
        break;
//...
}

This is just my naive thoughts. Since you've been looking into the code quite a bit, what it is about the xrdp architecture that is preventing a solution that would allow just deleting an recreating the encoder safely in one step?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aquesnel I'm nearly certain that the encoder delete code that already exists here does exactly what you want. It's already a try_cleanup, it blocks until the encoder is finished, deleted, and all state is cleaned up. It's very nice.

Now, the question you asked that took me over a month to answer: What about the xrdp architecture is preventing a solution that would just allow deleting and recreating the encoder safely in one step. I've done my best to answer that in this wiki section.

I recognize this is a lot to read, but I think it's important that more XRDP contributors start to understand the direction that EGFX is headed and how it affects resizing, as these will become bread-and-butter features in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll paraphrase what I put in a comment for the issue:
I've read the wiki you wrote fully but I don't fully understand it. I think that improving that wiki and my understanding of the architecture should be moved to a new issue/discussion so that this pull request can be unblocked.

xrdp/xrdp_mm.c Outdated
}
else if (description->state == WMRZ_ERROR)
{
LOG(LOG_LEVEL_INFO, "dynamic_monitor_process_queue: Clearing completed resize (w: %d x h: %d)", description->description.session_width, description->description.session_height);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: s/clearing completed/clearing errored/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

xrdp/xrdp_mm.c Outdated
Comment on lines 1311 to 1295
else if (description->description.session_width <= 0 && description->description.session_height <= 0)
{
LOG(LOG_LEVEL_INFO, "dynamic_monitor_process_queue: Not allowing resize due to invalid dimensions (w: %d x h: %d)", description->description.session_width, description->description.session_height);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this condition of invalid bounds handled in both this function and the process_dynamic_monitor_description function? Can we simplify this by handling this condition only in process_dynamic_monitor_description and seeing the state to WMRZ_ERROR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! This code has been in the works for a while so I missed that :)

return 0;
}

switch (description->state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I read the issue and commit description about a state machine to coordinate server screen resizes, I assumed that there would be a single resize state for the screen. Having the resize state stored in each resize request description is surprising since it means that the actual state of the screen (and supporting components) can be out of sync with the resize state enum in the resize description request.
Can you please explain why you went with sorting the resize state enum in the description instead of with the screen that is being resized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is that when a resize request comes in, we don't process it immediately, we store it in a queue and deal with it later. The data that comes in from a resize request is of the same format as that of any type of display description. So, store the display description that is targeted in the queue, perform the resize, and then when we're done, overwrite whatever the previous config was with the new one.

Maybe the word "description" is misleading? I could change it to something else. Otherwise, I don't fully grok your concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern is that for the resize to be processed correctly we need to strictly process one resize request at a time to completion before starting the next resize request. With the current code it is possible to represent a resize state like the following: (and this example it would be a bug in the resize code)

xrdp_mm = {
    resize_queue = [
        dynamic_monitor_description = { display_resize_state = WMRZ_SERVER_VERSION_MESSAGE, ...},
        dynamic_monitor_description = { display_resize_state = WMRZ_QUEUED, ...},
        dynamic_monitor_description = { display_resize_state = WMRZ_ENCODER_DELETE, ...},
    ],
    ...
}

To avoid this situation the code has to be carefully written to ensure the invariants that only one dynamic_monitor_description has a state other than WMRZ_QUEUED at a time, and that

The two ways that I deal with code that must be carefully written to avoid bugs is to encapsulate the critical path code that updates the state, and the other is to make invalid states unrepresentable. Here is how I would change the code to make the invalid state unrepresentable:

  • move the enum display_resize_state state from the dynamic_monitor_description struct to the xrdp_mm struct
  • add a display_size_description display_description_resize_in_progress to the xrdp_mm struct
  • remove the explicit WMRZ_QUEUED state and replace its concept with the display_size_description just being in the resize_queue list

This would make the xrdp_mm struct look like:

xrdp_mm = {
    display_resize_state = WMRZ_SERVER_VERSION_MESSAGE,
    display_description_resize_in_progress = { /* display_size_description struct */ },

    resize_queue = [
        display_size_description = { /* display_size_description struct */ },
        display_size_description = { /* display_size_description struct */ },
    ],
    ...
}

Using this representation of the resize in progress it is much easier to enforce the invariant that there is only 1 resize in progress at a time, and the resize is processed to completion before the next resize is processed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the resize state is better stored in xrdp_mm, as apart from the resize being processed, every other item should be WMRZ_QUEUED, and so doesn't need an explicit state, which is another way of saying what @aquesnel has said above.

Using an explicit display_description_resize_in_progress is very clear, but you could also just use the head of the queue as the struct being worked on. There are advantages either way. Using the head of the queue may save you a state, as you can use the queue length being > 0 as an indication that a resize is in progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matt335672 @aquesnel I spent a lot of time thinking about the right way to do this. I settled on this:

  • Queue up only the display_size_description struct, as @aquesnel proposed.
  • In xrdp_mm, now have display_control_monitor_layout_data which wraps display_size_description and contains the state and timestamps for logging. There is only one of these, and it is only ever populated if it is null. If it is not null, the state machine is processed as in the original version. This is a separate commit from the original commit so you can see how I've changed it.

@Nexarian
Copy link
Contributor Author

Nexarian commented Apr 1, 2022

@aquesnel Extremely busy this weekend, but I should start to be able to respond to your comments around Tuesday!

xrdp/xrdp.h Outdated

#define XRDP_DISPLAY_RESIZE_STATE_TO_STR(status) \
((status) == WMRZ_QUEUED ? "QUEUED" : \
(status) == WMRZ_ENCODER_DELETE ? "WMRZ_ENCODER_DELETE" : \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small point - this string is incompatible with the others in that it has a WMRZ_ prefix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

xrdp/xrdp_mm.c Outdated
description->minfo_wm,
sizeof(struct monitor_info)
* CLIENT_MONITOR_DATA_MAXIMUM_MONITORS);
LOG(LOG_LEVEL_INFO,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOG_DEVEL() ? This doesn't look like user-specific logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

xrdp/xrdp_mm.c Outdated
}

const char *
wm_login_state_to_str(enum wm_login_state login_state);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function prototype not in a header is rarely a good idea. You could rename this function as xrdp_wm_login_state_to_str() and add the prototype to xrdp/xrdp.h along with all the other wm functions.

A better idea might be to replace this code which occurs twice with minor variations:-

    if (wm->login_state != WMLS_CLEANUP && wm->login_state != WMLS_INACTIVE)
    {
        LOG(LOG_LEVEL_INFO, "process_dynamic_monitor_description: Not trying to resize while logging in! State was %s", wm_login_state_to_str(wm->login_state));
        return 0;
    }

with a call to a function like xrdp_wm_can_resize():-

int
xrdp_wm_can_resize(struct xrdp_wm *self)
{
    int rv = (self->login_state != WMLS_CLEANUP && self->login_state != WMLS_INACTIVE);
    if (!rv)
    {
        LOG(LOG_LEVEL_INFO, "Not trying to resize while logging in!");
        LOG_DEVEL(LOG_LEVEL_INFO, "State is %s", wm_login_state_to_str(self->login_state));
    }
    return rv;
}

That better keeps the encapsulation of the login_state member. The above is off the top of my head, but I'm sure you get the idea. Prototype for xrdp_wm_can_resize() lives in xrdp/xrdp.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

xrdp/xrdp_mm.c Show resolved Hide resolved
xrdp/xrdp_mm.c Outdated
{
return 0;
}
if (self->wm->login_state != WMLS_CLEANUP && self->wm->login_state != WMLS_INACTIVE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

xrdp/xrdp_mm.c Outdated
}
ignore_marker = (struct display_size_description *)
g_malloc(sizeof(struct display_size_description), 1);
g_memset(ignore_marker, 0, sizeof(struct display_size_description));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary call to g_memset(). Passing 1 as the second parameter of g_malloc() does this for you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

xrdp/xrdp_wm.c Show resolved Hide resolved
@Nexarian
Copy link
Contributor Author

Nexarian commented Apr 5, 2022

@matt335672 Thanks for the review, while I work on addressing the comments from both of you, would you be able to test and verify that you don't see the MSTSC crash with this as well? I want to make sure someone else other than me can verify that it does fix what's intended here.

Copy link
Member

@matt335672 matt335672 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a general point, there are some lines quite a bit over 80 columns here. Can you make an effort to break them where appropriate to resize?

Thanks again for all the work you're putting in to this.

xrdp/xrdp_mm.c Outdated
sizeof(struct monitor_info)
* CLIENT_MONITOR_DATA_MAXIMUM_MONITORS);
LOG(LOG_LEVEL_INFO,
"process_dynamic_monitor_description:"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably mean advance_resize_state_machine here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Fixed.

@Nexarian
Copy link
Contributor Author

Nexarian commented Apr 7, 2022

Also I will fix the "previous state took [very large number] MS" log messages.

Now that you've verified that it actually fixes the issue I think it does, I'll get to work on cleanup!

Copy link
Member

@matt335672 matt335672 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more for you which I found while testing #2219 against #1928

xrdp/xrdp_mm.c Outdated
@@ -1244,6 +1355,7 @@ dynamic_monitor_initialize(struct xrdp_mm *self)
int
xrdp_mm_drdynvc_up(struct xrdp_mm *self)
{
struct display_size_description *ignore_marker;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be struct dynamic_monitor_description *

struct display_size_description is smaller than struct dynamic_monitor_description. Using the smaller structure results in invalid memory accesses in dynamic_monitor_process_queue() when the pointer is cast to struct dynamic_monitor_description *

Valgrind found this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

xrdp/xrdp_mm.c Outdated
error = dynamic_monitor_initialize(self);
}
ignore_marker = (struct display_size_description *)
g_malloc(sizeof(struct display_size_description), 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above two should be struct dynamic_monitor_description as mentioned above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch 7 times, most recently from 9b86704 to 7159e49 Compare April 20, 2022 04:44
@Nexarian
Copy link
Contributor Author

@aquesnel @matt335672 Can you take a look at this, especially this, and let me know what you think about deleting and re-creating the encoder after understanding my documentation?

@matt335672
Copy link
Member

@Nexarian - gimme a few days and I'll get to it.

@matt335672
Copy link
Member

@Nexarian - that looks good to me.I can't say I fully understand it on a first reading, but you've done a good job of making the case for the 'acceleration assistant', and the UML diagram is extremely helpful.

A couple of minor comments:-

  • This area changes pretty quickly, so adding some dates in at various places might be useful. At some stage, when the wiki gets revised, it will be useful to know when particular technologies were being used.
  • You may want to give some thought into making the source for the UML diagram available somewhere.

Thanks.

@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch 3 times, most recently from 1929bf6 to 9f8390f Compare June 26, 2022 01:51
@Nexarian
Copy link
Contributor Author

@matt335672 I have:

  1. Switched the UML diagram to SVG. Now anyone with an SVG editor can change it, and it scales so much better! I used yEd (https://yed.yworks.com/) to create it. The original is in a .graphml file, but I'm not sure how to preserve that longer term. Suggestions appreciated!
  2. I went through the wiki and updated major sections to include a date header. I can move the old section into an appendix when it changes appreciably.

@Nexarian
Copy link
Contributor Author

@matt335672 @aquesnel Please re-review the PR. I've updated all the comments, the main one being addressing the enumeration state as part of a queued item. I moved that out. Rebased from devel and tested, made sure the CI checks pass.

If you feel I've properly addressed the question as to why deleting the encoder when I do appropriately, are there any other blockers?

@Nexarian Nexarian requested a review from matt335672 June 26, 2022 02:05
Nexarian added a commit to Nexarian/xrdp that referenced this pull request Jun 26, 2022
The RFX compression mode requires an encoder. When you resize RFX, you
have to recreate the encoder.

This will fix it as well: neutrinolabs#2175,
however, that PR is bigger and more complex and may take longer.
@aquesnel
Copy link
Contributor

aquesnel commented Jun 27, 2022

Hi @Nexarian ,

I read the wiki page you created and like Matt, I don't fully understand the section about "Special Topic: Deleting and re-creating the encoder in the EGFX workflow for resizing" nor the what is the solution that you used to minimize/avoid memory copies of encoded data.

That being said, the wiki page has convinced me that what you have made is very valuable and seems reasonably tested. I would like to fully understand this, but at this point I think that would be making "perfect the enemy of good". My recommendation is to move forward with the merge and in parallel/after continue to document and discuss the architecture of encoders in XRDP in a new issue/discussion topic.

FYI, I like https://plantuml.com/ for sequence diagrams since it is text based (which is easy source that can be shared/saved) and can produce a very similar diagram to what you have.

xrdp/xrdp_mm.c Outdated
Comment on lines 1079 to 1102
LOG(LOG_LEVEL_DEBUG, "dynamic_monitor_data: msg_type %d msg_length %d",
msg_type, msg_length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a log message that is only useful for devs and not admins. I suggest changing this to LOG_DEVL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

xrdp/xrdp_mm.c Outdated
Comment on lines 1082 to 1124
if (msg_type != DISPLAYCONTROL_PDU_TYPE_MONITOR_LAYOUT)
{
LOG_DEVEL(LOG_LEVEL_ERROR,
"process_dynamic_monitor_description:"
" description is null.");
return 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there other message types that this function is expected to process but is not supported?
Can you add a LOG message for the message types that are ignored here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this code block

 // Unsupported message types
        switch (msg_type)
        {
            case DISPLAYCONTROL_PDU_TYPE_CAPS:
                LOG(LOG_LEVEL_ERROR, "dynamic_monitor_data: Received"
                    " DISPLAYCONTROL_PDU_TYPE_CAPS. Per MS-RDPEDISP 2.2.2.1,"
                    " this is a server-to-client message, client should never"
                    " send this. Ignoring message");
                break;
            default:
                LOG(LOG_LEVEL_ERROR, "dynamic_monitor_data: Received neither"
                    " nor DISPLAYCONTROL_PDU_TYPE_CAPS. Ignoring message.");
                break;
        }

Hope that settles this concern.

Comment on lines +1178 to +1209
case WMRZ_ENCODER_DELETE:
// Disable the encoder until the resize is complete.
if (mm->encoder != NULL)
{
xrdp_encoder_delete(mm->encoder);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll paraphrase what I put in a comment for the issue:
I've read the wiki you wrote fully but I don't fully understand it. I think that improving that wiki and my understanding of the architecture should be moved to a new issue/discussion so that this pull request can be unblocked.

@matt335672
Copy link
Member

The .graphml file can be embedded in another file (see for example https://github.com/matt335672/xrdp/wiki/Authentication-architecture-redesign), or even just moved into the wiki as is, I suspect. It's worth a play with the wiki tools to find out.

I'll get round to a review later in the week.

Thanks again for all the effort you're putting into this @Nexarian

@Nexarian
Copy link
Contributor Author

Nexarian commented Jun 27, 2022

@aquesnel In short:

  1. We have to re-create the encoder on resize because for at least some encoders, the size of the screen is set read-only on encoder creation.
  2. If we didn't have to use a state machine for the resizing workflow, re-creating the encoder in one step is preferred. I just checked that into devel here
  3. However, due to the impending integration of EGFX work, we need a state machine. That makes life harder, because now resizing happens in multiple asynchronous steps.
  4. During that async workflow, there is no easy way to prevent invalid data from being sent to the encoder. We could add some flag/state management to XRDP, but that is an additional opportunity for bugs.
  5. Thus, the solution is to shut down and delete the encoder during resize, and bring it back only after everything has settled. A NULL encoder is a supported state, and always has been as not all session types use it.
  6. Microsoft clients are super sensitive to the "wrong" data being sent to them, so we have to be extremely careful as to how the state machine is constructed.
  7. This involves juggling, with EGFX and HW accel integrated, the states of 3 different processes. It gets a little crazy.

I'll address everyone's feedback later this week.

Copy link
Member

@matt335672 matt335672 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's my somewhat belated review - looking good.

xrdp/xrdp_mm.c Outdated
return error;
}

/******************************************************************************/
static int
process_dynamic_monitor_description(struct xrdp_wm *wm,
struct dynamic_monitor_description *description)
process_display_control_monitor_layout_data(struct xrdp_wm *wm,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sorting out the state variable.

As there's only one description struct in place now, it may be clearer to remove the description parameter from this function and also advance_resize_state_machine() and advance_error(). That will also let you remove the double-indirections in dynamic_monitor_process_queue()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

xrdp/xrdp_mm.c Outdated
*description,
WMRZ_ENCODER_DELETE);
list_remove_item(self->resize_queue, 0);
g_free((struct display_size_description *)queue_head);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set auto_free for the resize_queue, you can remove this g_free() call as list_remove_item() will do it for you. That will also fix the possible memory leak in xrdp_mm_delete() where you are calling list_delete() on the resize_queue without checking it's empty first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@Nexarian
Copy link
Contributor Author

Nexarian commented Jul 2, 2022

Once #2301, #2300, and #1928 are resolved, I'll update, rebase, and merge this.

@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch 4 times, most recently from cf07999 to a9a72bd Compare July 10, 2022 23:07
@Nexarian
Copy link
Contributor Author

Still working on this. Found some bugs I have to work out before I'm ready to commit this.

@Nexarian Nexarian force-pushed the refactor_resizing_into_state_machine branch from c5e2e23 to 2f9cd39 Compare July 11, 2022 03:29
- Fixes MSTSC resizing (with RFX as well).
- Queue system so that resizes are processed when XRDP and the X server
are ready, not immediately.
- Deletes and recreates the encoder (RFX fix)
- Introduces a dynamic_monitor_description struct that is used for the
queue system.
- Fix some lines previously introduced by the original resizing code to
be 80 chars long, as is the standard for XRDP.
@Nexarian
Copy link
Contributor Author

@aquesnel Created #2313 to further continue discussions of the encoder.

@Nexarian
Copy link
Contributor Author

The only thing I haven't yet addressed is how to upload the graphml/svg for the state diagram in the wiki. That's out of scope for this PR, so I'll merge this and then we can discuss that separately.

@Nexarian Nexarian merged commit 682ca95 into neutrinolabs:devel Jul 11, 2022
derekschrock pushed a commit to derekschrock/xrdp that referenced this pull request Dec 15, 2022
The RFX compression mode requires an encoder. When you resize RFX, you
have to recreate the encoder.

This will fix it as well: neutrinolabs#2175,
however, that PR is bigger and more complex and may take longer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MSTSC crashes when resolution is changed by maximizing on a different monitor
3 participants