Skip to content

Fix race between parent and monitor in use_pty #1130

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

squell
Copy link
Member

@squell squell commented May 10, 2025

The problem with #1129 was that when the monitor shuts down, it takes the pseudoterminal it controls with it. So we end up in a condition where the parent is racing with the monitor to read remaining data before the monitor crosses the finish line.

This could be demonstrated by adding a sleep in the monitor, but a better fix is simply to use the existing backchannel to synchronise parent and monitor. I re-used the old ExecCommand message for this, as that was already used to handle a similar race at the start.

So basically the backchannel protocol is: parent sends an Edge to signal the monitor it can go (this was already the case), do the usual thing while the monitor and child are doing their dance, and then again when the monitor has communicated that it is ended, let it wait for another Edge from the parent.

NOTE: This fixes #1129 on FreeBSD, but not on the Trixie machine exhibiting the behaviour. Haven't been able to reproduce on a second machine reliably.

@squell squell added the C-exec Execution component (interfacing with OS) label May 10, 2025
@squell squell changed the title Fix race between parent and monitor. Fix race between parent and monitor in use_pty May 10, 2025
@squell
Copy link
Member Author

squell commented May 10, 2025

By the way, I have left b156eb9 in for illustration purposes only. That fix alone takes care of the problem "in practice". We can drop that one before it is merged.

@squell squell requested a review from pvdrz May 10, 2025 14:10
@squell squell linked an issue May 10, 2025 that may be closed by this pull request
@squell squell enabled auto-merge (rebase) May 10, 2025 14:14
@trifectatechfoundation trifectatechfoundation locked as spam and limited conversation to collaborators May 10, 2025
@trifectatechfoundation trifectatechfoundation unlocked this conversation May 11, 2025
@squell
Copy link
Member Author

squell commented May 11, 2025

Note, another solution would maybe be to not _exit(1) in the monitor but use `loop { std::thread::yield_now(); }' and then kill the monitor off. But that feels very heavyhanded.

@squell squell disabled auto-merge May 11, 2025 20:51
@squell
Copy link
Member Author

squell commented May 12, 2025

This fixes the problem on the FreeBSD machine I tested, but not on the Debian trixie machine.

@squell squell marked this pull request as draft May 12, 2025 11:59
@squell squell marked this pull request as ready for review May 12, 2025 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-exec Execution component (interfacing with OS)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In use_pty mode sometimes output is missed.
1 participant