Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: better CI support #98

Merged
merged 13 commits into from
Oct 19, 2023
Merged

feat: better CI support #98

merged 13 commits into from
Oct 19, 2023

Conversation

Yohe-Am
Copy link
Collaborator

@Yohe-Am Yohe-Am commented Oct 2, 2023

This PR fixes #83.

I've decided to go with a straight forward solution that, when the program is provided with the --exit-after flag, uses an Actor that asks the CommandActors to tell it when they're finished. Once it has the go from all its targets, it exits the program.

In order to make the tool more CI friendly, I've also introduced a --no-watch flag that globally disables reloading triggered by fs watch events. I also recommend adding, in this PR or another, code that disables the Console actor when not needed and joins all the child std pipes. The console makes little sense in a CI context.

@Yohe-Am Yohe-Am changed the title Support run and task Support run and exit task Oct 2, 2023
Copy link
Owner

@zifeo zifeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very well done! A few suggestions. For the the todo, would you mind create an issue or tackling them? Let's fix the CI & remove remove dead/commented code.

src/actors/command.rs Outdated Show resolved Hide resolved
src/actors/command.rs Outdated Show resolved Hide resolved
src/args.rs Show resolved Hide resolved
src/args.rs Outdated Show resolved Hide resolved
@zifeo
Copy link
Owner

zifeo commented Oct 3, 2023

Also do not forget to include the flags in the readme.

@Yohe-Am Yohe-Am changed the title Support run and exit task feat: better CI support Oct 3, 2023
@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 3, 2023

Alright, I've resolved a few of the suggestions. They might bloat this PR but I have a few minor ideas for improvements in this direction:

  • A keybinding in the console that'll trigger the exit-after behavior. This allows a scenario where a user can start in whiz without the flag, and they can investigate the panels and such. If satisfied, they press the keybinding allowing whiz to complete when all the tasks are finished. This is better than just using the flag alone since if the commands exit too quickly they might not give the user time to do proper investigations. The current implementation only needs slight modification to support this usecase.

  • In the spirit of KISS, the implementation ignores any reloads that happens after the task has reported itself finished. The exit-after flag implies disabling file watching as well and I believe this approach won't surprise most users but maybe such behavior is desirable.

src/args.rs Show resolved Hide resolved
@zifeo
Copy link
Owner

zifeo commented Oct 5, 2023

@Yohe-Am That makes sense. I would keep this behaviour separated from q. Alternatively, flushing the log buffer ordered would maybe be even better if a debug is necessary or checking the run (as most of the CI will not show any log with the current implementation).

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 5, 2023

Ah, the test I wrote appear to be flaky. The test asserts that the return code of all the test tasks, simple ls and echo commands, are 0. The test reads this code from the Actix::System when it's done and GrimReaperActor is responsible for setting this. It only sets 0 if all tasks it was targeted to kill exit with a 0. The problem is, sometimes, some tasks return ExitStatus::Undetermined which I've decided to interpret as a non-0 error code.

Looking through manuals and the subprocess code, this ExitStatus is only returned when something has already used waitpid on the child to completion. And waitpid is only used by the poll and wait_timeout invocations. All the new code I've added goes through the Child abstraction which avoids using poll if the child has already been waited upon to completion. Actix, at least as used in whiz, doesn't actors move across threads either. Not sure what's wrong, maybe I'm looking in the wrong place...

Any ideas?

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 5, 2023

After some investigation, the error was indeed elsewhere. I was led astray by the Child::Killed to ExitStatus::Undetermined mapping in Child::exit_status. The main issue appears to be that the child is not always dead by the time the actor processes the StdoutTerminated event and the ensure_stopped therein call ends up pulling the plug. I suppose I'll have it wait for a while for the child to die, in the manner of WaitStatus before pulling the plug. I'm thinking, poll if the child is dead every 20 millis and kill it if a second has elapsed. Lemme see...

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 5, 2023

Ayup, that seems to have fixed it. Seems to be working well now and ready to merge. I don't think the solution is a semantic violation, right? StdoutTerminated always correlates to death?

@zifeo
Copy link
Owner

zifeo commented Oct 6, 2023

@Yohe-Am unsure of the last (quite complex :)) commit, I believe ensure_stopped should do the same behaviour. Do you see a diff?

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 6, 2023

@zifeo ensure_stopped kills the child if it's still running. As it turns out, the child might still be running by the time StdoutTerminated is processed by the actor. That complex async bit's just giving slight breathing room (a second) for the command to complete naturally. Otherwise, this edge case robs us a chance of reading its actual error code.

@zifeo
Copy link
Owner

zifeo commented Oct 6, 2023

@Yohe-Am would use ensure_stop somehow when the stdout terminated event is sent simplify that edge case? The CI seems still failing (maybe os specific).

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 6, 2023

I suppose, to simplify the code, I can use the wait_timeout syscall to achieve the same behavior. It won't be async but I don't think that's a boon here. On the behavior this achieves though, I can't think of a simpler way of handling the edge case.

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 6, 2023

CI

yeah, I'm not sure why macOS bash is acting differently. hmm...

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 6, 2023

Alright, I've pushed a change that simplify that async bit.

CI, macOs

According to this, it might be the time unit. on sleep. I've pushed a change that'll hopefully get it along.

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 6, 2023

windows

oof, right. Let me add a quick cfg case for the test

@Yohe-Am

This comment was marked as resolved.

@Yohe-Am

This comment was marked as resolved.

@Yohe-Am
Copy link
Collaborator Author

Yohe-Am commented Oct 7, 2023

Turns out the TIMEOUT command on Windows only works in interactive shell sessions. According to this and other discussions I've found, abusing the PING program as an alternative to sleep is common under cmd and that should hopefully fix that.

Aand, after some consideration, seeing that a command as simple as sleep has quirked up all three platforms, I've just replaced it all with python invocations in the tests.

@Yohe-Am Yohe-Am enabled auto-merge (squash) October 17, 2023 18:20
@Yohe-Am Yohe-Am merged commit 1330a27 into zifeo:main Oct 19, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run task then exit support
2 participants