-
-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix infinite hang in wait_for_process_to_die() on Windows #4685
base: master
Are you sure you want to change the base?
Conversation
The ninja generator uses wait_for_process_to_die() to wait for a previous scons daemon process to terminate. It previously had 3 separate implemementations: one using psutil if that module was not available, plus a fallback implementation for Windows, and one for non-Windows platforms. I was encountering problems with the fallback implementation on Windows hanging forever, and not being interruptible even with Ctrl-C. Apparently the win32 OpenProcess() function can return a valid process handle even for already terminated processes. This change adds an extra call to GetExitCodeProcess() to avoid infinitely looping if the process has in fact already exited. I also added a timeout so that this function will eventually fail rather than hanging forever if a process does not exit. I also removed the psutil implementation: it seemed simpler to me to avoid having multiple separate implementations with different behaviors. This appeared to be the only place in scons that depends on psutil outside of tests. Also note that this wait_for_process_to_die() function is only used by the ninja tool. I have added unit tests checking the behavior of wait_for_process_to_die(). Without the `GetExitCodeProcess()` check added by this PR, the simple code in `test_wait_for_process_to_die_success()` would hang forever.
Does the infinite hang only happen when psutil is not installed? |
I didn't actually test installing psutil, but I assume that version did work. The implementation using psutil enumerates all processes on the system, and checks to see if the pid in question is in that list. When this bug occurs, the process in question is not shown in Task Manager or in the output of tasklist.exe, but Windows does successfully return a handle for it if you call OpenProcess() on the pid. (I didn't much information about this behavior in the Windows API docs, but there are various stack overflow questions like https://stackoverflow.com/questions/35296996 that discuss it a bit.) My main rationale for just removing the psutil implementation was:
|
Can you restore the psutil usage in the function, but leave your improved code for when that's not available? ninja tool use already requires ninja package installation, so using psutil if available seems reasonable. And then add a blurb in CHANGES.txt and RELEASE.txt and should be no issue to merge. |
We can add a note in the docs that psutil can optionally be installed for ninja tool usage. That said when this goal was set, installing other packages was much much more complicated than it is today. |
What is the benefit of keeping the psutil implementation? It seems to just add maintenance complexity to me to have multiple separate implementations that all must be maintained. If you don't trust the non-psutil version to work properly, it seems better to just remove it and make psutil a hard dependency that is checked on start-up (or perhaps whenever the ninja tool is first encountered in an SConscript file). |
IMHO this PR is about fixing the issue you had, which was only in the code used when there was no psutil available. If you'd like to float a separate PR about removing psutil, feel free, but I don't see any reason to remove it at this point. Regardless, I don't see how not removing psutil affects your usage in any way |
I've updated the code to keep the existing psutil implementation as-is. Let me know if you would prefer I squash this into a single commit and force-push instead. On a somewhat unrelated note, looking through CHANGES.txt was actually a blast from the past, and rather helpful:
I think this is the key functionality I really care about, rather than the Ninja generator itself. I mostly wanted to use ninja to avoid having to wait for SCons to start up on incremental rebuilds. I had forgotten that this |
Yeah, there's a particular range of dates where commits are hard to trace - seems to have been from the period when the project was using subversion and hosted on tigris. - lots if "merged hundreds of revisions via svnmerger from ..." . Before my time :-) but frustrating for prospecting. Interactive could be pretty useful, I didn't see any real use for it for a long time, but I'm coming around. I'm thinking if someone ever built an IDE plugin, you could configure the build to remain open and be able to quickly rebuild on changes as long as you stay in the editor... Was actually poking at an Interactive bug quite recently (#3029) if you felt amused to look... |
Ninja calls back to interactive scons session to build what it can't. |
Yeah, interactive mode always did feel like a bit of a hack to me, but seemed necessary at the time to make incremental rebuilds fast. If scons takes several seconds to start on a build, this can be a significant fraction of the build time if only a single source file was changged. From 2003-2009 I was at a company that used scons as their build tool, and the start-up overhead was pretty high (they used Yeah, to do fast incremental rebuilds well it would be nice to spawn a background daemon that can monitor the filesystem for changes so it can be immediately ready to go whenever it is requested to do a build. More recently I was at Facebook for many years, and buck2 does this. The file system monitoring piece is available as a separate project if scons ever wanted to integrate it. (https://github.com/facebook/watchman/). It's actually quite nice and handles a lot of tricky cross-platform issues.
Yeah, I'll see if I can take a look, but no promises. I actually did just run into this myself using |
Tagged as "ninja" since that's the sole customer of the routine at the moment. |
The ninja generator uses
wait_for_process_to_die()
to wait for a previous scons daemon process to terminate. It previously had 3 separate implementations: one using psutil if that module was available, plus a fallback implementation for Windows, and one for non-Windows platforms.I was encountering problems with the fallback implementation on Windows hanging forever, and not being interruptible even with Ctrl-C. Apparently the win32
OpenProcess()
function can return a valid process handle even for already terminated processes.This change adds an extra call to
GetExitCodeProcess()
to avoid infinitely looping if the process has in fact already exited. I also added a timeout so that this function will eventually fail rather than hanging forever if a process does not exit.I also removed the psutil implementation: it seemed simpler to me to avoid having multiple separate implementations with different behaviors. This appeared to be the only place in scons that depends on psutil outside of tests. Also note that this
wait_for_process_to_die()
function is only used by the ninja tool.I have added unit tests checking the behavior of
wait_for_process_to_die()
. Without theGetExitCodeProcess()
check added by this PR, the simple code intest_wait_for_process_to_die_success()
would hang forever on Windows.Contributor Checklist:
CHANGES.txt
andRELEASE.txt
(and read theREADME.rst
).