WMAgent components terminated but PID remains alive #12091

Open
amaltaro opened this issue Sep 4, 2024 · 0 comments

Impact of the bug
WMAgent (docker model)

Describe the bug
This is not the first time I have noticed this behavior, but this time I managed to collect the required artifacts.
When WMAgent components crash and their worker threads are terminated, for example:

2024-08-29 21:30:51,756:140034193299200:INFO:BaseWorkerThread:Worker thread <WMComponent.ErrorHandler.ErrorHandlerPoller.ErrorHandlerPoller object at 0x7f5c48a74310> terminated

they are still reported as running by the manage status script:

(WMAgent-2.3.4.3) [xxx@cmsgwms-xxx:current]$ manage status
Component:ErrorHandler Running:1166
...

which actually makes sense, as the process still exists:

(WMAgent-2.3.4.3) [xxx@cmsgwms-xxx:current]$ ps aux | grep 1166
xxx+    1166  0.0  0.0 340604 71492 ?        S    Aug26   5:46 python /usr/local/bin/wmcoreD --start --config=/data/srv/wmagent/2.3.4/config/config.py
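
To illustrate why a process-level check can still succeed, here is a minimal sketch (not WMCore code). It assumes the status check only tests whether the PID exists, which is an assumption about the manage script and not something confirmed in this issue. The names `pid_alive` and `demo` are hypothetical.

```python
import os
import threading


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def demo():
    # Stand-in for a component's single worker thread that ends early.
    worker = threading.Thread(target=lambda: None)
    worker.start()
    worker.join()
    # The process outlives its worker thread, so the two checks disagree.
    return pid_alive(os.getpid()), worker.is_alive()
```

Under that assumption, the PID check keeps reporting the component as running while a thread-level check (as WMStats appears to do) sees it as down.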

This behavior differs from what we had in the RPM model, which would bring the component down (and exit the process) if the component had only that one worker thread (which is the case for ErrorHandler).

Note that the component (worker thread) is properly monitored in WMStats, which correctly says that the component is down.

How to reproduce it
Perhaps as simple as making a component crash.

Expected behavior
If the component has a single worker thread (which is the case for the vast majority of the WMAgent components), whenever that worker thread gets terminated, the process should terminate as well, so that the manage script properly reports the component as down.
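
One possible shape for this, as a rough sketch only: the main thread waits for the worker threads and exits the process once none are left. This is not the actual WMCore harness. `run_component` and `exit_func` are hypothetical names, and the non-zero exit code is an arbitrary choice.

```python
import sys
import threading


def run_component(workers, exit_func=sys.exit):
    """Start one thread per worker callable and exit once all have ended."""
    threads = [threading.Thread(target=w, daemon=True) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # No worker thread is left: end the process with a non-zero code so
    # that a PID-based status check no longer finds it.
    exit_func(1)
```

With a single worker, as for ErrorHandler, this exits as soon as that worker thread ends. `exit_func` is injectable only so the behavior can be exercised without killing the caller, e.g. `run_component([some_worker], exit_func=print)`.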

Additional context and error message
None
