Use dbus to tell when systemd has completed an action for us #3805

clumens · 2025-01-21T14:39:13Z

At the moment, this just handles start and stop actions. I still need to deal with monitors before I can remove a bunch of code and consider the upstream issue closed. But I expect the code to handle monitors to look very similar to what's already here, so people might as well take a look.

clumens · 2025-01-21T14:40:28Z

daemons/execd/execd_commands.c

+         */
+        pcmk__set_result(&(cmd->result), PCMK_OCF_UNKNOWN_ERROR,
+                         PCMK_EXEC_ERROR, NULL);
+    }


Note the FIXME here. There's several other possibilities systemd could give us - canceled, failed, dependency, and skipped are all documented. These could be broken out into separate errors.

clumens · 2025-01-21T14:41:56Z

lib/services/systemd.c

+    if (!pcmk__str_eq(op->action, PCMK_ACTION_START, pcmk__str_casei)) {
+        services__finalize_async_op(op);
+    }
+


I hate this because it moves a critical part of handling a resource out of the library and into the callers (see handle_systemd_job_complete). But, it's necessary to not call services__finalize_async_op right here because that function will remove the operation from the list of inflight ops and then free it, and it's called before handle_systemd_job_complete so we'll have no op to handle anymore.

I'll give this some more thought tomorrow night. I'm thinking perhaps we can design the callback differently, but I haven't thought through the specifics. Something involving one of the following:

One message filter per unit that we care about, instead of a global message filter. We can use the user_data field where we're currently passing NULL, to give a unit name. D-Bus runs all message filters until it gets a DBUS_HANDLER_RESULT_HANDLED code. We can remove a unit's handler when we're done with it.

Similar, but by using a hash table or list more cleverly within the services library.

I haven't really gotten back to this yet.

Even if we keep this basic setup, we should finalize the action if there is no callback registered.

What do you think of this? Disclaimer: I'm pretty exhausted while writing it. Forgive me if there is an egregious oversight.

Have filter_systemd_signals() take a svc_action_t as user_data.

DON'T add the filter in subscribe_to_signals().

DO add the filter within invoke_unit_by_path(), right before or right after services_set_op_pending(), passing op as user_data: dbus_connection_add_filter(systemd_proxy, filter_systemd_signals, op, NULL).

DON'T call services__finalize_async_op() within handle_systemd_job_complete().

In filter_systemd_signals(), if we make it to the end, call dbus_connection_remove_filter(systemd_proxy, filter_systemd_signals, user_data). This removes the filter for the given action. This should happen AFTER calling systemd_callback().

Either call services__finalize_async_op(user_data) immediately after calling dbus_connection_remove_filter(). Or alternatively, pass services__finalize_async_op as the free_data_function argument (4th argument position) when adding the filter.

Other implications:

No need for services__systemd_get_inflight_op(). Instead, pass the op directly to systemd_callback() as user_data, and have handle_systemd_job_complete() use it as-is. Or alternatively, pass op->cb_data as user_data, since that's the only part that handle_systemd_job_complete() cares about.

No need to expose services__finalize_async_op().

No need to pass unit_name to systemd_callback().

There may be still others, but at some point I think we do have to have a callback function within the executor, because there are lrmd* objects involved (like cmd).

This approach assumes that we end up receiving a JobRemoved signal for each op, so that we remove the filter that we added for that op. Otherwise, the list of filters would accumulate in the D-Bus connection object. I think we can (and in fact, we have to) assume that we'll get a JobRemoved signal for each unit start/stop that we call.

clumens · 2025-01-21T15:00:36Z

lib/services/systemd.c

+    crm_trace("Checking DBus message %s.%s() on %s",
+              dbus_message_get_interface(message),
+              dbus_message_get_member(message),
+              dbus_message_get_path(message));


I might remove this message - it's kind of a lot, even at trace level, since it will be triggered for every single thing systemd ever does after pacemaker has started.

If it catches even things unrelated to the systemd units that we're managing as resources, then that would be a lot. One could argue that all information is good information, since tracing has to be enabled at a fairly granular level... but I'd vote against this.

It looks like signal messages get broadcast, so this is probably at least getting all signals.

That might not be a huge deal, depending on what all events generate a signal... if there are no recurring events happening within systemd, then it might be fine.

We had a mechanism for ridiculously granular XML logging for a while, separate from tracing (I think it had to be enabled at build time). We dropped that probably last year.

What makes it a lot is that it's before the filter (dbus_message_is_signal), so it's everything that systemd ever does, regardless of whether we care about it or not. It's a ton of UnitNew() and UnitRemoved() messages, at least. I'm not sure we care about the unfiltered firehose. Here's what happens if you just start up the cluster:

[root@rhel9-scratch-1 ~]# grep "Checking DBus" /var/log/pacemaker/pacemaker.log | wc -l 1459

daemons/execd/execd_commands.c

clumens · 2025-01-22T18:07:09Z

@HideoYamauchi As part of this PR, I was hoping to get rid of all the polling code in action_complete, and in order to do that I need to remove the if block you just recently added that executes a follow-up monitor for pending probes (6fee07d). I'm wondering how you were ending up in a situation where a systemd status probe took any time at all to where a follow-up monitor was useful.

clumens · 2025-01-30T20:55:46Z

The monitor-related portion is starting to take on a life of its own, so I think what we should do is get this PR in first (it's big enough already), and then after that I can work on getting the monitor portion done separately. That way, if that part doesn't work out, I think we've still addressed the actual bug here.

HideoYamauchi · 2025-01-30T23:34:57Z

@HideoYamauchi As part of this PR, I was hoping to get rid of all the polling code in action_complete, and in order to do that I need to remove the if block you just recently added that executes a follow-up monitor for pending probes (6fee07d). I'm wondering how you were ending up in a situation where a systemd status probe took any time at all to where a follow-up monitor was useful.

@clumens

We will check and get back to you.
Please allow us some time to review.

Best Regards,
Hideo Yamauchi.

@clumens

I haven't checked the details of running it on Probe.
Based on Ken's comment below, I've made some modifications so that it can be run on Probe as well.

[WIP] Fix when monitor of systemd resource continues to be pending. #3720 (comment)

Does this answer answer your question?

Best Regards,
Hideo Yamauchi.

nrwahl2 · 2025-01-31T06:42:55Z

The builds failed on BSD due to:

/srv/workspace/maker_pacemaker-pipeline_PR-3805/pacemaker/daemons/execd/pacemaker-execd.c:552:(.text.startup+0x2e2): undefined reference to `services__set_systemd_callback'

It's guarded by #ifdef SUPPORT_SYSTEMD. I think we should be using #if SUPPORT_SYSTEMD instead, as we're doing in services.c.

daemons/execd/execd_commands.c

Most importantly, replace a tab with spaces so the else blocks line up.

If you scroll up far enough, you'll see that the #endif closes out the same #ifdef that is immediately used again.

First, invert the test to see if this is a systemd resource. If it's not, there's no way goagain was set so we can just bail out to the end of the function. Then having done that, everything that was in the block can be unindented. No other code changes.

If goagain is not set, just bail to the end of the function. Having done that, everything that was in the previous block can be unindented. I think this is a little more legible. No other code changes.

Internally, pacemaker uses a shorter resource-based name when referring to systemd actions. However, systemd itself wants a real unit name so we append ".service" or whatever to the name at exec time. Because we do this at the last minute and don't save it anywhere, there's no way to figure out what action a systemd unit is associated with later. So, add that to the opaque data. This also requires an accessor function since the opaque data is supposed to be private.

At the moment, all this does is log a trace message for all recieved signals. We'll add some filtering on this later.

These are the ones that allow us to know whether a service has finished or not. In the future, we could perhaps open this filter up to other systemd signals if there's a need to do so.

This will get called when we receive a systemd message from dbus and it makes it through the filter in filter_systemd_signals. At the moment, this is only the JobRemoved signal which can be used to determine if a start action we initiated has started. This could be expanded to other signals in the future, if that would be useful.

We'll be handling this through talking to systemd over dbus. systemd has its own concept of timeouts, so we don't need to add an additional timer on top of that.

The action_complete function is called when a systemd action finishes. Unfortunately, StartUnit actions finish and return a value immediately, not when the unit we asked it to start is actually up and running. For this reason, action_complete handles start actions by then scheduling a follow-up monitor action to poll the service and see if it's actually up and running yet. It then takes the appropriate action when this happens. Since we're moving towards waiting for systemd to tells us a service is running, we don't need to do any of this anymore. However, stop actions and monitors still need this code. For the moment, it's easiest to just bail out for start actions.

An inflight operation is one that is currently being executed - either by us (in the case of something that we forked and execed), or systemd (in the case of something we told it to start up). Here, we add a function that just get a systemd inflight op that matches a given unit name.

Instead of being private to libcrmservices, this should now just be an internal-only use function. We'll need to do this to call it from elsewhere.

The executor just has to register a callback that gets called when JobRemoved signals are received from systemd. This callback then takes the data it's given and finds a matching svc_action_t. If there's a match, it can then finalize the action and remove it from the table of actions we're watching for. At the moment this only handles start actions, which is not sufficient to close T25, but it's a good start and should be easy enough to build on to handle stop actions and perhaps also monitors. Related T25

This is just like starting resources, but a lot easier because there's almost no extra code. Related T25

The comment should explain this pretty well. What was happening was that an executor regression test was failing because the operation never gets finalized. Essentially, services__finalize_async_op never gets called for a unit that doesn't exist. We typically want this to happen as part of the systemd callback that's registered in the executor, but this callback will never get called in this case because there's no systemd message that will make it through our filter and trigger the callback. JobRemoved won't get sent because there's no job to remove. It seems like we could look for a failure from LoadUnit for units that don't exist, but pacemaker always writes an override unit file which means the unit will always exist.

clumens · 2025-01-31T15:54:04Z

The builds failed on BSD due to:
/srv/workspace/maker_pacemaker-pipeline_PR-3805/pacemaker/daemons/execd/pacemaker-execd.c:552:(.text.startup+0x2e2): undefined reference to `services__set_systemd_callback'
It's guarded by #ifdef SUPPORT_SYSTEMD. I think we should be using #if SUPPORT_SYSTEMD instead, as we're doing in services.c.

Thanks, I was just sitting down to sort these build errors out. Fixed.

clumens · 2025-01-31T15:56:16Z

I haven't checked the details of running it on Probe. Based on Ken's comment below, I've made some modifications so that it can be run on Probe as well.
* [[WIP] Fix when monitor of systemd resource continues to be pending. #3720 (comment)](https://github.com/ClusterLabs/pacemaker/pull/3720#discussion_r1838366250)

@HideoYamauchi Thanks for getting back to me. I'm going to address the monitor/probe stuff in a separate PR. I'll read through that thread in preparation for working on the monitor PR and let you know.

This PR shouldn't have any effects for monitors or probes (aside from the follow-up monitors we were spawning for start/stop actions - those will be removed, but I don't think that will affect anyone).

HideoYamauchi · 2025-02-01T00:25:01Z

@HideoYamauchi Thanks for getting back to me. I'm going to address the monitor/probe stuff in a separate PR. I'll read through that thread in preparation for working on the monitor PR and let you know.

This PR shouldn't have any effects for monitors or probes (aside from the follow-up monitors we were spawning for start/stop actions - those will be removed, but I don't think that will affect anyone).

@clumens

Okay!
I haven't seen the details of this PR yet.
If you need me to review it too, please let me know.

Many thanks,
Hideo Yamauchi.

nrwahl2

For now, I've made it through "Feature: libcrmservices: Add a systemd job complete callback."

nrwahl2 · 2025-02-01T10:32:08Z

daemons/execd/execd_commands.c

@@ -917,8 +917,8 @@ action_complete(svc_action_t * action)
                    }
                }
            }
-        } else if (pcmk__str_any_of(cmd->action, PCMK_ACTION_MONITOR, PCMK_ACTION_STATUS, NULL) && 
-                  (cmd->interval_ms > 0)) { 
+        } else if (pcmk__str_any_of(cmd->action, PCMK_ACTION_MONITOR, PCMK_ACTION_STATUS, NULL) &&


While we're here, we prefer to have && at the beginning of the next line instead of the end of the line.

daemons/execd/execd_commands.c

nrwahl2 · 2025-02-02T05:01:15Z

lib/services/systemd.c

-    free(name);
+
+    if (op != NULL) {
+        op->opaque->unit_name = name;


I think this function can only be called once for a given op, but for future-proofing we could be paranoid and free op->opaque->unit_name first.

lib/services/systemd.c

include/crm/services_internal.h

lib/services/systemd.c

lib/services/services.c

lib/services/systemd.c

nrwahl2 · 2025-02-03T10:14:49Z

lib/services/systemd.c

@@ -1222,8 +1222,11 @@ services__execute_systemd(svc_action_t *op)
                         "Bug in service library");

    if (invoke_unit_by_name(op->agent, op, NULL) == pcmk_rc_ok) {
-        op->opaque->timerid = pcmk__create_timer(op->timeout + 5000,
-                                                 systemd_timeout_callback, op);
+        if (!pcmk__str_eq(op->action, PCMK_ACTION_START, pcmk__str_casei)) {


As much as I like small, atomic commits, I'd rather see this merged into the "Use dbus to detect systemd resource startup". It's not atomic (it belongs with the other changes), and it's small enough that it doesn't need its own commit to avoid confusion. For me it initially caused confusion haha.

systemd has its own concept of timeouts, so we don't need to add an additional timer on top of that.

Can you elaborate on this? I'm having difficulty tracing where else we use op->timeout in a way that it gets actually passed to systemd. I see it getting used for other Pacemaker timers and for dbus calls. Neither of those (in the places that I've checked) are actually sending the timeout to systemd. Rather, they're used internally within Pacemaker and the dbus library.

The code is hard to trace and I could be missing something. However, I want to feel confident that we're not going to lose a start or stop operation timeout that the user has configured.

Looking again at the systemd D-Bus doc, it does look like the job timeout would be determined by the unit configuration, with a global default. I don't see where they even expose a way to set the timeout per-job. Which would be required in order to use our user-configured resource action timeout.

I'll wait to hear what you have to say. This timer still may or may not be necessary. We do use op->timeout in a few other places for internal and D-Bus timers (but not as data passed to systemd).

Looking over this again, I think I assumed we were doing something that we were not. In systemd.c, you'll notice that we create override unit files for any service that pacemaker manages (just search for override). As I was skimming through the code, I saw that timeouts were mentioned where SYSTEMD_OVERRIDE_TEMPLATE is defined. However, now that I look again, I see that it was mentioned in the context of a TODO.

So I guess what I want to do is either (1) reinstate the timeouts and make them work, or (2) take care of the TODO and hopefully remove more timeout-related code. There's... a lot of it.

If you feel the need or inclination to do anything too complex, it might be worth doing another PR first. This one already has a lot going on (even though there isn't that much code), and I'm not even quite finished reviewing it.

That TODO should probably mention stop timeouts too.

I haven't looked at the monitor code, but the way you've described us using a poll to hack the monitor operation probably renders any timeout in the unit configuration irrelevant.

hopefully remove more timeout-related code

Yeah... obviously be careful. Some of it is timeouts for more generic things like D-Bus operations. Our current use of the op->timeout in some of those places may be incorrect or arbitrary in the first place. Really, the op->timeout should start counting down when the operation begins. But IIRC we're using the same timeout value for multiple steps along the way (without subtracting anything, which might be painful to do) -- and then the timeout + 5000 for the timer to check whether the "real" part of the start/stop action timed out.

We might be using it for things like reloading systemd, loading the unit, waiting for a response to enqueuing the start job, etc. (I'm going from memory here.) If we add the configured timeout to a unit override, then the full configured timeout will be available to systemd for the meat of the operation, in addition to using the timeout for all the setup/bookkeeping steps.

It's complicated (again, obviously) and we may have to settle for "good enough."

it might be worth doing another PR first

I see #3816

daemons/execd/execd_commands.c

lib/services/services.c

nrwahl2 · 2025-02-03T11:00:17Z

lib/services/systemd.c

+    if (!pcmk__str_eq(op->action, PCMK_ACTION_START, pcmk__str_casei)) {
+        services__finalize_async_op(op);
+    }
+


I'll give this some more thought tomorrow night. I'm thinking perhaps we can design the callback differently, but I haven't thought through the specifics. Something involving one of the following:

One message filter per unit that we care about, instead of a global message filter. We can use the user_data field where we're currently passing NULL, to give a unit name. D-Bus runs all message filters until it gets a DBUS_HANDLER_RESULT_HANDLED code. We can remove a unit's handler when we're done with it.

Similar, but by using a hash table or list more cleverly within the services library.

nrwahl2 · 2025-02-04T10:41:47Z

lib/services/systemd.c

-    // Determine result and finalize action
+    /* Determine result and finalize action.
+     *
+     * Start and stop actions must be finalized by the registerd systemd callback


s/registerd/registered/

clumens requested a review from nrwahl2 January 21, 2025 14:39

clumens commented Jan 21, 2025

View reviewed changes

daemons/execd/execd_commands.c Show resolved Hide resolved

clumens force-pushed the dbus-systemd branch from 47fcf1b to 9a1d97d Compare January 30, 2025 20:54

clumens marked this pull request as ready for review January 30, 2025 20:54

nrwahl2 reviewed Jan 31, 2025

View reviewed changes

daemons/execd/execd_commands.c Show resolved Hide resolved

clumens added 16 commits January 31, 2025 10:47

Refactor: libcrmservice: systemd_init should return a bool.

aa3e74c

Refactor: daemons: Fix whitespace problems in execd_commands.c.

a497f6a

Most importantly, replace a tab with spaces so the else blocks line up.

Refactor: daemons: Get rid of an unnecessary #endif/#ifdef.

9a11cb8

If you scroll up far enough, you'll see that the #endif closes out the same #ifdef that is immediately used again.

Refactor: daemons: Unindent the goagain block in action_complete.

8985e8b

If goagain is not set, just bail to the end of the function. Having done that, everything that was in the previous block can be unindented. I think this is a little more legible. No other code changes.

Feature: libcrmservice: Subscribe to systemd dbus signals.

31170b9

At the moment, all this does is log a trace message for all recieved signals. We'll add some filtering on this later.

Feature: libcrmservice: Only respond to JobRemoved signals.

396dfad

These are the ones that allow us to know whether a service has finished or not. In the future, we could perhaps open this filter up to other systemd signals if there's a need to do so.

Feature: systemd: Don't initiate a timer for start actions.

1590694

We'll be handling this through talking to systemd over dbus. systemd has its own concept of timeouts, so we don't need to add an additional timer on top of that.

Refactor: libcrmservices: Expose services__finalize_async_op.

8fc6d68

Instead of being private to libcrmservices, this should now just be an internal-only use function. We'll need to do this to call it from elsewhere.

Feature: daemons: Use dbus to detect systemd resources stopping.

1b6c951

This is just like starting resources, but a lot easier because there's almost no extra code. Related T25

clumens force-pushed the dbus-systemd branch from 9a1d97d to c0d99b9 Compare January 31, 2025 17:52

nrwahl2 reviewed Feb 3, 2025

View reviewed changes

clumens added the review: in progress PRs that are currently being reviewed label Feb 3, 2025

nrwahl2 mentioned this pull request Feb 3, 2025

Fix the FileAudit test in CTS for systemd systems #3809

Merged

clumens mentioned this pull request Feb 3, 2025

Prepare for the systemd+dbus PR #3816

Open

nrwahl2 reviewed Feb 4, 2025

View reviewed changes

nrwahl2 mentioned this pull request Feb 5, 2025

WIP: [ALT] Use dbus to tell when systemd has completed an action for us #3818

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use dbus to tell when systemd has completed an action for us #3805

Use dbus to tell when systemd has completed an action for us #3805

clumens commented Jan 21, 2025

clumens Jan 21, 2025

clumens Jan 21, 2025

nrwahl2 Feb 3, 2025

nrwahl2 Feb 4, 2025

nrwahl2 Feb 4, 2025 •

edited

Loading

nrwahl2 Feb 5, 2025

clumens Jan 21, 2025

nrwahl2 Feb 2, 2025

clumens Feb 3, 2025

nrwahl2 Feb 4, 2025

clumens commented Jan 22, 2025

clumens commented Jan 30, 2025

HideoYamauchi commented Jan 30, 2025 •

edited

Loading

nrwahl2 commented Jan 31, 2025

clumens commented Jan 31, 2025

clumens commented Jan 31, 2025

HideoYamauchi commented Feb 1, 2025

nrwahl2 left a comment

nrwahl2 Feb 1, 2025

nrwahl2 Feb 2, 2025

nrwahl2 Feb 3, 2025

nrwahl2 Feb 3, 2025

clumens Feb 3, 2025

nrwahl2 Feb 3, 2025 •

edited

Loading

nrwahl2 Feb 3, 2025

nrwahl2 Feb 3, 2025

nrwahl2 Feb 3, 2025

nrwahl2 Feb 4, 2025

Use dbus to tell when systemd has completed an action for us #3805

Are you sure you want to change the base?

Use dbus to tell when systemd has completed an action for us #3805

Conversation

clumens commented Jan 21, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nrwahl2 Feb 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

clumens commented Jan 22, 2025

clumens commented Jan 30, 2025

HideoYamauchi commented Jan 30, 2025 • edited Loading

nrwahl2 commented Jan 31, 2025

clumens commented Jan 31, 2025

clumens commented Jan 31, 2025

HideoYamauchi commented Feb 1, 2025

nrwahl2 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nrwahl2 Feb 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nrwahl2 Feb 4, 2025 •

edited

Loading

HideoYamauchi commented Jan 30, 2025 •

edited

Loading

nrwahl2 Feb 3, 2025 •

edited

Loading