Enable saving and restoring subsimulator state #765

kyllingstad · 2024-06-22T13:07:50Z

This is the first step towards closing #756. I've added functions corresponding to FMI 2.0's `fmi2{Get,Set,Free}FMUstate()` throughout the various layers of subsimulator interfaces and implementations:

* `cosim::slave` and its implementation in `cosim::fmi::v2::slave_instance`
* `cosim::simulator` and its implementation in `cosim::slave_simulator`

The API is very similar to the one defined by FMI, except that it represents saved states by numeric indices rather than opaque pointers.
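To make the contrast with FMI's opaque `fmi2FMUstate` pointers concrete, here is a minimal sketch of an index-based save/restore interface. Only the `state_index` alias and the `save_state()` name appear in this PR's diff; `restore_state()`, `release_state()` and the `toy_slave` class are hypothetical names chosen for illustration and should not be read as libcosim's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Mirrors the alias shown in the PR diff.
using state_index = int;

// Hypothetical toy slave whose entire state is a single double.
class toy_slave
{
public:
    void set_value(double v) { value_ = v; }
    double value() const { return value_; }

    // Saves the current state and returns an index that refers to it.
    state_index save_state()
    {
        // Reuse a previously released slot if there is one.
        for (std::size_t i = 0; i < saved_.size(); ++i) {
            if (!saved_[i]) {
                saved_[i] = value_;
                return static_cast<state_index>(i);
            }
        }
        saved_.push_back(value_);
        return static_cast<state_index>(saved_.size() - 1);
    }

    // Restores a previously saved state; the saved copy stays valid.
    void restore_state(state_index i) { value_ = slot(i).value(); }

    // Frees a saved state, much like fmi2FreeFMUstate() does for a pointer.
    void release_state(state_index i) { slot(i).reset(); }

private:
    // Returns the occupied slot for index i, or throws if i is invalid.
    std::optional<double>& slot(state_index i)
    {
        const auto n = static_cast<std::size_t>(i);
        if (i < 0 || n >= saved_.size() || !saved_[n]) {
            throw std::out_of_range("invalid state index");
        }
        return saved_[n];
    }

    double value_ = 0.0;
    std::vector<std::optional<double>> saved_;
};

// Exercises one save/restore/release cycle and returns the final value.
inline double index_demo()
{
    toy_slave s;
    s.set_value(1.0);
    const state_index saved = s.save_state();
    s.set_value(2.0);
    s.restore_state(saved);
    assert(s.value() == 1.0);
    s.release_state(saved);
    return s.value();
}
```

One practical consequence of using indices is that an invalid handle can be detected and reported as an exception, whereas a stale opaque pointer cannot be validated by the caller.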

This led me to also remove the `slave_state` and `state_guard` stuff that was in `slave_simulator.{hpp,cpp}`. The overloading of the "state" terminology became confusing, and it seemed like it was a lot of code for very little gain. (It was supposed to be a check of correct API usage, but I can't remember it ever actually catching a bug.)

Note: This is all about saving states in memory, not about converting them to a process-independent format (serialisation, step 2 of #756) or saving to persistent storage (#757).

~~This PR also fixes #762.~~ [Edit: Issue #762 has now been independently fixed (in the exact same way) by PR #766.]

kyllingstad · 2024-06-27T17:35:59Z

In case someone is in the middle of reviewing this, please note that I just pushed a commit which simplifies the changes to slave_simulator significantly.

In brief, rather than trying to save the entire internal state, which includes cached variable values and modifiers, we leave all state saving to the slave implementations and FMU code. Simulators which have active variable modifiers will simply refuse to save their state. I was never comfortable with my original attempt, because I wasn't sure that modifiers were properly saved. They're just std::function objects, which can point to any callable object, and there is no guarantee that copying one actually makes a deep copy of its entire state.

If it turns out that saving state which includes modifiers is a necessary feature, we can revisit it and make a proper implementation later.
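The concern about copying `std::function` objects can be shown with a small, self-contained sketch. The `modifier` alias and `make_ramp_modifier()` are hypothetical and only stand in for whatever callable a user might install; the point is that a copy of the `std::function` copies the callable, but not necessarily the state the callable refers to.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// A modifier maps the original variable value to a modified one.
using modifier = std::function<double(double)>;

// Hypothetical modifier that adds a growing offset on every call.
// The offset lives on the heap and is owned via shared_ptr, so it is
// NOT duplicated when the std::function holding the lambda is copied.
inline modifier make_ramp_modifier()
{
    auto offset = std::make_shared<double>(0.0);
    return [offset](double x) {
        *offset += 1.0;
        return x + *offset;
    };
}

// Returns what the "saved" copy yields after the live modifier has been
// called twice. A true snapshot would yield 11.0 here.
inline double shallow_copy_demo()
{
    modifier live = make_ramp_modifier();
    modifier saved = live; // attempt to snapshot the modifier

    const double first = live(10.0);  // offset becomes 1
    const double second = live(10.0); // offset becomes 2
    assert(first == 11.0 && second == 12.0);
    (void)first;
    (void)second;

    return saved(10.0); // offset becomes 3: the copy shares state
}
```

Here `shallow_copy_demo()` returns 13.0 rather than 11.0, because the copy and the original share the same offset.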

kyllingstad · 2024-07-01T07:16:00Z

I have now changed the target branch for this PR from master to the new dev/state-persistence branch. I am splitting the work on #756 and #757 over several PRs so it can be reviewed in manageable chunks, but I worry that I won't have the full picture of the changes needed before everything is done. Therefore, I'd like to keep it out of master until it's more mature. The dev/state-persistence branch can be merged into master when everything is done and we are happy with it.

restenb · 2024-07-02T14:38:22Z

> In brief, rather than trying to save the entire internal state, which includes cached variable values and modifiers, we leave all state saving to the slave implementations and FMU code. Simulators which have active variable modifiers will simply refuse to save their state.

The effect of the variable modifier on the simulation will be seen in the saved states, so in principle this omits information necessary for e.g. fully transparent rewind / playback functionality for a whole simulation where these "user actions" must also be tracked. I guess there's also the question of what exactly can happen if the FMU is restored to a previous state, but our cached values aren't.

The data intended to be saved in a void* fmi2_FMU_state_t is by definition completely unknown to the caller - it's whatever the implementing FMU needs to restore its state later? In other words there's no guarantee that the data there is suitable for certain uses, like serialization? Does that mean fmi2_capi_serialize_fmu_state should be used for serializable data instead?

I can't really find any information about how these functions are intended to be implemented by the FMU. Should we cooperate on some example FMUs? Is the Dahlquist FMU intended for testing the FMU state API?

Another question I have - say we want to save all state by default. Does this affect the current implementation - for example, do we need to start thinking about keeping a circular buffer of states for a configurable duration, for example? Is this to be done in execution::step, with saving & restoring state acting as a form of manipulator, or directly on each model instance within algorithm::do_step?

kyllingstad · 2024-07-03T07:52:04Z

Many good questions! I'll try to answer, but first, let me clarify something: This PR is not about serialisation at all. I split #756 into two tasks, where this PR addresses only the first one, namely saving the state in the FMU instance's internal memory. (I am almost done with the second task too, namely to enable serialisation of saved states for individual subsimulators. A PR on this is forthcoming. After that, I'll turn to #757, which is about saving, serialising, and persisting the entire simulation state to disk.)

That said, there is a use case for just being able to save states in memory too: It can be used by "re-stepping" algorithms, e.g. algorithms that roll back the last step(s) to a previous state if the error is too large, in order to repeat them with a smaller step size.
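A minimal sketch of such a re-stepping loop, under stated assumptions: `decay_model` is a hypothetical toy subsimulator (explicit Euler on dx/dt = -x) whose saved state is a plain value rather than an index, and the error estimate compares one full step against two half steps. None of this is libcosim code; it only illustrates how save/restore makes it possible to roll back a step and retry it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical toy subsimulator: dx/dt = -x, integrated with explicit Euler.
class decay_model
{
public:
    using state = double;

    state save_state() const { return x_; }
    void restore_state(state s) { x_ = s; }
    void do_step(double dt) { x_ += dt * (-x_); }
    double x() const { return x_; }

private:
    double x_ = 1.0;
};

struct run_stats
{
    double final_x;
    int accepted;
    int rejected;
};

// Steps the model from t = 0 to t_end, halving dt whenever the estimated
// local error of a step exceeds the tolerance.
inline run_stats simulate(double t_end, double dt, double tolerance)
{
    assert(dt > 0.0 && tolerance > 0.0);
    decay_model model;
    run_stats stats{0.0, 0, 0};
    double t = 0.0;
    while (t_end - t > 1e-12) {
        dt = std::min(dt, t_end - t); // don't step past the end time

        // Take one full step from the saved state ...
        const decay_model::state saved = model.save_state();
        model.do_step(dt);
        const double coarse = model.x();

        // ... then roll back and take two half steps instead.
        model.restore_state(saved);
        model.do_step(dt / 2);
        model.do_step(dt / 2);

        const double error = std::abs(model.x() - coarse);
        if (error > tolerance) {
            // Too inaccurate: roll back again and retry with a smaller step.
            model.restore_state(saved);
            dt /= 2;
            ++stats.rejected;
        } else {
            // Accept the more accurate two-half-step result.
            t += dt;
            ++stats.accepted;
        }
    }
    stats.final_x = model.x();
    return stats;
}
```

For example, `simulate(1.0, 0.5, 1e-4)` rejects the first few attempts, settles on a smaller step size, and ends close to exp(-1).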

> The effect of the variable modifier on the simulation will be seen in the saved states, so in principle this omits information necessary for e.g. fully transparent rewind / playback functionality for a whole simulation where these "user actions" must also be tracked.

I don't think the FMI state saving/serialisation functions were designed for playback. Their goal is to save the precise simulation state at a certain point in time, so you can

* go back to that point within the current simulation run, e.g. for error control, or
* start another simulation from the exact same point later¹

We don't need information about what has happened in the past for either of these use cases, only the complete state of the system at present.

In other words, it doesn't matter if modifiers have been applied and then disabled before we save the state, nor whether we intend to apply some modifiers after we have restored the state again.

> I guess there's also the question of what exactly can happen if the FMU is restored to a previous state, but our cached values aren't.

Yeah, that was a challenging point of this work. I have addressed it by calling set_variables() to transfer all cached values to the FMU instance before saving the state, and by calling get_variables() to repopulate the cache after restoring the state. That way, I basically hand over the responsibility for saving the variable values to the FMU (which a properly FMI-conforming FMU is supposed to handle correctly).

But that would not work as easily if modifiers were involved, so for now, I just want to forbid modifiers at the save point. We can revisit it later with a more sophisticated solution if the need arises, but for now, I think I'd like to gain some experience with the current, limited solution.
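The approach described in the two paragraphs above can be sketched roughly as follows. `toy_fmu` and `caching_simulator` are hypothetical stand-ins with a single real variable, not the actual `slave_simulator` implementation; the sketch only shows the order of operations: flush the cache before saving, repopulate it after restoring, and refuse to save while a modifier is installed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for an FMU instance with a single real variable.
class toy_fmu
{
public:
    void set_variable(double v) { value_ = v; }
    double get_variable() const { return value_; }

    // Saves the current value internally and returns a handle to it.
    int save_state()
    {
        saved_.push_back(value_);
        return static_cast<int>(saved_.size() - 1);
    }

    // Restores a saved value; throws std::out_of_range on a bad handle.
    void restore_state(int index)
    {
        value_ = saved_.at(static_cast<std::size_t>(index));
    }

private:
    double value_ = 0.0;
    std::vector<double> saved_;
};

// Hypothetical simulator layer that caches the variable value and may
// have a modifier installed on it.
class caching_simulator
{
public:
    using modifier = std::function<double(double)>;

    explicit caching_simulator(toy_fmu& fmu)
        : fmu_(fmu), cache_(fmu.get_variable())
    { }

    void set_value(double v) { cache_ = v; } // not yet sent to the FMU
    double value() const { return cache_; }
    void set_modifier(modifier m) { modifier_ = std::move(m); }

    int save_state()
    {
        if (modifier_) {
            throw std::runtime_error("cannot save state while a modifier is active");
        }
        fmu_.set_variable(cache_); // flush cached value to the FMU
        return fmu_.save_state();  // the FMU now owns the complete state
    }

    void restore_state(int index)
    {
        fmu_.restore_state(index);
        cache_ = fmu_.get_variable(); // repopulate the cache
    }

private:
    toy_fmu& fmu_;
    double cache_;
    modifier modifier_;
};

// Shows that a value which only existed in the cache survives a
// save/restore round trip.
inline double cache_demo()
{
    toy_fmu fmu;
    caching_simulator sim(fmu);
    sim.set_value(5.0);                 // only the cache holds 5.0 so far
    const int saved = sim.save_state(); // flush, then save inside the FMU
    sim.set_value(7.0);
    sim.restore_state(saved);
    assert(fmu.get_variable() == 5.0);
    return sim.value();
}
```

Without the flush in `save_state()`, the FMU would have saved its stale value 0.0, and the restore would silently lose the cached 5.0.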

> The data intended to be saved in a void* fmi2_FMU_state_t is by definition completely unknown to the caller - it's whatever the implementing FMU needs to restore its state later? In other words there's no guarantee that the data there is suitable for certain uses, like serialization?

From the perspective of the co-simulation master, the fmi2_FMU_state_t pointer is completely opaque. It is just a handle that we use to refer to a state that has been saved internally in the FMU instance.

> Does that mean fmi2_capi_serialize_fmu_state should be used for serializable data instead?

"In addition", not "instead". First you save the state to get an fmi2_FMU_state_t handle, then you pass that handle to fmi2_capi_serialize_fmu_state() to get a version of the state which is suitable for storage and later deserialisation. Working on it! :)

> I can't really find any information about how these functions are intended to be implemented by the FMU. Should we cooperate on some example FMUs?

The FMI Library functions we use here are just wrappers over FMI functions. For example, fmi2_import_get_fmu_state() corresponds to the FMI 2.0 function fmi2GetFMUstate(), whose semantics are described in the FMI 2.0 spec.

> Is the Dahlquist FMU intended for testing the FMU state API?

Exactly. And I'll be using it to test the serialisation API in my next PR.

> Another question I have - say we want to save all state by default. Does this affect the current implementation - for example, do we need to start thinking about keeping a circular buffer of states for a configurable duration, for example? Is this to be done in execution::step, with saving & restoring state acting as a form of manipulator, or directly on each model instance within algorithm::do_step?

I'm not sure what the use case would be for saving all state by default, unless you mean for playback, and then I'll reiterate my statement that that's not what this feature is for. Saving the entire state in each time step would be enormously costly.

I haven't gotten to the point where I'm dealing with the full system and simulation yet, but here are my current ideas:

* Sophisticated algorithm implementations can use the simulator save/restore state API for restepping, e.g. for error estimation and step size control.
* All algorithm implementations will have to support serialisation, which will consist of saving and serialising their own internal state, plus forwarding the results of the individual subsimulator save/serialise operations.
* `execution` will gain some functions which can be called to export and import serialised versions of the simulation state.
* We'll add new functions for saving/loading exported, serialised states to/from disk files.

¹ This is the use case we have in OptiStress. There, we want to run a large number of simulations from the same starting point, e.g. in an optimisation loop. But for the sake of performance, we'd like to avoid repeating the "warm-up period" before the system reaches the steady state that we'll then perturb.

kyllingstad · 2024-07-03T07:54:14Z

> I don't think the FMI state saving/serialisation functions were designed for playback.

In fact, I don't think they can even conceivably be used for playback, because the internal state of each FMU instance is just exported as a binary blob, and in general you don't know anything about the format of its contents.

restenb · 2024-07-04T10:52:32Z

include/cosim/algorithm/simulator.hpp

```diff
@@ -133,6 +133,55 @@ class simulator : public manipulable
     virtual step_result do_step(
         time_point currentT,
         duration deltaT) = 0;
+
+    /// A type used for references to saved states (see `save_state()`).
+    using state_index = int;
```


If a step is ongoing when save_state is called, should the FMU complete the step before returning its state? Should we also save a time_point, or possibly use time_point instead of int as index, to be able to immediately tie a saved_state to the step it was saved from? I see this is implemented in the state struct mock_slave, so on the other hand the caller of this API is free to handle simulator time as they see fit.

The FMI specification forbids calling the state saving/restoring functions when a step is ongoing (see state machine in sec. 4.2.4 of FMI v. 2.0.4), so I don't think this is an issue. We could consider checking it, but we don't support async stepping elsewhere in libcosim yet, so this can probably wait until we do.

The slave_simulator class does not have the current time point as part of its internal state, so it doesn't need to save it or associate it with the lower-level FMU state either. The FMU might do so itself (though we wouldn't know about it). I'm quite certain that this is something we'll have to do when we get to saving the full co-simulation state in fixed_step_algorithm, though.

This is a follow-up to #765 and the second and final step to close #756. Here, I've implemented functionality to export the internal state of individual subsimulators in a generic, structured form, and to import them again later. This exported form is intended as an intermediate step before serialisation and disk storage. The idea was to create a type that can be inspected and serialised to almost any file format we'd like. The type is defined by `cosim::serialization::node` in `cosim/serialization.hpp`. It is a hierarchical, dynamic data type with support for a variety of primitive scalar types and a few aggregate types: strings, arrays of nodes, dictionaries of nodes, and binary blobs. (Think JSON, only with more types.) It is based on Boost.PropertyTree.
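To give a feel for what such a hierarchical, dynamic type looks like, here is a standard-library-only sketch. It is explicitly not `cosim::serialization::node`, which the text says is based on Boost.PropertyTree; the `node` struct, its alternatives, and the helper functions below are all hypothetical, and the recursive use of `std::map` with a not-yet-complete element type relies on behaviour that the common standard library implementations provide in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical JSON-like node: a scalar, a string, an array of nodes,
// a dictionary of nodes, or a binary blob.
struct node
{
    using array = std::vector<node>;
    using dictionary = std::map<std::string, node>;
    using blob = std::vector<std::byte>;

    std::variant<
        std::monostate, // empty node
        bool,
        std::int64_t,
        double,
        std::string,
        array,
        dictionary,
        blob>
        value;
};

// Counts the leaves (non-array, non-dictionary nodes) below n.
inline std::size_t count_leaves(const node& n)
{
    if (const auto* a = std::get_if<node::array>(&n.value)) {
        std::size_t total = 0;
        for (const auto& child : *a) total += count_leaves(child);
        return total;
    }
    if (const auto* d = std::get_if<node::dictionary>(&n.value)) {
        std::size_t total = 0;
        for (const auto& entry : *d) total += count_leaves(entry.second);
        return total;
    }
    return 1;
}

// Builds an example "exported state" with made-up field names.
inline node make_example_state()
{
    node::dictionary root;
    root["time"] = node{0.25};
    root["name"] = node{std::string("dahlquist")};
    root["real_variables"] = node{node::array{node{1.0}, node{2.0}}};
    root["fmu_state"] = node{node::blob{std::byte{0xCA}, std::byte{0xFE}}};
    return node{std::move(root)};
}

// Builds the example and returns its number of leaves.
inline std::size_t node_demo()
{
    const node state = make_example_state();
    assert(std::holds_alternative<node::dictionary>(state.value));
    return count_leaves(state);
}
```

A structure like this can be walked generically, which is what makes it possible to write one serialiser per file format rather than one per subsimulator type. The binary blob alternative is where an FMU's opaque serialised state would naturally go.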

kyllingstad added the enhancement New feature or request label Jun 22, 2024

kyllingstad requested review from joakimono, restenb and davidhjp01 June 22, 2024 13:07

kyllingstad self-assigned this Jun 22, 2024

kyllingstad linked an issue Jun 22, 2024 that may be closed by this pull request

Initial subsimulator variable values are not available after setup() #762

Closed

This was referenced Jun 22, 2024

Initial subsimulator variable values are not available after setup() #762

Closed

Transfer variables after initialisation #766

Merged

Base automatically changed from bugfix/763-clean-up-fmus to master June 24, 2024 09:14

kyllingstad force-pushed the feature/756-state-saving branch from 3551162 to bc87812 Compare June 24, 2024 09:17

kyllingstad added 2 commits June 27, 2024 13:38

Don't save slave_simulator::impl::variableValues

7d363f4

It's just a temporary buffer.

Simplify state saving in slave_simulator

b1bb170

By simply not allowing state saving when modifiers are active (which is sketchy anyway), the implementation becomes much simpler.

Missed a change in proxyfmu in last commit

b4eae50

kyllingstad changed the base branch from master to dev/state-persistence July 1, 2024 07:11

restenb reviewed Jul 4, 2024

restenb approved these changes Jul 4, 2024

kyllingstad merged commit 981236a into dev/state-persistence Jul 4, 2024
20 checks passed

kyllingstad deleted the feature/756-state-saving branch July 4, 2024 13:01

kyllingstad mentioned this pull request Jul 4, 2024

Enable exporting and importing subsimulator state #769

Merged

kyllingstad mentioned this pull request Oct 17, 2024

Add interfaces to FMI functions for saving/serialising/deserialising/restoring FMU state #756

Closed

2 tasks
