Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable saving and restoring subsimulator state #765

Merged
merged 4 commits into from
Jul 4, 2024

Conversation

kyllingstad
Copy link
Member

@kyllingstad kyllingstad commented Jun 22, 2024

This is the first step towards closing #756. I've added functions corresponding to FMI 2.0's fmi2{Get,Set,Free}FMUstate() throughout the various layers of subsimulator interfaces and implementations:

  • cosim::slave and its implementation in cosim::fmi::v2::slave_instance
  • cosim::simulator and its implementation in cosim::slave_simulator

The API is very similar to the one defined by FMI, except that it represents saved states by numeric indices rather than opaque pointers.

This led me to also remove the slave_state and state_guard stuff that was in slave_simulator.{hpp,cpp}. The overloading of the "state" terminology became confusing, and it seemed like it was a lot of code for very little gain. (It was supposed to be a check of correct API usage, but I can't remember it ever actually catching a bug.)

Note: This is all about saving states in memory, not about converting them to a process-independent format (serialisation, step 2 of #756) or saving to persistent storage (#757).

This PR also fixes #762. [Edit: Issue #762 has now been independently fixed (in the exact same way) by PR #766.]

@kyllingstad kyllingstad added the enhancement New feature or request label Jun 22, 2024
@kyllingstad kyllingstad self-assigned this Jun 22, 2024
Base automatically changed from bugfix/763-clean-up-fmus to master June 24, 2024 09:14
This is the first step towards closing #756.  I've added functions
corresponding to FMI 2.0's `fmi2{Get,Set,Free}FMUstate()` throughout the
various layers of subsimulator interfaces and implementations:

* `cosim::slave` and its implementation in
   `cosim::fmi::v2::slave_instance`
* `cosim::simulator` and its implementation in `cosim::slave_simulator`

This led me to also remove the `slave_state` and `state_guard` stuff
that was in `slave_simulator.{hpp,cpp}`. The overloading of the "state"
terminology became confusing, and it seemed like it was a lot of code
for very little gain. (It was supposed to be a check of correct API
usage, but I can't remember it ever actually catching a bug.)

This commit also fixes #762.
@kyllingstad kyllingstad force-pushed the feature/756-state-saving branch from 3551162 to bc87812 Compare June 24, 2024 09:17
By simply not allowing state saving when modifiers are active (which is
sketchy anyway), the implementation becomes much simpler.
@kyllingstad
Copy link
Member Author

In case someone is in the middle of reviewing this, please note that I just pushed a commit which simplifies the changes to slave_simulator significantly.

In brief, rather than trying to save the entire internal state, which includes cached variable values and modifiers, we leave all state saving to the slave implementations and FMU code. Simulators which have active variable modifiers will simply refuse to save their state. I was never comfortable with my original attempt, because I wasn't sure that modifiers were properly saved. They're just std::function objects, which can point to any callable object, and there is no guarantee that copying one actually makes a deep copy of its entire state.

If it turns out that saving state which includes modifiers is a necessary feature, we can revisit it and make a proper implementation later.

@kyllingstad kyllingstad changed the base branch from master to dev/state-persistence July 1, 2024 07:11
@kyllingstad
Copy link
Member Author

I changed the target branch for this from master to the new dev/state-persistence branch now. I am splitting the work on #756 and #757 over several PRs so it can be reviewed in manageable chunks, but I worry that I won't have the full picture of the changes needed before everything is done. Therefore, I'd like to keep it out of master until it's more mature. The dev/state-persistence branch can be merged into master when everything is done and we are happy with it.

@restenb
Copy link
Member

restenb commented Jul 2, 2024

In brief, rather than trying to save the entire internal state, which includes cached variable values and modifiers, we leave all state saving to the slave implementations and FMU code. Simulators which have active variable modifiers will simply refuse to save their state.

The effect of the variable modifier on the simulation will be seen in the saved states, so in principle this omits information necessary for e.g. fully transparent rewind / playback functionality for a whole simulation where these "user actions" must also be tracked. I guess there's also the question of what exactly can happen if the FMU is restored to a previous state, but our cached values aren't.

The data intended to be saved in a void* fmi2_FMU_state_t is by definition completely unknown by the caller - it's whatever the implementing FMU needs to restore it's state later? In other words there's no guarantee that the data there is suitable for certain uses, like serialization? Does that mean fmi2_capi_serialize_fmu_state should be used for serializable data instead?

I can't really find any information about how these functions are intended to be implemented by the FMU. Should we cooperate on some example FMUs? Is the Dahlquist FMU intended for testing the FMU state API?

Another question I have - say we want to save all state by default. Does this affect the current implementation - for example, do we need to start thinking about keeping a circular buffer of states for a configurable duration, for example? Is this to be done in execution::step, with saving & restoring state acting as a form of manipulator, or directly on each model instance within algorithm::do_step?

@kyllingstad
Copy link
Member Author

kyllingstad commented Jul 3, 2024

Many good questions! I'll try to answer, but first, let me clarify something: This PR is not about serialisation at all. I split #765 into two tasks, where this PR addresses only the first one, namely saving the state in the FMU instance's internal memory. (I am almost done with the second task too, namely to enable serialisation of saved states for individual subsimulators. A PR on this is forthcoming. After that, I'll turn to #757, which is about saving, serialising, and persisting the entire simulation state to disk.)

That said, there is a use case for just being able to save states in memory too: It can be used by "re-stepping" algorithms, e.g. algorithms that roll back the last step(s) to a previous state if the error is too large, in order to repeat them with a smaller step size.

The effect of the variable modifier on the simulation will be seen in the saved states, so in principle this omits information necessary for e.g. fully transparent rewind / playback functionality for a whole simulation where these "user actions" must also be tracked.

I don't think the FMI state saving/serisalisation functions were designed for playback. Their goal is to save the precise simulation state at a certain point in time, so you can

  1. go back to that point within the current simulation run, e.g. for error control, or
  2. start another simulation from the exact same point later1

We don't need information about what has happened in the past for either of these use cases, only the complete state of the system at present.

In other words, it doesn't matter if modifiers have been applied and then disabled before we save the state, nor whether we intend to apply some modifiers after we have restored the state again.

I guess there's also the question of what exactly can happen if the FMU is restored to a previous state, but our cached values aren't.

Yeah, that was a challenging point of this work. I have addressed it by calling set_variables() to transfer all cached values to the FMU instance before saving the state, and by calling get_variables() to repopulate the cache after restoring the state. That way, I basically hand over the responsibility for saving the variable values to the FMU (which a properly FMI-conforming FMU is supposed to handle correctly).

But that would not work as easily if modifiers were involved, so for now, I just want to forbid modifiers at the save point. We can revisit it later with a more sophisticated solution if the need arises, but for now, I think I'd like to gain some experience with the current, limited solution.

The data intended to be saved in a void* fmi2_FMU_state_t is by definition completely unknown by the caller - it's whatever the implementing FMU needs to restore it's state later? In other words there's no guarantee that the data there is suitable for certain uses, like serialization?

From the perspective of the co-simulation master, the fmi2_FMU_state_t pointer is completely opaque. It is just a handle that we use to refer to a state that has been saved internally in the FMU instance.

Does that mean fmi2_capi_serialize_fmu_state should be used for serializable data instead?

"In addition", not "instead". First you save the state to get an fmi2_FMU_state_t handle, then you pass that handle to fmi2_capi_serialize_fmu_state() to get a version of the state which is suitable for storage and later deserialisation. Working on it! :)

I can't really find any information about how these functions are intended to be implemented by the FMU. Should we cooperate on some example FMUs?

The FMI Library functions we use here are just wrappers over FMI functions. For example, fmi2_import_get_fmu_state() corresponds to the FMI 2.0 function fmi2GetFMUstate(), whose semantics are described in the FMI 2.0 spec.

Is the Dahlquist FMU intended for testing the FMU state API?

Exactly. And I'll be using it to test the serialisation API in my next PR.

Another question I have - say we want to save all state by default. Does this affect the current implementation - for example, do we need to start thinking about keeping a circular buffer of states for a configurable duration, for example? Is this to be done in execution::step, with saving & restoring state acting as a form of manipulator, or directly on each model instance within algorithm::do_step?

I'm not sure what the use case would be for saving all state by default, unless you mean for playback, and then I'll reiterate my statement that that's not what this feature is for. Saving the entire state in each time step would be enormously costly.

I haven't gotten to the point where I'm dealing with the full system and simulation yet, but here are my current ideas:

  • Sophisticated algorithm implementations can use the simulator save/restore state API for restepping, e.g. for error estimation and step size control.
  • All algorithm implementations will have to support serialisation, which will consist of saving and serialising their own internal state, plus forwarding the results of the individual subsimulator save/serialise operations.
  • execution will gain some functions which can be called to export and import serialised versions of the simulation state.
  • We'll add new functions for saving/loading exported, serialised states to/from disk files.

Footnotes

  1. This is the use case we have in OptiStress. There, we want to run a large number of simulations from the same starting point, e.g. in an optimisation loop. But for the sake of performance, we'd like to avoid repeating the "warm-up period" before the system reaches the steady state that we'll then perturb.

@kyllingstad
Copy link
Member Author

I don't think the FMI state saving/serisalisation functions were designed for playback.

In fact, I don't think they can even conceivably be used for playback, because the internal state of each FMU instance is just exported as a binary blob, and in general you don't know anything about the format of its contents.

@@ -133,6 +133,55 @@ class simulator : public manipulable
virtual step_result do_step(
time_point currentT,
duration deltaT) = 0;

/// A type used for references to saved states (see `save_state()`).
using state_index = int;
Copy link
Member

@restenb restenb Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a step is ongoing when save_state is called, should the FMU complete the step before returning it's state? Should we also save a time_point, or possibly use time_point instead of int as index, to be able to immediately tie a saved_state to the step it was saved from? I see this is implemented in the state struct mock_slave, so on the other hand the caller of this API is free to handle simulator time as they see fit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FMI specification forbids calling the state saving/restoring functions when a step is ongoing (see state machine in sec. 4.2.4 of FMI v. 2.0.4), so I don't think this is an issue. We could consider checking it, but we don't support async stepping elsewhere in libcosim yet, so this can probably wait until we do.

The slave_simulator class does not have the current time point as part of its internal state, so it doesn't need to save it or associate it with the lower-level FMU state either. The FMU might do so itself (though we wouldn't know about it). I'm quite certain that this is something we'll have to do when we get to saving the full co-simulation state in fixed_step_algorithm, though.

@kyllingstad kyllingstad merged commit 981236a into dev/state-persistence Jul 4, 2024
20 checks passed
@kyllingstad kyllingstad deleted the feature/756-state-saving branch July 4, 2024 13:01
kyllingstad added a commit that referenced this pull request Jul 4, 2024
This is a follow-up to #765 and the final step to close #757.

Here, I've implemented functionality to export the internal state of
individual subsimulators in a generic, structured form, and to import
them again later.

This exported form is intended as an intermediate step before
serialisation and disk storage.  The idea was to create a type that can
be inspected and serialised to almost any file format we'd like.

The type is defined by `cosim::serialization::node` in
`cosim/serialization.hpp`.  It is a hierarchical, dynamic data type with
support for a variety of primitive scalar types and a few aggregate
types: strings, arrays of nodes, dictionaries of nodes, and binary
blobs. (Think JSON, only with more types.)
kyllingstad added a commit that referenced this pull request Jul 4, 2024
This is a follow-up to #765 and the second and final step to close #756.

Here, I've implemented functionality to export the internal state of
individual subsimulators in a generic, structured form, and to import
them again later.

This exported form is intended as an intermediate step before
serialisation and disk storage.  The idea was to create a type that can
be inspected and serialised to almost any file format we'd like.

The type is defined by `cosim::serialization::node` in
`cosim/serialization.hpp`.  It is a hierarchical, dynamic data type with
support for a variety of primitive scalar types and a few aggregate
types: strings, arrays of nodes, dictionaries of nodes, and binary
blobs. (Think JSON, only with more types.)
kyllingstad added a commit that referenced this pull request Oct 10, 2024
This is a follow-up to #765 and the second and final step to close #756.

Here, I've implemented functionality to export the internal state of
individual subsimulators in a generic, structured form, and to import
them again later.

This exported form is intended as an intermediate step before
serialisation and disk storage.  The idea was to create a type that can
be inspected and serialised to almost any file format we'd like.

The type is defined by `cosim::serialization::node` in
`cosim/serialization.hpp`.  It is a hierarchical, dynamic data type with
support for a variety of primitive scalar types and a few aggregate
types: strings, arrays of nodes, dictionaries of nodes, and binary
blobs. (Think JSON, only with more types.) It is based on
Boost.PropertyTree
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Initial subsimulator variable values are not available after setup()
2 participants