[prototype] automatic DB migration on nodeos upgrades #394
Background
This is a prototype for automatic DB migration when the DB is not compatible with the current version of nodeos (such as when chainbase's environment check fails, or chainbase's internal structures have changed). Currently under such conditions users are required to go through a manual upgrade process that includes manually removing files from nodeos' data directories and loading a snapshot (which some users may prefer to only source themselves for security reasons).
The basic premise of this approach is that when a DB is created or loaded successfully (in other words, at the end of nodeos startup), a shared library is dropped alongside the DB [1]. This shared library is built with the same compiler and chainbase library as the currently running nodeos. The shared library exposes one public function: create a portable snapshot. It's nice to lean on portable snapshots for the upgrade since that's already a very well-defined format, and new nodeos remains backward compatible with even OG snapshots.
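As a rough illustration of what a single-function library surface could look like, here is a minimal sketch. Everything in it is hypothetical: the name snapshotter_create_portable_snapshot, the status codes, and the placeholder write_portable_snapshot are not taken from the prototype. It assumes a C-linkage entry point that takes only plain C strings and never lets an exception escape, on the reasoning that the library and the nodeos loading it may come from different compilers.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status codes returned across the library boundary.
enum snapshotter_status : int {
   snapshotter_ok       = 0,
   snapshotter_bad_args = 1,
   snapshotter_failed   = 2
};

namespace {
// Placeholder for the real work (open the old DB, write the snapshot).
// In this sketch it only validates its input so the error path can be exercised.
void write_portable_snapshot(const std::string& state_dir, const std::string& snapshot_path) {
   if (state_dir.empty() || snapshot_path.empty())
      throw std::invalid_argument("empty path");
}
} // namespace

// The one public function (hypothetical name). C linkage and plain C types keep
// the boundary independent of either side's C++ standard library.
extern "C" int snapshotter_create_portable_snapshot(const char* state_dir,
                                                    const char* snapshot_path) noexcept {
   if (state_dir == nullptr || snapshot_path == nullptr)
      return snapshotter_bad_args;
   try {
      write_portable_snapshot(state_dir, snapshot_path);
      return snapshotter_ok;
   } catch (...) {
      // Exceptions are translated to a status code instead of crossing the boundary.
      return snapshotter_failed;
   }
}
```

The caller would only need the symbol name and this signature; whether the prototype's actual function looks anything like this is not stated here.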
When nodeos is started and it finds it needs to migrate the DB (because it's in an older format, or the environment check has failed), it loads the shared library alongside the existing DB and executes the function to create a snapshot. This function is quite simple (see new libsnapshotter), but an important aspect is that before creating the snapshot it rolls back to LIB (last irreversible block) state. Rolling back to LIB is required because portable snapshots don't convey any undo state; portable snapshots are generally expected to represent an irreversible block -- something like the create_snapshot RPC endpoint will wait for the block to become irreversible before making the snapshot available for use.
Once the snapshot has been created, the library is unloaded and the new nodeos' controller is started as if it was commanded to start from a snapshot. That includes automatic removal of the previous state files (similar to how it would be done manually).
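The rollback-to-LIB step can be sketched with a toy model. This is not chainbase's API or the prototype's code: toy_state_db, toy_snapshot, and snapshot_at_lib are made-up names, and the toy keeps a full copy of state per block where a real DB would keep undo deltas. It only shows the ordering: discard reversible revisions until the head is the LIB, then snapshot what remains.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Toy stand-in for a state DB with an undo stack (hypothetical, not chainbase).
struct toy_state_db {
   struct revision {
      uint32_t    block_num;
      std::string state;   // state as of this block
   };
   std::vector<revision> revisions;   // oldest first; back() is the head

   uint32_t head_block_num() const { return revisions.back().block_num; }

   // Discard the newest (reversible) revision.
   void undo() {
      if (revisions.size() <= 1)
         throw std::logic_error("nothing left to undo");
      revisions.pop_back();
   }
};

// A snapshot carries state for exactly one block and no undo history.
struct toy_snapshot {
   uint32_t    block_num;
   std::string state;
};

// Roll the DB back until its head is the LIB, then capture that state.
toy_snapshot snapshot_at_lib(toy_state_db& db, uint32_t lib_block_num) {
   if (db.revisions.empty())
      throw std::invalid_argument("empty DB");
   if (lib_block_num < db.revisions.front().block_num)
      throw std::invalid_argument("LIB is older than the oldest state kept");

   while (db.head_block_num() > lib_block_num)
      db.undo();

   return toy_snapshot{db.head_block_num(), db.revisions.back().state};
}
```

For example, with revisions for blocks 10, 11, and 12 and a LIB of 10, the two reversible revisions are dropped and the snapshot describes block 10. Where the LIB block number comes from is outside this sketch.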
Testing
The easiest way to play with it is to grab a build with gcc and a build with clang (such as the nonpinned build and pinned build) and switch between them. Today that would not work due to the environment check, but with this change it works seamlessly (with a delay for creating & restoring the snapshot).
Notable Changes
(Besides the path changes discussed elsewhere) the other big change stems from me not wanting to use controller. Creating and destroying a controller just has a lot of side effects and potential failure modes I'd rather avoid. So some member functions in controller (and resource_limits_manager & authorization_manager) are refactored out as static. This actually ended up not too bad. There would still need to be some cleanup done to locations, naming, and some public/private/const aspects of course, but the changes don't seem to grossly violate anything.
Problems
Whoops, it isn't building in CI due to a dependency failure in ninja; will look into a good solution.
The current shared library approach has an outstanding problem: nodeos will SEGV at the end of its execution (after main) when performing an upgrade. Something in LLVM is unhappy about either being placed in a shared library or how the library is loaded in a linker namespace. Since libsnapshotter doesn't use controller, ideally it wouldn't even be linking with LLVM etc. But that's too much refactoring to do with libchain I think. Ultimately, for now, this will probably need to be changed to a separate process so that there is no library funny business.
Footnotes
[1] When I first proposed this approach I had in my mind that only the chainbase DB was required to generate a snapshot. Thus, I proposed that the shared library be embedded into the chainbase DB. But of course, we need a wee bit of forkdb to generate a snapshot. So the current implementation elects to just leave the shared library in the state directory.