Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds migrations to restore currupted staking ledgers in Polkadot and Kusama #447

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gpestana
Copy link
Contributor

@gpestana gpestana commented Aug 26, 2024

Note: for more details on the corrupted ledgers issue and recovery steps check https://hackmd.io/m_h9DRutSZaUqCwM9tqZ3g?view.

This PR adds a migration in Polkadot and Kusama runtimes to recover the current corrupted ledgers in Polkadot and Kusama. A migration consists of:

  1. Call into pallet_staking::Pallet::<T>::restore_ledger for each of the "whitelisted" stashes as Root origin.
  2. Performs a check that ensures the restored ledger's stake does not overflow the current stash's free balance. If that's the case, force unstake the ledger. This check is currently missing in polkadot-sdk/pallet-staking (PR with patch here).

The reason to restore the corrupted ledgers as migrations implemented in the fellowship runtimes is twofold:

  1. The call to pallet_staking::Pallet::<T>::restore_ledger and check + force_unstake must be done atomically (thus a ledger can't be safely restored with a set of two distinct extrinsic calls, so it's not possible to use referenda to this fx).
  2. To speed up the whole process and avoid having to wait for 1. merge and releases of Patches Call::Staking.restore_ledger to ensure a restored ledger has enough free balance to cover staking locks paritytech/polkadot-sdk#5066 and 2. referenda to call into Call::restore_ledger for both Polkadot and Kusama.

Alternatively, we could add a new temporary pallet directly in the fellowship runtime which would expose an extrinsic to restore the ledgers and perform the extra missing check. See this PR as an example.


  • on-runtime-upgrade tests against Polkadot and Kusama
  • staking try-state checks passing after all migrations.

@bkchr
Copy link
Contributor

bkchr commented Aug 27, 2024

This could just be an on chain referendum that calls the functions?

@gpestana
Copy link
Contributor Author

This could just be an on chain referendum that calls the functions?

We only can tell if a ledger needs force unstake at the time of recovery. We need to do the restore, check and potentially unstake atomically. In the time between the referenda are open and executed, new stashes can get in the situation that need to be force unstaked.

@kianenigma
Copy link
Contributor

@bkchr'ss comment still holds: the flaw you are pointing out, the ledgers changing between the time of proposal and enactment, also applies to the hardcoded list of ledgers here.

From my understanding in previous talks, the goal of bringing this code here was that the original pallet_staking::Pallet::<T>::restore_ledger had some flaws, and then I suggested you to copy-paste the code here in a "temporary" pallet.

@gpestana
Copy link
Contributor Author

gpestana commented Aug 27, 2024

@bkchr'ss comment still holds: the flaw you are pointing out, the ledgers changing between the time of proposal and enactment, also applies to the hardcoded list of ledgers here.

So there's two things at play here:

  1. The hardcoded list of accounts are fixed and should not change (since we fixed the runtime code so that there's no new corrupted ledgers).
  2. From the set of hardcoded (and corrupted ledgers), some may need to be unstaked. At this point, we know that one account needs to be unstaked in Polkadot, but it may change in the future (if the stash changes its free balance so that free_balance < restored stake of the ledger).

@kianenigma
Copy link
Contributor

I see, you are right!

I got a bit confused because I didn't look at the code and just checked the comments here.

Then I would rephrase the asnwer to @bkchr as: The logic here is a bit more than just what is available in a pallet-staking Call, therefore it has be a migration, or a temp pallet etc.

Indeed a migration seems cleaner than my original suggestion of a temp pallet, good choice!

@bkchr
Copy link
Contributor

bkchr commented Aug 27, 2024

Then I would rephrase the asnwer to @bkchr as: The logic here is a bit more than just what is available in a pallet-staking Call, therefore it has be a migration, or a temp pallet etc.

Bkchr says that he didn't know that the calls need to happen atomically. Then it makes sense to have them as a migration.

@mrshiposha
Copy link
Contributor

Hi @gpestana.

Yesterday, we tried to stake DOT via a multisig account using batchAll([staking.bond, staking.nominate]).
We got staking.BadState. I thought it was some issue with providers/consumers, but they seem OK on our account (providers = 1).

Do you think this issue is related to the one this PR fixes, or is it a separate one?

Note: our account isn't part of the list from this PR.

@gpestana
Copy link
Contributor Author

gpestana commented Sep 5, 2024

@mrshiposha a staking.BadState error is related to this PR yes. Do you want to reach out to me on Matrix so we can check the state of your account together? My handle is @gpestana:parity.io

Copy link
Contributor

@kianenigma kianenigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except adding the migration sturcts to the migration tuple + reporting on a dry-run thereof.

@gpestana
Copy link
Contributor Author

gpestana commented Sep 9, 2024

Just for the record, the issue that @mrshiposha has raised is not related to corrupted ledgers. The BadState error was raised due to trying to bond with a unfunded account as stash.

related to: paritytech/polkadot-sdk#5627

Comment on lines +1867 to +1868
T::AccountId::decode(&mut &Into::<[u8; 32]>::into(stash.clone())[..])
.expect("32 bytes fits, qed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another way to build AccountId
AccountId::from_str("ESGsxFePah1qb96ooTU4QJMxMKUG7NZvgTig3eJxP9f3wLa").unwrap();

// force unstake the ledger.
if ledger.total > T::Currency::free_balance(&stash_account) {
let slashing_spans = 10; // default slashing spans for migration.
let _ = pallet_staking::Pallet::<T>::force_unstake(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to force unstake only ledger.total - free_balance? Or the rationale is these ledgers are the overwritten ones and should be removed?

total_weight += weight;
}

log::info!(target: LOG_TARGET, "migrations::corrupted_ledgers: done");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: useful generally to print a summary such as success: x, errored: y

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants