Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trivial extrinsic stalled the parachain #84

Closed
brenzi opened this issue Mar 21, 2022 · 18 comments · May be fixed by #126
Closed

trivial extrinsic stalled the parachain #84

brenzi opened this issue Mar 21, 2022 · 18 comments · May be fixed by #126
Assignees

Comments

@brenzi
Copy link
Collaborator

brenzi commented Mar 21, 2022

sending a simple sudo.sudo(balances.setBalance()) extrinsic on a new account bricked our parachain (again)

The tx has not yet been included in a finalized block, so it seemns this extrinsic somehow breaks our PoV

2: author.pendingExtrinsics
[
  {
    isSigned: true
    method: {
      args: {
        call: {
          args: {
            who: {
              Id: 2LNz3ia9jQsgpCdRVu4MUhrar6LrVue6aCYnCebEhoegJeW2
            }
            new_free: 1,000,000,000,000
            new_reserved: 0
          }
          method: setBalance
          section: balances
        }
      }
      method: sudo
      section: sudo
    }
    era: {
      MortalEra: {
        period: 32
        phase: 9
      }
    }
    nonce: 0
    signature: 0x0ae814e0fd738dceae36e912a6a3341da1f448e34f4a0c9280ecc5be73c633311c2e50aab790f14a49d29ea75c8c625fb272e6ec98978edf173b9fbbb0b69488
    signer: {
      Id: 2M18kekfuKffp858CFeguSN6VnvfWrmmC4hxj6P4AWdpR8CW
    }
    tip: 0
  }
]

1: author.submitAndWatchExtrinsic
{
  events: []
  status: {
    Broadcast: [
      12D3KooWT2dtwAeFxkPa7csfvYZJgUvVxYTZHWE5fFb4fxcz39Gp
      12D3KooWS7mrujYYeqJXz2u3p2ZPK3Q1zvGFYCaQoV5euZbr6iug
      12D3KooWBR4UmiGadvQ77m6iwWuoEUwgRkFMMhzhecXn2Hc9QUhC
      12D3KooW9yxCm9MoTTtmH1cEQHwnTKhv3bCoiFwbRhRJU7sae4Yd
      12D3KooWSoYm9LrECusAcSJp7W566TPkHk5guwVCkvTLZbC8dKpu
    ]
  }
}
@mosonyi
Copy link
Collaborator

mosonyi commented Mar 21, 2022

I don't know if it is related but I observed this in the logs:
Error: Service(Client(Backend("Io error: Permission denied (os error 13)")))

@brenzi
Copy link
Collaborator Author

brenzi commented Mar 21, 2022

we could purge the pending extrinsic with rpc author_removeExtrinsic, but I don't know yet how to get the extrinsic hash.

ah. js/apps is so kind to tell us the hash if you go to Network -> Node Info -> Pending extrinsics.

execute this on all collators, banning the evil extrinsic:

curl -H "Content-Type: application/json" -d '{"id":26,"jsonrpc":"2.0","method":"author_removeExtrinsic","params":[[{"hash":"0xfe7ed1221547feac28b4c8afef1916d39a73dce195ab1d315dbc9ce53300c404"}]]}' http://localhost:9933/

@brenzi brenzi mentioned this issue Mar 21, 2022
8 tasks
@brenzi
Copy link
Collaborator Author

brenzi commented Mar 21, 2022

Tested a simple balance transfer which should be filtered anyway because of our extrinsic filter in the shell runtime. stalls the parachain as well

5: author.pendingExtrinsics
[
  {
    isSigned: true
    method: {
      args: {
        dest: {
          Id: 2LNz3ia9jQsgpCdRVu4MUhrar6LrVue6aCYnCebEhoegJeW2
        }
        value: 1,000,000,000,000
      }
      method: transferKeepAlive
      section: balances
    }
    era: {
      MortalEra: {
        period: 32
        phase: 26
      }
    }
    nonce: 0
    signature: 0x00a1b29a38a3c86a56c5aa2bbfb3340e7f7d8391548d19cd6a005130518d2c034391774da3b263aa3b7dda04e28d290adee6b90e0305eadd13e2203372343885
    signer: {
      Id: 2M18kekfuKffp858CFeguSN6VnvfWrmmC4hxj6P4AWdpR8CW
    }
    tip: 0
  }
]

hash: 0xdfbb8bff0d46a56f2ab4c766141833914111053e72576e6b1e9fed84f90f4ade

@mosonyi
Copy link
Collaborator

mosonyi commented Mar 21, 2022

I don't know if it is related but I observed this in the logs:
Error: Service(Client(Backend("Io error: Permission denied (os error 13)")))

Forget this, coming from somewhere else, already fixed.

@mosonyi
Copy link
Collaborator

mosonyi commented Mar 21, 2022

@brenzi I did again the #77 unbricking and tried a transfer -> filtered -> ok
tried the sudo call-> Result OK

@brenzi
Copy link
Collaborator Author

brenzi commented Mar 21, 2022

after discussing the setup: we may need to repeat because the SetCurrentHead set the head back to the same genesis, not to a new one, which isn't exactly the same setup.

edit: the above is not true. still we played it through with the new binary 1.5.13 but really can't reproduce #77 (comment)

@brenzi
Copy link
Collaborator Author

brenzi commented Mar 24, 2022

I believe I found a problem:

The Kusama relay chain tells me for paraid 2015:

paras.currentCodeHash: Option<H256>
0x30317213c02289ca9445f1781daeb29ff3d015e059fbe3b9a582c997ad1afbeb

which is the hash of our runtime V2

Our collator thinks we're on runtime V3

queried on our api node wss://kusama.api.integritee.network

const system.version: SpVersionRuntimeVersion
...
specVersion: 3

which would hash to

0x9529f0cfed07681fac22e397eea13952660a351a15fdfaf4a21397ba3094d5c4

I was actually wondering why our runtime jumped to V3 just because we changed forceSetCurrentHead

It could be related to this error message in the collator log:

Mar 24 10:10:45 iku-node-04 integritee-collator[3366956]: 2022-03-24 10:10:45 [Relaychain] cannot query the runtime API version: UnknownBlock: State already discarded for BlockId::Hash(0x08d0cd920c2964da2ea6bc1150b043e75902a3a4c919f9e13f68bebb4e3e4c79)

the above refers to a relaychain block hash:
https://github.com/paritytech/polkadot/blob/00e7262eb4c4b664d849c0688dc9016d8369da7b/node/core/runtime-api/src/lib.rs#L373-L379

@brenzi
Copy link
Collaborator Author

brenzi commented Mar 24, 2022

I was able to reproduce our Kusama parachian issue on rococo-local. Follow this precisely:

using release binaries:

save binaries in ~/bin/ using version suffix, see below
Also, build polkadot-0.9.17 binary

runtimes (just for reference. no need to download for this test):

  • V2 with hash: 0x30317213c02289ca9445f1781daeb29ff3d015e059fbe3b9a582c997ad1afbeb
  • V3 with hash 0x9529f0cfed07681fac22e397eea13952660a351a15fdfaf4a21397ba3094d5c4

polkadot-launch script

procedure

  1. node ../polkadot-launch/dist/cli.js polkadot-launch/launch-rococo-local-with-shell-rococo-1.4.12.json

    • collator should produce blocks after 2-3 minutes and runtime version should be V2
  2. brick chain: sudo.sudo(balances.setBalance(Alice, 0))

    • sudo account should be reaped. check low level events for system.KilledAccount in the block including the sudo call
  3. prepare fix genesis

~/bin/integritee-collator-1.5.13 build-spec --chain=shell-rococo-local-dev > new-head-spec.json

now edit the spec and set

  "para_id": 2015,
...
      "parachainInfo": {
        "parachainId": 2015
      },
~/bin/integritee-collator-1.5.13 build-spec --chain new-head-spec.json --raw > new-head-spec-raw.json
~/bin/integritee-collator-1.5.13 export-genesis-state --chain new-head-spec-raw.json > new-head.state

new-head.state:

0x000000000000000000000000000000000000000000000000000000000000000000104ef43d943c58bb485a586668a381240763ed9118c38720a2b927ba0032a44703170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c11131400
  1. relay: sudo.sudo(paras.forceSetCurrentHead(2015, new-head.state)

    • the parachain should stall now. no new blocks
  2. now, start a collator on fresh genesis:

~/bin/integritee-collator-1.4.12 --tmp --dave --ws-port 9946 --port 31201 --chain new-head-spec-raw.json -- --chain rococo-local-raw.json
and a second one
~/bin/integritee-collator-1.4.12 --tmp --eve --ws-port 9947 --port 31202 --chain new-head-spec-raw.json -- --chain rococo-local-raw.json

  • point js/apps to port 9946 and observe new blocks being produced, starting again with 0
  • observe that the runtime version is V3 (as the genesis we generated was produced by the collator 1.5.13
  1. stall parachain by sending balances.transfer(Alice, Alice_stash, 1TEER)
    • observe that no new blocks are accepted by the relaychain. best will stay constant
  2. remove extrinsic from pool with author.removeExtrinsic. look up extrinsic hash under NodeInfo
    • observe that blocks will be produced again

Ok, so now we have the same problem like on Kusama.

diagnose

The relaychain thinks we're still running V2, but the collator derive from the newly generated genesis that it has to be V3

  • check paras.currentCodeHash on relay. it will still be on the old runtime ( in the rococo-local case 0xa271919f7d988511b15e55ef7689c8f4215f972955ce44db59f889a652f0f0e8)

why didn't we see this when we tested our fix with forceSetCurrentHead?

  • suspicion: The problem is that our test of the fix was done before we released our new collator involving the fix. therefore, the runtime spec_version was not bumped for our test and the problem didn't show

fix options

We can fix it by relaychain governance:
sudo.sudoUncheckedWeight(paras.forceSetCurrentCode(2015, <V3>.wasm)

then extrinsics will be accepted again. But we'd like to avoid annoying Kusama governance again.

  • can we somehow force the collator to use V2 runtime although genesis says it should be V3?

@mosonyi
Copy link
Collaborator

mosonyi commented Mar 24, 2022

@brenzi congrats, good catch!

@brenzi
Copy link
Collaborator Author

brenzi commented Mar 24, 2022

solution idea 1

Basically, we need to perform a runtime upgrade with a sudo call on our parachain. If that succeeds, everything is back to normal.

a simple sudo extrinsic will not be accepted by relay because the signed extra is different for V3 (which the para uses now) than V2 (which the relay validators use).

https://github.com/scs/substrate-api-client/blob/b26ad4db237c11c1aff3241c04db9b1beb48a081/compose-macros/src/lib.rs#L80-L83

signed extra includes spec_version and therefore, the PoV will include an extrinsic whose signature will be wrong if verifed against runtime V2

We could patch substrate where it verifies extrinsic signatures and hard-code runtime_version to 2 there. That patch would be used to build a temporary collator binary which only serves to do the runtime upgrade

we would need to patch substrate-api-client too, to compose a fake spec-version extrinsic for sudo setCode manually

@brenzi
Copy link
Collaborator Author

brenzi commented Mar 25, 2022

Here are the relaychain logs for the rococo-local case: It seems the suspicion above was right

2022-03-24 22:14:23 panicked at 'Transaction has a bad signature', /home/runner/.cargo/git/checkouts/substrate-7e08433d4c370a21/4aeb95f/frame/executive/src/lib.rs:388:17    

later we also see this, but that may be because I let my laptop sleep in between (check times in log)

2022-03-24 22:37:48 Error with block built on 0x84c5db1c4aca3eeddef40ccc7c8572ee4b37243e77aefca28c57f4b92263154b: Import failed: Unexpected epoch change

@haerdib
Copy link
Contributor

haerdib commented Apr 4, 2022

We successfully updated our parachain runtime from version v3 to v3 with a set_code_without_checks call. The relay chain accepted the update, so the parachain runtime and the runtime the relay chain knows of, are finally matching again.

To make this happen, we had to tweak the following:

diff --git a/compose-macros/src/lib.rs b/compose-macros/src/lib.rs
index 5ef9ade..ddea999 100644
--- a/compose-macros/src/lib.rs
+++ b/compose-macros/src/lib.rs
@@ -133,7 +133,7 @@ macro_rules! compose_extrinsic {
                     Era::Immortal,
                     $api.genesis_hash,
                     $api.genesis_hash,
-                    $api.runtime_version.spec_version,
+                    2,
                     $api.runtime_version.transaction_version
                 )
             } else {
diff --git a/examples/example_setcode_extrinsic.rs b/examples/example_setcode_extrinsic.rs
new file mode 100644
index 0000000..5346832
--- /dev/null
+++ b/examples/example_setcode_extrinsic.rs
@@ -0,0 +1,76 @@
+/*
+    Copyright 2019 Supercomputing Systems AG
+    Licensed under the Apache License, Version 2.0 (the "License");
+    you may not use this file except in compliance with the License.
+    You may obtain a copy of the License at
+        http://www.apache.org/licenses/LICENSE-2.0
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+*/
+
+///! Example that shows how to compose a setCode extrinsic.
+/// Upgrade from node 1.05 to 1.0.6: use wasm integritee_node_runtime-v6.compact.wasm
+use clap::{load_yaml, App};
+use sp_core::crypto::Ss58Codec;
+use sp_core::{crypto::Pair, sr25519};
+use support::weights::Weight;
+
+use substrate_api_client::rpc::WsRpcClient;
+use substrate_api_client::{compose_call, compose_extrinsic, Api, UncheckedExtrinsicV4, XtStatus};
+
+fn main() {
+    env_logger::init();
+    let url = get_node_url_from_cli();
+
+    // Alice's seed: subkey inspect //Alice
+    let from: sr25519::Pair = Pair::from_string(
+        "0xe5be9a5092b81bca64be81d212e7f2f9eba183bb7a90954f7b76361f6edb5c0a",
+        None,
+    )
+    .unwrap();
+    println!("signer account: {}", from.public().to_ss58check());
+
+    let client = WsRpcClient::new(&url);
+    let api = Api::new(client)
+        .map(|api| api.set_signer(from.clone()))
+        .unwrap();
+
+    //let new_wasm: &[u8] = include_bytes!("integritee_node_runtime-v6.compact.wasm");
+    let new_wasm: &[u8] = include_bytes!("shell_runtime-v3.compact.compressed.wasm");
+
+    // this call can only be called by sudo
+    #[allow(clippy::redundant_clone)]
+    let call = compose_call!(
+        api.metadata.clone(),
+        "System",
+        "set_code_without_checks",
+        new_wasm.to_vec()
+    );
+
+    let weight: Weight = 0;
+
+    #[allow(clippy::redundant_clone)]
+    let xt: UncheckedExtrinsicV4<_> =
+        compose_extrinsic!(api.clone(), "Sudo", "sudo_unchecked_weight", call, weight);
+
+    // send and watch extrinsic until finalized
+    let tx_hash = api
+        .send_extrinsic(xt.hex_encode(), XtStatus::InBlock)
+        .unwrap();
+
+    println!("[+] Transaction got included. Hash: {:?}", tx_hash);
+}
+
+pub fn get_node_url_from_cli() -> String {
+    let yml = load_yaml!("cli.yml");
+    let matches = App::from_yaml(yml).get_matches();
+
+    let node_ip = matches.value_of("node-server").unwrap_or("ws://127.0.0.1");
+    let node_port = matches.value_of("node-port").unwrap_or("9944");
+    let url = format!("{}:{}", node_ip, node_port);
+    println!("Interacting with node on {}\n", url);
+    url
+}
diff --git a/examples/shell_runtime-v3.compact.compressed.wasm b/examples/shell_runtime-v3.compact.compressed.wasm
new file mode 100644
index 0000000..66283c5
Binary files /dev/null and b/examples/shell_runtime-v3.compact.compressed.wasm differ
diff --git a/primitives/io/src/lib.rs b/primitives/io/src/lib.rs
index a677f418be..6c191057c8 100644
--- a/primitives/io/src/lib.rs
+++ b/primitives/io/src/lib.rs

@@ -654,7 +654,7 @@ pub trait Crypto {
        ///
        /// Returns `true` when the verification was successful.
        fn ed25519_verify(sig: &ed25519::Signature, msg: &[u8], pub_key: &ed2551                                                                                                                                                             9::Public) -> bool {
-               ed25519::Pair::verify(sig, msg, pub_key)
+               true
        }

        /// Register a `ed25519` signature for batch verification.
@@ -671,9 +671,7 @@ pub trait Crypto {
                msg: &[u8],
                pub_key: &ed25519::Public,
        ) -> bool {
-               self.extension::<VerificationExt>()
-                       .map(|extension| extension.push_ed25519(sig.clone(), pub                                                                                                                                                             _key.clone(), msg.to_vec()))
-                       .unwrap_or_else(|| ed25519_verify(sig, msg, pub_key))
+               true
        }

        /// Verify `sr25519` signature.
@@ -681,7 +679,7 @@ pub trait Crypto {
        /// Returns `true` when the verification was successful.
        #[version(2)]
        fn sr25519_verify(sig: &sr25519::Signature, msg: &[u8], pub_key: &sr2551                                                                                                                                                             9::Public) -> bool {
-               sr25519::Pair::verify(sig, msg, pub_key)
+               true
        }

        /// Register a `sr25519` signature for batch verification.
@@ -698,9 +696,7 @@ pub trait Crypto {
                msg: &[u8],
                pub_key: &sr25519::Public,
        ) -> bool {
-               self.extension::<VerificationExt>()
-                       .map(|extension| extension.push_sr25519(sig.clone(), pub                                                                                                                                                             _key.clone(), msg.to_vec()))
-                       .unwrap_or_else(|| sr25519_verify(sig, msg, pub_key))
+               true
        }

        /// Start verification extension.
@@ -721,15 +717,7 @@ pub trait Crypto {
        ///
        /// Will panic if no `VerificationExt` is registered (`start_batch_verif                                                                                                                                                             y` was not called).
        fn finish_batch_verify(&mut self) -> bool {
-               let result = self
-                       .extension::<VerificationExt>()
-                       .expect("`finish_batch_verify` should only be called aft                                                                                                                                                             er `start_batch_verify`")
-                       .verify_and_clear();
-
-               self.deregister_extension::<VerificationExt>()
-                       .expect("No verification extension in current context!")                                                                                                                                                             ;
-
-               result
+               true
        }

        /// Returns all `sr25519` public keys for the given key id from the keys                                                                                                                                                             tore.
@@ -779,7 +767,7 @@ pub trait Crypto {
        /// Returns `true` when the verification in successful regardless of
        /// signature version.
        fn sr25519_verify(sig: &sr25519::Signature, msg: &[u8], pubkey: &sr25519                                                                                                                                                             :
        /// Will panic if no `VerificationExt` is registered (`start_batch_verify` was not called).
        fn finish_batch_verify(&mut self) -> bool {
-               let result = self
-                       .extension::<VerificationExt>()
-                       .expect("`finish_batch_verify` should only be called after `start_batch_verify`")
-                       .verify_and_clear();
-
-               self.deregister_extension::<VerificationExt>()
-                       .expect("No verification extension in current context!");
-
-               result
+               true
        }

        /// Returns all `sr25519` public keys for the given key id from the keystore.
@@ -779,7 +767,7 @@ pub trait Crypto {
        /// Returns `true` when the verification in successful regardless of
        /// signature version.
        fn sr25519_verify(sig: &sr25519::Signature, msg: &[u8], pubkey: &sr25519::Public) -> bool {
-               sr25519::Pair::verify_deprecated(sig, msg, pubkey)
+               true
        }

        /// Returns all `ecdsa` public keys for the given key id from the keystore.
@@ -843,7 +831,7 @@ pub trait Crypto {
        ///
        /// Returns `true` when the verification was successful.
        fn ecdsa_verify(sig: &ecdsa::Signature, msg: &[u8], pub_key: &ecdsa::Public) -> bool {
-               ecdsa::Pair::verify_deprecated(sig, msg, pub_key)
+               true
        }

        /// Verify `ecdsa` signature.
@@ -851,7 +839,7 @@ pub trait Crypto {
        /// Returns `true` when the verification was successful.
        #[version(2)]
        fn ecdsa_verify(sig: &ecdsa::Signature, msg: &[u8], pub_key: &ecdsa::Public) -> bool {
-               ecdsa::Pair::verify(sig, msg, pub_key)
+               true
        }

        /// Verify `ecdsa` signature with pre-hashed `msg`.
@@ -862,7 +850,7 @@ pub trait Crypto {
                msg: &[u8; 32],
                pub_key: &ecdsa::Public,
        ) -> bool {
-               ecdsa::Pair::verify_prehashed(sig, msg, pub_key)
+               true
        }

        /// Register a `ecdsa` signature for batch verification.
@@ -879,9 +867,7 @@ pub trait Crypto {
                msg: &[u8],
                pub_key: &ecdsa::Public,
        ) -> bool {
-               self.extension::<VerificationExt>()
-                       .map(|extension| extension.push_ecdsa(sig.clone(), pub_key.clone(), msg.to_vec()))
-                       .unwrap_or_else(|| ecdsa_verify(sig, msg, pub_key))
+               true
        }

@brenzi
Copy link
Collaborator Author

brenzi commented Apr 4, 2022

the consequence of this fix - however - is that we now have an invalid signature in our collator parachain history and syncing a collator from scratch will stall at the block where we sent the code update. This can be fixed by applying the patched collator binary to go beyond that block and once synched, every "proper" collator binary will work again

@brenzi
Copy link
Collaborator Author

brenzi commented Apr 8, 2022

@mosonyi please add artifacts to release

  • parachain snapshot (post-fix)
  • collator-hotfix-binary

https://github.com/integritee-network/parachain/releases/tag/1.5.15

then we can close this issue

@mosonyi
Copy link
Collaborator

mosonyi commented Apr 21, 2022

@brenzi done

@brenzi
Copy link
Collaborator Author

brenzi commented Apr 22, 2022

@mosonyi I added instructions to https://github.com/integritee-network/parachain/releases/tag/1.5.15
But please:

  • add more info as needed
  • is the snapshot for validators, archive nodes or what? Please upload all snapshots anyone might ever need and document ehn to use what

I'd like these release notes to be comprehensive documentation of how to sync our parachain. Otherwise we'll have to explain 100x

@murerfel murerfel assigned mosonyi and unassigned haerdib Apr 26, 2022
@haerdib haerdib linked a pull request May 18, 2022 that will close this issue
@haerdib
Copy link
Contributor

haerdib commented May 18, 2022

For traceability, I have pushed two new branches:

Tweaked substrate version: https://github.com/integritee-network/substrate/tree/polkadot-v0.9.20-skip-signature-verification
Patched collator: https://github.com/integritee-network/parachain/tree/skip-signature-verification (PR #126 )

The collator binaries can be found here: https://github.com/integritee-network/parachain/actions/runs/2343727441

@mosonyi
Copy link
Collaborator

mosonyi commented Jun 27, 2022

The documentation is in wiki.

@mosonyi mosonyi closed this as completed Jun 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants