zombienet-bridges-* test failures #6161

Closed
serban300 opened this issue Oct 21, 2024 · 8 comments · Fixed by #6175
Labels: I2-bug (The node fails to follow expected behavior.), T10-tests (This PR/Issue is related to tests.), T15-bridges (This PR/Issue is related to bridges.)

Comments


serban300 commented Oct 21, 2024

The bridge zombienet tests are failing.
For example:
https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7607843
https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7609603

We need to investigate this.

@serban300 serban300 added T10-tests This PR/Issue is related to tests. T15-bridges This PR/Issue is related to bridges. labels Oct 21, 2024
@serban300 serban300 self-assigned this Oct 21, 2024
@serban300 serban300 added the I2-bug The node fails to follow expected behavior. label Oct 21, 2024

bkontur commented Oct 21, 2024

I think this is related to #6133

serban300 commented

> I think this is related to #6133

The error in the linked failure is not related to this:

chainHead_v1_unpin, chainSpec_v1_chainName, chainSpec_v1_genesisHash, chainSpec_v1_properties, transactionWatch_v1_submitAndWatch, transactionWatch_v1_unwatch, transaction_v1_broadcast, transaction_v1_stop
<--- Last few GCs --->
[5198:0x639f8b0]    44928 ms: Mark-sweep 4076.2 (4143.0) -> 4075.9 (4143.0) MB, 3589.0 / 0.0 ms  (average mu = 0.089, current mu = 0.005) allocation failure; scavenge might not succeed
[5198:0x639f8b0]    49587 ms: Mark-sweep 4078.4 (4144.0) -> 4078.4 (4151.5) MB, 4293.0 / 0.0 ms  (average mu = 0.083, current mu = 0.079) allocation failure; GC in old space requested
<--- JS stacktrace --->
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

That might also be a problem, but I think we will only run into it later, after fixing the first one.

pepoviola commented

I updated the NODE_OPTIONS here to set max-old-space-size=8192 and added the tag to run this job in the zombienet cluster.
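
For context, a minimal sketch of what raising the Node.js heap limit through `NODE_OPTIONS` can look like in a shell step. The 8192 value comes from the comment above; appending to an existing `NODE_OPTIONS` and the optional sanity check are assumptions for illustration, not the actual CI change.

```shell
# Assumed value from the comment above: 8192 MiB of V8 old-space heap.
heap_mb=8192

# Append to any NODE_OPTIONS already set instead of overwriting them.
export NODE_OPTIONS="${NODE_OPTIONS:+$NODE_OPTIONS }--max-old-space-size=${heap_mb}"

# If node is on PATH, print the effective heap limit in MiB as a sanity check.
if command -v node >/dev/null 2>&1; then
  node -e 'console.log(Math.round(require("v8").getHeapStatistics().heap_size_limit / 1048576))' || true
fi
```

Every `node` process started from this shell inherits the setting, so it also applies to helper scripts invoked indirectly.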

serban300 commented

> I updated the NODE_OPTIONS here to set max-old-space-size=8192 and added the tag to run this job in the zombienet cluster.

Thanks! This will probably fix the error above. However, I see that other runs fail with:

Setting up the rococo side of the bridge. Logs available at: /tmp/bridges-tests-run-jK9AK/logs/rococo-init.log
Setting up the westend side of the bridge. Logs available at: /tmp/bridges-tests-run-jK9AK/logs/westend-init.log
Timeout waiting for file /tmp/bridges-tests-run-jK9AK/rococo.env: 600 seconds

So setting up the bridge times out. This is probably related to something else. Looking into it.
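
The timeout message above is the kind of output a polling wait produces. A minimal sketch of such a wait, assuming a hypothetical helper named `wait_for_file` (the real test framework's helper may be named and implemented differently):

```shell
# wait_for_file PATH TIMEOUT_SECONDS
# Hypothetical helper: polls once per second until PATH exists and
# returns non-zero once TIMEOUT_SECONDS have elapsed without it appearing.
wait_for_file() {
  local path=$1
  local timeout=$2
  local waited=0
  while [ ! -e "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timeout waiting for file $path: $timeout seconds" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A wait like this only reports that the file never appeared. The reason has to be found in the init logs of the process that was supposed to write it.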


serban300 commented Oct 21, 2024

This is where it started failing: #6039

The problem is that setting up the bridge fails. For example, on the Rococo side it fails here:

Input params: 
Generating hex-encoded call data for:
	type: check
	rpcEndpoint: --
	output: 
	inputArgs: 
Checking nodejs installation, if you see this everything is ready!
  calling force_create_foreign_asset:
      relay_url: ws://127.0.0.1:9942
      relay_chain_seed: //Alice
      runtime_para_id: 1000
      runtime_para_endpoint: ws://127.0.0.1:9910
      asset_multilocation: {
  "parents": 2,
  "interior": {
    "X1": [
      {
        "GlobalConsensus": "Westend"
      }
    ]
  }
}
      asset_owner_account_id: 5He2Qdztyxxa4GoagY6q1jaiLMmKy1gXS7PdZkhfj8ZG9hk5
      min_balance: 10000000000
      is_sufficient: true
      params:
Input params: {
  "parents": 2,
  "interior": {
    "X1": [
      {
        "GlobalConsensus": "Westend"
      }
    ]
  }
} 5He2Qdztyxxa4GoagY6q1jaiLMmKy1gXS7PdZkhfj8ZG9hk5 true 10000000000
Generating hex-encoded call data for:
	type: force-create-asset
	rpcEndpoint: ws://127.0.0.1:9910
	output: /tmp/tmp.I7imOIN4m4
	inputArgs: {
  "parents": 2,
  "interior": {
    "X1": [
      {
        "GlobalConsensus": "Westend"
      }
    ]
  }
},5He2Qdztyxxa4GoagY6q1jaiLMmKy1gXS7PdZkhfj8ZG9hk5,true,10000000000
Generating forceCreateAsset from RPC endpoint: ws://127.0.0.1:9910 to outputFile: /tmp/tmp.I7imOIN4m4 based on assetId: {
  "parents": 2,
  "interior": {
    "X1": [
      {
        "GlobalConsensus": "Westend"
      }
    ]
  }
}, assetOwnerAccountId: 5He2Qdztyxxa4GoagY6q1jaiLMmKy1gXS7PdZkhfj8ZG9hk5, isSufficient: true, minBalance: 10000000000
2024-10-21 18:02:31        REGISTRY: Unknown signed extensions StorageWeightReclaim found, treating them as no-effect
2024-10-21 18:02:31        API/INIT: RPC methods not decorated: chainHead_v1_body, chainHead_v1_call, chainHead_v1_continue, chainHead_v1_follow, chainHead_v1_header, chainHead_v1_stopOperation, chainHead_v1_storage, chainHead_v1_unfollow, chainHead_v1_unpin, chainSpec_v1_chainName, chainSpec_v1_genesisHash, chainSpec_v1_properties, transactionWatch_v1_submitAndWatch, transactionWatch_v1_unwatch, transaction_v1_broadcast, transaction_v1_stop

<--- Last few GCs --->

[528725:0x6a843d0]   289404 ms: Mark-Compact 4234.4 (4327.7) -> 3979.3 (4073.4) MB, 61658.62 / 0.00 ms  (average mu = 0.007, current mu = 0.004) allocation failure; GC in old space requested
[528725:0x6a843d0]   289455 ms: Scavenge 4011.0 (4089.4) -> 4012.7 (4091.4) MB, 15.70 / 0.00 ms  (average mu = 0.007, current mu = 0.004) allocation failure; 
[528725:0x6a843d0]   289470 ms: Scavenge 4012.7 (4091.4) -> 4011.0 (4105.6) MB, 15.84 / 0.00 ms  (average mu = 0.007, current mu = 0.004) allocation failure; 


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 0xcc08f6 node::OOMErrorHandler(char const*, v8::OOMDetails const&) [node]
 2: 0x1054130 v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 3: 0x1054417 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, v8::OOMDetails const&) [node]
 4: 0x1273655  [node]
 5: 0x1273b2e  [node]
 6: 0x1288d56 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GarbageCollectionReason, char const*) [node]
 7: 0x1289879  [node]
 8: 0x1289e88  [node]
 9: 0x19d98b1  [node]
/media/serban/data/workplace/sources/polkadot-sdk/bridges/testing/framework/utils/bridges.sh: line 59: 528725 Aborted                 (core dumped) node ${BASH_SOURCE%/*}/../utils/generate_hex_encoded_call "$type" "$endpoint" "$output" "$@"
Generated hex-encoded bytes to file '/tmp/tmp.rX2BmUtmp2': 
  calling send_governance_transact:
      relay_url: ws://127.0.0.1:9942
      relay_chain_seed: //Alice
      para_id: 1000
      hex_encoded_data: 
      require_weight_at_most_ref_time: 200000000
      require_weight_at_most_proof_size: 12000
      params:
jq: invalid JSON text passed to --argjson
Use jq --help for help with command-line options,
or see the jq manpage, or online docs  at https://jqlang.github.io/jq

          dest:
{
  "V3": {
    "parents": 0,
    "interior": {
      "X1": {
        "Parachain": "1000"
      }
    }
  }
}

          message:


--------------------------------------------------
2024-10-21 18:32:50        API/INIT: RPC methods not decorated: chainHead_v1_body, chainHead_v1_call, chainHead_v1_continue, chainHead_v1_follow, chainHead_v1_header, chainHead_v1_stopOperation, chainHead_v1_storage, chainHead_v1_unfollow, chainHead_v1_unpin, chainSpec_v1_chainName, chainSpec_v1_genesisHash, chainSpec_v1_properties, transactionWatch_v1_submitAndWatch, transactionWatch_v1_unwatch, transaction_v1_broadcast, transaction_v1_stop
2024-10-21 18:32:50        API/INIT: rococo/1016001: Not decorating runtime apis without matching versions: BeefyApi/5 (1/2/3 known)
2024-10-21 18:32:50        API/INIT: rococo/1016001: Not decorating unknown runtime apis: 0x6ff52ee858e6c5bd/1, 0x91b1c8b16328eb92/1, 0x9ffb505aa738d69c/1
Error: createType(Call):: Call: failed decoding xcmPallet.send:: Struct: failed on args: {"dest":"{\"_enum\":{\"__Unused0\":\"Null\",\"V2\":\"XcmV2MultiLocation\",\"__Unused2\":\"Null\",\"V3\":\"StagingXcmV3MultiLocation\",\"V4\":\"StagingXcmV4Location\"}}","message":"{\"_enum\":{\"__Unused0\":\"Null\",\"__Unused1\":\"Null\",\"V2\":\"XcmV2Xcm\",\"V3\":\"XcmV3Xcm\",\"V4\":\"StagingXcmV4Xcm\"}}"}:: Struct: failed on message: {"_enum":{"__Unused0":"Null","__Unused1":"Null","V2":"XcmV2Xcm","V3":"XcmV3Xcm","V4":"StagingXcmV4Xcm"}}:: Cannot map Enum JSON, unable to find '' in __unused0, __unused1, v2, v3, v4
    at createTypeUnsafe (file:///home/serban/.config/yarn/global/node_modules/@polkadot/types-create/create/type.js:51:22)
    at TypeRegistry.createTypeUnsafe (file:///home/serban/.config/yarn/global/node_modules/@polkadot/types/create/registry.js:226:16)
    at extrinsicFn (file:///home/serban/.config/yarn/global/node_modules/@polkadot/types/metadata/decorate/extrinsics/createUnchecked.js:13:25)
    at decorated (file:///home/serban/.config/yarn/global/node_modules/@polkadot/api/base/Decorate.js:488:50)
    at makeTx (file:///home/serban/.config/yarn/global/node_modules/@polkadot/api-cli/runcli.js:158:32)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
  calling open_hrmp_channels:
      relay_url: ws://127.0.0.1:9942
      relay_chain_seed: //Alice
      sender_para_id: 1000
      recipient_para_id: 1013
      max_capacity: 4
      max_message_size: 524288
      params:
--------------------------------------------------


I don't think the nodejs heap size is the issue here. Checking.
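
In the log above, the call-data generator aborts, `hex_encoded_data` is then empty, and the later `jq` and `createType` errors appear to follow from that empty value. A minimal sketch of a fail-fast guard that would surface the first failure instead of the later ones, assuming a hypothetical helper named `require_call_data` (not part of the actual scripts):

```shell
# require_call_data FILE
# Hypothetical guard: prints the contents of FILE with whitespace stripped,
# or fails if FILE is missing, empty, or contains only whitespace.
require_call_data() {
  local file=$1
  local data
  if [ ! -s "$file" ]; then
    echo "call data file '$file' is missing or empty" >&2
    return 1
  fi
  data=$(tr -d '[:space:]' < "$file")
  if [ -z "$data" ]; then
    echo "call data file '$file' contains only whitespace" >&2
    return 1
  fi
  printf '%s\n' "$data"
}
```

Calling such a guard right after the generator step would stop the setup at the point where the data went missing.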

pepoviola commented

Yes, I tried doubling the heap size and got the same error, so something is leaking and consuming memory. I also checked with Tarik about recent changes in pjs, but they haven't received any reports of performance issues.
It's also worth noting that another test, zombienet-polkadot-smoke-0004-coretime-smoke-test (https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7596992), is failing with a similar JS OOM error.

Thanks!

github-merge-queue bot pushed a commit that referenced this issue Oct 22, 2024
Related to #6161

This seems to fix the `JavaScript heap out of memory` error encountered
in the bridge zombienet tests lately.

This is just a partial fix, since we also need to address #6133 in order to
fully fix the bridge zombienet tests.
serban300 commented

Now the Rococo -> Westend transfer works. The Westend -> Rococo transfer fails:
https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/7620163

Doesn't seem related to the signed extensions. Investigating

serban300 commented

Looks like the issue was introduced by #5461

Investigating why

github-merge-queue bot pushed a commit that referenced this issue Oct 24, 2024
Closes #6161

Westend BridgeHub freezes for a while at block 3, and if we try to init
the bridge and fund the accounts during that time, it fails. So we wait
until all the parachains have produced at least 10 blocks, in order to make
sure that they work reliably.
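
A minimal sketch of the "wait for at least 10 blocks" idea described in this commit message. The helper name `wait_for_height` and the getter argument are hypothetical; the actual fix may express the wait differently (for example in the zombienet test definition).

```shell
# wait_for_height GETTER MIN_HEIGHT TIMEOUT_SECONDS
# GETTER is the name of a command or function (hypothetical) that prints
# the chain's current best block number. Polls once per second.
wait_for_height() {
  local getter=$1
  local min_height=$2
  local timeout=$3
  local waited=0
  local height
  while :; do
    # Treat a failing getter as "no blocks yet" and keep polling.
    height=$("$getter") || height=0
    if [ "$height" -ge "$min_height" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timeout: height $height is below $min_height after $timeout seconds" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Bridge initialization and account funding would then run only after a call such as `wait_for_height some_height_getter 10 600` has succeeded for every parachain, where `some_height_getter` stands in for whatever queries the node.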
@alindima alindima mentioned this issue Oct 28, 2024
2 tasks
github-merge-queue bot pushed a commit that referenced this issue Nov 4, 2024
Reported in #6161 (comment)

Fixes a bug introduced in #5461, where the claim queue would contain
entries even if the validator groups storage is empty (which happens
during the first session).

This PR sets the claim queue core count to the minimum of the
num_cores param and the number of validator groups.

TODO:
- [x] prdoc
- [x] unit test
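
The effect of taking the minimum can be illustrated with plain arithmetic. This is only a worked example with a hypothetical function name; the actual fix lives in the runtime code and is not reproduced here.

```shell
# effective_core_count NUM_CORES NUM_VALIDATOR_GROUPS
# Hypothetical illustration: prints the smaller of the two values.
effective_core_count() {
  local num_cores=$1
  local groups=$2
  if [ "$num_cores" -lt "$groups" ]; then
    echo "$num_cores"
  else
    echo "$groups"
  fi
}

effective_core_count 4 0   # first session, no validator groups yet: prints 0
effective_core_count 4 2   # fewer groups than cores: prints 2
effective_core_count 2 5   # fewer cores than groups: prints 2
```

With zero validator groups the result is zero cores, so the claim queue stays empty during the first session.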