Unable to load a 1GB genesis file in 40 seconds in version 1.28.0. #7361

Open
lyfsn opened this issue Aug 26, 2024 · 4 comments

@lyfsn

lyfsn commented Aug 26, 2024

Description
Our custom network uses a large 1GB genesis.json file, and it worked fine with versions before 1.28.0, such as 1.27.x.

However, after upgrading to version 1.28.0, my Nethermind node fails to start with this error:

26 Aug 02:44:18 | Snap serving enabled, but PruningBoundary is less than 128. Setting to 128. 
26 Aug 02:45:39 | Step LoadGenesisBlock         failed after 80976ms System.TimeoutException: Genesis block was not processed after 40 seconds
   at Nethermind.Init.Steps.LoadGenesisBlock.Load(IWorldState worldState) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 88
   at Nethermind.Init.Steps.LoadGenesisBlock.Execute(CancellationToken _) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 46
   at Nethermind.Init.Steps.EthereumStepsManager.ExecuteStep(IStep step, StepInfo stepInfo, CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 153
   at Nethermind.Init.Steps.EthereumStepsManager.InitializeAll(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 95
   at Nethermind.Runner.Ethereum.EthereumRunner.Start(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Runner/Ethereum/EthereumRunner.cs:line 36
   at Nethermind.Runner.Program.<>c__DisplayClass8_0.<<Run>b__1>d.MoveNext() in /src/Nethermind/Nethermind.Runner/Program.cs:line 213
26 Aug 02:45:39 | Error during ethereum runner start System.TimeoutException: Genesis block was not processed after 40 seconds
   at Nethermind.Init.Steps.LoadGenesisBlock.Load(IWorldState worldState) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 88
   at Nethermind.Init.Steps.LoadGenesisBlock.Execute(CancellationToken _) in /src/Nethermind/Nethermind.Init/Steps/LoadGenesisBlock.cs:line 46
   at Nethermind.Init.Steps.EthereumStepsManager.ExecuteStep(IStep step, StepInfo stepInfo, CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 153
   at Nethermind.Init.Steps.EthereumStepsManager.InitializeAll(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Init/Steps/EthereumStepsManager.cs:line 95
   at Nethermind.Runner.Ethereum.EthereumRunner.Start(CancellationToken cancellationToken) in /src/Nethermind/Nethermind.Runner/Ethereum/EthereumRunner.cs:line 36
   at Nethermind.Runner.Program.<>c__DisplayClass8_0.<<Run>b__1>d.MoveNext() in /src/Nethermind/Nethermind.Runner/Program.cs:line 213

Steps to Reproduce

  1. Generate a large genesis file of 1GB.
  2. Use this large genesis file to initialize and start the node.
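For anyone who wants to reproduce step 1 without the original file, here is a minimal sketch of a generator. It assumes a Parity-style chainspec in which prefunded accounts sit under a top-level `accounts` object; the function name `write_genesis`, the address scheme, and the balance value are hypothetical, and the other sections a real chainspec needs are left out.

```python
import json
import os


def write_genesis(path, num_accounts, balance_hex="0x1"):
    """Stream a chainspec-like JSON file with `num_accounts` prefunded accounts.

    Assumption: accounts live under a top-level "accounts" object keyed by
    address. Other required sections (engine, params, genesis header) are
    omitted and must come from a real chainspec.
    Returns the size of the written file in bytes.
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write('{"accounts": {')
        for i in range(num_accounts):
            # Hypothetical scheme: a 20-byte address derived from a counter.
            address = "0x" + format(0x1000 + i, "040x")
            entry = json.dumps(address) + ": " + json.dumps({"balance": balance_hex})
            # Write entries one at a time so memory use stays flat.
            f.write(("" if i == 0 else ", ") + entry)
        f.write("}}")
    return os.path.getsize(path)
```

Because the file is written entry by entry, `num_accounts` can be raised until the returned size reaches the target (for example 256MB or 1GB) without holding the whole document in memory.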

Actual behavior
The node can't start and logs a timeout of 40 seconds.

By the way, why is the 40s timeout hardcoded?

readonly TimeSpan _genesisProcessedTimeout = TimeSpan.FromSeconds(40);

Expected behavior
The node can start normally, just like in version 1.27.x.

Screenshots
Screenshot 2024-08-26 at 10 57 09

Desktop (please complete the following information):
Please provide the following information regarding your setup:

  • Operating System: Linux
  • Version: 1.28.0
  • Installation Method: Docker
  • Consensus Client: none

Additional context
In more precise testing, the node fails to start and times out while loading the genesis file whenever the file size exceeds 256MB.

My startup parameters:

version: "3.9"
services:
  execution:
    tty: true
    environment:
    - TERM=xterm-256color
    - COLORTERM=truecolor
    stop_grace_period: 30s
    container_name: gas-execution-client
    image: ${EC_IMAGE_VERSION}
    networks:
    - gas
    volumes:
    - ${EC_DATA_DIR}:/nethermind/data
    - ${EC_JWT_SECRET_PATH}:/tmp/jwt/jwtsecret
    - ${CHAINSPEC_PATH}:/tmp/chainspec/chainspec.json
    ports:
    - "30304:30304/tcp"
    - "30304:30304/udp"
    - "8009:8009"
    - "8545:8545"
    - "8551:8551"
    expose:
    - 8545
    - 8551
    command:
    - --config=none.cfg
    - --Init.ChainSpecPath=/tmp/chainspec/chainspec.json
    - --datadir=/nethermind/data
    - --log=INFO
    - --JsonRpc.Enabled=true
    - --JsonRpc.Host=0.0.0.0
    - --JsonRpc.Port=8545
    - --JsonRpc.JwtSecretFile=/tmp/jwt/jwtsecret
    - --JsonRpc.EngineHost=0.0.0.0
    - --JsonRpc.EnginePort=8551
    - --Network.DiscoveryPort=30304
    - --HealthChecks.Enabled=true
    - --Metrics.Enabled=true
    - --Metrics.ExposePort=8009
    - --Sync.MaxAttemptsToUpdatePivot=0
    logging:
      driver: json-file
      options:
        max-size: 10m
        max-file: "10"
networks:
  gas:
    name: gas-network

Logs

@LukaszRozmej
Member

Can you share the genesis file you are using?

@lyfsn
Author

lyfsn commented Aug 29, 2024

Can you share the genesis file you are using?

In my test environment, I generate a random genesis file every time using this script, which creates many accounts in one genesis file.

For a quick test, you could also try the genesis file of Endurance's mainnet, which is larger than 800MB. (I haven't checked whether this file reproduces the error; my error comes from the script-generated file mentioned above.)
https://github.com/OpenFusionist/network_config

@ohko4711

ohko4711 commented Sep 18, 2024

Hi @LukaszRozmej. I investigated the performance regression mentioned above and have some conclusions and points I'd like to discuss further.

Regarding the performance issue:
PR #7215 was a performance optimization that replaced LruCache with ClockCache to reduce lock granularity. However, due to implementation details, it caused a regression that leads to timeouts when initializing large genesis files (>800MB).
The latest commit (60159fb) appears to have fixed this issue based on our tests.
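For readers unfamiliar with the two eviction policies, the following is a minimal textbook sketch of CLOCK (second-chance) eviction in Python. It is not Nethermind's ClockCache, it does not model thread safety, and it makes no claim about what specifically regressed in #7215; the class and method names are chosen only for illustration. The point it shows is that a cache hit only sets a flag, while the scanning work is deferred to eviction.

```python
class ClockCache:
    """Fixed-capacity cache using CLOCK (second-chance) eviction."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.slots = []   # each slot is [key, value, referenced_bit]
        self.index = {}   # key -> position in self.slots
        self.hand = 0     # next slot the eviction hand will inspect

    def get(self, key, default=None):
        pos = self.index.get(key)
        if pos is None:
            return default
        # A hit only sets a bit; nothing is reordered as in a list-based LRU.
        self.slots[pos][2] = True
        return self.slots[pos][1]

    def set(self, key, value):
        pos = self.index.get(key)
        if pos is not None:
            # Existing key: update in place and mark as recently used.
            self.slots[pos][1] = value
            self.slots[pos][2] = True
            return
        if len(self.slots) < self.capacity:
            # Free slot available: no eviction needed.
            self.index[key] = len(self.slots)
            self.slots.append([key, value, False])
            return
        # Sweep: clear referenced bits until an unreferenced slot is found.
        while self.slots[self.hand][2]:
            self.slots[self.hand][2] = False
            self.hand = (self.hand + 1) % self.capacity
        victim_key = self.slots[self.hand][0]
        del self.index[victim_key]
        self.slots[self.hand] = [key, value, False]
        self.index[key] = self.hand
        self.hand = (self.hand + 1) % self.capacity
```

In a workload that inserts a very large number of distinct keys once, as a large genesis load plausibly does, almost every insert reaches the sweep path, so the cost of that path matters more than the cost of hits.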

Issue identification method:

  • Compared commits between two releases (1.27.1...1.28.0)
  • Locally compiled Nethermind and attempted to start it with a large genesis file
  • Used git bisect to gradually locate the problematic commit
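The search that git bisect performs over the commits between the two releases can be sketched as a plain binary search. This is an illustration of the idea in Python, not git's implementation; `first_bad_commit` and the `is_bad` callback are hypothetical names, and in practice `is_bad` would mean building that commit and starting the node with the large genesis file.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is True.

    Assumes `commits` is ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is
    also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression was introduced after mid
    return commits[bad]
```

Each test halves the remaining range, so the number of builds grows with the logarithm of the number of commits between the releases.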

Regarding the 40s hard-coded timeout:
This has been discussed previously; see related PR #6160. We can discuss this further:

It's up for discussion if we want to increase the timeout from 40 seconds (current default, hard-coded value) to something different.

Let me know if you need any additional information or clarification on this matter.

@LukaszRozmej
Member

LukaszRozmej commented Sep 18, 2024

@ohko4711 thank you for the analysis. #7215 might have had some unplanned effect, but based on the code 60159fb shouldn't affect genesis loading, so I'm not sure it is what fixed the issue. @benaadams, can you check? Both are your changes.

I will move the timeout to config though.
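As a rough illustration of what a configurable timeout could look like, here is a sketch in Python rather than Nethermind's C#. It is not the actual change: `wait_for_genesis` and the `timeout_seconds` parameter are hypothetical, with the default mirroring the hard-coded 40 seconds, and a `threading.Event` stands in for the "genesis block processed" signal.

```python
import threading


def wait_for_genesis(processed, timeout_seconds=40.0):
    """Block until `processed` is set, or raise TimeoutError after the limit.

    `processed` is a threading.Event standing in for the genesis-processed
    signal. `timeout_seconds` is a hypothetical config value; its default
    mirrors the hard-coded 40 seconds quoted in this issue.
    """
    # Event.wait returns False when the timeout elapses before the event is set.
    if not processed.wait(timeout=timeout_seconds):
        raise TimeoutError(
            f"Genesis block was not processed after {timeout_seconds:g} seconds"
        )
```

An operator with a large genesis file could then raise the limit in configuration instead of rebuilding the client.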
