fix(libs/pidstore): Detect corruption on startup and reset pidstore #3132

Merged: 2 commits, Jan 23, 2024

renaynay
Member

The recovery of an existing pidstore is not critical to node functionality and should not block the node's start-up if it was corrupted during an ungraceful shutdown. If corruption is detected, a fresh pidstore can be instantiated and used by the node.

Found by conduit.

@renaynay added the area:p2p and kind:fix (Attached to bug-fixing PRs) labels on Jan 22, 2024
@renaynay self-assigned this on Jan 22, 2024
codecov-commenter commented Jan 23, 2024

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (65c6b65) 51.08% compared to head (0bb4fc0) 51.47%.
Report is 3 commits behind head on main.

| Files | Patch % | Lines |
| libs/pidstore/pidstore.go | 76.92% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3132      +/-   ##
==========================================
+ Coverage   51.08%   51.47%   +0.38%     
==========================================
  Files         177      177              
  Lines       11157    11170      +13     
==========================================
+ Hits         5700     5750      +50     
+ Misses       4958     4926      -32     
+ Partials      499      494       -5     

Member

@Wondertan Wondertan left a comment

I can't think of any reason why it could get corrupted other than an ungraceful shutdown. In that case, doing this only for the pidstore might not solve the issue. Let's ask conduit whether this fixes the issue for them without resetting the node, and merge it only after that.

@renaynay
Member Author

@Wondertan I agree that an ungraceful shutdown is most likely the cause of this. IMO the fix is still OK to implement, as the pidstore is non-essential.

Collaborator

@distractedm1nd distractedm1nd left a comment

There are other datastore corruptions that happen. I wonder if we can wrap badger startup similarly (not resetting it, but giving a more readable error output when corruption is detected).

@renaynay enabled auto-merge (squash) on January 23, 2024, 16:50
@renaynay merged commit af417b0 into celestiaorg:main on Jan 23, 2024
23 checks passed
5 participants