fix: bug cannot use the disk which has failed replica #3499
Walkthrough
The changes modify the replica scheduling logic.
Sequence Diagram
sequenceDiagram
participant Scheduler
participant filterDisksWithMatchingReplicas
Scheduler->>filterDisksWithMatchingReplicas: Call with disks, replicas, diskSoftAntiAffinity, ignoreFailedReplicas
alt ignoreFailedReplicas is true
filterDisksWithMatchingReplicas-->>Scheduler: Filter out failed replicas
else ignoreFailedReplicas is false
filterDisksWithMatchingReplicas-->>Scheduler: Include all replicas
end
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- scheduler/replica_scheduler.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Summary
🔇 Additional comments (2)
scheduler/replica_scheduler.go (2)
470-470: LGTM! Clear and purposeful parameter addition.
The new ignoreFailedReplicas parameter effectively supports the PR's objective of allowing disk reuse when the caller intends to reuse failed replicas.
473-480: LGTM! Implementation aligns with requirements.
The logic correctly implements both key requirements:
- Skips failed replicas when the caller intends to reuse them (ignoreFailedReplicas=true)
- Skips non-reusable failed replicas by checking IsPotentiallyReusableReplica
This ensures that disks with failed replicas are considered as valid candidates under the right conditions.
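The two skip rules described in this review can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Longhorn code: the Replica struct, its Reusable field (standing in for the result of IsPotentiallyReusableReplica), and the helper names countBlockingReplicas and filterDisks are all hypothetical, and the soft anti-affinity branch is simplified to "keep every disk".

```go
package main

import "fmt"

// Replica is a hypothetical, pared-down stand-in for Longhorn's replica type.
type Replica struct {
	Name     string
	DiskID   string
	FailedAt string // non-empty means the replica has failed
	Reusable bool   // assumption: stands in for IsPotentiallyReusableReplica(r)
}

// countBlockingReplicas counts, per disk, the replicas that should keep the
// disk from being chosen when disk-level soft anti-affinity is disabled.
func countBlockingReplicas(replicas []Replica, ignoreFailedReplicas bool) map[string]int {
	counts := map[string]int{}
	for _, r := range replicas {
		if r.FailedAt != "" {
			// Case 1: the caller intends to reuse failed replicas,
			// so a failed replica must not block its disk.
			if ignoreFailedReplicas {
				continue
			}
			// Case 2: a failed replica that can never be reused
			// should not count against its disk either.
			if !r.Reusable {
				continue
			}
		}
		counts[r.DiskID]++
	}
	return counts
}

// filterDisks drops disks that hold a blocking replica when soft anti-affinity
// is off. With soft anti-affinity on, this sketch simply keeps every disk.
func filterDisks(disks []string, replicas []Replica, diskSoftAntiAffinity, ignoreFailedReplicas bool) []string {
	counts := countBlockingReplicas(replicas, ignoreFailedReplicas)
	var out []string
	for _, d := range disks {
		if !diskSoftAntiAffinity && counts[d] > 0 {
			continue
		}
		out = append(out, d)
	}
	return out
}

func main() {
	disks := []string{"disk-a", "disk-b", "disk-c"}
	replicas := []Replica{
		{Name: "r1", DiskID: "disk-a"},                                                   // healthy
		{Name: "r2", DiskID: "disk-b", FailedAt: "2025-01-01T00:00:00Z", Reusable: true}, // failed, reusable
		{Name: "r3", DiskID: "disk-c", FailedAt: "2025-01-01T00:00:00Z"},                 // failed, not reusable
	}
	fmt.Println(filterDisks(disks, replicas, false, false)) // [disk-c]
	fmt.Println(filterDisks(disks, replicas, false, true))  // [disk-b disk-c]
}
```

In this sketch, disk-c stays a candidate in both calls because its failed replica is not reusable, while disk-b becomes a candidate only when the caller sets ignoreFailedReplicas to true.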
Cool @PhanLe1010! I just found the same root cause as you did. ;)
@mergify backport v1.8.x
✅ Backports have been created
@PhanLe1010 CI failed
We should not exclude the disk which has failed replica in 2 cases:
1. The caller is trying to reuse the failed replica, this disk is a valid candidate
2. The failed replica is no longer reusable, this disk is a valid candidate
longhorn-10210
Signed-off-by: Phan Le <[email protected]>
@derekbit Fixed. Thank you
@mergify backport v1.6.x v1.7.x
✅ Backports have been created
Root cause:
The root cause of issue longhorn/longhorn#10210 (with the reproducing steps) is that when Replica Disk Level Soft Anti-Affinity is false, Longhorn never considers a disk that contains a failed replica as a valid candidate.
Proposed solution:
We should not exclude a disk that has a failed replica in 2 cases:
1. The caller is trying to reuse the failed replica, so this disk is a valid candidate.
2. The failed replica is no longer reusable, so this disk is a valid candidate.
Test plan:
longhorn/longhorn#10210 (comment)
longhorn/longhorn#10210