Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check CoreDNS #120

Merged

Conversation

c3y1huang
Copy link
Contributor

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9752

What this PR does / why we need it:

Check if CoreDNS has more than 2 ready replicas.

Special notes for your reviewer:

None

Additional documentation or context

None

@c3y1huang c3y1huang self-assigned this Nov 14, 2024
Copy link

coderabbitai bot commented Nov 14, 2024

Walkthrough

The pull request introduces several changes, including an update to the go.mod file to change the version of the github.com/longhorn/go-common-libs dependency. Additionally, two new constants are added in pkg/consts/preflight.go, and a new method checkKubeDNS is introduced in pkg/local/preflight/checker.go to verify CoreDNS deployment configurations. The newClusterRole method in pkg/remote/preflight/checker.go is modified to enhance permissions for Kubernetes resources. Lastly, new functions for listing Kubernetes deployments and creating label selectors are added.

Changes

File Change Summary
go.mod Updated dependency version for github.com/longhorn/go-common-libs from v0.0.0-20241124035508-d6221574e626 to v0.0.0-20241128023039-4d6c3a880dbc.
pkg/consts/preflight.go Added constants KubeAppLabel with value "k8s-app" and KubeAppValueDNS with value "kube-dns".
pkg/local/preflight/checker.go Added method checkKubeDNS to check CoreDNS deployment configuration; modified Init method for streamlined client configuration; updated Run method.
pkg/remote/preflight/checker.go Updated newClusterRole method to include "deployments" in the Resources field of PolicyRule and expanded Verbs from ["get"] to ["get", "list"].
vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go Added function ListDeployments to retrieve a list of Kubernetes Deployment resources based on namespace and label selectors.
vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go Added function NewLabelSelectorFromMap to create a label selector from a map of string labels.
vendor/modules.txt Updated module reference for github.com/longhorn/go-common-libs to new version.

Assessment against linked issues

Objective Addressed Explanation
The DR volume should reattach to an available node and continue the restore process without entering a faulted state (#9752). The changes do not directly address the DR volume issue. It's unclear if the new checks impact this functionality.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@c3y1huang c3y1huang force-pushed the 9752-dr-volume-faulted-after-node-reboot branch 2 times, most recently from 038cb0c to 9678468 Compare November 14, 2024 02:59
@c3y1huang
Copy link
Contributor Author

@coderabbitai review

Copy link

coderabbitai bot commented Nov 14, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (2)
go.mod (1)

Line range hint 3-3: Verify Go Version Compatibility

The module specifies Go 1.23, which seems incorrect as the latest stable version is Go 1.22. This might be a typo or an unstable version.

Please update to use a stable Go version (e.g., go 1.22).

pkg/local/preflight/checker.go (1)

471-502: Consider refactoring for better maintainability

  1. Define a constant for minimum required replicas.
  2. Consolidate the replica checks and logging for better flow.
+const minRequiredCoreDNSReplicas = 2

 func (local *Checker) checkCoreDNS() {
 	logrus.Info("Checking if CoreDNS has multiple replicas")

-	daemonSet, err := commonkube.GetDeployment(local.kubeClient, metav1.NamespaceSystem, consts.AppNameCoreDNS)
+	deployment, err := commonkube.GetDeployment(local.kubeClient, metav1.NamespaceSystem, consts.AppNameCoreDNS)
 	if err != nil {
-		local.collection.Log.Error = append(local.collection.Log.Warn, errors.Wrap(err, "failed to check CoreDNS").Error())
+		local.collection.Log.Error = append(local.collection.Log.Error, errors.Wrap(err, "failed to check CoreDNS").Error())
 		return
 	}

-	if daemonSet.Spec.Replicas == nil || *daemonSet.Spec.Replicas < 2 {
-		local.collection.Log.Warn = append(local.collection.Log.Warn, "CoreDNS is set with fewer than 2 replicas; consider increasing replica count for high availability")
-		return
-	}
+	desiredReplicas := int32(0)
+	if deployment.Spec.Replicas != nil {
+		desiredReplicas = *deployment.Spec.Replicas
+	}

-	if daemonSet.Status.ReadyReplicas < 2 {
-		local.collection.Log.Warn = append(local.collection.Log.Warn, "CoreDNS has fewer than 2 ready replicas; some replicas may not be running or ready")
+	if desiredReplicas < minRequiredCoreDNSReplicas {
+		local.collection.Log.Warn = append(local.collection.Log.Warn, 
+			fmt.Sprintf("CoreDNS is set with %d replicas; consider increasing replica count to at least %d for high availability", 
+				desiredReplicas, minRequiredCoreDNSReplicas))
 	}

-	local.collection.Log.Info = append(local.collection.Log.Info, fmt.Sprintf("CoreDNS is set with %d replicas and %d ready replicas", *daemonSet.Spec.Replicas, daemonSet.Status.ReadyReplicas))
+	if deployment.Status.ReadyReplicas < minRequiredCoreDNSReplicas {
+		local.collection.Log.Warn = append(local.collection.Log.Warn, 
+			fmt.Sprintf("CoreDNS has only %d ready replicas; some replicas may not be running or ready", 
+				deployment.Status.ReadyReplicas))
+	}
+
+	local.collection.Log.Info = append(local.collection.Log.Info, 
+		fmt.Sprintf("CoreDNS is set with %d replicas and %d ready replicas", 
+			desiredReplicas, deployment.Status.ReadyReplicas))
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8379a8a and 9678468.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • pkg/consts/preflight.go (1 hunks)
  • pkg/local/preflight/checker.go (3 hunks)
  • pkg/remote/preflight/checker.go (1 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (1 hunks)
  • vendor/modules.txt (2 hunks)
🔇 Additional comments (9)
pkg/consts/preflight.go (1)

7-8: LGTM! Well-integrated constant addition.

The new AppNameCoreDNS constant follows the established naming convention and uses the correct lowercase value that matches Kubernetes naming standards. The constant is properly grouped with other app names and will integrate well with the CoreDNS preflight checks.

vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (1)

1-11: LGTM: Clean package structure with appropriate imports

The package structure is well-organized with all necessary imports for Kubernetes client operations and logging.

go.mod (1)

7-7: ⚠️ Potential issue

Security & Maintainability Concern: Using Personal Fork as Dependency

Replacing github.com/longhorn/go-common-libs with a personal fork raises several concerns:

  1. Security: Personal forks may not undergo the same security review and testing as the main repository
  2. Maintainability: Forks can become stale or unmaintained
  3. Visibility: No documentation explaining why the fork is necessary

Please clarify:

  • Why is the fork necessary?
  • Are the changes in the fork planned to be contributed back to the main repository?
  • What is the long-term maintenance plan for the forked dependency?
pkg/remote/preflight/checker.go (3)

168-168: LGTM with testing reminder

The RBAC permission change looks good and is necessary for the CoreDNS check functionality. Please ensure that:

  1. The change is tested with various RBAC configurations
  2. The CoreDNS check works as expected in both success and failure scenarios

168-168: Verify minimal RBAC permissions for CoreDNS check

The addition of "deployments" permission is aligned with the PR objective to check CoreDNS replicas. However, let's verify if we can further restrict the scope:

  1. Consider namespace-specific permission instead of cluster-wide
  2. Verify if we need both "daemonsets" and "deployments" permissions
#!/bin/bash
# Description: Check CoreDNS deployment location and usage of daemonsets permission

# Check where CoreDNS is deployed
echo "Checking CoreDNS deployment namespace:"
rg -l "kind:\s+Deployment.*name:\s+coredns" 

# Check if daemonsets permission is still needed
echo "Checking daemonset usage in code:"
rg "daemonsets?" --type go -B 2 -A 2

168-168: Verify complete implementation of CoreDNS check

While the RBAC permission is added here, let's verify the implementation of the actual CoreDNS check:

✅ Verification successful

Let's gather more information about the CoreDNS implementation by examining the identified files.


CoreDNS replica check implementation is complete and well-implemented

The implementation is properly in place with:

  • Complete checkCoreDNS() function in pkg/local/preflight/checker.go
  • Proper constant definition in pkg/consts/preflight.go
  • Well-documented function that checks both desired and ready replicas
  • Appropriate warning messages for both cases (<2 replicas specified or <2 ready replicas)
  • Proper error handling and logging

The RBAC permissions added in the review match the implementation's requirements for accessing deployments.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify implementation of CoreDNS replica check

# Look for CoreDNS-related code
echo "Searching for CoreDNS check implementation:"
rg -l "CoreDNS" --type go

# Look for deployment replica check logic
echo "Searching for deployment replica check logic:"
ast-grep --pattern 'func $CHECK_FUNC($ARGS) {
  $$$
  Replicas $$$
  $$$
}'

Length of output: 356


Script:

#!/bin/bash
# Check CoreDNS-related code in the identified files
echo "Content of pkg/consts/preflight.go with CoreDNS context:"
rg -A 5 -B 5 "CoreDNS" pkg/consts/preflight.go

echo -e "\nContent of pkg/local/preflight/checker.go with CoreDNS context:"
rg -A 10 -B 10 "CoreDNS" pkg/local/preflight/checker.go

# Look for deployment replica checks in these specific files
echo -e "\nLooking for deployment/replica checks:"
rg -A 5 -B 5 "deployment.*replica|replica.*deployment" pkg/local/preflight/checker.go

Length of output: 3909

pkg/local/preflight/checker.go (2)

56-64: LGTM: Proper Kubernetes client initialization

The initialization of the Kubernetes client is well-structured with appropriate error handling and wrapping.


180-180: LGTM: Appropriate placement of CoreDNS check

The CoreDNS check is correctly placed before OS-specific checks since it's a cluster-level verification.

vendor/modules.txt (1)

136-136: Justify the use of a personal fork instead of the official library.

I notice that the PR replaces github.com/longhorn/go-common-libs with a personal fork github.com/c3y1huang/go-common-libs. While this might be necessary for development, there are several concerns:

  1. Security: Using a personal fork instead of the official library could introduce security risks.
  2. Maintenance: Personal forks may not be maintained long-term.
  3. Best Practices: It's generally recommended to contribute changes back to the original repository rather than maintaining a separate fork.

Could you please:

  1. Explain why the fork is necessary
  2. Consider contributing the changes back to the original repository
  3. Document the specific changes made in the fork

Let's verify the changes in the forked repository:

Also applies to: 931-931

@c3y1huang
Copy link
Contributor Author

Test results:

  1. CoreDNS with 1 replica.
    > ./bin/longhornctl-linux-amd64 check preflight --image="c3y1huang/research:longhorn-cli-443"
    INFO[2024-11-14T10:42:39+08:00] Initializing preflight checker               
    INFO[2024-11-14T10:42:39+08:00] Cleaning up preflight checker                
    INFO[2024-11-14T10:42:39+08:00] Running preflight checker                    
    INFO[2024-11-14T10:42:43+08:00] Retrieved preflight checker result:
    ip-10-0-2-6:
      info:
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
      warn:
      - CoreDNS is set with fewer than 2 replicas; consider increasing replica count for high availability
    ip-10-0-2-21:
      info:
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
      warn:
      - CoreDNS is set with fewer than 2 replicas; consider increasing replica count for high availability
    ip-10-0-2-187:
      info:
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
      warn:
      - CoreDNS is set with fewer than 2 replicas; consider increasing replica count for high availability 
    INFO[2024-11-14T10:42:43+08:00] Cleaning up preflight checker                
    INFO[2024-11-14T10:42:43+08:00] Completed preflight checker   
    
  2. CoreDNS with 2 replicas.
    > ./bin/longhornctl-linux-amd64 check preflight --image="c3y1huang/research:longhorn-cli-443"
    INFO[2024-11-14T10:43:45+08:00] Initializing preflight checker               
    INFO[2024-11-14T10:43:45+08:00] Cleaning up preflight checker                
    INFO[2024-11-14T10:43:46+08:00] Running preflight checker                    
    INFO[2024-11-14T10:43:49+08:00] Retrieved preflight checker result:
    ip-10-0-2-6:
      info:
      - CoreDNS is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
    ip-10-0-2-21:
      info:
      - CoreDNS is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
    ip-10-0-2-187:
      info:
      - CoreDNS is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded 
    INFO[2024-11-14T10:43:49+08:00] Cleaning up preflight checker                
    INFO[2024-11-14T10:43:50+08:00] Completed preflight checker 
    

@c3y1huang c3y1huang force-pushed the 9752-dr-volume-faulted-after-node-reboot branch from 9678468 to 86d0859 Compare November 14, 2024 03:21
@c3y1huang
Copy link
Contributor Author

@coderabbitai review

Copy link

coderabbitai bot commented Nov 14, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@c3y1huang c3y1huang force-pushed the 9752-dr-volume-faulted-after-node-reboot branch from 86d0859 to dc7a477 Compare November 14, 2024 03:24
@c3y1huang c3y1huang marked this pull request as ready for review November 14, 2024 03:48
@c3y1huang c3y1huang requested review from derekbit and a team November 14, 2024 03:49
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
pkg/local/preflight/checker.go (2)

472-502: Consider using constants for replica threshold.

The magic number "2" is used multiple times. Consider defining it as a constant for better maintainability.

Add this at the package level:

+const minCoreDNSReplicas = 2

Then use it in the checks:

-if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas < 2 {
+if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas < minCoreDNSReplicas {

-if deployment.Status.ReadyReplicas < 2 {
+if deployment.Status.ReadyReplicas < minCoreDNSReplicas {

501-501: Consider security implications of logging deployment details.

The log message exposes information about the cluster configuration. Consider making this a debug-level log or removing specific numbers.

-local.collection.Log.Info = append(local.collection.Log.Info, fmt.Sprintf("CoreDNS is set with %d replicas and %d ready replicas", *deployment.Spec.Replicas, deployment.Status.ReadyReplicas))
+local.collection.Log.Info = append(local.collection.Log.Info, "CoreDNS has sufficient replicas and is ready")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9678468 and dc7a477.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • go.mod (1 hunks)
  • pkg/consts/preflight.go (1 hunks)
  • pkg/local/preflight/checker.go (3 hunks)
  • pkg/remote/preflight/checker.go (1 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (1 hunks)
  • vendor/modules.txt (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • go.mod
  • pkg/consts/preflight.go
  • pkg/remote/preflight/checker.go
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go
  • vendor/modules.txt
🔇 Additional comments (2)
pkg/local/preflight/checker.go (2)

56-64: LGTM! Clean and straightforward client initialization.

The simplified Kubernetes client initialization with proper error handling looks good.


180-180: LGTM! Good placement of CoreDNS check.

The CoreDNS check is appropriately placed before OS-specific checks.

@roger-ryao
Copy link

Hi @c3y1huang
I have a question: should we change the number of replicas for the CoreDNS deployment in the cluster after running the install preflight? One concern I have is that the default number of replicas for the CoreDNS deployment is 1, but the user's original cluster might have been changed to 3. However, after running Longhorn cli, it might become 2, which could lead to issues.

@c3y1huang
Copy link
Contributor Author

c3y1huang commented Nov 14, 2024

Hi @c3y1huang I have a question: should we change the number of replicas for the CoreDNS deployment in the cluster after running the install preflight? One concern I have is that the default number of replicas for the CoreDNS deployment is 1, but the user's original cluster might have been changed to 3. However, after running Longhorn cli, it might become 2, which could lead to issues.

This PR is proposing to add the CoreDNS check to the longhornctl check preflight. It doesn't change the CoreDNS setup, as that responsibility should remain with the user, since CoreDNS is not managed by Longhorn.

shuo-wu
shuo-wu previously approved these changes Nov 14, 2024
Copy link
Contributor

@shuo-wu shuo-wu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

mergify bot commented Nov 17, 2024

This pull request is now in conflict. Could you fix it @c3y1huang? 🙏

@c3y1huang c3y1huang force-pushed the 9752-dr-volume-faulted-after-node-reboot branch from af34ac2 to 430891e Compare November 28, 2024 01:08
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
pkg/local/preflight/checker.go (2)

476-476: Fix typo in comment

-// "kube-app: kube-dns" lable from the kubeClient and checks the
+// "kube-app: kube-dns" label from the kubeClient and checks the
🧰 Tools
🪛 GitHub Check: codespell

[failure] 476-476:
lable ==> label, ladle, labile, able


472-511: Consider alternative DNS providers

The function assumes CoreDNS/KubeDNS is the only DNS provider. Consider:

  1. Adding support for other DNS providers
  2. Making the check optional or configurable
  3. Adding documentation about supported DNS configurations

This is particularly important as the PR objectives mention this check is part of fixing DR volume reattachment issues, which might occur with different DNS setups.

🧰 Tools
🪛 GitHub Check: codespell

[failure] 476-476:
lable ==> label, ladle, labile, able

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between af34ac2 and 430891e.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • go.mod (1 hunks)
  • pkg/consts/preflight.go (1 hunks)
  • pkg/local/preflight/checker.go (3 hunks)
  • pkg/remote/preflight/checker.go (1 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (2 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go (2 hunks)
  • vendor/modules.txt (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • go.mod
  • pkg/consts/preflight.go
  • pkg/remote/preflight/checker.go
  • vendor/modules.txt
🧰 Additional context used
🪛 GitHub Check: codespell
pkg/local/preflight/checker.go

[failure] 476-476:
lable ==> label, ladle, labile, able

🔇 Additional comments (9)
vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go (3)

6-7: LGTM: Import statements are correctly added

The new imports for metav1 and labels are properly organized and necessary for the label selector functionality.


29-33: LGTM: Clean implementation of label selector creation

The function is well-implemented, using standard Kubernetes API methods and providing a clean interface for creating label selectors.


29-33: Consider upstream contribution instead of maintaining a fork

This change is currently implemented in a forked version of go-common-libs (c3y1huang/go-common-libs instead of longhorn/go-common-libs). Consider contributing this utility function back to the upstream repository to avoid maintaining a separate fork.

Let's verify if this function exists in the upstream repository:

✅ Verification successful

Let me verify the PR details to make a more informed decision.


Let me check the actual content of the PR to verify if it contains the same implementation.


PR #87 already contains this implementation

The exact same NewLabelSelectorFromMap function implementation is already being contributed upstream through PR #87 (longhorn/go-common-libs#87) by the same author. The PR is currently open and under review. No need to create a separate contribution.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if similar functionality exists in the upstream repository
# and if there are any existing PRs for this feature

# Check if the function exists in the upstream repository
gh repo clone longhorn/go-common-libs /tmp/upstream-lib
cd /tmp/upstream-lib
git grep -l "NewLabelSelectorFromMap"

# Check for any related open PRs
gh pr list -R longhorn/go-common-libs --search "NewLabelSelectorFromMap"

Length of output: 394


Script:

#!/bin/bash
# Check the PR content to see if it's the same implementation
gh pr view 87 -R longhorn/go-common-libs

Length of output: 889


Script:

#!/bin/bash
# Get the PR diff to check the implementation
gh pr diff 87 -R longhorn/go-common-libs | grep -A 10 "NewLabelSelectorFromMap"

Length of output: 648

vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (3)

10-10: LGTM: Import addition is appropriate

The added import for labels package is necessary for the new label selector functionality.


Line range hint 15-23: Consider improving context handling and input validation

The existing implementation could be improved as previously suggested:

  1. Accept a context parameter instead of using context.Background()
  2. Add validation for input parameters
  3. Add proper error handling for nil client

This is especially important as this function may be used in the CoreDNS check implementation.


26-44: Verify usage in CoreDNS check implementation

Since these functions are being added as part of implementing the CoreDNS check feature, we should verify their usage pattern.

✅ Verification successful

Based on the search results, I can now generate the final response:

Functions are used correctly in CoreDNS check implementation

The ListDeployments function is used appropriately in pkg/local/preflight/checker.go to list CoreDNS deployments in the kube-system namespace. The search results show no other direct callers of these functions in the codebase, and the implementation aligns with the CoreDNS check requirements.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check how these functions are used in the CoreDNS check implementation
echo "Searching for usage of GetDeployment and ListDeployments in CoreDNS check..."
rg -A 5 "GetDeployment|ListDeployments" --type go

# Check if there are any other callers that would need to be updated
echo "Checking for other potential callers..."
ast-grep --pattern 'func $name($$$) {
  $$$
  GetDeployment($$$)
  $$$
}'

ast-grep --pattern 'func $name($$$) {
  $$$
  ListDeployments($$$)
  $$$
}'

Length of output: 4143

pkg/local/preflight/checker.go (3)

56-64: LGTM: Clean Kubernetes client initialization

The implementation follows good practices with proper error handling and clear error messages.


180-180: LGTM: Consistent with existing check pattern

The integration of checkKubeDNS() follows the same pattern as other checks in the Run method.


500-503: 🛠️ Refactor suggestion

Remove early return to ensure complete check

The early return after checking replica count prevents checking the ready replicas when the desired count is insufficient. This could miss reporting cases where both the desired and ready replica counts are problematic.

 if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas < 2 {
   local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("Kube DNS %q is set with fewer than 2 replicas; consider increasing replica count for high availability", deployment.Name))
-  return
 }

Likely invalid or redundant comment.

@c3y1huang c3y1huang force-pushed the 9752-dr-volume-faulted-after-node-reboot branch from 430891e to 38b989b Compare November 28, 2024 01:30
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go (1)

29-33: Add documentation and nil check

While the implementation is correct, consider these improvements:

  1. Add godoc comments explaining the function's purpose and parameters
  2. Add nil check for the input map

Consider this implementation:

+// NewLabelSelectorFromMap creates a labels.Selector from a map of label key-value pairs.
+// It's used to filter Kubernetes resources based on their labels.
+// Parameters:
+//   - labels: A map of label key-value pairs
+// Returns:
+//   - labels.Selector: A selector that can be used to filter resources
+//   - error: An error if the selector creation fails
 func NewLabelSelectorFromMap(labels map[string]string) (labels.Selector, error) {
+    if labels == nil {
+        return nil, errors.New("labels map cannot be nil")
+    }
     return metav1.LabelSelectorAsSelector(&metav1.LabelSelector{
         MatchLabels: labels,
     })
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 430891e and 38b989b.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • go.mod (1 hunks)
  • pkg/consts/preflight.go (1 hunks)
  • pkg/local/preflight/checker.go (3 hunks)
  • pkg/remote/preflight/checker.go (1 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (2 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go (2 hunks)
  • vendor/modules.txt (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • go.mod
  • pkg/consts/preflight.go
  • pkg/local/preflight/checker.go
  • pkg/remote/preflight/checker.go
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go
  • vendor/modules.txt
🔇 Additional comments (2)
vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go (2)

6-7: LGTM: Import statements are correctly added

The new imports are necessary for the label selector functionality and follow Go conventions.


Line range hint 1-33: Verify implications of using forked dependency

The AI summary indicates this file is from a forked repository (c3y1huang/go-common-libs instead of longhorn/go-common-libs). This change requires careful consideration:

  1. Ensure the fork is maintained and synchronized with upstream
  2. Verify the security implications of using a forked dependency
  3. Document the reason for using the fork

@c3y1huang
Copy link
Contributor Author

Test Results:

  • RKE2:
    INFO[2024-11-28T09:34:33+08:00] Initializing preflight checker               
    INFO[2024-11-28T09:34:33+08:00] Cleaning up preflight checker                
    INFO[2024-11-28T09:34:33+08:00] Running preflight checker                    
    INFO[2024-11-28T09:34:45+08:00] Retrieved preflight checker result:
    ip-10-0-2-111:
      info:
      - Kube DNS "rke2-coredns-rke2-coredns" is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
    ip-10-0-2-112:
      info:
      - Kube DNS "rke2-coredns-rke2-coredns" is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded
    ip-10-0-2-119:
      info:
      - Kube DNS "rke2-coredns-rke2-coredns" is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
      - Module dm_crypt is loaded 
    INFO[2024-11-28T09:34:45+08:00] Cleaning up preflight checker                
    INFO[2024-11-28T09:34:45+08:00] Completed preflight checker
    
  • K3s:
    INFO[2024-11-28T09:49:13+08:00] Initializing preflight checker               
    INFO[2024-11-28T09:49:13+08:00] Cleaning up preflight checker                
    INFO[2024-11-28T09:49:13+08:00] Running preflight checker                    
    INFO[2024-11-28T09:49:17+08:00] Retrieved preflight checker result:
    ip-10-0-2-42:
      error:
      - 'Module dm_crypt is not loaded: failed to execute: nsenter [--mount=/host/proc/10437/ns/mnt --net=/host/proc/10437/ns/net grep dm_crypt /proc/modules], output , stderr : exit status 1'
      info:
      - Kube DNS "coredns" is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
    ip-10-0-2-182:
      error:
      - 'Module dm_crypt is not loaded: failed to execute: nsenter [--mount=/host/proc/10201/ns/mnt --net=/host/proc/10201/ns/net grep dm_crypt /proc/modules], output , stderr : exit status 1'
      info:
      - Kube DNS "coredns" is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed
    ip-10-0-2-228:
      error:
      - 'Module dm_crypt is not loaded: failed to execute: nsenter [--mount=/host/proc/11666/ns/mnt --net=/host/proc/11666/ns/net grep dm_crypt /proc/modules], output , stderr : exit status 1'
      info:
      - Kube DNS "coredns" is set with 2 replicas and 2 ready replicas
      - Service iscsid is running
      - NFS4 is supported
      - Package nfs-client is installed
      - Package open-iscsi is installed
      - Package cryptsetup is installed 
    INFO[2024-11-28T09:49:17+08:00] Cleaning up preflight checker                
    INFO[2024-11-28T09:49:17+08:00] Completed preflight checker
    

longhorn/longhorn-9752

Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-9752

Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-9752

Signed-off-by: Chin-Ya Huang <[email protected]>
@c3y1huang c3y1huang force-pushed the 9752-dr-volume-faulted-after-node-reboot branch from 38b989b to 1d91c76 Compare November 28, 2024 02:33
@c3y1huang c3y1huang requested a review from derekbit November 28, 2024 02:34
Copy link
Member

@derekbit derekbit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @c3y1huang

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
pkg/local/preflight/checker.go (2)

483-483: Rename function to match the component being checked

The function name checkKubeDNS doesn't match the actual component being checked (CoreDNS). Consider renaming for clarity.

-func (local *Checker) checkKubeDNS() {
+func (local *Checker) checkCoreDNS() {

484-509: Standardize terminology in log messages

The log messages inconsistently use "CoreDNS" and "Kube DNS". Standardize on "CoreDNS" throughout the function for consistency.

-	logrus.Info("Checking if CoreDNS has multiple replicas")
+	logrus.Info("Checking if CoreDNS deployment has multiple replicas")
 
 	deployments, err := commonkube.ListDeployments(local.kubeClient, metav1.NamespaceSystem, map[string]string{consts.KubeAppLabel: consts.KubeAppValueDNS})
 	if err != nil {
-		local.collection.Log.Error = append(local.collection.Log.Error, fmt.Sprintf("Failed to list Kube DNS with label %s=%s: %v", consts.KubeAppLabel, consts.KubeAppValueDNS, err))
+		local.collection.Log.Error = append(local.collection.Log.Error, fmt.Sprintf("Failed to list CoreDNS deployment with label %s=%s: %v", consts.KubeAppLabel, consts.KubeAppValueDNS, err))
 		return
 	}
 
 	if len(deployments.Items) != 1 {
-		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("Found %d deployments with label %s=%s; expected 1", len(deployments.Items), consts.KubeAppLabel, consts.KubeAppValueDNS))
+		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("Found %d CoreDNS deployments with label %s=%s; expected 1", len(deployments.Items), consts.KubeAppLabel, consts.KubeAppValueDNS))
 		return
 	}
 
 	deployment := deployments.Items[0]
 
 	if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas < 2 {
-		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("Kube DNS %q is set with fewer than 2 replicas; consider increasing replica count for high availability", deployment.Name))
+		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("CoreDNS deployment %q is set with fewer than 2 replicas; consider increasing replica count for high availability", deployment.Name))
 	}
 
 	if deployment.Status.ReadyReplicas < 2 {
-		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("Kube DNS %q has fewer than 2 ready replicas; some replicas may not be running or ready", deployment.Name))
+		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("CoreDNS deployment %q has fewer than 2 ready replicas; some replicas may not be running or ready", deployment.Name))
 	}
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 38b989b and 1d91c76.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (7)
  • go.mod (1 hunks)
  • pkg/consts/preflight.go (1 hunks)
  • pkg/local/preflight/checker.go (3 hunks)
  • pkg/remote/preflight/checker.go (1 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (2 hunks)
  • vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go (2 hunks)
  • vendor/modules.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • go.mod
  • pkg/consts/preflight.go
  • pkg/remote/preflight/checker.go
  • vendor/github.com/longhorn/go-common-libs/kubernetes/kubernetes.go
  • vendor/modules.txt
🔇 Additional comments (6)
vendor/github.com/longhorn/go-common-libs/kubernetes/deployment.go (2)

8-9: LGTM!

The added import for labels package is necessary for the new functionality and follows Go import conventions.


37-55: 🛠️ Refactor suggestion

Improve robustness of ListDeployments implementation

The implementation needs the following improvements:

  1. Accept a context parameter instead of using context.Background()
  2. Add input validation for kubeClient and namespace

Apply this diff to improve the implementation:

-func ListDeployments(kubeClient kubeclient.Interface, namespace string, labelSelectors map[string]string) (*appsv1.DeploymentList, error) {
+func ListDeployments(ctx context.Context, kubeClient kubeclient.Interface, namespace string, labelSelectors map[string]string) (*appsv1.DeploymentList, error) {
+	if kubeClient == nil {
+		return nil, fmt.Errorf("kubeClient cannot be nil")
+	}
+	if namespace == "" {
+		return nil, fmt.Errorf("namespace cannot be empty")
+	}
+
 	log := logrus.WithFields(logrus.Fields{
 		"kind":           "Deployment",
 		"namespace":      namespace,
 		"labelSelectors": labelSelectors,
 	})
 	log.Trace("Listing resources")

 	var err error
 	selector := labels.Everything()
 	if len(labelSelectors) != 0 {
 		selector, err = NewLabelSelectorFromMap(labelSelectors)
 		if err != nil {
 			return nil, err
 		}
 	}

-	return kubeClient.AppsV1().Deployments(namespace).List(context.Background(), metav1.ListOptions{LabelSelector: selector.String()})
+	return kubeClient.AppsV1().Deployments(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector.String()})
}
pkg/local/preflight/checker.go (4)

56-64: LGTM: Kubernetes client initialization looks good!

The initialization follows best practices with proper error handling and wrapping.


180-180: LGTM: Appropriate placement of DNS check!

The DNS check is correctly placed before OS-specific checks as it's a critical, OS-independent verification.


486-490: ⚠️ Potential issue

Enhance DNS deployment detection for different Kubernetes distributions

As noted in previous discussions, the current label selector might not work with all Kubernetes distributions (e.g., RKE2). Consider implementing a more flexible approach.

-	deployments, err := commonkube.ListDeployments(local.kubeClient, metav1.NamespaceSystem, map[string]string{consts.KubeAppLabel: consts.KubeAppValueDNS})
+	// Try different known label combinations
+	labelSelectors := []map[string]string{
+		{consts.KubeAppLabel: consts.KubeAppValueDNS},
+		{"app": "coredns"},
+		{"k8s-app": "coredns"},
+	}
+	
+	var deployments *appsv1.DeploymentList
+	var err error
+	
+	for _, selector := range labelSelectors {
+		deployments, err = commonkube.ListDeployments(local.kubeClient, metav1.NamespaceSystem, selector)
+		if err == nil && len(deployments.Items) > 0 {
+			break
+		}
+	}

499-502: 🛠️ Refactor suggestion

Remove early return to ensure complete status check

The early return after checking desired replicas prevents checking the ready replicas status, which could provide additional useful information even when the desired replica count is insufficient.

 	if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas < 2 {
 		local.collection.Log.Warn = append(local.collection.Log.Warn, fmt.Sprintf("Kube DNS %q is set with fewer than 2 replicas; consider increasing replica count for high availability", deployment.Name))
-		return
 	}

@derekbit derekbit merged commit 5b54d29 into longhorn:master Nov 28, 2024
7 checks passed
@c3y1huang c3y1huang deleted the 9752-dr-volume-faulted-after-node-reboot branch November 28, 2024 03:33
@c3y1huang
Copy link
Contributor Author

@mergify backport v1.7.x

Copy link

mergify bot commented Nov 28, 2024

backport v1.7.x

✅ Backports have been created

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants