feat: v2 data engine live upgrade #3282
base: master
Conversation
Note: Reviews are paused; they can be managed with the CodeRabbit review commands.
Walkthrough
The pull request introduces significant enhancements to the Longhorn project, focusing on the management of data engine upgrades and related resources. Key modifications include the addition of new controllers, methods, and data structures to facilitate the upgrade process for both data engines and nodes. The changes also involve updates to existing methods for better handling of backup responsibilities and error management. Overall, the enhancements aim to improve the robustness and clarity of the Longhorn storage system, particularly in scenarios involving upgrades and instance management.
Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
This pull request is now in conflict. Could you fix it @derekbit? 🙏
3a5c514 to 393807e (Compare)
Actionable comments posted: 39
🧹 Outside diff range and nitpick comments (67)
webhook/resources/dataengineupgrademanager/mutator.go (2)
49-51
: Enhance type assertion error message. Include the actual type in the error message for better debugging.
- return nil, werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", newObj), "")
+ return nil, werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager (got %T)", newObj, newObj), "")
54-62
: Improve error context for label operations. The error handling could be more specific about what failed during the label operation.
- err := errors.Wrapf(err, "failed to get label patch for upgradeManager %v", upgradeManager.Name) + err := errors.Wrapf(err, "failed to get label patch for upgradeManager %v: labels=%v", upgradeManager.Name, longhornLabels)webhook/resources/nodedataengineupgrade/mutator.go (3)
21-28
: Add nil check for datastore parameter. Consider adding validation for the datastore parameter to prevent potential nil pointer dereferences.
func NewMutator(ds *datastore.DataStore) admission.Mutator {
+	if ds == nil {
+		panic("nil datastore")
+	}
	return &nodeDataEngineUpgradeMutator{ds: ds}
}
43-45
: Consider removing unused datastore field. The ds field in the struct is currently unused. If it's not needed for future operations, consider removing it.
47-74
: Enhance error handling and maintainability. Consider the following improvements:
- Use more specific error messages in type assertion
- Consider extracting operation names into constants
- Be consistent with error wrapping (some use errors.Wrapf, others use string formatting)
- return nil, werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.NodeDataEngineUpgrade", newObj), "")
+ return nil, werror.NewInvalidError(fmt.Sprintf("expected *longhorn.NodeDataEngineUpgrade but got %T", newObj), "")
- err := errors.Wrapf(err, "failed to get label patch for nodeUpgrade %v", nodeUpgrade.Name)
+ err = errors.Wrapf(err, "failed to get label patch for nodeUpgrade %v", nodeUpgrade.Name)
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (1)
21-35
: Consider adding validation for InstanceManagerImage. The status structure is well-designed for tracking upgrades across nodes. However, consider adding validation for the InstanceManagerImage field to ensure it follows the expected format (e.g., a valid container image reference).
engineapi/instance_manager_test.go (2)
5-71
: Consider adding more test cases for better coverage.While the existing test cases cover basic scenarios, consider adding these additional cases to improve coverage:
- Empty replica addresses map
- Nil replica addresses map
- Case where initiator IP matches target IP
- Edge cases for port numbers (0, 65535, invalid ports)
Example additional test cases:
tests := []struct { // ... existing fields }{ // ... existing test cases + { + name: "Empty replica addresses", + replicaAddresses: map[string]string{}, + initiatorAddress: "192.168.1.3:9502", + targetAddress: "192.168.1.3:9502", + expected: map[string]string{}, + expectError: false, + }, + { + name: "Nil replica addresses", + replicaAddresses: nil, + initiatorAddress: "192.168.1.3:9502", + targetAddress: "192.168.1.3:9502", + expected: nil, + expectError: true, + }, + { + name: "Invalid port number", + replicaAddresses: map[string]string{ + "replica1": "192.168.1.1:65536", + }, + initiatorAddress: "192.168.1.3:9502", + targetAddress: "192.168.1.3:9502", + expected: nil, + expectError: true, + }, }
73-84
: Enhance error messages and add documentation.While the test execution is correct, consider these improvements:
- Make error messages more descriptive by including the test case name
- Document that no cleanup is required
Apply this diff to improve the error messages:
t.Run(tt.name, func(t *testing.T) { result, err := getReplicaAddresses(tt.replicaAddresses, tt.initiatorAddress, tt.targetAddress) if (err != nil) != tt.expectError { - t.Errorf("expected error: %v, got: %v", tt.expectError, err) + t.Errorf("%s: expected error: %v, got: %v", tt.name, tt.expectError, err) } if !tt.expectError && !equalMaps(result, tt.expected) { - t.Errorf("expected: %v, got: %v", tt.expected, result) + t.Errorf("%s: expected addresses: %v, got: %v", tt.name, tt.expected, result) } })k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go (4)
5-21
: Add documentation for upgrade states and workflow.The
UpgradeState
type defines a comprehensive set of states, but lacks documentation explaining:
- The purpose and conditions for each state
- The expected state transitions/workflow
- Any timeout or error handling considerations
This documentation is crucial for maintainers and users implementing the upgrade logic.
Add documentation like this:
type UpgradeState string const ( + // UpgradeStateUndefined indicates the upgrade state hasn't been set UpgradeStateUndefined = UpgradeState("") + // UpgradeStatePending indicates the upgrade is queued but not started UpgradeStatePending = UpgradeState("pending") // ... document remaining states ... )
35-40
: Consider adding fields for better observability.The current status structure could be enhanced with additional fields useful for troubleshooting:
- LastTransitionTime: When the current state was entered
- Conditions: Array of conditions following Kubernetes patterns
- Progress: Numerical progress indicator
Example enhancement:
type VolumeUpgradeStatus struct { // +optional State UpgradeState `json:"state"` // +optional Message string `json:"message"` + // +optional + LastTransitionTime *metav1.Time `json:"lastTransitionTime,omitempty"` + // +optional + Progress int32 `json:"progress,omitempty"` }
42-52
: Add standard Kubernetes status fields.Consider adding standard Kubernetes status fields for better integration:
- Conditions array following Kubernetes patterns
- ObservedGeneration for tracking spec changes
Example enhancement:
type NodeDataEngineUpgradeStatus struct { + // +optional + Conditions []metav1.Condition `json:"conditions,omitempty"` + // +optional + ObservedGeneration int64 `json:"observedGeneration,omitempty"` // ... existing fields ... }
54-69
: Add helpful printer columns for observability.Consider adding more printer columns for better operational visibility:
- Age: Standard column for resource age
- Message: Latest status message
Add these printer columns:
// +kubebuilder:printcolumn:name="State",type=string,JSONPath=`.status.state`,description="The current state of the node upgrade process" +// +kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp" +// +kubebuilder:printcolumn:name="Message",type="string",JSONPath=".status.message"k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (2)
47-53
: Consider adding type assertion safety checks. The type assertion m.(*v1beta2.DataEngineUpgradeManager) could panic if the indexer contains an object of the wrong type. Consider adding a type check:
func (s *dataEngineUpgradeManagerLister) List(selector labels.Selector) (ret []*v1beta2.DataEngineUpgradeManager, err error) {
	err = cache.ListAll(s.indexer, selector, func(m interface{}) {
-		ret = append(ret, m.(*v1beta2.DataEngineUpgradeManager))
+		if obj, ok := m.(*v1beta2.DataEngineUpgradeManager); ok {
+			ret = append(ret, obj)
+		}
	})
	return ret, err
}
76-82
: Consider adding type assertion safety checks in namespace List method. Similar to the main List method, the namespace-specific List method could benefit from safer type assertions:
func (s dataEngineUpgradeManagerNamespaceLister) List(selector labels.Selector) (ret []*v1beta2.DataEngineUpgradeManager, err error) {
	err = cache.ListAllByNamespace(s.indexer, s.namespace, selector, func(m interface{}) {
-		ret = append(ret, m.(*v1beta2.DataEngineUpgradeManager))
+		if obj, ok := m.(*v1beta2.DataEngineUpgradeManager); ok {
+			ret = append(ret, obj)
+		}
	})
	return ret, err
}
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go (1)
35-40
: LGTM: Well-defined interface for upgrade monitoring. The NodeDataEngineUpgradeInformer interface provides a clean separation between the informer and lister functionalities, which is essential for implementing the live upgrade feature mentioned in the PR objectives. This interface will be crucial for:
- Watching upgrade status changes in real-time
- Maintaining consistency during the upgrade process
- Enabling rollback capabilities if needed (a wiring sketch follows below)
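For illustration, consuming the informer typically looks like the standard client-go pattern below; this is a sketch, not the PR's code, and it assumes a controller value uc with an enqueueNodeDataEngineUpgrade method and a datastore exposing NodeDataEngineUpgradeInformer (names taken from elsewhere in this review):

```go
// Sketch: enqueue NodeDataEngineUpgrade objects for reconciliation on every event.
// AddEventHandler returns an error in recent client-go versions, so check it
// (see the errcheck findings later in this review).
if _, err := ds.NodeDataEngineUpgradeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
	AddFunc:    uc.enqueueNodeDataEngineUpgrade,
	UpdateFunc: func(old, cur interface{}) { uc.enqueueNodeDataEngineUpgrade(cur) },
	DeleteFunc: uc.enqueueNodeDataEngineUpgrade,
}); err != nil {
	return nil, err
}
```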
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go (2)
63-73
: Consider handling potential errors from ExtractFromListOptions.The label extraction ignores potential errors from
ExtractFromListOptions
. While this is common in fake implementations, consider handling these errors for more robust testing scenarios.- label, _, _ := testing.ExtractFromListOptions(opts) + label, _, err := testing.ExtractFromListOptions(opts) + if err != nil { + return nil, err + }
105-107
: Enhance UpdateStatus method documentation.The comment about
+genclient:noStatus
could be more descriptive. Consider clarifying that this is a generated method for handling status updates of DataEngineUpgradeManager resources, which is crucial for tracking upgrade progress.-// UpdateStatus was generated because the type contains a Status member. -// Add a +genclient:noStatus comment above the type to avoid generating UpdateStatus(). +// UpdateStatus updates the Status subresource of DataEngineUpgradeManager. +// This method is auto-generated due to the status field in the resource spec. +// To disable generation, add +genclient:noStatus to the type definition.k8s/pkg/apis/longhorn/v1beta2/node.go (1)
149-151
: Enhance field documentation while implementation looks good.The field implementation is correct and follows Kubernetes API conventions. Consider enhancing the documentation to provide more context:
- // Request to upgrade the instance manager for v2 volumes on the node. + // Request to upgrade the instance manager for v2 volumes on the node. + // When set to true, the node controller will initiate the data engine upgrade process. + // This field should be set to false once the upgrade is complete.k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (4)
95-96
: Add documentation and validation for TargetNodeID. The new TargetNodeID field lacks:
- Documentation comments explaining its purpose and usage
- Validation rules to ensure valid node IDs are provided
Consider adding:
- A comment explaining when and how this field is used during upgrades
- Validation using kubebuilder tags (e.g., for length or format); a sketch follows below
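A minimal sketch of what that could look like; the comment wording and the validation bound below are placeholders for the author to adjust, not the project's text:

```go
// TargetNodeID requests that the v2 engine instance be prepared on (and later
// switched over to) the given node during a data engine live upgrade.
// Empty means no target node has been requested.
// +kubebuilder:validation:MaxLength=253
// +optional
TargetNodeID string `json:"targetNodeID"`
```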
115-119
: Add documentation for target-related network fields. Please add documentation comments for the new network-related fields:
TargetIP
StorageTargetIP
TargetPort
These fields seem crucial for upgrade coordination and their purpose should be clearly documented.
129-132
: Consider using an enum for instance replacement state. The boolean TargetInstanceReplacementCreated suggests a binary state. Consider using an enum instead to allow for more states in the future (e.g., "pending", "in_progress", "completed", "failed").
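A hedged sketch of such an enum, mirroring the UpgradeState pattern used elsewhere in this PR (the type and constant names here are illustrative only):

```go
// TargetInstanceReplacementState is an illustrative replacement for the boolean flag.
type TargetInstanceReplacementState string

const (
	TargetInstanceReplacementStatePending    = TargetInstanceReplacementState("pending")
	TargetInstanceReplacementStateInProgress = TargetInstanceReplacementState("in_progress")
	TargetInstanceReplacementStateCompleted  = TargetInstanceReplacementState("completed")
	TargetInstanceReplacementStateFailed     = TargetInstanceReplacementState("failed")
)
```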
154-156
: Clarify the purpose and relationship of standby ports. The new standby port fields need (a documentation sketch follows this list):
- Documentation comments explaining their purpose
- Clarification on how they relate to the existing
TargetPortStart/End
- Architectural explanation of the standby mechanism in the upgrade process
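Documentation comments along the following lines could capture the intent; the field names and wording below are assumptions about the standby mechanism and should be corrected by the author:

```go
// Sketch only: hypothetical doc comments for the standby port fields.

// StandbyTargetPortStart is the first port reserved for the standby target
// instance that is prepared on the new instance manager during a live upgrade,
// before traffic is switched over from the TargetPortStart-TargetPortEnd range.
StandbyTargetPortStart int32 `json:"standbyTargetPortStart"`

// StandbyTargetPortEnd is the last port of that reserved standby range.
StandbyTargetPortEnd int32 `json:"standbyTargetPortEnd"`
```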
controller/controller_manager.go (1)
226-227
: LGTM! Consider adding comments for better documentation.The controller execution follows the established pattern, using the same worker count and shutdown channel as other controllers.
Consider adding comments to document these new controllers, similar to the "Start goroutines for Longhorn controllers" comment above:
go volumeCloneController.Run(Workers, stopCh) go volumeExpansionController.Run(Workers, stopCh) + // Start goroutines for data engine upgrade controllers go dataEngineUpgradeManagerController.Run(Workers, stopCh) go nodeDataEngineUpgradeController.Run(Workers, stopCh)
k8s/pkg/apis/longhorn/v1beta2/volume.go (1)
305-306
: Consider documenting the node targeting lifecycleThe addition of these fields introduces another dimension to node targeting in Longhorn. To ensure maintainability and prevent confusion:
- Consider adding a comment block in the Volume type documentation explaining the relationship and lifecycle of all node-related fields:
- NodeID vs TargetNodeID
- CurrentNodeID vs CurrentTargetNodeID
- MigrationNodeID vs CurrentMigrationNodeID
- Document the state transitions during the upgrade process
- Consider adding validation rules to prevent conflicting node assignments
Would you like me to help draft the documentation for the node targeting lifecycle?
Also applies to: 358-359
datastore/datastore.go (1)
48-96
: Consider grouping related fields togetherWhile the implementation is correct, consider grouping the data engine upgrade related fields with other engine-related fields for better code organization. This would improve the readability and maintainability of the code.
Consider reordering the fields to group them with other engine-related fields:
engineLister lhlisters.EngineLister
EngineInformer cache.SharedInformer
+ dataEngineUpgradeManagerLister lhlisters.DataEngineUpgradeManagerLister
+ DataEngineUpgradeManagerInformer cache.SharedInformer
+ nodeDataEngineUpgradeLister lhlisters.NodeDataEngineUpgradeLister
+ NodeDataEngineUpgradeInformer cache.SharedInformer
replicaLister lhlisters.ReplicaLister
ReplicaInformer cache.SharedInformer
- dataEngineUpgradeManagerLister lhlisters.DataEngineUpgradeManagerLister
- DataEngineUpgradeManagerInformer cache.SharedInformer
- nodeDataEngineUpgradeLister lhlisters.NodeDataEngineUpgradeLister
- NodeDataEngineUpgradeInformer cache.SharedInformer
types/types.go (1)
1271-1291
: Consider adding validation for empty parameters.The functions look good and follow the established patterns. However, consider adding validation for empty
prefix
andnodeID
parameters inGenerateNodeDataEngineUpgradeName
to prevent potential issues.Apply this diff to add parameter validation:
func GenerateNodeDataEngineUpgradeName(prefix, nodeID string) string {
+	if prefix == "" || nodeID == "" {
+		return ""
+	}
	return prefix + "-" + nodeID + "-" + util.RandomID()
}
webhook/resources/dataengineupgrademanager/validator.go (2)
70-72
: Handle order-independent comparison forNodes
fieldUsing
reflect.DeepEqual
to compare theNodes
field may lead to false negatives if the order of nodes differs, even when they contain the same elements. Since the order of nodes is likely insignificant, consider sorting the slices before comparison to ensure an order-independent check.Apply this diff to adjust the comparison:
import ( // Existing imports + "sort" ) func (u *dataEngineUpgradeManagerValidator) Update(request *admission.Request, oldObj runtime.Object, newObj runtime.Object) error { // Existing code + // Sort the Nodes slices before comparison + oldNodes := append([]string{}, oldUpgradeManager.Spec.Nodes...) + newNodes := append([]string{}, newUpgradeManager.Spec.Nodes...) + sort.Strings(oldNodes) + sort.Strings(newNodes) - if !reflect.DeepEqual(oldUpgradeManager.Spec.Nodes, newUpgradeManager.Spec.Nodes) { + if !reflect.DeepEqual(oldNodes, newNodes) { return werror.NewInvalidError("spec.nodes field is immutable", "spec.nodes") } // Existing code }
44-44
: Improve error messages by including the actual type receivedIn the type assertion error messages, including the actual type of the object received can aid in debugging. Modify the error messages to reflect the type.
Apply this diff to enhance the error messages:
- return werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", newObj), "") + return werror.NewInvalidError(fmt.Sprintf("%T is not a *longhorn.DataEngineUpgradeManager", newObj), "")Similarly, update lines 58 and 62:
- return werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", oldObj), "") + return werror.NewInvalidError(fmt.Sprintf("%T is not a *longhorn.DataEngineUpgradeManager", oldObj), "")- return werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", newObj), "") + return werror.NewInvalidError(fmt.Sprintf("%T is not a *longhorn.DataEngineUpgradeManager", newObj), "")Also applies to: 58-58, 62-62
controller/node_upgrade_controller.go (3)
259-268
: Ensure proper deep copy of status volumes. When copying the
Volumes
map from the monitor status tonodeUpgrade.Status.Volumes
, a shallow copy can lead to unintended side effects.Apply this diff to perform a deep copy of each
VolumeUpgradeStatus
:nodeUpgrade.Status.State = status.State nodeUpgrade.Status.Message = status.Message nodeUpgrade.Status.Volumes = make(map[string]*longhorn.VolumeUpgradeStatus) for k, v := range status.Volumes { - nodeUpgrade.Status.Volumes[k] = &longhorn.VolumeUpgradeStatus{ - State: v.State, - Message: v.Message, - } + nodeUpgrade.Status.Volumes[k] = v.DeepCopy() }This ensures that each
VolumeUpgradeStatus
is independently copied.
128-145
: Correct the use ofmaxRetries
inhandleErr
Assuming
maxRetries
is defined within the controller struct, it should be referenced usinguc.maxRetries
.Apply this diff to correctly reference
maxRetries
:- if uc.queue.NumRequeues(key) < maxRetries { + if uc.queue.NumRequeues(key) < uc.maxRetries { handleReconcileErrorLogging(log, err, "Failed to sync Longhorn nodeDataEngineUpgrade resource") uc.queue.AddRateLimited(key) return }Ensure that
maxRetries
is properly defined as a field in the controller.
86-94
: Add logging toenqueueNodeDataEngineUpgrade
for better traceabilityIncluding logging when enqueueing items can help in debugging and monitoring the controller's workflow.
Apply this diff to add debug logging:
uc.queue.Add(key) + uc.logger.WithField("key", key).Debug("Enqueued NodeDataEngineUpgrade for processing")
This provides visibility into when items are added to the queue.
controller/upgrade_manager_controller.go (5)
55-58
: Address the TODO comment regarding the event recorder wrapper. There is a TODO comment indicating that the wrapper should be removed once all clients have moved to use the clientset. Consider addressing this to clean up the codebase if appropriate.
Do you want me to help remove the wrapper and update the code accordingly?
204-206
: Add nil check before closing the monitorIn
reconcile
, there is a potential nil pointer dereference ifuc.dataEngineUpgradeManagerMonitor
is alreadynil
.Add a nil check to ensure safety:
if uc.dataEngineUpgradeManagerMonitor != nil { uc.dataEngineUpgradeManagerMonitor.Close() uc.dataEngineUpgradeManagerMonitor = nil }
217-227
: Optimize status update comparison. Using
reflect.DeepEqual
to compare the entire status can be inefficient. This may impact performance, especially with large structs.Consider comparing specific fields that are expected to change or use a hash to detect changes more efficiently.
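A minimal sketch of the field-by-field approach; the field names below are taken from the status fields discussed in this review and will likely need adjusting to the actual struct:

```go
// statusChanged compares only the fields this controller writes, instead of
// running reflect.DeepEqual over the whole status object.
func statusChanged(existing, updated *longhorn.DataEngineUpgradeManagerStatus) bool {
	if existing.State != updated.State ||
		existing.Message != updated.Message ||
		existing.InstanceManagerImage != updated.InstanceManagerImage ||
		existing.UpgradingNode != updated.UpgradingNode {
		return true
	}
	// Fall back to a deep comparison only for the per-node map (field name assumed).
	return !reflect.DeepEqual(existing.UpgradeNodes, updated.UpgradeNodes)
}
```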
175-181
: Simplify error handling inreconcile
The error handling logic can be streamlined for better readability.
Refactor the error handling as follows:
upgradeManager, err := uc.ds.GetDataEngineUpgradeManager(upgradeManagerName) if err != nil { if apierrors.IsNotFound(err) { return nil } return err }
48-53
: Consistent parameter ordering in constructorIn
NewDataEngineUpgradeManagerController
, the parameterscontrollerID
andnamespace
are passed at the end. For consistency with other controllers, consider placing these parameters earlier in the argument list.Reorder the parameters for consistency:
func NewDataEngineUpgradeManagerController(
	logger logrus.FieldLogger,
	ds *datastore.DataStore,
	scheme *runtime.Scheme,
	controllerID string,
	namespace string,
	kubeClient clientset.Interface) (*DataEngineUpgradeManagerController, error) {
controller/monitor/upgrade_manager_monitor.go (3)
26-27
: Typographical error in constant namingThe constant
DataEngineUpgradeMonitorMonitorSyncPeriod
has an extra "Monitor" in its name. Consider renaming it toDataEngineUpgradeManagerMonitorSyncPeriod
for clarity and consistency.Apply this diff to fix the naming:
- DataEngineUpgradeMonitorMonitorSyncPeriod = 5 * time.Second + DataEngineUpgradeManagerMonitorSyncPeriod = 5 * time.Second
41-41
: Use conventional naming for cancel functionThe variable
quit
is used for the cancel function returned bycontext.WithCancel
, but it's more conventional to name itcancel
for clarity.Apply this diff to rename the variable:
- ctx, quit := context.WithCancel(context.Background()) + ctx, cancel := context.WithCancel(context.Background())Also, update all references of
quit
tocancel
in the monitor:- m.quit() + m.cancel()
331-331
: Unresolved TODO: Check for untracked node data engine upgradesThere is a
TODO
comment indicating that the code should check if there are anyNodeDataEngineUpgrade
resources in progress but not tracked bym.upgradeManagerStatus.UpgradingNode
. Addressing this is important to ensure that all in-progress upgrades are correctly monitored.Would you like assistance in implementing this logic?
webhook/resources/volume/mutator.go (2)
49-61
: Add unit tests for areAllDefaultInstanceManagersStopped
To ensure the correctness and reliability of the
areAllDefaultInstanceManagersStopped
function, consider adding unit tests that cover various scenarios, such as:
- All default instance managers are stopped.
- Some default instance managers are not stopped.
- Error handling when listing instance managers fails.
This will help prevent regressions and improve maintainability.
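One way to keep those scenarios cheap to cover is to test the state check against a plain slice of instance managers; the sketch below assumes a small pure helper (hypothetical, not part of the PR) that the datastore-backed function could delegate to:

```go
// allStopped is a hypothetical pure helper: true when every instance manager
// in the slice has reached the stopped state.
func allStopped(ims []*longhorn.InstanceManager) bool {
	for _, im := range ims {
		if im.Status.CurrentState != longhorn.InstanceManagerStateStopped {
			return false
		}
	}
	return true
}

func TestAllStopped(t *testing.T) {
	stopped := &longhorn.InstanceManager{Status: longhorn.InstanceManagerStatus{CurrentState: longhorn.InstanceManagerStateStopped}}
	running := &longhorn.InstanceManager{Status: longhorn.InstanceManagerStatus{CurrentState: longhorn.InstanceManagerStateRunning}}

	if !allStopped([]*longhorn.InstanceManager{stopped, stopped}) {
		t.Error("expected true when every instance manager is stopped")
	}
	if allStopped([]*longhorn.InstanceManager{stopped, running}) {
		t.Error("expected false when one instance manager is still running")
	}
}
```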
63-86
: Add unit tests for getActiveInstanceManagerImage
Adding unit tests for
getActiveInstanceManagerImage
will help validate its behavior in different scenarios:
- When all default instance managers are stopped, and there is at least one non-default instance manager.
- When all default instance managers are stopped, but there are no non-default instance managers.
- When not all default instance managers are stopped.
This will improve code reliability and ease future maintenance.
webhook/resources/volume/validator.go (6)
104-104
: Clarify the error message for empty engine image. The error message "BUG: Invalid empty Setting.EngineImage" may confuse users. Consider removing "BUG:" to make it clearer, e.g., "Invalid empty Setting.EngineImage".
131-133
: Ensure consistent error handling by wrapping errorsThe error returned at line 133 is not wrapped with
werror.NewInvalidError
. For consistency with other error returns in the code, consider wrapping the error:return werror.NewInvalidError(err.Error(), "")
144-145
: Verify compatibility check function nameEnsure that the function
CheckDataEngineImageCompatiblityByImage
is correctly named. The word "Compatiblity" appears to be misspelled; it should be "Compatibility".
165-177
: Redundant condition check forvolume.Spec.NodeID
Within the
if volume.Spec.NodeID != ""
block starting at line 165, there is another check forvolume.Spec.NodeID != ""
at line 173. This check is redundant and can be removed.
298-347
: Refactor repeated validation checks into a helper functionMultiple blocks between lines 298-347 perform similar validation checks for different volume specifications when using data engine v2. To improve maintainability and reduce code duplication, consider refactoring these checks into a helper function.
For example, create a function:
func (v *volumeValidator) validateImmutableFieldsForDataEngineV2(oldVolume, newVolume *longhorn.Volume, fieldName string, oldValue, newValue interface{}) error { if !reflect.DeepEqual(oldValue, newValue) { err := fmt.Errorf("changing %s for volume %v is not supported for data engine %v", fieldName, newVolume.Name, newVolume.Spec.DataEngine) return werror.NewInvalidError(err.Error(), "") } return nil }And then use it for each field:
if err := v.validateImmutableFieldsForDataEngineV2(oldVolume, newVolume, "backing image", oldVolume.Spec.BackingImage, newVolume.Spec.BackingImage); err != nil { return err }
408-409
: Typographical error in error messageIn the error message at line 409, "unable to set targetNodeID for volume when the volume is not using data engine v2", the field should be formatted consistently. Consider quoting
spec.targetNodeID
for clarity.- "unable to set targetNodeID for volume when the volume is not using data engine v2" + "unable to set spec.targetNodeID for volume when the volume is not using data engine v2"controller/replica_controller.go (1)
611-613
: Implement the placeholder methods or clarify their future use. The methods
SuspendInstance
,ResumeInstance
,SwitchOverTarget
,DeleteTarget
, andRequireRemoteTargetInstance
currently return default values without any implementation. If these methods are intended for future functionality, consider adding appropriate implementations orTODO
comments to indicate pending work.Would you like assistance in implementing these methods or creating a GitHub issue to track their development?
Also applies to: 615-617, 619-621, 623-625, 627-629
controller/monitor/node_upgrade_monitor.go (1)
220-222
: Consistent error variable naming for clarity. The variable
errList
used here refers to a single error instance. UsingerrList
may imply it contains multiple errors, which can be misleading.Consider renaming
errList
toerr
for consistency:- replicas, errList := m.ds.ListReplicasByNodeRO(nodeUpgrade.Status.OwnerID) - if errList != nil { - err = errors.Wrapf(errList, "failed to list replicas on node %v", nodeUpgrade.Status.OwnerID) + replicas, err := m.ds.ListReplicasByNodeRO(nodeUpgrade.Status.OwnerID) + if err != nil { + err = errors.Wrapf(err, "failed to list replicas on node %v", nodeUpgrade.Status.OwnerID)engineapi/instance_manager.go (1)
532-555
: Improve error messages by including invalid addresses. The error messages in the
getReplicaAddresses
function can be more informative by including the invalid address that caused the error. This will aid in debugging and provide better context.Apply the following diff to enhance the error messages:
- return nil, errors.New("invalid initiator address format") + return nil, fmt.Errorf("invalid initiator address format: %s", initiatorAddress) - return nil, errors.New("invalid target address format") + return nil, fmt.Errorf("invalid target address format: %s", targetAddress) - return nil, errors.New("invalid replica address format") + return nil, fmt.Errorf("invalid replica address format: %s", addr)controller/backup_controller.go (1)
599-602
: Handle Node NotFound error explicitly. In the
isResponsibleFor
method, if the node resource is not found, the current code returnsfalse, err
. A missing node could signify that the node has been removed from the cluster, and the controller should treat it as not responsible without raising an error.Consider handling the
NotFound
error explicitly:node, err := bc.ds.GetNodeRO(bc.controllerID) if err != nil { + if apierrors.IsNotFound(err) { + return false, nil + } return false, err }controller/instance_handler.go (1)
214-245
: Simplify nested conditional logic for better readabilityThe nested conditional statements between lines 214-245 are complex and may reduce readability. Consider refactoring the code to simplify the logic, which will enhance maintainability and make future modifications easier.
scheduler/replica_scheduler.go (4)
Line range hint
480-480
: Address the TODO comment regarding V2 rebuildingThe TODO comment on line 480 indicates that the code handling for reusing failed replicas during V2 rebuilding is temporary. To ensure clarity and proper tracking, please consider creating an issue or task to remove or update this code once failed replica reuse is supported in V2.
Would you like assistance in creating a GitHub issue to track this TODO?
Line range hint
439-439
: Remove or resolve the 'Investigate' commentThe comment
// Investigate
suggests that further attention is needed for thegetDiskWithMostUsableStorage
function. Leaving such comments can cause confusion. Please either address the underlying issue or remove the comment.Would you like assistance in reviewing this function to address any concerns?
Line range hint
440-444
: Simplify the initialization ofdiskWithMostUsableStorage
The variable
diskWithMostUsableStorage
is initialized with an emptyDisk
struct and then immediately reassigned in the loop. This is unnecessary and could be simplified. Consider initializing it directly from the first element in thedisks
map.Apply this diff to simplify the initialization:
-func (rcs *ReplicaScheduler) getDiskWithMostUsableStorage(disks map[string]*Disk) *Disk { - diskWithMostUsableStorage := &Disk{} - for _, disk := range disks { - diskWithMostUsableStorage = disk - break - } +func (rcs *ReplicaScheduler) getDiskWithMostUsableStorage(disks map[string]*Disk) *Disk { + var diskWithMostUsableStorage *Disk + for _, disk := range disks { + diskWithMostUsableStorage = disk + break }
Line range hint
88-88
: Avoid reusing variable names likemultiError
to prevent confusionThe variable
multiError
is declared multiple times within theFindDiskCandidates
function, which can lead to readability issues and potential bugs due to variable shadowing. Consider renaming these variables or restructuring the code to improve clarity.Also applies to: 111-111
controller/engine_controller.go (6)
435-467
: Improve error handling infindInstanceManagerAndIPs
The function
findInstanceManagerAndIPs
can be enhanced to handle errors more gracefully. Specifically, when retrievingtargetIM
, if an error occurs, providing more context can help in debugging.Consider wrapping the error with additional context:
if e.Spec.TargetNodeID != "" { targetIM, err := ec.ds.GetInstanceManagerByInstanceRO(obj, true) if err != nil { - return nil, "", "", err + return nil, "", "", errors.Wrap(err, "failed to get target instance manager") }
Line range hint
2419-2476
: Enhance error messages inUpgrade
methodIn the
Upgrade
method for DataEngineV2, error messages can be improved to provide more context, especially when an instance is not found or not running.Enhance error messages for clarity:
if _, ok := im.Status.InstanceEngines[e.Name]; !ok { - return fmt.Errorf("target instance %v is not found in engine list", e.Name) + return fmt.Errorf("target instance %v is not found in instance manager %v engine list", e.Name, im.Name) }
Line range hint
2419-2476
: Avoid code duplication in instance existence checksThe code blocks checking for the existence of the initiator and target instances in the
Upgrade
method are nearly identical. Refactoring to a helper function can reduce duplication and improve maintainability.Consider creating a helper function:
func checkInstanceExists(im *longhorn.InstanceManager, e *longhorn.Engine, role string) error { if _, ok := im.Status.InstanceEngines[e.Name]; !ok { return fmt.Errorf("%s instance %v is not found in instance manager %v engine list", role, e.Name, im.Name) } return nil }Then use it:
if err := checkInstanceExists(im, e, "initiator"); err != nil { return err } // ... if err := checkInstanceExists(im, e, "target"); err != nil { return err }
Line range hint
2545-2583
: Clarify logic inisResponsibleFor
methodThe
isResponsibleFor
method contains complex logic that could benefit from additional comments explaining the decision-making process, especially around theisPreferredOwner
,continueToBeOwner
, andrequiresNewOwner
variables.Add comments to improve readability:
// Determine if the current node is the preferred owner and has the data engine available isPreferredOwner := currentNodeDataEngineAvailable && isResponsible // Continue to be the owner if the preferred owner doesn't have the data engine available, but the current owner does continueToBeOwner := currentNodeDataEngineAvailable && !preferredOwnerDataEngineAvailable && ec.controllerID == e.Status.OwnerID // Require new ownership if neither the preferred owner nor the current owner have the data engine, but the current node does requiresNewOwner := currentNodeDataEngineAvailable && !preferredOwnerDataEngineAvailable && !currentOwnerDataEngineAvailable
646-673
: Ensure consistency in error messagesIn the
SuspendInstance
method, error messages use different formats. For instance, some messages start with a lowercase letter, while others start with uppercase. Ensuring consistency improves readability and professionalism.Standardize error messages:
return fmt.Errorf("Invalid object for engine instance suspension: %v", obj) // ... return fmt.Errorf("Suspending engine instance is not supported for data engine %v", e.Spec.DataEngine)
750-760
: Handle potential errors when switching over targetIn
SwitchOverTarget
, after obtaining thetargetIM
andinitiatorIM
, the code proceeds to use their IPs. It would be prudent to check if these IPs are valid (non-empty) before proceeding, to prevent issues with network communication.Add checks for valid IPs:
if targetIM.Status.IP == "" { return fmt.Errorf("target instance manager IP is empty for engine %v", e.Name) } if initiatorIM.Status.IP == "" { return fmt.Errorf("initiator instance manager IP is empty for engine %v", e.Name) }controller/volume_controller.go (4)
1009-1010
: Fix typo in comment for clarity. Correct the typo in the comment. Change "something must wrong" to "something must be wrong".
Apply this diff to fix the typo:
- // r.Spec.Active shouldn't be set for the leftover replicas, something must wrong + // r.Spec.Active shouldn't be set for the leftover replicas; something must be wrong
1834-1834
: Improve grammar in comment for better readability. Modify the comment for clarity. Change "the image of replica is no need to be the same" to "the replica's image does not need to be the same".
Apply this diff to improve the comment:
- // For v2 volume, the image of replica is no need to be the same as the volume image + // For v2 volume, the replica's image does not need to be the same as the volume image
3208-3210
: Remove empty else block to simplify codeThe
else
block at line 3208 is empty and can be removed to improve code clarity.Apply this diff to remove the empty
else
block:} else { - // TODO: what if e.Status.CurrentState != longhorn.InstanceStateRunning } + // TODO: what if e.Status.CurrentState != longhorn.InstanceStateRunning🧰 Tools
🪛 golangci-lint (1.61.0)
3208-3208: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
3221-3226
: Simplify code by unwrapping unnecessary else blockSince the
if
block at line 3219 returns, theelse
block is unnecessary and can be unwrapped for better readability.Apply this diff to simplify the code:
if replicaAddressMap, err := c.constructReplicaAddressMap(v, e, rs); err != nil { return nil } - } else { if !reflect.DeepEqual(e.Spec.UpgradedReplicaAddressMap, replicaAddressMap) { e.Spec.UpgradedReplicaAddressMap = replicaAddressMap return nil } - }🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)datastore/longhorn.go (1)
5641-5875
: Recommend adding unit tests for new DataEngineUpgradeManager methods. Consider adding unit tests to cover the new methods added for
DataEngineUpgradeManager
and NodeDataEngineUpgrade
resources to ensure correct functionality and future maintainability.
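As a starting point, the generated fake clientset added in this PR can exercise a basic create/get round-trip before full datastore-level tests are wired up; a rough sketch (the package alias and namespace are illustrative):

```go
func TestDataEngineUpgradeManagerCreateGet(t *testing.T) {
	// lhfake is assumed to alias k8s/pkg/client/clientset/versioned/fake.
	client := lhfake.NewSimpleClientset()
	ctx := context.Background()

	um := &longhorn.DataEngineUpgradeManager{
		ObjectMeta: metav1.ObjectMeta{Name: "upgrade-manager-test", Namespace: "longhorn-system"},
	}
	if _, err := client.LonghornV1beta2().DataEngineUpgradeManagers("longhorn-system").Create(ctx, um, metav1.CreateOptions{}); err != nil {
		t.Fatalf("failed to create DataEngineUpgradeManager: %v", err)
	}

	got, err := client.LonghornV1beta2().DataEngineUpgradeManagers("longhorn-system").Get(ctx, "upgrade-manager-test", metav1.GetOptions{})
	if err != nil {
		t.Fatalf("failed to get DataEngineUpgradeManager: %v", err)
	}
	if got.Name != um.Name {
		t.Errorf("expected name %q, got %q", um.Name, got.Name)
	}
}
```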
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (6)
go.mod is excluded by !go.mod
go.sum is excluded by !**/*.sum, !go.sum
vendor/github.com/longhorn/longhorn-instance-manager/pkg/api/instance.go is excluded by !vendor/**
vendor/github.com/longhorn/types/pkg/generated/imrpc/instance.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (51)
controller/backup_controller.go (1 hunks)
controller/controller_manager.go (2 hunks)
controller/engine_controller.go (11 hunks)
controller/instance_handler.go (15 hunks)
controller/instance_handler_test.go (4 hunks)
controller/monitor/node_upgrade_monitor.go (1 hunks)
controller/monitor/upgrade_manager_monitor.go (1 hunks)
controller/node_controller.go (2 hunks)
controller/node_upgrade_controller.go (1 hunks)
controller/replica_controller.go (5 hunks)
controller/uninstall_controller.go (4 hunks)
controller/upgrade_manager_controller.go (1 hunks)
controller/utils.go (0 hunks)
controller/volume_controller.go (16 hunks)
controller/volume_controller_test.go (1 hunks)
datastore/datastore.go (3 hunks)
datastore/longhorn.go (6 hunks)
engineapi/instance_manager.go (5 hunks)
engineapi/instance_manager_test.go (1 hunks)
k8s/crds.yaml (84 hunks)
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (6 hunks)
k8s/pkg/apis/longhorn/v1beta2/node.go (2 hunks)
k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
k8s/pkg/apis/longhorn/v1beta2/register.go (1 hunks)
k8s/pkg/apis/longhorn/v1beta2/volume.go (2 hunks)
k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go (4 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go (2 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go (1 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go (2 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go (3 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
k8s/pkg/client/informers/externalversions/generic.go (2 hunks)
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go (4 hunks)
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go (2 hunks)
k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
scheduler/replica_scheduler.go (1 hunks)
types/types.go (4 hunks)
webhook/resources/dataengineupgrademanager/mutator.go (1 hunks)
webhook/resources/dataengineupgrademanager/validator.go (1 hunks)
webhook/resources/nodedataengineupgrade/mutator.go (1 hunks)
webhook/resources/nodedataengineupgrade/validator.go (1 hunks)
webhook/resources/volume/mutator.go (2 hunks)
webhook/resources/volume/validator.go (5 hunks)
webhook/server/mutation.go (2 hunks)
webhook/server/validation.go (2 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
✅ Files skipped from review due to trivial changes (4)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
🧰 Additional context used
📓 Learnings (1)
webhook/server/mutation.go (1)
Learnt from: mantissahz
PR: longhorn/longhorn-manager#2182
File: webhook/server/mutation.go:46-46
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In the Longhorn codebase, other custom resource mutators do not implement the `Mutate` method.
🪛 golangci-lint (1.61.0)
controller/uninstall_controller.go
209-209: Error return value of ds.DataEngineUpgradeManagerInformer.AddEventHandler is not checked (errcheck)
213-213: Error return value of ds.NodeDataEngineUpgradeInformer.AddEventHandler is not checked (errcheck)
controller/volume_controller.go
3208-3208: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (103)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go (1)
35-36
: LGTM! The new expansion interfaces align with Kubernetes patterns.
The addition of DataEngineUpgradeManagerExpansion
and NodeDataEngineUpgradeExpansion
interfaces follows the standard Kubernetes client-go pattern for extensibility. These interfaces support the v2 data engine live upgrade feature by providing extension points for the new resource types.
Let's verify that the corresponding resource types exist:
Also applies to: 45-46
✅ Verification successful
The expansion interfaces correctly correspond to their CRD types
The verification confirms that both expansion interfaces have their corresponding Custom Resource Definition (CRD) types properly defined in the API package:
DataEngineUpgradeManager
is defined ink8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go
NodeDataEngineUpgrade
is defined ink8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go
Both types follow the standard Kubernetes CRD structure with TypeMeta, ObjectMeta, Spec, and Status fields.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of corresponding CRD types for the new expansion interfaces
# Check for DataEngineUpgradeManager type
rg -A 5 "type DataEngineUpgradeManager struct" k8s/pkg/apis/longhorn/v1beta2/
# Check for NodeDataEngineUpgrade type
rg -A 5 "type NodeDataEngineUpgrade struct" k8s/pkg/apis/longhorn/v1beta2/
Length of output: 1327
k8s/pkg/apis/longhorn/v1beta2/register.go (2)
68-71
: LGTM! The new types follow Kubernetes API conventions.
The addition of DataEngineUpgradeManager
and NodeDataEngineUpgrade
types (and their List counterparts) is well-structured and maintains the alphabetical ordering in the scheme registration.
68-71
: Verify API versioning implications.
Adding new types to the v1beta2 API group is a significant change. Please ensure:
- These changes don't require an API version bump
- Any necessary migration paths are in place
- API backward compatibility is maintained
✅ Verification successful
Let me gather more information about API versioning and compatibility by checking for conversion code and CRD definitions specifically for these new types.
Let me gather more information about the API versioning and conversion setup by checking for conversion webhooks and CRD specifications.
Based on the gathered information, I can now provide a final response:
Adding new types to v1beta2 API is safe with existing conversion infrastructure
The codebase has proper API versioning support in place:
- A conversion webhook service is configured and running (
longhorn-conversion-webhook
) - The webhook is a prerequisite for the manager's readiness
- New types are being added to the existing v1beta2 API group which is the current storage version
No API version bump is needed as:
- These are new types being introduced, not modifications to existing ones
- The conversion webhook infrastructure is already in place to handle future versioning needs
- The changes maintain backward compatibility since they're additive
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for API version related files and CRD definitions
# Check for API migration code
echo "Checking for API migration code..."
rg -g '!vendor' -l 'conversion.*v1beta'
# Check for CRD definitions
echo "Checking for CRD definitions..."
fd -e yaml -e yml . | xargs rg -l 'DataEngineUpgradeManager\|NodeDataEngineUpgrade'
# Check for version-related comments/docs
echo "Checking for version-related documentation..."
rg -g '!vendor' 'v1beta2.*upgrade'
Length of output: 1648
Script:
#!/bin/bash
# Check for conversion code and API compatibility
# Look for conversion functions or hub/conversion implementations
echo "Checking for conversion implementations..."
rg -g '!vendor' 'func.*Convert.*DataEngineUpgrade'
# Check for CRD definitions of the new types
echo "Checking CRD yaml files..."
fd crd | xargs cat | grep -A 10 "kind: CustomResourceDefinition" | grep -A 10 "longhorn.io"
# Check for any API compatibility comments or documentation
echo "Checking for API compatibility documentation..."
rg -g '!vendor' -B 2 -A 2 'v1beta2.*DataEngineUpgrade'
# Check for hub versions or storage versions
echo "Checking for storage version markers..."
rg -g '!vendor' 'storage.*=.*true' k8s/
Length of output: 61375
Script:
#!/bin/bash
# Check for conversion webhook configurations and CRD specifications
# Look for conversion webhook configurations
echo "Checking for conversion webhook configurations..."
rg -g '!vendor' -B 3 -A 3 'conversion.*webhook'
# Check CRD specifications for storage versions
echo "Checking CRD specifications for storage versions..."
fd -e yaml -e yml . | xargs grep -A 5 "versions:"
# Look for API compatibility comments or documentation
echo "Checking for API compatibility documentation..."
rg -g '!vendor' -B 2 -A 2 'v1beta.*compatibility'
Length of output: 14476
webhook/server/validation.go (2)
13-13
: LGTM: Import additions are well-organized
The new imports for data engine upgrade components follow the project's package structure and are properly organized with other resource imports.
Also applies to: 17-17
52-53
: LGTM: Validator additions follow established patterns
The new validators for data engine upgrades are properly integrated into the validation chain and follow the same initialization pattern as other resource validators.
Let's verify the implementation of these validators:
✅ Verification successful
Let me try a different approach to verify the validator implementations.
Validators are properly implemented with all required interface methods
Both validators follow the correct implementation pattern with:
- Proper struct definitions embedding DefaultValidator
- Complete implementation of admission.Validator interface:
- Resource() method defining the resource names
- Create() method for validation of new resources
- Update() method for validation of resource modifications
- Proper type assertions for their respective types (DataEngineUpgradeManager and NodeDataEngineUpgrade)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the implementation of the new validators
# Expected: Each validator should implement the required admission.Validator interface methods
# Check for the validator implementations
echo "Checking DataEngineUpgradeManager validator..."
ast-grep --pattern 'type $_ struct {
$$$
}
func (v $_) Resource() admission.Resource {
$$$
}
func (v $_) Create($_) error {
$$$
}
func (v $_) Update($_) error {
$$$
}
func (v $_) Delete($_) error {
$$$
}' webhook/resources/dataengineupgrademanager/validator.go
echo "Checking NodeDataEngineUpgrade validator..."
ast-grep --pattern 'type $_ struct {
$$$
}
func (v $_) Resource() admission.Resource {
$$$
}
func (v $_) Create($_) error {
$$$
}
func (v $_) Update($_) error {
$$$
}
func (v $_) Delete($_) error {
$$$
}' webhook/resources/nodedataengineupgrade/validator.go
Length of output: 750
Script:
#!/bin/bash
# Let's check if these files exist first and then look at their content
echo "Checking for validator files..."
fd validator.go -p webhook/resources/dataengineupgrademanager
fd validator.go -p webhook/resources/nodedataengineupgrade
echo -e "\nChecking DataEngineUpgradeManager validator content..."
rg -A 2 "type.*struct|func.*Resource|func.*Create|func.*Update|func.*Delete" webhook/resources/dataengineupgrademanager/validator.go || true
echo -e "\nChecking NodeDataEngineUpgrade validator content..."
rg -A 2 "type.*struct|func.*Resource|func.*Create|func.*Update|func.*Delete" webhook/resources/nodedataengineupgrade/validator.go || true
Length of output: 2104
webhook/resources/dataengineupgrademanager/mutator.go (4)
21-24
: LGTM! Clean struct definition following composition pattern.
The struct is well-designed, embedding the DefaultMutator and maintaining necessary dependencies.
1-74
: Overall implementation looks solid!
The mutator implementation follows good practices with proper error handling, clear separation of concerns, and follows the webhook pattern. The suggested improvements are minor enhancements to improve debugging and robustness.
47-74
: Consider validating upgradeManager.Name.
The mutate function should validate that upgradeManager.Name is non-empty before using it in error messages.
37-39
: Verify if Update operation should be supported.
Given that this is for live upgrades, should the mutator also handle Update operations for DataEngineUpgradeManager resources?
✅ Verification successful
Based on the gathered information from all the executed scripts, I can now provide a final response:
Update operation should be omitted for this mutating webhook
The current implementation correctly handles only Create operations for the DataEngineUpgradeManager webhook. While there are Update operations in the client code (as seen in the clientset), these are for normal API operations. The mutation webhook specifically focuses on initial creation because:
- The client code shows separate paths for Updates and UpdateStatus operations, which are handled by the controller layer rather than admission webhooks
- The DataEngineUpgradeManagerController exists to manage the lifecycle after creation
- No other webhooks were found handling updates, suggesting this is an intentional design where mutations are only needed during resource creation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if Update operations are handled elsewhere or if this is intentional.
# Check for Update operation handlers in related files
rg -A 5 "DataEngineUpgradeManager.*Update"
# Check validator for any Update operation handling
fd -e go -x grep -l "Update.*DataEngineUpgradeManager" {} \;
Length of output: 7308
Script:
#!/bin/bash
# Let's check the validator configuration to understand the complete admission control setup
rg -A 10 "DataEngineUpgradeManager.*validator"
# Also check the CRD definition to understand the intended operations
fd "dataengineupgrademanager" -t f -e yaml | xargs cat
# Check for any upgrade-related logic in controllers
ast-grep --pattern 'type DataEngineUpgradeManagerController struct {
$$$
}'
Length of output: 1311
Script:
#!/bin/bash
# Let's check the controller implementation to understand the upgrade workflow
rg -A 10 "func \(c \*DataEngineUpgradeManagerController\) (Create|Update|Sync|reconcile)"
# Check if there are any other webhooks handling updates
fd -t f -e go -x grep -l "webhook.*DataEngineUpgradeManager" {} \;
# Check for any validation rules in the webhook
rg -A 5 "ValidatingWebhookConfiguration.*DataEngineUpgradeManager"
Length of output: 232
webhook/resources/nodedataengineupgrade/mutator.go (3)
1-19
: LGTM: Well-organized imports and package structure
The imports are properly organized and grouped by external/internal dependencies.
54-62
: Verify label mutation security implications
The code applies labels from GetNodeDataEngineUpgradeLabels
without validation. Verify that these labels cannot be manipulated in ways that could affect system security.
✅ Verification successful
Let me analyze the implementation of GetNodeDataEngineUpgradeLabels
to understand how the labels are constructed.
Let me check the label key constants and how they are used in the system.
Based on the analysis of the code, I can now generate the final response:
Label mutation is safely implemented with controlled values
The GetNodeDataEngineUpgradeLabels
implementation shows that:
- Labels are constructed using a fixed prefix
longhorn.io/
- Label keys are predefined constants
- Label values are derived from controlled inputs (upgradeManagerID and nodeID)
- Base labels include a managed-by field set to a constant control plane name
The label mutation is safe because:
- All label keys are namespaced with the Longhorn prefix
- Values are derived from existing resource fields without transformation
- The implementation follows Kubernetes labeling best practices
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check how NodeDataEngineUpgrade labels are used in the system
rg -A 5 "GetNodeDataEngineUpgradeLabels"
# Look for label-based security controls
rg -l "selector.*label"
Length of output: 17205
Script:
#!/bin/bash
# Check the implementation of GetBaseLabelsForSystemManagedComponent and related label functions
ast-grep --pattern 'func GetBaseLabelsForSystemManagedComponent() map[string]string {
$$$
}'
# Check label key constants
rg -A 3 "const.*LonghornLabel"
Length of output: 343
Script:
#!/bin/bash
# Check all Longhorn label related constants and their usage
rg "LonghornLabel" types/types.go -A 2
# Check how these labels are validated or used in security contexts
ast-grep --pattern 'func $_($$_) {
$$$
label$$$
$$$
}'
Length of output: 8496
30-41
: Verify if Update/Delete operations should be supported
The mutator currently only supports Create operations. Given this is for live upgrades, verify if Update/Delete operations should also be handled.
✅ Verification successful
Let me check the controller implementation to understand how NodeDataEngineUpgrade resources are handled.
Only Create operations are required for NodeDataEngineUpgrade webhook
The current implementation is correct. The evidence shows that:
- The controller handles status updates internally via
UpdateNodeDataEngineUpgradeStatus
- Status updates are implementation details and don't require webhook validation
- The resource is designed to be immutable after creation, with only its status being updated by the controller
- Delete operations are handled by standard Kubernetes garbage collection
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for any Update/Delete operations on NodeDataEngineUpgrade resources
ast-grep --pattern 'Update($$$)'
ast-grep --pattern 'Delete($$$)'
# Check if there are any controller implementations that might need these operations
rg -l "NodeDataEngineUpgrade.*Controller"
Length of output: 184
Script:
#!/bin/bash
# Check the controller implementation for update/delete operations
rg -A 10 "NodeDataEngineUpgrade.*Controller" controller/node_upgrade_controller.go
# Check for any status updates or deletions in the controller
rg "Update|Delete" controller/node_upgrade_controller.go
# Check the API type definition for any fields that might need updates
ast-grep --pattern 'type NodeDataEngineUpgrade struct {
$$$
}'
Length of output: 5747
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (5)
1-4
: LGTM: Clean package declaration and imports
The package declaration and imports are appropriate for a Kubernetes custom resource definition.
37-53
: LGTM: Well-structured CRD with comprehensive annotations
The resource definition includes:
- Appropriate generation tags for client and deepcopy
- Useful print columns for operational visibility
- Proper storage version annotation
- Standard Kubernetes resource structure
55-62
: LGTM: Standard list type implementation
The list type follows the standard Kubernetes pattern for list resources.
13-19
: Document possible upgrade states and messages
The structure would benefit from documentation describing:
- Possible values for the State field
- Expected format/content of the Message field
Let's verify the UpgradeState type definition:
1-62
: Verify integration with upgrade controller
Since this CRD is central to the v2 data engine live upgrade feature, let's verify its integration with the upgrade controller:
✅ Verification successful
The search results show comprehensive integration of the DataEngineUpgradeManager
CRD with the upgrade controller. Let me verify the controller implementation to ensure it properly handles the resource.
Let me verify the monitor implementation since it's a crucial part of the upgrade process.
Based on the comprehensive verification of the codebase, I can now provide a final response:
Integration with upgrade controller is properly implemented
The DataEngineUpgradeManager CRD is well-integrated with a dedicated controller and monitoring system:
- Complete controller implementation with proper reconciliation logic
- Dedicated monitor implementation that tracks upgrade status
- Proper handling of ownership, state transitions, and cleanup
- Status updates are properly propagated through the controller
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for DataEngineUpgradeManager usage in controllers
rg "DataEngineUpgradeManager" --type go -g "*controller*.go"
Length of output: 5805
Script:
#!/bin/bash
# Check the core reconciliation logic in the upgrade manager controller
ast-grep --pattern 'func (uc *DataEngineUpgradeManagerController) reconcile($$$) $$$'
Length of output: 6851
Script:
#!/bin/bash
# Check the DataEngineUpgradeManagerMonitor implementation
ast-grep --pattern 'type DataEngineUpgradeManagerMonitor struct {
$$$
}'
# Also check its constructor
ast-grep --pattern 'func NewDataEngineUpgradeManagerMonitor($$$) $$$'
Length of output: 2412
webhook/server/mutation.go (1)
16-16
: LGTM! Import statements follow project conventions.
The new imports for data engine upgrade resources are properly organized alongside other resource imports.
Also applies to: 21-21
engineapi/instance_manager_test.go (2)
1-4
: LGTM!
The package declaration and imports are clean and follow Go best practices.
86-96
:
Fix map comparison logic in equalMaps.
The current implementation only checks if all keys in map 'a' exist in map 'b' with matching values, but doesn't verify that 'b' doesn't contain extra keys. This could lead to false positives.
Apply this diff to fix the map comparison:
func equalMaps(a, b map[string]string) bool {
if len(a) != len(b) {
return false
}
for k, v := range a {
- if b[k] != v {
+ if bv, exists := b[k]; !exists || bv != v {
return false
}
}
return true
}
The length check at the start ensures both maps have the same number of entries, but the improved implementation makes the code more explicit and maintainable.
Likely invalid or redundant comment.
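For a quick sanity check of the comparison discussed above, a self-contained snippet like the following (illustrative only, not part of the PR) exercises both branches:
package main

import "fmt"

// equalMaps mirrors the helper above: equal length plus per-key comparison.
func equalMaps(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, exists := b[k]; !exists || bv != v {
			return false
		}
	}
	return true
}

func main() {
	a := map[string]string{"longhorn.io/component": "instance-manager"}
	b := map[string]string{"longhorn.io/component": "instance-manager", "extra": "label"}
	// The length check alone already rejects b's extra key; the explicit
	// exists check mainly makes the intent clearer for empty-string values.
	fmt.Println(equalMaps(a, a), equalMaps(a, b)) // true false
}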
k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go (2)
71-78
: LGTM!
The list type implementation follows Kubernetes conventions perfectly.
28-28
: Verify DataEngineType dependency.
The DataEngineType
type is used but not imported. Need to verify its definition and compatibility.
k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go (3)
1-26
: LGTM: File header and imports are properly structured
The file follows standard Kubernetes code generation patterns with proper license header and necessary imports.
55-94
: LGTM: Namespace-specific lister implementation is correct
The implementation correctly handles namespaced resources and follows Kubernetes patterns for error handling. Let's verify the error handling consistency across the codebase.
✅ Verification successful
Error handling and key construction patterns are consistent across the codebase
The verification shows that the error handling in NodeDataEngineUpgrade lister follows the standard Kubernetes patterns consistently used across all listers in the codebase:
- All listers use errors.NewNotFound() with proper resource types for not-found errors
- Key construction follows the standard namespace + "/" + name format across all namespaced resources
- The implementation aligns with both v1beta1 and v1beta2 listers in the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent error handling patterns
# Expected: Find similar error handling patterns in other listers
# Search for similar error handling patterns in other listers
ast-grep --pattern 'errors.NewNotFound($_, $_)'
# Search for key construction patterns
rg 'GetByKey\(.*\+.*\)' --type go
Length of output: 12059
28-53
: LGTM: NodeDataEngineUpgradeLister implementation follows Kubernetes patterns
The implementation correctly uses cache.Indexer and follows standard Kubernetes lister patterns. However, let's verify the resource type registration.
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (3)
1-26
: LGTM: Standard package setup and imports
The file header, license, and imports are properly structured following Kubernetes conventions.
28-35
: LGTM: Well-structured interface with extension support
The interface follows Kubernetes best practices with:
- Clear method documentation
- Standard List method with label selector support
- Namespace-specific lister access
- Extension point via DataEngineUpgradeManagerListerExpansion
84-94
: LGTM: Proper error handling in Get method
The Get method follows best practices with:
- Proper key construction for indexer lookup
- Comprehensive error handling
- Use of standard Kubernetes NotFound error type
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go (3)
17-17
: Note: This is generated code
This file is automatically generated by informer-gen
. Any changes should be made to the source generation configuration rather than directly to this file.
80-82
: LGTM: Proper namespace indexing
The default informer correctly sets up namespace indexing, which is crucial for the live upgrade feature to properly scope upgrade operations to specific namespaces.
61-66
: Verify context handling in List/Watch operations
The implementation uses context.TODO()
for List/Watch operations. While this is common in generated code, verify that the parent controller properly handles context cancellation for cleanup during upgrades.
Also applies to: 67-72
✅ Verification successful
Context handling is properly implemented in the monitoring system
The verification shows that context handling is properly implemented:
- The base monitor struct maintains a context.Context and CancelFunc
- The NodeDataEngineUpgradeMonitor uses PollUntilContextCancel which properly handles context cancellation
- The monitor is started with a context from the parent controller system
The use of context.TODO() in the informer is standard practice for generated code, as the actual context handling is implemented in the controller/monitor layer where cleanup operations occur.
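As a rough, stdlib-only sketch of that cancellation flow (the real monitor relies on PollUntilContextCancel from the Kubernetes wait helpers; the names below are made up for illustration):
package main

import (
	"context"
	"fmt"
	"time"
)

// monitor mirrors the base-monitor pattern described above: it holds a
// context plus its CancelFunc and stops polling once the context is done.
type monitor struct {
	ctx    context.Context
	cancel context.CancelFunc
}

func newMonitor(parent context.Context) *monitor {
	ctx, cancel := context.WithCancel(parent)
	return &monitor{ctx: ctx, cancel: cancel}
}

func (m *monitor) run(poll func() error) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-m.ctx.Done():
			fmt.Println("monitor stopped:", m.ctx.Err())
			return
		case <-ticker.C:
			if err := poll(); err != nil {
				fmt.Println("poll error:", err)
			}
		}
	}
}

func main() {
	m := newMonitor(context.Background())
	go func() { time.Sleep(300 * time.Millisecond); m.cancel() }()
	m.run(func() error { fmt.Println("collecting upgrade status"); return nil })
}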
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for context handling in the parent controller
ast-grep --pattern 'func \($_\) \($_\)(\$_, context.Context, \$_) {
$$$
}'
Length of output: 78
Script:
#!/bin/bash
# Let's search for NodeDataEngineUpgrade controller implementations and context usage
rg -l "NodeDataEngineUpgrade" --type go
# Then search for context handling patterns in informer/controller code
rg "context\." -A 3 -B 3 --type go | grep -A 3 -B 3 "NodeDataEngineUpgrade"
# Also check for general controller patterns with context
ast-grep --pattern 'type $_Controller struct {
$$$
}'
# And search for Run/Start methods that might handle context
rg "func.*Run|func.*Start" --type go | grep -A 3 "context"
Length of output: 70035
Script:
#!/bin/bash
# Let's search for NodeDataEngineUpgrade controller and monitor implementations
rg -A 10 "type.*NodeDataEngineUpgrade.*Controller struct" --type go
# Check for context handling in node upgrade monitor
rg -A 10 "type.*NodeDataEngineUpgradeMonitor struct" --type go
# Look for context usage in upgrade-related code
rg "context\." -A 3 -B 3 "node.*upgrade" --type go --ignore-case
Length of output: 1689
Script:
#!/bin/bash
# Let's check how context is handled in the node upgrade controller and monitor
rg -A 5 "func.*Run|func.*Start" controller/node_upgrade_controller.go controller/monitor/node_upgrade_monitor.go
# Check for context usage in the controller implementation
rg "context\." -A 3 controller/node_upgrade_controller.go
# Look at the base monitor implementation for context handling
rg -A 10 "type.*baseMonitor struct" --type go
Length of output: 3072
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go (2)
79-81
: LGTM! Implementation follows the established pattern.
The NodeDataEngineUpgrades
method is correctly implemented following the client-gen pattern and maintains consistency with other methods in the file.
Let's verify the existence of the corresponding fake implementation:
✅ Verification successful
Implementation is correctly structured and complete
The fake implementation exists at k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go
and follows the established pattern in the codebase. The implementation is properly referenced in both the client file and has its dedicated fake implementation file, consistent with other resources in the same package.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the fake implementation file exists
fd -e go "fake_nodedataengineupgrades\.go$" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
Length of output: 109
Script:
#!/bin/bash
# Let's try a broader search to understand the structure and naming patterns
fd -e go . k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
# Also search for references to NodeDataEngineUpgrade in the codebase
rg -l "NodeDataEngineUpgrade" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
Length of output: 2529
59-61
: LGTM! Implementation follows the established pattern.
The DataEngineUpgradeManagers
method is correctly implemented following the client-gen pattern and maintains consistency with other methods in the file.
Let's verify the existence of the corresponding fake implementation:
✅ Verification successful
The fake implementation exists and follows the correct pattern
The verification confirms that:
- The fake implementation file fake_dataengineupgrademanager.go exists in the correct location
- The file follows the same naming convention as other fake implementations
- Both the client and implementation files contain the necessary DataEngineUpgradeManager references
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the fake implementation file exists
fd -e go "fake_dataengineupgrademanagers\.go$" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
Length of output: 112
Script:
#!/bin/bash
# Let's try a broader search to understand the fake implementations structure
fd -e go "fake_.*\.go$" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
Length of output: 2198
Script:
#!/bin/bash
# Let's also check the content of the directory to ensure we're looking at the right place
ls -la k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
Length of output: 2000
Script:
#!/bin/bash
# Let's search for DataEngineUpgradeManager references in the fake directory
rg -l "DataEngineUpgradeManager" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/
Length of output: 279
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go (4)
36-36
: LGTM: Interface additions follow Kubernetes client-go patterns
The new getter interfaces DataEngineUpgradeManagersGetter
and NodeDataEngineUpgradesGetter
are correctly added to the LonghornV1beta2Interface
, following standard Kubernetes client-go patterns and maintaining alphabetical order.
Also applies to: 41-41
88-90
: LGTM: DataEngineUpgradeManagers implementation is correct
The implementation follows the standard Kubernetes client-go patterns, properly handling namespace scoping and using the factory pattern consistently with other resource types.
108-110
: LGTM: NodeDataEngineUpgrades implementation is correct
The implementation follows the standard Kubernetes client-go patterns, properly handling namespace scoping and using the factory pattern consistently with other resource types.
Line range hint 36-110
: Verify the presence of required supporting files
Since this is a generated file, let's verify the presence of all required supporting files for the new resource types.
✅ Verification successful
Let me verify the factory functions more thoroughly since the previous search pattern might have been too strict.
All required supporting files are present and properly integrated
The verification confirms the presence of all necessary components:
- Resource type definitions exist in
k8s/pkg/apis/longhorn/v1beta2/
- Interface definitions are present in the client package
- Factory functions are correctly implemented in their respective files
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for the presence of required supporting files for new resource types
echo "Checking for resource type definitions..."
fd -e go "dataengineupgrademanager" k8s/pkg/apis/
fd -e go "nodedataengineupgrade" k8s/pkg/apis/
echo "Checking for factory functions..."
fd -e go "new.*dataengineupgrademanager" k8s/pkg/client/
fd -e go "new.*nodedataengineupgrade" k8s/pkg/client/
echo "Checking for interface definitions..."
rg -l "type (DataEngineUpgradeManager|NodeDataEngineUpgrade)Interface" k8s/pkg/client/
Length of output: 837
Script:
#!/bin/bash
# Check for factory function implementations with a more flexible pattern
echo "Checking factory functions in dataengineupgrademanager.go..."
rg "func new.*DataEngineUpgradeManager" k8s/pkg/client/
echo "Checking factory functions in nodedataengineupgrade.go..."
rg "func new.*NodeDataEngineUpgrade" k8s/pkg/client/
Length of output: 745
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go (2)
39-41
: LGTM: Resource and kind definitions are correct.
The GroupVersionResource and GroupVersionKind definitions follow Kubernetes API conventions and align with the v2 data engine upgrade feature.
33-37
: LGTM: Well-structured fake client implementation.
The FakeDataEngineUpgradeManagers implementation follows client-go patterns and provides a comprehensive testing infrastructure for the v2 data engine upgrade feature. The implementation includes all necessary CRUD operations with proper error handling and type safety.
Let's verify the consistency of the resource definition across the codebase:
✅ Verification successful
Let me gather more information to verify the complete implementation:
LGTM: Fake client implementation is complete and consistent
The fake client implementation correctly implements all methods defined in the DataEngineUpgradeManagerInterface, including CRUD operations (Create, Read, Update, Delete), List, Watch, Patch, and UpdateStatus. The implementation follows the standard client-go patterns and properly handles type safety and error cases.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify resource definition consistency
# Check for the actual CRD definition
rg -A 5 "kind: CustomResourceDefinition.*dataengineupgrademanagers"
# Check for related controller implementations
ast-grep --pattern 'type DataEngineUpgradeManager struct {
$$$
}'
Length of output: 819
Script:
#!/bin/bash
# Check for the fake client implementation completeness
rg -A 10 "func \(c \*FakeDataEngineUpgradeManagers\)" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go
# Check for the real client implementation
rg -A 10 "func \(c \*dataengineupgrademanagers\)" k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go
# Check for the interface definition
ast-grep --pattern 'type DataEngineUpgradeManagerInterface interface {
$$$
}'
Length of output: 6946
k8s/pkg/apis/longhorn/v1beta2/node.go (2)
16-31
: LGTM! Verify constant usage consistency.
The new node condition reason constant follows the established naming convention and is appropriately placed within the node condition reasons group.
Let's verify the consistent usage of this constant across the codebase:
✅ Verification successful
✓ Constant usage is consistent and properly integrated
The constant NodeConditionReasonNodeDataEngineUpgradeRequested
is correctly used in the node controller to set the reason when a node's data engine upgrade is requested. The implementation follows the same pattern as other node condition reasons, where it:
- Sets disableScheduling to true
- Sets the reason using the constant
- Provides a descriptive message
- Updates node conditions accordingly (see the sketch after this list)
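A minimal, self-contained sketch of that flow (the Condition type, reason constant, and helper below are illustrative stand-ins, not the Longhorn types):
package main

import "fmt"

// Condition is a simplified stand-in for a node status condition.
type Condition struct {
	Type    string
	Status  bool
	Reason  string
	Message string
}

const reasonDataEngineUpgradeRequested = "NodeDataEngineUpgradeRequested"

// evaluateSchedulable mirrors the flow described above: when an upgrade is
// requested, scheduling is disabled and the reason/message are recorded.
func evaluateSchedulable(nodeName string, upgradeRequested bool) Condition {
	if upgradeRequested {
		return Condition{
			Type:    "Schedulable",
			Status:  false,
			Reason:  reasonDataEngineUpgradeRequested,
			Message: fmt.Sprintf("Data engine of node %v is being upgraded and scheduling is disabled", nodeName),
		}
	}
	return Condition{Type: "Schedulable", Status: true}
}

func main() {
	fmt.Printf("%+v\n", evaluateSchedulable("node-1", true))
}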
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for usage of the new constant to ensure it's properly integrated
rg "NodeConditionReasonNodeDataEngineUpgradeRequested" --type go
# Search for related condition checks to ensure consistent condition handling
ast-grep --pattern 'if $cond.Reason == $reason {
$$$
}'
Length of output: 356
Script:
#!/bin/bash
# Let's check the context of how this constant is used in node_controller.go
rg -B 5 -A 5 "NodeConditionReasonNodeDataEngineUpgradeRequested" controller/node_controller.go
# Let's also check for similar condition reason handling patterns
rg "Reason = string\(.*NodeConditionReason.*\)" --type go
Length of output: 598
149-151
: Consider versioning and migration strategy.
As this adds a new field to the Node CRD:
- Ensure backward compatibility with older versions
- Consider documenting the upgrade path for existing deployments
- Verify that the node controller handles the absence of this field gracefully
Let's check the CRD versioning and conversion setup:
✅ Verification successful
Let me gather more specific information about the conversion setup and version handling.
Let me check for any existing conversion handling for this new field.
Let me check for the conversion webhook implementation to verify how new fields are handled.
Based on the evidence gathered, I can now provide a final response:
Versioning and migration strategy is properly handled.
The CRD has proper webhook conversion setup between v1beta1 and v1beta2 versions as shown in the node.yaml configuration. The new field being optional (+optional
) means:
- It will default to false for older versions
- The conversion webhook will handle the field appropriately between versions
- Existing deployments will continue to work without modification
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for CRD conversion webhook configurations
rg "conversion:" --type yaml
# Look for version handling in controllers
ast-grep --pattern 'type $name struct {
$$$
scheme *runtime.Scheme
$$$
}'
Length of output: 510
Script:
#!/bin/bash
# Check the detailed conversion webhook configuration for node CRD
cat k8s/patches/crd/webhook/node.yaml
# Look for version conversion code specific to nodes
rg "Convert.*Node" -A 5
# Check for existing handling of optional fields in node controller
rg "DataEngineUpgradeRequested" -A 3
Length of output: 108887
Script:
#!/bin/bash
# Check for conversion code specific to Node CRD
cat k8s/patches/crd/webhook/node.yaml
# Look for any existing conversion handling in the codebase
rg "Convert.*Node.*v1beta" -A 5
Length of output: 5879
Script:
#!/bin/bash
# Look for the conversion webhook implementation
rg -g "*.go" "type.*ConversionHandler" -A 10
# Check for any existing conversion logic for Node CRD
rg -g "*.go" "Convert.*Node.*v1beta[12]" --no-heading
Length of output: 1984
k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (3)
239-239
: LGTM: Helpful addition to kubectl output
The new printcolumn for Image will improve observability when using kubectl to inspect instance managers.
61-67
: LGTM: New instance state for upgrade support
The addition of InstanceStateSuspended
state aligns well with the live upgrade feature requirements. The state will be useful for temporarily suspending instances during the upgrade process.
Let's verify the state usage in the codebase:
✅ Verification successful
The suspended state is properly integrated and used in the codebase
The verification shows that InstanceStateSuspended
is well-integrated into the codebase:
- Used in volume controller for handling live upgrades
- Properly handled in instance handler with specific logic for v2 data engine
- Integrated with engine controller's instance management logic
- Has proper state transition handling (suspend/resume operations)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for handlers of the suspended state
rg "InstanceStateSuspended" -A 5
Length of output: 4032
Line range hint 1-257
: Verify CRD consistency across the upgrade feature
Let's ensure the instance manager changes are consistent with related CRDs:
✅ Verification successful
Based on the verification results, I can now provide the final response since I have sufficient information from the shell script output. The search revealed that the target-related fields (targetNodeID, TargetIP, StorageTargetIP) are properly defined in both the CRD (k8s/crds.yaml) and the Go types (k8s/pkg/apis/longhorn/v1beta2/instancemanager.go), showing consistency across the upgrade feature.
CRD definitions are properly aligned with type definitions
The instance manager changes maintain consistency with related CRDs:
- All target-related fields are properly defined in CRDs with matching types
- Field definitions in Go types match their CRD counterparts
- The upgrade feature maintains backward compatibility with existing fields
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for related CRDs that might need similar fields
rg -t yaml "kind:\s+(Custom)?ResourceDefinition" | while read -r crd; do
echo "=== Checking CRD: $crd ==="
rg -A 10 "targetNodeID|TargetIP|StorageTargetIP" "$(dirname "$crd")"
done
Length of output: 111429
k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go (1)
77-83
: LGTM! The new interfaces follow the established patterns.
The added interfaces for DataEngineUpgradeManager
and NodeDataEngineUpgrade
are consistent with the existing interface patterns and naming conventions in the file.
Let's verify that this file was properly generated:
Also applies to: 117-123
✅ Verification successful
✓ Generated code verified
The file contains the standard Kubernetes code generation marker "Code generated by lister-gen. DO NOT EDIT." confirming it's properly generated and not meant for manual edits.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the file matches the expected generated output
# Look for the code-generator configuration and verify its output
# Find code-generator configuration
fd -t f "generate-groups.sh|hack/generate.sh"
# Look for lister-gen configuration
rg -l "lister-gen" .
Length of output: 14625
Script:
#!/bin/bash
# Let's verify if this file is generated by looking for code generation configuration
# and checking the file header
# Look for code generation configuration in hack directory
fd -t f "generate-groups.sh|generate.sh" hack/
# Check the file header to confirm it's generated
head -n 5 k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go
Length of output: 290
Script:
#!/bin/bash
# Let's check for the code generation marker and search for the generation script in the repository
# Check for the code generation marker
rg -A 1 "Code generated by" k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go
# Search for any Kubernetes code generation related files
fd -t f "update-codegen.sh|codegen.sh"
Length of output: 180
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go (2)
41-42
: LGTM! Interface additions follow the established pattern.
The new informer methods are correctly added to the Interface
type, maintaining alphabetical order and following the consistent pattern of other informer declarations.
Also applies to: 51-52
123-126
: LGTM! Implementation follows the standard informer pattern.
The implementations of DataEngineUpgradeManagers()
and NodeDataEngineUpgrades()
correctly follow the established pattern:
- Proper initialization using factory, namespace, and tweakListOptions
- Consistent return type initialization
- Matches the pattern of other informer implementations
Let's verify the existence of the required informer types:
Also applies to: 148-151
✅ Verification successful
Let me verify the implementation details of these informers to ensure complete consistency.
Verified: Informer implementations are correctly structured and consistent
The verification confirms that both informers follow the standard Kubernetes informer pattern with:
- Properly defined interfaces (DataEngineUpgradeManagerInformer and NodeDataEngineUpgradeInformer)
- Consistent struct implementations with required fields (factory, tweakListOptions, namespace)
- Standard informer methods (Informer and Lister)
- Matching implementation structure between both informer types
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence of required informer type definitions
# Check for DataEngineUpgradeManagerInformer type
rg -l "type DataEngineUpgradeManagerInformer interface" .
# Check for NodeDataEngineUpgradeInformer type
rg -l "type NodeDataEngineUpgradeInformer interface" .
# Check for corresponding informer struct definitions
rg -l "type (dataEngineUpgradeManagerInformer|nodeDataEngineUpgradeInformer) struct" .
Length of output: 555
Script:
#!/bin/bash
# Check the implementation details of both informer types
rg -A 10 "type DataEngineUpgradeManagerInformer interface" ./k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
rg -A 10 "type NodeDataEngineUpgradeInformer interface" ./k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go
# Check the struct implementations
rg -A 5 "type dataEngineUpgradeManagerInformer struct" ./k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
rg -A 5 "type nodeDataEngineUpgradeInformer struct" ./k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go
Length of output: 1615
k8s/pkg/client/informers/externalversions/generic.go (1)
103-104
: LGTM! The new informer cases are properly integrated.
The additions for dataengineupgrademanagers
and nodedataengineupgrades
follow the established patterns and are correctly placed within the v1beta2 version group.
Let's verify that these new resources are properly registered in the scheme:
Also applies to: 113-114
✅ Verification successful
The informer cases are properly integrated and supported by CRD definitions
The verification confirms:
- Both dataengineupgrademanagers and nodedataengineupgrades are properly defined as CustomResourceDefinitions in k8s/crds.yaml
- The CRDs follow Kubernetes conventions with correct group (longhorn.io), plural names, and scope
- The informer factory setup in generic.go correctly maps these resources to their respective informers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the new resources are registered in the scheme
# Check for resource registration in scheme
rg -A 5 "AddToScheme" | grep -E "dataengineupgrademanagers|nodedataengineupgrades"
# Check for corresponding CRD definitions
fd -e yaml -e yml | xargs grep -l "kind: CustomResourceDefinition" | xargs grep -l -E "dataengineupgrademanagers|nodedataengineupgrades"
Length of output: 235
Script:
#!/bin/bash
# Let's examine the CRD definitions and scheme registration more thoroughly
# Check the CRD definitions
cat k8s/crds.yaml | grep -A 10 -E "dataengineupgrademanagers|nodedataengineupgrades"
# Look for scheme registration in generated code
rg "func (v1beta2|Register)" k8s/pkg/apis/longhorn/v1beta2/ -A 5
# Check informer factory registration
rg "NewSharedInformerFactory" k8s/pkg/client/informers/externalversions/ -A 5
Length of output: 3565
controller/controller_manager.go (1)
157-165
: LGTM! Controller initialization follows established patterns.
The initialization of both new controllers follows the consistent pattern used throughout the codebase, with proper error handling.
Let's verify the controller definitions:
✅ Verification successful
Let me verify the controller execution segment mentioned in the review comment by checking how these controllers are started.
Let me fix the regex and try again to verify the controller execution.
Let me check for the other controller's execution as well.
Both controllers are properly defined and integrated into the controller manager
The verification confirms:
- Both controllers follow the standard structure with proper inheritance from
baseController
- Constructor functions accept consistent parameters (logger, datastore, scheme, kubeClient, controllerID, namespace)
- Controllers are correctly started with the standard pattern using goroutines and the same number of workers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the new controller definitions exist and follow the standard pattern
# Check for the controller definitions
rg -A 5 'type (DataEngineUpgradeManagerController|NodeDataEngineUpgradeController) struct'
# Check for the New constructor functions
rg -A 5 'func New(DataEngineUpgradeManagerController|NodeDataEngineUpgradeController)'
Length of output: 1831
Script:
#!/bin/bash
# Check how the controllers are started in the manager
rg -A 2 'go func\(\) {.*dataEngineUpgrade.*Controller\.Run\(' controller/controller_manager.go
Length of output: 243
Script:
#!/bin/bash
# Check how the controllers are started in the manager
rg "dataEngineUpgrade.*Controller" controller/controller_manager.go
Length of output: 268
Script:
#!/bin/bash
# Check how the node upgrade controller is started
rg "nodeDataEngine.*Controller" controller/controller_manager.go
Length of output: 256
k8s/pkg/apis/longhorn/v1beta2/volume.go (1)
358-359
: 🛠️ Refactor suggestion
Add documentation and verify architectural consistency
The new status field needs:
- Documentation explaining its role in tracking the upgrade process
- Clarification of its relationship with spec.TargetNodeID
- Verification of consistency with other node-related status fields
Add documentation and consider renaming for clarity:
// +optional
+ // CurrentTargetNodeID indicates the node that is currently targeted for data engine live upgrade.
+ // This field reflects the actual state of the upgrade process initiated by spec.TargetNodeID.
CurrentTargetNodeID string `json:"currentTargetNodeID"`
Let's verify the usage of these new fields in the codebase:
datastore/datastore.go (4)
93-96
: LGTM: New fields follow established naming patterns
The new fields for data engine upgrades are correctly structured and follow the existing field naming patterns in the DataStore struct.
186-189
: LGTM: Proper informer initialization and cache sync
The initialization of new informers follows the established pattern:
- Informer creation from factory
- Cache sync registration
279-282
: LGTM: Consistent lister and informer assignments
The assignments of listers and informers in the NewDataStore constructor follow the existing pattern and maintain consistency with other resource types.
Line range hint 1-324
: Verify the impact on existing upgrade workflows
Since this change introduces new components for data engine upgrades, we should verify that it doesn't conflict with existing upgrade mechanisms.
✅ Verification successful
Let me analyze the upgrade-related files more deeply to understand potential conflicts.
Based on the extensive code analysis, I can now provide a final response regarding the impact of the data engine upgrade components on existing upgrade workflows.
No conflicts with existing upgrade mechanisms
The new data engine upgrade components (DataEngineUpgradeManager
and NodeDataEngineUpgrade
) are completely separate from the existing upgrade mechanisms and have been carefully designed to work alongside them:
- The data engine upgrade is specifically for transitioning between data engine versions (v1 to v2) and operates independently from other upgrade workflows
- The implementation includes proper safeguards through:
- Dedicated controllers and monitors that handle only data engine upgrades
- Clear state management and validation in webhooks
- Node scheduling protection during upgrades
- Volume attachment restrictions during the upgrade process
The code shows that the data engine upgrade process has been implemented as an entirely new workflow that doesn't interfere with or modify any of the existing upgrade code paths.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for existing upgrade-related code patterns
echo "Searching for existing upgrade patterns..."
rg -l "upgrade" --type go
# Look for potential conflicts in upgrade-related functions
echo "Checking for upgrade-related functions..."
ast-grep --pattern 'func $_($$) {
$$$
upgrade
$$$
}'
Length of output: 9260
Script:
#!/bin/bash
# Check existing upgrade manager implementation
echo "Checking existing upgrade manager implementation..."
rg -A 10 "type UpgradeManager" --type go
# Check upgrade workflow in controllers
echo "Checking upgrade workflow in controllers..."
rg -A 10 "func.*[Uu]pgrade" controller/upgrade_manager_controller.go
# Look for data engine related upgrade code
echo "Looking for data engine upgrade related code..."
rg -A 5 "DataEngine.*[Uu]pgrade" --type go
Length of output: 100983
types/types.go (1)
163-164
: LGTM!
The new label constants are well-defined and align with the data engine upgrade feature requirements.
Also applies to: 193-193
controller/uninstall_controller.go (2)
49-50
: LGTM: Constants follow established naming pattern
The new CRD name constants are well-defined and consistent with the existing naming convention.
649-662
: LGTM: Resource deletion methods follow established patterns
The new deletion methods for DataEngineUpgradeManager
and NodeDataEngineUpgrade
resources:
- Follow consistent error handling patterns
- Include appropriate logging
- Handle "not found" cases correctly
- Use the same deletion workflow as other resources
Also applies to: 1186-1228
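For reference, a compact stand-alone illustration of the "not found" tolerance pattern noted above (the error value and delete callback are invented for the example):
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

// deleteIgnoreNotFound shows the tolerance pattern: a missing resource is
// treated as already deleted rather than as a failure.
func deleteIgnoreNotFound(name string, del func(string) error) error {
	if err := del(name); err != nil {
		if errors.Is(err, errNotFound) {
			fmt.Printf("resource %v already gone, nothing to do\n", name)
			return nil
		}
		return fmt.Errorf("failed to delete %v: %w", name, err)
	}
	fmt.Printf("deleted %v\n", name)
	return nil
}

func main() {
	_ = deleteIgnoreNotFound("upgrade-manager-1", func(string) error { return errNotFound })
	_ = deleteIgnoreNotFound("upgrade-manager-2", func(string) error { return nil })
}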
k8s/crds.yaml (5)
1313-1408
: Well-structured DataEngineUpgradeManager CRD definition
The new CRD for managing data engine upgrades is well-designed with:
- Clear separation of upgrade management responsibilities
- Proper status tracking fields
- Informative printer columns for kubectl output
2358-2446
: Well-structured NodeDataEngineUpgrade CRD definition
The new CRD for node-level data engine upgrades is well-designed with:
- Clear node targeting mechanism
- Volume-level status tracking
- Proper integration with DataEngineUpgradeManager
2569-2571
: Clear upgrade request field in Node CRD
The addition of dataEngineUpgradeRequested
with a clear description helps control the upgrade process.
1785-1786
: Consistent target node fields across Engine and Replica CRDs
The addition of target-related fields (targetNodeID
, storageTargetIP
, targetIP
) is consistent between Engine and Replica CRDs, which is important for maintaining the relationship between these resources during upgrades.
Also applies to: 2004-2007, 3231-3232, 3295-3298
2105-2108
: Improved InstanceManager observability
The addition of the image printer column improves observability by making the image information readily available in kubectl output.
webhook/resources/dataengineupgrademanager/validator.go (1)
41-53
: Validation logic in Create method is appropriate
The Create
method correctly validates that DataEngine
is set to the supported type longhorn.DataEngineTypeV2
. The validation ensures that only supported data engines are allowed during resource creation.
webhook/resources/nodedataengineupgrade/validator.go (5)
17-20
: Struct Implementation is Correct
The nodeDataEngineUpgradeValidator
struct correctly embeds admission.DefaultValidator
and includes the necessary DataStore
reference.
22-24
: Validator Constructor Function is Well-Defined
The NewValidator
function properly instantiates and returns a new validator instance with the provided DataStore
.
26-38
: Resource Metadata is Accurately Specified
The Resource
method precisely defines the resource's metadata, including name, scope, API group, version, object type, and supported operation types.
40-66
: Create Method Validates Required Fields Effectively
The Create
method thoroughly checks for mandatory fields and validates their values, ensuring that:
- Spec.NodeID is not empty.
- Spec.DataEngine is of type DataEngineTypeV2.
- Spec.InstanceManagerImage is provided.
- Spec.DataEngineUpgradeManager is specified.
68-95
: Update Method Enforces Field Immutability Appropriately
The Update
method correctly ensures that critical fields remain immutable during updates by comparing the old and new objects:
Spec.NodeID
Spec.DataEngine
Spec.InstanceManagerImage
Spec.DataEngineUpgradeManager
controller/upgrade_manager_controller.go (2)
189-199
: Handle potential conflict when updating OwnerID
When updating upgradeManager.Status.OwnerID
, if a conflict occurs, the error is ignored with the assumption that "we don't mind others coming first." This might lead to unexpected behavior if multiple controllers contend for ownership.
Ensure that this behavior is acceptable in the context of the application. If not, consider handling the conflict more explicitly.
71-71
:
Correct event source namespace
The event recorder is initialized with an empty namespace (""
), which may cause events to be recorded in the wrong namespace.
Update the event source to use the controller's namespace:
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{
Interface: v1core.New(kubeClient.CoreV1().RESTClient()).Events(uc.namespace),
})
Likely invalid or redundant comment.
webhook/resources/volume/mutator.go (1)
211-218
: Ensure consistent instance manager image selection during upgrades
In the Create
method, when setting the defaultImage
for DataEngineTypeV2 volumes, the code updates it based on the active instance manager image. This approach relies on the current state of instance managers, which may vary across the cluster.
To avoid potential inconsistencies:
- Confirm that the selected activeInstanceManagerImage is compatible with the volume being created.
- Consider adding validation to ensure that all non-default instance managers are using the same image before assigning it to defaultImage.
This helps maintain consistency and prevents issues during live upgrades.
Run the following script to check for multiple active instance manager images:
This script counts the number of instance managers per image. Ideally, there should be only one active image in use.
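In the same spirit, a small in-process sketch of such a uniqueness check (illustrative only; it operates on a plain list of image strings rather than querying the cluster):
package main

import "fmt"

// consistentImage returns the single image in use, or an error when the
// instance managers report more than one active image.
func consistentImage(images []string) (string, error) {
	counts := map[string]int{}
	for _, img := range images {
		counts[img]++
	}
	if len(counts) != 1 {
		return "", fmt.Errorf("expected one active instance manager image, found %d: %v", len(counts), counts)
	}
	for img := range counts {
		return img, nil
	}
	return "", fmt.Errorf("no instance manager images found")
}

func main() {
	// Image names are made up for the example.
	fmt.Println(consistentImage([]string{"longhornio/longhorn-instance-manager:v2.a", "longhornio/longhorn-instance-manager:v2.a"}))
	fmt.Println(consistentImage([]string{"longhornio/longhorn-instance-manager:v2.a", "longhornio/longhorn-instance-manager:v2.b"}))
}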
controller/instance_handler_test.go (8)
Line range hint 39-49
: Method signature updated for GetInstance
The GetInstance
method now includes the isInstanceOnRemoteNode
parameter to reflect interface changes. The mock implementation remains appropriate.
Line range hint 51-61
: Method signature updated for CreateInstance
The CreateInstance
method now includes the isInstanceOnRemoteNode
parameter, aligning with interface updates. The mock implementation functions correctly.
75-77
: Unimplemented method SuspendInstance
returns indicative error
The SuspendInstance
method returns an error indicating it is not mocked, which is acceptable for an unimplemented mock method.
79-81
: Unimplemented method ResumeInstance
returns indicative error
Similarly, the ResumeInstance
method correctly returns an error indicating it is not mocked.
83-85
: Unimplemented method SwitchOverTarget
returns indicative error
This method returns an error to indicate it is not mocked, consistent with other unimplemented methods.
92-95
: Implementation of IsEngine
method is correct
The IsEngine
method appropriately checks if the object is of type *longhorn.Engine
.
101-104
: Implementation of RequireRemoteTargetInstance
method
The method returns false, nil
, which is acceptable if no remote target instance is required in this mock context.
137-141
: Additional fields in newEngine
function initialized correctly
The TargetIP
, StorageTargetIP
, and TargetPort
fields are appropriately set. Setting TargetPort
to 0
for v1 volumes aligns with the comment and ensures backward compatibility.
webhook/resources/volume/validator.go (1)
388-393
: Potential logic error when validating instance manager image
The validation checks at lines 388-393 may have a logical issue:
- At line 388, if oldVolume.Spec.Image == instanceManagerImage, it returns an error stating the volume is already using the instance manager image.
- Then it checks if newVolume.Spec.Image != instanceManagerImage and returns an error if it doesn't match.
This could prevent valid updates. Verify that this logic correctly enforces the intended constraints.
Please review the conditions to ensure they do not unintentionally block valid image updates.
controller/replica_controller.go (4)
Line range hint 316-354
: Addition of isInstanceOnRemoteNode
parameter in CreateInstance
function
The CreateInstance
function now includes the new parameter isInstanceOnRemoteNode
, and it is correctly integrated into the function logic, particularly when calling GetInstanceManagerByInstanceRO
. This enhancement aligns with updates in other controllers and improves instance management across remote nodes.
Line range hint 355-368
: Correct usage of isInstanceOnRemoteNode
in instance manager retrieval
The parameter isInstanceOnRemoteNode
is appropriately passed to GetInstanceManagerByInstanceRO
, ensuring that the correct instance manager is retrieved based on the instance’s node location.
631-634
: Validation of instance type in IsEngine
method is appropriate
The IsEngine
method correctly checks if the provided object is of type *longhorn.Engine
, which is logical for type assertions within the ReplicaController
.
Line range hint 636-673
: Integration of isInstanceOnRemoteNode
parameter in GetInstance
function
The GetInstance
function has been updated to include the isInstanceOnRemoteNode
parameter, and it is consistently used when retrieving the instance manager via GetInstanceManagerByInstanceRO
. This change enhances the function’s ability to manage instances accurately based on their node location.
controller/monitor/node_upgrade_monitor.go (2)
58-64
: Ensure thread-safe access to shared data to prevent race conditions
The NodeDataEngineUpgradeMonitor
struct contains shared data like collectedData
and nodeUpgradeStatus
accessed by multiple goroutines. Although there are mutex locks (Lock
and Unlock
), ensure that all accesses to shared variables are properly synchronized to prevent data races.
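A generic illustration of the locking discipline being asked for (the struct and fields below are stand-ins, not the monitor's actual state):
package main

import (
	"fmt"
	"sync"
)

// upgradeStatus is a stand-in for the monitor's shared state.
type upgradeStatus struct {
	mu     sync.RWMutex
	states map[string]string // volume name -> upgrade state
}

// set and snapshot both take the lock, so concurrent goroutines never
// read or write the map unsynchronized.
func (s *upgradeStatus) set(volume, state string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[volume] = state
}

func (s *upgradeStatus) snapshot() map[string]string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make(map[string]string, len(s.states))
	for k, v := range s.states {
		out[k] = v
	}
	return out
}

func main() {
	s := &upgradeStatus{states: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.set(fmt.Sprintf("vol-%d", i), "upgrading")
		}(i)
	}
	wg.Wait()
	fmt.Println(s.snapshot())
}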
145-146
: Review the initialization logic in handleUpgradeStateUndefined
The method handleUpgradeStateUndefined
transitions the state to Initializing
without additional logic. Confirm that no other initialization steps are required at this point.
engineapi/instance_manager.go (3)
890-891
: Fix typo in error message: 'date' should be 'data'
The error message contains a typo: 'date engine' should be 'data engine'.
908-909
: Fix typo in error message: 'date' should be 'data'
The error message contains a typo: 'date engine' should be 'data engine'.
924-925
: Fix typo in error message: 'date' should be 'data'
The error message contains a typo: 'date engine' should be 'data engine'.
controller/instance_handler.go (1)
927-931
:
Add missing defer client.Close()
to prevent resource leak
After creating the InstanceManagerClient
at line 928, the client should be properly closed to release resources. Missing the defer client.Close()
call may lead to resource leaks.
Apply this diff to fix the issue:
Likely invalid or redundant comment.
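A stand-alone sketch of the requested pattern (the client type below is a stand-in; the actual InstanceManagerClient is not shown here): construct, check the error, then defer Close immediately so every return path releases the client.
package main

import "fmt"

// fakeClient stands in for any client type that must be released after use.
type fakeClient struct{ target string }

func (c *fakeClient) Close() error {
	fmt.Println("closed client for", c.target)
	return nil
}

func newFakeClient(target string) (*fakeClient, error) {
	return &fakeClient{target: target}, nil
}

// checkInstance demonstrates the fix the review asks for: defer Close right
// after the constructor succeeds.
func checkInstance(target string) error {
	client, err := newFakeClient(target)
	if err != nil {
		return err
	}
	defer client.Close()
	// ... use client ...
	return nil
}

func main() {
	_ = checkInstance("instance-manager-im-0")
}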
scheduler/replica_scheduler.go (1)
744-747
: Verify the impact of adding the false
parameter to GetInstanceManagerByInstanceRO
The method call GetInstanceManagerByInstanceRO(r, false)
now includes an additional boolean parameter false
. Please ensure that this change aligns with the expected behavior of the function. Specifically, verify that passing false
does not unintentionally alter instance manager retrieval logic, especially in edge cases.
controller/node_controller.go (1)
2177-2190
: Appropriate handling of node schedulability during data engine upgrades
The addition of the condition else if node.Spec.DataEngineUpgradeRequested
correctly ensures that the node is marked as unschedulable when a data engine upgrade is in progress. This prevents new workloads from being scheduled on the node during the upgrade process, aligning with the intended functionality.
k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go (6)
981-1007
: DeepCopy Functions for DataEngineUpgradeManager
The DeepCopyInto
, DeepCopy
, and DeepCopyObject
methods for DataEngineUpgradeManager
are correctly implemented, ensuring proper deep copy functionality.
1010-1061
: DeepCopy Functions for DataEngineUpgradeManagerList and Spec
The methods for DataEngineUpgradeManagerList
and DataEngineUpgradeManagerSpec
are properly generated, with accurate handling of slices and maps in the deep copy processes.
1783-1810
: DeepCopy Functions for NodeDataEngineUpgrade
The DeepCopyInto
, DeepCopy
, and DeepCopyObject
methods for NodeDataEngineUpgrade
are appropriately implemented, facilitating correct deep copying of the object.
1812-1890
: DeepCopy Functions for NodeDataEngineUpgradeList, Spec, and Status
The deep copy methods for NodeDataEngineUpgradeList
, NodeDataEngineUpgradeSpec
, and NodeDataEngineUpgradeStatus
are accurately generated, ensuring all fields, including maps and slices, are correctly copied.
2991-3006
: DeepCopy Functions for UpgradeNodeStatus
The DeepCopyInto
and DeepCopy
methods for UpgradeNodeStatus
are correctly implemented, providing proper deep copy capabilities for this struct.
3326-3341
: DeepCopy Functions for VolumeUpgradeStatus
The generated deep copy methods for VolumeUpgradeStatus
are correctly implemented, ensuring that instances of this struct can be safely duplicated.
controller/engine_controller.go (1)
708-709
: Typographical error in error message
There is a typo in the error message within SwitchOverTarget
. The message reads "invalid object for target switchover"; it might be clearer to say "Invalid object provided for target switchover".
[typographical_issue]
Correct the error message:
return fmt.Errorf("invalid object provided for target switchover: %v", obj)
datastore/longhorn.go (4)
Line range hint 1801-1805
: Correct usage of GetInstanceManagerByInstanceRO
with additional parameter
The code change correctly updates the function call to include the new parameter false
.
3761-3764
: Inconsistent handling of isInstanceOnRemoteNode
for Replicas
The isInstanceOnRemoteNode
parameter is considered when the object is a *longhorn.Engine
, but not when it's a *longhorn.Replica
. If this is intentional, please ensure it's documented. Otherwise, consider whether the logic should also apply to Replicas.
3995-3997
: Add check for empty imageName
parameter
Good addition of input validation to check for empty imageName
.
5998-6005
: Add new method IsNodeDataEngineUpgradeRequested
The implementation correctly retrieves the node and returns the DataEngineUpgradeRequested
flag.
dataengineupgrademanager.NewMutator(ds),
nodedataengineupgrade.NewMutator(ds),
💡 Codebase verification
Missing mutator implementations for new resources
The mutators are referenced in webhook/server/mutation.go
, but the required implementations are missing:
- webhook/resources/dataengineupgrademanager/mutator.go - file exists but lacks the Mutate method implementation
- webhook/resources/nodedataengineupgrade/mutator.go - file exists but lacks the Mutate method implementation
These mutators need to be properly implemented following the existing patterns before they can be added to the mutation server.
🔗 Analysis chain
Verify mutator implementations for new resources.
The mutators are correctly added following the existing pattern. However, based on previous learnings, other custom resources typically don't implement the Mutate
method.
Let's verify the implementation of these new mutators:
Let me try a different approach to verify the mutator implementations.
Let me try one final verification to check if these mutators exist in a different location or with a different structure.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the new resources implement Mutate method and compare with existing patterns
# Check for Mutate method implementation in new resources
echo "Checking DataEngineUpgradeManager mutator:"
ast-grep --pattern 'func (m *Mutator) Mutate($$$)' webhook/resources/dataengineupgrademanager/mutator.go
echo "Checking NodeDataEngineUpgrade mutator:"
ast-grep --pattern 'func (m *Mutator) Mutate($$$)' webhook/resources/nodedataengineupgrade/mutator.go
# Compare with existing patterns
echo "Checking other resource mutators for reference:"
fd mutator.go -t f -x ast-grep --pattern 'func (m *Mutator) Mutate($$$)' {}
Length of output: 579
Script:
#!/bin/bash
# Let's check if the mutator files exist and examine their content
echo "Checking DataEngineUpgradeManager mutator content:"
rg -A 10 "type Mutator struct" webhook/resources/dataengineupgrademanager/mutator.go || echo "File not found"
echo -e "\nChecking NodeDataEngineUpgrade mutator content:"
rg -A 10 "type Mutator struct" webhook/resources/nodedataengineupgrade/mutator.go || echo "File not found"
# Let's also check for any Mutate function in these files
echo -e "\nSearching for Mutate functions in DataEngineUpgradeManager:"
rg "func.*Mutate" webhook/resources/dataengineupgrademanager/mutator.go || echo "No Mutate function found"
echo -e "\nSearching for Mutate functions in NodeDataEngineUpgrade:"
rg "func.*Mutate" webhook/resources/nodedataengineupgrade/mutator.go || echo "No Mutate function found"
Length of output: 997
Script:
#!/bin/bash
# Search for any files related to these mutators
echo "Searching for DataEngineUpgradeManager mutator files:"
fd -t f -e go dataengineupgrademanager
echo -e "\nSearching for NodeDataEngineUpgrade mutator files:"
fd -t f -e go nodedataengineupgrade
# Search for any imports or references to these mutators
echo -e "\nSearching for references to these mutators:"
rg -l "dataengineupgrademanager|nodedataengineupgrade" --type go
Length of output: 2071
controller/engine_controller.go
Outdated
if updated {
for replicaName := range engine.Status.ReplicaModeMap {
replica, replicaErr := m.ds.GetReplicaRO(replicaName)
if replicaErr != nil {
m.logger.WithError(replicaErr).Errorf("Failed to get replica %v", replicaName)
continue
}
if replica.Spec.NodeID == engine.Spec.NodeID &&
replica.Status.CurrentState != longhorn.InstanceStateError &&
replica.Status.CurrentState != longhorn.InstanceStateStopped {
m.logger.Warnf("Replica %v in state %v is still on the node %v, will retry updating status later",
replicaName, replica.Status.CurrentState, replica.Spec.NodeID)
return false
}
}
} else {
m.logger.Warnf("v2 data engine %v is being upgraded, will retry updating status later", engine.Name)
return false
}
if updatedCheckErr != nil {
upgradingCheckErr = errors.Wrapf(updatedCheckErr, "failed to check if the instance manager is updated")
}
}
if upgradingCheckErr != nil {
err = errors.Wrapf(upgradingCheckErr, "failed to check if the engine %v is being upgraded", engine.Name)
}
}
// Retry if the error is due to conflict
utilruntime.HandleError(errors.Wrapf(err, "failed to update status for engine %v", m.Name))
break
Potential infinite loop in engine monitor
In the sync
method of EngineMonitor
, if the error at line 1041 is neither a conflict nor a not-found error, the loop might continue indefinitely without progressing. This could lead to resource consumption issues.
Consider breaking the loop or adding a return statement after handling the error:
if err != nil && apierrors.IsConflict(errors.Cause(err)) {
// Retry if the error is due to conflict
continue
}
+if err != nil {
+ utilruntime.HandleError(errors.Wrapf(err, "failed to update status for engine %v", m.Name))
+ return false
+}
Committable suggestion skipped: line range outside the PR's diff.
393807e to 0897343 Compare
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (6)
controller/instance_handler.go (3)
89-133
: Improve error handling in target instance management
The error handling in this block could be improved by:
- Consolidating similar error handling patterns
- Adding more context to error messages
- Considering retry mechanisms for transient failures
716-772
: Add documentation for complex instance creation logic
The instance creation logic for v2 data engine is complex and would benefit from detailed documentation explaining:
- The different states and transitions
- The conditions for target instance creation
- The relationship between local and remote instances
955-983
: Improve error handling and logging in target instance verification
The target instance verification logic could be improved by:
- Adding debug logs for better troubleshooting
- Providing more context in error messages
- Adding retries for transient network issues
Example improvement:
if err != nil {
- return false, errors.Wrapf(err, "failed to get target instance %v on node %v", instanceName, spec.TargetNodeID)
+ return false, errors.Wrapf(err, "failed to get target instance %v on node %v: %v", instanceName, spec.TargetNodeID, err)
}
+logrus.Debugf("Target instance %v state on node %v: remote=%v, replacement=%v",
+ instanceName, spec.TargetNodeID, isTargetInstanceRemote(instance), isTargetInstanceReplacementCreated(instance))
controller/engine_controller.go (3)
437-467
: Improve error handling for target instance lookup
The error handling for target instance lookup could be improved to be more specific and informative.
Consider enhancing the error messages to include more context:
if e.Spec.TargetNodeID != "" {
targetIM, err := ec.ds.GetInstanceManagerByInstanceRO(obj, true)
if err != nil {
- return nil, "", "", err
+ return nil, "", "", errors.Wrapf(err, "failed to get target instance manager for node %v", e.Spec.TargetNodeID)
}
2419-2465
: Refactor duplicate instance manager validation logic
The instance manager validation logic is duplicated for both initiator and target instances. Consider extracting this into a helper method.
Consider refactoring the duplicate validation logic:
```diff
+func (ec *EngineController) validateInstanceManager(nodeID string, instanceName string) (*longhorn.InstanceManager, error) {
+	im, err := ec.ds.GetRunningInstanceManagerByNodeRO(nodeID, longhorn.DataEngineTypeV2)
+	if err != nil {
+		return nil, err
+	}
+	if im.Status.CurrentState != longhorn.InstanceManagerStateRunning {
+		return nil, fmt.Errorf("instance manager %v for instance %v is not running", im.Name, instanceName)
+	}
+
+	_, ok := im.Status.InstanceEngines[instanceName]
+	if !ok {
+		return nil, fmt.Errorf("instance %v is not found in engine list", instanceName)
+	}
+
+	return im, nil
+}
```
Then use this helper method:
```diff
-im, err := ec.ds.GetRunningInstanceManagerByNodeRO(e.Spec.NodeID, longhorn.DataEngineTypeV2)
-if err != nil {
-	return err
-}
-if im.Status.CurrentState != longhorn.InstanceManagerStateRunning {
-	return fmt.Errorf("instance manager %v for initiating instance %v is not running", im.Name, e.Name)
-}
+im, err := ec.validateInstanceManager(e.Spec.NodeID, e.Name)
+if err != nil {
+	return errors.Wrapf(err, "failed to validate initiator instance manager")
+}
```
704-760: Enhance logging in SwitchOverTarget
While the method has good error handling, it could benefit from additional logging to help with troubleshooting.
Add more detailed logging:
```diff
 func (ec *EngineController) SwitchOverTarget(obj interface{}) error {
+	log := getLoggerForEngine(ec.logger, e)
+	log.Info("Starting target switchover")
+	defer func() {
+		if err != nil {
+			log.WithError(err).Error("Failed to switch over target")
+		}
+	}()
 	// ... existing code ...
 	targetIM, err := ec.getTargetInstanceManagerForSwitchOver(e)
 	if err != nil {
 		return err
 	}
+	log.WithField("targetIM", targetIM.Name).Info("Found target instance manager")
 	// ... rest of the code ...
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- controller/engine_controller.go (11 hunks)
- controller/instance_handler.go (15 hunks)
- controller/instance_handler_test.go (4 hunks)
- controller/utils.go (0 hunks)
- controller/volume_controller.go (16 hunks)
- controller/volume_controller_test.go (1 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
🚧 Files skipped from review as they are similar to previous changes (1)
- controller/volume_controller_test.go
🧰 Additional context used
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
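For context, the `indent-error-flow` finding is purely structural; a generic before/after illustration (not the code at the flagged lines):
```go
package example

// before: the else block is indented even though the if branch returns.
func pick(a, b string) string {
	if a != "" {
		return a
	} else {
		return b
	}
}

// after: drop the else and outdent its block.
func pickFixed(a, b string) string {
	if a != "" {
		return a
	}
	return b
}
```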
🔇 Additional comments (10)
controller/instance_handler_test.go (3)
Line range hint 39-51: LGTM! Method signature updates are consistent with the interface changes.
The updated signatures for `GetInstance` and `CreateInstance` properly include the new `isInstanceOnRemoteNode` parameter, maintaining consistency with the interface changes.
87-90: Fix inconsistent error handling in `DeleteTarget`.
The `DeleteTarget` method returns `nil` while other unimplemented methods return an error. This inconsistency was previously flagged and should be addressed.
Apply this diff to ensure consistent error handling:
```diff
 func (imh *MockInstanceManagerHandler) DeleteTarget(obj interface{}) error {
 	// DeleteTarget is not mocked
-	return nil
+	return fmt.Errorf("DeleteTarget is not mocked")
 }
```
137-141: Verify test coverage for new engine status fields.
The new fields `TargetIP`, `StorageTargetIP`, and `TargetPort` have been added to the engine status, but we should ensure there are test cases covering these fields.
✅ Verification successful
Let me gather more context about the test coverage by checking the test cases in more detail.
Based on the test file content, I can now generate the final response:
Test coverage for engine status fields is sufficient
The test file `instance_handler_test.go` includes comprehensive test cases that verify the new engine status fields:
- The fields `TargetIP`, `StorageTargetIP`, and `TargetPort` are properly initialized and verified in the `newEngine` helper function
- These fields are tested across multiple scenarios in `TestReconcileInstanceState`, including:
  - Engine state transitions (stopped → running, running → stopped)
  - Different node conditions
  - Various instance manager states
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for test cases that verify the new status fields
rg -A 5 "TargetIP|StorageTargetIP|TargetPort" --type go --glob "*_test.go"
Length of output: 1360
Script:
#!/bin/bash
# Look for test functions that use these fields
ast-grep --pattern 'func Test$_($_) {
$$$
TargetIP: $$$
$$$
}'
# Also check the full test file content to understand the test coverage
cat controller/instance_handler_test.go
Length of output: 23964
controller/instance_handler.go (2)
38-46
: LGTM: Well-structured interface changes for v2 data engine support
The new methods added to the `InstanceManagerHandler` interface provide a clean API for managing v2 data engine live upgrades (a sketch of the resulting interface shape follows below).
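For readers who have not opened the file, the extended interface presumably has roughly the following shape; this is a sketch assembled from the method names discussed in this review, with simplified return types, so the real signatures may differ:
```go
package example

// InstanceManagerHandler sketches the extended interface shape discussed in this
// review; obj is the engine or replica custom resource being operated on.
type InstanceManagerHandler interface {
	GetInstance(obj interface{}, isInstanceOnRemoteNode bool) (interface{}, error)
	CreateInstance(obj interface{}, isInstanceOnRemoteNode bool) (interface{}, error)
	DeleteInstance(obj interface{}) error

	// Operations added for v2 data engine live upgrade.
	SuspendInstance(obj interface{}) error
	ResumeInstance(obj interface{}) error
	SwitchOverTarget(obj interface{}) error
	DeleteTarget(obj interface{}) error
}
```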
113-113: Fix incorrect log message
The log message incorrectly mentions "Target IP" when it should be "Target Port".
```diff
- logrus.Infof("Instance %v starts running, Target IP %v", instanceName, status.TargetIP)
+ logrus.Infof("Instance %v starts running, Target Port %v", instanceName, status.TargetPort)
```
Likely invalid or redundant comment.
controller/volume_controller.go (5)
1007-1012
: LGTM: V2 data engine image handling
The code correctly handles the case where replica engine image can be different from volume engine image for v2 volumes.
2239-2248
: LGTM: Skip replica replenishment during node upgrade
The code correctly skips replica replenishment when the node is being upgraded, preventing potential issues during the upgrade process.
3306-3320
: LGTM: Volume eligibility check for live upgrade
The function properly handles eligibility checks for both v1 and v2 data engines:
- V1: Requires volume to be healthy
- V2: Allows both healthy and degraded states
3322-3327: LGTM: V2 data engine upgrade completion check
The completion check verifies all required conditions (a sketch follows the list):
- Target node ID is set
- Image matches current image
- Target node matches current target node
- Node matches current node
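A compact way to read those four conditions, with field names assumed from the review text rather than taken from the actual structs:
```go
package example

// Field names below are illustrative stand-ins for the engine spec/status.
type engineSpec struct {
	Image        string
	NodeID       string
	TargetNodeID string
}

type engineStatus struct {
	CurrentImage        string
	CurrentNodeID       string
	CurrentTargetNodeID string
}

// v2UpgradeCompleted mirrors the four bullet points above.
func v2UpgradeCompleted(spec engineSpec, status engineStatus) bool {
	return spec.TargetNodeID != "" &&
		spec.Image == status.CurrentImage &&
		spec.TargetNodeID == status.CurrentTargetNodeID &&
		spec.NodeID == status.CurrentNodeID
}
```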
5086-5094
: LGTM: Volume upgrade status check
The function correctly handles both v1 and v2 data engine upgrade status checks:
- V1: Only checks image update
- V2: Checks both image and target node updates
0897343 to 743a76d (Compare)
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (6)
controller/instance_handler.go (2)
58-165: Refactor logging messages for consistency
The function contains similar log messages with slightly different formats. Consider standardizing them:
```diff
-logrus.Infof("Instance %v starts running, Storage IP %v", instanceName, status.StorageIP)
-logrus.Infof("Instance %v starts running, IP %v", instanceName, status.IP)
-logrus.Infof("Instance %v starts running, Port %d", instanceName, status.Port)
+logrus.Infof("Instance %v: updated %s to %v", instanceName, "Storage IP", status.StorageIP)
+logrus.Infof("Instance %v: updated %s to %v", instanceName, "IP", status.IP)
+logrus.Infof("Instance %v: updated %s to %v", instanceName, "Port", status.Port)
```
883-995: Add documentation for helper functions
These helper functions would benefit from documentation explaining their purpose and behavior:
- `isVolumeBeingSwitchedBack`
- `isTargetInstanceReplacementCreated`
- `isTargetInstanceRemote`
- `isDataEngineNotBeingLiveUpgraded`

Example:
```go
// isVolumeBeingSwitchedBack returns true if the volume is being switched back to its original node,
// which occurs when the target node matches the current node but differs from the current target node.
func isVolumeBeingSwitchedBack(spec *longhorn.InstanceSpec, status *longhorn.InstanceStatus) bool {
	// Body sketched from the doc comment above; the original suggestion showed only the signature.
	return spec.TargetNodeID == spec.NodeID && spec.TargetNodeID != status.CurrentTargetNodeID
}
```
controller/engine_controller.go (3)
437-467: LGTM with a minor suggestion for error handling
The implementation correctly handles both initiator and target instance managers. Consider adding error context when returning errors from `GetInstanceManagerByInstanceRO` to help with debugging.
```diff
- return nil, "", "", err
+ return nil, "", "", errors.Wrapf(err, "failed to get instance manager for initiator")
```
Line range hint 2419-2476: Comprehensive upgrade implementation for v2 data engine
The implementation properly validates both initiator and target instance managers before proceeding with the upgrade. However, consider adding a check for concurrent upgrades to prevent potential race conditions.
Consider implementing a mutex or other synchronization mechanism to prevent concurrent upgrades of the same engine.
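One way to realize that suggestion is a per-engine keyed lock; a minimal sketch whose type and method names are not taken from the Longhorn codebase:
```go
package example

import "sync"

// engineLocks hands out one mutex per engine name so two workers cannot run an
// upgrade for the same engine concurrently.
type engineLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newEngineLocks() *engineLocks {
	return &engineLocks{locks: map[string]*sync.Mutex{}}
}

func (l *engineLocks) lockFor(engineName string) *sync.Mutex {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, ok := l.locks[engineName]; !ok {
		l.locks[engineName] = &sync.Mutex{}
	}
	return l.locks[engineName]
}

// Usage sketch:
//
//	lock := locks.lockFor(engine.Name)
//	lock.Lock()
//	defer lock.Unlock()
//	// run the v2 data engine upgrade for this engine
```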
1041-1078: Consider structured error types for better error handling
The error handling is comprehensive but could benefit from defined error types for specific failure scenarios.
Consider creating custom error types for common failure scenarios:
```go
type EngineUpgradeError struct {
	Reason  string
	Details error
}

func (e *EngineUpgradeError) Error() string {
	return fmt.Sprintf("engine upgrade failed: %s: %v", e.Reason, e.Details)
}
```
controller/volume_controller.go (1)
1007-1012: Improve comment clarity for v2 volume replica image check
The comment could be more descriptive about why v2 volume replicas can have different images.
```diff
- // For a v2 volume, the instance manager image of a replica can be different from the one of its volume
+ // For a v2 volume, replicas use the instance manager image which can differ from the volume's engine image
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (6)
- controller/engine_controller.go (11 hunks)
- controller/instance_handler.go (15 hunks)
- controller/instance_handler_test.go (4 hunks)
- controller/utils.go (0 hunks)
- controller/volume_controller.go (16 hunks)
- controller/volume_controller_test.go (1 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
🚧 Files skipped from review as they are similar to previous changes (1)
- controller/instance_handler_test.go
🧰 Additional context used
📓 Learnings (1)
controller/engine_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/engine_controller.go:524-527
Timestamp: 2024-11-25T12:39:58.926Z
Learning: In `controller/engine_controller.go`, `e.Status.Port` is sourced from the SPDK engine and does not require additional validation.
🪛 golangci-lint (1.61.0)
controller/volume_controller.go
3208-3208: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (9)
controller/instance_handler.go (3)
38-46: LGTM: Interface changes align with v2 data engine requirements
The new methods added to the `InstanceManagerHandler` interface provide a comprehensive set of operations needed for v2 data engine live upgrades, including instance suspension, resumption, and target management.
224-227
: Resolve error handling uncertainty
The TODO comment about error handling when checking node upgrade status should be addressed.
38-39: Verify usage of isInstanceOnRemoteNode parameter
The addition of the `isInstanceOnRemoteNode` parameter to the `GetInstance` and `CreateInstance` methods requires verification of all callers.
✅ Verification successful
Based on the search results, I can now generate the final response:
All callers properly handle the isInstanceOnRemoteNode parameter
The codebase analysis shows that all calls to `GetInstance` and `CreateInstance` properly handle the new boolean parameter:
- Engine controller correctly uses `false` for initiator instances and `true` for target instances during switchover
- Instance handler consistently uses `false` for local instances and `true` for remote instances
- All implementations (EngineController, ReplicaController, MockInstanceManagerHandler) have been updated with the parameter
- No instances of incorrect parameter usage were found
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for all calls to GetInstance and CreateInstance to verify proper parameter usage
rg -A 2 'GetInstance\(' --type go
rg -A 2 'CreateInstance\(' --type go
Length of output: 6021
controller/volume_controller_test.go (1)
506-510: Verify field definitions and usage consistency
The test is setting new engine status fields (`TargetIP`, `StorageTargetIP`, `TargetPort`) to empty values during volume detachment. However, based on past review comments, these fields may not be properly defined in the `EngineStatus` struct.
If these fields are not properly defined in the `EngineStatus` struct, please:
- Add the field definitions to the struct
- Update all relevant controllers and handlers to properly manage these fields
- Document the purpose and lifecycle of these fields
✅ Verification successful
Based on the search results, I can see that the fields `TargetIP`, `StorageIP`, `StorageTargetIP`, `Port`, and `TargetPort` are properly defined in the Longhorn API types at `k8s/pkg/apis/longhorn/v1beta2/instancemanager.go`:
```go
TargetIP        string `json:"targetIP"`
StorageTargetIP string `json:"storageTargetIP"`
TargetPort      int    `json:"targetPort"`
```
The fields are also actively used throughout the codebase, particularly in:
- controller/instance_handler.go - for managing instance states and IP/port assignments
- controller/engine_controller.go - for handling target switchover operations
- engineapi/instance_manager.go - for instance manager operations

The test code in controller/volume_controller_test.go is correctly resetting these fields during volume detachment, which is consistent with the codebase's behavior.
The fields are properly defined and used consistently across the codebase
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and proper usage of new engine status fields
# Check if these fields are defined in the EngineStatus struct
ast-grep --pattern $'type EngineStatus struct {
$$$
TargetIP string
StorageTargetIP string
TargetPort int
$$$
}'
# Check for any other usages of these fields in the codebase
rg -A 2 'TargetIP|StorageTargetIP|TargetPort' --type go
Length of output: 31246
controller/engine_controller.go (2)
646-673
: LGTM! Well-structured suspend implementation
The suspend implementation has proper validation and error handling.
704-760
: Verify port handling during target switchover
The implementation looks good but the port selection logic needs verification:
- When `targetInstance.Status.TargetPortStart` is 0
✅ Verification successful
Let me gather more information about the port handling logic.
Let me try a different approach to understand the port handling logic.
Let me check one more aspect of port handling, specifically focusing on the instance manager's port allocation and validation.
Based on the code analysis, I can now provide a final response about the port handling during target switchover.
Port handling during target switchover is properly implemented
The implementation has proper safeguards:
- The code checks for zero ports and handles them appropriately in
SwitchOverTarget
- When
targetInstance.Status.StandbyTargetPortStart
is non-zero, it's preferred overTargetPortStart
- The system validates port availability before operations
- There are explicit validations preventing operations with invalid port configurations (e.g., IP present but port = 0)
The port selection logic in the switchover implementation is safe and follows the established patterns in the codebase for port handling.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential port conflicts or zero port assignments
# Test: Search for port assignments in engine instances
rg -A 3 "Status.TargetPortStart.*=.*0"
Length of output: 652
Script:
#!/bin/bash
# Check port initialization and assignment logic
ast-grep --pattern 'Status.TargetPortStart = $_'
# Check port validation or error handling
rg -B 2 -A 2 "TargetPortStart.*0.*error"
# Check instance manager port allocation
ast-grep --pattern 'func ($_) AllocatePort($$$)'
Length of output: 144
Script:
#!/bin/bash
# Check how ports are handled in instance manager
rg -A 5 "func.*AllocatePort"
# Look for port initialization in engine instance
rg -B 2 -A 5 "PortStart.*:=|TargetPortStart.*:="
# Check for port validation
rg -B 2 -A 3 "if.*Port.*== 0"
Length of output: 7486
Script:
#!/bin/bash
# Check instance manager port initialization and validation
rg -B 3 -A 5 "instance.*Status.*Port.*Start.*="
# Check error handling for port allocation
rg -B 2 -A 3 "failed.*allocate.*port"
# Look for port validation in engine instance creation
rg -B 3 -A 5 "CreateEngine.*port"
Length of output: 1698
controller/volume_controller.go (3)
3817-3822
: LGTM: V2 replica image handling
The code correctly handles the image setting for v2 replicas by allowing it to be empty and different from the volume image.
5086-5094
: LGTM: Volume upgrade check logic
The code properly handles upgrade checks for both v1 and v2 data engines by considering both image updates and target node changes.
1827-1836
: Verify engine version before starting replica
The code correctly handles different replica startup logic for v1 and v2 engines, but should verify engine existence first.
✅ Verification successful
Let me gather more context about the code surrounding the replica startup logic to verify if engine existence is properly checked.
Let me check the function containing the replica startup logic to verify if engine existence is checked before this code block.
Let me try a different approach to find the function containing this code.
Let me check the openVolumeDependentResources
function signature and its usage to verify if engine existence is properly checked.
Based on the gathered evidence, I can now provide the final response:
Engine existence check is properly handled
The code is safe as the openVolumeDependentResources
function is only called in volume state transitions where the engine existence is already verified. The function is called in three scenarios:
- During volume attachment (VolumeStateDetached → VolumeStateAttaching)
- During volume attached state to handle new replicas
- During volume migration
All these calls occur after engine validation in the volume controller reconciliation loop.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify engine existence check before replica startup
ast-grep --pattern 'if e == nil {
$$$
return $_
}'
Length of output: 9416
Script:
#!/bin/bash
# Get context around the replica startup code
rg -B 10 -A 10 "if r.Spec.FailedAt == \"\"" controller/volume_controller.go
Length of output: 6095
Script:
#!/bin/bash
# Get the entire function containing the replica startup logic
ast-grep --pattern 'func $_(c *VolumeController, $_, $_, $_ *longhorn.Engine) error {
$$$
if r.Spec.FailedAt == "" {
$$$
}
$$$
}'
Length of output: 143
Script:
#!/bin/bash
# Find the function containing this code by searching for unique strings around it
rg -B 30 "canInstanceManagerLaunchReplica" controller/volume_controller.go
Length of output: 2311
Script:
#!/bin/bash
# Find where openVolumeDependentResources is called
rg "openVolumeDependentResources" controller/volume_controller.go -B 5 -A 5
Length of output: 2446
82fdc88 to 00e7be5 (Compare)
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (21)
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (4)
5-15: Consider enhancing field validation and documentation
The spec structure could benefit from additional validation and documentation:
- The `Nodes` field could use validation to ensure node names are valid:
```diff
 // Nodes specifies the list of nodes to perform the data engine upgrade on.
 // If empty, the upgrade will be performed on all available nodes.
 // +optional
+// +kubebuilder:validation:Pattern=`^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$`
 Nodes []string `json:"nodes"`
```
- Consider adding more detailed documentation about:
- The upgrade process flow
- Prerequisites for upgrade
- Impact on running workloads
17-23: Document possible states and message format
Consider adding documentation to clarify:
- The possible values for `UpgradeState`
- The expected format and content of the `Message` field

Example:
```diff
 // UpgradeNodeStatus defines the state of the node upgrade process
 type UpgradeNodeStatus struct {
+	// State represents the current state of the upgrade process.
+	// Possible values: "Pending", "InProgress", "Completed", "Failed"
 	// +optional
 	State UpgradeState `json:"state"`
+	// Message provides detailed information about the current state,
+	// including any error details if the state is "Failed"
 	// +optional
 	Message string `json:"message"`
 }
```
25-39: Add field validation and clarify status transitions
Consider enhancing the status structure with:
- Validation for `InstanceManagerImage`:
```diff
 // +optional
+// +kubebuilder:validation:Pattern=`^[^:]+:[^:]+$`
 InstanceManagerImage string `json:"instanceManagerImage"`
```
- Documentation about:
  - The relationship between `UpgradingNode` and the `UpgradeNodes` map
  - How the `OwnerID` is determined and its significance
  - Status transition flow between different states
41-57: Consider adding more printer columns for better observability
The current printer columns are good, but consider adding:
- Age column to track resource lifetime
- Message column for quick status checks

Example:
```diff
 // +kubebuilder:printcolumn:name="Upgrading Node",type=string,JSONPath=`.status.upgradingNode`,description="The node that is currently being upgraded"
+// +kubebuilder:printcolumn:name="Message",type=string,JSONPath=`.status.message`,description="The current status message"
+// +kubebuilder:printcolumn:name="Age",type=date,JSONPath=`.metadata.creationTimestamp`
```
k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (2)
95-96: Consider adding node ID validation
The new `TargetNodeID` field should validate that the specified node exists and is ready to receive the instance during live upgrade.
Consider adding a validation rule similar to:
```diff
 // +optional
+// +kubebuilder:validation:Pattern=^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$
 TargetNodeID string `json:"targetNodeID"`
```
115-119: LGTM: Comprehensive status tracking for live upgrades
The new status fields provide good observability for the live upgrade process. The separation of current and target information allows for proper tracking of the upgrade state.
Consider implementing a status condition type specifically for upgrade progress to provide a more standardized way to track the upgrade state. This would align with Kubernetes patterns and make it easier to integrate with tools like kubectl wait.
Also applies to: 129-132
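As a sketch of the condition idea using the upstream `metav1.Condition` type (Longhorn has its own condition helpers, so this is illustrative rather than a drop-in):
```go
package example

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// upgradeCondition builds a standard condition that tooling such as
// `kubectl wait --for=condition=UpgradeComplete` can consume.
func upgradeCondition(complete bool, reason, message string) metav1.Condition {
	status := metav1.ConditionFalse
	if complete {
		status = metav1.ConditionTrue
	}
	return metav1.Condition{
		Type:               "UpgradeComplete",
		Status:             status,
		Reason:             reason,
		Message:            message,
		LastTransitionTime: metav1.Now(),
	}
}
```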
controller/node_upgrade_controller.go (3)
57-57: Track TODO comment with an issue
The TODO comment about removing the wrapper should be tracked with a GitHub issue for future follow-up.
Would you like me to create a GitHub issue to track this TODO?
235-244: Consider consolidating monitor cleanup logic
The monitor cleanup logic is duplicated between the deletion case and the completion/error case. Consider extracting this into a helper method to maintain DRY principles.
+func (uc *NodeDataEngineUpgradeController) cleanupMonitor() { + if uc.nodeDataEngineUpgradeMonitor != nil { + uc.nodeDataEngineUpgradeMonitor.Close() + uc.nodeDataEngineUpgradeMonitor = nil + } +} func (uc *NodeDataEngineUpgradeController) reconcile(upgradeName string) (err error) { // ... if !nodeUpgrade.DeletionTimestamp.IsZero() { - if uc.nodeDataEngineUpgradeMonitor != nil { - uc.nodeDataEngineUpgradeMonitor.Close() - uc.nodeDataEngineUpgradeMonitor = nil - } + uc.cleanupMonitor() return uc.ds.RemoveFinalizerForNodeDataEngineUpgrade(nodeUpgrade) } // ... if nodeUpgrade.Status.State == longhorn.UpgradeStateCompleted || nodeUpgrade.Status.State == longhorn.UpgradeStateError { uc.updateNodeDataEngineUpgradeStatus(nodeUpgrade) - uc.nodeDataEngineUpgradeMonitor.Close() - uc.nodeDataEngineUpgradeMonitor = nil + uc.cleanupMonitor() }
259-267: Add validation for status fields
The status update logic should validate the fields before assignment to prevent potential issues with invalid states or messages.
```diff
+func isValidUpgradeState(state longhorn.UpgradeState) bool {
+	switch state {
+	case longhorn.UpgradeStateInProgress,
+		longhorn.UpgradeStateCompleted,
+		longhorn.UpgradeStateError:
+		return true
+	}
+	return false
+}
 func (uc *NodeDataEngineUpgradeController) updateNodeDataEngineUpgradeStatus(nodeUpgrade *longhorn.NodeDataEngineUpgrade) {
 	// ...
+	if !isValidUpgradeState(status.State) {
+		log.Errorf("Invalid upgrade state: %v", status.State)
+		return
+	}
 	nodeUpgrade.Status.State = status.State
 	nodeUpgrade.Status.Message = status.Message
```
datastore/datastore.go (1)
93-96: Consider reordering fields alphabetically
The new fields `dataEngineUpgradeManagerLister` and `nodeDataEngineUpgradeLister` should ideally be placed in alphabetical order within the struct to maintain consistency with other fields.
Apply this reordering:
```diff
-	dataEngineUpgradeManagerLister   lhlisters.DataEngineUpgradeManagerLister
-	DataEngineUpgradeManagerInformer cache.SharedInformer
-	nodeDataEngineUpgradeLister      lhlisters.NodeDataEngineUpgradeLister
-	NodeDataEngineUpgradeInformer    cache.SharedInformer
+	dataEngineUpgradeManagerLister   lhlisters.DataEngineUpgradeManagerLister
+	DataEngineUpgradeManagerInformer cache.SharedInformer
+	engineImageLister                lhlisters.EngineImageLister
+	EngineImageInformer              cache.SharedInformer
+	engineLister                     lhlisters.EngineLister
+	EngineInformer                   cache.SharedInformer
+	nodeDataEngineUpgradeLister      lhlisters.NodeDataEngineUpgradeLister
+	NodeDataEngineUpgradeInformer    cache.SharedInformer
```
controller/instance_handler.go (1)
58-165: Consider refactoring to reduce complexity
The `syncStatusIPsAndPorts` function has deep nesting and repeated error handling patterns. Consider breaking it down into smaller functions:
- `syncBasicInstanceStatus` for basic IP/port sync
- `syncTargetInstanceStatus` for v2 data engine target instance sync

controller/volume_controller_test.go (1)
Line range hint 25-1000: Consider improving test organization and documentation
While the test cases are comprehensive, consider:
- Organizing test cases into logical groups using subtests
- Adding comments to explain complex test scenarios
- Using table-driven tests for similar scenarios

Example refactor:
```diff
 func (s *TestSuite) TestVolumeLifeCycle(c *C) {
+	// Group test cases by lifecycle phase
+	t.Run("Creation", func(t *testing.T) {
+		// Volume creation test cases
+	})
+	t.Run("Attachment", func(t *testing.T) {
+		// Volume attachment test cases
+	})
```
controller/engine_controller.go (5)
437-467: LGTM! Consider enhancing error handling for edge cases.
The method effectively retrieves instance manager and IPs for both initiator and target instances. The implementation is clean and well-structured.
Consider adding validation for empty IP addresses and handling the case where instance manager exists but has no IP:
func (ec *EngineController) findInstanceManagerAndIPs(obj interface{}) (im *longhorn.InstanceManager, initiatorIP string, targetIP string, err error) { // ... existing code ... initiatorIP = initiatorIM.Status.IP + if initiatorIP == "" { + return nil, "", "", fmt.Errorf("initiator instance manager %v has no IP", initiatorIM.Name) + } targetIP = initiatorIM.Status.IP im = initiatorIM // ... existing code ... if e.Spec.TargetNodeID != "" { // ... existing code ... targetIP = targetIM.Status.IP + if targetIP == "" { + return nil, "", "", fmt.Errorf("target instance manager %v has no IP", targetIM.Name) + } }
2419-2465: LGTM! Consider enhancing logging for better debugging.
The implementation thoroughly validates both initiator and target instances before proceeding with the v2 data engine upgrade.
Consider adding structured logging to help with debugging upgrade issues:
// Check if the initiator instance is running im, err := ec.ds.GetRunningInstanceManagerByNodeRO(e.Spec.NodeID, longhorn.DataEngineTypeV2) if err != nil { + log.WithError(err).WithFields(logrus.Fields{ + "node": e.Spec.NodeID, + "engine": e.Name, + }).Error("Failed to get running instance manager for initiator") return err }
646-702: Consider extracting common validation logic.
Both SuspendInstance and ResumeInstance share similar validation patterns that could be extracted into a helper function.
Consider refactoring the common validation logic:
+func (ec *EngineController) validateEngineInstanceOp(e *longhorn.Engine, op string) error { + if !types.IsDataEngineV2(e.Spec.DataEngine) { + return fmt.Errorf("%v engine instance is not supported for data engine %v", op, e.Spec.DataEngine) + } + if e.Spec.VolumeName == "" || e.Spec.NodeID == "" { + return fmt.Errorf("missing parameters for engine instance %v: %+v", op, e) + } + return nil +} func (ec *EngineController) SuspendInstance(obj interface{}) error { e, ok := obj.(*longhorn.Engine) if !ok { return fmt.Errorf("invalid object for engine instance suspension: %v", obj) } - if !types.IsDataEngineV2(e.Spec.DataEngine) { - return fmt.Errorf("suspending engine instance is not supported for data engine %v", e.Spec.DataEngine) - } - if e.Spec.VolumeName == "" || e.Spec.NodeID == "" { - return fmt.Errorf("missing parameters for engine instance suspension: %+v", e) - } + if err := ec.validateEngineInstanceOp(e, "suspend"); err != nil { + return err + }
704-760: Consider breaking down the complex switchover logic.
While the implementation is correct, the method could be more maintainable if broken down into smaller, focused functions.
Consider refactoring into smaller functions:
+func (ec *EngineController) validateSwitchOverTarget(e *longhorn.Engine) error { + if !types.IsDataEngineV2(e.Spec.DataEngine) { + return fmt.Errorf("target switchover is not supported for data engine %v", e.Spec.DataEngine) + } + if e.Spec.VolumeName == "" || e.Spec.NodeID == "" { + return fmt.Errorf("missing parameters for target switchover: %+v", e) + } + return nil +} +func (ec *EngineController) getPortForSwitchOver(targetInstance *longhorn.InstanceProcess) int { + port := targetInstance.Status.TargetPortStart + if targetInstance.Status.StandbyTargetPortStart != 0 { + port = targetInstance.Status.StandbyTargetPortStart + } + return port +} func (ec *EngineController) SwitchOverTarget(obj interface{}) error { e, ok := obj.(*longhorn.Engine) if !ok { return fmt.Errorf("invalid object for target switchover: %v", obj) } - // ... existing validation code ... + if err := ec.validateSwitchOverTarget(e); err != nil { + return err + } // ... rest of the implementation ... - port := targetInstance.Status.TargetPortStart - if targetInstance.Status.StandbyTargetPortStart != 0 { - port = targetInstance.Status.StandbyTargetPortStart - } + port := ec.getPortForSwitchOver(targetInstance)
786-823: LGTM! Consider reusing validation logic.
The DeleteTarget implementation is solid with proper validation and error handling.
Consider reusing the previously suggested validation helper:
func (ec *EngineController) DeleteTarget(obj interface{}) error { e, ok := obj.(*longhorn.Engine) if !ok { return fmt.Errorf("invalid object for engine target deletion: %v", obj) } - if !types.IsDataEngineV2(e.Spec.DataEngine) { - return fmt.Errorf("deleting target for engine instance is not supported for data engine %v", e.Spec.DataEngine) - } + if err := ec.validateEngineInstanceOp(e, "delete target"); err != nil { + return err + }webhook/resources/nodedataengineupgrade/validator.go (2)
50-53
: Consider supporting future data engine types or provide clearer guidanceCurrently, the validator only supports
longhorn.DataEngineTypeV2
. If future data engine types are introduced, this hard-coded check may become a maintenance burden. Consider revising the validation to accommodate extensibility or provide clearer error messages.
78-92
: Consolidate immutable field checks to reduce code duplicationThe multiple if-statements checking for immutability of fields can be consolidated into a loop or helper function to improve readability and maintainability.
Apply this diff to refactor the immutability checks:
+immutableFields := map[string]string{ + "spec.nodeID": oldNodeUpgrade.Spec.NodeID, + "spec.dataEngine": string(oldNodeUpgrade.Spec.DataEngine), + "spec.instanceManagerImage": oldNodeUpgrade.Spec.InstanceManagerImage, + "spec.dataEngineUpgradeManager": oldNodeUpgrade.Spec.DataEngineUpgradeManager, +} + +for fieldPath, oldValue := range immutableFields { + newValue := getFieldValue(newNodeUpgrade, fieldPath) + if oldValue != newValue { + return werror.NewInvalidError(fmt.Sprintf("%s field is immutable", fieldPath), fieldPath) + } +}You'll need to implement the
getFieldValue
function to retrieve the field value based on thefieldPath
.controller/upgrade_manager_controller.go (1)
57-60: Address the TODO: Remove the wrapper when clients have moved to use the clientset
The TODO comment indicates an action item to remove the wrapper once all clients have migrated to use the clientset. Please ensure this task is tracked and addressed to keep the codebase clean.
Would you like assistance in updating the code or opening a new GitHub issue to track this task?
controller/monitor/node_upgrade_monitor.go (1)
98-98: Name the parameter in the `UpdateConfiguration` method
The `UpdateConfiguration` method has an unnamed parameter of type `map[string]interface{}`. Providing a name enhances code readability and adheres to Go best practices.
Update the function signature to include a parameter name:
```diff
- func (m *NodeDataEngineUpgradeMonitor) UpdateConfiguration(map[string]interface{}) error {
+ func (m *NodeDataEngineUpgradeMonitor) UpdateConfiguration(config map[string]interface{}) error {
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (50)
- controller/backup_controller.go (1 hunks)
- controller/controller_manager.go (2 hunks)
- controller/engine_controller.go (11 hunks)
- controller/instance_handler.go (15 hunks)
- controller/instance_handler_test.go (4 hunks)
- controller/monitor/node_upgrade_monitor.go (1 hunks)
- controller/monitor/upgrade_manager_monitor.go (1 hunks)
- controller/node_controller.go (2 hunks)
- controller/node_upgrade_controller.go (1 hunks)
- controller/replica_controller.go (5 hunks)
- controller/uninstall_controller.go (4 hunks)
- controller/upgrade_manager_controller.go (1 hunks)
- controller/utils.go (0 hunks)
- controller/volume_controller.go (16 hunks)
- controller/volume_controller_test.go (1 hunks)
- datastore/datastore.go (3 hunks)
- datastore/longhorn.go (6 hunks)
- engineapi/instance_manager.go (5 hunks)
- engineapi/instance_manager_test.go (1 hunks)
- k8s/crds.yaml (84 hunks)
- k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
- k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (6 hunks)
- k8s/pkg/apis/longhorn/v1beta2/node.go (2 hunks)
- k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
- k8s/pkg/apis/longhorn/v1beta2/register.go (1 hunks)
- k8s/pkg/apis/longhorn/v1beta2/volume.go (2 hunks)
- k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go (4 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go (1 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go (2 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go (1 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go (2 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go (3 hunks)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
- k8s/pkg/client/informers/externalversions/generic.go (2 hunks)
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go (4 hunks)
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
- k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
- k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go (2 hunks)
- k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
- scheduler/replica_scheduler.go (1 hunks)
- types/types.go (4 hunks)
- webhook/resources/dataengineupgrademanager/mutator.go (1 hunks)
- webhook/resources/dataengineupgrademanager/validator.go (1 hunks)
- webhook/resources/nodedataengineupgrade/mutator.go (1 hunks)
- webhook/resources/nodedataengineupgrade/validator.go (1 hunks)
- webhook/resources/volume/validator.go (5 hunks)
- webhook/server/mutation.go (2 hunks)
- webhook/server/validation.go (2 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
✅ Files skipped from review due to trivial changes (3)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
- k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go
🚧 Files skipped from review as they are similar to previous changes (23)
- controller/instance_handler_test.go
- controller/monitor/upgrade_manager_monitor.go
- controller/uninstall_controller.go
- engineapi/instance_manager_test.go
- k8s/pkg/apis/longhorn/v1beta2/node.go
- k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go
- k8s/pkg/apis/longhorn/v1beta2/register.go
- k8s/pkg/apis/longhorn/v1beta2/volume.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go
- scheduler/replica_scheduler.go
- webhook/resources/dataengineupgrademanager/mutator.go
- webhook/resources/dataengineupgrademanager/validator.go
- webhook/resources/nodedataengineupgrade/mutator.go
- webhook/resources/volume/validator.go
- webhook/server/mutation.go
- webhook/server/validation.go
🧰 Additional context used
📓 Learnings (2)
controller/engine_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/engine_controller.go:524-527
Timestamp: 2024-11-25T12:39:58.926Z
Learning: In `controller/engine_controller.go`, `e.Status.Port` is sourced from the SPDK engine and does not require additional validation.
controller/monitor/node_upgrade_monitor.go (1)
Learnt from: james-munson
PR: longhorn/longhorn-manager#3211
File: app/post_upgrade.go:102-113
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In Go, when a deferred function references a variable like `err`, ensure that the variable is declared in the outer scope and not within an inner scope (such as within `if err := ...`), to prevent compilation errors and unintended variable shadowing.
🪛 golangci-lint (1.61.0)
controller/volume_controller.go
3208-3208: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (66)
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (2)
59-66
: LGTM!
The list type implementation follows Kubernetes API conventions correctly.
1-66
: Verify controller integration and usage patterns
Let's verify how this new API type is used in the codebase:
✅ Verification successful
Integration is properly implemented and verified
Based on the code analysis, the DataEngineUpgradeManager API type is well-integrated with necessary components:
- Controller implementation exists with proper structure and dependencies
- Monitor implementation is in place with sync mechanisms
- Webhook validator is implemented with create/update validations
- All key files are present and contain appropriate implementations:
controller/upgrade_manager_controller.go
controller/monitor/upgrade_manager_monitor.go
webhook/resources/dataengineupgrademanager/validator.go
The integration includes proper validation rules, monitoring mechanisms, and controller reconciliation structure, following the standard Kubernetes controller pattern.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check controller implementations and usage patterns
# Check for controller implementations
echo "Checking for controller implementations..."
rg "DataEngineUpgradeManager" -t go -l
# Check for usage in reconciliation loops
echo "Checking for reconciliation patterns..."
ast-grep --pattern 'func \($_\) Reconcile\($_\) {
$$$
DataEngineUpgradeManager
$$$
}'
# Check for webhook implementations
echo "Checking for webhook validations..."
rg "DataEngineUpgradeManager.*Validate" -t go
Length of output: 1897
Script:
#!/bin/bash
# Let's check the controller implementations and reconciliation patterns more thoroughly
# Check upgrade manager controller implementation
echo "=== Checking upgrade manager controller implementation ==="
rg -A 10 "type UpgradeManagerController struct" controller/upgrade_manager_controller.go
# Check reconciliation loop in the controller
echo -e "\n=== Checking reconciliation implementation ==="
rg -A 20 "func.*Reconcile.*UpgradeManager" controller/upgrade_manager_controller.go
# Check monitor implementation
echo -e "\n=== Checking monitor implementation ==="
rg -A 10 "type UpgradeManagerMonitor struct" controller/monitor/upgrade_manager_monitor.go
# Check webhook validations
echo -e "\n=== Checking webhook validations ==="
rg -A 10 "func.*Validate" webhook/resources/dataengineupgrademanager/validator.go
Length of output: 773
Script:
#!/bin/bash
# Let's check for the existence and content of these files first
echo "=== Checking existence of key files ==="
for file in \
"controller/upgrade_manager_controller.go" \
"controller/monitor/upgrade_manager_monitor.go" \
"webhook/resources/dataengineupgrademanager/validator.go"
do
if [ -f "$file" ]; then
echo "Found: $file"
else
echo "Missing: $file"
fi
done
# Let's check for any files containing upgrade manager related code
echo -e "\n=== Searching for upgrade manager related code ==="
rg -g '!*.generated.go' -g '!*fake*.go' -g '!*test*.go' -g '!*zz_generated*' \
-l "DataEngineUpgradeManager|UpgradeManager" --type go
# Let's check imports of the new type
echo -e "\n=== Checking imports of DataEngineUpgradeManager ==="
rg "import.*longhorn.*v1beta2" -A 5 -g '!*.generated.go' -g '!*fake*.go' -g '!*test*.go' -g '!*zz_generated*' --type go
Length of output: 2822
Script:
#!/bin/bash
# Let's examine the content of key files to understand the integration
echo "=== Checking controller manager registration ==="
rg -A 10 "NewUpgradeManagerController|RegisterController.*UpgradeManager" controller/controller_manager.go
echo -e "\n=== Checking upgrade manager controller structure and initialization ==="
head -n 50 controller/upgrade_manager_controller.go
echo -e "\n=== Checking monitor initialization and structure ==="
head -n 50 controller/monitor/upgrade_manager_monitor.go
echo -e "\n=== Checking webhook validator ==="
head -n 50 webhook/resources/dataengineupgrademanager/validator.go
Length of output: 4981
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (4)
1-26
: LGTM! File header and imports are well-structured.
The file follows Kubernetes code generation conventions with proper license header and necessary imports.
55-58
: LGTM! Namespace-specific methods are properly implemented.
The implementation correctly handles namespace scoping for the DataEngineUpgradeManager resource.
60-94
: LGTM! Namespace lister implementation is robust.
The implementation correctly handles listing, retrieving, and error cases for namespace-scoped resources.
Let's verify error handling consistency across the codebase:
✅ Verification successful
Based on the search results, I can see that:
-
The error handling pattern using
errors.NewNotFound(v1beta2.Resource("dataengineupgrademanager"), name)
is consistently used across all listers in the codebase, both in the Longhorn project and in the standard Kubernetes libraries. -
The
Resource()
function implementation is consistent across all packages, including the Longhorn v1beta2 package, where it correctly returns aschema.GroupResource
by usingSchemeGroupVersion.WithResource(resource).GroupResource()
.
Therefore, I can generate the final response:
Error handling in DataEngineUpgradeManagerNamespaceLister is consistent with Kubernetes patterns
The implementation follows the standard Kubernetes error handling pattern for "not found" cases, using the correct resource type and group resolution through the v1beta2.Resource()
function.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent error handling for NotFound cases
# across similar listers in the codebase
# Search for similar error handling patterns
rg "errors\.NewNotFound.*Resource\(" --type go
# Search for Resource() method implementation
ast-grep --pattern 'func Resource($$$) schema.GroupResource {
$$$
}'
Length of output: 28492
28-53
: LGTM! Interface and implementation follow Kubernetes patterns.
The DataEngineUpgradeManagerLister implementation correctly handles listing and filtering of resources.
Let's verify the DataEngineUpgradeManager CRD definition:
✅ Verification successful
Let me verify the CRD definition with another script since the first search didn't find it.
Let me check the CRD definition in crds.yaml to verify it matches the Go type.
Let me try one more search with a different pattern to find the CRD definition.
DataEngineUpgradeManager resource type and lister implementation verified
The implementation is correct and consistent with the resource definition:
- CRD is properly defined in
k8s/crds.yaml
with matching type name and scope - Go type is defined in
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go
- Resource is correctly registered in the informer factory
- Lister implementation follows standard Kubernetes patterns for namespaced resources
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify DataEngineUpgradeManager CRD definition exists
# and matches the resource type used in the lister
# Search for CRD definition
rg -l "kind: CustomResourceDefinition.*dataengineupgrademanagers"
# Search for type definition
ast-grep --pattern 'type DataEngineUpgradeManager struct {
$$$
}'
Length of output: 817
Script:
#!/bin/bash
# Search for CRD definition in yaml files
fd -e yaml -e yml -x rg -l "kind:\s+DataEngineUpgradeManager"
# Also check if there's a CRD registration in Go code
rg -l "SchemeGroupVersion.*dataengineupgrademanagers"
Length of output: 188
Script:
#!/bin/bash
# Extract and examine the DataEngineUpgradeManager CRD definition
rg -A 20 "kind: CustomResourceDefinition.*dataengineupgrademanagers" k8s/crds.yaml
Length of output: 84
Script:
#!/bin/bash
# Search for DataEngineUpgradeManager in CRD file with a more lenient pattern
rg -A 10 "DataEngineUpgradeManager" k8s/crds.yaml
# Also check if the type is registered in the scheme
rg -A 5 "DataEngineUpgradeManager" k8s/pkg/client/informers/externalversions/generic.go
Length of output: 2603
k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (2)
61-67
: LGTM: New instance state for upgrade support
The addition of InstanceStateSuspended
state is well-placed and follows existing conventions. This state is essential for managing instances during live upgrades.
Let's verify the state handling in the controllers:
✅ Verification successful
The suspended state is properly integrated across the codebase
The verification shows that InstanceStateSuspended
is well-integrated and handled appropriately in key controller components:
- Used in instance handler for state management and transitions
- Integrated in volume controller for orchestrating instance states
- Handled in engine controller for managing engine state transitions during upgrades
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for suspended state handling in controllers
rg "InstanceStateSuspended" --type go
Length of output: 1144
154-157
: LGTM: Port range allocation for standby instances
The addition of standby port ranges follows the existing pattern and supports the preparation of standby instances during live upgrades.
Let's verify port range allocation logic:
✅ Verification successful
Port range fields are properly integrated and used in the codebase
The verification shows that the standby port range fields are:
- Properly defined in the API types and CRDs
- Used during engine switchover operations to manage standby target instances
- Integrated into instance management logic for handling target instance replacements
- Consistently handled in RPC/protobuf definitions and instance status conversions
Key usage examples found:
- Engine controller uses these ports during switchover operations
- Instance handler checks these fields to determine instance states
- Fields are properly propagated through the instance manager API
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for standby port allocation implementation
ast-grep --pattern 'StandbyTargetPort$$$'
Length of output: 43
Script:
#!/bin/bash
# Search for any references to standby port fields using ripgrep
rg -i "standbyTargetPort" -A 5 -B 5
Length of output: 17147
k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go (3)
77-83
: LGTM: DataEngineUpgradeManager interfaces follow the established pattern.
The new DataEngineUpgradeManagerListerExpansion
and DataEngineUpgradeManagerNamespaceListerExpansion
interfaces are correctly defined following the existing pattern in the file. These interfaces support the v2 data engine live upgrade feature by providing extension points for custom methods.
117-123
: LGTM: NodeDataEngineUpgrade interfaces follow the established pattern.
The new NodeDataEngineUpgradeListerExpansion
and NodeDataEngineUpgradeNamespaceListerExpansion
interfaces are correctly defined following the existing pattern in the file. These interfaces complement the data engine upgrade feature by providing node-specific upgrade capabilities.
Line range hint 1-200
: Verify the code generation process.
Since this is a generated file, we should verify that it was generated using the correct version of lister-gen
and that the generation was triggered by appropriate changes in the source types.
✅ Verification successful
Generated code is consistent with source types and generation markers
The verification confirms that:
- Both
DataEngineUpgradeManager
andNodeDataEngineUpgrade
types have proper generation markers:+genclient
for client generation+k8s:deepcopy-gen
for deep copy generation
- These types are part of a properly marked API package (doc.go has
+k8s:deepcopy-gen=package
) - The generated listers are consistent with all types marked for generation in the v1beta2 package
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the code generation setup and source types
# Check for code generation markers in source files
echo "Checking for code generation markers in API types..."
rg --type go "+k8s:deepcopy-gen" k8s/pkg/apis/longhorn/v1beta2/
rg --type go "+genclient" k8s/pkg/apis/longhorn/v1beta2/
# Look for the type definitions that triggered this generation
echo "Looking for DataEngineUpgradeManager and NodeDataEngineUpgrade type definitions..."
rg --type go "type (DataEngineUpgradeManager|NodeDataEngineUpgrade) struct" k8s/pkg/apis/longhorn/v1beta2/
Length of output: 886
Script:
#!/bin/bash
# Description: Verify the code generation setup and source types with escaped patterns
# Check for code generation markers in API types
echo "Checking for code generation markers in API types..."
rg --type go "\+k8s:deepcopy-gen" k8s/pkg/apis/longhorn/v1beta2/
rg --type go "\+genclient" k8s/pkg/apis/longhorn/v1beta2/
# Check the generation tags in the specific files
echo "Checking generation tags in the specific type files..."
cat k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go | grep -A 1 "// .*gen"
cat k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go | grep -A 1 "// .*gen"
# Check if there's a generation script or Makefile target
echo "Looking for code generation scripts..."
find . -type f -name Makefile -exec grep -l "lister-gen\|client-gen\|deepcopy-gen" {} \;
Length of output: 8252
k8s/pkg/client/informers/externalversions/generic.go (2)
103-104
: LGTM: DataEngineUpgradeManager informer integration looks correct.
The addition of the dataengineupgrademanagers
resource case follows the established pattern and correctly integrates with the v1beta2 API version.
113-114
: LGTM: NodeDataEngineUpgrade informer integration looks correct.
The addition of the nodedataengineupgrades
resource case follows the established pattern and correctly integrates with the v1beta2 API version.
controller/node_upgrade_controller.go (5)
135-139
: Previous review comment about undefined maxRetries is still valid
205-208
: Previous review comment about finalizer removal error handling is still valid
223-229
: Previous review comment about status update error handling is still valid
252-257
: Previous review comment about nil data check is still valid
275-283
: Previous review comment about variable shadowing is still valid
controller/controller_manager.go (2)
157-165
: LGTM: Controller initialization follows established patterns
The initialization of both new controllers (dataEngineUpgradeManagerController
and nodeDataEngineUpgradeController
) follows the consistent pattern used throughout the codebase, with proper error handling and parameter passing.
226-227
: Verify controller startup dependencies
While the startup code follows the established pattern, please verify if there are any dependencies between these upgrade controllers and other controllers that might require a specific startup order. For example:
- Does the upgrade process depend on the state of volumes, engines, or nodes?
- Could race conditions occur if these controllers start before their dependencies?
Let's check for potential dependencies in the controller implementations:
✅ Verification successful
Let me check the sync implementations of these upgrade controllers to understand their dependencies.
Let me try a different pattern to find the controller implementations.
Let me check the specific dependencies and initialization order in these controllers.
Based on the analysis of the initialization code and controller startup sequence, I can now provide a final response:
Controller startup order is appropriate
The startup order is safe because:
- Both controllers are initialized with the required dependencies (datastore, scheme, kubeClient) before startup
- They are started after core volume controllers but before Kubernetes controllers, which is the correct order since:
- Volume controllers need to be ready first as they manage the core resources
- Upgrade controllers operate independently of Kubernetes controllers
- Both upgrade controllers are started together, which is appropriate as they handle different aspects of the upgrade process:
- DataEngineUpgradeManager handles the overall upgrade strategy
- NodeDataEngineUpgrade handles node-specific upgrade operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for dependencies in the upgrade controllers
# Look for references to other controllers or their resources
echo "Checking DataEngineUpgradeManager controller dependencies..."
rg -A 5 "type DataEngineUpgradeManagerController struct"
echo "Checking NodeDataEngineUpgrade controller dependencies..."
rg -A 5 "type NodeDataEngineUpgradeController struct"
# Look for resource dependencies in the sync methods
echo "Checking sync methods for resource dependencies..."
rg -B 5 -A 10 "func \(.*\) sync\("
Length of output: 34641
Script:
#!/bin/bash
# Check sync implementations of the upgrade controllers
echo "Checking DataEngineUpgradeManager controller sync..."
rg -A 20 "func \(c \*DataEngineUpgradeManagerController\) sync\(" controller/
echo "Checking NodeDataEngineUpgrade controller sync..."
rg -A 20 "func \(c \*NodeDataEngineUpgradeController\) sync\(" controller/
Length of output: 380
Script:
#!/bin/bash
# Search for upgrade controller files and their contents
echo "Searching for upgrade controller files..."
fd -t f "upgrade.*controller.go" controller/
echo "Checking upgrade controller implementations..."
for file in $(fd -t f "upgrade.*controller.go" controller/); do
echo "=== Contents of $file ==="
cat "$file" | grep -A 10 "func.*sync"
done
echo "Checking for volume/engine dependencies..."
rg -l "volume|engine" $(fd -t f "upgrade.*controller.go" controller/)
Length of output: 1880
Script:
#!/bin/bash
# Check controller initialization and dependencies
echo "=== Checking DataEngineUpgradeManager controller initialization ==="
rg -B 5 -A 10 "NewDataEngineUpgradeManagerController" controller/
echo "=== Checking NodeDataEngineUpgrade controller initialization ==="
rg -B 5 -A 10 "NewNodeDataEngineUpgradeController" controller/
echo "=== Checking controller manager initialization order ==="
rg -B 10 -A 10 "dataEngineUpgradeManagerController.*Run|nodeDataEngineUpgradeController.*Run" controller/
Length of output: 6828
datastore/datastore.go (2)
186-189
: LGTM! Proper initialization of informers
The initialization of the new informers follows the established pattern and correctly adds cache syncs for both new resources.
279-282
: LGTM! Consistent initialization of DataStore fields
The initialization of the new fields follows the existing pattern and maintains consistency with the rest of the codebase.
engineapi/instance_manager.go (4)
283-292
: LGTM: Instance process status fields properly extended
The addition of standby target port fields enhances the instance process status tracking capabilities, which is essential for the v2 data engine live upgrade feature.
532-555
: LGTM: Well-implemented replica address filtering
The function correctly handles:
- Proper parsing and validation of network addresses
- Filtering of replicas based on initiator/target comparison
- Error handling for invalid address formats
864-879
:
Fix typo in error message
The error message contains a typo: "date engine" should be "data engine".
Apply this fix:
- return fmt.Errorf("engine suspension for date engine %v is not supported yet", longhorn.DataEngineTypeV1)
+ return fmt.Errorf("engine suspension for data engine %v is not supported yet", longhorn.DataEngineTypeV1)
Likely invalid or redundant comment.
885-885
:
Fix incorrect function comment
The comment incorrectly states that the function suspends the engine instance when it actually resumes it.
-// EngineInstanceResume suspends engine instance
+// EngineInstanceResume resumes engine instance
Likely invalid or redundant comment.
controller/backup_controller.go (1)
599-607
: LGTM! The data engine upgrade check is well implemented.
The added check prevents nodes undergoing data engine upgrades from taking responsibility for new backups, which is crucial for maintaining system stability during upgrades.
Let's verify the usage of this check across the codebase:
✅ Verification successful
The data engine upgrade check is correctly implemented and consistently used across the codebase
The verification shows that DataEngineUpgradeRequested
is properly handled:
- Used in volume validation (webhook)
- Checked in critical controllers (volume, node, engine, backup)
- Managed by the node upgrade monitor
- Has a dedicated helper in the datastore
The implementation in backup controller aligns with the system-wide approach to prevent operations during data engine upgrades.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for other controllers that might need similar data engine upgrade checks
# to prevent race conditions during upgrades.
# Search for other isResponsibleFor methods
ast-grep --pattern 'func $$$isResponsibleFor$$$'
# Search for DataEngineUpgradeRequested usage
rg "DataEngineUpgradeRequested"
Length of output: 2467
controller/instance_handler.go (4)
38-46
: LGTM: Interface changes are well-structured
The new methods added to InstanceManagerHandler
interface (SuspendInstance
, ResumeInstance
, SwitchOverTarget
, DeleteTarget
, RequireRemoteTargetInstance
) provide a clean API for v2 data engine live upgrade operations.
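For readers following along, a minimal sketch of what the extended interface likely looks like; the exact signatures are assumptions modeled on the GetInstance/CreateInstance style shown elsewhere in this review, not the PR's literal code.
package example

import (
	longhorn "github.com/longhorn/longhorn-manager/k8s/pkg/apis/longhorn/v1beta2"
)

// Sketch of the extended InstanceManagerHandler interface (signatures assumed).
type InstanceManagerHandler interface {
	GetInstance(obj interface{}, isInstanceOnRemoteNode bool) (*longhorn.InstanceProcess, error)
	CreateInstance(obj interface{}, isInstanceOnRemoteNode bool) (*longhorn.InstanceProcess, error)
	DeleteInstance(obj interface{}) error
	// Added for the v2 data engine live upgrade.
	SuspendInstance(obj interface{}) error
	ResumeInstance(obj interface{}) error
	SwitchOverTarget(obj interface{}) error
	DeleteTarget(obj interface{}) error
	RequireRemoteTargetInstance(obj interface{}) (bool, error)
}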
224-227
: Resolve error handling uncertainty
The TODO comment about error handling when checking node upgrade status should be addressed.
113-113
:
Fix incorrect log message
The log message incorrectly mentions "Target IP" instead of "Target Port".
- logrus.Infof("Instance %v starts running, Target IP %v", instanceName, status.TargetIP)
+ logrus.Infof("Instance %v starts running, Target Port %v", instanceName, status.TargetPort)
Likely invalid or redundant comment.
602-618
: Verify error handling in suspend-resume flow
The suspend-resume flow needs careful error handling verification:
- If suspend fails, the instance remains in its current state
- If switch-over fails, resume is attempted
- If resume fails after switch-over failure, the instance could be left in an inconsistent state
✅ Verification successful
Let me gather more information about the error handling patterns in the suspend-resume flow.
Let me try one more search pattern to find the implementations and error handling.
Based on the code analysis, I can now provide a final response about the error handling in the suspend-resume flow.
Error handling in suspend-resume flow is properly implemented
The error handling implementation follows a robust pattern:
- If suspend fails, the error is returned immediately without any state change, keeping the instance in its original state
- If switch-over fails after successful suspension:
- Resume is attempted as a recovery mechanism
- If resume fails, it's logged but the original switch-over error is returned
- This maintains error transparency while attempting recovery
The implementation properly handles all error cases and maintains consistent state transitions.
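To make that pattern easy to follow, here is a minimal Go sketch of the control flow described above; the function parameters stand in for the real handler calls and none of the names are taken from the PR.
package example

import "fmt"

// Sketch only: suspend -> switch over -> resume-on-failure recovery pattern.
func switchOverWithRecovery(suspend, switchOver, resume func() error) error {
	if err := suspend(); err != nil {
		// Suspend failed: nothing was changed, return immediately.
		return err
	}
	if err := switchOver(); err != nil {
		// Switch-over failed after a successful suspend: attempt resume as recovery,
		// but keep the original switch-over error as the returned failure.
		if resumeErr := resume(); resumeErr != nil {
			fmt.Printf("resume after failed switch-over also failed: %v\n", resumeErr)
		}
		return err
	}
	return nil
}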
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential error handling issues in suspend-resume flows
# Search for similar error handling patterns in the codebase
rg -A 5 "SuspendInstance.*ResumeInstance"
# Check for any error recovery mechanisms
ast-grep --pattern 'if err := $_.ResumeInstance($_); err != nil {
$$$
}'
Length of output: 371
Script:
#!/bin/bash
# Search for suspend instance implementation and its error handling
ast-grep --pattern 'func $_.suspendInstance($$$) {
$$$
}'
# Search for switch over target implementation
ast-grep --pattern 'func $_.switchOverTarget($$$) {
$$$
}'
# Look for other places where suspend-resume flow is used
rg -B 2 -A 5 "suspendInstance.*switchOverTarget"
Length of output: 175
Script:
#!/bin/bash
# Search for suspend and resume related functions with broader patterns
rg -B 2 -A 10 "func.*suspend"
rg -B 2 -A 10 "func.*resume"
# Look for instance state handling
rg -B 2 -A 5 "InstanceState.*Suspended"
# Search for instance manager related error handling
rg -B 2 -A 5 "instanceManagerHandler.*Instance.*error"
Length of output: 14658
types/types.go (2)
43-44
: LGTM: Constants are well-defined and properly organized.
The new constants follow the established naming conventions and are appropriately grouped with related constants. They provide clear identifiers for the data engine upgrade feature components.
Also applies to: 163-164, 193-193
1271-1291
: LGTM: Functions are well-implemented and follow established patterns.
The new functions:
GenerateNodeDataEngineUpgradeName
: Properly generates unique names with sufficient randomness
GetDataEngineUpgradeManagerLabels
: Correctly implements label generation for upgrade manager
GetNodeDataEngineUpgradeLabels
: Properly handles conditional labels for node upgrades
All functions follow the codebase's established patterns and best practices.
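For context, a rough sketch of what such helpers typically look like; the label keys and the random-suffix mechanism below are assumptions for illustration, not the exact implementation in types/types.go.
package example

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Sketch only: build a unique NodeDataEngineUpgrade name from its parent
// upgrade manager and node, with a short random suffix for uniqueness.
func GenerateNodeDataEngineUpgradeName(prefix, nodeID string) string {
	buf := make([]byte, 4)
	_, _ = rand.Read(buf)
	return fmt.Sprintf("%s-%s-%s", prefix, nodeID, hex.EncodeToString(buf))
}

// Sketch only: labels that let controllers select NodeDataEngineUpgrade objects
// by their upgrade manager and target node; the label keys are illustrative.
func GetNodeDataEngineUpgradeLabels(upgradeManager, nodeID string) map[string]string {
	labels := map[string]string{"longhorn.io/component": "node-data-engine-upgrade"}
	if upgradeManager != "" {
		labels["longhorn.io/data-engine-upgrade-manager"] = upgradeManager
	}
	if nodeID != "" {
		labels["longhorn.io/node"] = nodeID
	}
	return labels
}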
controller/volume_controller_test.go (2)
Line range hint 1-24
: Well-structured test setup with comprehensive coverage
The test file follows good testing practices:
- Uses fake clients for isolation
- Provides helper functions for test case generation
- Covers various lifecycle scenarios including error cases
506-510
:
Critical: Engine status fields not properly defined in the codebase
The test is setting new engine status fields (TargetIP
, StorageIP
, StorageTargetIP
, TargetPort
) that are not properly defined in the EngineStatus
struct according to the previous review. This will cause runtime errors.
Please ensure these fields are properly defined in:
k8s/pkg/apis/longhorn/v1beta2/engine.go
k8s/pkg/apis/longhorn/v1beta2/instancemanager.go
controller/node_controller.go (1)
Line range hint 2177-2209
: LGTM! Please verify integration with v2 data engine upgrade
The changes to SetSchedulableCondition
look good:
- Clean separation of conditions with clear variable assignments
- Proper handling of both cordoned node and data engine upgrade cases
- Appropriate condition reason and message for data engine upgrade
Let's verify the integration with v2 data engine upgrade:
✅ Verification successful
Integration with v2 data engine upgrade is properly implemented
The verification shows comprehensive integration across the codebase:
- Webhook validation prevents volume attachment to nodes during upgrade
- Engine controller handles upgrade state for instance management
- Volume controller skips replica replenishment during upgrade
- Backup controller respects upgrade state
- Node upgrade monitor manages the upgrade lifecycle by setting/unsetting
DataEngineUpgradeRequested
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the data engine upgrade integration
# Check if there are any other components that need to handle DataEngineUpgradeRequested
rg -A 5 "DataEngineUpgradeRequested"
# Check for any related v2 data engine upgrade handlers
ast-grep --pattern 'func $$$DataEngineUpgrade$$$'
Length of output: 7818
controller/engine_controller.go (2)
634-641
: Improve error handling for target deletion.
The code correctly handles target deletion for engines with CurrentTargetNodeID, but the error handling could be improved.
This issue was previously identified and resolved based on past review comments. The current implementation is correct.
524-527
: Verify port handling during upgrades.
The code correctly handles port assignments for both initiator and target addresses. Based on the learnings, e.Status.Port
is sourced from the SPDK engine and doesn't require validation.
Run the following script to verify port usage patterns in the codebase:
✅ Verification successful
Based on the search results, I can now generate the final response:
Port handling during upgrades is correctly implemented.
The code safely handles port assignments:
- The initiator port is intentionally set to 0 as it's not needed for non-upgrade operations
- The target port is sourced from
e.Status.Port
which is properly managed by the instance manager:- Set when instance starts running (
status.Port = int(instance.Status.PortStart)
) - Reset to 0 when instance stops/fails
- Validated before use (
if e.Status.IP == "" || e.Status.Port == 0 { return nil, fmt.Errorf("require IP and Port") }
)
- Set when instance starts running (
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent port handling patterns across the codebase
# Check for other occurrences of JoinHostPort with port 0
rg -A 2 "JoinHostPort.*,\s*0\s*\)"
# Check for SPDK engine port assignments
rg "Status\.Port\s*=.*SPDK"
Length of output: 66
Script:
#!/bin/bash
# Let's try a broader search for port handling patterns
# Check for JoinHostPort usage patterns
rg -A 2 "JoinHostPort"
# Check for port assignments and validations
rg -A 2 "\.Port\s*="
# Check for SPDK related port configurations
rg -A 2 "SPDK.*[Pp]ort"
# Look for port-related constants or configurations
rg -A 2 "DefaultPort|PORT"
Length of output: 110179
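To make the initiator/target port convention discussed above concrete, a small sketch of how the two addresses are assembled; the port-0 initiator address mirrors the pattern quoted in the verification, and the variable names are illustrative rather than the PR's actual code.
package example

import (
	"net"
	"strconv"
)

// Sketch only: build the initiator and target addresses for a non-upgrade call.
// The initiator port is deliberately 0 (not needed outside an upgrade), while the
// target port comes from the engine status, which the instance manager keeps current.
func buildEngineAddresses(engineIP string, enginePort int) (initiatorAddress, targetAddress string) {
	initiatorAddress = net.JoinHostPort(engineIP, strconv.Itoa(0))
	targetAddress = net.JoinHostPort(engineIP, strconv.Itoa(enginePort))
	return initiatorAddress, targetAddress
}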
k8s/crds.yaml (3)
1313-1414
: LGTM: Well-structured CRDs for data engine upgrade management
The new CRDs DataEngineUpgradeManager
and NodeDataEngineUpgrade
are well-designed with:
- Clear separation of cluster-wide and node-level upgrade management
- Comprehensive status tracking with conditions
- Proper validation constraints
- Informative printer columns for kubectl
Also applies to: 2364-2452
1372-1376
: LGTM: Consistent data engine field additions
The dataEngine
field has been consistently added across CRDs with:
- Proper enum validation (
v1
,v2
) - Clear deprecation notices for old fields
- Consistent field placement and documentation
Also applies to: 2419-2420
1791-1792
: LGTM: Proper target node support added
The targetNodeID
field has been consistently added to relevant CRDs to support node targeting during upgrades:
- Present in both spec and status sections for proper state tracking
- Consistent field naming across CRDs
Also applies to: 3237-3238, 3280-3281
controller/volume_controller.go (5)
1007-1012
: LGTM! Clean handling of replica image differences between v1 and v2 data engines
The code correctly handles the different image requirements between v1 and v2 data engines. For v2 volumes, replicas can have different images from the volume, which is properly validated here.
1827-1836
: LGTM! Proper replica startup handling for v2 data engine
The code correctly handles replica startup states for both v1 and v2 data engines, with appropriate image validation.
1923-1930
: LGTM! Proper engine state transition handling for v2 data engine
The code correctly manages engine state transitions during upgrades, with appropriate checks for image and target node alignment.
3817-3822
: LGTM! Proper replica image handling during creation
The code correctly handles replica image assignment based on data engine type, with v2 replicas properly using the instance manager image.
5086-5094
: LGTM! Comprehensive upgrade status check
The code properly determines if a volume is upgrading by checking both image updates and target node changes for v2 data engine.
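A condensed sketch of the check being described, with stand-in types and an assumed shape rather than the exact function in the PR: for v1 only the image matters, while for v2 a pending target-node change also counts as upgrading.
package example

// Minimal stand-in types for illustration; the real code uses the longhorn API structs.
type volumeSpec struct {
	Image        string
	TargetNodeID string
	DataEngineV2 bool
}

type volumeStatus struct {
	CurrentImage        string
	CurrentTargetNodeID string
}

// Sketch only: a volume counts as upgrading when its image is being replaced,
// or (v2 data engine) when a target-node switch is still in progress.
func isVolumeUpgrading(spec volumeSpec, status volumeStatus) bool {
	imageNotUpdated := spec.Image != status.CurrentImage
	if !spec.DataEngineV2 {
		return imageNotUpdated
	}
	return imageNotUpdated || spec.TargetNodeID != status.CurrentTargetNodeID
}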
datastore/longhorn.go (4)
Line range hint 3742-3782
: Improved instance manager lookup with remote node support
The changes to GetInstanceManagerByInstance properly handle remote node instances by using TargetNodeID instead of NodeID when isInstanceOnRemoteNode is true. This is essential for proper instance manager selection during operations like live migration.
3784-3841
: Well-structured helper functions for instance manager operations
The new helper functions listInstanceManagers and filterInstanceManagers provide clean separation of concerns:
- listInstanceManagers handles data engine specific logic
- filterInstanceManagers handles ambiguous selection cases
5641-5875
: Comprehensive implementation of DataEngineUpgrade CRD operations
The new methods for DataEngineUpgradeManager and NodeDataEngineUpgrade CRDs follow consistent patterns and best practices:
- Proper verification of creation/updates
- Finalizer handling
- Status updates
- List operations with RO variants
5998-6005
: Clean helper method for checking node upgrade requests
The IsNodeDataEngineUpgradeRequested helper provides a simple way to check if a node has requested a data engine upgrade.
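The helper most likely boils down to reading a single boolean off the node spec. A hedged sketch of that shape, assuming the datastore package context and the GetNodeRO read helper referenced elsewhere in this review; not verified against the PR's exact implementation.
// Sketch only: what a helper like IsNodeDataEngineUpgradeRequested presumably does.
func (s *DataStore) IsNodeDataEngineUpgradeRequested(name string) (bool, error) {
	node, err := s.GetNodeRO(name)
	if err != nil {
		return false, err
	}
	return node.Spec.DataEngineUpgradeRequested, nil
}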
webhook/resources/nodedataengineupgrade/validator.go (1)
94-95
: LGTM!
The validator returns nil
appropriately when all validations pass.
controller/upgrade_manager_controller.go (4)
135-139
: Undefined variable maxRetries
The variable maxRetries
is used but not defined in this file or imported packages, which will result in a compile-time error.
136-142
: Undefined function handleReconcileErrorLogging
The function handleReconcileErrorLogging
is called but not defined in this file or imported packages, leading to a compile-time error.
202-209
: Ensure proper cleanup before removing the finalizer
When handling deletion, the controller removes the finalizer before confirming that all resources, such as the monitor, are fully cleaned up, which could lead to lingering resources.
263-275
: Potential race condition accessing uc.dataEngineUpgradeManagerMonitor
Access to uc.dataEngineUpgradeManagerMonitor
is not synchronized, which could lead to race conditions if the controller is running multiple workers.
controller/replica_controller.go (3)
Line range hint 636-672
: Ensure all calls to GetInstance
are updated with the new parameter
The method GetInstance
now includes an additional parameter isInstanceOnRemoteNode
. Please verify that all calls to this method in the codebase have been updated to include this parameter with the correct value.
Run the following script to find all calls to GetInstance
and check their parameters:
✅ Verification successful
Based on the rg
search results, I can generate the final response as we have found all the occurrences of GetInstance
in the codebase.
All calls to GetInstance
are properly updated with the new parameter
The codebase search shows that all implementations and calls to GetInstance
have been correctly updated to include the isInstanceOnRemoteNode
parameter:
- Interface definition in
instance_handler.go:38
- Implementation in
replica_controller.go:636
- Implementation in
engine_controller.go:839
- Mock implementation in
instance_handler_test.go:39
- All calls include the boolean parameter with appropriate values:
false
for local node operationstrue
for remote node operations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all calls to `GetInstance` and verify they use the new parameter.
# Expectation: All calls should pass the `isInstanceOnRemoteNode` argument.
rg --type go 'GetInstance\(' -n
Length of output: 1349
Line range hint 316-355
: Ensure all calls to CreateInstance
are updated with the new parameter
The method CreateInstance
now includes an additional parameter isInstanceOnRemoteNode
. Please verify that all calls to this method throughout the codebase have been updated to pass the appropriate value for isInstanceOnRemoteNode
to ensure correct functionality.
Run the following script to find all usages of CreateInstance
and check their parameters:
✅ Verification successful
Let me analyze the results. The search shows all occurrences of CreateInstance
in the codebase. I can see that:
- The matches in
vendor/github.com/go-ole/go-ole/
are unrelated as they belong to a different package and have different signatures. - The relevant matches show:
- Interface definition in
controller/instance_handler.go:39
- Mock implementation in
controller/instance_handler_test.go:51
- Actual implementations in:
controller/replica_controller.go:316
controller/engine_controller.go:469
- Usage in
controller/instance_handler.go
with proper boolean arguments:- Line 750:
CreateInstance(obj, false)
- Line 763:
CreateInstance(obj, true)
- Line 783:
CreateInstance(obj, false)
- Line 750:
- Interface definition in
All calls to CreateInstance
are properly updated with the new parameter
All implementations and calls to CreateInstance
in the codebase have been correctly updated to include the isInstanceOnRemoteNode
parameter. The interface definition, mock implementation, actual implementations, and all usage sites pass the appropriate boolean value.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all calls to `CreateInstance` and verify they use the new parameter.
# Expectation: All calls should pass the `isInstanceOnRemoteNode` argument.
rg --type go 'CreateInstance\(' -n
Length of output: 1461
Line range hint 528-537
: Potential logical error: Passing false
for isInstanceOnRemoteNode
in DeleteInstance
In the DeleteInstance
method, the call to GetInstanceManagerByInstance
uses false
as the isInstanceOnRemoteNode
parameter. If the replica instance might be on a remote node, passing false
could lead to incorrect instance manager retrieval and potential errors. Please verify whether the correct value should be determined based on the replica's location.
Run the following script to check usage of GetInstanceManagerByInstance
and how isInstanceOnRemoteNode
is set:
controller/monitor/node_upgrade_monitor.go (2)
192-206
: The issue regarding variable shadowing of err
within the deferred function is still present. Please refer to the previous review comment for details on how to address this.
293-293
: Deferring engineClientProxy.Close()
inside a loop can lead to resource exhaustion due to postponed closures. The concern raised in the previous review comment remains applicable.
k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go (6)
981-1007
: DeepCopy Functions for DataEngineUpgradeManager
Added Successfully
The autogenerated DeepCopy
methods for the DataEngineUpgradeManager
type and its related functions appear correctly implemented. These functions will ensure proper deep copying of the custom resource objects.
1010-1041
: DeepCopy Functions for DataEngineUpgradeManagerList
Implemented Correctly
The added deepcopy methods for DataEngineUpgradeManagerList
are correctly generated. This ensures that lists of DataEngineUpgradeManager
objects can be deep copied without issues.
1064-1092
: DeepCopy Methods for DataEngineUpgradeManagerSpec
and DataEngineUpgradeManagerStatus
The DeepCopyInto
and DeepCopy
functions for both DataEngineUpgradeManagerSpec
and DataEngineUpgradeManagerStatus
are properly defined, including deep copying of slices and maps. This maintains data integrity during object copying.
1783-1842
: DeepCopy Functions for NodeDataEngineUpgrade
and Related Types
The deepcopy implementations for NodeDataEngineUpgrade
, NodeDataEngineUpgradeList
, NodeDataEngineUpgradeSpec
, and NodeDataEngineUpgradeStatus
are correctly generated. They handle the nested structures appropriately.
2991-3006
: DeepCopy Methods for UpgradeNodeStatus
The DeepCopyInto
and DeepCopy
methods for UpgradeNodeStatus
are correctly implemented, ensuring accurate copying of these status objects.
3326-3341
: DeepCopy Functions for VolumeUpgradeStatus
The added deepcopy functions for VolumeUpgradeStatus
are appropriate and correctly handle the copying process.
status, ok := data.(*longhorn.DataEngineUpgradeManagerStatus)
	if !ok {
		log.Errorf("Failed to assert value from dataEngineUpgradeManager monitor: %v", data)
	} else {
		upgradeManager.Status.InstanceManagerImage = status.InstanceManagerImage
		upgradeManager.Status.State = status.State
		upgradeManager.Status.Message = status.Message
		upgradeManager.Status.UpgradingNode = status.UpgradingNode
		upgradeManager.Status.UpgradeNodes = make(map[string]*longhorn.UpgradeNodeStatus)
		for k, v := range status.UpgradeNodes {
			upgradeManager.Status.UpgradeNodes[k] = &longhorn.UpgradeNodeStatus{
				State:   v.State,
				Message: v.Message,
			}
		}
	}
}
Handle type assertion failure when retrieving status data
In the reconcile
function, if the type assertion fails (if !ok
), the function logs an error but continues execution. This may lead to inconsistent state or further errors due to uninitialized fields.
Consider handling the error appropriately by returning an error or stopping further processing to prevent potential issues.
Apply this diff to return an error when the type assertion fails:
func (uc *DataEngineUpgradeManagerController) reconcile(upgradeManagerName string) (err error) {
// ...
if uc.dataEngineUpgradeManagerMonitor != nil {
data, _ := uc.dataEngineUpgradeManagerMonitor.GetCollectedData()
status, ok := data.(*longhorn.DataEngineUpgradeManagerStatus)
if !ok {
log.Errorf("Failed to assert value from dataEngineUpgradeManager monitor: %v", data)
+ return fmt.Errorf("failed to retrieve valid DataEngineUpgradeManagerStatus")
} else {
upgradeManager.Status.InstanceManagerImage = status.InstanceManagerImage
// existing code continues
}
}
// ...
}
Committable suggestion skipped: line range outside the PR's diff.
f03bb50
to
7819f3a
Compare
Actionable comments posted: 14
🧹 Outside diff range and nitpick comments (29)
controller/upgrade_manager_controller.go (3)
185-201
: Update OwnerID
in a separate function for clarity
Updating the
OwnerID
and handling conflicts can be refactored into a separate function to improve readability and maintainability. Consider extracting this logic into a new method:
func (uc *DataEngineUpgradeManagerController) updateOwnerID(upgradeManager *longhorn.DataEngineUpgradeManager, log *logrus.Entry) error {
	if upgradeManager.Status.OwnerID != uc.controllerID {
		upgradeManager.Status.OwnerID = uc.controllerID
		updatedUpgradeManager, err := uc.ds.UpdateDataEngineUpgradeManagerStatus(upgradeManager)
		if err != nil {
			if apierrors.IsConflict(errors.Cause(err)) {
				return nil
			}
			return err
		}
		*upgradeManager = *updatedUpgradeManager
		log.Infof("DataEngineUpgradeManager resource %v got new owner %v", upgradeManager.Name, uc.controllerID)
	}
	return nil
}
Then, in your
reconcile
method, replace the ownership update block with a call to this new function.
117-126
: Optimize work item processing loop
In the
worker
function, consider adding a cancellation context or a mechanism to stop the goroutine more gracefully when the stop channel is closed. This can help prevent potential goroutine leaks during shutdown.
183-184
: Improve logging by including context
The logger initialized in the
reconcile
function could include additional context, such as the namespace or controller ID, for better traceability. Consider updating the logger initialization:
func (uc *DataEngineUpgradeManagerController) reconcile(upgradeManagerName string) (err error) {
	upgradeManager, err := uc.ds.GetDataEngineUpgradeManager(upgradeManagerName)
	// ...
-	log := getLoggerForDataEngineUpgradeManager(uc.logger, upgradeManager)
+	log := getLoggerForDataEngineUpgradeManager(uc.logger, upgradeManager).WithFields(
+		logrus.Fields{
+			"namespace":    uc.namespace,
+			"controllerID": uc.controllerID,
+		},
+	)
	// ...
}
webhook/resources/volume/validator.go (2)
104-104
: Clarify the error message for empty EngineImage
The error message could be more user-friendly. Instead of stating "BUG: Invalid empty Setting.EngineImage," consider rephrasing to guide the user on providing a valid engine image.
Apply this diff to improve the error message:
-return werror.NewInvalidError("BUG: Invalid empty Setting.EngineImage", "spec.image") +return werror.NewInvalidError("spec.image must be specified and cannot be empty", "spec.image")
165-177
: Simplify redundant checks for volume.Spec.NodeID
The condition
if volume.Spec.NodeID != ""
is checked twice within the nested if
statements. This redundancy can be eliminated for clarity. Apply this diff to remove the redundant check:
if volume.Spec.NodeID != "" { node, err := v.ds.GetNodeRO(volume.Spec.NodeID) if err != nil { err = errors.Wrapf(err, "failed to get node %v", volume.Spec.NodeID) return werror.NewInternalError(err.Error()) } if node.Spec.DataEngineUpgradeRequested { - if volume.Spec.NodeID != "" { return werror.NewInvalidError(fmt.Sprintf("volume %v is not allowed to attach to node %v during v2 data engine upgrade", volume.Name, volume.Spec.NodeID), "spec.nodeID") - } } }datastore/longhorn.go (2)
3995-3998
: Refactor to eliminate redundant empty imageName
checks
The check for an empty
imageName
in GetDataEngineImageCLIAPIVersion
is duplicated for both data engine types. Consider consolidating this check to reduce code duplication and improve readability.Apply this diff to refactor the function:
+ if imageName == "" { + return -1, fmt.Errorf("cannot check the CLI API Version based on empty image name") + } if types.IsDataEngineV2(dataEngine) { - if imageName == "" { - return -1, fmt.Errorf("cannot check the CLI API Version based on empty image name") - } return 0, nil } - if imageName == "" { - return -1, fmt.Errorf("cannot check the CLI API Version based on empty image name") - } ei, err := s.GetEngineImageRO(types.GetEngineImageChecksumName(imageName)) if err != nil { return -1, errors.Wrapf(err, "failed to get engine image object based on image name %v", imageName) }
5641-5751
: Add unit tests for DataEngineUpgradeManager
methods
The new methods related to
DataEngineUpgradeManager
enhance upgrade functionality. To maintain code reliability and prevent regressions, please add unit tests covering these methods.
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (1)
76-94
: Consider adding type assertion error handlingWhile the implementation is correct, consider adding error handling for type assertions to improve robustness:
func (s dataEngineUpgradeManagerNamespaceLister) List(selector labels.Selector) (ret []*v1beta2.DataEngineUpgradeManager, err error) { err = cache.ListAllByNamespace(s.indexer, s.namespace, selector, func(m interface{}) { - ret = append(ret, m.(*v1beta2.DataEngineUpgradeManager)) + if obj, ok := m.(*v1beta2.DataEngineUpgradeManager); ok { + ret = append(ret, obj) + } }) return ret, err } func (s dataEngineUpgradeManagerNamespaceLister) Get(name string) (*v1beta2.DataEngineUpgradeManager, error) { obj, exists, err := s.indexer.GetByKey(s.namespace + "/" + name) if err != nil { return nil, err } if !exists { return nil, errors.NewNotFound(v1beta2.Resource("dataengineupgrademanager"), name) } - return obj.(*v1beta2.DataEngineUpgradeManager), nil + manager, ok := obj.(*v1beta2.DataEngineUpgradeManager) + if !ok { + return nil, fmt.Errorf("cached object is not a *v1beta2.DataEngineUpgradeManager") + } + return manager, nil }k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go (2)
67-195
: Consider enhancing error handling with wrapped errors.While the implementation follows standard Kubernetes client patterns, consider wrapping errors with additional context to aid in debugging. For example:
func (c *nodeDataEngineUpgrades) Get(ctx context.Context, name string, options v1.GetOptions) (result *v1beta2.NodeDataEngineUpgrade, err error) { result = &v1beta2.NodeDataEngineUpgrade{} err = c.client.Get(). Namespace(c.ns). Resource("nodedataengineupgrades"). Name(name). VersionedParams(&options, scheme.ParameterCodec). Do(ctx). Into(result) + if err != nil { + return nil, fmt.Errorf("failed to get NodeDataEngineUpgrade %s/%s: %w", c.ns, name, err) + } return }
17-17
: Note: This is an auto-generated file. Any suggested changes would need to be made to the generation templates rather than directly to this file, as direct changes would be lost on the next code generation.
controller/node_upgrade_controller.go (3)
57-57
: Address TODO comment regarding client wrapper
The TODO comment suggests there's technical debt related to client usage that needs to be addressed.
Would you like me to help track this by creating a GitHub issue to remove the event broadcaster wrapper once all clients have moved to use the clientset?
154-158
: Add documentation for isResponsibleFor method
Consider adding a documentation comment explaining:
- The purpose of the preferred owner ID
- The relationship with the node ID
- The conditions under which a controller is considered responsible
235-244
: Improve monitor lifecycle management
The monitor cleanup logic is spread across multiple places and could lead to resource leaks. Consider:
- Extracting monitor cleanup into a separate method
- Adding error handling for Close() operations
- Ensuring cleanup happens in all error paths
+func (uc *NodeDataEngineUpgradeController) cleanupMonitor() {
+	if uc.nodeDataEngineUpgradeMonitor != nil {
+		if err := uc.nodeDataEngineUpgradeMonitor.Close(); err != nil {
+			uc.logger.WithError(err).Warn("Failed to close node data engine upgrade monitor")
+		}
+		uc.nodeDataEngineUpgradeMonitor = nil
+	}
+}
 if nodeUpgrade.Status.State == longhorn.UpgradeStateCompleted || nodeUpgrade.Status.State == longhorn.UpgradeStateError {
 	uc.updateNodeDataEngineUpgradeStatus(nodeUpgrade)
-	uc.nodeDataEngineUpgradeMonitor.Close()
-	uc.nodeDataEngineUpgradeMonitor = nil
+	uc.cleanupMonitor()
 }
controller/replica_controller.go (1)
528-528
: Consider parameterizing the remote node flag.
The hardcoded
false
for isInstanceOnRemoteNode
might need to be parameterized for consistency with other methods.
-im, err = rc.ds.GetInstanceManagerByInstance(obj, false)
+im, err = rc.ds.GetInstanceManagerByInstance(obj, isInstanceOnRemoteNode)
controller/monitor/node_upgrade_monitor.go (3)
24-24
: Consider documenting the rationale for the sync period value.
Adding a comment explaining why 3 seconds was chosen as the sync period would help future maintainers understand the timing considerations.
108-137
: Consider enhancing error handling in the run method.
The error from
handleNodeUpgrade
is not captured or logged. While the error is propagated through status updates, adding explicit error logging would help with debugging. Apply this diff:
func (m *NodeDataEngineUpgradeMonitor) run(value interface{}) error {
	nodeUpgrade, err := m.ds.GetNodeDataEngineUpgrade(m.nodeUpgradeName)
	if err != nil {
		return errors.Wrapf(err, "failed to get longhorn nodeDataEngineUpgrade %v", m.nodeUpgradeName)
	}
	existingNodeUpgradeStatus := m.nodeUpgradeStatus.DeepCopy()
-	m.handleNodeUpgrade(nodeUpgrade)
+	if err := m.handleNodeUpgrade(nodeUpgrade); err != nil {
+		m.logger.WithError(err).Errorf("Failed to handle node upgrade %v", m.nodeUpgradeName)
+	}
641-683
: Consider optimizing node selection strategy.
The current implementation selects the first available node and then potentially updates it if a node with completed upgrade is found. This could be optimized to:
- Use a single pass through the nodes
- Consider additional factors like node capacity and current load
- Implement a more sophisticated scoring mechanism for node selection
Here's a suggested implementation:
func (m *NodeDataEngineUpgradeMonitor) findAvailableNodeForTargetInstanceReplacement(nodeUpgrade *longhorn.NodeDataEngineUpgrade) (string, error) { upgradeManager, err := m.ds.GetDataEngineUpgradeManager(nodeUpgrade.Spec.DataEngineUpgradeManager) if err != nil { return "", err } ims, err := m.ds.ListInstanceManagersBySelectorRO("", "", longhorn.InstanceManagerTypeAllInOne, longhorn.DataEngineTypeV2) if err != nil { return "", err } - availableNode := "" + type nodeScore struct { + nodeID string + score int + } + var bestNode nodeScore for _, im := range ims { if im.Status.CurrentState != longhorn.InstanceManagerStateRunning { continue } if im.Spec.NodeID == nodeUpgrade.Status.OwnerID { continue } - if availableNode == "" { - availableNode = im.Spec.NodeID + score := 0 + + // Base score for running instance manager + score += 1 + + node, err := m.ds.GetNode(im.Spec.NodeID) + if err != nil { + continue + } + + // Prefer nodes with more available resources + if condition := types.GetCondition(node.Status.Conditions, longhorn.NodeConditionTypeSchedulable); condition.Status == longhorn.ConditionStatusTrue { + score += 2 } upgradeNodeStatus, ok := upgradeManager.Status.UpgradeNodes[im.Spec.NodeID] - if !ok { - continue + if ok && upgradeNodeStatus.State == longhorn.UpgradeStateCompleted { + score += 4 } - // Prefer the node that has completed the upgrade - if upgradeNodeStatus.State == longhorn.UpgradeStateCompleted { - availableNode = im.Spec.NodeID - break + if score > bestNode.score { + bestNode = nodeScore{ + nodeID: im.Spec.NodeID, + score: score, + } } } - if availableNode == "" { + if bestNode.nodeID == "" { return "", fmt.Errorf("failed to find available node for target") } - return availableNode, nil + return bestNode.nodeID, nil }engineapi/instance_manager.go (1)
532-555
: Enhance error handling and add validation in getReplicaAddresses
While the core logic is sound, consider the following improvements:
- Make error messages more descriptive by including the problematic address
- Add validation for empty input addresses
- Document or handle the edge case when all replicas are filtered out
func getReplicaAddresses(replicaAddresses map[string]string, initiatorAddress, targetAddress string) (map[string]string, error) { + if initiatorAddress == "" || targetAddress == "" { + return nil, errors.New("initiator and target addresses are required") + } + initiatorIP, _, err := net.SplitHostPort(initiatorAddress) if err != nil { - return nil, errors.New("invalid initiator address format") + return nil, errors.Errorf("invalid initiator address format: %v", initiatorAddress) } targetIP, _, err := net.SplitHostPort(targetAddress) if err != nil { - return nil, errors.New("invalid target address format") + return nil, errors.Errorf("invalid target address format: %v", targetAddress) } addresses := make(map[string]string) for name, addr := range replicaAddresses { replicaIP, _, err := net.SplitHostPort(addr) if err != nil { - return nil, errors.New("invalid replica address format") + return nil, errors.Errorf("invalid replica address format for %v: %v", name, addr) } if initiatorIP != targetIP && initiatorIP == replicaIP { continue } addresses[name] = addr } + if len(addresses) == 0 { + return nil, errors.New("no valid replica addresses found after filtering") + } return addresses, nil }controller/backup_controller.go (1)
599-607
: LGTM! Consider adding a comment explaining the rationale.
The code correctly prevents nodes from being responsible for backups when they have a data engine upgrade requested, which is essential for safe live upgrades.
Consider adding a comment explaining why we skip backup responsibility during data engine upgrades:
+ // Skip backup responsibility when a data engine upgrade is requested + // to prevent potential issues during the upgrade process if node.Spec.DataEngineUpgradeRequested { return false, nil }controller/instance_handler.go (3)
58-165
: Consider refactoring the status sync method
The
syncStatusIPsAndPorts
method is quite long and handles multiple responsibilities. Consider breaking it down into smaller, focused methods:
syncInitiatorStatus
syncTargetStatus
syncStorageIPs
This would improve readability and maintainability.
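One possible shape for that decomposition, sketched with illustrative stand-in types and fields; the helper names come from the list above, but the signatures are assumptions rather than the PR's code.
package example

// Stand-in status type for illustration only.
type instanceStatus struct {
	IP, TargetIP, StorageIP string
	Port, TargetPort        int
}

// Sketch only: splitting one long sync routine into three focused steps.
func syncStatusIPsAndPorts(s *instanceStatus, initiatorIP, targetIP, storageIP string, port, targetPort int) {
	syncInitiatorStatus(s, initiatorIP, port)
	syncTargetStatus(s, targetIP, targetPort)
	syncStorageIPs(s, storageIP)
}

func syncInitiatorStatus(s *instanceStatus, ip string, port int) { s.IP, s.Port = ip, port }
func syncTargetStatus(s *instanceStatus, ip string, port int)    { s.TargetIP, s.TargetPort = ip, port }
func syncStorageIPs(s *instanceStatus, ip string)                { s.StorageIP = ip }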
716-790
: Enhance error handling in instance creation
The instance creation logic handles both v1 and v2 data engines well, but consider:
- Adding more context to error messages
- Using structured logging with fields
- Adding metrics for instance creation success/failure
883-995
: Add documentation for v2 data engine helper methods
The new helper methods lack documentation explaining their purpose and behavior. Consider adding detailed comments for:
isVolumeBeingSwitchedBack
isTargetInstanceReplacementCreated
isTargetInstanceRemote
isDataEngineNotBeingLiveUpgraded
This will help other developers understand the v2 data engine upgrade flow.
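As an illustration of the kind of comments being asked for, hedged guesses at each helper's intent inferred from its name and the upgrade flow discussed in this review; the stubs below are placeholders, not the real signatures.
package example

// isVolumeBeingSwitchedBack would report whether the volume's target is being
// switched back to the node that already runs the initiator (the upgrade is unwinding).
func isVolumeBeingSwitchedBack() bool { return false }

// isTargetInstanceReplacementCreated would report whether the standby target
// instance has been created on the new node and is ready for switch-over.
func isTargetInstanceReplacementCreated() bool { return false }

// isTargetInstanceRemote would report whether the target instance runs on a
// different node than the initiator instance.
func isTargetInstanceRemote() bool { return false }

// isDataEngineNotBeingLiveUpgraded would report whether the instance is not
// currently part of a v2 data engine live upgrade.
func isDataEngineNotBeingLiveUpgraded() bool { return false }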
controller/node_controller.go (1)
Line range hint
2177-2209
: LGTM! Consider adding error handling for condition updates.
The implementation for handling data engine upgrades looks good. The code properly disables scheduling during upgrades and provides clear status messages.
Consider adding error handling for the
SetConditionAndRecord
calls to handle potential errors during condition updates. For example:- node.Status.Conditions = - types.SetConditionAndRecord(node.Status.Conditions, + conditions, err := types.SetConditionAndRecord(node.Status.Conditions, longhorn.NodeConditionTypeSchedulable, longhorn.ConditionStatusFalse, reason, message, nc.eventRecorder, node, corev1.EventTypeNormal) + if err != nil { + return err + } + node.Status.Conditions = conditionscontroller/engine_controller.go (2)
437-467
: LGTM with a minor suggestion for error handling improvement
The method effectively handles instance manager and IP resolution for both initiator and target instances. Consider adding more specific error messages for common failure cases.
func (ec *EngineController) findInstanceManagerAndIPs(obj interface{}) (im *longhorn.InstanceManager, initiatorIP string, targetIP string, err error) { e, ok := obj.(*longhorn.Engine) if !ok { - return nil, "", "", fmt.Errorf("invalid object for engine: %v", obj) + return nil, "", "", fmt.Errorf("expected Engine type but got %T", obj) } initiatorIM, err := ec.ds.GetInstanceManagerByInstanceRO(obj, false) if err != nil { - return nil, "", "", err + return nil, "", "", errors.Wrapf(err, "failed to get initiator instance manager for engine %v", e.Name) }
704-760
: LGTM with suggestion for improved modularity
The method effectively handles target switchover with proper validation and logging. Consider breaking down the port handling logic into a separate helper method for better maintainability.
Consider extracting port handling logic:
+func (ec *EngineController) getTargetPort(targetInstance *longhorn.InstanceProcess) int { + port := targetInstance.Status.TargetPortStart + if targetInstance.Status.StandbyTargetPortStart != 0 { + port = targetInstance.Status.StandbyTargetPortStart + } + return port +} func (ec *EngineController) SwitchOverTarget(obj interface{}) error { // ... existing validation code ... - port := targetInstance.Status.TargetPortStart - if targetInstance.Status.StandbyTargetPortStart != 0 { - port = targetInstance.Status.StandbyTargetPortStart - } + port := ec.getTargetPort(targetInstance) log.Infof("Switching over target to %v:%v", targetIM.Status.IP, port) // ... rest of the code ...k8s/crds.yaml (2)
1313-1414
: Enhance field descriptions in DataEngineUpgradeManager CRD
The DataEngineUpgradeManager CRD structure is good, but some fields would benefit from more detailed descriptions:
spec.nodes
could clarify the behavior when nodes are added/removed during upgrade
status.upgradeNodes
could explain the state transitions
status.instanceManagerImage
is missing a descriptionnodes: description: |- Nodes specifies the list of nodes to perform the data engine upgrade on. If empty, the upgrade will be performed on all available nodes. + Adding or removing nodes during an upgrade may affect the upgrade process. items: type: string type: array instanceManagerImage: + description: The instance manager image used for the data engine upgrade. type: string upgradeNodes: additionalProperties: description: |- UpgradeState defines the state of the node upgrade process + States can transition from "pending" -> "in-progress" -> "completed"/"failed"
2581-2583
: Enhance documentation for Node upgrade field
The
dataEngineUpgradeRequested
field description could be more detailed to help operators understand its implications.dataEngineUpgradeRequested: - description: Request to upgrade the instance manager for v2 volumes on the node. + description: |- + Request to upgrade the instance manager for v2 volumes on the node. + When set to true, the node will be scheduled for data engine upgrade. + The upgrade process will only proceed if there are no ongoing volume operations. + This field should not be modified while an upgrade is in progress. type: booleancontroller/volume_controller.go (2)
1923-1930
: Add error handling for engine state transitions
The code correctly handles engine state transitions for v2 data engine by checking both image and target node ID. However, consider adding error handling for unexpected states.
if types.IsDataEngineV1(v.Spec.DataEngine) { e.Spec.DesireState = longhorn.InstanceStateRunning } else { if v.Spec.Image == v.Status.CurrentImage && v.Spec.TargetNodeID == v.Status.CurrentTargetNodeID { e.Spec.DesireState = longhorn.InstanceStateRunning + } else { + log.Debugf("Waiting for image/target node sync before setting engine running state") } }
1619-1619
: Enhance error logging for volume dependent resources
The warning log could be more descriptive to help with troubleshooting.
-log.WithField("e.Status.CurrentState", e.Status.CurrentState).Warn("Volume is attached but dependent resources are not opened") +log.WithFields(logrus.Fields{ + "engineState": e.Status.CurrentState, + "volumeState": v.Status.State, +}).Warn("Volume is attached but engine or replica resources are not in running state")
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (50)
controller/backup_controller.go
(1 hunks)controller/controller_manager.go
(2 hunks)controller/engine_controller.go
(11 hunks)controller/instance_handler.go
(15 hunks)controller/instance_handler_test.go
(4 hunks)controller/monitor/node_upgrade_monitor.go
(1 hunks)controller/monitor/upgrade_manager_monitor.go
(1 hunks)controller/node_controller.go
(2 hunks)controller/node_upgrade_controller.go
(1 hunks)controller/replica_controller.go
(5 hunks)controller/uninstall_controller.go
(4 hunks)controller/upgrade_manager_controller.go
(1 hunks)controller/utils.go
(0 hunks)controller/volume_controller.go
(16 hunks)controller/volume_controller_test.go
(1 hunks)datastore/datastore.go
(3 hunks)datastore/longhorn.go
(6 hunks)engineapi/instance_manager.go
(5 hunks)engineapi/instance_manager_test.go
(1 hunks)k8s/crds.yaml
(84 hunks)k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go
(1 hunks)k8s/pkg/apis/longhorn/v1beta2/instancemanager.go
(6 hunks)k8s/pkg/apis/longhorn/v1beta2/node.go
(2 hunks)k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go
(1 hunks)k8s/pkg/apis/longhorn/v1beta2/register.go
(1 hunks)k8s/pkg/apis/longhorn/v1beta2/volume.go
(2 hunks)k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go
(4 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go
(1 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go
(1 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go
(2 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go
(1 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go
(2 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go
(3 hunks)k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go
(1 hunks)k8s/pkg/client/informers/externalversions/generic.go
(2 hunks)k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
(1 hunks)k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go
(4 hunks)k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go
(1 hunks)k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go
(1 hunks)k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go
(2 hunks)k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go
(1 hunks)scheduler/replica_scheduler.go
(1 hunks)types/types.go
(4 hunks)webhook/resources/dataengineupgrademanager/mutator.go
(1 hunks)webhook/resources/dataengineupgrademanager/validator.go
(1 hunks)webhook/resources/nodedataengineupgrade/mutator.go
(1 hunks)webhook/resources/nodedataengineupgrade/validator.go
(1 hunks)webhook/resources/volume/validator.go
(5 hunks)webhook/server/mutation.go
(2 hunks)webhook/server/validation.go
(2 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
✅ Files skipped from review due to trivial changes (2)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go
- k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go
🚧 Files skipped from review as they are similar to previous changes (25)
- controller/controller_manager.go
- controller/monitor/upgrade_manager_monitor.go
- controller/uninstall_controller.go
- engineapi/instance_manager_test.go
- k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go
- k8s/pkg/apis/longhorn/v1beta2/instancemanager.go
- k8s/pkg/apis/longhorn/v1beta2/node.go
- k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go
- k8s/pkg/apis/longhorn/v1beta2/register.go
- k8s/pkg/apis/longhorn/v1beta2/volume.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go
- k8s/pkg/client/informers/externalversions/generic.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go
- k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go
- scheduler/replica_scheduler.go
- webhook/resources/dataengineupgrademanager/mutator.go
- webhook/resources/dataengineupgrademanager/validator.go
- webhook/resources/nodedataengineupgrade/mutator.go
- webhook/server/mutation.go
- webhook/server/validation.go
🧰 Additional context used
📓 Learnings (2)
controller/engine_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/engine_controller.go:524-527
Timestamp: 2024-11-25T12:39:58.926Z
Learning: In `controller/engine_controller.go`, `e.Status.Port` is sourced from the SPDK engine and does not require additional validation.
controller/monitor/node_upgrade_monitor.go (1)
Learnt from: james-munson
PR: longhorn/longhorn-manager#3211
File: app/post_upgrade.go:102-113
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In Go, when a deferred function references a variable like `err`, ensure that the variable is declared in the outer scope and not within an inner scope (such as within `if err := ...`), to prevent compilation errors and unintended variable shadowing.
🪛 golangci-lint (1.61.0)
controller/volume_controller.go
3208-3208: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (75)
webhook/resources/nodedataengineupgrade/validator.go (2)
40-66
: Validation logic in the Create
method is appropriate
The Create
method correctly validates all required fields for the NodeDataEngineUpgrade
resource and provides explicit error messages for invalid input.
68-95
: Proper enforcement of immutability in the Update
method
The Update
method effectively ensures that critical fields (NodeID
, DataEngine
, InstanceManagerImage
, and DataEngineUpgradeManager
) remain immutable during updates, maintaining the integrity of the resource.
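A small sketch of how such immutability checks are typically written; the stand-in spec type and field paths below are assumptions for illustration, not the validator's actual code.
package example

import "fmt"

// Stand-in spec for illustration; the real validator compares longhorn.NodeDataEngineUpgrade objects.
type nodeDataEngineUpgradeSpec struct {
	NodeID                   string
	DataEngine               string
	InstanceManagerImage     string
	DataEngineUpgradeManager string
}

// Sketch only: reject updates that try to change fields that must stay immutable.
func validateImmutableFields(oldSpec, newSpec nodeDataEngineUpgradeSpec) error {
	if oldSpec.NodeID != newSpec.NodeID {
		return fmt.Errorf("spec.nodeID is immutable")
	}
	if oldSpec.DataEngine != newSpec.DataEngine {
		return fmt.Errorf("spec.dataEngine is immutable")
	}
	if oldSpec.InstanceManagerImage != newSpec.InstanceManagerImage {
		return fmt.Errorf("spec.instanceManagerImage is immutable")
	}
	if oldSpec.DataEngineUpgradeManager != newSpec.DataEngineUpgradeManager {
		return fmt.Errorf("spec.dataEngineUpgradeManager is immutable")
	}
	return nil
}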
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go (14)
17-17
: Acknowledgment: Code is auto-generated by client-gen
As this file is generated by client-gen
, it should not be manually edited to prevent inconsistencies.
33-37
: Definition of DataEngineUpgradeManagersGetter
interface is appropriate
The getter interface is correctly defined, following standard client-go patterns.
39-51
: DataEngineUpgradeManagerInterface
includes all necessary CRUD methods
All essential methods for managing DataEngineUpgradeManager
resources are appropriately declared.
53-57
: dataEngineUpgradeManagers
struct encapsulates client and namespace effectively
The struct correctly holds the REST client and namespace for resource operations.
59-65
: Constructor newDataEngineUpgradeManagers
initializes the client properly
The function correctly returns a new instance with the REST client and specified namespace.
67-78
: Get
method retrieves resources accurately
The implementation properly fetches a single DataEngineUpgradeManager
resource by name.
80-95
: List
method handles resource listing with timeout considerations
The method correctly lists resources based on label and field selectors, handling timeouts if specified.
97-110
: Watch
method establishes resource watches effectively
The implementation allows for watching changes to DataEngineUpgradeManager
resources, supporting timeouts.
112-123
: Create
method correctly handles resource creation
The method accurately creates a new DataEngineUpgradeManager
resource with the provided specifications.
125-137
: Update
method properly updates existing resources
Resource updates are correctly handled, ensuring the server-side representation is returned.
139-153
: UpdateStatus
method manages status subresource updates correctly
The implementation appropriately updates the status of a DataEngineUpgradeManager
resource.
155-164
: Delete
method removes resources as expected
The method correctly deletes a specified resource by name.
166-180
: DeleteCollection
method handles bulk deletion effectively
The implementation allows for deleting a collection of resources, considering timeout settings.
182-195
: Patch
method applies modifications to resources accurately
The method correctly applies patches to resources, supporting various patch types and subresources.
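For orientation, a hedged sketch of how a caller would exercise this generated client; the accessor chain follows the usual client-gen layout for this repo, but the exact package paths should be checked against the codebase.
package example

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	longhornclientset "github.com/longhorn/longhorn-manager/k8s/pkg/client/clientset/versioned"
)

// Sketch only: fetching a DataEngineUpgradeManager through the generated client.
func getUpgradeManager(c longhornclientset.Interface, namespace, name string) error {
	um, err := c.LonghornV1beta2().DataEngineUpgradeManagers(namespace).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	fmt.Printf("upgrade manager %s state: %v\n", um.Name, um.Status.State)
	return nil
}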
controller/upgrade_manager_controller.go (5)
211-214
: Handle terminal states appropriately
When the upgrade manager reaches a terminal state (UpgradeStateCompleted
or UpgradeStateError
), the controller stops processing. Ensure that any necessary cleanup or final actions are performed before exiting.
Confirm that all resources are properly finalized and that no further action is required.
74-84
:
Add error handling for event handler registration
When adding the event handler with ds.DataEngineUpgradeManagerInformer.AddEventHandler
, the returned error should be handled to catch any issues during the handler setup.
Apply this diff to handle the error appropriately:
if _, err := ds.DataEngineUpgradeManagerInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: uc.enqueueDataEngineUpgradeManager,
UpdateFunc: func(old, cur interface{}) { uc.enqueueDataEngineUpgradeManager(cur) },
DeleteFunc: uc.enqueueDataEngineUpgradeManager,
}); err != nil {
return nil, err
+ // Consider logging the error if necessary
}
Ensure that any setup errors are not silently ignored.
Likely invalid or redundant comment.
235-251
:
Handle type assertion failure appropriately
In the reconcile
function, if the type assertion fails (if !ok
), the code logs an error but continues execution. This may lead to a nil pointer dereference or inconsistent state.
Apply this diff to return an error when the type assertion fails:
func (uc *DataEngineUpgradeManagerController) reconcile(upgradeManagerName string) (err error) {
// ...
if uc.dataEngineUpgradeManagerMonitor != nil {
data, _ := uc.dataEngineUpgradeManagerMonitor.GetCollectedData()
status, ok := data.(*longhorn.DataEngineUpgradeManagerStatus)
if !ok {
log.Errorf("Failed to assert value from dataEngineUpgradeManager monitor: %v", data)
+ return fmt.Errorf("failed to retrieve valid DataEngineUpgradeManagerStatus")
} else {
upgradeManager.Status.InstanceManagerImage = status.InstanceManagerImage
// existing code continues
}
}
// ...
}
Likely invalid or redundant comment.
263-275
:
Potential race condition accessing uc.dataEngineUpgradeManagerMonitor
Access to uc.dataEngineUpgradeManagerMonitor
is not synchronized. If the controller runs multiple workers, this could lead to race conditions.
Apply this diff to add a mutex for synchronization:
+import "sync"
type DataEngineUpgradeManagerController struct {
// ...
dataEngineUpgradeManagerMonitor monitor.Monitor
+ monitorMutex sync.Mutex
// ...
}
func (uc *DataEngineUpgradeManagerController) createDataEngineUpgradeManagerMonitor(upgradeManager *longhorn.DataEngineUpgradeManager) (monitor.Monitor, error) {
+ uc.monitorMutex.Lock()
+ defer uc.monitorMutex.Unlock()
if uc.dataEngineUpgradeManagerMonitor != nil {
return uc.dataEngineUpgradeManagerMonitor, nil
}
// ...
}
Likely invalid or redundant comment.
`218-227`: Handle status update conflicts properly to prevent requeue loops. In the deferred function, when updating the status, a conflict causes the item to be re-enqueued. Ensure that this does not lead to infinite requeues and that the controller respects the `maxRetries` limit. Review the conflict handling logic to confirm it aligns with the desired retry mechanism, and run the following script to verify the requeue behavior:
✅ Verification successful
The conflict handling is properly implemented with retry limits. The verification shows that:
- The controller has a global `maxRetries = 3` limit defined in `base_controller.go`
- When conflicts occur during status updates, the item is requeued using `enqueueDataEngineUpgradeManager`, which adds to the normal queue (not rate-limited)
- The base error handling checks `NumRequeues(key) < maxRetries` and uses rate-limited requeues for general reconciliation errors
- Status update conflicts use a separate, more lenient requeue mechanism, which is appropriate since conflicts are temporary race conditions (see the sketch below)
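For reference, a minimal, generic sketch of that split between conflict requeues and the rate-limited path, assuming the `k8s.io/apimachinery` error helpers; `requeueOnConflict` and the `enqueue` callback are illustrative names, not the controller's actual helpers:

```go
package controller

import (
	"github.com/pkg/errors"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// requeueOnConflict treats a status-update conflict as a temporary race and
// re-enqueues the key on the normal (non-rate-limited) queue, while every
// other error is returned so the rate-limited handleErr/maxRetries path
// decides whether to retry or drop the key.
func requeueOnConflict(err error, key string, enqueue func(string)) error {
	if err == nil {
		return nil
	}
	if apierrors.IsConflict(errors.Cause(err)) {
		enqueue(key)
		return nil
	}
	return err
}
```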
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that items are not endlessly requeued due to conflicts.
# Expected: Items should be requeued up to maxRetries times.
grep -A 10 'func (uc *DataEngineUpgradeManagerController) handleErr' controller/upgrade_manager_controller.go
Length of output: 111
Script:
#!/bin/bash
# Let's search for retry-related code in the controller
rg -A 5 -B 5 'maxRetries|requeue|Requeue' controller/upgrade_manager_controller.go
Length of output: 768
Script:
#!/bin/bash
# Let's also check the controller struct definition for retry-related fields
ast-grep --pattern 'type DataEngineUpgradeManagerController struct {
$$$
}'
Length of output: 1207
Script:
#!/bin/bash
# Let's check the reconcile function to understand the error handling flow
rg -A 10 'func \(uc \*DataEngineUpgradeManagerController\) reconcile' controller/upgrade_manager_controller.go
Length of output: 446
Script:
#!/bin/bash
# Let's check the baseController definition since the controller embeds it
rg -A 10 'type baseController struct' .
Length of output: 821
Script:
#!/bin/bash
# Let's also check where maxRetries is defined
rg 'maxRetries.*=' .
Length of output: 198
Script:
#!/bin/bash
# Let's check the enqueue function implementation
ast-grep --pattern 'func (uc *DataEngineUpgradeManagerController) enqueueDataEngineUpgradeManager($$$)'
Length of output: 774
webhook/resources/volume/validator.go (13)

`131-131`: Ensure Data Engine is enabled before creation. Validating that the required data engine is enabled is crucial; confirm that `ValidateRequiredDataEngineEnabled` correctly checks the data engine status.

`144-145`: Verify engine image compatibility. The check for engine image compatibility is important; ensure that `CheckDataEngineImageCompatiblityByImage` accurately validates the image against the specified data engine.

`147-148`: Restrict setting `TargetNodeID` during volume creation. It's appropriate to prevent setting `spec.targetNodeID` for a new volume; this ensures that the target node is determined during attachment.

`154-156`: Feature not supported: encrypted volumes with Data Engine v2. Encrypted volumes are not supported for Data Engine v2; the validation correctly prevents this configuration.

`158-160`: Feature not supported: backing images with Data Engine v2. Backing images are not supported for Data Engine v2; the validation ensures users are aware of this limitation.

`162-164`: Feature not supported: clone operations with Data Engine v2. Cloning from another volume is not supported for Data Engine v2; the validation here is appropriate.

`271-272`: Undefined variable `v` in error message. The error message references `v.Spec.MigrationNodeID`, but `v` is not defined in this context; it should likely be `newVolume.Spec.MigrationNodeID`.

`275-277`: Validate `SnapshotMaxCount` within acceptable range. Good job validating that `snapshotMaxCount` is within the acceptable range; this prevents potential issues with snapshot management.

`284-288`: Ensure safe updating of snapshot limits. Validating changes to `SnapshotMaxCount` and `SnapshotMaxSize` helps prevent configurations that could inadvertently delete existing snapshots.

`298-305`: Prevent unsupported changes to `BackingImage` for Data Engine v2. Changing the `BackingImage` is not supported for volumes using Data Engine v2; the validation correctly enforces this restriction.

`356-360`: Handle errors when retrieving node information. Proper error handling when retrieving the node ensures that unexpected issues are surfaced appropriately.
`368-369`: Logical issue in condition for changing `TargetNodeID`. The condition checks `oldVolume.Spec.TargetNodeID == "" && oldVolume.Spec.TargetNodeID != newVolume.Spec.TargetNodeID`. Since `oldVolume.Spec.TargetNodeID` is `""`, the second clause is always true whenever `newVolume.Spec.TargetNodeID` is non-empty. Consider revising the condition for clarity, for example as in the sketch below.
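A hypothetical helper showing the simplified, equivalent form of that condition (not the validator's actual code):

```go
// targetNodeBeingSet: when the old target is empty, the original inequality
// clause reduces to "the new target is non-empty", which reads more clearly
// when written directly.
func targetNodeBeingSet(oldTargetNodeID, newTargetNodeID string) bool {
	return oldTargetNodeID == "" && newTargetNodeID != ""
}
```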
`407-408`: Restrict setting `TargetNodeID` for non-Data Engine v2 volumes. It's appropriate to prevent setting `spec.targetNodeID` when the volume is not using Data Engine v2, ensuring consistent behavior.
datastore/longhorn.go (3)

Line range hint `1801-1805`: Ensure proper error handling when `InstanceManagerName` is empty. In the `getRunningReplicaInstanceManagerRO` function, when `r.Status.InstanceManagerName` is empty, the code calls `GetInstanceManagerByInstanceRO`. Please verify that `GetInstanceManagerByInstanceRO` can handle cases where `InstanceManagerName` is not set, to prevent potential nil pointer dereferences.

`3761-3764`: Verify assignment of `nodeID` when `isInstanceOnRemoteNode` is true. In `GetInstanceManagerByInstanceRO`, when `isInstanceOnRemoteNode` is `true`, `nodeID` is assigned from `obj.Spec.TargetNodeID`. Ensure that `TargetNodeID` is always properly set to avoid potential issues with empty or invalid `nodeID` values.

`5848-5875`: Handle errors in `RemoveFinalizerForNodeDataEngineUpgrade`. In `RemoveFinalizerForNodeDataEngineUpgrade`, if the `Update` call fails and `DeletionTimestamp` is not set, the function does not return the error. This could lead to silent failures; ensure that errors are properly returned and logged. Apply this diff to handle the error appropriately:
_, err := s.lhClient.LonghornV1beta2().NodeDataEngineUpgrades(s.namespace).Update(context.TODO(), upgrade, metav1.UpdateOptions{})
if err != nil {
// workaround `StorageError: invalid object, Code: 4` due to empty object
if upgrade.DeletionTimestamp != nil {
return nil
}
- return nil
+ return errors.Wrapf(err, "unable to remove finalizer for nodeDataEngineUpgrade %s", upgrade.Name)
}
Likely invalid or redundant comment.
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (2)

`28-53`: LGTM! Follows Kubernetes lister patterns. The implementation correctly follows Kubernetes lister patterns:
- Proper interface definition with `List` and namespace-specific methods
- Efficient use of `cache.Indexer`
- Correct handling of label selectors (typical usage is sketched below)
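For illustration, the typical read path through such a generated lister, assuming the standard code-generator output (a `DataEngineUpgradeManagers(namespace).List(selector)` accessor); the surrounding function is a sketch, not code from this PR:

```go
package example

import (
	"k8s.io/apimachinery/pkg/labels"

	lhlisters "github.com/longhorn/longhorn-manager/k8s/pkg/client/listers/longhorn/v1beta2"
)

// listUpgradeManagerNames reads objects straight from the shared informer
// cache via the lister; returned objects must be treated as read-only.
func listUpgradeManagerNames(lister lhlisters.DataEngineUpgradeManagerLister, namespace string) ([]string, error) {
	managers, err := lister.DataEngineUpgradeManagers(namespace).List(labels.Everything())
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(managers))
	for _, m := range managers {
		names = append(names, m.Name)
	}
	return names, nil
}
```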
`17-26`: Generated code verification required. This is an auto-generated file. Please ensure:
- The code generator configuration is correct
- The generator version is up-to-date with your kubernetes dependencies
✅ Verification successful
Code generation configuration is correct. The verification shows that:
- The package has proper code generation directives in `doc.go` (`+k8s:deepcopy-gen=package` and `+groupName=longhorn.io`)
- The generated lister follows the standard Kubernetes pattern with a correct interface definition
- The generated code is consistent with other Kubernetes-style listers
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify the code generator configuration
find . -name "*.go" -type f -exec grep -l "// +k8s:deepcopy-gen" {} \;
Length of output: 12593
Script:
#!/bin/bash
# Let's verify the code generation markers and configuration for the DataEngineUpgradeManager
rg -A 5 "// +k8s:.*-gen" ./k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go
# Also check if there are any generator tags in doc.go which often contains package-level generation directives
cat ./k8s/pkg/apis/longhorn/v1beta2/doc.go
# Check if the generated lister follows the correct pattern
ast-grep --pattern 'type DataEngineUpgradeManagerLister interface {
$$$
}'
Length of output: 1252
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go (3)

`1-31`: LGTM: File header and imports are properly structured. The file follows standard Kubernetes client code conventions with an appropriate license header and the necessary imports.

`53-65`: LGTM: Implementation struct and constructor are properly implemented. The client implementation follows standard Kubernetes patterns with proper initialization.

`33-51`: Verify interface completeness against CRD definition. The interfaces follow standard Kubernetes patterns and include all expected CRUD operations. However, let's verify that all custom resource operations defined in the CRD are covered.
✅ Verification successful
Based on the shell script results, I can now generate the final response: the interface implementation is complete and matches the CRD definition. The NodeDataEngineUpgrade interfaces are correctly implemented with all necessary operations. The CRD definition in `k8s/crds.yaml` shows:
- Basic resource definition with standard CRUD operations
- Status subresource support (covered by the UpdateStatus method)
- No additional custom actions or subresources defined
The interface provides all required methods for interacting with the NodeDataEngineUpgrade custom resource, including status updates.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that all custom resource operations defined in the CRD are covered by the interface
# Search for CRD definition and additional subresources or custom actions
# Look for CRD definition
fd -e yaml -e yml | xargs rg -l "kind:\s*CustomResourceDefinition" | xargs rg "NodeDataEngineUpgrade" -A 10
# Look for any custom action definitions in controller
rg "NodeDataEngineUpgrade.*Action|Action.*NodeDataEngineUpgrade" -g "*.go"
Length of output: 4275
controller/node_upgrade_controller.go (1)

`86-126`: LGTM: Queue management implementation follows best practices. The queue management and worker implementation follows Kubernetes controller patterns correctly, with proper error handling and graceful shutdown; the standard idiom is sketched below.
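The "worker drains the queue until shutdown" idiom referred to here, sketched generically against the client-go workqueue API; `sync` and `handleErr` stand in for the controller's real reconcile and error-handling functions:

```go
package controller

import "k8s.io/client-go/util/workqueue"

// drainQueue pulls keys until the queue is shut down, marks each key Done,
// and routes the reconcile result through the caller's error handler.
func drainQueue(queue workqueue.RateLimitingInterface, sync func(key string) error, handleErr func(err error, key interface{})) {
	for {
		key, quit := queue.Get()
		if quit {
			return
		}
		func() {
			defer queue.Done(key)
			handleErr(sync(key.(string)), key)
		}()
	}
}
```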
datastore/datastore.go (4)

`93-96`: LGTM: New fields follow established patterns. The new fields for data engine upgrade management follow the consistent pattern used for other resources in the DataStore struct.

`186-189`: LGTM: Proper initialization of new informers. The initialization of the new informers follows the established pattern:
- Informer creation from the factory
- Cache sync registration

`279-282`: LGTM: Correct lister and informer assignments. The assignments of the new listers and informers in the DataStore constructor follow the consistent pattern used throughout the codebase.

`48-92`: Verify impact of field alignment changes. While the alignment changes improve readability, we should verify that all fields are preserved during the reformatting. Also applies to: `234-278`.
✅ Verification successful
Field alignment changes verified successfully. Comparing the old and new versions of the DataStore struct shows that all fields are preserved with identical types and ordering; the alignment changes were purely cosmetic and did not affect the structure's content.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify no fields were accidentally removed during reformatting
# by comparing the old version with the new version
# Get the old version of the file
git show HEAD^:datastore/datastore.go > old_datastore.go
# Extract struct fields from both versions
echo "=== Old version fields ==="
awk '/type DataStore struct/,/}/' old_datastore.go | grep -v '^[[:space:]]*$' | grep -v '^}'
echo "=== New version fields ==="
awk '/type DataStore struct/,/}/' datastore/datastore.go | grep -v '^[[:space:]]*$' | grep -v '^}'
# Cleanup
rm old_datastore.go
Length of output: 10425
controller/instance_handler_test.go (3)

Line range hint `39-51`: LGTM: Method signature updates are consistent with interface changes. The updated signatures for `GetInstance` and `CreateInstance` correctly include the new `isInstanceOnRemoteNode` parameter, maintaining consistency with the interface changes.

`137-141`: LGTM: Engine struct field additions are properly implemented. The new fields `TargetIP`, `StorageIP`, `StorageTargetIP`, and `TargetPort` are correctly added to the `newEngine` function, maintaining consistency with the engine status structure.

`87-90`: Inconsistent error handling in the `DeleteTarget` method. While other unimplemented methods in `MockInstanceManagerHandler` return an error to indicate they are not mocked, `DeleteTarget` currently returns `nil`. Apply this diff to ensure consistent error handling:
func (imh *MockInstanceManagerHandler) DeleteTarget(obj interface{}) error {
// DeleteTarget is not mocked
- return nil
+ return fmt.Errorf("DeleteTarget is not mocked")
}
Likely invalid or redundant comment.
controller/replica_controller.go (3)

Line range hint `636-647`: LGTM! Implementation is consistent. The changes maintain consistency with `CreateInstance` for remote node handling and properly validate V2 data engine state.

`611-630`: Empty implementations need to be addressed. These empty implementations were previously flagged in the review; the comment remains valid.

Line range hint `316-355`: LGTM! Verify remote node instance handling. The addition of the `isInstanceOnRemoteNode` parameter enhances the instance creation logic to support remote nodes, which is essential for live upgrades. Let's verify the remote node instance handling implementation:
✅ Verification successful
Remote node instance handling implementation is consistent and complete. The implementation of `isInstanceOnRemoteNode` is properly integrated across the codebase:
- The interface definition in `InstanceManagerHandler` correctly includes the parameter
- Consistent implementation in both Engine and Replica controllers
- The DataStore layer properly handles the parameter by switching between `NodeID` and `TargetNodeID`
- Test coverage is in place with mock implementations
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent remote node instance handling across the codebase
# Test: Search for other remote node instance handling patterns
rg -l "isInstanceOnRemoteNode"
# Test: Check for potential instance manager retrieval patterns
ast-grep --pattern 'GetInstanceManagerByInstance($$$)'
Length of output: 241
Script:
#!/bin/bash
# Let's examine the implementation details of remote node instance handling
# Check the implementation in instance_handler.go
rg "isInstanceOnRemoteNode" controller/instance_handler.go -B 2 -A 2
# Check how engine controller implements this
rg "isInstanceOnRemoteNode" controller/engine_controller.go -B 2 -A 2
# Check the datastore implementation
rg "isInstanceOnRemoteNode" datastore/longhorn.go -B 2 -A 2
# Look for any test cases
rg "isInstanceOnRemoteNode" controller/instance_handler_test.go -B 2 -A 2
Length of output: 2346
engineapi/instance_manager.go (5)

`283-292`: LGTM: Instance process status fields properly extended. The addition of the `StandbyTargetPortStart` and `StandbyTargetPortEnd` fields enhances the instance process status to support v2 data engine live upgrades.

`885-885`: Fix incorrect function comment. The function comment incorrectly states that it "suspends engine instance" when it should be "resumes engine instance".

`904-914`: Add input validation for `TargetAddress`. The `EngineInstanceSwitchOverTarget` method should validate the `TargetAddress` before use.

`864-931`: Add consistent nil checks across new engine instance methods. Each new method should validate that the Engine field in the request is not nil before use, similar to the existing checks in other methods; a minimal guard is sketched below.
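A minimal guard of the kind suggested here; the helper name is hypothetical, and each request type would pass its `Engine` field through it before any dereference:

```go
package engineapi

import (
	"fmt"

	longhorn "github.com/longhorn/longhorn-manager/k8s/pkg/apis/longhorn/v1beta2"
)

// validateEngineInRequest rejects requests whose Engine field was left nil,
// so the calling method fails fast instead of panicking on a dereference.
func validateEngineInRequest(e *longhorn.Engine) error {
	if e == nil {
		return fmt.Errorf("BUG: engine in request is nil")
	}
	return nil
}
```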
`873-873`: Fix typo in error messages: "date engine" should be "data engine". Multiple error messages contain the same typo. Also applies to: `890-890`, `908-908`, `925-925`.
controller/instance_handler.go (2)

`38-46`: LGTM: Interface changes are well-structured. The new methods added to the `InstanceManagerHandler` interface provide a clear and consistent API for managing v2 data engine instances.

Line range hint `38-995`: Verify v2 data engine upgrade implementation. The changes implement significant functionality for v2 data engine live upgrades. Let's verify the implementation:
✅ Verification successful
Based on the results, let me verify the v2 data engine upgrade implementation:
- The code shows proper handling of target instance replacement and cleanup:
  - Tracks target instance state via `TargetInstanceReplacementCreated` and `CurrentTargetNodeID`
  - Handles cleanup by resetting these fields when the instance is stopped
  - Properly manages the target instance lifecycle during upgrades
- The error handling patterns are consistent:
  - Proper logging of errors with context
  - Appropriate cleanup on errors
  - Consistent error propagation
- The race condition handling appears robust:
  - State transitions are properly guarded
  - Cleanup happens in deferred functions
  - Status updates are atomic
Let me verify one more aspect of the implementation:
Based on the verification results, I can now provide a final assessment of the v2 data engine upgrade implementation:
Implementation verified as robust and complete
The code shows proper handling of instance state transitions and target instance management during v2 data engine upgrades:
- Proper state transition handling between Running/Suspended/Stopped states
- Atomic status updates with proper cleanup in deferred functions
- Consistent error handling with appropriate logging
- Clear separation between initiator and target instance lifecycle
- Safe handling of target instance replacement with proper status tracking
- Proper cleanup of target instance state after upgrades
The implementation correctly handles all edge cases around instance state transitions during upgrades, with appropriate status tracking and cleanup.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the v2 data engine upgrade implementation
# Check for any potential race conditions in upgrade handling
rg -A 5 "TargetInstanceReplacementCreated|CurrentTargetNodeID"
# Check for consistent error handling patterns
rg -A 3 "logrus\..*Error.*Failed to"
# Check for proper cleanup in error cases
ast-grep --pattern 'if err != nil {
$$$
return
}'
Length of output: 101320
Script:
#!/bin/bash
# Check for proper state transition handling
rg -A 5 "DesireState.*=.*Instance" controller/
Length of output: 35056
types/types.go (2)

`43-44`: LGTM: Constants follow naming conventions. The new constants for the data engine upgrade manager and node data engine upgrade follow the established naming patterns and are appropriately placed within their respective constant blocks. Also applies to: `163-164`.

`1271-1291`: LGTM: Well-structured utility functions. The new utility functions for data engine upgrade follow best practices (the general shape is sketched below):
- Consistent with existing patterns for name generation and label management
- Proper handling of optional parameters
- Good use of base labels and component identification
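A hypothetical sketch of that helper pattern; the real functions in `types/types.go` may use different names, label keys, and suffix generation:

```go
package types

import (
	"fmt"
	"math/rand"
)

// longhornLabelComponentKey is an assumed label key for illustration only.
const longhornLabelComponentKey = "longhorn.io/component"

// generateUpgradeResourceName builds "<prefix>-<nodeID>-<random suffix>".
func generateUpgradeResourceName(prefix, nodeID string) string {
	return fmt.Sprintf("%s-%s-%04x", prefix, nodeID, rand.Intn(1<<16))
}

// upgradeComponentLabels returns the base labels identifying the component.
func upgradeComponentLabels(component string) map[string]string {
	return map[string]string{
		longhornLabelComponentKey: component,
	}
}
```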
controller/volume_controller_test.go (1)

`506-510`: Inconsistent engine status fields. The test is setting engine status fields that don't appear to be properly defined in the engine struct, which could lead to test failures and inconsistencies. Please ensure these fields are properly defined in the engine struct before setting them in tests:
- e.Status.TargetIP = ""
- e.Status.StorageTargetIP = ""
- e.Status.TargetPort = 0
The test should only set fields that are actually defined in the `EngineStatus` struct.
controller/node_controller.go (1)

`2181-2189`: Verify upgrade state transitions. The scheduling logic for data engine upgrades looks correct, but we should verify the state transitions during upgrades to ensure proper handling of edge cases. Let's verify the upgrade state transitions:
✅ Verification successful
Let me gather more information about the upgrade state handling to ensure comprehensive verification.
State transitions for data engine upgrades are properly implemented. The verification shows that the upgrade state transitions are well-handled:
- The node controller correctly disables scheduling when upgrades are requested
- The node upgrade monitor handles both setting and clearing of the upgrade flag
- Error handling is in place for state transitions
- The condition updates are properly integrated with the node status system
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential race conditions in upgrade state transitions
# Check for instances where DataEngineUpgradeRequested changes during active operations
rg -A 5 "DataEngineUpgradeRequested.*=.*true"
# Check for any upgrade-related error handling
rg -A 5 "DataEngineUpgrade.*Error"
# Look for upgrade state transition patterns
ast-grep --pattern 'DataEngineUpgradeRequested = $value'
Length of output: 637
Script:
#!/bin/bash
# Check the complete node upgrade monitor implementation
rg -A 10 "func.*NodeUpgradeMonitor"
# Look for state transition handling in node controller
rg -A 10 "DataEngineUpgradeRequested" controller/node_controller.go
# Check for upgrade completion handling
rg -A 5 "DataEngineUpgradeRequested.*=.*false"
# Look for upgrade status conditions
rg "NodeConditionReasonNodeDataEngineUpgradeRequested"
Length of output: 1890
k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go (2)

`981-1092`: LGTM: DataEngineUpgradeManager deepcopy implementation. The implementation follows Kubernetes code generation patterns and correctly handles deep copying of all fields, including maps and slices.
✅ Verification successful
Let me verify the spec and status types to ensure complete verification.
DeepCopy implementation correctly matches type definitions. The generated deepcopy code in `zz_generated.deepcopy.go` accurately implements deep copying for the `DataEngineUpgradeManager` type and its components:
- Properly handles TypeMeta and ObjectMeta fields
- Correctly deep copies the Spec's slice field `Nodes`
- Accurately implements deep copying of the Status's map field `UpgradeNodes`, with proper nil checks for pointer values (the general pattern is shown below)
- All nested struct fields are properly deep copied
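The generated pattern for deep copying a map of pointers, shown generically; the placeholder `UpgradeNodeStatus` fields are assumptions, and the real code lives in `zz_generated.deepcopy.go`:

```go
package example

// UpgradeNodeStatus stands in for the real per-node status type.
type UpgradeNodeStatus struct {
	State   string
	Message string
}

// deepCopyUpgradeNodes preserves nil maps and nil values, and copies non-nil
// values into fresh allocations so the copy shares no memory with the source.
func deepCopyUpgradeNodes(in map[string]*UpgradeNodeStatus) map[string]*UpgradeNodeStatus {
	if in == nil {
		return nil
	}
	out := make(map[string]*UpgradeNodeStatus, len(in))
	for key, val := range in {
		if val == nil {
			out[key] = nil
			continue
		}
		copied := *val // a value copy suffices while the struct holds only scalars
		out[key] = &copied
	}
	return out
}
```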
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the generated deepcopy code matches the type definition
# Test: Find the type definition and compare with generated code
ast-grep --pattern 'type DataEngineUpgradeManager struct {
$$$
}'
Length of output: 749
Script:
#!/bin/bash
# Find DataEngineUpgradeManagerSpec and DataEngineUpgradeManagerStatus definitions
ast-grep --pattern 'type DataEngineUpgradeManagerSpec struct {
$$$
}'
ast-grep --pattern 'type DataEngineUpgradeManagerStatus struct {
$$$
}'
Length of output: 2404
`1783-1890`: LGTM: NodeDataEngineUpgrade deepcopy implementation. The implementation correctly handles deep copying of the Volumes map containing VolumeUpgradeStatus pointers with proper nil checks.
✅ Verification successful
Let me gather more information about the related types to ensure the deepcopy implementation is correct.
DeepCopy implementation correctly matches type definitions. The generated deepcopy code accurately handles all fields in the types:
- `NodeDataEngineUpgrade`: correctly deep copies TypeMeta, ObjectMeta, Spec, and Status
- `NodeDataEngineUpgradeSpec`: simple value types (strings) that don't need special deep copy handling
- `NodeDataEngineUpgradeStatus`: properly handles the map of pointers with nil checks for `Volumes`
- `VolumeUpgradeStatus`: simple value types that are correctly copied
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify that the generated deepcopy code matches the type definition
# Test: Find the type definition and compare with generated code
ast-grep --pattern 'type NodeDataEngineUpgrade struct {
$$$
}'
Length of output: 716
Script:
#!/bin/bash
# Check NodeDataEngineUpgradeSpec and NodeDataEngineUpgradeStatus definitions
ast-grep --pattern 'type NodeDataEngineUpgradeSpec struct {
$$$
}'
ast-grep --pattern 'type NodeDataEngineUpgradeStatus struct {
$$$
}'
# Check VolumeUpgradeStatus definition since it's used in the Status
ast-grep --pattern 'type VolumeUpgradeStatus struct {
$$$
}'
Length of output: 3087
controller/engine_controller.go (6)

`469-497`: LGTM! Well-structured changes with improved logging. The changes enhance the CreateInstance method with better instance manager handling and more informative logging; the integration with findInstanceManagerAndIPs is clean and effective.

`524-527`: LGTM! Port handling is correct. Based on previous learnings, the port value from `e.Status.Port` is sourced from the SPDK engine and doesn't require additional validation. The address construction using `JoinHostPort` is appropriate (see the example below).
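A standalone example of why `net.JoinHostPort` is used for the address construction (values are illustrative):

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

func main() {
	// JoinHostPort brackets IPv6 literals correctly, which plain
	// fmt.Sprintf("%s:%d", ip, port) would not.
	fmt.Println(net.JoinHostPort("10.42.0.15", strconv.Itoa(10001))) // 10.42.0.15:10001
	fmt.Println(net.JoinHostPort("fd00::15", strconv.Itoa(10001)))   // [fd00::15]:10001
}
```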
`634-642`: LGTM! Error handling has been fixed. The error handling for target deletion was previously improved based on past review comments.

`646-673`: LGTM! Well-implemented instance suspension. The method properly validates the data engine type and parameters before proceeding with suspension; error handling is comprehensive.

Line range hint `2545-2642`: LGTM! Robust responsibility checking implementation. The method effectively handles various scenarios, including delinquent nodes and data engine availability. The logic for determining responsibility is clear and well-structured.

Line range hint `2419-2476`: LGTM with verification suggestion. The upgrade handling is well-implemented with proper validation for both data engine types. Consider adding verification for the upgrade completion.
✅ Verification successful
LGTM - Upgrade handling implementation is properly validated
The verification confirms that:
- The upgrade process includes proper state validation for instance managers, with consistent error handling across the codebase
- The upgrade completion is properly logged and followed by appropriate status field updates
- Instance manager state validation is consistently implemented across different components (engine controller, disk service, monitoring)
The implementation shows robust error handling and proper cleanup, making it safe for production use.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify instance manager readiness after upgrade
# Test: Check if both initiator and target instance managers are ready after upgrade
rg -A 5 "instance manager.*not running" . | grep "error initiating.*upgrade"
Length of output: 78
Script:
#!/bin/bash
# Let's verify the upgrade handling implementation more comprehensively
# Check for upgrade-related error handling patterns
rg -A 5 "Engine has been upgraded from" .
# Check for instance manager validation patterns
rg "instance manager.*not running" .
# Look for upgrade-related functions and their error handling
ast-grep --pattern 'if im.Status.CurrentState != longhorn.InstanceManagerStateRunning {
$$$
}'
Length of output: 2009
k8s/crds.yaml (1)

`1791-1792`: Verify consistent usage of targetNodeID fields. The addition of `targetNodeID` fields in the Engine and Replica CRDs needs verification to ensure consistent behavior during upgrades. Consider documenting the following aspects:
- When and how targetNodeID is set during upgrades
- How conflicts between Engine and Replica target nodes are resolved
- The relationship between these fields and the upgrade process
Also applies to: `1882-1883`, `3243-3244`, `3286-3287`
✅ Verification successful
Let me gather more information about the relationship between these fields and the upgrade process.
The additional context will help verify the upgrade and migration behavior implementation.
Consistent implementation of targetNodeID fields verified. The code shows a well-implemented and consistent usage of targetNodeID fields across the codebase:
- Proper validation in the webhook prevents invalid targetNodeID changes:
  - Blocks changes when the volume is unhealthy
  - Requires multiple replicas for migration
  - Prevents setting the same node as target
  - Only allows changes during v2 data engine upgrades
- Robust state management in controllers:
  - Tracks current vs desired target node state
  - Handles instance replacement and suspension
  - Coordinates migration with engine state changes
  - Cleans up old target instances
- Clear upgrade workflow:
  - Validates node readiness for v2 engine upgrades
  - Manages transitions between nodes
  - Handles rollback scenarios
  - Coordinates with instance manager replacement
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for consistent usage of targetNodeID fields across controllers
# Test: Search for targetNodeID field usage in controllers
rg -A 5 "targetNodeID"
Length of output: 6165
Script:
#!/bin/bash
# Check for upgrade-related code that uses targetNodeID
rg -A 10 "v2.*upgrade.*target"
# Check for CurrentTargetNodeID usage
rg -A 5 "CurrentTargetNodeID"
# Look for volume migration logic
ast-grep --pattern 'func $_($$) {
$$$
targetNodeID $$$
$$$
}'
Length of output: 12688
controller/volume_controller.go (2)

`1007-1012`: LGTM: Special handling for v2 data engine replica images. The code correctly handles the case where v2 data engine replica images can differ from the volume image, while maintaining backward compatibility for the v1 data engine.

`5086-5094`: LGTM: Comprehensive upgrade status check. The implementation correctly handles upgrade status checking for both v1 and v2 data engines, considering both image updates and target node changes for v2.
	if uc.queue.NumRequeues(key) < maxRetries {
		handleReconcileErrorLogging(log, err, "Failed to sync Longhorn DataEngineUpgradeManager resource")
		uc.queue.AddRateLimited(key)
		return
	}

	utilruntime.HandleError(err)
	handleReconcileErrorLogging(log, err, "Dropping Longhorn dataEngineUpgradeManager out of the queue")
Undefined variable `maxRetries` and function `handleReconcileErrorLogging`. The variable `maxRetries` and the function `handleReconcileErrorLogging` are used but not defined or imported in the file, which will lead to a compile-time error. Apply this diff to define `maxRetries` and implement `handleReconcileErrorLogging`:
+const maxRetries = 5
+func handleReconcileErrorLogging(log *logrus.Entry, err error, msg string) {
+ log.WithError(err).Warn(msg)
+}
func (uc *DataEngineUpgradeManagerController) handleErr(err error, key interface{}) {
if err == nil {
uc.queue.Forget(key)
return
}
log := uc.logger.WithField("dataEngineUpgradeManager", key)
if uc.queue.NumRequeues(key) < maxRetries {
handleReconcileErrorLogging(log, err, "Failed to sync Longhorn DataEngineUpgradeManager resource")
uc.queue.AddRateLimited(key)
return
}
Committable suggestion skipped: line range outside the PR's diff.
if oldVolume.Spec.Image == instanceManagerImage {
	return werror.NewInvalidError(fmt.Sprintf("volume %v is already using instance manager image %v", newVolume.Name, instanceManagerImage), "")
}
Potential logic error when checking instance manager image. The check may incorrectly prevent upgrading when the volume is already using the desired instance manager image, which could block valid operations. Consider adjusting the logic to allow proceeding if the volume is already using the `instanceManagerImage`. Apply this diff to correct the condition:
-if oldVolume.Spec.Image == instanceManagerImage {
- return werror.NewInvalidError(fmt.Sprintf("volume %v is already using instance manager image %v", newVolume.Name, instanceManagerImage), "")
+if oldVolume.Spec.Image != instanceManagerImage {
+ return werror.NewInvalidError(fmt.Sprintf("volume %v is not using instance manager image %v", newVolume.Name, instanceManagerImage), "")
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if oldVolume.Spec.Image != instanceManagerImage {
	return werror.NewInvalidError(fmt.Sprintf("volume %v is not using instance manager image %v", newVolume.Name, instanceManagerImage), "")
}
if _, err := uc.createNodeDataEngineUpgradeMonitor(nodeUpgrade); err != nil {
	return err
}
🛠️ Refactor suggestion
Handle monitor creation errors more gracefully. The error from `createNodeDataEngineUpgradeMonitor` is returned directly, which could lead to unnecessary retries. Consider:
- Logging the specific error
- Setting an error state in the status
- Determining if the error is recoverable
if _, err := uc.createNodeDataEngineUpgradeMonitor(nodeUpgrade); err != nil {
+ log.WithError(err).Error("Failed to create node data engine upgrade monitor")
+ nodeUpgrade.Status.State = longhorn.UpgradeStateError
+ nodeUpgrade.Status.Message = fmt.Sprintf("Failed to create monitor: %v", err)
return err
}
Committable suggestion skipped: line range outside the PR's diff.
if h.isEngineOfV2DataEngine(obj, spec.DataEngine) {
	if h.isV2DataEngineBeingUpgraded(spec, status) {
		logrus.Warnf("Skipping the instance %v since the instance manager %v is %v", instanceName, im.Name, im.Status.CurrentState)
		return
	}

	if spec.Image == status.CurrentImage {
		if status.CurrentState != longhorn.InstanceStateError {
			upgradeRequested, err := h.ds.IsNodeDataEngineUpgradeRequested(spec.NodeID)
			if err != nil {
				// TODO: should we return here or mark the instance as error?
				logrus.WithError(err).Errorf("Failed to check if node %v is being upgrade requested", spec.NodeID)
				return
			}
			if upgradeRequested {
				logrus.Warnf("Skipping the instance %v since the instance manager %v is %v since the node %v is being upgrade requested",
					instanceName, im.Name, im.Status.CurrentState, spec.NodeID)
				return
			}
			logrus.Warnf("Marking the instance as state ERROR since failed to find the instance manager for the running instance %v", instanceName)
		}
		status.CurrentState = longhorn.InstanceStateError
	} else {
		logrus.Warnf("Skipping the instance %v since the instance manager %v is %v and spec image %v is different from the current image %v",
			instanceName, im.Name, im.Status.CurrentState, spec.Image, status.CurrentImage)
		return
	}
Resolve error handling uncertainty
The TODO comment indicates uncertainty about error handling when checking node upgrade status. This should be resolved to ensure consistent behavior.
Apply this diff:
- // TODO: should we return here or mark the instance as error?
- logrus.WithError(err).Errorf("Failed to check if node %v is being upgrade requested", spec.NodeID)
- return
+ logrus.WithError(err).Errorf("Failed to check if node %v is being upgrade requested", spec.NodeID)
+ status.CurrentState = longhorn.InstanceStateError
+ return fmt.Errorf("failed to check node upgrade status: %v", err)
Committable suggestion skipped: line range outside the PR's diff.
36a161d to 5f2a486 Compare
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (9)
controller/instance_handler.go (2)

`58-165`: Consider refactoring for better maintainability. The `syncStatusIPsAndPorts` function is quite long (100+ lines) and handles multiple responsibilities. Consider breaking it down into smaller, focused functions (one possible split is sketched after this list):
- `syncBasicInstanceStatus`
- `syncTargetInstanceStatus`
- `syncStorageIPStatus`
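One possible shape for that split, signatures only; the parameter lists are assumptions and the bodies are placeholders, not the handler's actual code:

```go
func (h *InstanceHandler) syncBasicInstanceStatus(im *longhorn.InstanceManager, status *longhorn.InstanceStatus, instanceName string) {
	// current state, image, and instance-manager bookkeeping
}

func (h *InstanceHandler) syncTargetInstanceStatus(im *longhorn.InstanceManager, status *longhorn.InstanceStatus, instanceName string) {
	// TargetIP/TargetPort handling for the v2 target instance
}

func (h *InstanceHandler) syncStorageIPStatus(im *longhorn.InstanceManager, status *longhorn.InstanceStatus, instanceName string) {
	// StorageIP/StorageTargetIP resolution
}
```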
Based on the test file examination, I can now provide a final response:
Test coverage needs to be enhanced for v2 data engine upgrades. The current test coverage in `instance_handler_test.go` focuses primarily on basic instance state transitions but lacks specific test cases for:
- v2 data engine live upgrades
- Target instance replacement scenarios
- Instance suspension and resumption during upgrades
- Switch-over functionality
Recommended additions:
- Test cases for `isV2DataEngineBeingUpgraded`
- Test cases for `isTargetInstanceReplacementRunning`
- Test cases for suspend/resume operations
- Test cases for target switch-over scenarios
🔗 Analysis chain
Line range hint `38-995`: Overall implementation is robust and well-structured. The implementation of v2 data engine live upgrades is thorough and well-integrated with the existing codebase. The code demonstrates:
- Clear separation of concerns
- Consistent error handling
- Comprehensive logging
- Good state management
Consider adding unit tests to verify the complex state transitions during live upgrades.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check test coverage for the new functionality # Look for existing tests rg -l "TestInstanceHandler.*" --type go # Check for test cases covering v2 data engine upgrades rg "func.*Test.*DataEngineUpgrade" --type goLength of output: 124
Script:
#!/bin/bash # Let's examine the test file content and check for v2 data engine related tests # Check the content of instance handler tests rg "func Test" controller/instance_handler_test.go -A 2 # Look for v2 data engine related test cases rg "DataEngineV2|v2DataEngine" controller/instance_handler_test.go # Check for upgrade related test cases rg "TestSync.*Instance.*State|TestCreate.*Instance|TestDelete.*Instance" controller/instance_handler_test.goLength of output: 237
Script:
#!/bin/bash # Let's try a different approach to examine test coverage # First check if the test file exists ls -l controller/instance_handler_test.go # If it exists, let's look at its content focusing on test functions and v2 engine cat controller/instance_handler_test.go 2>/dev/null | grep -E 'func Test|DataEngine.*V2|InstanceState|upgrade' # Also check for any other test files that might contain relevant tests find . -type f -name "*_test.go" | xargs grep -l "DataEngine"Length of output: 6935
controller/volume_controller_test.go (4)
`506-510`: Add test coverage for new engine status fields. The test only covers clearing these new fields (`TargetIP`, `StorageTargetIP`, `TargetPort`) during detachment. Consider adding test cases that verify:
- Setting valid values for these fields during engine attachment
- Proper propagation of these network details
- Edge cases with partial network information
Line range hint `1012-1031`: Add input validation to the newVolume function. The function should validate:
- Non-empty volume name
- Positive replica count
- Valid volume size
func newVolume(name string, replicaCount int) *longhorn.Volume { + if name == "" { + panic("volume name cannot be empty") + } + if replicaCount <= 0 { + panic("replica count must be positive") + } return &longhorn.Volume{ ObjectMeta: metav1.ObjectMeta{ Name: name,
Line range hint `1164-1209`: Consider parameterizing test case template generation. The `generateVolumeTestCaseTemplate` function has hardcoded values for:
- Number of replicas (2)
- Node states
- Engine image state
This makes it less flexible for testing different scenarios. Consider refactoring to accept parameters:
-func generateVolumeTestCaseTemplate() *VolumeTestCase { +func generateVolumeTestCaseTemplate(replicaCount int, node1Scheduling, node2Scheduling bool) *VolumeTestCase {
Line range hint `142-144`: Consider using table-driven subtests for better organization. The current test structure uses a map of test cases; consider using Go's built-in subtest functionality for better organization and reporting:
-func (s *TestSuite) TestVolumeLifeCycle(c *C) { - testBackupURL := backupstore.EncodeBackupURL(TestBackupName, TestVolumeName, TestBackupTarget) - var tc *VolumeTestCase +func (s *TestSuite) TestVolumeLifeCycle(c *C) { + testCases := []struct { + name string + testCase *VolumeTestCase + expectedErr error + }{ + // Define test cases here + } + + for _, tc := range testCases { + c.Run(tc.name, func(t *C) { + // Test implementation + }) + }controller/engine_controller.go (2)
`437-467`: Enhance error handling and documentation for findInstanceManagerAndIPs. The function handles instance manager retrieval well, but could benefit from:
- More specific error messages for different failure scenarios
- Documentation explaining the relationship between initiator and target instances
func (ec *EngineController) findInstanceManagerAndIPs(obj interface{}) (im *longhorn.InstanceManager, initiatorIP string, targetIP string, err error) { + // findInstanceManagerAndIPs retrieves the instance manager and IP addresses for both initiator and target instances. + // For v2 data engine, an engine can have both initiator and target instances running on different nodes. e, ok := obj.(*longhorn.Engine) if !ok { - return nil, "", "", fmt.Errorf("invalid object for engine: %v", obj) + return nil, "", "", fmt.Errorf("expected *longhorn.Engine, got: %T", obj) } initiatorIM, err := ec.ds.GetInstanceManagerByInstanceRO(obj, false) if err != nil { - return nil, "", "", err + return nil, "", "", errors.Wrapf(err, "failed to get initiator instance manager for engine %v", e.Name) }
`634-642`: Improve error handling for target deletion. While the target deletion logic is correct, the error handling could be more specific to help with debugging.
if e.Status.CurrentTargetNodeID != "" { err = c.EngineInstanceDeleteTarget(&engineapi.EngineInstanceDeleteTargetRequest{ Engine: e, }) - if err != nil && !types.ErrorIsNotFound(err) { + if err != nil { + if types.ErrorIsNotFound(err) { + return nil + } + return errors.Wrapf(err, "failed to delete target instance for engine %v on node %v", + e.Name, e.Status.CurrentTargetNodeID) + } - return err - } }controller/volume_controller.go (1)
`3219-3226`: Simplify control flow by removing unnecessary else block. The code can be simplified by removing the else block since the previous block ends with a return.
if replicaAddressMap, err := c.constructReplicaAddressMap(v, e, rs); err != nil { return nil -} else { - if !reflect.DeepEqual(e.Spec.UpgradedReplicaAddressMap, replicaAddressMap) { - e.Spec.UpgradedReplicaAddressMap = replicaAddressMap - return nil - } +} +if !reflect.DeepEqual(e.Spec.UpgradedReplicaAddressMap, replicaAddressMap) { + e.Spec.UpgradedReplicaAddressMap = replicaAddressMap + return nil }🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 3221-3221: controller/volume_controller.go#L3221
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
🛑 Comments failed to post (5)
controller/instance_handler_test.go (3)
75-85: 🛠️ Refactor suggestion
Add test cases for new instance management methods.
The newly added methods
SuspendInstance
,ResumeInstance
, andSwitchOverTarget
need corresponding test cases inTestReconcileInstanceState
to ensure proper behavior.Consider adding test cases for:
- Instance suspension and resumption
- Target switchover during upgrades
- Error handling for these operations
91-94: 🛠️ Refactor suggestion
Add test coverage for
IsEngine
function.The
IsEngine
implementation is correct but lacks test coverage. Add test cases to verify type checking behavior.Consider adding test cases for:
- Test with an Engine object (positive case)
- Test with non-Engine objects (negative cases)
- Test with nil input
100-102: 🛠️ Refactor suggestion
Add test coverage for
RequireRemoteTargetInstance
.The
RequireRemoteTargetInstance
method needs test cases to verify its behavior, especially since it's part of the instance management workflow.Consider adding test cases that verify:
- When remote target instance is required
- When it's not required
- Error handling scenarios
controller/instance_handler.go (1)
750-757: 🛠️ Refactor suggestion
Improve error handling in CreateInstance
The error handling could be more specific. Consider adding error type checks and appropriate recovery actions.
if _, err := h.instanceManagerHandler.CreateInstance(obj, false); err != nil { - if !types.ErrorAlreadyExists(err) { + if types.ErrorAlreadyExists(err) { + // Already exists, lost track may due to previous datastore conflict + return nil + } + if types.IsResourceExhausted(err) { + // Handle resource exhaustion specifically + h.eventRecorder.Eventf(obj, corev1.EventTypeWarning, constant.EventReasonFailedStarting, "Resource exhausted while starting %v: %v", instanceName, err) + return err + } h.eventRecorder.Eventf(obj, corev1.EventTypeWarning, constant.EventReasonFailedStarting, "Error starting %v: %v", instanceName, err) return err - } - // Already exists, lost track may due to previous datastore conflict - return nil }Committable suggestion skipped: line range outside the PR's diff.
controller/volume_controller.go (1)
3208-3210:
⚠️ Potential issueAdd error handling for non-running engine state during v2 data engine live upgrade
The empty else block should handle the case when engine state is not running during live upgrade.
} else { - // TODO: what if e.Status.CurrentState != longhorn.InstanceStateRunning + log.Warnf("Engine is in unexpected state %v during live upgrade", e.Status.CurrentState) + return nil }Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 golangci-lint (1.61.0)
3208-3208: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
[notice] 3208-3208: controller/volume_controller.go#L3208
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (10)
webhook/resources/dataengineupgrademanager/validator.go (2)
18-25
: Add documentation for the type and constructorConsider adding documentation comments to describe:
- The purpose and responsibility of
dataEngineUpgradeManagerValidator
- The parameters and return value of
NewValidator
Example:
+// dataEngineUpgradeManagerValidator validates DataEngineUpgradeManager resources type dataEngineUpgradeManagerValidator struct { admission.DefaultValidator ds *datastore.DataStore } +// NewValidator creates a new validator for DataEngineUpgradeManager resources +// ds: DataStore for accessing Longhorn resources func NewValidator(ds *datastore.DataStore) admission.Validator { return &dataEngineUpgradeManagerValidator{ds: ds} }
41-53
: Enhance Create validation with additional checksConsider adding validations for:
- Empty status field on creation
- Valid spec field values
func (u *dataEngineUpgradeManagerValidator) Create(request *admission.Request, newObj runtime.Object) error { upgradeManager, ok := newObj.(*longhorn.DataEngineUpgradeManager) if !ok { return werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", newObj), "") } if upgradeManager.Spec.DataEngine != longhorn.DataEngineTypeV2 { err := fmt.Errorf("data engine %v is not supported", upgradeManager.Spec.DataEngine) return werror.NewInvalidError(err.Error(), "spec.dataEngine") } + // Ensure status is empty on creation + if upgradeManager.Status.State != "" { + return werror.NewInvalidError("status must be empty on creation", "status") + } + + // Validate spec fields if nodes are specified + if len(upgradeManager.Spec.Nodes) > 0 { + for node := range upgradeManager.Spec.Nodes { + if node == "" { + return werror.NewInvalidError("node name cannot be empty", "spec.nodes") + } + } + } return nil }controller/volume_controller.go (3)
1007-1012
: Improve v2 data engine replica image handlingThe code correctly handles the case where v2 volume replica images can be different from the volume image. However, the warning message could be more descriptive.
- log.WithField("replica", r.Name).Warnf("Replica engine image %v is different from volume engine image %v, "+ - "but replica spec.Active has been set", r.Spec.Image, v.Spec.Image) + log.WithField("replica", r.Name).Warnf("For v1 volume: replica %v engine image %v is different from volume engine image %v, "+ + "but replica spec.Active has been set unexpectedly", r.Name, r.Spec.Image, v.Spec.Image)
5073-5075
: Remove commented codeThe commented code block should either be removed or uncommented with proper implementation.
- // TODO: Remove it once we can reuse failed replicas during v2 rebuilding - // if types.IsDataEngineV2(v.Spec.DataEngine) { - // return true - // }
3225-3229
: Simplify if-else blockThe code can be simplified by removing the unnecessary else block.
if !reflect.DeepEqual(e.Spec.UpgradedReplicaAddressMap, replicaAddressMap) { e.Spec.UpgradedReplicaAddressMap = replicaAddressMap return nil } -else { - if e.Status.CurrentState == longhorn.InstanceStateSuspended { - log.Infof("Resuming engine for live upgrade") - e.Spec.DesireState = longhorn.InstanceStateRunning - return nil - } +if e.Status.CurrentState == longhorn.InstanceStateSuspended { + log.Infof("Resuming engine for live upgrade") + e.Spec.DesireState = longhorn.InstanceStateRunning + return nil }🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 3225-3225: controller/volume_controller.go#L3225
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)controller/node_upgrade_controller.go (2)
277-279
: Potential nil pointer dereference when accessinguc.nodeDataEngineUpgradeMonitor
In
createNodeDataEngineUpgradeMonitor
, the method returnsuc.nodeDataEngineUpgradeMonitor
without checking if it's nil. Although the current logic ensures it's not nil, consider adding a check to prevent potential nil pointer dereferences in future code modifications.
239-245
: Avoid redundant status update callsIn the
reconcile
function,uc.updateNodeDataEngineUpgradeStatus(nodeUpgrade)
is called twice consecutively when the upgrade state is completed or error.Consider removing the redundant call to improve code efficiency.
controller/monitor/upgrade_manager_monitor.go (1)
331-331
: TODO: Implement check for untrackedNodeDataEngineUpgrade
processesThere's a TODO comment indicating the need to check for any
NodeDataEngineUpgrade
processes that are not tracked bym.upgradeManagerStatus.UpgradingNode
.Would you like assistance in implementing this check? I can help draft the code or open a GitHub issue to track this task.
controller/instance_manager_controller.go (2)
Line range hint
598-604
: Simplify logic forisPodDeletionNotRequired
The condition for
isPodDeletionNotRequired
is complex and can be simplified for better readability.Consider refactoring the condition into smaller, well-named boolean variables or simplifying the expression.
Line range hint
1366-1376
: Handle errors when applying log settingsIn the function
imc.syncLogSettingsToInstanceManagerPod
, errors from setting the log level and flags are logged but not handled.Consider returning the error to the caller or handling it appropriately to ensure that critical failures are not ignored.
Apply this diff:
- err = client.LogSetLevel(longhorn.DataEngineTypeV2, "spdk_tgt", setting.Value) + if err := client.LogSetLevel(longhorn.DataEngineTypeV2, "spdk_tgt", setting.Value); err != nil { + return errors.Wrapf(err, "failed to set log level for %v", settingName) + }Repeat similar changes for other settings.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (24)
controller/controller_manager.go
(3 hunks)controller/engine_controller.go
(15 hunks)controller/instance_handler.go
(15 hunks)controller/instance_handler_test.go
(4 hunks)controller/instance_manager_controller.go
(1 hunks)controller/monitor/node_upgrade_monitor.go
(1 hunks)controller/monitor/upgrade_manager_monitor.go
(1 hunks)controller/node_controller.go
(3 hunks)controller/node_upgrade_controller.go
(1 hunks)controller/replica_controller.go
(10 hunks)controller/uninstall_controller.go
(4 hunks)controller/upgrade_manager_controller.go
(1 hunks)controller/utils.go
(0 hunks)controller/volume_controller.go
(17 hunks)controller/volume_controller_test.go
(1 hunks)scheduler/replica_scheduler.go
(1 hunks)types/types.go
(4 hunks)webhook/resources/dataengineupgrademanager/mutator.go
(1 hunks)webhook/resources/dataengineupgrademanager/validator.go
(1 hunks)webhook/resources/nodedataengineupgrade/mutator.go
(1 hunks)webhook/resources/nodedataengineupgrade/validator.go
(1 hunks)webhook/resources/volume/validator.go
(5 hunks)webhook/server/mutation.go
(2 hunks)webhook/server/validation.go
(2 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
🚧 Files skipped from review as they are similar to previous changes (10)
- controller/controller_manager.go
- webhook/server/mutation.go
- types/types.go
- webhook/resources/dataengineupgrademanager/mutator.go
- controller/uninstall_controller.go
- scheduler/replica_scheduler.go
- webhook/resources/nodedataengineupgrade/validator.go
- webhook/server/validation.go
- controller/instance_handler_test.go
- controller/monitor/node_upgrade_monitor.go
👮 Files not reviewed due to content moderation or server errors (6)
- webhook/resources/volume/validator.go
- controller/replica_controller.go
- controller/instance_handler.go
- controller/volume_controller_test.go
- controller/node_controller.go
- controller/engine_controller.go
🧰 Additional context used
📓 Learnings (7)
webhook/resources/dataengineupgrademanager/validator.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: webhook/resources/dataengineupgrademanager/validator.go:41-53
Timestamp: 2024-11-26T00:01:13.256Z
Learning: In `webhook/resources/dataengineupgrademanager/validator.go` of the Longhorn project, when validating `DataEngineUpgradeManager` creation, the `upgradeManager.Spec.Nodes` field can be empty, indicating that all available nodes should be upgraded.
webhook/resources/nodedataengineupgrade/mutator.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: webhook/resources/nodedataengineupgrade/mutator.go:30-41
Timestamp: 2024-11-26T00:00:10.274Z
Learning: For the `NodeDataEngineUpgrade` mutator, only the `CREATE` operation is required; `UPDATE` is not used.
controller/upgrade_manager_controller.go (2)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/upgrade_manager_controller.go:135-139
Timestamp: 2024-11-25T15:30:18.858Z
Learning: In the Longhorn Manager codebase, the variable `maxRetries` is defined in `base_controller.go`, and other controllers can use it without redefining it.
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/node_upgrade_controller.go:135-139
Timestamp: 2024-11-26T00:03:56.311Z
Learning: In the Longhorn project, the constant `maxRetries` is defined in `base_controller.go` and should not be redefined in individual controller files.
controller/node_upgrade_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/node_upgrade_controller.go:135-139
Timestamp: 2024-11-26T00:03:56.311Z
Learning: In the Longhorn project, the constant `maxRetries` is defined in `base_controller.go` and should not be redefined in individual controller files.
controller/monitor/upgrade_manager_monitor.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/monitor/upgrade_manager_monitor.go:237-254
Timestamp: 2024-11-26T00:12:16.791Z
Learning: In the `handleUpgradeStateInitializing` function of `upgrade_manager_monitor.go`, when upgrading instance managers, we aim to upgrade old v2 instance managers rather than v1. Therefore, we should list instance managers with `longhorn.DataEngineTypeV2`.
controller/engine_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/engine_controller.go:524-527
Timestamp: 2024-11-25T12:39:58.926Z
Learning: In `controller/engine_controller.go`, `e.Status.Port` is sourced from the SPDK engine and does not require additional validation.
controller/instance_handler.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/instance_handler.go:919-939
Timestamp: 2024-11-25T23:56:53.252Z
Learning: In the Longhorn Manager Go code, the function `engineapi.NewInstanceManagerClient` does not accept a `context.Context` parameter. Therefore, we cannot pass `ctx` to it in functions like `isInstanceExist` in `controller/instance_handler.go`.
🪛 golangci-lint (1.62.2)
controller/engine_controller.go
1969-1969: func (*EngineController).removeFailedReplica
is unused
(unused)
controller/volume_controller.go
3212-3212: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3225-3225: controller/volume_controller.go#L3225
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3212-3212: controller/volume_controller.go#L3212
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (18)
webhook/resources/dataengineupgrademanager/validator.go (3)
1-17: LGTM!
The package structure and imports are well-organized and follow Go conventions.
27-39: LGTM!
The resource metadata is correctly defined with appropriate scope, API group, version, and supported operations.
55-79: LGTM!
The update validation properly enforces:
- Type checking for both old and new objects
- Data engine type validation
- Immutability of critical fields (DataEngine and Nodes)
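For reference, a minimal sketch of what those immutability checks typically look like in a Longhorn validator; the exact error strings and helper usage in this PR may differ, and werror, fmt, and reflect are assumed to be imported already:
oldManager, ok := oldObj.(*longhorn.DataEngineUpgradeManager)
if !ok {
	return werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", oldObj), "")
}
newManager, ok := newObj.(*longhorn.DataEngineUpgradeManager)
if !ok {
	return werror.NewInvalidError(fmt.Sprintf("%v is not a *longhorn.DataEngineUpgradeManager", newObj), "")
}
// spec.dataEngine and spec.nodes must not change after creation.
if oldManager.Spec.DataEngine != newManager.Spec.DataEngine {
	return werror.NewInvalidError("spec.dataEngine field is immutable", "spec.dataEngine")
}
if !reflect.DeepEqual(oldManager.Spec.Nodes, newManager.Spec.Nodes) {
	return werror.NewInvalidError("spec.nodes field is immutable", "spec.nodes")
}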
controller/volume_controller.go (5)
1827-1836: LGTM: Improved replica state handling for v2 data engine
The code correctly handles replica state transitions for both v1 and v2 data engines, with appropriate image validation.
1923-1930: LGTM: Enhanced engine state handling for v2 data engine
The code properly manages engine state transitions based on image and target node alignment.
3305-3331: LGTM: Well-structured helper functions for v2 data engine upgrades
The new helper functions isVolumeNotEligibleForLiveUpgrade and isV2DataEngineLiveUpgradeCompleted provide clear logic for upgrade eligibility and completion checks.
5090-5098: LGTM: Enhanced volume upgrade check for v2 data engine
The function correctly handles upgrade status checks for both v1 and v2 data engines by considering both image updates and target node changes.
3212-3214: Fix empty else block in engine state handling
There's an empty else block that should be handled properly.
- } else {
- // TODO: what if e.Status.CurrentState != longhorn.InstanceStateRunning
- }
+ } else if e.Status.CurrentState != longhorn.InstanceStateRunning {
+ log.Warnf("Engine is in unexpected state %v during v2 data engine live upgrade", e.Status.CurrentState)
+ return nil
+ }
Likely invalid or redundant comment.
🧰 Tools
🪛 golangci-lint (1.62.2)
3212-3212: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
[notice] 3212-3212: controller/volume_controller.go#L3212
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
controller/node_upgrade_controller.go (2)
135-139: Use uc.maxRetries instead of undefined maxRetries
The variable maxRetries is used but not defined in this file, which will lead to a compilation error. In baseController, maxRetries is defined as uc.maxRetries. Use uc.maxRetries to reference the retry limit.
Apply this diff to fix the issue:
- if uc.queue.NumRequeues(key) < maxRetries {
+ if uc.queue.NumRequeues(key) < uc.maxRetries {
handleReconcileErrorLogging(log, err, "Failed to sync Longhorn nodeDataEngineUpgrade resource")
uc.queue.AddRateLimited(key)
return
}
⛔ Skipped due to learnings
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/node_upgrade_controller.go:135-139
Timestamp: 2024-11-26T00:03:56.311Z
Learning: In the Longhorn project, the constant `maxRetries` is defined in `base_controller.go` and should not be redefined in individual controller files.
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/upgrade_manager_controller.go:135-139
Timestamp: 2024-11-25T15:30:18.858Z
Learning: In the Longhorn Manager codebase, the variable `maxRetries` is defined in `base_controller.go`, and other controllers can use it without redefining it.
136-142: Define or import handleReconcileErrorLogging function
The function handleReconcileErrorLogging is called but not defined in this file or imported from another package, which will result in a compilation error.
Apply this diff to define the function:
+func handleReconcileErrorLogging(log *logrus.Entry, err error, msg string) {
+ log.WithError(err).Warn(msg)
+}
func (uc *NodeDataEngineUpgradeController) handleErr(err error, key interface{}) {
// existing code
Likely invalid or redundant comment.
controller/monitor/upgrade_manager_monitor.go (3)
236-254
: Confirm correct data engine type when listing instance managers
In handleUpgradeStateInitializing
, you are listing instance managers with longhorn.DataEngineTypeV2
to find old instance managers. Ensure that this is intentional, as typically, to find old instance managers, you might need to list those with longhorn.DataEngineTypeV1
.
Please verify that listing with DataEngineTypeV2
is correct. If you intend to upgrade old V2 instance managers, then this is appropriate.
31-38
:
Potential race conditions due to unsynchronized access to upgradeManagerStatus
The upgradeManagerStatus
field is accessed by multiple goroutines without proper synchronization, which could lead to race conditions.
Add a mutex to protect upgradeManagerStatus
. Apply this diff:
type DataEngineUpgradeManagerMonitor struct {
sync.RWMutex
*baseMonitor
upgradeManagerName string
syncCallback func(key string)
collectedData *longhorn.DataEngineUpgradeManagerStatus
- upgradeManagerStatus *longhorn.DataEngineUpgradeManagerStatus
+ upgradeManagerStatus *longhorn.DataEngineUpgradeManagerStatus // protected by mutex
}
Ensure that all accesses to upgradeManagerStatus are properly synchronized.
Likely invalid or redundant comment.
95-126: Acquire lock when accessing upgradeManagerStatus in run method
The run method reads and writes to upgradeManagerStatus without holding the mutex lock, leading to potential race conditions.
Apply this diff to lock the access:
func (m *DataEngineUpgradeManagerMonitor) run(value interface{}) error {
+ m.Lock()
+ defer m.Unlock()
+
upgradeManager, err := m.ds.GetDataEngineUpgradeManager(m.upgradeManagerName)
if err != nil {
return errors.Wrapf(err, "failed to get longhorn dataEngineUpgradeManager %v", m.upgradeManagerName)
}
existingUpgradeManagerStatus := m.upgradeManagerStatus.DeepCopy()
m.handleUpgradeManager(upgradeManager)
// existing code
Likely invalid or redundant comment.
controller/instance_manager_controller.go (1)
589-596: Avoid variable shadowing with dataEngineUpgradeRequested
The variable dataEngineUpgradeRequested is declared within the if block and then used outside its scope.
Apply this diff to declare the variable before the if block:
- dataEngineUpgradeRequested := false
+ var dataEngineUpgradeRequested bool
if types.IsDataEngineV2(im.Spec.DataEngine) {
requested, err := imc.ds.IsNodeDataEngineUpgradeRequested(im.Spec.NodeID)
if err != nil {
return err
}
dataEngineUpgradeRequested = requested
}
Likely invalid or redundant comment.
webhook/resources/nodedataengineupgrade/mutator.go (1)
30-41: Include Update operation in mutator's OperationTypes
Currently, the mutator only handles the Create operation. However, the NodeDataEngineUpgrade resource might require mutation during Update operations as well.
Please verify if Update operations should be included. If so, modify the OperationTypes slice accordingly.
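If Update did turn out to be needed, the registration could look like the sketch below; the field names follow the admission.Resource pattern used by Longhorn's other webhooks, and the Update entry is the only assumed change (per the author, only Create is required today):
func (m *nodeDataEngineUpgradeMutator) Resource() admission.Resource {
	return admission.Resource{
		Name:       "nodedataengineupgrades",
		Scope:      admissionregv1.NamespacedScope,
		APIGroup:   longhorn.SchemeGroupVersion.Group,
		APIVersion: longhorn.SchemeGroupVersion.Version,
		ObjectType: &longhorn.NodeDataEngineUpgrade{},
		OperationTypes: []admissionregv1.OperationType{
			admissionregv1.Create,
			admissionregv1.Update, // assumed addition, only if updates also need mutation
		},
	}
}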
controller/upgrade_manager_controller.go (3)
268-276: 🛠️ Refactor suggestion
Potential race condition with dataEngineUpgradeManagerMonitor access
Access to uc.dataEngineUpgradeManagerMonitor is not synchronized, which could lead to race conditions if accessed concurrently by multiple goroutines.
Introduce a mutex to protect access to uc.dataEngineUpgradeManagerMonitor. Apply this diff:
+uc.monitorMutex.Lock()
+defer uc.monitorMutex.Unlock()
if uc.dataEngineUpgradeManagerMonitor != nil {
return uc.dataEngineUpgradeManagerMonitor, nil
}
// existing code
Also, ensure that all accesses to uc.dataEngineUpgradeManagerMonitor are protected by the mutex.
Likely invalid or redundant comment.
136-142: Define or import handleReconcileErrorLogging function
The function handleReconcileErrorLogging is called but not defined in this file or imported from another package, which will result in a compilation error.
Apply this diff to define the function:
+func handleReconcileErrorLogging(log *logrus.Entry, err error, msg string) {
+ log.WithError(err).Warn(msg)
+}
func (uc *DataEngineUpgradeManagerController) handleErr(err error, key interface{}) {
// existing code
Likely invalid or redundant comment.
233-254: Handle type assertion failure when retrieving status data
In the reconcile function, if the type assertion fails (if !ok), the function logs an error but continues execution. This may lead to inconsistent state or further errors.
Consider returning after logging the error to prevent further processing with invalid data.
Apply this diff:
if !ok {
log.Errorf("Failed to assert value from dataEngineUpgradeManager monitor: %v", data)
+ return fmt.Errorf("failed to assert monitor data to DataEngineUpgradeManagerStatus")
} else {
// existing code
Likely invalid or redundant comment.
data, err := uc.nodeDataEngineUpgradeMonitor.GetCollectedData()
if err != nil {
	log.WithError(err).Error("Failed to get collected data from nodeDataEngineUpgrade monitor")
	return
}
status, ok := data.(*longhorn.NodeDataEngineUpgradeStatus)
if !ok {
	log.Errorf("Failed to assert value from nodeDataEngineUpgrade monitor: %v", data)
	return
}

nodeUpgrade.Status.State = status.State
nodeUpgrade.Status.Message = status.Message
nodeUpgrade.Status.Volumes = make(map[string]*longhorn.VolumeUpgradeStatus)
for k, v := range status.Volumes {
	nodeUpgrade.Status.Volumes[k] = &longhorn.VolumeUpgradeStatus{
		State:   v.State,
		Message: v.Message,
	}
}
🛠️ Refactor suggestion
Synchronize access to nodeUpgrade.Status.Volumes
The loop modifies nodeUpgrade.Status.Volumes without synchronization. If accessed concurrently, this could lead to a race condition.
Consider protecting access with a mutex or ensuring that this code is executed in a single-threaded context.
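A minimal sketch of the mutex option, assuming a dedicated statusLock field that is not part of the PR; if reconciliation for a given key is already single-threaded, the lock is unnecessary:
uc.statusLock.Lock()
defer uc.statusLock.Unlock()

// Rebuild the per-volume status map from the monitor snapshot while holding the lock.
nodeUpgrade.Status.Volumes = make(map[string]*longhorn.VolumeUpgradeStatus, len(status.Volumes))
for name, v := range status.Volumes {
	nodeUpgrade.Status.Volumes[name] = &longhorn.VolumeUpgradeStatus{
		State:   v.State,
		Message: v.Message,
	}
}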
if err := wait.PollUntilContextCancel(m.ctx, m.syncPeriod, true, func(context.Context) (bool, error) {
	if err := m.run(struct{}{}); err != nil {
		m.logger.WithError(err).Error("Stopped monitoring upgrade monitor")
	}
	return false, nil
🛠️ Refactor suggestion
Handle errors returned by m.run() in polling loop
In the Start method, errors returned by m.run() are logged, but the loop continues without handling the error. This could lead to silent failures.
Consider modifying the loop to break on critical errors or implement retry logic if appropriate.
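A hedged sketch of that option: returning a non-nil error from the condition function is what makes wait.PollUntilContextCancel stop early, and isCriticalErr here is a hypothetical helper rather than anything in the PR:
if err := wait.PollUntilContextCancel(m.ctx, m.syncPeriod, true, func(context.Context) (bool, error) {
	if err := m.run(struct{}{}); err != nil {
		m.logger.WithError(err).Error("Upgrade monitor iteration failed")
		if isCriticalErr(err) {
			// A non-nil error terminates the polling loop.
			return false, err
		}
	}
	// Transient errors: keep polling and retry on the next tick.
	return false, nil
}); err != nil {
	m.logger.WithError(err).Error("Stopped monitoring upgrade monitor")
}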
if uc.queue.NumRequeues(key) < maxRetries {
	handleReconcileErrorLogging(log, err, "Failed to sync Longhorn DataEngineUpgradeManager resource")
	uc.queue.AddRateLimited(key)
	return
}
Use uc.maxRetries instead of undefined maxRetries
The variable maxRetries is used but not defined in this file, which will lead to a compilation error. In baseController, maxRetries is defined as uc.maxRetries. Use uc.maxRetries to reference the retry limit.
Apply this diff to fix the issue:
- if uc.queue.NumRequeues(key) < maxRetries {
+ if uc.queue.NumRequeues(key) < uc.maxRetries {
handleReconcileErrorLogging(log, err, "Failed to sync Longhorn DataEngineUpgradeManager resource")
uc.queue.AddRateLimited(key)
return
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-if uc.queue.NumRequeues(key) < maxRetries {
+if uc.queue.NumRequeues(key) < uc.maxRetries {
 	handleReconcileErrorLogging(log, err, "Failed to sync Longhorn DataEngineUpgradeManager resource")
 	uc.queue.AddRateLimited(key)
 	return
 }
Force-pushed from 2136bc6 to 2218478 (Compare)
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
controller/engine_controller.go (2)
1969-1989: Remove unused method removeFailedReplica
The method removeFailedReplica is not used anywhere in the codebase. Consider removing it to improve code maintainability.
-func (ec *EngineController) removeFailedReplica(e *longhorn.Engine) error {
-	replicas, err := ec.ds.ListVolumeReplicasRO(e.Spec.VolumeName)
-	if err != nil {
-		return err
-	}
-
-	engineClientProxy, err := ec.getEngineClientProxy(e, e.Status.CurrentImage)
-	if err != nil {
-		return errors.Wrapf(err, "failed to get the engine client %v for removing failed replica from engine", e.Name)
-	}
-	defer engineClientProxy.Close()
-
-	for _, r := range replicas {
-		if r.Spec.LastFailedAt != "" {
-			if err := engineClientProxy.ReplicaRemove(e, "", r.Name); err != nil && !apierrors.IsNotFound(err) {
-				return errors.Wrapf(err, "failed to remove failed replica %v from engine", r.Name)
-			}
-		}
-	}
-	return nil
-}
🧰 Tools
🪛 golangci-lint (1.62.2)
1969-1969: func (*EngineController).removeFailedReplica is unused
(unused)

Line range hint 3208-3214: Handle engine state error during v2 data engine live upgrade
The TODO comment indicates missing error handling for non-running engine state during v2 data engine live upgrade.
 } else {
-	// TODO: what if e.Status.CurrentState != longhorn.InstanceStateRunning
+	if e.Status.CurrentState == longhorn.InstanceStateError {
+		log.Errorf("Engine entered error state during v2 data engine live upgrade")
+		return fmt.Errorf("engine in error state during live upgrade")
+	}
+	if e.Status.CurrentState != longhorn.InstanceStateRunning {
+		log.Debugf("Engine is in %v state, waiting for running state", e.Status.CurrentState)
+	}
 }

controller/volume_controller.go (2)
3225-3230: Simplify conditional return logic
The code can be simplified by removing the unnecessary else block and flattening the structure.
 if !reflect.DeepEqual(e.Spec.UpgradedReplicaAddressMap, replicaAddressMap) {
 	e.Spec.UpgradedReplicaAddressMap = replicaAddressMap
 	return nil
-} else {
-	return nil
 }
+return nil
🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 3225-3225: controller/volume_controller.go#L3225
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)

5090-5098: Add documentation for isVolumeUpgrading function
The function lacks documentation explaining the upgrade conditions for both v1 and v2 data engines.
+// isVolumeUpgrading checks if a volume is being upgraded based on its data engine type.
+// For v1 data engine: only image update is considered
+// For v2 data engine: both image update and target node changes are considered
 func isVolumeUpgrading(v *longhorn.Volume) bool {
 	imageNotUpdated := v.Status.CurrentImage != v.Spec.Image
 	if types.IsDataEngineV1(v.Spec.DataEngine) {
 		return imageNotUpdated
 	}
 	return imageNotUpdated || v.Spec.TargetNodeID != v.Status.CurrentTargetNodeID
 }

controller/upgrade_manager_controller.go (1)
233-254: Handle type assertion failure appropriately
In the reconcile function, when retrieving status data from the monitor, it's good practice to handle type assertion failures to prevent potential panics. Consider returning an error when the type assertion fails:
 if uc.dataEngineUpgradeManagerMonitor != nil {
 	data, err := uc.dataEngineUpgradeManagerMonitor.GetCollectedData()
 	if err != nil {
 		log.WithError(err).Error("Failed to get data from dataEngineUpgradeManager monitor")
 	} else {
 		status, ok := data.(*longhorn.DataEngineUpgradeManagerStatus)
 		if !ok {
 			log.Errorf("Failed to assert value from dataEngineUpgradeManager monitor: %v", data)
+			return fmt.Errorf("failed to assert monitor data to DataEngineUpgradeManagerStatus")
 		} else {
 			upgradeManager.Status.InstanceManagerImage = status.InstanceManagerImage
 			upgradeManager.Status.State = status.State
 			// remaining code
 		}
 	}
 }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (24)
controller/controller_manager.go (3 hunks)
controller/engine_controller.go (15 hunks)
controller/instance_handler.go (15 hunks)
controller/instance_handler_test.go (4 hunks)
controller/instance_manager_controller.go (1 hunks)
controller/monitor/node_upgrade_monitor.go (1 hunks)
controller/monitor/upgrade_manager_monitor.go (1 hunks)
controller/node_controller.go (3 hunks)
controller/node_upgrade_controller.go (1 hunks)
controller/replica_controller.go (10 hunks)
controller/uninstall_controller.go (4 hunks)
controller/upgrade_manager_controller.go (1 hunks)
controller/utils.go (0 hunks)
controller/volume_controller.go (17 hunks)
controller/volume_controller_test.go (1 hunks)
scheduler/replica_scheduler.go (1 hunks)
types/types.go (4 hunks)
webhook/resources/dataengineupgrademanager/mutator.go (1 hunks)
webhook/resources/dataengineupgrademanager/validator.go (1 hunks)
webhook/resources/nodedataengineupgrade/mutator.go (1 hunks)
webhook/resources/nodedataengineupgrade/validator.go (1 hunks)
webhook/resources/volume/validator.go (5 hunks)
webhook/server/mutation.go (2 hunks)
webhook/server/validation.go (2 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
🚧 Files skipped from review as they are similar to previous changes (14)
- controller/controller_manager.go
- webhook/server/mutation.go
- controller/instance_manager_controller.go
- webhook/server/validation.go
- webhook/resources/volume/validator.go
- scheduler/replica_scheduler.go
- types/types.go
- controller/uninstall_controller.go
- webhook/resources/dataengineupgrademanager/validator.go
- webhook/resources/nodedataengineupgrade/validator.go
- webhook/resources/dataengineupgrademanager/mutator.go
- webhook/resources/nodedataengineupgrade/mutator.go
- controller/instance_handler_test.go
- controller/node_upgrade_controller.go
👮 Files not reviewed due to content moderation or server errors (3)
- controller/replica_controller.go
- controller/instance_handler.go
- controller/volume_controller_test.go
🧰 Additional context used
📓 Learnings (5)
controller/engine_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/engine_controller.go:524-527
Timestamp: 2024-11-25T12:39:58.926Z
Learning: In `controller/engine_controller.go`, `e.Status.Port` is sourced from the SPDK engine and does not require additional validation.
controller/instance_handler.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/instance_handler.go:919-939
Timestamp: 2024-11-25T23:56:53.252Z
Learning: In the Longhorn Manager Go code, the function `engineapi.NewInstanceManagerClient` does not accept a `context.Context` parameter. Therefore, we cannot pass `ctx` to it in functions like `isInstanceExist` in `controller/instance_handler.go`.
controller/upgrade_manager_controller.go (2)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/upgrade_manager_controller.go:135-139
Timestamp: 2024-11-25T15:30:18.858Z
Learning: In the Longhorn Manager codebase, the variable `maxRetries` is defined in `base_controller.go`, and other controllers can use it without redefining it.
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/node_upgrade_controller.go:135-139
Timestamp: 2024-11-26T00:03:56.311Z
Learning: In the Longhorn project, the constant `maxRetries` is defined in `base_controller.go` and should not be redefined in individual controller files.
controller/monitor/upgrade_manager_monitor.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/monitor/upgrade_manager_monitor.go:237-254
Timestamp: 2024-11-26T00:12:16.791Z
Learning: In the `handleUpgradeStateInitializing` function of `upgrade_manager_monitor.go`, when upgrading instance managers, we aim to upgrade old v2 instance managers rather than v1. Therefore, we should list instance managers with `longhorn.DataEngineTypeV2`.
controller/monitor/node_upgrade_monitor.go (2)
Learnt from: james-munson
PR: longhorn/longhorn-manager#3211
File: app/post_upgrade.go:102-113
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In Go, when a deferred function references a variable like `err`, ensure that the variable is declared in the outer scope and not within an inner scope (such as within `if err := ...`), to prevent compilation errors and unintended variable shadowing.
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/monitor/node_upgrade_monitor.go:351-357
Timestamp: 2024-11-25T23:55:02.080Z
Learning: `GetVolumeRO` guarantees that `volume` is non-nil when `err == nil`, so explicit nil checks after error handling are not needed.
🪛 golangci-lint (1.62.2)
controller/engine_controller.go
1969-1969: func (*EngineController).removeFailedReplica
is unused
(unused)
controller/volume_controller.go
3212-3212: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3225-3225: controller/volume_controller.go#L3225
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3212-3212: controller/volume_controller.go#L3212
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (5)
controller/monitor/upgrade_manager_monitor.go (1)
61-72
: 🛠️ Refactor suggestion
Improve error handling in the polling loop
In the Start
method, the error from m.run()
is logged but not propagated to the polling function. This means the monitor will continue running even after encountering errors that might require attention.
Consider modifying the polling function to return the error, allowing the loop to handle it appropriately:
func (m *DataEngineUpgradeManagerMonitor) Start() {
m.logger.Infof("Start monitoring dataEngineUpgradeManager %v with sync period %v", m.upgradeManagerName, m.syncPeriod)
if err := wait.PollUntilContextCancel(m.ctx, m.syncPeriod, true, func(context.Context) (bool, error) {
if err := m.run(struct{}{}); err != nil {
m.logger.WithError(err).Error("Error in upgrade monitor")
+ // Return the error to potentially stop the polling loop or trigger backoff
+ return false, err
}
return false, nil
}); err != nil {
if errors.Cause(err) == context.Canceled {
m.logger.Infof("Stopped monitoring dataEngineUpgradeManager %v due to context cancellation", m.upgradeManagerName)
} else {
m.logger.WithError(err).Error("Failed to start dataEngineUpgradeManager monitor")
}
}
m.logger.Infof("Stopped monitoring dataEngineUpgradeManager %v", m.upgradeManagerName)
}
Likely invalid or redundant comment.
controller/monitor/node_upgrade_monitor.go (2)
183-201: Proper error handling in node readiness check
The error handling and logic for checking node readiness in handleUpgradeStateInitializing are appropriately implemented and follow best practices.
535-575: Avoid deferring resource closures within loops to prevent resource leaks
Deferring engineClientProxy.Close() inside a loop postpones all Close() calls until the function returns, which can lead to resource exhaustion if the loop iterates many times.
Consider explicitly closing engineClientProxy at the end of each loop iteration:
engineClientProxy, err := engineapi.GetCompatibleClient(engine, engineCliClient, m.ds, m.logger, m.proxyConnCounter)
if err != nil {
return err
}
- defer engineClientProxy.Close()
snapLabels := map[string]string{types.GetLonghornLabelKey(types.LonghornLabelSnapshotForDataEngineLiveUpgrade): m.nodeUpgradeName}
_, err = engineClientProxy.SnapshotCreate(engine, m.upgradeManagerName+"-"+util.RandomID(), snapLabels, freezeFilesystem)
if err != nil {
return err
}
+ engineClientProxy.Close()
}
controller/upgrade_manager_controller.go (1)
135-139: Usage of maxRetries is appropriate
The variable maxRetries is correctly used and is defined in base_controller.go, following the project conventions.
controller/node_controller.go (1)
2187-2195: Correctly handle node schedulability during data engine upgrade
The updated logic in SetSchedulableCondition properly marks the node as unschedulable when a data engine upgrade is in progress, ensuring that scheduling decisions respect the upgrade state.
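As a minimal, hedged sketch of that gate (isNodeSchedulableForUpgrade is a hypothetical helper; the real SetSchedulableCondition also weighs cordon state and other conditions through the controller's existing helpers):
// A node stops accepting new replicas while its data engine upgrade is requested.
func isNodeSchedulableForUpgrade(node *longhorn.Node) bool {
	return !node.Spec.DataEngineUpgradeRequested
}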
controller/engine_controller.go
Outdated
func (ec *EngineController) findInstanceManagerAndIPs(obj interface{}) (im *longhorn.InstanceManager, initiatorIP string, targetIP string, err error) {
	e, ok := obj.(*longhorn.Engine)
	if !ok {
		return nil, "", "", fmt.Errorf("invalid object for engine: %v", obj)
	}

	initiatorIM, err := ec.ds.GetInstanceManagerByInstanceRO(obj, false)
	if err != nil {
		return nil, "", "", err
	}

	initiatorIP = initiatorIM.Status.IP
	targetIP = initiatorIM.Status.IP
	im = initiatorIM

	// Target, or called Target Instance, is on another node
	if e.Spec.TargetNodeID != "" {
		targetIM, err := ec.ds.GetInstanceManagerByInstanceRO(obj, true)
		if err != nil {
			return nil, "", "", err
		}

		targetIP = targetIM.Status.IP

		if !e.Status.TargetInstanceReplacementCreated && e.Status.CurrentTargetNodeID == "" {
			im = targetIM
		}
	}

	return im, initiatorIP, targetIP, nil
}
🛠️ Refactor suggestion
Add error handling for empty IP addresses in findInstanceManagerAndIPs
The method should validate that the IP addresses are not empty before returning them.
func (ec *EngineController) findInstanceManagerAndIPs(obj interface{}) (im *longhorn.InstanceManager, initiatorIP string, targetIP string, err error) {
// ...existing code...
initiatorIP = initiatorIM.Status.IP
+ if initiatorIP == "" {
+ return nil, "", "", fmt.Errorf("initiator instance manager %v has empty IP", initiatorIM.Name)
+ }
targetIP = initiatorIM.Status.IP
+ if targetIP == "" {
+ return nil, "", "", fmt.Errorf("target instance manager %v has empty IP", initiatorIM.Name)
+ }
im = initiatorIM
// ...rest of the code...
}
Committable suggestion skipped: line range outside the PR's diff.
// No need to care about the initiator and target ports if the engine is not being upgraded.
initiatorAddress := net.JoinHostPort(initiatorIP, strconv.Itoa(0))
targetAddress := net.JoinHostPort(targetIP, strconv.Itoa(e.Status.Port))
💡 Codebase verification
Based on the gathered information, I can now provide a final response:
Port validation is required for JoinHostPort usage
The current implementation has two issues:
- The initiator port is hardcoded to 0, which is not a valid port number for network connections
- The target port (e.Status.Port) is set from engineInstance.Status.PortStart without validation, and the test helper randomPort() generates values between 0-29999, some of which are in the invalid port range
Port numbers should be validated to be within the valid range (1-65535) before using them in net.JoinHostPort. This is especially important since:
- Port 0 is a special value used for dynamic port allocation
- Ports below 1024 are privileged ports
- The current implementation may generate invalid port numbers
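A self-contained sketch of such a guard; validatePort is a hypothetical helper, not part of the PR, and the address values are made up for illustration:
package main

import (
	"fmt"
	"net"
	"strconv"
)

// validatePort rejects out-of-range ports before they reach net.JoinHostPort.
func validatePort(port int) error {
	if port <= 0 || port > 65535 {
		return fmt.Errorf("invalid port %d: must be in range 1-65535", port)
	}
	return nil
}

func main() {
	port := 10015
	if err := validatePort(port); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(net.JoinHostPort("10.42.0.5", strconv.Itoa(port))) // 10.42.0.5:10015
}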
🔗 Analysis chain
Verify port values before using them in JoinHostPort
The port handling logic should validate the port values before using them in JoinHostPort to prevent potential issues.
Let me gather more information about the port validation in the codebase.
Let me gather more information about how the port value is set and validated.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential port validation issues in the codebase
# Search for JoinHostPort usage without port validation
rg "net\.JoinHostPort.*strconv\.Itoa.*Port" -A 2 -B 2
Length of output: 2783
Script:
#!/bin/bash
# Check how e.Status.Port is set and validated
ast-grep --pattern 'e.Status.Port = $_'
# Check for port validation functions/methods in the codebase
rg -B 2 -A 2 'func.*(ValidatePort|IsValidPort|CheckPort)'
# Check the struct definition to understand the type of Port
ast-grep --pattern 'type $_ struct {
$$$
Port $_
$$$
}'
Length of output: 66019
Script:
#!/bin/bash
# Check how e.Status.Port is set and where it comes from
rg -B 3 -A 3 "e\.Status\.Port\s*=" --type go
# Check for port validation or range checks
rg -B 3 -A 3 "Port.*range|validatePort|checkPort" --type go
# Look for the randomPort function implementation since it's used in tests
ast-grep --pattern 'func randomPort() $_'
Length of output: 22186
// if types.IsDataEngineV2(v.Spec.DataEngine) {
// 	return true
// }
Uncomment v2 data engine replica cleanup code
The commented code for v2 data engine replica cleanup should be uncommented and properly implemented.
-// TODO: Remove it once we can reuse failed replicas during v2 rebuilding
-// if types.IsDataEngineV2(v.Spec.DataEngine) {
-// return true
-// }
+if types.IsDataEngineV2(v.Spec.DataEngine) {
+ return true
+}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
-// if types.IsDataEngineV2(v.Spec.DataEngine) {
-// 	return true
-// }
+if types.IsDataEngineV2(v.Spec.DataEngine) {
+	return true
+}
Force-pushed from 2218478 to ad84f31 (Compare)
Actionable comments posted: 5
🧹 Outside diff range and nitpick comments (18)
controller/instance_handler_test.go (2)
136-140: Add test cases for target port configuration
The test engine creation now includes target-related fields, but there are no test cases verifying their behavior.
Consider adding test cases that verify:
- Target port configuration for different engine versions
- Storage IP and target IP combinations
- Edge cases when IPs are not set
75-89: Add test coverage for new mock methods
The new mock methods (SuspendInstance, ResumeInstance, SwitchOverTarget, DeleteTarget) always return errors. Consider adding configurable behavior for testing different scenarios. Example implementation:
+type MockInstanceManagerHandler struct {
+	shouldFailSuspend    bool
+	shouldFailResume     bool
+	shouldFailSwitchOver bool
+	shouldFailDelete     bool
+}
 func (imh *MockInstanceManagerHandler) SuspendInstance(obj interface{}) error {
-	return fmt.Errorf("SuspendInstance is not mocked")
+	if imh.shouldFailSuspend {
+		return fmt.Errorf("SuspendInstance failed")
+	}
+	return nil
 }
controller/replica_controller.go (1)
584-597: Enhance error handling and logging for data engine upgrade path
The data engine upgrade path could benefit from additional error handling and logging:
- Consider adding debug logs before and after the engine instance removal
- Add context to the error message when upgrade is requested
Apply this diff to improve error handling and logging:
 if types.IsDataEngineV2(r.Spec.DataEngine) && r.Spec.FailedAt != "" {
 	upgradeRequested, err := rc.ds.IsNodeDataEngineUpgradeRequested(r.Spec.NodeID)
 	if err != nil {
 		return err
 	}
 	if upgradeRequested {
+		log.Infof("Node %v has requested data engine upgrade", r.Spec.NodeID)
 		log.Infof("Deleting failed replica instance %v from its engine instance without cleanup since the node %v is requested to upgrade data engine", r.Name, r.Spec.NodeID)
 		err = rc.removeFailedReplicaInstanceFromEngineInstance(r)
 		if err != nil {
+			log.WithError(err).Errorf("Failed to remove replica instance during upgrade")
 			return errors.Wrapf(err, "failed to remove failed replica instance %v from engine instance", r.Name)
 		}
+		log.Infof("Successfully removed replica instance during upgrade")
 	}
 }
engineapi/instance_manager.go (1)
532-555: Add validation for empty addresses map
Consider adding validation for the input map and improving error messages.
Apply this diff to improve validation:
 func getReplicaAddresses(replicaAddresses map[string]string, initiatorAddress, targetAddress string) (map[string]string, error) {
+	if len(replicaAddresses) == 0 {
+		return nil, errors.New("empty replica addresses map")
+	}
+
 	initiatorIP, _, err := net.SplitHostPort(initiatorAddress)
 	if err != nil {
-		return nil, errors.New("invalid initiator address format")
+		return nil, errors.Wrapf(err, "invalid initiator address format: %s", initiatorAddress)
 	}
 	targetIP, _, err := net.SplitHostPort(targetAddress)
 	if err != nil {
-		return nil, errors.New("invalid target address format")
+		return nil, errors.Wrapf(err, "invalid target address format: %s", targetAddress)
 	}
 	addresses := make(map[string]string)
 	for name, addr := range replicaAddresses {
 		replicaIP, _, err := net.SplitHostPort(addr)
 		if err != nil {
-			return nil, errors.New("invalid replica address format")
+			return nil, errors.Wrapf(err, "invalid replica address format for %s: %s", name, addr)
 		}
 		if initiatorIP != targetIP && initiatorIP == replicaIP {
 			continue
 		}
 		addresses[name] = addr
 	}
 	return addresses, nil
 }
controller/backup_controller.go (1)
599-607: Add debug logging for responsibility check
Consider adding debug logging to help troubleshoot responsibility decisions.
Apply this diff to improve logging:
 node, err := bc.ds.GetNodeRO(bc.controllerID)
 if err != nil {
 	return false, err
 }
+log := bc.logger.WithField("node", bc.controllerID)
 if node.Spec.DataEngineUpgradeRequested {
+	log.Debug("Node has requested data engine upgrade, not responsible for backup")
 	return false, nil
 }
controller/volume_controller.go (5)
1007-1012: Improve v2 data engine replica image handling documentation
The code correctly handles the special case where v2 volume replicas can have different images from the volume. However, this logic would benefit from additional documentation explaining why this is necessary for v2 data engine.
Add a comment explaining the v2 data engine replica image handling:
 } else if r.Spec.Image != v.Status.CurrentImage {
+	// For a v2 volume, replica images can differ from the volume image since
+	// they use the instance manager image instead
 	if types.IsDataEngineV1(v.Spec.DataEngine) {
 		log.WithField("replica", r.Name).Warnf("Replica engine image %v is different from volume engine image %v, "+
 			"but replica spec.Active has been set", r.Spec.Image, v.Spec.Image)
 	}
1827-1836: Improve replica state management for v2 data engine
The code correctly handles replica state transitions differently for v1 and v2 data engines, but the logic could be more explicit.
Consider refactoring to make the state transition logic more explicit:
if r.Status.CurrentState == longhorn.InstanceStateStopped { - if types.IsDataEngineV1(e.Spec.DataEngine) { - if r.Spec.Image == v.Status.CurrentImage { - r.Spec.DesireState = longhorn.InstanceStateRunning - } - } else { - // For v2 volume, the image of replica is no need to be the same as the volume image - r.Spec.DesireState = longhorn.InstanceStateRunning - } + shouldStart := false + if types.IsDataEngineV1(e.Spec.DataEngine) { + shouldStart = r.Spec.Image == v.Status.CurrentImage + } else { + shouldStart = true // v2 replicas can start regardless of image + } + if shouldStart { + r.Spec.DesireState = longhorn.InstanceStateRunning + }
5090-5098: Improve volume upgrade status check documentation
The volume upgrade status check has been enhanced to handle v2 data engine, but could use better documentation.
Add documentation to clarify the upgrade status conditions:
+// isVolumeUpgrading checks if a volume is being upgraded based on: +// 1. For v1 data engine: image mismatch between spec and status +// 2. For v2 data engine: image mismatch or target node change func isVolumeUpgrading(v *longhorn.Volume) bool { imageNotUpdated := v.Status.CurrentImage != v.Spec.Image if types.IsDataEngineV1(v.Spec.DataEngine) { return imageNotUpdated } return imageNotUpdated || v.Spec.TargetNodeID != v.Status.CurrentTargetNodeID }
3261-3285: Improve error handling in detached volume upgrade
The detached volume upgrade handling looks good but could use more robust error handling.
Consider adding error handling for edge cases:
func (c *VolumeController) handleDetachedVolumeUpgrade(v *longhorn.Volume, e *longhorn.Engine, rs map[string]*longhorn.Replica, log logrus.FieldLogger) error { + if e == nil { + return fmt.Errorf("cannot handle detached volume upgrade: engine is nil") + } if e.Spec.Image != v.Spec.Image { e.Spec.Image = v.Spec.Image e.Spec.UpgradedReplicaAddressMap = map[string]string{} e.Spec.TargetNodeID = "" }
3305-3331: Enhance volume eligibility checks for live upgrade
The volume eligibility checks for live upgrade are comprehensive but could use better error reporting.
Consider enhancing error reporting:
func isVolumeNotEligibleForLiveUpgrade(v *longhorn.Volume) bool { + reasons := []string{} if v.Status.State != longhorn.VolumeStateAttached { + reasons = append(reasons, "volume not attached") return true } if types.IsDataEngineV1(v.Spec.DataEngine) { if v.Status.Robustness != longhorn.VolumeRobustnessHealthy { + reasons = append(reasons, "v1 volume not healthy") return true } }controller/instance_handler.go (2)
722-778: Add validation for target instance creation
The target instance creation logic needs additional validation to ensure proper initialization and prevent potential race conditions.
Consider adding validation before target instance creation:
if targetInstanceRequired { + // Validate target node readiness + if err := h.validateTargetNodeReadiness(obj); err != nil { + return errors.Wrapf(err, "target node not ready for instance %v", instanceName) + } logrus.Infof("Creating target instance %v", instanceName) if _, err := h.instanceManagerHandler.CreateInstance(obj, true); err != nil {
867-1001: Consider adding unit tests for new helper functions
The new helper functions for v2 data engine upgrade handling lack unit test coverage.
Consider adding unit tests for:
- isV2DataEngineBeingUpgraded
- isVolumeBeingSwitchedBack
- isTargetInstanceReplacementCreated
- isTargetInstanceRemote
types/types.go (1)
1271-1273: Add validation for node ID in name generation
The node ID parameter should be validated before use in name generation.
func GenerateNodeDataEngineUpgradeName(prefix, nodeID string) string { + if nodeID == "" { + return prefix + "-" + util.RandomID() + } return prefix + "-" + nodeID + "-" + util.RandomID() }controller/uninstall_controller.go (1)
1190-1256: Add logging for deletion progress
Consider adding more detailed logging to track deletion progress and any potential issues.
func (c *UninstallController) deleteDataEngineUpgradeManagers(...) (err error) { + c.logger.Info("Starting deletion of data engine upgrade managers") defer func() { + if err != nil { + c.logger.WithError(err).Error("Failed to delete data engine upgrade managers") + } else { + c.logger.Info("Successfully deleted data engine upgrade managers") + } err = errors.Wrapf(err, "failed to delete dataEngineUpgradeManagers") }()controller/monitor/node_upgrade_monitor.go (3)
491-496: Add structured logging for better observability
The logging could be improved with structured fields for better filtering and debugging.
log.WithFields(logrus.Fields{ "instanceManager": im.Name, "instanceManagerImage": im.Spec.Image, "initiatorIP": initiatorIP, "targetIP": targetIP, + "volumeName": e.Spec.VolumeName, + "dataEngine": e.Spec.DataEngine, }).Info("Creating engine instance")
634-644: Improve error handling for target deletion
The error handling for target deletion could be improved to avoid potential stale error variable and make the logic more explicit.
if e.Status.CurrentTargetNodeID != "" { err = c.EngineInstanceDeleteTarget(&engineapi.EngineInstanceDeleteTargetRequest{ Engine: e, }) if err != nil { if !types.ErrorIsNotFound(err) { - return errors.Wrapf(err, "failed to delete target for engine %v", e.Name) + return fmt.Errorf("failed to delete target instance: %v", err) } - log.WithError(err).Warnf("Failed to delete target for engine %v", e.Name) + // Log the not found error but continue + log.Warnf("Target instance not found during deletion: %v", err) } }
Line range hint 1969-1989: Remove unused function removeFailedReplica
The function removeFailedReplica is not used anywhere in the codebase.
Consider removing this unused function to improve code maintainability, or if it's intended for future use, add a TODO comment explaining its purpose.
Line range hint
2588-2645
: Improve node delinquency handlingThe node delinquency handling could be improved by adding retries and better error handling.
isOwnerNodeDelinquent, err := ec.ds.IsNodeDelinquent(e.Status.OwnerID, e.Spec.VolumeName) if err != nil { - return false, err + return false, errors.Wrapf(err, "failed to check delinquency for owner node %v", e.Status.OwnerID) } isSpecNodeDelinquent, err := ec.ds.IsNodeDelinquent(e.Spec.NodeID, e.Spec.VolumeName) if err != nil { - return false, err + return false, errors.Wrapf(err, "failed to check delinquency for spec node %v", e.Spec.NodeID) }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (8)
go.mod is excluded by !go.mod
go.sum is excluded by !**/*.sum, !go.sum
vendor/github.com/longhorn/go-spdk-helper/pkg/types/types.go is excluded by !vendor/**
vendor/github.com/longhorn/longhorn-instance-manager/pkg/api/instance.go is excluded by !vendor/**
vendor/github.com/longhorn/longhorn-instance-manager/pkg/client/proxy_replica.go is excluded by !vendor/**
vendor/github.com/longhorn/types/pkg/generated/imrpc/instance.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
vendor/github.com/longhorn/types/pkg/generated/spdkrpc/spdk.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (56)
controller/backup_controller.go (1 hunks)
controller/controller_manager.go (3 hunks)
controller/engine_controller.go (15 hunks)
controller/instance_handler.go (15 hunks)
controller/instance_handler_test.go (4 hunks)
controller/instance_manager_controller.go (1 hunks)
controller/monitor/node_upgrade_monitor.go (1 hunks)
controller/monitor/upgrade_manager_monitor.go (1 hunks)
controller/node_controller.go (3 hunks)
controller/node_upgrade_controller.go (1 hunks)
controller/replica_controller.go (10 hunks)
controller/uninstall_controller.go (4 hunks)
controller/upgrade_manager_controller.go (1 hunks)
controller/utils.go (0 hunks)
controller/volume_controller.go (17 hunks)
controller/volume_controller_test.go (1 hunks)
datastore/datastore.go (3 hunks)
datastore/longhorn.go (5 hunks)
engineapi/engine.go (1 hunks)
engineapi/enginesim.go (1 hunks)
engineapi/enginesim_test.go (1 hunks)
engineapi/instance_manager.go (5 hunks)
engineapi/instance_manager_test.go (1 hunks)
engineapi/proxy_replica.go (1 hunks)
engineapi/types.go (1 hunks)
k8s/crds.yaml (84 hunks)
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/apis/longhorn/v1beta2/instancemanager.go (6 hunks)
k8s/pkg/apis/longhorn/v1beta2/node.go (2 hunks)
k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
k8s/pkg/apis/longhorn/v1beta2/register.go (1 hunks)
k8s/pkg/apis/longhorn/v1beta2/volume.go (2 hunks)
k8s/pkg/apis/longhorn/v1beta2/zz_generated.deepcopy.go (4 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go (2 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go (1 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go (2 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go (3 hunks)
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
k8s/pkg/client/informers/externalversions/generic.go (2 hunks)
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go (4 hunks)
k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (1 hunks)
k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go (2 hunks)
k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go (1 hunks)
scheduler/replica_scheduler.go (1 hunks)
types/types.go (4 hunks)
webhook/resources/dataengineupgrademanager/mutator.go (1 hunks)
webhook/resources/dataengineupgrademanager/validator.go (1 hunks)
webhook/resources/nodedataengineupgrade/mutator.go (1 hunks)
webhook/resources/nodedataengineupgrade/validator.go (1 hunks)
webhook/resources/volume/validator.go (5 hunks)
webhook/server/mutation.go (2 hunks)
webhook/server/validation.go (2 hunks)
💤 Files with no reviewable changes (1)
- controller/utils.go
✅ Files skipped from review due to trivial changes (2)
- k8s/pkg/client/listers/longhorn/v1beta2/nodedataengineupgrade.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/nodedataengineupgrade.go
🚧 Files skipped from review as they are similar to previous changes (29)
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/generated_expansion.go
- k8s/pkg/apis/longhorn/v1beta2/node.go
- controller/controller_manager.go
- k8s/pkg/apis/longhorn/v1beta2/register.go
- engineapi/types.go
- engineapi/proxy_replica.go
- webhook/server/validation.go
- engineapi/enginesim.go
- k8s/pkg/apis/longhorn/v1beta2/volume.go
- engineapi/enginesim_test.go
- engineapi/engine.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_longhorn_client.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/interface.go
- webhook/server/mutation.go
- controller/instance_manager_controller.go
- webhook/resources/volume/validator.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/longhorn_client.go
- webhook/resources/dataengineupgrademanager/mutator.go
- scheduler/replica_scheduler.go
- datastore/datastore.go
- engineapi/instance_manager_test.go
- k8s/pkg/apis/longhorn/v1beta2/instancemanager.go
- k8s/pkg/client/informers/externalversions/longhorn/v1beta2/dataengineupgrademanager.go
- webhook/resources/nodedataengineupgrade/validator.go
- k8s/pkg/apis/longhorn/v1beta2/nodedataengineupgrade.go
- controller/monitor/upgrade_manager_monitor.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/dataengineupgrademanager.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_dataengineupgrademanager.go
- k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/fake/fake_nodedataengineupgrade.go
🧰 Additional context used
📓 Learnings (7)
webhook/resources/dataengineupgrademanager/validator.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: webhook/resources/dataengineupgrademanager/validator.go:41-53
Timestamp: 2024-11-26T00:01:13.256Z
Learning: In `webhook/resources/dataengineupgrademanager/validator.go` of the Longhorn project, when validating `DataEngineUpgradeManager` creation, the `upgradeManager.Spec.Nodes` field can be empty, indicating that all available nodes should be upgraded.
controller/engine_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/engine_controller.go:524-527
Timestamp: 2024-11-25T12:39:58.926Z
Learning: In `controller/engine_controller.go`, `e.Status.Port` is sourced from the SPDK engine and does not require additional validation.
webhook/resources/nodedataengineupgrade/mutator.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: webhook/resources/nodedataengineupgrade/mutator.go:30-41
Timestamp: 2024-11-26T00:00:10.274Z
Learning: For the `NodeDataEngineUpgrade` mutator, only the `CREATE` operation is required; `UPDATE` is not used.
controller/upgrade_manager_controller.go (2)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/upgrade_manager_controller.go:135-139
Timestamp: 2024-11-25T15:30:18.858Z
Learning: In the Longhorn Manager codebase, the variable `maxRetries` is defined in `base_controller.go`, and other controllers can use it without redefining it.
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/node_upgrade_controller.go:135-139
Timestamp: 2024-11-26T00:03:56.311Z
Learning: In the Longhorn project, the constant `maxRetries` is defined in `base_controller.go` and should not be redefined in individual controller files.
controller/node_upgrade_controller.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/node_upgrade_controller.go:135-139
Timestamp: 2024-11-26T00:03:56.311Z
Learning: In the Longhorn project, the constant `maxRetries` is defined in `base_controller.go` and should not be redefined in individual controller files.
controller/instance_handler.go (1)
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/instance_handler.go:919-939
Timestamp: 2024-11-25T23:56:53.252Z
Learning: In the Longhorn Manager Go code, the function `engineapi.NewInstanceManagerClient` does not accept a `context.Context` parameter. Therefore, we cannot pass `ctx` to it in functions like `isInstanceExist` in `controller/instance_handler.go`.
controller/monitor/node_upgrade_monitor.go (2)
Learnt from: james-munson
PR: longhorn/longhorn-manager#3211
File: app/post_upgrade.go:102-113
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In Go, when a deferred function references a variable like `err`, ensure that the variable is declared in the outer scope and not within an inner scope (such as within `if err := ...`), to prevent compilation errors and unintended variable shadowing.
Learnt from: derekbit
PR: longhorn/longhorn-manager#3282
File: controller/monitor/node_upgrade_monitor.go:351-357
Timestamp: 2024-11-25T23:55:02.080Z
Learning: `GetVolumeRO` guarantees that `volume` is non-nil when `err == nil`, so explicit nil checks after error handling are not needed.
🪛 golangci-lint (1.62.2)
controller/engine_controller.go
1969-1969: func (*EngineController).removeFailedReplica
is unused
(unused)
controller/volume_controller.go
3212-3212: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
controller/volume_controller.go
[notice] 3225-3225: controller/volume_controller.go#L3225
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3212-3212: controller/volume_controller.go#L3212
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
🔇 Additional comments (44)
k8s/pkg/apis/longhorn/v1beta2/dataengineupgrademanager.go (4)
6-15
: Structure of DataEngineUpgradeManagerSpec
is well-defined and clear
The fields DataEngine
and Nodes
are appropriately documented and annotated. The usage of Kubernetes builder annotations ensures proper validation and code generation.
18-23
: UpgradeNodeStatus
struct is concise and correctly structured
The fields State
and Message
adequately capture the status of individual node upgrades. Optional annotations are correctly used.
26-39
: DataEngineUpgradeManagerStatus
fields cover necessary status information
The status struct includes all relevant fields to monitor the upgrade process, such as State
, Message
, UpgradingNode
, and UpgradeNodes
. The use of a map for UpgradeNodes
allows for efficient tracking of each node's status.
41-66
: Kubernetes resource annotations are properly defined for CRD generation
The annotations provide helpful metadata, including short names and print columns, which enhance the usability of the resource when interacting with Kubernetes tools.
webhook/resources/dataengineupgrademanager/validator.go (4)
47-54
: Ensure cluster size validation aligns with upgrade requirements
The check for at least two nodes in the cluster is appropriate for live upgrades. Confirm that this constraint is necessary and documented, as single-node clusters are common in some environments.
If single-node clusters are expected to support live upgrades in the future, consider handling this scenario or providing guidance to users.
56-59
: Validation of DataEngine
field is consistent
Ensuring that only supported data engines are accepted maintains system integrity. The error message appropriately informs the user when an unsupported data engine is specified.
79-81
: Immutability of spec.DataEngine
field is correctly enforced
The validator correctly prevents changes to the DataEngine
field during updates, ensuring consistency of the upgrade process.
83-85
: Immutability of spec.Nodes
field is correctly enforced
By preventing changes to the Nodes
field during updates, the validator ensures that the scope of the upgrade remains consistent throughout the process.
k8s/pkg/client/listers/longhorn/v1beta2/dataengineupgrademanager.go (1)
17-17
: Auto-generated code—no review necessary
This file is auto-generated by lister-gen and should not be manually edited. It correctly provides the lister interfaces for DataEngineUpgradeManager
.
datastore/longhorn.go (6)
5641-5663
: CreateDataEngineUpgradeManager
method follows established patterns
The creation method for DataEngineUpgradeManager
is consistent with existing resource creation functions. It includes verification steps to ensure the resource is properly created.
5666-5678
: Accessor methods for DataEngineUpgradeManager
are appropriately implemented
The GetDataEngineUpgradeManager
and GetDataEngineUpgradeManagerRO
methods provide both mutable and read-only access, following the datastore's conventions.
5680-5690
: UpdateDataEngineUpgradeManager
method correctly updates the resource
The update function ensures that changes to the DataEngineUpgradeManager
are saved and verified, maintaining data integrity.
5692-5702
: Status update method for DataEngineUpgradeManager
is properly defined
Updating the status subresource separately aligns with Kubernetes best practices, allowing for controlled updates of the resource's state.
5753-5763
: Owner references for DataEngineUpgradeManager
are correctly set
Setting the owner references ensures proper garbage collection and resource hierarchy within Kubernetes.
5997-6005
: IsNodeDataEngineUpgradeRequested
method correctly retrieves upgrade state
The function accurately checks whether a data engine upgrade has been requested for a specific node.
webhook/resources/nodedataengineupgrade/mutator.go (2)
43-49: Nil check added for newObj in the Create method
The addition of a nil check for newObj enhances robustness by preventing potential nil pointer dereferences.
51-78: Mutation logic correctly applies labels and finalizers
The mutate function adds the necessary labels and finalizer to the NodeDataEngineUpgrade resource, ensuring proper identification and lifecycle management.
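A self-contained sketch of this mutation pattern: compute JSON patch operations that set labels and append a finalizer. The label key and the "longhorn.io" finalizer string are assumptions for illustration, not the webhook's exact constants.

package main

import (
    "encoding/json"
    "fmt"
)

// buildMutationPatch returns RFC 6902 patch operations that merge the assumed
// node label into the existing labels and add the finalizer if it is missing.
func buildMutationPatch(labels map[string]string, finalizers []string, nodeID string) ([]string, error) {
    merged := map[string]string{}
    for k, v := range labels {
        merged[k] = v
    }
    merged["longhorn.io/node"] = nodeID // assumed label key
    labelBytes, err := json.Marshal(merged)
    if err != nil {
        return nil, err
    }
    patch := []string{
        fmt.Sprintf(`{"op": "replace", "path": "/metadata/labels", "value": %s}`, labelBytes),
    }

    for _, f := range finalizers {
        if f == "longhorn.io" {
            return patch, nil // finalizer already present
        }
    }
    finalizerBytes, err := json.Marshal(append(finalizers, "longhorn.io"))
    if err != nil {
        return nil, err
    }
    return append(patch, fmt.Sprintf(`{"op": "replace", "path": "/metadata/finalizers", "value": %s}`, finalizerBytes)), nil
}

func main() {
    ops, _ := buildMutationPatch(map[string]string{}, nil, "node-1")
    for _, op := range ops {
        fmt.Println(op)
    }
}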
k8s/pkg/client/listers/longhorn/v1beta2/expansion_generated.go (1)
77-83: LGTM: New expansion interfaces follow Kubernetes conventions
The new expansion interfaces for the DataEngineUpgradeManager and NodeDataEngineUpgrade resources are correctly structured, following standard Kubernetes code generation patterns.
Also applies to: 117-123
k8s/pkg/client/clientset/versioned/typed/longhorn/v1beta2/nodedataengineupgrade.go (4)
33-51: LGTM: NodeDataEngineUpgrade client interface is complete
The interface correctly defines all necessary CRUD operations with proper context and options handling.
67-78: LGTM: Read operations are properly implemented
The Get, List, and Watch operations are correctly implemented with proper timeout handling and parameter versioning.
Also applies to: 80-95, 97-110
112-123: LGTM: Write operations are properly implemented
The Create, Update, and UpdateStatus operations are correctly implemented with proper error handling and result processing.
Also applies to: 125-137, 139-153
155-164: LGTM: Delete and Patch operations are properly implemented
The Delete, DeleteCollection, and Patch operations are correctly implemented with proper error handling and timeout management.
Also applies to: 166-180, 182-195
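A usage sketch (not taken from the PR) showing how a generated typed client of this shape is usually consumed; the clientset import path, the namespace, and the assumption that the status carries a State field are illustrative only.

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/tools/clientcmd"

    // Assumed module path based on the repository layout.
    lhclientset "github.com/longhorn/longhorn-manager/k8s/pkg/client/clientset/versioned"
)

func main() {
    config, err := clientcmd.BuildConfigFromFlags("", "/root/.kube/config")
    if err != nil {
        panic(err)
    }
    client, err := lhclientset.NewForConfig(config)
    if err != nil {
        panic(err)
    }
    // List the NodeDataEngineUpgrade resources in the Longhorn namespace and
    // print the upgrade state reported in each status.
    upgrades, err := client.LonghornV1beta2().NodeDataEngineUpgrades("longhorn-system").List(context.TODO(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, u := range upgrades.Items {
        fmt.Printf("%s: %v\n", u.Name, u.Status.State)
    }
}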
k8s/pkg/client/informers/externalversions/generic.go (1)
103-104: LGTM: New resource types are properly integrated
The dataengineupgrademanagers and nodedataengineupgrades resources are correctly added to the v1beta2 group with proper informer factory methods.
Also applies to: 113-114
controller/upgrade_manager_controller.go (4)
135-139: 🛠️ Refactor suggestion
Use uc.maxRetries instead of the undefined maxRetries
The variable maxRetries is used but not defined. Based on the learnings, this constant is defined in base_controller.go.
Apply this diff to use the correct variable:
-if uc.queue.NumRequeues(key) < maxRetries {
+if uc.queue.NumRequeues(key) < uc.maxRetries {
Likely invalid or redundant comment.
202-209: 🛠️ Refactor suggestion
Verify cleanup before removing finalizer
The finalizer removal should verify that all resources are properly cleaned up.
Add cleanup verification:
if !upgradeManager.DeletionTimestamp.IsZero() {
+ // Ensure all resources are cleaned up
+ if err := uc.verifyCleanup(upgradeManager); err != nil {
+ return err
+ }
if uc.dataEngineUpgradeManagerMonitor != nil {
uc.dataEngineUpgradeManagerMonitor.Close()
uc.dataEngineUpgradeManagerMonitor = nil
}
return uc.ds.RemoveFinalizerForDataEngineUpgradeManager(upgradeManager)
}
Likely invalid or redundant comment.
267-279: Add synchronization for monitor access
The dataEngineUpgradeManagerMonitor field is accessed from multiple goroutines without synchronization, which could lead to race conditions.
Add mutex protection:
type DataEngineUpgradeManagerController struct {
*baseController
+ monitorLock sync.RWMutex
// ... other fields ...
dataEngineUpgradeManagerMonitor monitor.Monitor
// ... remaining fields ...
}
func (uc *DataEngineUpgradeManagerController) createDataEngineUpgradeManagerMonitor(upgradeManager *longhorn.DataEngineUpgradeManager) (monitor.Monitor, error) {
+ uc.monitorLock.Lock()
+ defer uc.monitorLock.Unlock()
if uc.dataEngineUpgradeManagerMonitor != nil {
return uc.dataEngineUpgradeManagerMonitor, nil
}
Likely invalid or redundant comment.
233-254: Add nil check for monitor data
The code doesn't check whether the data returned from GetCollectedData is nil before the type assertion, which could lead to a panic.
Apply this diff to add proper nil checks:
if uc.dataEngineUpgradeManagerMonitor != nil {
data, err := uc.dataEngineUpgradeManagerMonitor.GetCollectedData()
if err != nil {
log.WithError(err).Error("Failed to get data from dataEngineUpgradeManager monitor")
- } else {
+ } else if data == nil {
+ log.Error("Received nil data from dataEngineUpgradeManager monitor")
+ } else {
status, ok := data.(*longhorn.DataEngineUpgradeManagerStatus)
if !ok {
log.Errorf("Failed to assert value from dataEngineUpgradeManager monitor: %v", data)
Likely invalid or redundant comment.
controller/replica_controller.go (1)
661-684: Implement empty interface methods
The newly added methods (SuspendInstance, ResumeInstance, SwitchOverTarget, DeleteTarget, RequireRemoteTargetInstance, IsEngine) have empty implementations. These methods are part of the controller interface and should be properly implemented.
engineapi/instance_manager.go (2)
283-292: LGTM: Instance process status fields properly updated
The new fields for target port management are correctly added to the instance process status.
864-951: LGTM: Well-structured instance management methods
The new instance management methods are well structured, with:
- Proper nil checks for the engine
- Clear error messages
- Consistent handling of different data engine types
controller/volume_controller.go (3)
5073-5075: Uncomment v2 data engine replica cleanup code
The commented code for v2 data engine replica cleanup should be uncommented and properly implemented.
-// TODO: Remove it once we can reuse failed replicas during v2 rebuilding
-// if types.IsDataEngineV2(v.Spec.DataEngine) {
-// return true
-// }
+if types.IsDataEngineV2(v.Spec.DataEngine) {
+ return true
+}
Likely invalid or redundant comment.
3212-3214: Handle engine state error during v2 data engine live upgrade
The TODO comment indicates missing error handling for a non-running engine state during the v2 data engine live upgrade.
Apply this fix to handle error states:
} else {
- // TODO: what if e.Status.CurrentState != longhorn.InstanceStateRunning
+ if e.Status.CurrentState == longhorn.InstanceStateError {
+ log.Errorf("Engine entered error state during v2 data engine live upgrade")
+ return fmt.Errorf("engine in error state during live upgrade")
+ }
+ if e.Status.CurrentState != longhorn.InstanceStateRunning {
+ log.Debugf("Engine is in %v state, waiting for running state", e.Status.CurrentState)
+ }
}
Likely invalid or redundant comment.
🧰 Tools
🪛 golangci-lint (1.62.2)
3212-3212: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
[notice] 3212-3212: controller/volume_controller.go#L3212
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
3184-3252: Verify v2 data engine live upgrade implementation
The v2 data engine live upgrade implementation looks correct but requires careful verification of the upgrade flow.
Run the following script to verify the upgrade flow:
✅ Verification successful
Let me gather more information about the upgrade implementation to verify the flow comprehensively.
Let me gather more specific information about the v2 data engine live upgrade completion check and state transitions.
Based on the gathered information, I can now provide a final response about the v2 data engine live upgrade implementation.
The v2 data engine live upgrade implementation is correctly handling state transitions and node changes
The implementation follows a safe upgrade sequence:
- Validates target node changes and prevents invalid transitions
- Suspends the engine before target node migration
- Handles replica address mapping during the transition
- Resumes the engine only after successful target migration
- Updates volume status to reflect the new target node
The code includes proper state checks and logging for each transition phase, making it robust and maintainable.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify v2 data engine live upgrade implementation by checking:
# 1. Engine state transitions
# 2. Target node changes
# 3. Replica handling
# Check for any engine state transition issues during upgrades
rg "engine.*state.*during.*upgrade" -A 5
# Check for target node changes during upgrades
rg "TargetNodeID.*upgrade" -A 5
# Look for any error handling patterns
ast-grep --pattern 'if $_.Status.CurrentState == longhorn.InstanceStateError {
$$$
}'
Length of output: 2397
Script:
#!/bin/bash
# Check the implementation of isV2DataEngineLiveUpgradeCompleted
ast-grep --pattern 'func isV2DataEngineLiveUpgradeCompleted($_) {
$$$
}'
# Check for volume state validations during upgrade
rg "v2.*data.*engine.*upgrade" -A 5
# Check for replica address map construction during upgrade
ast-grep --pattern 'func ($_.*)constructReplicaAddressMap($_, $_, $_) {
$$$
}'
# Look for any upgrade-related error handling
rg "error.*upgrade" -A 3
Length of output: 34902
Script:
#!/bin/bash
# Check the implementation of isV2DataEngineLiveUpgradeCompleted
ast-grep --pattern 'func isV2DataEngineLiveUpgradeCompleted($v) {
$$$
}'
# Check for state transitions in volume controller
ast-grep --pattern 'func ($_.*)syncVolumeStatusWithEngineState($_, $_, $_) {
$$$
}'
# Check for volume status during upgrade
rg "CurrentState.*InstanceState(Running|Suspended)" -A 3 controller/volume_controller.go
# Check for volume target node handling
rg "CurrentTargetNodeID.*Spec.TargetNodeID" -A 3 controller/volume_controller.go
Length of output: 4294
🧰 Tools
🪛 golangci-lint (1.62.2)
3212-3212: SA9003: empty branch
(staticcheck)
🪛 GitHub Check: CodeFactor
[notice] 3225-3225: controller/volume_controller.go#L3225
If block ends with a return statement, so drop this else and outdent its block (move short variable declaration to its own line if necessary) (indent-error-flow)
[notice] 3212-3212: controller/volume_controller.go#L3212
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
controller/instance_handler.go (2)
58-165: 🛠️ Refactor suggestion
Enhance error handling in syncStatusIPsAndPorts
The error handling in the target instance manager client operations could be improved to ensure proper cleanup and resource management.
targetClient, err := engineapi.NewInstanceManagerClient(targetIM, false)
if err != nil {
+ if targetClient != nil {
+ targetClient.Close()
+ }
logrus.WithError(err).Errorf("Failed to get instance manager client for target instance manager %v", targetIM.Name)
return
}
defer targetClient.Close()
Likely invalid or redundant comment.
214-240: Verify node upgrade status handling
The error handling for node upgrade status checks needs improvement. Currently, it logs and returns on error, which could leave the instance in an inconsistent state.
Apply this diff to improve error handling:
upgradeRequested, err := h.ds.IsNodeDataEngineUpgradeRequested(spec.NodeID)
if err != nil {
- // TODO: should we return here or mark the instance as error?
- logrus.WithError(err).Errorf("Failed to check if node %v is being upgrade requested", spec.NodeID)
- return
+ logrus.WithError(err).Errorf("Failed to check if node %v is being upgrade requested, marking instance as error", spec.NodeID)
+ status.CurrentState = longhorn.InstanceStateError
+ return
}
Likely invalid or redundant comment.
types/types.go (2)
43-44: LGTM! New kind constants for data engine upgrade
The new constants follow the established naming convention and are properly defined.
163-164: LGTM! New label constants for data engine upgrade
The label constants are consistent with existing label naming patterns.
controller/uninstall_controller.go (2)
49-50: LGTM! New CRD name constants
The CRD name constants follow the established naming pattern.
653-665: LGTM! Resource cleanup for data engine upgrades
The cleanup logic for data engine upgrade resources is properly integrated into the deletion workflow.
controller/node_controller.go (1)
2183-2205: LGTM: Clean implementation of data engine upgrade scheduling
The changes correctly handle node scheduling during data engine upgrades by:
- Preserving the existing cordoned-node logic
- Adding data engine upgrade as a new reason to disable scheduling (see the sketch below)
- Using appropriate condition types and messages
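For illustration only, a minimal sketch of that scheduling gate: scheduling is disabled when the Kubernetes node is cordoned or a data engine upgrade has been requested. The condition type and reason strings are assumptions, not Longhorn's actual constants.

package main

import "fmt"

type nodeCondition struct {
    Type    string
    Status  string
    Reason  string
    Message string
}

// schedulableCondition mirrors the two disable-scheduling reasons described
// above, preferring the cordoned case when both apply.
func schedulableCondition(nodeName string, cordoned, upgradeRequested bool) nodeCondition {
    switch {
    case cordoned:
        return nodeCondition{
            Type:    "Schedulable",
            Status:  "False",
            Reason:  "KubernetesNodeCordoned",
            Message: fmt.Sprintf("Node %v is cordoned", nodeName),
        }
    case upgradeRequested:
        return nodeCondition{
            Type:    "Schedulable",
            Status:  "False",
            Reason:  "NodeDataEngineUpgradeRequested",
            Message: fmt.Sprintf("Node %v is upgrading its data engine and is temporarily unschedulable", nodeName),
        }
    default:
        return nodeCondition{Type: "Schedulable", Status: "True"}
    }
}

func main() {
    fmt.Printf("%+v\n", schedulableCondition("node-1", false, true))
}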
controller/monitor/node_upgrade_monitor.go (3)
75-80: Improve error handling in polling loop
The error from m.run() is logged but not propagated to the polling function. This means the monitor will continue running even after encountering errors that might require attention.
Apply this diff to improve error handling:
if err := wait.PollUntilContextCancel(m.ctx, m.syncPeriod, true, func(context.Context) (bool, error) {
if err := m.run(struct{}{}); err != nil {
m.logger.WithError(err).Error("Stopped monitoring upgrade monitor")
+ // Return error to potentially trigger backoff or retry logic
+ return false, err
}
return false, nil
Likely invalid or redundant comment.
1044-1080: 🛠️ Refactor suggestion
Add backoff mechanism for error handling
The error handling in the monitor sync could benefit from a backoff mechanism to prevent rapid retries.
Add backoff for retries:
if types.IsDataEngineV2(engine.Spec.DataEngine) && err != nil && apierrors.IsNotFound(errors.Cause(err)) {
+ if m.backoff.IsInBackOffSinceUpdate(engine.Name, time.Now()) {
+ m.logger.Debugf("Engine %v is in backoff period, skipping sync", engine.Name)
+ return false
+ }
upgrading, upgradingCheckErr := m.ds.IsNodeDataEngineUpgradeRequested(engine.Spec.NodeID)
if upgrading {
updated, updatedCheckErr := m.isInstanceManagerUpdated(engine)
if updated {
// ... existing code ...
} else {
m.logger.Warnf("v2 data engine %v is being upgraded, will retry updating status later", engine.Name)
+ m.backoff.Next(engine.Name, time.Now())
return false
}
}
}
Likely invalid or redundant comment.
524-527: Validate port values before using them in JoinHostPort
The port handling logic should validate the port values before passing them to JoinHostPort to prevent potential issues.
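A small sketch of the kind of validation this comment asks for: reject out-of-range or unset ports before composing the address. The function and variable names are illustrative, not the monitor's actual code.

package main

import (
    "fmt"
    "net"
    "strconv"
)

// buildTargetAddress validates the port before composing "host:port".
func buildTargetAddress(ip string, port int32) (string, error) {
    if port <= 0 || port > 65535 {
        return "", fmt.Errorf("invalid target port %d for address %s", port, ip)
    }
    return net.JoinHostPort(ip, strconv.Itoa(int(port))), nil
}

func main() {
    addr, err := buildTargetAddress("10.42.0.15", 20001)
    if err != nil {
        panic(err)
    }
    fmt.Println(addr) // 10.42.0.15:20001
}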
k8s/crds.yaml (1)
Line range hint 1-4584: Skipping review as requested
As per @derekbit's request, skipping the review of k8s/crds.yaml.
if uc.nodeDataEngineUpgradeMonitor != nil {
    uc.updateNodeDataEngineUpgradeStatus(nodeUpgrade)
}
Add error handling for updateNodeDataEngineUpgradeStatus
The updateNodeDataEngineUpgradeStatus call is not checked for errors, which could mask issues during the upgrade process.
Add error handling:
if uc.nodeDataEngineUpgradeMonitor != nil {
- uc.updateNodeDataEngineUpgradeStatus(nodeUpgrade)
+ if err := uc.updateNodeDataEngineUpgradeStatus(nodeUpgrade); err != nil {
+ log.WithError(err).Error("Failed to update node data engine upgrade status")
+ return err
+ }
}
Committable suggestion skipped: line range outside the PR's diff.
e.Status.TargetIP = ""
e.Status.StorageIP = ""
e.Status.StorageTargetIP = ""
e.Status.Port = 0
e.Status.TargetPort = 0
💡 Codebase verification
Fields TargetIP, StorageTargetIP, and TargetPort are not defined in the data structures
The test is setting fields that are not defined in either the EngineStatus or InstanceStatus structs:
- TargetIP is not defined
- StorageTargetIP is not defined
- TargetPort is not defined (only Port exists)
The test should be updated to use only the fields that are actually defined in the data structures:
- IP and StorageIP for IP-related fields
- Port for port-related fields
🔗 Analysis chain
Verify field definitions match the data structures
The test is setting fields that may not be properly defined in the codebase:
TargetIP
StorageTargetIP
TargetPort
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify field definitions in engine and instance manager types
ast-grep --pattern $'type (EngineStatus|InstanceStatus) struct {
$$$
}' | grep -E 'TargetIP|StorageTargetIP|TargetPort'
Length of output: 122
Script:
#!/bin/bash
# Let's try a different approach to find the struct definitions
# First, let's find all files that might contain these types
fd -e go | xargs rg -l "type.*EngineStatus"
# Then let's examine the specific fields in the struct
rg "type\s+EngineStatus\s+struct" -A 20
Length of output: 4197
Script:
#!/bin/bash
# Let's check the InstanceStatus struct since EngineStatus embeds it
rg "type\s+InstanceStatus\s+struct" -A 15
Length of output: 7233
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *UpgradeNodeStatus) DeepCopyInto(out *UpgradeNodeStatus) {
    *out = *in
    return
}

// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new UpgradeNodeStatus.
func (in *UpgradeNodeStatus) DeepCopy() *UpgradeNodeStatus {
    if in == nil {
        return nil
    }
    out := new(UpgradeNodeStatus)
    in.DeepCopyInto(out)
    return out
}
DeepCopy implementation is incorrect for UpgradeNodeStatus
The current DeepCopy implementation is incomplete. The UpgradeNodeStatus struct contains a State field of type UpgradeState (string) and a Message field (string), but the current implementation only does a shallow copy. While strings are immutable in Go and a shallow copy is sufficient for them, the implementation should explicitly copy these fields for consistency and maintainability.
The correct implementation should be:
func (in *UpgradeNodeStatus) DeepCopyInto(out *UpgradeNodeStatus) {
*out = *in
out.State = in.State
out.Message = in.Message
}
// DeepCopyInto is an autogenerated deepcopy function, copying the receiver, writing into out. in must be non-nil.
func (in *VolumeUpgradeStatus) DeepCopyInto(out *VolumeUpgradeStatus) {
    *out = *in
    return
}

// DeepCopy is an autogenerated deepcopy function, copying the receiver, creating a new VolumeUpgradeStatus.
func (in *VolumeUpgradeStatus) DeepCopy() *VolumeUpgradeStatus {
    if in == nil {
        return nil
    }
    out := new(VolumeUpgradeStatus)
    in.DeepCopyInto(out)
    return out
}
The DeepCopy implementation is incorrect and needs to be fixed
The VolumeUpgradeStatus struct contains two fields, State of type UpgradeState and Message of type string, but the current DeepCopyInto implementation is just doing a shallow copy with *out = *in. While UpgradeState is a string type (which is safe for shallow copy), the implementation should still copy both fields explicitly for maintainability and consistency with other generated deepcopy code.
The correct implementation should be:
func (in *VolumeUpgradeStatus) DeepCopyInto(out *VolumeUpgradeStatus) {
*out = *in
out.State = in.State
out.Message = in.Message
}
controller/engine_controller.go
Outdated
// Check if the initiator instance is running
im, err := ec.ds.GetRunningInstanceManagerByNodeRO(e.Spec.NodeID, longhorn.DataEngineTypeV2)
if err != nil {
    return err
}
if im.Status.CurrentState != longhorn.InstanceManagerStateRunning {
    return fmt.Errorf("instance manager %v for initiating instance %v is not running", im.Name, e.Name)
}

initiatorIMClient, err := engineapi.NewInstanceManagerClient(im, false)
if err != nil {
    return err
}
defer initiatorIMClient.Close()

if _, err := initiatorIMClient.InstanceGet(e.Spec.DataEngine, e.Name, string(longhorn.InstanceManagerTypeEngine)); err != nil {
    return err
}

_, ok := im.Status.InstanceEngines[e.Name]
if !ok {
    return fmt.Errorf("initiator instance %v is not found in engine list", e.Name)
}

// Check whether the target instance is existing
im, err = ec.ds.GetRunningInstanceManagerByNodeRO(e.Spec.TargetNodeID, longhorn.DataEngineTypeV2)
if err != nil {
    return err
}
if im.Status.CurrentState != longhorn.InstanceManagerStateRunning {
    return fmt.Errorf("instance manager %v for target instance %v is not running", im.Name, e.Name)
}

targetIMClient, err := engineapi.NewInstanceManagerClient(im, false)
if err != nil {
    return err
}
defer targetIMClient.Close()

if _, err := targetIMClient.InstanceGet(e.Spec.DataEngine, e.Name, string(longhorn.InstanceManagerTypeEngine)); err != nil {
    return err
}

_, ok = im.Status.InstanceEngines[e.Name]
if !ok {
    return fmt.Errorf("target instance %v is not found in engine list", e.Name)
}
🛠️ Refactor suggestion
Add timeout handling for instance manager checks
The instance manager checks could potentially hang if the instance manager is in an inconsistent state.
Add timeout handling:
+ ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+ defer cancel()
// Check if the initiator instance is running
im, err := ec.ds.GetRunningInstanceManagerByNodeRO(e.Spec.NodeID, longhorn.DataEngineTypeV2)
if err != nil {
return err
}
- if im.Status.CurrentState != longhorn.InstanceManagerStateRunning {
+ if im.Status.CurrentState != longhorn.InstanceManagerStateRunning || !im.Status.APIMinVersion {
return fmt.Errorf("instance manager %v for initiating instance %v is not running", im.Name, e.Name)
}
- initiatorIMClient, err := engineapi.NewInstanceManagerClient(im, false)
+ initiatorIMClient, err := engineapi.NewInstanceManagerClientWithTimeout(im, false, ctx)
Committable suggestion skipped: line range outside the PR's diff.
force-pushed from 6b2aa53 to e797c80
Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
… and fields Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
…for backup resources Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
…urces Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
…tor and target IPs If initiatorIP is not equal to targetIP, exclude the replica on the node where the initiator is running, because the node is going to be upgraded and the instance-manager will be deleted. Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
This pull request is now in conflict. Could you fix it @derekbit? 🙏
Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
…equested data engine upgrade Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
Update replica controller codes due to the changes in APIs. Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
… changes in APIs Update replica scheduler codes due to the changes in APIs. Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
…s being lively upgraded Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
force-pushed from e797c80 to 77445fc
…de resources Only allow one active dataEngineUpgradeManager and one active nodeDataEngineUppgrade at the same time. Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
force-pushed from 5309a72 to e091c4f
@coderabbitai pause
✅ Actions performed: Reviews paused.
Longhorn 9104 Signed-off-by: Derek Su <[email protected]>
force-pushed from e091c4f to 480372f
This pull request is now in conflict. Could you fix it @derekbit? 🙏
Signed-off-by: Derek Su [email protected]
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9104
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context