Unexpected failover behavior in karmada #5309
-
Karmada deleted part of the resources in a member cluster when that cluster was down for about 2 minutes, and then recreated all of the resources after the cluster recovered. I tried to explain this with failover and to reproduce it, but failover and graceful eviction in Karmada require that:
Karmada wait at least failoverTimeout (5 min) + tolerationSeconds (5 min) before failover actually happens, which doesn't match the actual recovery time of about 2 minutes. After searching for quite a while I still cannot find a reasonable cause for this. Any suggestions?
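For reference, a minimal sketch of the timing argument above, using the values quoted in the report (assumed defaults for this illustration, not values read from a live Karmada install):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Values quoted in the report above (assumed, not read from a running cluster).
	failoverTimeout := 5 * time.Minute   // window before the cluster is considered failed
	tolerationSeconds := 5 * time.Minute // NoExecute toleration before eviction kicks in

	expectedEvictionDelay := failoverTimeout + tolerationSeconds // 10m0s
	observedOutage := 2 * time.Minute

	fmt.Printf("expected delay before eviction: %v\n", expectedEvictionDelay)
	fmt.Printf("observed outage: %v -> the normal failover path should not have fired\n", observedOutage)
}
```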
-
Found a possible explanation for this, related to karmada/pkg/controllers/status/cluster_status_controller.go Lines 440 to 471 in b4b6d69. In my case the cluster doesn't shut down right away but in a graceful way: it first rejects incoming connections and then closes existing ones. During that window it's possible for getAPIEnablements to get partial results, containing only some of the GVKs, and set them into the cluster status. This is equivalent to removing GVKs from the cluster manually. After the NoSchedule taint is added to the cluster, the scheduler immediately reschedules all resource bindings bound to it. Now the issue happens: because the unhealthy cluster no longer appears in the resource bindings, all the existing works are deleted, and so all workloads are gone once the cluster comes back alive. This also explains why only part of the resources were deleted in my case. Maybe worth a look @XiShanYongYe-Chang
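To illustrate the "partial results" part, here is a minimal sketch (not the actual Karmada code, and the kubeconfig path is purely illustrative) of the client-go discovery behavior that can produce an incomplete GVK list: ServerGroupsAndResources returns partial results together with an ErrGroupDiscoveryFailed error when some group versions fail, so a caller that ignores the error can persist an incomplete API enablement list into the cluster status.

```go
package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client against a member cluster's kubeconfig (path is illustrative).
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/member-kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client, err := discovery.NewDiscoveryClientForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// ServerGroupsAndResources can return *partial* results together with an error.
	// During a graceful API server shutdown some group versions may still answer
	// while others fail, which matches the situation described above.
	_, resources, err := client.ServerGroupsAndResources()
	if err != nil {
		if discovery.IsGroupDiscoveryFailedError(err) {
			// Partial discovery: do NOT treat `resources` as the complete set of
			// enabled GVKs, otherwise the cluster status would look as if GVKs
			// had been removed from the member cluster.
			log.Printf("partial discovery, keeping previous API enablements: %v", err)
			return
		}
		log.Fatal(err)
	}

	fmt.Printf("discovered %d resource lists\n", len(resources))
}
```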