
External etcd Support for Karmada Operator - Part 1 #5699

Merged 1 commit into karmada-io:master on Oct 19, 2024

Conversation

jabellard
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:
This is an implementation of this proposal for adding support to the Karmada operator for external etcd cluster connections.

Which issue(s) this PR fixes:
Fixes #5242

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

The new `SecretRef` field in the external etcd configuration can be used to reference a secret that 
contains the credentials for connecting to an external etcd cluster.
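For illustration, a minimal sketch of a Karmada resource using the new field. Only `secretRef.name` is taken from this PR's code; the API group and the `endpoints` field are assumptions about the operator's external etcd spec:

```yaml
apiVersion: operator.karmada.io/v1alpha1
kind: Karmada
metadata:
  name: demo                  # instance name; per-instance resources are prefixed with it
  namespace: karmada-system
spec:
  components:
    etcd:
      external:
        # endpoints of the external etcd cluster (illustrative field)
        endpoints:
          - https://etcd.example.com:2379
        # secret holding the client credentials; per this PR's validation,
        # the name must be <instance>-karmada-apiserver-etcd-client-cert
        secretRef:
          name: demo-karmada-apiserver-etcd-client-cert
```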

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 15, 2024
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 15, 2024
@codecov-commenter

codecov-commenter commented Oct 15, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 24 lines in your changes missing coverage. Please review.

Project coverage is 39.73%. Comparing base (47efa57) to head (b8de967).
Report is 49 commits behind head on master.

Files with missing lines                        Patch %   Lines
operator/pkg/controller/karmada/controller.go   0.00%     24 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5699      +/-   ##
==========================================
+ Coverage   38.51%   39.73%   +1.21%     
==========================================
  Files         649      650       +1     
  Lines       45133    55171   +10038     
==========================================
+ Hits        17383    21920    +4537     
- Misses      26418    31841    +5423     
- Partials     1332     1410      +78     
Flag        Coverage Δ
unittests   39.73% <0.00%> (+1.21%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@jabellard
Contributor Author

@RainbowMango , @chaosi-zju : I opened up this PR to split the work done in the original one per our discussion earlier at the community meeting.

To further the discussion we were having:
As of today, the operator has support for running multiple Karmada instances in the same namespace. We've been relying on that support to fulfill some of our use cases for a managed platform that we're building on top of Karmada at Bloomberg. That has been working very well for us. Some data from one of our management clusters:

Control Planes:

k -n cncs get karmadas 
NAME   READY   AGE
li     True    5d1h
yao    True    5d1h

Components:

k -n cncs get deployments                           
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
li-karmada-aggregated-apiserver       1/1     1            1           5d1h
li-karmada-apiserver                  1/1     1            1           5d1h
li-karmada-controller-manager         1/1     1            1           5d1h
li-karmada-kube-controller-manager    1/1     1            1           5d1h
li-karmada-metrics-adapter            2/2     2            2           5d1h
li-karmada-scheduler                  1/1     1            1           5d1h
li-karmada-search                     1/1     1            1           5d1h
li-karmada-webhook                    1/1     1            1           5d1h
yao-karmada-aggregated-apiserver      1/1     1            1           5d1h
yao-karmada-apiserver                 1/1     1            1           5d1h
yao-karmada-controller-manager        1/1     1            1           5d1h
yao-karmada-kube-controller-manager   1/1     1            1           5d1h
yao-karmada-metrics-adapter           2/2     2            2           5d1h
yao-karmada-scheduler                 1/1     1            1           5d1h
yao-karmada-search                    1/1     1            1           5d1h
yao-karmada-webhook                   1/1     1            1           5d1h

Secrets:

k -n cncs get secrets
NAME                                         TYPE     DATA   AGE
li-karmada-admin-config                      Opaque   1      5d1h
li-karmada-cert                              Opaque   16     5d1h
li-karmada-etcd-cert                         Opaque   6      5d1h
li-karmada-webhook-cert                      Opaque   2      5d1h
yao-karmada-admin-config                     Opaque   1      5d1h
yao-karmada-cert                             Opaque   16     5d1h
yao-karmada-etcd-cert                        Opaque   6      5d1h
yao-karmada-webhook-cert                     Opaque   2      5d1h

The current naming convention uses the Karmada instance name as a prefix for all resources associated with the instance.
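The prefixing convention above can be sketched as follows (a minimal illustration; the helper name is hypothetical, not the operator's actual code):

```go
package main

import "fmt"

// resourceName illustrates the operator's convention of prefixing every
// per-instance resource with the Karmada instance name (e.g. "li", "yao"),
// which is what keeps multiple instances in one namespace from colliding.
func resourceName(instance, suffix string) string {
	return fmt.Sprintf("%s-%s", instance, suffix)
}

func main() {
	fmt.Println(resourceName("li", "karmada-etcd-cert"))  // li-karmada-etcd-cert
	fmt.Println(resourceName("yao", "karmada-apiserver")) // yao-karmada-apiserver
}
```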

@jabellard
Contributor Author

I was just looking at how things are done in the helm chart. Looks like a similar convention is used there as well:

Member

@RainbowMango RainbowMango left a comment


/assign

@@ -125,6 +125,9 @@ const (

// APIServiceName defines the karmada aggregated apiserver APIService resource name.
APIServiceName = "v1alpha1.cluster.karmada.io"

// KarmadaApiserverEtcdClientCertNameSuffix defines the suffix for the Karmada API server etcd client cert name
KarmadaApiserverEtcdClientCertNameSuffix = "karmada-apiserver-etcd-client-cert"
Member


As we discussed at the community meeting, whether to use it as a suffix or as the whole name is awaiting further exploration.

Contributor Author


@RainbowMango , to clarify, I'm fine with whatever naming convention is used. What I'm actually interested in is whether the support for running multiple Karmada instances in the same namespace will break. Today, the operator supports that, and we've been relying on it for some use cases we have. I added some data on this thread to showcase a scenario where that's been working for us in one of our management clusters.

Member


Can you explain why you want to run multiple instances in the same namespace here?

@chaunceyjiang are you using it this way?

Member


to clarify, I'm fine with whatever naming convention is used. What I'm actually interested in is whether the support for running multiple Karmada instances in the same namespace will break. Today, the operator supports that, and we've been relying on it for some use cases we have.

Since we now support running multiple instances in the same namespace, there is no reason to disable it. I'm just curious why users might want to manage them in the same namespace, as I think putting each Karmada instance in a separate namespace is a simpler approach. E.g. each instance belongs to one team, which will help manage the quotas.

Member


I added some data on this thread to showcase a scenario where that's been working for us in one of our management clusters.

That was helpful and gave me a lot of confidence. In the historical discussions, I remember we talked more about having one instance per namespace. Thanks.

Member

@RainbowMango RainbowMango left a comment


Generally looks good to me.

I think you can squash the PR and it is ready to go.

For the secret naming convention, I don't have a strong opinion; I just want to understand the use case: what's the benefit of running multiple instances in the same namespace?

Also, @chaosi-zju seems you are concerned about having the instance name as the prefix of the secret name. Can you explain it in more detail?

Comment on lines +112 to +135
// validateKarmada ensures the Karmada resource adheres to validation rules
func (ctrl *Controller) validateKarmada(karmada *operatorv1alpha1.Karmada) error {
	if karmada.Spec.Components.Etcd != nil && karmada.Spec.Components.Etcd.External != nil {
		expectedSecretName := fmt.Sprintf("%s-%s", karmada.Name, constants.KarmadaApiserverEtcdClientCertNameSuffix)
		if karmada.Spec.Components.Etcd.External.SecretRef.Name != expectedSecretName {
			errorMessage := fmt.Sprintf("Secret name for external etcd client must be %s, but got %s", expectedSecretName, karmada.Spec.Components.Etcd.External.SecretRef.Name)
			ctrl.EventRecorder.Event(karmada, corev1.EventTypeWarning, ValidationErrorReason, errorMessage)

			newCondition := metav1.Condition{
				Type:               string(operatorv1alpha1.Ready),
				Status:             metav1.ConditionFalse,
				Reason:             ValidationErrorReason,
				Message:            errorMessage,
				LastTransitionTime: metav1.Now(),
			}
			meta.SetStatusCondition(&karmada.Status.Conditions, newCondition)
			if err := ctrl.Status().Update(context.TODO(), karmada); err != nil {
				return err
			}
			return fmt.Errorf("%s", errorMessage) // avoid a non-constant format string in Errorf
		}
	}
	return nil
}
Member


It looks good. It covers what we discussed perfectly.

Just sharing some thoughts.
In my opinion, we probably need to have further validation logic for the other part of the Karmada object.
So, my suggestions are:

  • put the validation part into a separate file, e.g. validation.go.
  • abstract the etcd validation with a separate function.

But it is no big deal, we can refactor it in the future too. It depends on you.
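The suggested refactoring could look roughly like this: a minimal, self-contained sketch with simplified stand-ins for the operator's API types (the real ones live in the operator's `v1alpha1` package; names and structure here are illustrative, not the actual implementation):

```go
package main

import "fmt"

// Simplified stand-ins for the operator's API types; illustrative only.
type SecretRef struct{ Name string }
type ExternalEtcd struct{ SecretRef SecretRef }
type Etcd struct{ External *ExternalEtcd }
type Components struct{ Etcd *Etcd }
type KarmadaSpec struct{ Components Components }
type Karmada struct {
	Name string
	Spec KarmadaSpec
}

const karmadaAPIServerEtcdClientCertNameSuffix = "karmada-apiserver-etcd-client-cert"

// validateEtcd pulls the etcd check out into its own function, as the review
// suggests, so further rules can live alongside it in a dedicated validation.go.
func validateEtcd(karmada *Karmada) error {
	etcd := karmada.Spec.Components.Etcd
	if etcd == nil || etcd.External == nil {
		return nil // nothing to validate for in-cluster etcd
	}
	expected := fmt.Sprintf("%s-%s", karmada.Name, karmadaAPIServerEtcdClientCertNameSuffix)
	if got := etcd.External.SecretRef.Name; got != expected {
		return fmt.Errorf("secret name for external etcd client must be %s, but got %s", expected, got)
	}
	return nil
}

// validateKarmada then just chains the per-area validators.
func validateKarmada(karmada *Karmada) error {
	return validateEtcd(karmada)
}

func main() {
	k := &Karmada{Name: "demo"}
	k.Spec.Components.Etcd = &Etcd{External: &ExternalEtcd{SecretRef: SecretRef{Name: "wrong-name"}}}
	fmt.Println(validateKarmada(k)) // prints the validation error for the wrong secret name
	k.Spec.Components.Etcd.External.SecretRef.Name = "demo-karmada-apiserver-etcd-client-cert"
	fmt.Println(validateKarmada(k)) // <nil>
}
```

This keeps the controller's reconcile path free of per-field rules and makes each validator independently testable.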

Signed-off-by: Joe Nathan Abellard <[email protected]>

Onwards!
@jabellard
Contributor Author

what's the benefit of running multiple instances in the same namespace.

Yeah. Great question. I can provide more context. There are many reasons. A few of them:

  • Seamless integration with internal systems:
    At Bloomberg, we run our own private cloud, and have internal AuthN and AuthZ systems that are integrated with managed k8s clusters for identity management. As you can imagine, we have internal solutions to bridge those systems such as our own custom AuthN webhook service.
  • Ease of use and consistency of experience:
    We have other well-established, successful managed services that work this way. People are happy with it, as it matches how things like tenant onboarding, budget management, and identity management usually work within our environment. For the new managed service we're building, we want to provide a similar experience that people are already happy with.

Thanks for your thoughtful questions. I'm also usually very curious about these things. It helps me to gain more context. I'll squash the PR. Once it's merged, I'll start working on part 2.

@RainbowMango
Member

It sounds like you used to manage a group of services within the same namespace and already have the corresponding infrastructure to support this way.

I think the naming convention is still open to discussion, but it shouldn't be a blocker for this PR.

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2024
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2024
@RainbowMango
Member

Joe, I guess it's pretty early there, good morning!

@jabellard
Contributor Author

Joe, I guess it's pretty early there, good morning!

Hey!

@karmada-bot karmada-bot merged commit 31bc022 into karmada-io:master Oct 19, 2024
13 checks passed
@chaosi-zju
Member

Also, @chaosi-zju seems you are concerned about having the instance name as the prefix of the secret name. Can you explain it in more detail?

I was just looking at how things are done in the helm chart. Looks like a similar convention is used there as well:

Hi, if there are indeed scenarios that require multiple instances to be deployed in one namespace and it is feasible, then I'm in favor of adding the instance name prefix.

Here, I'm just describing where my concern comes from; it actually stems from our Helm chart:

A user may install the core Karmada components first and then independently install optional plugins like karmada-scheduler-estimator when needed. To do this, they would:

  1. Execute `helm install karmada ./charts/karmada` to install the core components. The instance name is karmada, so the signed certs are stored in secrets named like karmada-xxx-cert.
  2. Execute `helm install karmada-scheduler-estimator-member1 ./charts/karmada` to install the estimator for the member1 cluster. The instance name is karmada-scheduler-estimator-member1, so it will look for the secret karmada-scheduler-estimator-member1-xxx-cert, which doesn't exist, and the installation fails.
  3. Execute `helm install karmada-scheduler-estimator-member2 ./charts/karmada` to install the estimator for the member2 cluster; this will also fail.

The problem comes from the fact that in Helm, the instance name (release name, to be exact) is not allowed to repeat, so there are currently problems installing plugins independently.

That is my concern, but if there is a real need to install multiple Karmada instances in one namespace, other ways to solve these problems can also be explored.

@jabellard
Contributor Author

Here, I'm just describing where my concern comes from, which actually comes from our helm chart:

Thanks for providing that context.

@jabellard
Contributor Author

Hi, if there are indeed scenarios that require multiple instances to be deployed under one namespace and it is feasible, then I am positive to adding the instance name prefix.

Thanks for your understanding. To clarify, I'm fine with whatever naming convention is used. The important thing for us is that the support in the operator for running multiple instances in the same namespace is retained since we need that for some use cases.

@RainbowMango , @chaosi-zju : I'd like to start working on part 2 of this feature. Should I use the naming conventions outlined in this issue with instance name prefix for the operator?

@RainbowMango
Member

To clarify, this PR now uses the instance name as the prefix of the expected secret. I think we can keep moving towards part 2.
The naming convention might change over time (not sure yet, as you know), but it shouldn't block this feature.

@jabellard
Contributor Author

I think we can keep moving towards part 2.

Ok. Will continue working on that.

Successfully merging this pull request may close these issues.

Add The Ability to Retrieve External Etcd Client Credentials From Secret