
External etcd Support for Karmada Operator - Part 1 #5699

Merged 1 commit into karmada-io:master on Oct 19, 2024

Conversation

jabellard
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:
This is an implementation of this proposal for adding support to the Karmada operator for external etcd cluster connections.

Which issue(s) this PR fixes:
Fixes #5242

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

The new `SecretRef` field in the external etcd configuration can be used to reference a secret that 
contains the credentials for connecting to an external etcd cluster.
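For illustration, a minimal sketch of a Karmada resource using the new field. Only `secretRef.name` is taken from this PR's code; the API group and the `endpoints` field are assumptions about the operator's external etcd spec:

```yaml
apiVersion: operator.karmada.io/v1alpha1
kind: Karmada
metadata:
  name: demo                  # instance name; per-instance resources are prefixed with it
  namespace: karmada-system
spec:
  components:
    etcd:
      external:
        # endpoints of the external etcd cluster (illustrative field)
        endpoints:
          - https://etcd.example.com:2379
        # secret holding the client credentials; per this PR's validation,
        # the name must be <instance>-karmada-apiserver-etcd-client-cert
        secretRef:
          name: demo-karmada-apiserver-etcd-client-cert
```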

@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 15, 2024
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 15, 2024
@codecov-commenter

codecov-commenter commented Oct 15, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 0% with 24 lines in your changes missing coverage. Please review.

Project coverage is 39.73%. Comparing base (47efa57) to head (b8de967).
Report is 49 commits behind head on master.

Files with missing lines                        Patch %   Lines
operator/pkg/controller/karmada/controller.go   0.00%     24 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5699      +/-   ##
==========================================
+ Coverage   38.51%   39.73%   +1.21%     
==========================================
  Files         649      650       +1     
  Lines       45133    55171   +10038     
==========================================
+ Hits        17383    21920    +4537     
- Misses      26418    31841    +5423     
- Partials     1332     1410      +78     
Flag        Coverage Δ
unittests   39.73% <0.00%> (+1.21%) ⬆️

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

@jabellard
Contributor Author

@RainbowMango , @chaosi-zju : I opened up this PR to split the work done in the original one per our discussion earlier at the community meeting.

To further the discussion we were having:
As of today, the operator has support for running multiple Karmada instances in the same namespace. We've been relying on that support to fulfill some of our use cases for a managed platform that we're building on top of Karmada at Bloomberg. That has been working very well for us. Some data from one of our management clusters:

Control Planes:

k -n cncs get karmadas 
NAME   READY   AGE
li     True    5d1h
yao    True    5d1h

Components:

k -n cncs get deployments                           
NAME                                  READY   UP-TO-DATE   AVAILABLE   AGE
li-karmada-aggregated-apiserver       1/1     1            1           5d1h
li-karmada-apiserver                  1/1     1            1           5d1h
li-karmada-controller-manager         1/1     1            1           5d1h
li-karmada-kube-controller-manager    1/1     1            1           5d1h
li-karmada-metrics-adapter            2/2     2            2           5d1h
li-karmada-scheduler                  1/1     1            1           5d1h
li-karmada-search                     1/1     1            1           5d1h
li-karmada-webhook                    1/1     1            1           5d1h
yao-karmada-aggregated-apiserver      1/1     1            1           5d1h
yao-karmada-apiserver                 1/1     1            1           5d1h
yao-karmada-controller-manager        1/1     1            1           5d1h
yao-karmada-kube-controller-manager   1/1     1            1           5d1h
yao-karmada-metrics-adapter           2/2     2            2           5d1h
yao-karmada-scheduler                 1/1     1            1           5d1h
yao-karmada-search                    1/1     1            1           5d1h
yao-karmada-webhook                   1/1     1            1           5d1h

Secrets:

k -n cncs get secrets
NAME                                         TYPE     DATA   AGE
li-karmada-admin-config                      Opaque   1      5d1h
li-karmada-cert                              Opaque   16     5d1h
li-karmada-etcd-cert                         Opaque   6      5d1h
li-karmada-webhook-cert                      Opaque   2      5d1h
yao-karmada-admin-config                     Opaque   1      5d1h
yao-karmada-cert                             Opaque   16     5d1h
yao-karmada-etcd-cert                        Opaque   6      5d1h
yao-karmada-webhook-cert                     Opaque   2      5d1h

The current naming convention uses the Karmada instance name as a prefix for all resources associated with the instance.
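The prefixing convention above can be sketched as follows (a minimal illustration; the helper name is hypothetical, not the operator's actual code):

```go
package main

import "fmt"

// resourceName illustrates the operator's convention of prefixing every
// per-instance resource with the Karmada instance name (e.g. "li", "yao"),
// which is what keeps multiple instances in one namespace from colliding.
func resourceName(instance, suffix string) string {
	return fmt.Sprintf("%s-%s", instance, suffix)
}

func main() {
	fmt.Println(resourceName("li", "karmada-etcd-cert"))  // li-karmada-etcd-cert
	fmt.Println(resourceName("yao", "karmada-apiserver")) // yao-karmada-apiserver
}
```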

@jabellard
Contributor Author

I was just looking at how things are done in the helm chart. Looks like a similar convention is used there as well:

Member

@RainbowMango RainbowMango left a comment


/assign

@@ -125,6 +125,9 @@ const (

// APIServiceName defines the karmada aggregated apiserver APIService resource name.
APIServiceName = "v1alpha1.cluster.karmada.io"

// KarmadaApiserverEtcdClientCertNameSuffix defines the suffix for the Karmada API server etcd client cert name
KarmadaApiserverEtcdClientCertNameSuffix = "karmada-apiserver-etcd-client-cert"
Member


As we discussed at the community meeting, whether to use it as a suffix or as the whole name is awaiting further exploration.

Contributor Author


@RainbowMango , to clarify, I'm fine with whatever naming convention is used. What I'm actually interested in is whether the support for running multiple Karmada instances in the same namespace will break. Today, the operator supports that, and we've been relying on it for some use cases we have. I added some data on this thread to showcase a scenario where that's been working for us in one of our management clusters.

Member


Can you explain why you want to run multiple instances in the same namespace here?

@chaunceyjiang are you using it this way?

Member


to clarify, I'm fine with whatever naming convention is used. What I'm actually interested in is whether the support for running multiple Karmada instances in the same namespace will break. Today, the operator supports that, and we've been relying on it for some use cases we have.

Since we now support running multiple instances in the same namespace, there is no reason to disable it. I'm just curious why users might want to manage them in the same namespace, as I think putting each Karmada instance in a separate namespace is a simpler approach. E.g. each instance belongs to one team, which will help manage the quotas.

Member


I added some data on this thread to showcase a scenario where that's been working for us in one of our management clusters.

That was helpful and gave me a lot of confidence. In the historical discussions, I remember we talked more about having one instance per namespace. Thanks.

Member

@RainbowMango RainbowMango left a comment


Generally looks good to me.

I think you can squash the PR and it is ready to go.

For the secret naming convention, I don't have a strong opinion; I just want to understand the use case: what's the benefit of running multiple instances in the same namespace?

Also, @chaosi-zju seems you are concerned about having the instance name as the prefix of the secret name. Can you explain it in more detail?

Comment on lines +112 to +135
// validateKarmada ensures the Karmada resource adheres to validation rules
func (ctrl *Controller) validateKarmada(karmada *operatorv1alpha1.Karmada) error {
	if karmada.Spec.Components.Etcd != nil && karmada.Spec.Components.Etcd.External != nil {
		expectedSecretName := fmt.Sprintf("%s-%s", karmada.Name, constants.KarmadaApiserverEtcdClientCertNameSuffix)
		if karmada.Spec.Components.Etcd.External.SecretRef.Name != expectedSecretName {
			errorMessage := fmt.Sprintf("Secret name for external etcd client must be %s, but got %s", expectedSecretName, karmada.Spec.Components.Etcd.External.SecretRef.Name)
			ctrl.EventRecorder.Event(karmada, corev1.EventTypeWarning, ValidationErrorReason, errorMessage)

			newCondition := metav1.Condition{
				Type:               string(operatorv1alpha1.Ready),
				Status:             metav1.ConditionFalse,
				Reason:             ValidationErrorReason,
				Message:            errorMessage,
				LastTransitionTime: metav1.Now(),
			}
			meta.SetStatusCondition(&karmada.Status.Conditions, newCondition)
			if err := ctrl.Status().Update(context.TODO(), karmada); err != nil {
				return err
			}
			return fmt.Errorf("%s", errorMessage) // avoid a non-constant format string in Errorf
		}
	}
	return nil
}
Member


It looks good. It covers what we discussed perfectly.

Just sharing some thoughts.
In my opinion, we probably need to have further validation logic for the other part of the Karmada object.
So, my suggestions are:

  • put the validation part into a separate file, e.g. validation.go.
  • abstract the etcd validation with a separate function.

But it is no big deal, we can refactor it in the future too. It depends on you.
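The suggested refactoring could look roughly like this: a minimal, self-contained sketch with simplified stand-ins for the operator's API types (the real ones live in the operator's `v1alpha1` package; names and structure here are illustrative, not the actual implementation):

```go
package main

import "fmt"

// Simplified stand-ins for the operator's API types; illustrative only.
type SecretRef struct{ Name string }
type ExternalEtcd struct{ SecretRef SecretRef }
type Etcd struct{ External *ExternalEtcd }
type Components struct{ Etcd *Etcd }
type KarmadaSpec struct{ Components Components }
type Karmada struct {
	Name string
	Spec KarmadaSpec
}

const karmadaAPIServerEtcdClientCertNameSuffix = "karmada-apiserver-etcd-client-cert"

// validateEtcd pulls the etcd check out into its own function, as the review
// suggests, so further rules can live alongside it in a dedicated validation.go.
func validateEtcd(karmada *Karmada) error {
	etcd := karmada.Spec.Components.Etcd
	if etcd == nil || etcd.External == nil {
		return nil // nothing to validate for in-cluster etcd
	}
	expected := fmt.Sprintf("%s-%s", karmada.Name, karmadaAPIServerEtcdClientCertNameSuffix)
	if got := etcd.External.SecretRef.Name; got != expected {
		return fmt.Errorf("secret name for external etcd client must be %s, but got %s", expected, got)
	}
	return nil
}

// validateKarmada then just chains the per-area validators.
func validateKarmada(karmada *Karmada) error {
	return validateEtcd(karmada)
}

func main() {
	k := &Karmada{Name: "demo"}
	k.Spec.Components.Etcd = &Etcd{External: &ExternalEtcd{SecretRef: SecretRef{Name: "wrong-name"}}}
	fmt.Println(validateKarmada(k)) // prints the validation error for the wrong secret name
	k.Spec.Components.Etcd.External.SecretRef.Name = "demo-karmada-apiserver-etcd-client-cert"
	fmt.Println(validateKarmada(k)) // <nil>
}
```

This keeps the controller's reconcile path free of per-field rules and makes each validator independently testable.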

Signed-off-by: Joe Nathan Abellard <[email protected]>

Onwards!
@jabellard
Contributor Author

what's the benefit of running multiple instances in the same namespace.

Yeah. Great question. I can provide more context. There are many reasons. A few of them:

  • Seamless integration with internal systems:
    At Bloomberg, we run our own private cloud, and have internal AuthN and AuthZ systems that are integrated with managed k8s clusters for identity management. As you can imagine, we have internal solutions to bridge those systems such as our own custom AuthN webhook service.
  • Ease of use and consistency of experience:
    We have other well-established, successful managed services that work this way. People are happy with it, as it matches how things like tenant onboarding, budget management, and identity management usually work within our environment. For the new managed service we're building, we want to provide a similar experience that people are already happy with.

Thanks for your thoughtful questions. I'm also usually very curious about these things. It helps me to gain more context. I'll squash the PR. Once it's merged, I'll start working on part 2.

@RainbowMango
Member

It sounds like you used to manage a group of services within the same namespace and already have the corresponding infrastructure to support this way.

I think the naming convention is still open to discussion, but it shouldn't be a blocker for this PR.

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 19, 2024
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 19, 2024
@RainbowMango
Member

Joe, I guess it's pretty early there, good morning!

@jabellard
Contributor Author

Joe, I guess it's pretty early there, good morning!

Hey!

@karmada-bot karmada-bot merged commit 31bc022 into karmada-io:master Oct 19, 2024
13 checks passed
@chaosi-zju
Member

Also, @chaosi-zju seems you are concerned about having the instance name as the prefix of the secret name. Can you explain it in more detail?

I was just looking at how things are done in the helm chart. Looks like a similar convention is used there as well:

Hi, if there are indeed scenarios that require multiple instances to be deployed in one namespace and it is feasible, then I'm in favor of adding the instance name prefix.

Here, I'm just describing where my concern comes from; it actually stems from our Helm chart:

A user may install the core Karmada components first and then independently install optional plugins like karmada-scheduler-estimator when needed. To do this, they would:

  1. Execute `helm install karmada ./charts/karmada` to install the core components. The instance name is karmada, so the signed certs are stored in secrets named like karmada-xxx-cert.
  2. Execute `helm install karmada-scheduler-estimator-member1 ./charts/karmada` to install the estimator for the member1 cluster. The instance name is karmada-scheduler-estimator-member1, so it will look for the secret karmada-scheduler-estimator-member1-xxx-cert, which doesn't exist, and the installation fails.
  3. Execute `helm install karmada-scheduler-estimator-member2 ./charts/karmada` to install the estimator for the member2 cluster; this will also fail.

The problem comes from the fact that in Helm, the instance name (release name, to be exact) is not allowed to repeat, so there are currently problems installing plugins independently.

That is my concern, but if there is a real need to install multiple Karmada instances in one namespace, other ways to solve these problems can also be explored.

@jabellard
Contributor Author

Here, I'm just describing where my concern comes from, which actually comes from our helm chart:

Thanks for providing that context.

@jabellard
Contributor Author

Hi, if there are indeed scenarios that require multiple instances to be deployed under one namespace and it is feasible, then I am positive to adding the instance name prefix.

Thanks for your understanding. To clarify, I'm fine with whatever naming convention is used. The important thing for us is that the support in the operator for running multiple instances in the same namespace is retained since we need that for some use cases.

@RainbowMango , @chaosi-zju : I'd like to start working on part 2 of this feature. Should I use the naming conventions outlined in this issue with instance name prefix for the operator?

@RainbowMango
Member

To clarify, this PR now uses the instance name as the prefix of the expected secret. I think we can keep moving towards part 2.
The naming convention might change over time (not sure yet, as you know), but it shouldn't block this feature.

@jabellard
Contributor Author

I think we can keep moving towards part 2.

Ok. Will continue working on that.

Successfully merging this pull request may close these issues.

Add The Ability to Retrieve External Etcd Client Credentials From Secret