
Add the ability to force the location of the Seed to be in the same region as Kyma cluster [EPIC] #18182

Open
3 of 19 tasks
TorstenD-SAP opened this issue Sep 15, 2023 · 16 comments
Labels
area/security Issues or PRs related to security Epic kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.


@TorstenD-SAP

TorstenD-SAP commented Sep 15, 2023

Description

The user who creates a Kyma cluster in the BTP cockpit should be able to enforce that the Control Plane is located in the same region as the Hyperscaler account where the cluster's Worker Nodes are deployed. If the Control Plane cannot be placed in the same region, the user should see an error message that allows them to proceed without this enforcement. In all cases, it must be transparent to the customer in which region the Control Plane is hosted.

Reasons

The region of the Control Plane is chosen automatically by Gardener (https://gardener.cloud/docs/gardener/concepts/scheduler/). As a result, the Control Plane can end up in a different region than the Worker Nodes, partly because Gardener doesn't have Seed clusters in all the regions where Kyma can be deployed. This can violate the law: the Control Plane could be in a different legal jurisdiction than the Worker Nodes while the customer stores personal data (e.g. names, email addresses) on the Control Plane. We also have customers who are very sensitive about the regions where their sensitive data is stored.

AC (Added by PK)

@kyma-bot
Contributor

This issue or PR has been automatically marked as stale due to the lack of recent activity.
Thank you for your contributions.

This bot triages issues and PRs according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 7d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

If you think that I work incorrectly, kindly raise an issue with the problem.

/lifecycle stale

@kyma-bot kyma-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 14, 2023
@varbanv varbanv removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 15, 2023
@tobiscr tobiscr added the area/security Issues or PRs related to security label Dec 5, 2023
@TorstenD-SAP
Author

The label seed.gardener.cloud/region was added to every Gardener seed. It can be used to restrict which seeds a shoot cluster may be scheduled on, via spec.seedSelector in the shoot spec.
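As a minimal sketch, the label can be wired into a Shoot as shown below. It assumes spec.seedSelector takes the standard Kubernetes label-selector shape (matchLabels); the shoot name, project namespace, and region used in the example call are hypothetical, and a real Shoot needs many more fields.

```python
import json

REGION_LABEL = "seed.gardener.cloud/region"


def seed_selector_for_region(region: str) -> dict:
    """Build a Shoot spec.seedSelector that only matches seeds carrying
    the label seed.gardener.cloud/region=<region>."""
    return {"matchLabels": {REGION_LABEL: region}}


def shoot_manifest(name: str, namespace: str, region: str) -> dict:
    """Return a partial Shoot manifest; provider, networking, Kubernetes
    version and other required fields are omitted in this sketch."""
    return {
        "apiVersion": "core.gardener.cloud/v1beta1",
        "kind": "Shoot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Region where the cluster's worker nodes run.
            "region": region,
            # Only allow seeds labelled with that same region.
            "seedSelector": seed_selector_for_region(region),
        },
    }


# Hypothetical names, for illustration only.
print(json.dumps(shoot_manifest("my-shoot", "garden-my-project", "eu-central-1"), indent=2))
```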


This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2024
@TorstenD-SAP TorstenD-SAP removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2024
@tobiscr
Contributor

tobiscr commented Mar 25, 2024

We agreed with @kyma-project/gopher to offer this feature under the following constraints:

  • Not all regions have their own seed yet, so the UI will show a link to documentation explaining that this feature is not supported in all regions and can lead to failed Kyma clusters (because Gardener rejects the cluster creation).
  • KIM and KEB will not check up front whether a seed exists in the requested region. Instead, they follow a "trial and error" approach: if Gardener can create the cluster, all is fine; otherwise, the customer receives an error.
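The trial-and-error approach could look roughly like the following sketch: the Shoot is submitted without any up-front seed check, and only its status is inspected afterwards. It assumes a scheduling failure shows up in status.lastOperation with type Create, state Pending, and a description starting with "Failed to schedule Shoot", which matches the status observed in the test later in this thread. creation_error is a hypothetical helper, not part of KIM or KEB.

```python
from typing import Optional

# Prefix of the scheduler message observed in the test reported in this thread.
SCHEDULING_FAILURE_PREFIX = "Failed to schedule Shoot"


def creation_error(shoot: dict) -> Optional[str]:
    """Return a customer-facing error if Gardener could not find a seed for
    the Shoot, or None while creation is progressing or has succeeded."""
    last_op = (shoot.get("status") or {}).get("lastOperation") or {}
    description = last_op.get("description", "")
    if (
        last_op.get("type") == "Create"
        and last_op.get("state") == "Pending"
        and description.startswith(SCHEDULING_FAILURE_PREFIX)
    ):
        return "Cluster could not be created in the requested region: " + description
    return None
```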

@tobiscr tobiscr added the Epic label Apr 9, 2024
@PK85 PK85 changed the title Add the ability to force the location of the Control Plane to be in the same region than the Nodes Add the ability to force the location of the Seed to be in the same region as Kyma cluster Apr 24, 2024
@PK85 PK85 added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 24, 2024
@ralikio ralikio self-assigned this May 16, 2024
@IwonaLanger IwonaLanger self-assigned this May 21, 2024
@ralikio
Member

ralikio commented May 21, 2024

I have tested the seedSelector field mentioned in #18182 (comment). @PK85 suggested that for our test scenario we select a shoot region that does not contain any seeds: ap-northeast-1. A shoot created with the default configuration was assigned to the seed aws-ha-us2. Creating another shoot with seedSelector set to ap-northeast-1 resulted in the following status:

*Status*
Create Pending

*Last Message*
Failed to schedule Shoot: none out of the ... seeds has the matching labels required by 
seed selector of 'Shoot' (selector: 'seed.gardener.cloud/region=ap-northeast-1')

The status Create Pending seems counterintuitive to @kyma-project/gopher and @kyma-project/framefrog; we will consult the Gardener team about it.
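To make the failure mode concrete, here is a simplified model of the label filtering that produced the message above. It is not Gardener's scheduler: it only applies matchLabels to a hypothetical list of seeds and picks the first match, whereas the real scheduler applies further filters and its own selection strategy.

```python
def matching_seeds(seeds: list[dict], match_labels: dict[str, str]) -> list[str]:
    """Return the names of seeds whose labels contain every key/value pair
    in match_labels (matchLabels semantics of a label selector)."""
    return [
        seed["name"]
        for seed in seeds
        if all(seed.get("labels", {}).get(key) == value for key, value in match_labels.items())
    ]


def schedule(seeds: list[dict], region: str) -> str:
    """Pick the first seed in `region`, or fail with a message modelled on
    the observed scheduler output (simplified selection, see lead-in)."""
    candidates = matching_seeds(seeds, {"seed.gardener.cloud/region": region})
    if not candidates:
        raise RuntimeError(
            f"none out of the {len(seeds)} seeds has the matching labels "
            f"(selector: 'seed.gardener.cloud/region={region}')"
        )
    return candidates[0]
```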

@ralikio
Member

ralikio commented May 24, 2024

Proposed request to the Provisioner's GraphQL API with the new field shootAndSeedSameRegion:

{
	runtimeInput: {
		...	
	},
	clusterConfig:{
		gardenerConfig: {
			...	
			shootAndSeedSameRegion: true, # Boolean, optional; defaults to false
		},
		...
	},
}
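A minimal sketch of how a provisioner could translate the proposed flag into the Shoot's seedSelector. The helper name seed_selector and the assumption that the worker-node region is available as gardenerConfig.region are illustrative only.

```python
from typing import Optional

REGION_LABEL = "seed.gardener.cloud/region"


def seed_selector(gardener_config: dict) -> Optional[dict]:
    """Translate the proposed shootAndSeedSameRegion flag into a Shoot
    spec.seedSelector, or None to leave the seed choice to Gardener."""
    # The flag is optional and defaults to False, as in the proposed request.
    if not gardener_config.get("shootAndSeedSameRegion", False):
        return None
    # Assumption: the worker-node region is passed as gardenerConfig.region.
    return {"matchLabels": {REGION_LABEL: gardener_config["region"]}}
```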

@ralikio
Member

ralikio commented May 24, 2024

@tobiscr
Contributor

tobiscr commented May 27, 2024

JFYI: I added a draft PR to Gardener that extracts the seed-determination logic into a separate struct, so that other apps can reuse it via Gardener's API:

gardener/gardener#9843

@ralikio
Member

ralikio commented May 27, 2024

Not relevant; see #18182 (comment).


I conducted three additional test cases combining Gardener's spec.controlPlane.highAvailability.failureTolerance.type: zone with seedSelector. The Gardener documentation (https://gardener.cloud/docs/gardener/high-availability/) states:

Regarding the seed cluster selection, the only constraint is that shoot clusters with failure tolerance type zone are only allowed to run on seed clusters with at least three zones. All other shoot clusters (non-HA or those with failure tolerance type node) can run on seed clusters with any number of zones.
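The zone constraint from the quoted documentation reduces to a simple predicate. The function below is a hypothetical illustration of that rule, not Gardener code.

```python
from typing import Optional


def seed_can_host(failure_tolerance_type: Optional[str], seed_zones: int) -> bool:
    """Apply the zone constraint quoted above from the Gardener HA docs."""
    if failure_tolerance_type == "zone":
        # Zone-tolerant control planes need a seed spanning at least three zones.
        return seed_zones >= 3
    # Non-HA shoots (None) and failure tolerance type "node" fit any seed.
    return True
```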

Case I: creating a non-HA shoot in a region that contains only HA seeds (seeds with "ha" in their name)

Provider: aws
Seed Selector: eu-north-1 - a region with two HA seeds
HA options: spec.controlPlane.highAvailability.failureTolerance.type: zone not set
Result: shoot gets created successfully.

Case II: creating an HA shoot in a region that contains only non-HA seeds (no "ha" in their name)

Provider: gcp
Seed Selector: europe-west3 - a region with one non-HA seed
HA options: spec.controlPlane.highAvailability.failureTolerance.type: zone enabled
Result:

Create Pending - Failed to schedule Shoot: 0/1 seed cluster candidate(s) are eligible for scheduling: {*** => shoot does not tolerate the seed's taints}

Case III: creating an HA shoot in a region that contains one HA seed (with "ha" in its name)

Provider: gcp
Seed Selector: me-central2 - a region with one HA seed
HA options: spec.controlPlane.highAvailability.failureTolerance.type: zone enabled
Result:

Create Pending - Failed to schedule Shoot: 0/1 seed cluster candidate(s) are eligible for scheduling: {*** => shoot does not tolerate the seed's taints}

@ralikio
Member

ralikio commented May 28, 2024

Rendering of schema changes:

[Images: rendering of the schema changes]

@ralikio
Member

ralikio commented Jun 3, 2024

The tests of the seed selection process for shoots in high-availability configuration (documented in #18182 (comment)) assumed that seeds with ha in their name (e.g. aws-ha-eu3) are specially designed to serve HA configurations. This is incorrect: such names are just the result of an old naming convention. Every seed with at least three zones can host HA control plane deployments. Additionally, the seeds' visible property restricts the number of seeds available for scheduling. At the time of writing, all seeds were deployed across three zones.
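Under this corrected understanding, a seed's name plays no role; only its region label, its visibility, and (for zone-tolerant HA shoots) its zone count matter. The sketch below filters hypothetical Seed objects accordingly. The field paths spec.settings.scheduling.visible and spec.provider.zones are assumed to mirror the Gardener Seed API, and seed taints (which appear in the failures of Cases II and III) are not modeled.

```python
REGION_LABEL = "seed.gardener.cloud/region"


def candidate_seeds(seeds: list[dict], region: str, zone_ha: bool) -> list[str]:
    """Return the names of seeds eligible for a shoot in `region`.
    The seed name (e.g. whether it contains "ha") is deliberately ignored."""
    names = []
    for seed in seeds:
        spec = seed.get("spec", {})
        # Invisible seeds are not offered for scheduling; a missing value is
        # treated as visible here (an assumption of this sketch).
        if not spec.get("settings", {}).get("scheduling", {}).get("visible", True):
            continue
        if seed.get("metadata", {}).get("labels", {}).get(REGION_LABEL) != region:
            continue
        # Zone-tolerant HA control planes need at least three zones.
        if zone_ha and len(spec.get("provider", {}).get("zones", [])) < 3:
            continue
        names.append(seed["metadata"]["name"])
    return names
```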

@ralikio
Member

ralikio commented Jun 6, 2024

As of today, we have implemented the KEB part for Provisioner. @kyma-project/gopher is waiting for the KIM implementation.

@tobiscr
Contributor

tobiscr commented Jun 26, 2024

Appendix - some more background information related to this issue:

Customer reported bug
Slack Thread on #kyma-team
Slack Thread on #sap-tech-gardener-live

@PK85 PK85 added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 25, 2024

This issue has been automatically marked as stale due to the lack of recent activity. It will soon be closed if no further activity occurs.
Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 24, 2024
@varbanv varbanv removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 24, 2024
@tobiscr
Contributor

tobiscr commented Oct 9, 2024

@kyma-project/gopher: enabling the "seed in region as shoot" flag is only supported for new clusters, so customers cannot enable it after the cluster exists. Is this correct?

@tobiscr
Contributor

tobiscr commented Oct 15, 2024

Feedback from @PK85 (via Slack):

Yes, only for new ones, because as far as I remember the seed cannot be updated.
Maybe your team can double-check that.

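If the seed indeed cannot be changed after creation (the feedback above is from memory and still needs confirmation), the provisioning side could reject any change to the flag on update. validate_update is a hypothetical helper illustrating that check; rejecting both enabling and disabling is a design choice of this sketch that keeps the stored flag consistent with where the control plane was actually placed.

```python
def validate_update(old_config: dict, new_config: dict) -> None:
    """Reject changing shootAndSeedSameRegion on an existing cluster."""
    old = old_config.get("shootAndSeedSameRegion", False)
    new = new_config.get("shootAndSeedSameRegion", False)
    if old != new:
        # Assumption: the seed is fixed at creation, so changing the flag
        # later would not move the control plane.
        raise ValueError(
            "shootAndSeedSameRegion can only be set when the cluster is created"
        )
```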


7 participants