add: taints for machinePools #187

Draft
wants to merge 1 commit into main

Conversation

VanillaSpoon
Contributor

Issue link

What changes have been made

Taints are being introduced to manage the dispatch of AppWrappers. When an AppWrapper is deployed, it is crucial that its pods are directed to the scaled-up instances provisioned for it. This is particularly important in team-based clusters that scale up multiple instances for different AppWrappers. The current workflow potentially allows an AppWrapper to spread its workload across scaled-up instances not assigned to it, leading to resource mix-ups.

By applying taints, we ensure that pods are assigned to the correct nodes, preventing such issues.

The same will be done for nodepools separately.
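
For context, the mechanism pairs a taint on the scaled-up machine pool with a matching toleration on the AppWrapper's pods. Below is a minimal sketch of that toleration, assuming the taint key is the AppWrapper name and the value/effect mirror this PR; the variable names are illustrative, and how the toleration reaches the pod spec is outside this change:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	awName := "my-appwrapper" // hypothetical AppWrapper name

	// Toleration the AppWrapper's pods would carry to match the machine pool
	// taint added in this PR (key = AppWrapper name, value1:PreferNoSchedule).
	tol := corev1.Toleration{
		Key:      awName,
		Operator: corev1.TolerationOpEqual,
		Value:    "value1",
		Effect:   corev1.TaintEffectPreferNoSchedule,
	}
	fmt.Printf("%+v\n", tol)
}
```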

Verification steps

Deploy an AppWrapper with InstaScale enabled and the required instance types specified.
View your scaled-up machine pool in the dashboard and check that a taint of the form <aw-name> = value1:PreferNoSchedule has been applied.

Once the nodes are scaled, you can check again under Compute and Nodes. Feel free to filter with the label <aw-name>=<aw-name> (the key and value are both the AppWrapper name). When viewing each node's Details you will see the same taint. A programmatic spot check is also sketched after these steps.

Ensure the AppWrapper's pods are deployed successfully to these nodes.

To verify the opposite, scale up resources with InstaScale, then deploy a separate AppWrapper and confirm that this AppWrapper's pods are not deployed to the scaled-up resources.
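
For anyone checking programmatically rather than through the console, here is a small client-go sketch that lists the scaled-up nodes and prints their taints, assuming the node label key and value are both the AppWrapper name as set in this PR (the AppWrapper name below is a placeholder):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	awName := "my-appwrapper" // hypothetical AppWrapper name

	// Load the local kubeconfig (~/.kube/config) and build a clientset.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Scaled-up nodes are labelled with the AppWrapper name (key == value).
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{
		LabelSelector: fmt.Sprintf("%s=%s", awName, awName),
	})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		fmt.Printf("node %s taints: %+v\n", node.Name, node.Spec.Taints)
	}
}
```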

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • Testing is not required for this change


openshift-ci bot commented Dec 5, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign maxusmusti for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@asm582
Member

asm582 commented Dec 5, 2023

Thanks for this PR, please make sure we patch taints and change tolerations in the reuse scenario.

@VanillaSpoon
Contributor Author

Thanks @asm582
There are currently no reuse scenarios in place for machinepools

@asm582
Member

asm582 commented Dec 5, 2023

Thanks @asm582 There are currently no reuse scenarios in place for machinepools
I think reuse is important. We should have it as part of this PR or a follow-up PR. Can you please check if an issue exists and, if not, create a new one?

@VanillaSpoon
Contributor Author

Thanks @asm582 There are currently no reuse scenarios in place for machinepools
I think reuse is important. We should have it as part of this PR or a follow-up PR. Can you please check if an issue exists and, if not, create a new one?

Hey, I've spent some time looking into it. Great idea, I'll open a new issue and get reuse added for each machine type :)
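
For reference, a rough sketch of what re-applying the taint on a reused machine pool might look like. The method chain follows the generated OCM SDK client pattern and the function, package, and parameter names (clusterClient, poolID, awName) are assumptions, so this should be verified against the SDK version InstaScale uses:

```go
package machinepool

import (
	"context"
	"fmt"

	cmv1 "github.com/openshift-online/ocm-sdk-go/clustersmgmt/v1"
)

// patchMachinePoolTaint re-applies the AppWrapper taint to an existing
// machine pool that is being reused instead of newly created.
func patchMachinePoolTaint(ctx context.Context, clusterClient *cmv1.ClusterClient, poolID, awName string) error {
	pool, err := cmv1.NewMachinePool().
		ID(poolID).
		Taints(cmv1.NewTaint().
			Key(awName).
			Value("value1").
			Effect("PreferNoSchedule")).
		Build()
	if err != nil {
		return fmt.Errorf("building machine pool patch: %w", err)
	}
	_, err = clusterClient.MachinePools().
		MachinePool(poolID).
		Update().
		Body(pool).
		SendContext(ctx)
	return err
}
```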

Contributor

@ChristianZaccaria ChristianZaccaria left a comment


Great work, left two comments/requested changes.

machinePoolTaint := cmv1.NewTaint().
Key(aw.Name).
Value("value1").
Effect("PreferNoSchedule")
Contributor

Do we wish to use PreferNoSchedule as the taint effect, as opposed to NoSchedule? The former will still allow other pods without the toleration to be assigned to that node. I think NoSchedule would best suit our use case.
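
For comparison, the suggested variant with the same cmv1 builder used in the diff would look like the sketch below; unlike PreferNoSchedule, it keeps any pod without a matching toleration off the scaled-up nodes entirely:

```go
// Suggested variant: NoSchedule is a hard constraint, so only pods that
// tolerate <aw.Name>=value1:NoSchedule can be scheduled onto these nodes.
machinePoolTaint := cmv1.NewTaint().
	Key(aw.Name).
	Value("value1").
	Effect("NoSchedule")
```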

@@ -74,8 +74,13 @@ func (r *AppWrapperReconciler) scaleMachinePool(ctx context.Context, aw *arbv1.A
m[aw.Name] = aw.Name
klog.Infof("The instanceRequired array: %v", userRequestedInstanceType)

machinePoolTaint := cmv1.NewTaint().
Key(aw.Name).
Contributor

In the unlikely but possible scenario where the key (aw.Name) collides because AppWrappers in two different namespaces share the same name, what would occur? Should we have the key include the namespace too?
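
One possible shape for a namespace-scoped key, purely as an illustration (taint keys must remain valid label keys, so the separator and length would need validating; fmt is assumed to be imported in this file):

```go
// Hypothetical variant: prefix the key with the namespace so AppWrappers with
// the same name in different namespaces get distinct taints.
machinePoolTaint := cmv1.NewTaint().
	Key(fmt.Sprintf("%s.%s", aw.Namespace, aw.Name)).
	Value("value1").
	Effect("PreferNoSchedule")
```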
