Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Introduce InstanceType CRD for overriding instance type resources #2404

Closed

Conversation

jonathan-innis
Copy link
Contributor

@jonathan-innis jonathan-innis commented Aug 30, 2022

Fixes kubernetes-sigs/karpenter#751
Relates to #2161

Description

  • Add InstanceType CR for specifying per-instance type settings. For now, you can override the allocatable resources to specify extended resources that have corresponding custom device plugins for specific instance types.
apiVersion: karpenter.sh/v1alpha1
kind: InstanceType
metadata:
  name: c5.large
spec:
  capacity:
    hardware.vendor.com/resource: 2
    hardware.vendor.com/other-resource: 10

How was this change tested?

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

Release Note

Add `InstanceType` CRD for customizing instance type resources

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@netlify
Copy link

netlify bot commented Aug 30, 2022

Deploy Preview for karpenter-docs-prod canceled.

Name Link
🔨 Latest commit 19e2aee
🔍 Latest deploy log https://app.netlify.com/sites/karpenter-docs-prod/deploys/63112f3743f3a900083706ff

@jonathan-innis jonathan-innis changed the title feat: Introduce InstanceType CRD for overriding feat: Introduce InstanceType CRD for overriding instance type resources Aug 30, 2022
// used by the scheduler. This resource list can contain known resources (cpu, memory, etc.)
// or it may also contain unknown custom device resources for custom device plugins
// +optional
Resources v1.ResourceList `json:"resources,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on calling this Allocatable for parity with v1.Node?

@jonathan-innis jonathan-innis force-pushed the instance-type-settings branch 2 times, most recently from e8073f8 to 835a11c Compare August 31, 2022 20:13
@jonathan-innis jonathan-innis added the blocked Unable to make progress due to some dependency label Aug 31, 2022
@universam1
Copy link

universam1 commented Sep 19, 2022

@jonathan-innis @ellistarn
Thank you for this proposed solution, but we have serious concerns this does not solve kubernetes-sigs/karpenter#751 for us, let me explain.

unmanageable?

We understand that the core idea of Karpenter is to solve the problem of preselected instance types to dynamically select from the magnitude of various types available in AWS. This CRD however defeats this idea and thus does not provide the quality of #2161 - in contrast we see an extreme overhead here.

The additional complexity introduced here that multiplies with instance type times instance size results into thousands of variations, which is unmanageable!

divergence with kind: Provisioner

This concept requires a manual, logical sync with kind: Provisioner to pin Karpenter to only use those defined instances of kind: InstanceType, which is prone to drift.
There is the logical connection of the choice of types with the definition of their features - but this is not given as there is no coupling btw. those two independent CRD.

idea

It looks like kind: InstanceType would better be part of extended kind: Provisioner CRD to be closely coupled, and then a missing (regex) selector to solve the unmanageability would be solved as well. Or, like with awsnodetemplates.karpenter.k8s.aws reference it like providerRef
Thank you for your consideration

@jonathan-innis
Copy link
Contributor Author

@jonathan-innis @ellistarn Thank you for this proposed solution, but we have serious concerns this does not solve kubernetes-sigs/karpenter#751 for us, let me explain.

unmanageable?

We understand that the core idea of Karpenter is to solve the problem of preselected instance types to dynamically select from the magnitude of various types available in AWS. This CRD however defeats this idea and thus does not provide the quality of #2161 - in contrast we see an extreme overhead here.

The additional complexity introduced here that multiplies with instance type times instance size results into thousands of variations, which is unmanageable!

divergence with kind: Provisioner

This concept requires a manual, logical sync with kind: Provisioner to pin Karpenter to only use those defined instances of kind: InstanceType, which is prone to drift. There is the logical connection of the choice of types with the definition of their features - but this is not given as there is no coupling btw. those two independent CRD.

idea

It looks like kind: InstanceType would better be part of extended kind: Provisioner CRD to be closely coupled, and then a missing (regex) selector to solve the unmanageability would be solved as well. Or, like with awsnodetemplates.karpenter.k8s.aws reference it like providerRef Thank you for your consideration

Hi @universam1, thanks for your feedback. I'm interested to know how you would deal with the difference in resource capacity for extended resources across different instance types. My assumption is that all instance types wouldn't have the same number of extended resources so we didn't want to couple it too closely to the provisioner as it might also cause a provisioner explosion.

Let me know if this understanding is incorrect, though. I'm curious to know how much commonality would exist across your instance types for extended resources and if an overlay concept using something like a label selector would be better.

@universam1
Copy link

universam1 commented Oct 6, 2022

Hi @jonathan-innis , thank you for reaching out and acquiring feedback, appreciated!

My assumption is that all instance types wouldn't have the same number of extended resources

Let me picture our two use cases:

  1. For a large videoconferencing system (up to several hundreds of nodes per cluster) we leverage the ALSA /dev/snd device exposed by the kernel (module). Obviously the amount of these kernel devices available per host are not in relation to an instance size or type and probably not even on CPU architecture, but rather on linux kernel config. For this amount of instances Karpenter is the answer, overcomimg AWS provider limits of the availablity of a certain type/size, so the loosely tied flexibilty of types and sizes possible by Karpenter is what we have been looking for. Now having to constrain on few types and sizes by the current manual declaration cancels the whole benefits out.
  2. We have a container build farm that requires /dev/fuse device. The number of fuse devices provided by the kernel is also not tied to instance type or size, per default 1000 or more. Also here it is impossible for us to define all instance types times instance sizes to match all provider offers.

Granted there are other use cases that I might not be aware of, but I am struggling to picture a case where a one-to-one relation of this config would shine. In our case it would be a management nightmare so to speek. Maybe this context was missing to explain why my simple #2161 is sufficient for us. Thank you for considering!

@jonathan-innis
Copy link
Contributor Author

jonathan-innis commented Oct 6, 2022

Hi @universam1. We are going to close this PR for now. I'm going to migrate our discussion to the RFC PR for instance type settings (#2390) which includes extended resource plugins so that we can keep this feedback in a common place

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked Unable to make progress due to some dependency
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mega Issue: Karpenter doesnt support custom resources requests/limit
3 participants