Support Ad-hoc Operations for Server Management #77

Closed
afritzler opened this issue Jul 8, 2024 · 8 comments · Fixed by #125

@afritzler
Member

Summary:
We need to extend our declarative model, which is built on resources like Server and ServerClaim, to support ad-hoc operations such as a reboot or power cycle of a server. This enhancement addresses how we can incorporate imperative operations within the declarative Kubernetes API model.

Background:
Currently, our API allows managing bare metal servers using Server and ServerClaim resources. However, we face challenges in supporting ad-hoc operations, which are inherently imperative and contradict the declarative nature of the Kubernetes API model.

Proposed Solutions:

  1. Annotations-based Approach:

    • Allow ad-hoc operations by adding annotations to the Server resource.
    • Example: metal.ironcore.dev/operations: PowerCycle.
    • The reconciler will check for the presence of this annotation and perform the corresponding operation, such as power cycling the server.

    Pros:

    • Simple to implement and integrate with existing CRD-based models.
    • Minimal changes required to the existing API structure.

    Cons:

    • Annotations are not a first-class citizen for defining operations and might lead to a less discoverable and less manageable API.
    • Handling multiple concurrent operations or complex workflows might become cumbersome.
  2. Aggregated API Server with SubResources:

    • Transition from a CRD-based model to an aggregated API server.
    • Define custom subresources like PowerCycle and Reboot for the Server resource.
    • Example: POST /apis/metal.ironcore.dev/v1/namespaces/{namespace}/servers/{name}/powercycle.

    Pros:

    • Provides a more RESTful and discoverable way to define and manage imperative operations.
    • Subresources can encapsulate complex logic and workflows better.

    Cons:

    • Requires significant refactoring to migrate to an aggregated API server model.
    • Increased complexity in terms of deployment and maintenance.
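
To make the annotations-based approach (option 1) more concrete, here is a minimal sketch of how a reconciler might react to the `metal.ironcore.dev/operations` annotation. Everything except the annotation key and the `PowerCycle` value is an assumption for illustration: the `Server` struct is a simplified stand-in for the real resource, `BMC` and `fakeBMC` are hypothetical, and removing the annotation after a successful run is just one possible way to make the operation execute once.

```go
package main

import "fmt"

// OperationAnnotation is the annotation key from the proposal.
const OperationAnnotation = "metal.ironcore.dev/operations"

// Server is a simplified, hypothetical stand-in for the Server resource.
type Server struct {
	Name        string
	Annotations map[string]string
}

// BMC is a hypothetical abstraction over the server's management controller.
type BMC interface {
	PowerCycle(serverName string) error
}

// fakeBMC records power-cycle requests instead of talking to hardware.
type fakeBMC struct{ cycled []string }

func (f *fakeBMC) PowerCycle(serverName string) error {
	f.cycled = append(f.cycled, serverName)
	return nil
}

// handleOperation executes the operation requested via annotation, if any.
// On success it removes the annotation so the operation is not repeated on
// the next reconcile (an assumption, not something the proposal specifies).
// It reports whether an operation was executed.
func handleOperation(s *Server, bmc BMC) (bool, error) {
	op, ok := s.Annotations[OperationAnnotation]
	if !ok {
		return false, nil // nothing requested
	}
	switch op {
	case "PowerCycle":
		if err := bmc.PowerCycle(s.Name); err != nil {
			return false, err
		}
	default:
		return false, fmt.Errorf("unsupported operation %q", op)
	}
	delete(s.Annotations, OperationAnnotation)
	return true, nil
}

func main() {
	bmc := &fakeBMC{}
	s := &Server{
		Name:        "server-1",
		Annotations: map[string]string{OperationAnnotation: "PowerCycle"},
	}
	executed, err := handleOperation(s, bmc)
	fmt.Println("executed:", executed, "err:", err, "power cycles:", bmc.cycled)
}
```

In a real controller the annotation removal would be an update against the API server; the in-memory `delete` here only illustrates the idea.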

Request for Comments:
We seek feedback on the following points:

  • Which approach (Annotations or Aggregated API Server) is more suitable for our use case?
  • Any potential challenges or alternatives that we should consider.
  • Best practices for implementing imperative operations in a declarative system.

Next Steps:
Based on the feedback, we will:

  • Finalize the approach for implementing ad-hoc operations.
  • Create a detailed implementation plan.
  • Assign tasks and start the development process.

Additional Context:

  • Link to relevant documentation or previous discussions.
  • Examples of similar implementations in other projects (if any).

Please provide your feedback and suggestions to help us move forward with this enhancement.

@Nuckal777
Contributor

Just for information: metal3's annotation design allows multiple controllers to set the power state independently: https://book.metal3.io/bmo/reboot_annotation#phased-reboot

@stefanhipfel
Contributor

I would vote for option 1

@afritzler afritzler moved this from Backlog to Ready in Metal Automation Sep 4, 2024
@afritzler
Member Author

Then I would suggest that we proceed with option one.

@stefanhipfel
Contributor

Do we have a list of operations we want to support?

@Nuckal777
Contributor

At least for reboot and power cycle there seems to be some overlap with #76, which would be a declarative solution. I don't expect that implementation to arrive soon. In the meantime, going with an annotation is fine, because it's not part of the API contract and can be removed with ease.

@stefanhipfel stefanhipfel self-assigned this Sep 4, 2024
@stefanhipfel
Contributor

stefanhipfel commented Sep 9, 2024

@afritzler did I understand "ad-hoc" correctly: should those operations be executed early in the reconcile loop, no matter the server state and spec, or do we want to restrict them in some way?

@afritzler
Member Author

The question is: do we want to PowerCycle if the Server is in PowerState == Off? I would suggest that we only do a reboot if the Server is On. If it is powered off, it is a no-op.
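
The suggested rule (only act when the Server is On, otherwise no-op) could look roughly like the sketch below. The `PowerState` type, its constants, and `powerCycleIfOn` are hypothetical names chosen for illustration; the actual field and values in the Server status may differ.

```go
package main

import "fmt"

// PowerState mirrors the On/Off states mentioned above (hypothetical type).
type PowerState string

const (
	PowerStateOn  PowerState = "On"
	PowerStateOff PowerState = "Off"
)

// powerCycleIfOn runs the given power-cycle function only when the server is
// powered on. It reports whether the operation was actually executed.
func powerCycleIfOn(state PowerState, powerCycle func() error) (bool, error) {
	if state != PowerStateOn {
		return false, nil // powered off (or unknown): treat as a no-op
	}
	if err := powerCycle(); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	cycle := func() error { return nil } // placeholder for a real BMC call
	for _, state := range []PowerState{PowerStateOn, PowerStateOff} {
		executed, err := powerCycleIfOn(state, cycle)
		fmt.Printf("state=%s executed=%t err=%v\n", state, executed, err)
	}
}
```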

@stefanhipfel
Contributor

stefanhipfel commented Sep 9, 2024

I would hope that the server's BMC would then just ignore it in this case. If not, maybe it was something the end user wanted to do?

For me the more important question is:
At what point do we execute those operations? If someone sets operation=PowerCycle, should the operation be executed immediately, ignoring anything else that is currently happening in the reconcile loop?
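
One possible reading of "immediately" is sketched below: the reconcile pass checks for the operation annotation before anything else, executes it, and returns so that the regular state handling happens on the next pass. This is only an illustration of that ordering, not a decision made in this thread; the `server` struct, the `steps` log, and the early return are assumptions.

```go
package main

import "fmt"

// operationAnnotation is the annotation key from the proposal.
const operationAnnotation = "metal.ironcore.dev/operations"

// server is a hypothetical in-memory stand-in for the Server resource.
type server struct {
	annotations map[string]string
	steps       []string // records what each reconcile pass did
}

// reconcile handles a pending ad-hoc operation first, regardless of spec and
// state, and only falls through to the regular logic when none is pending.
func reconcile(s *server) {
	if op, ok := s.annotations[operationAnnotation]; ok {
		s.steps = append(s.steps, "operation:"+op)
		delete(s.annotations, operationAnnotation)
		return // regular reconciliation is left to the next pass
	}
	s.steps = append(s.steps, "regular-reconcile")
}

func main() {
	s := &server{annotations: map[string]string{operationAnnotation: "PowerCycle"}}
	reconcile(s) // first pass: executes the ad-hoc operation
	reconcile(s) // second pass: annotation is gone, regular logic runs
	fmt.Println(s.steps)
}
```

The alternative would be to run the operation only at a defined point in the state machine, which restricts when it can fire but avoids interfering with work already in progress.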

@stefanhipfel stefanhipfel linked a pull request Sep 12, 2024 that will close this issue
@github-project-automation github-project-automation bot moved this from In progress to Done in Metal Automation Sep 13, 2024
Projects
Status: Done
3 participants