
Enable health status propagation for managed CRs #707

Open
taylormgeorge91 opened this issue Jul 12, 2021 · 10 comments

Comments

@taylormgeorge91
Member

/kind feature

Describe the solution you'd like
As an adopter of ODLM, I would like to see the health status of the operands (CustomResources) that are managed for me by ODLM.

Anything else you would like to add:
Presently, our use of ODLM includes the OperandRequest resource, which is used in conjunction with the OperandConfig and OperandRegistry resources to manage the lifecycle of our operators and related CRs. In using this API, we cannot obtain the necessary health information about the requested workloads: for example, whether the workloads are running, available, or degraded.

To overcome this, our controller would have to do one of the following:

  1. Add support for additional CRDs/APIs that must be interfaced with to obtain status, with specific logic to relate them back to the ODLM OperandRequest instances.
  2. Add support for getting Unstructured resources by GVK and namespaced name, and processing their conditions that way.

Option 2 seems the better fit as a common feature for ODLM users, provided the operands use an agreed-upon set of supported status structures (a minimal sketch follows below).
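For illustration, a minimal controller-runtime sketch of option 2 could look like the following. The getOperandConditions function name and the assumption that the operand exposes a conventional status.conditions slice are hypothetical, not existing ODLM code:

    import (
        "context"

        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/apimachinery/pkg/types"
        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // getOperandConditions fetches a managed CR as an unstructured object by its
    // GVK and namespaced name, then returns its status.conditions slice, if any.
    func getOperandConditions(ctx context.Context, c client.Client,
        gvk schema.GroupVersionKind, key types.NamespacedName) ([]interface{}, error) {
        u := &unstructured.Unstructured{}
        u.SetGroupVersionKind(gvk)
        if err := c.Get(ctx, key, u); err != nil {
            return nil, err
        }
        conditions, found, err := unstructured.NestedSlice(u.Object, "status", "conditions")
        if err != nil || !found {
            // The operand does not expose status.conditions; nothing to propagate.
            return nil, err
        }
        return conditions, nil
    }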

Reference

See comments here: #622 (comment)

@taylormgeorge91
Member Author

@gyliu513 Interested in your comments here.

@gyliu513
Member

@taylormgeorge91 what do you mean by propagation for managed CRs? Is the requirement here to collect the health condition of the operators/operands managed by ODLM, or only to check the status of the CRs for each operator?

@horis233 ^^

@taylormgeorge91 taylormgeorge91 changed the title Enable health condition propogation for managed CRs Enable health status propogation for managed CRs Jul 13, 2021
@taylormgeorge91 taylormgeorge91 changed the title Enable health status propogation for managed CRs Enable health status propagation for managed CRs Jul 13, 2021
@taylormgeorge91
Member Author

@gyliu513 I mean that there are various CRs created by ODLM as part of fulfilling an OperandRequest.

The owners/controllers of those downstream CRs should be reporting the health of the workload through the CR's status.

Presently, ODLM will not inspect or propagate this health status upward into the OperandRequest as part of the status of that object.

Since the lead operator only interacts with the ODLM resources, such as OperandRequest, that is where it would look for the health status.

@taylormgeorge91
Member Author

taylormgeorge91 commented Jul 13, 2021

As such, the lead operator reports that CP4WAIOps is ready when in actuality the workloads are not yet available: the operands move to Ready in the OperandRequest as soon as their CRs have been created. To close this gap, we need to be able to obtain the health status information from the downstream CRs managed by ODLM, while not exposing tenant or other sensitive information that may be in the status (especially if it is across namespaces).

@horis233
Contributor

horis233 commented Jul 13, 2021

@taylormgeorge91

I can see the value of this feature, but service health status/monitoring is out of scope for ODLM. I believe what ODLM can focus on is checking the status of CRs when they are created or updated.

Since ODLM has no way of knowing which signal means a service is ready, users would need to supply the health check definition when they create the OperandConfig, for example:

    - name: ibm-mongodb-operator
      spec:
        MongoDB: {}
        operandRequest: {}
      readyProbe:
        - kind: MongoDB
          probe: status.phase.running  # path of the status field ODLM should inspect
          status: Running              # expected value indicating the operand is ready

Then ODLM can check whether the status.phase.running field in the MongoDB CR is Running to confirm that it is actually ready.
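For illustration, a rough sketch of how ODLM could evaluate such a readyProbe against an unstructured CR in Go (the probeReady helper and the dotted-path handling are assumptions, not existing ODLM code):

    import (
        "strings"

        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    )

    // probeReady checks whether the field at the dotted path from the readyProbe
    // entry (e.g. "status.phase") equals the expected value (e.g. "Running").
    func probeReady(u *unstructured.Unstructured, probePath, expected string) (bool, error) {
        fields := strings.Split(probePath, ".")
        val, found, err := unstructured.NestedString(u.Object, fields...)
        if err != nil || !found {
            return false, err
        }
        return val == expected, nil
    }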

@gyliu513
Member

@horis233 I think your proposal also makes sense. One question: can ODLM set the readyProbe: section by default and leverage it to check whether the downstream CR is ready?

@taylormgeorge91 comments?

@taylormgeorge91
Member Author

taylormgeorge91 commented Jul 14, 2021

@gyliu513 @horis233 Correct. What I am asking for in the issue body is health status information from the managed CRs, surfaced through the ODLM OperandRequest.

Note that we can create the operandRequest outside of the OperandConfig spec, so we would need to consider that.

Also, could there be default, well-known probes that ODLM tries if we have agreed on them across the Cloud Pak Platform? That way, if the default status structure is present, it is reported and the client does not have to define probes (potentially multiple) for each operand. The probe spec could still allow a more custom approach for legacy or community components that may not follow the defaults.

@taylormgeorge91
Member Author

@taylormgeorge91
Member Author

taylormgeorge91 commented Jul 14, 2021

> I can see the value of this feature, but service health status/monitoring is out of scope for ODLM. I believe what ODLM can focus on is checking the status of CRs when they are created or updated.

I am not asking ODLM to perform service health monitoring, but to help surface information that is already available in a CR managed by ODLM, as you describe with status checking. That is what is meant by propagating the managed CRs' status up through the OperandRequest(s).

If ODLM is already watching the managed CRs, it can use a predicate to detect whether the relevant status properties have changed and update the OperandRequest only when needed, ignoring irrelevant status updates (see the sketch below).
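For example, if ODLM did watch the managed CRs via controller-runtime, a status-change predicate could look roughly like the sketch below. The status.phase path is a placeholder (in practice the path would come from the configured probe), and controller-runtime v0.7+ is assumed, where event.UpdateEvent carries ObjectOld/ObjectNew:

    import (
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "sigs.k8s.io/controller-runtime/pkg/event"
        "sigs.k8s.io/controller-runtime/pkg/predicate"
    )

    // statusChangedPredicate only lets an update event through when the watched
    // status field actually changed, filtering out irrelevant status updates.
    var statusChangedPredicate = predicate.Funcs{
        UpdateFunc: func(e event.UpdateEvent) bool {
            oldU, okOld := e.ObjectOld.(*unstructured.Unstructured)
            newU, okNew := e.ObjectNew.(*unstructured.Unstructured)
            if !okOld || !okNew {
                return false
            }
            oldPhase, _, _ := unstructured.NestedString(oldU.Object, "status", "phase")
            newPhase, _, _ := unstructured.NestedString(newU.Object, "status", "phase")
            return oldPhase != newPhase
        },
    }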

@horis233
Contributor

> If ODLM is already watching the managed CRs, it can use a predicate to detect whether the relevant status properties have changed and update the OperandRequest only when needed, ignoring irrelevant status updates.

ODLM doesn't watch the CRs; it can't watch unstructured objects.

> Also, could there be default, well-known probes that ODLM tries if we have agreed on them across the Cloud Pak Platform? That way, if the default status structure is present, it is reported and the client does not have to define probes (potentially multiple) for each operand. The probe spec could still allow a more custom approach for legacy or community components that may not follow the defaults.

We need to consider compatibility. ODLM could provide a default probe, but it should only take effect if users enable it (probably via the OperandRequest). Without a default probe, ODLM can simply check whether the probe defined in the OperandConfig enables or disables the check. So, from my personal perspective, I prefer not to set a default probe and to leave the probe logic for a CR in the OperandConfig (if users create the CR through an OperandRequest, we can pick up the same field there).

Also, I don't think status.condition is a good default probe, because ODLM could never know what status.condition.reason and status.condition.type mean.
