
Enable health status propagation for managed CRs #707

Open
taylormgeorge91 opened this issue Jul 12, 2021 · 10 comments

Comments

@taylormgeorge91
Member

/kind feature

Describe the solution you'd like
As an adopter of ODLM, I would like to see the health status of the operands (CustomResources) that are managed for me by ODLM.

Anything else you would like to add:
Presently, our use of ODLM includes the OperandRequest resource, which is used in conjunction with the OperandConfig and OperandRegistry resources to manage the lifecycle of our operators and related CRs. In using this API, we cannot obtain the necessary health information about the requested workloads: for example, whether the workloads are running, available, or degraded.

To overcome this, our controller would have to do one of the following:

  1. Add support for additional CRDs/APIs that must be interfaced with to obtain status, with specific logic to relate them back to the ODLM OperandRequest instances.
  2. Add support for getting Unstructured resources by GVK and namespaced name, and processing their conditions that way.

Option 2 seems the better fit as a common feature for ODLM users, provided the operands use an agreed-upon set of supported status structures (a minimal sketch follows below).
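For illustration, a minimal controller-runtime sketch of option 2 could look like the following. The getOperandConditions function name and the assumption that the operand exposes a conventional status.conditions slice are hypothetical, not existing ODLM code:

    import (
        "context"

        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "k8s.io/apimachinery/pkg/runtime/schema"
        "k8s.io/apimachinery/pkg/types"
        "sigs.k8s.io/controller-runtime/pkg/client"
    )

    // getOperandConditions fetches a managed CR as an unstructured object by its
    // GVK and namespaced name, then returns its status.conditions slice, if any.
    func getOperandConditions(ctx context.Context, c client.Client,
        gvk schema.GroupVersionKind, key types.NamespacedName) ([]interface{}, error) {
        u := &unstructured.Unstructured{}
        u.SetGroupVersionKind(gvk)
        if err := c.Get(ctx, key, u); err != nil {
            return nil, err
        }
        conditions, found, err := unstructured.NestedSlice(u.Object, "status", "conditions")
        if err != nil || !found {
            // The operand does not expose status.conditions; nothing to propagate.
            return nil, err
        }
        return conditions, nil
    }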

Reference

See comments here: #622 (comment)

@taylormgeorge91
Member Author

@gyliu513 Interested in your comments here.

@gyliu513
Member

@taylormgeorge91 what do you mean by propagation for managed CRs? Is the requirement here to collect the health condition of the operators/operands managed by ODLM, or only to check the status of the CRs for each operator?

@horis233 ^^

@taylormgeorge91 taylormgeorge91 changed the title Enable health condition propogation for managed CRs Enable health status propogation for managed CRs Jul 13, 2021
@taylormgeorge91 taylormgeorge91 changed the title Enable health status propogation for managed CRs Enable health status propagation for managed CRs Jul 13, 2021
@taylormgeorge91
Member Author

@gyliu513 I mean that there are various CRs created by ODLM as part of fulfilling an OperandRequest.

The owners/controllers of those downstream CRs should be reporting the health of the workload through the CR's status.

Presently, ODLM will not inspect or propagate this health status upward into the OperandRequest as part of the status of that object.

Since the lead operator only interacts with the ODLM resources, such as OperandRequest, that is where it would look for the health status.

@taylormgeorge91
Member Author

taylormgeorge91 commented Jul 13, 2021

As such, the lead operator reports that CP4WAIOps is ready when in actuality the workloads are not yet available: the operands move to Ready in the OperandRequest as soon as their CRs have been created. To close this gap, we need to be able to obtain the health status information from the downstream CRs managed by ODLM, while not exposing tenant or other sensitive information that may be in the status (especially if it is across namespaces).

@horis233
Contributor

horis233 commented Jul 13, 2021

@taylormgeorge91

I can see the value of this feature, but service health status/monitoring is out of scope for ODLM. I believe what ODLM can focus on is checking the status of CRs when they are created or updated.

Since ODLM has no way of knowing which signal means a service is ready, users would need to supply the health check definition when they create the OperandConfig, for example:

    - name: ibm-mongodb-operator
      spec:
        MongoDB: {}
        operandRequest: {}
      readyProbe:
        - kind: MongoDB
          probe: status.phase.running  # path of the status field ODLM should inspect
          status: Running              # expected value indicating the operand is ready

Then ODLM can check whether the status.phase.running field in the MongoDB CR is Running to confirm that it is actually ready.
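For illustration, a rough sketch of how ODLM could evaluate such a readyProbe against an unstructured CR in Go (the probeReady helper and the dotted-path handling are assumptions, not existing ODLM code):

    import (
        "strings"

        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    )

    // probeReady checks whether the field at the dotted path from the readyProbe
    // entry (e.g. "status.phase") equals the expected value (e.g. "Running").
    func probeReady(u *unstructured.Unstructured, probePath, expected string) (bool, error) {
        fields := strings.Split(probePath, ".")
        val, found, err := unstructured.NestedString(u.Object, fields...)
        if err != nil || !found {
            return false, err
        }
        return val == expected, nil
    }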

@gyliu513
Member

@horis233 I think your proposal also makes sense. One question: can ODLM set the readyProbe: section by default and leverage it to check whether the downstream CR is ready?

@taylormgeorge91 comments?

@taylormgeorge91
Member Author

taylormgeorge91 commented Jul 14, 2021

@gyliu513 @horis233 Correct. What I am asking for in the issue body is health status information from the managed CRs, surfaced through the ODLM OperandRequest.

Note that we can create the operandRequest outside of the OperandConfig spec, so we would need to consider that.

Also, could there be default, well-known probes that ODLM tries if we have agreed on them across the Cloud Pak Platform? That way, if the default status structure is present, it is reported and the client does not have to define probes (potentially multiple) for each operand. The probe spec could still allow a more custom approach for legacy or community components that may not follow the defaults.

@taylormgeorge91
Member Author

@taylormgeorge91
Member Author

taylormgeorge91 commented Jul 14, 2021

> I can see the value of this feature, but service health status/monitoring is out of scope for ODLM. I believe what ODLM can focus on is checking the status of CRs when they are created or updated.

I am not asking ODLM to perform service health monitoring, but to help surface information that is already available in a CR managed by ODLM, as you describe with status checking. That is what is meant by propagating the managed CRs' status up through the OperandRequest(s).

If ODLM is already watching the managed CRs, it can use a predicate to detect whether the relevant status properties have changed and update the OperandRequest only when needed, ignoring irrelevant status updates (see the sketch below).
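For example, if ODLM did watch the managed CRs via controller-runtime, a status-change predicate could look roughly like the sketch below. The status.phase path is a placeholder (in practice the path would come from the configured probe), and controller-runtime v0.7+ is assumed, where event.UpdateEvent carries ObjectOld/ObjectNew:

    import (
        "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
        "sigs.k8s.io/controller-runtime/pkg/event"
        "sigs.k8s.io/controller-runtime/pkg/predicate"
    )

    // statusChangedPredicate only lets an update event through when the watched
    // status field actually changed, filtering out irrelevant status updates.
    var statusChangedPredicate = predicate.Funcs{
        UpdateFunc: func(e event.UpdateEvent) bool {
            oldU, okOld := e.ObjectOld.(*unstructured.Unstructured)
            newU, okNew := e.ObjectNew.(*unstructured.Unstructured)
            if !okOld || !okNew {
                return false
            }
            oldPhase, _, _ := unstructured.NestedString(oldU.Object, "status", "phase")
            newPhase, _, _ := unstructured.NestedString(newU.Object, "status", "phase")
            return oldPhase != newPhase
        },
    }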

@horis233
Contributor

> If ODLM is already watching the managed CRs, it can use a predicate to detect whether the relevant status properties have changed and update the OperandRequest only when needed, ignoring irrelevant status updates.

ODLM doesn't watch the CRs; it can't watch unstructured objects.

> Also, could there be default, well-known probes that ODLM tries if we have agreed on them across the Cloud Pak Platform? That way, if the default status structure is present, it is reported and the client does not have to define probes (potentially multiple) for each operand. The probe spec could still allow a more custom approach for legacy or community components that may not follow the defaults.

We need to consider compatibility. ODLM could provide a default probe, but it should only take effect if users enable it (probably via the OperandRequest). Without a default probe, ODLM can simply check whether the probe defined in the OperandConfig enables or disables the check. So, from my personal perspective, I prefer not to set a default probe and to leave the probe logic for a CR in the OperandConfig (if users create the CR through an OperandRequest, we can pick up the same field there).

Also, I don't think status.condition is a good default probe, because ODLM could never know what status.condition.reason and status.condition.type mean.
