Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: refactor build and deploy test case #179

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

DnPlas
Copy link
Contributor

@DnPlas DnPlas commented Feb 12, 2024

Remove the assertion that tests the kubeflow-dashboard-operator goes to Blocked when the relation with kubeflow-profiles is not there. This assertion is already covered in unit tests, and it is only adding noise at deployment time.

This commit ensures the dependencies like charms and relations are deployed on time which will give time for everything to be set up before moving ahead with other test cases.

Remove the assertion that tests the kubeflow-dashboard-operator goes to Blocked
when the relation with kubeflow-profiles is not there. This assertion is already
covered in unit tests, and it is only adding noise at deployment time.
This commit ensures the dependencies like charms and relations are deployed on time
which will give time to everything to be setup before moving ahead with other test cases.
@ca-scribner
Copy link
Contributor

@DnPlas can you give more context? afaict 178 is failing for legitimate reasons. That charm is hitting error during a relation-changed event, not blocked.

In principle I think I'm with you about how a unit test is sufficient to test if we go Blocked when missing a relation. But now I'm wondering if by removing this wait_for_idle we end up hiding another error entirely.

@DnPlas
Copy link
Contributor Author

DnPlas commented Feb 13, 2024

@DnPlas can you give more context? afaict 178 is failing for legitimate reasons. That charm is hitting error during a relation-changed event, not blocked.

Yes, the reason why #178 is failing (and potentially any other CI) is because the kubeflow-profile-controller is in maintenance mode, in fact Waiting for pod startup to complete, which can cause the kubeflow-dashboard to fail with hook failed: "kubeflow-profiles-relation-changed" because the dashboard charm depends on the profiles info (see here). By deploying both kubeflow-dashboard and kubeflow-profiles at the same time and wait for idle for BOTH charms, we avoid this.

In principle I think I'm with you about how a unit test is sufficient to test if we go Blocked when missing a relation. But now I'm wondering if by removing this wait_for_idle we end up hiding another error entirely.

It's still there, we are actually waiting for BOTH charms to be active and idle with a timeout of 600s.

@ca-scribner
Copy link
Contributor

The CI is terminating because kubeflow-dashboard is in error:

kubeflow-dashboard/0* error idle 10.1.34.139 hook failed: "kubeflow-profiles-relation-changed"

juju.errors.JujuUnitError: Unit in error: kubeflow-dashboard/0

It should not be possible that this sort of error be caused by kubeflow-profiles in any way (whether it is in maintenance mode, etc). So what I'm saying is that the the CI is catching a legitimate error here where kubeflow-dashboard fails to handle the relation-changed event, and changing the CI to suppress it is not what we should do

Copy link
Contributor

@ca-scribner ca-scribner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some discussions and looking into #178, this now lgtm. #178 had a separate error so there is nothing the current PR hides on us that we need to worry about, and having an extra wait_for_idle/status check here isn't needed as @DnPlas mentions since we can check this in unit tests.

The only thing I'd change here is the model config setting. Let's handle that as a separate PR, so it is easier to trace if something goes wrong.

@DnPlas
Copy link
Contributor Author

DnPlas commented Feb 15, 2024

Thanks @ca-scribner for the review, agreed on the model config setting. Will update the PR shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants