
Service catalog to provision vm using embedded workflows #23120

Closed
Fryguy opened this issue Jul 31, 2024 · Discussed in #23107 · 8 comments

Fryguy commented Jul 31, 2024

Discussed in https://github.com/orgs/ManageIQ/discussions/23107

Originally posted by RADHIKA500 July 25, 2024
I am trying to provision a VMware VM using embedded workflows, following the reference ASL code from the link below:
https://github.com/ManageIQ/workflows-examples/tree/master/provision-vm-service

The last portion of the workflow (power on VM) seems to be running into some issue and is not completing. It looks like it is neither failing nor succeeding; there are continuous retries of the "PowerOnVM" state.
As a result, the CPU and memory usage of the "1-automation" pod is climbing extremely high.

Deleting the service request does not terminate execution of the workflow. Any help on how to proceed would be appreciated.


agrare commented Jul 31, 2024

ManageIQ/floe#252 should resolve the issue with the image names


agrare commented Jul 31, 2024

There is another issue where an error hit during a step doesn't appear to cause the workflow to end in a failure, so it gets constantly retried; this is the cause of the high CPU usage. It isn't spinning on a single workflow, but once enough workflows are running they take up all of the available resources.


Fryguy commented Jul 31, 2024

Merged ManageIQ/floe#252


agrare commented Aug 1, 2024

Released ManageIQ/floe#252 in v0.12.0


agrare commented Aug 1, 2024

@RADHIKA500 If you want to clear the running workflows to stop the 100% CPU issue, you can open a Rails console and run the following:

oc get pods | grep '1-generic' # just pick any pod that has a database connection
oc rsh POD_NAME                # replace POD_NAME with a pod name from the previous command
cd /var/www/miq/vmdb
source ./container_env         # set up the container environment
rails c                        # open the Rails console
ManageIQ::Providers::Workflows::AutomationManager::WorkflowInstance.pluck(:id, :status) # I'm curious what the status is for these workflows so if you can post this output that'd be helpful
ManageIQ::Providers::Workflows::AutomationManager::WorkflowInstance.destroy_all # deletes all workflow instance records
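
If you would rather keep any workflow instances that finished cleanly, a more targeted cleanup is also possible. This is only a sketch: the "running" status value below is an assumption, so substitute whatever statuses the pluck output actually shows for the stuck workflows.

stuck = ManageIQ::Providers::Workflows::AutomationManager::WorkflowInstance.where(status: "running")
stuck.count       # how many instances match
stuck.destroy_all # remove only those instances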


agrare commented Aug 1, 2024

I ran a workflow as a Provisioning Entrypoint with a state that failed, and it didn't requeue constantly; it marked the workflow and service request as failed.


RADHIKA500 commented Aug 2, 2024

WorkflowInstance.pluck.log

Hi @agrare

I've pasted the output as you asked.

I haven't run "ManageIQ::Providers::Workflows::AutomationManager::WorkflowInstance.destroy_all" yet; I'll wait for you to analyze the output and then run it.


agrare commented Aug 7, 2024

Thanks @RADHIKA500, I was finally able to reproduce this locally by using an older version, specifically floe v0.9.0. I haven't done a bisect to find the exact commit that fixed it, but I believe it is ManageIQ/floe#167, which changed how errors from task states are handled.

On the version you're using we set a context.state["Error"], but the state is "finished" before the error is set, so to the workflow it looks like it is still running.
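
As a rough illustration of that ordering problem (a simplified sketch using a plain Ruby Hash rather than the actual floe source; the key names are assumptions):

require "time"

state = {}
# Problematic ordering: the state is marked finished before the error is
# recorded, so anything that stops watching once the state looks finished
# never sees the error and the step keeps getting retried.
state["FinishedTime"] = Time.now.utc.iso8601
state["Error"]        = "PowerOnVM failed"

# Corrected ordering: record the error first, then mark the state finished,
# so the error is already visible when the state reports as finished.
state = {}
state["Error"]        = "PowerOnVM failed"
state["FinishedTime"] = Time.now.utc.iso8601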

This is fixed in the radjabov branch as of commit 049c589d729098585d04fdb638726fc5d0f9e740 https://github.com/ManageIQ/manageiq-providers-workflows/pull/76/files#diff-413b40311487d36df9e66864d193cf0ce3315bb41cd9078a1a192f8d020318aeR22

I'm going to close this but please reopen if you're able to reproduce this error on master.

agrare closed this as completed Aug 7, 2024