Set up resource monitoring for tasks of cromwell runs #25

Open
malachig opened this issue Oct 7, 2022 · 5 comments · May be fixed by #28

Comments

@malachig
Member

malachig commented Oct 7, 2022

The Cromwell docs describe the capability to have monitoring for every step of your workflow. The docs I have been able to find are limited:

https://cromwell.readthedocs.io/en/stable/wf_options/Google/
Which states:

Specifies a GCS URL to a script that will be invoked prior to the user command being run. For example, if the value for monitoring_script is "gs://bucket/script.sh", it will be invoked as ./script.sh > monitoring.log &. The value monitoring.log file will be automatically de-localized.

https://cromwell.readthedocs.io/en/latest/backends/Google/
Which states:

In order to monitor metrics (CPU, Memory, Disk usage...) about the VM during Call Runtime, a workflow option can be used to specify the path to a script that will run in the background and write its output to a log file.

{
  "monitoring_script": "gs://cromwell/monitoring/script.sh"
}

The output of this script will be written to a monitoring.log file that will be available in the call gcs bucket when the call completes. This feature is meant to run a script in the background during long-running processes. It's possible that if the task is very short that the log file does not flush before de-localization happens and you will end up with a zero byte file.

@malachig
Member Author

malachig commented Oct 7, 2022

To test this idea in its simplest form, I created an example monitor script and ran it on an active Google instance that was running a compute-intensive step.
https://github.com/griffithlab/cloud-workflows/blob/main/scripts/monitor.sh

I manually logged into the GCP instance using the Google console to test it.
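For readers who cannot open the linked script, here is a minimal sketch of what a monitor of this kind might look like. It is not the linked monitor.sh: it tracks only memory and disk, it assumes a Linux VM with /proc/meminfo and a POSIX df, and the names mem_percent, disk_percent, sample_row, monitor_loop, INTERVAL and MONITOR_MOUNT are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: NOT the linked monitor.sh. Assumes Linux (/proc/meminfo) and POSIX df.

# Percent of memory in use, computed from meminfo-formatted text on stdin.
mem_percent() {
  awk '/^MemTotal:/     {total = $2}
       /^MemAvailable:/ {avail = $2}
       END {if (total > 0) printf "%.2f", (total - avail) / total * 100}'
}

# Percent of disk in use, parsed from `df -P` output on stdin
# (second line, fifth column, with the trailing % removed).
disk_percent() {
  awk 'NR == 2 {sub(/%/, "", $5); printf "%.2f", $5}'
}

# Print one tab-separated row: elapsed seconds, memory %, disk %.
# MONITOR_MOUNT is a hypothetical variable; "/" is only a default guess.
sample_row() {
  printf '%s\t%s\t%s\n' "$1" \
    "$(mem_percent < /proc/meminfo)" \
    "$(df -P "${MONITOR_MOUNT:-/}" | disk_percent)"
}

# Print a header, then one row every INTERVAL seconds (hypothetical, default 10).
monitor_loop() {
  start=$(date +%s)
  printf 'Seconds\tMemory_Percent\tDisk_Percent\n'
  while true; do
    sample_row "$(( $(date +%s) - start ))"
    sleep "${INTERVAL:-10}"
  done
}

# To use this as a monitoring script, end the file with:
# monitor_loop
```

With the final line uncommented, the script can be tried by hand on a running instance the same way the Cromwell docs say it is invoked: ./script.sh > monitoring.log &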

To test this on a Cromwell run, I am attempting the following:

  1. I placed this script in our public google bucket: gs://griffith-lab-workflow-inputs/scripts/monitor.sh

  2. I started a Cromwell VM and edited the workflow options config file on that system: sudo vim /shared/cromwell/workflow_options.json. I added the following entry to it (at the top level, not nested in another block):

  "monitoring_script": "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"

According to the Cromwell docs, if you modify this conf file you do NOT need to restart Cromwell. These settings should take effect with the next workflow you run.
https://cromwell.readthedocs.io/en/stable/wf_options/Overview/

However, if you DID need to restart Cromwell, based on the startup script (https://github.com/griffithlab/cloud-workflows/blob/main/manual-workflows/server_startup.py) I think you could do: sudo systemctl restart cromwell

@malachig
Member Author

malachig commented Oct 7, 2022

If my testing works as expected and we want this to happen automatically, then I think it would be added here:
https://github.com/griffithlab/cloud-workflows/blob/3822d66e6a0423ade093f48f9c2535b07adfbb6a/manual-workflows/resources.sh#L135-L143
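As a rough sketch of what an automated version could do, the function below adds the key to an existing workflow options file without opening an editor. This is an illustration only: I have not checked how resources.sh builds the file, the function name set_monitoring_script is hypothetical, and it assumes python3 is available on the machine.

```shell
# Add or overwrite the top-level "monitoring_script" key in a workflow options JSON file.
# Hypothetical helper; assumes python3 is on PATH.
set_monitoring_script() {
  options_file="$1"
  script_url="$2"
  python3 - "$options_file" "$script_url" <<'PY'
import json
import sys

path, url = sys.argv[1], sys.argv[2]

# Load the existing options so other settings are preserved.
with open(path) as handle:
    options = json.load(handle)

# Set the key at the top level, not nested in another block.
options["monitoring_script"] = url

with open(path, "w") as handle:
    json.dump(options, handle, indent=2)
    handle.write("\n")
PY
}
```

For the setup described in this issue, the call would be something like: sudo sh -c '. ./helper.sh; set_monitoring_script /shared/cromwell/workflow_options.json gs://griffith-lab-workflow-inputs/scripts/monitor.sh' (helper.sh being whatever file holds the function).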

@malachig
Member Author

malachig commented Oct 7, 2022

In my first test I looked in a gcs_localization.sh script for an individual task and I now see this:

# Localize singleton file 'gs://griffith-lab-workflow-inputs/scripts/monitor.sh' to '/cromwell_root/monitoring.sh'.
singleton_file_to_localize_573998f91cb96365bcb9696ac6baf714=(
  "griffith-lab"
  "3"
  "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"
  "/cromwell_root/monitoring.sh"
)

localize_singleton_file "${singleton_file_to_localize_573998f91cb96365bcb9696ac6baf714[@]}"

@malachig
Member Author

malachig commented Oct 7, 2022

And I see output like this (saved in the bucket as: monitoring.log) in a step that completed very quickly:

Seconds	Memory_Percent	Memory_Percent_Peak	Memory_GB	Memory_GB_Peak	Disk_Percent	Disk_Percent_Peak	Disk_GB	Disk_GB_Peak	CPU_Percent	CPU_Percent_Peak
0	8.86	8.86	0.34	0.34	23.00	23.00	7.43	7.43	2.29	2.29

@malachig
Member Author

This seems to be working as expected. To activate monitoring one can simply add this to /shared/cromwell/workflow_options.json on the head Cromwell VM:

  "monitoring_script": "gs://griffith-lab-workflow-inputs/scripts/monitor.sh"

Each task's results are written to that task's output directory in the Google bucket, in a file named monitoring.log.
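Once a monitoring.log has been copied down from the bucket (for example with gsutil cp), a small helper can pull out the peak values. The sketch below assumes the tab-separated layout shown in the sample output above, with peak columns whose names end in _Peak. The function name summarize_peaks is hypothetical, and it takes the maximum over all rows rather than assuming the last row holds the peak.

```shell
# Print the maximum value seen in every *_Peak column of a monitoring.log-style TSV.
# Hypothetical helper; expects a header row followed by tab-separated numeric rows.
summarize_peaks() {
  awk -F '\t' '
    # Header row: remember column names.
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; ncol = NF; next }
    # Data rows: keep the numerically largest value per column.
    NF > 0  { for (i = 1; i <= NF; i++)
                if (!(i in max) || $i + 0 > max[i] + 0) max[i] = $i }
    # Report only the peak columns, in their original order.
    END     { for (i = 1; i <= ncol; i++)
                if (name[i] ~ /_Peak$/) printf "%s\t%s\n", name[i], max[i] }
  ' "$1"
}
```

Usage: summarize_peaks monitoring.log prints one line per peak column, which makes it easier to compare a task's actual usage against the memory and disk it requested.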
