updateing docs & prod service file #465

Merged
merged 4 commits into from
Mar 22, 2024
54 changes: 35 additions & 19 deletions docs/source/prefect.rst
@@ -1,31 +1,19 @@
Managing Prefect Server
=======================

Prefect Server is running, but workpool is not available
--------------------------------------------------------

In HPC, you can use the following command to create a work pool.

.. code-block::

    prefect work-pool create "workpool"

**Manual instruction:**

If the prefect server is running, but the workpool is not available, then you can create a workpool by going to the website where the prefect server is hosted. Go to `Work Pools` tab and create a workpool with the name `workpool`. This is the name of the workpool that is defined in [prefect.yaml > definitions > work_pools > name](https://github.com/niaid/image_portal_workflows/pull/353/files#diff-b49a6f022232810a70f1a0c2feffbbe84d018b2418a7996e52430c6063ada3a3R23) file.
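The command above can be made safe to re-run with a small wrapper. This is only a sketch: ``ensure_work_pool`` is a hypothetical helper name (it is not part of this repository), and it assumes that the ``prefect`` CLI is on ``PATH``, that ``PREFECT_API_URL`` points at the server, and that ``prefect work-pool inspect`` exits non-zero when the pool does not exist.

```shell
#!/usr/bin/env bash
# Sketch: create the work pool only when the Prefect server does not know it yet.
# ensure_work_pool is a hypothetical helper, not part of the repository.

ensure_work_pool() {
    local pool_name="${1:-workpool}"

    # Assumption: `prefect work-pool inspect` fails for an unknown pool.
    if prefect work-pool inspect "$pool_name" >/dev/null 2>&1; then
        echo "work pool '$pool_name' already exists"
    else
        echo "creating work pool '$pool_name'"
        prefect work-pool create "$pool_name"
    fi
}

# Usage: ensure_work_pool "workpool"
```

Depending on the installed Prefect version, ``prefect work-pool create`` may prompt for a pool type; in that case the manual route through the ``Work Pools`` tab may be simpler.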

Continuous Deployment (dev to qa to prod)
-----------------------------------------

Assuming that `dev` environment is working as expected, we can promote the environment to `qa` and `prod` as well.
Server images are promoted from one environment to the next, i.e. `dev` -> `qa` -> `prod`.

The first thing we need to do is promote the image from `dev` to `qa`.
For example to promote the image from `dev` to `qa`:

.. code-block::

    spaces task -f hedwig.spaces-solution.yaml promote-image -- dev qa

Afterwards, we can deploy the aws infrastructure for `qa` as,
We can then deploy the aws infrastructure for `qa`:

.. code-block::

@@ -52,7 +40,18 @@ Make sure the configurations are correct:

# change it to dev or qa, based on your environment

2. Check prefect config with view
2. Check HPC worker daemon:

.. code-block::

    systemctl status hedwig_listener_prod


Certain scenarios require the daemon to be restarted or reloaded, although typically this step is not needed (see the helper_scripts/.service file). `systemd` should restart the worker if it is killed or crashes.



3. Check prefect config with view

.. code-block::

@@ -62,7 +61,7 @@ Make sure the configurations are correct:
export PREFECT_API_KEY=xyz
export PREFECT_API_URL=abc.com

3. Deploy flows with prefect deploy
4. Deploy flows with prefect deploy

.. code-block::

@@ -71,6 +70,23 @@ Make sure the configurations are correct:
# However, this will also deploy pytest_runner workflow in other envs (where it's not needed)
# prefect deploy --all

4. Run worker (properly via the helper_scripts/.service file)

The service file should restart the worker when it is killed. Normally, we would not need to do this step.
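The worker check in the steps above could be scripted roughly as follows. This is a sketch, not the project's tooling: ``check_listener`` is a hypothetical helper name, the unit name ``hedwig_listener_prod`` is taken from the status command above, and reading the journal may need extra privileges on the HPC host.

```shell
#!/usr/bin/env bash
# Sketch: report whether the worker unit is active; show recent logs if it is not.
# check_listener is a hypothetical helper, not part of the repository.

check_listener() {
    local unit="${1:-hedwig_listener_prod}"

    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active"
    else
        echo "$unit is NOT active; recent log lines:" >&2
        # Reading the journal may require sudo or membership in a journal group.
        journalctl -u "$unit" -n 50 --no-pager >&2
        return 1
    fi
}

# Usage: check_listener hedwig_listener_prod
```

The function returns non-zero when the unit is down, so it can be used as a guard before running ``prefect deploy``.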

Troubleshooting:
--------------------------------------------------------

- Prefect Server is running, but workpool is not available

In HPC, you can create a work pool with the following command.
`prefect work-pool create "workpool"`
Ensure the Prefect server is running. If the work pool is still not available, you can also create it manually by going to the website where the Prefect server is hosted: open the `Work Pools` tab and create a work pool named `workpool`. This is the name defined in the prefect.yaml file under definitions > work_pools > name.

- IMOD unable to find `env`

.. code-block::

    Unable to run command.
    Cannot run program "env" (in directory "?"): error=2, No such file or directory

Note the `directory "?"`: it implies that something is trying to run in a directory that does not exist. To recover, take the daemon down, confirm that `ps aux | grep hedwig` lists no lingering processes, verify that the service file is correct, and then make sure the daemon is `reloaded` and `started`.
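The recovery sequence above might be scripted along these lines. This is a sketch under assumptions: ``restart_listener_cleanly`` is a hypothetical helper name, "reloaded" is taken to mean ``systemctl daemon-reload``, and the commands are expected to run with sufficient privileges (for example via ``sudo``).

```shell
#!/usr/bin/env bash
# Sketch of the recovery sequence; restart_listener_cleanly is a hypothetical helper.

restart_listener_cleanly() {
    local unit="${1:-hedwig_listener_prod}"

    # 1. Take the daemon down.
    systemctl stop "$unit" || return 1

    # 2. Refuse to continue while stray hedwig processes remain.
    #    The [h] bracket keeps grep from matching its own command line.
    #    Note: this also matches any process owned by a hedwig_* user.
    if ps aux | grep '[h]edwig'; then
        echo "stray hedwig processes are still running; stop them first" >&2
        return 1
    fi

    # 3. Re-read the (corrected) service file, then start the daemon again.
    systemctl daemon-reload || return 1
    systemctl start "$unit"
}

# Usage: restart_listener_cleanly hedwig_listener_prod
```

The function stops early if any matching process is still listed, so the operator can inspect the output and clean up by hand before retrying.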

14 changes: 11 additions & 3 deletions helper_scripts/hedwig_listener_prod.service
@@ -1,16 +1,24 @@

[Unit]
Description=Starts the Production listener "Agent" which reaches out to workflow API.
Description=Starts the Production listener worker, which reaches out to workflow API.
After=network.target


[Service]
Type=simple
User=hedwig_prod
Group=hedwig_prod
ExecStart=/gs1/home/hedwig_prod/image_portal_workflows/helper_scripts/hedwig_reg_listen.sh listen
WorkingDirectory=/gs1/home/hedwig_prod
Environment="HEDWIG_ENV=prod"
Environment="REQUESTS_CA_BUNDLE=/etc/pki/tls/certs/ca-bundle.crt"
Environment="PREFECT_API_URL=https://prefect2.hedwig-workflow-api.niaidprod.net/api"
Environment="IMOD_DIR=/opt/rml/imod"
WorkingDirectory=/gs1/home/hedwig_prod/image_portal_workflows
# current setting on prod
# WorkingDirectory=/gs1/home/hedwig_prod

ExecStart=/gs1/home/hedwig_prod/prod/bin/prefect worker start --pool workpool
Restart=always
RestartSec=60s

[Install]
WantedBy=multi-user.target