diff --git a/docs/source/prefect.rst b/docs/source/prefect.rst
index 6d09cc30..4980f868 100644
--- a/docs/source/prefect.rst
+++ b/docs/source/prefect.rst
@@ -1,31 +1,19 @@
 Managing Prefect Server
 =======================
 
-Prefect Server is running, but workpool is not available
---------------------------------------------------------
-
-In HPC, you can use the following command to create a work pool.
-
-.. code-block::
-
-    prefect work-pool create "workpool"
-
-**Manual instruction:**
-
-If the prefect server is running, but the workpool is not available, then you can create a workpool by going to the website where the prefect server is hosted. Go to `Work Pools` tab and create a workpool with the name `workpool`. This is the name of the workpool that is defined in [prefect.yaml > definitions > work_pools > name](https://github.com/niaid/image_portal_workflows/pull/353/files#diff-b49a6f022232810a70f1a0c2feffbbe84d018b2418a7996e52430c6063ada3a3R23) file.
 
 Continuous Deployment (dev to qa to prod)
 -----------------------------------------
 
-Assuming that `dev` environment is working as expected, we can promote the environment to `qa` and `prod` as well.
+Server images are promoted from one environment to the next: `dev` -> `qa` -> `prod`.
 
-The first thing we need to do is promote the image from `dev` to `qa`.
+For example, to promote the image from `dev` to `qa`:
 
 .. code-block::
 
     spaces task -f hedwig.spaces-solution.yaml promote-image -- dev qa
 
-Afterwards, we can deploy the aws infrastructure for `qa` as,
+We can then deploy the AWS infrastructure for `qa`:
 
 .. code-block::
 
@@ -52,7 +40,18 @@ Make sure the configurations are correct:
 
       # change it to dev or qa, based on your environment
 
-2. Check prefect config with view
+2. Check the HPC worker daemon:
+
+   .. code-block::
+
+      systemctl status hedwig_listener_prod
+
+
+   Typically this step requires no action: the service file sets `Restart=always`, so `systemd` restarts the worker if it is killed or crashes (see the `.service` files in `helper_scripts`). Certain scenarios, however, require the daemon to be restarted or reloaded.
+
+
+
+3. Check prefect config with view
 
    .. code-block::
 
@@ -62,7 +61,7 @@ Make sure the configurations are correct:
       export PREFECT_API_KEY=xyz
       export PREFECT_API_URL=abc.com
 
-3. Deploy flows with prefect deploy
+4. Deploy flows with prefect deploy
 
    .. code-block::
 
@@ -71,6 +70,31 @@ Make sure the configurations are correct:
       # However, this will also deploy pytest_runner workflow in other envs (where it's not needed)
       # prefect deploy --all
 
-4. Run worker (properly via the helper_scripts/.service file)
-   The service files should restarts the worker when killed. Normally, we would need to do this step
+
+Troubleshooting
+---------------
+
+- Prefect Server is running, but the work pool is not available
+
+  On the HPC, create the work pool from the command line:
+
+  .. code-block::
+
+      prefect work-pool create "workpool"
+
+  Alternatively, open the website where the Prefect server is hosted, go to the `Work Pools` tab, and create a work pool named `workpool`. This name must match the work pool name defined in `prefect.yaml` under `definitions > work_pools > name`.
+
+- IMOD is unable to find `env`
+
+  .. code-block::
+
+      Unable to run command.
+      Cannot run program "env" (in directory "?"): error=2, No such file or directory
+
+  Note the `directory "?"`: something is trying to run in a directory that does not exist. To recover:
+
+  #. Stop the daemon and make sure `ps aux | grep hedwig` lists no worker processes that are still running.
+  #. Check that the service file is correct.
+  #. Reload the systemd configuration (`systemctl daemon-reload`) and start the daemon again.
+
 
diff --git a/helper_scripts/hedwig_listener_prod.service b/helper_scripts/hedwig_listener_prod.service
index c60bb1f9..abf124d5 100644
--- a/helper_scripts/hedwig_listener_prod.service
+++ b/helper_scripts/hedwig_listener_prod.service
@@ -1,16 +1,24 @@
 [Unit]
-Description=Starts the Production listener "Agent" which reaches out to workflow API.
+Description=Starts the Production listener worker, which reaches out to the workflow API.
 After=network.target
 
+
 [Service]
 Type=simple
 User=hedwig_prod
 Group=hedwig_prod
-ExecStart=/gs1/home/hedwig_prod/image_portal_workflows/helper_scripts/hedwig_reg_listen.sh listen
-WorkingDirectory=/gs1/home/hedwig_prod
 Environment="HEDWIG_ENV=prod"
 Environment="REQUESTS_CA_BUNDLE=/etc/pki/tls/certs/ca-bundle.crt"
+Environment="PREFECT_API_URL=https://prefect2.hedwig-workflow-api.niaidprod.net/api"
+Environment="IMOD_DIR=/opt/rml/imod"
+WorkingDirectory=/gs1/home/hedwig_prod/image_portal_workflows
+# current setting on prod
+# WorkingDirectory=/gs1/home/hedwig_prod
+
+ExecStart=/gs1/home/hedwig_prod/prod/bin/prefect worker start --pool workpool
+Restart=always
+RestartSec=60s
 
 [Install]
 WantedBy=multi-user.target
 
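The recovery steps in the IMOD troubleshooting note could be scripted roughly as follows. This is a sketch, not part of the repository. `hedwig_unit` and `restart_hedwig_listener` are hypothetical names. The `hedwig_listener_<env>` naming for non-prod units is an assumption, and the sketch assumes the operator can run `systemctl` through `sudo`. It matches the worker on its `ExecStart` command line (`prefect worker start`) rather than on `hedwig`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers sketching the "IMOD unable to find env" recovery steps.
# Assumptions (not part of this PR): non-prod units follow the same
# hedwig_listener_<env> naming as hedwig_listener_prod, and the caller may
# run systemctl through sudo.

# Print the systemd unit name for an environment (default: prod).
hedwig_unit() {
    echo "hedwig_listener_${1:-prod}"
}

# Stop the worker, bail out if worker processes survive, then reload the
# unit files and start the worker again.
restart_hedwig_listener() {
    local unit
    unit="$(hedwig_unit "$1")"

    sudo systemctl stop "$unit"

    # Look for the ExecStart command line instead of grepping for "hedwig",
    # which also matches every process owned by the hedwig_prod user.
    # Note: this matches a worker started by any user on the host.
    if pgrep -af 'prefect worker start' >&2; then
        echo "Worker processes are still running; stop them before restarting." >&2
        return 1
    fi

    # Re-read the .service file in case it was edited, then start the unit.
    sudo systemctl daemon-reload
    sudo systemctl start "$unit"
    systemctl status "$unit" --no-pager
}
```

After sourcing the file, `restart_hedwig_listener prod` runs the stop, check, reload, and start steps in order. Matching on the command line avoids false positives from `ps aux | grep hedwig`. That pattern also matches any process whose `USER` column shows `hedwig_prod`.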