From 2a7a3122e7d559a5b933ae7d8a55d862e7b9f095 Mon Sep 17 00:00:00 2001
From: Stefan Negru
Date: Mon, 11 Dec 2023 14:42:58 +0200
Subject: [PATCH] add Troubleshooting guide

---
 docs/dictionary/wordlist.txt   |  1 +
 docs/guides/troubleshooting.md | 33 +++++++++++++++++----------------
 mkdocs.yml                     |  1 +
 3 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/docs/dictionary/wordlist.txt b/docs/dictionary/wordlist.txt
index 5249e39..91ddccd 100644
--- a/docs/dictionary/wordlist.txt
+++ b/docs/dictionary/wordlist.txt
@@ -296,3 +296,4 @@ HOSTKEY
 PEMKEYPASS
 PEMKEYPATH
 SYNCPUBKEYPATH
+rabbitmqctl
diff --git a/docs/guides/troubleshooting.md b/docs/guides/troubleshooting.md
index 85d5b58..2f89c98 100644
--- a/docs/guides/troubleshooting.md
+++ b/docs/guides/troubleshooting.md
@@ -1,31 +1,32 @@
 # Troubleshooting
 
-TODO:
-This guide is a stub and has yet to be finished.
-If you have feedback to give on the content you would like to see, please contact us on
-[github](https://github.com/neicnordic/neic-sda)!
-
 In this guide we aim to give some general tips on how to troubleshoot and restore services to working order.
 
 ## After deployment checklist
 
-After having deployed the SDA services in a Federated setup, the following steps can be followed to ensure that everything is up and running correctly.
+After having deployed the SDA services in a `FederatedEGA` setup, the following steps can be followed to ensure that everything is up and running correctly.
 
 ### Services running
 
-The first step is to verify that the services are up and running and the credentials are valid. Make sure that,
+The first step is to verify that the services are up and running and the credentials are valid. Make sure that:
 
 - credentials for access to RabbitMQ and Postgres are securely injected to the respective services in the form of secrets
-- all the pods/containers are in `Ready`/`Up` status.
+- all the pods/containers are in `Ready`/`Up` status and there are no restarts among the pods/containers.
+  - for `FederatedEGA` setup the following pods are required: `intercept`, `ingest`, `verify`, `finalize`, `mapper` and a [Data Retrieval API](/docs/dataout.md)
 
-Next step is to make sure that the remote connections (CEGA RabbitMQ) are working. Login to the RabbitMQ admin page and check that,
+Next step is to make sure that the remote connections (`CentralEGA` RabbitMQ) are working. Log in to the RabbitMQ admin page and check that:
 
-- the Federation status of the Admin tab is in state `running`
-- the Shovel status of the Admin tab is in state `running` for all 5 shovels.
+- the [Federation](https://www.rabbitmq.com/federation.html) status of the Admin tab is in state `running`
+  or using `rabbitmqctl federation_status` from the command line of a RabbitMQ pod/container.
+- the [Shovel](https://www.rabbitmq.com/shovel.html) status of the Admin tab is in state `running` for all shovels
+  or using `rabbitmqctl shovel_status` from the command line of a RabbitMQ pod/container.
 
 ## End-to-end testing
 
-NOTE: This guide assumes that there exists a test instance account with `CentralEGA`. Make sure that the account is approved and added to the submitters group.
+> NOTE:
+> This guide assumes that there exists a test instance account with `CentralEGA`. Make sure that the account is approved and added to the submitters group.
+> The [local development and testing](guides/local-dev-and-testing.md) guide provides the scripts for testing different parts of the setup, that can be used
+> as a reference.
 
 ### Upload file(s)
 
@@ -34,16 +35,16 @@ Upload one or a number of files of different sizes and check that,
 - the file(s) exists in the configured `inbox` of the storage backend (e.g. S3 bucket or POSIX path)
 - the file(s) entry exists in the database in the `sda.files` and `sda.file_event_log` tables
   - If the `s3inbox` is used, there should be an `uploaded` event for each specific file in the `sda.file_event_log`
-- the file(s) exists in the CEGA Submission portal (here for the test instance) Files listing, which can be accessed after pressing the three lines menu button.
+- the file(s) exists in the `CentralEGA` [Submission portal](https://ega-archive.org/submission/metadata/submission/sequencing-phenotype/submitter-portal/) (the submission portal URL address is specific for each `FederatedEGA` node). `Files` listing, which can be accessed after pressing the three lines menu button.
 
 ### Make a test submission
 
-Make a submission with the portal and select the file(s) that were uploaded in the previous step. Once the analysis or runs (one of the two is required) step is finished, the messages for the ingestion of the files should appear in the logs of the `ingest` service. Make sure that,
+Make a submission with the portal and select the file(s) that were uploaded in the previous step. Once the analysis or runs (one of the two is required) step is finished, the messages for the ingestion of the files should appear in the logs of the `ingest` service. Make sure that:
 
 - the messages are arriving for the file(s) included in the submission
 - the `ingestion`, `verify` and `finalise` processes are started and send a message when finished
-- the data in `sda.files` are correct
-- the files are logged in the `sda.file_event_log` for each of the services and files
+- the data in `sda.files` table are correct
+- the files are logged in the `sda.file_event_log` table for each of the services and files
 - the file(s) exists in the configured `archive` storage backend, see the `archive_file_path` in the `sda.files` table for the name of the archived file(s)
 - the archived file(s) exists in the configured `backup` storage backend
 - delete one run in the submitter portal, then and add it back again to make sure the cancel message is working as intended.
diff --git a/mkdocs.yml b/mkdocs.yml
index 2bc8856..cb0c33d 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -30,4 +30,5 @@ nav:
   - Contributing: "CONTRIBUTING.md"
   - Local dev and testing: "guides/local-dev-and-testing.md"
   - Deploying with Kubernetes: "guides/deploy-k8s.md"
+  - Troubleshooting: "guides/troubleshooting.md"
\ No newline at end of file