diff --git a/docs/connection.md b/docs/connection.md
index c46bcce..1b16d0e 100644
--- a/docs/connection.md
+++ b/docs/connection.md
@@ -10,8 +10,7 @@
 The RabbitMQ message brokers of each SDA instance are the **only**
 components with the necessary credentials to connect to `CentralEGA`
 message broker.
 
-We call `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes know as `sda-mq`),
-the RabbitMQ message brokers of, respectively, `CentralEGA` and `SDA`/`FederatedEGA`.
+The message brokers for `CentralEGA` and `SDA/FederatedEGA` are denoted as `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes referred to as `sda-mq`), respectively.
 
 Local Message Broker
 --------------------
@@ -26,7 +25,7 @@ Local Message Broker
 The following environment variables can be used to configure the broker:
 
 > NOTE:
-> We use [RabbitMQ](https://hub.docker.com/_/rabbitmq) >= `3.8.16` including
+> [RabbitMQ](https://hub.docker.com/_/rabbitmq) >= `3.8.16` is used, including
 > the management plugins.
 
 | Variable | Description |
@@ -107,8 +106,7 @@ Service will wait for messages to arrive.
 `CEGAMQ` receives notifications from `LocalMQ` using a *shovel*. Everything
 that is published to its `to_cega` exchange gets forwarded to CentralEGA
 (using the routing key based on the name
-`files.`). We propagate the different status of the
-workflow to CentralEGA, using the following routing keys:
+`files.`). The various workflow statuses are transmitted to CentralEGA via the following routing keys:
 
 | Name            | Purpose                                     |
 |-----------------|:-------------------------------------------|
@@ -117,8 +115,8 @@ workflow to CentralEGA, using the following routing keys:
 | files.inbox     | For inbox file operations                   |
 | files.verified  | For files ready to request accessionID      |
 
-Note that we do not need at the moment a queue to store the completed
-message, nor the errors, as we forward them to `CentralEGA`.
+Note that no queue is currently needed to store completed
+messages or errors, as they are forwarded to `CentralEGA`.
 
 ![RabbitMQ setup](./static/CEGA-LEGA.png)
 
@@ -272,7 +270,7 @@ when the `accession ID` has been set (in case of Federated EGA this also means b
 }
 ```
 
-The message sent from the `finalize` service to the `completed` queue.
+The message sent from the `finalize` service to the `completed` queue:
 
 ```javascript
 {
diff --git a/docs/dataout.md b/docs/dataout.md
index 0465570..18b9af0 100644
--- a/docs/dataout.md
+++ b/docs/dataout.md
@@ -2,8 +2,7 @@ Data Retrieval API
 ==================
 
 > NOTE:
-> We maintain two Data Retrieval API solutions, for which REST APIs are the
-> same.
+> Two Data Retrieval API solutions are maintained, with identical REST APIs.
 
 SDA-DOA
 -------
@@ -88,8 +87,8 @@ Data Retrieval API can be run with connection to an AAI or without. If connectio
 set.
 
 > NOTE:
-> By default we use LifeScience AAI as JWT for authentication
-> `OPENID_CONFIGURATION_URL` is set to:
+> The default JWT for authentication is LifeScience AAI,
+> and the `OPENID_CONFIGURATION_URL` is set to:
 > 
 
 If connected to an AAI provider the current implementation is based on
diff --git a/docs/db.md b/docs/db.md
index f0bf8f1..49f3d77 100644
--- a/docs/db.md
+++ b/docs/db.md
@@ -1,9 +1,7 @@
 Database Setup
 ==============
 
-We use a Postgres database (version 15+ ) to store intermediate data, in
-order to track progress in file ingestion. The `lega` database schema is
-documented below.
+A Postgres database (version 15+) is used to store intermediate data in order to track progress in file ingestion. The `lega` database schema is documented below.
 
 > NOTE:
 > Source code repository for DB component is available at:
 
 The database container will initialize and create the necessary database
 structure and functions if started with an empty area.
 Procedures for *backing up the database* are important, however
 considered out of scope for the secure data archive project.
 
-Look at [the SQL definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
-if you are also interested in the database triggers.
+Refer to [the SQL definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
+for the database triggers as well.
 
 Configuration
 -------------
 
 Security is hardened:
 
-- We do not use 'trust' even for local connections
+- The 'trust' authentication method is not used, even for local connections
 - Requiring password authentication for all
 - Enforcing TLS communication
 - Enforcing client-certificate verification
@@ -195,13 +193,9 @@ versions/migrations or none.
 
 > Any changes done to database schema initialization should be reflected
 > in a schema migration script.
 
-Whenever you need to change the database schema, we recommended changing
-both the database initialization scripts (and bumping the bootstrapped
-schema version) as well as creating the corresponding migration script
-to perform the changes on a database in use.
+For any required change to the database schema, it is advisable both to update the database initialization scripts (incrementing the bootstrapped schema version) and to create the corresponding migration script to apply the change to a database in use.
 
 Migration scripts should be placed in `/migratedb.d/` in the *sensitive-data-archive* repo
-(). We recommend naming them
-corresponding to the schema version they provide migration to. There is
+(). These scripts are best named after the schema version they migrate to. There is
 an "empty" migration script (`01.sql`) that can be used as a template.
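The migration-script naming convention above can be sketched as a small helper. The zero-padded two-digit filename (matching the `01.sql` template) is an assumption for illustration, not a guarantee of the repo's exact layout:

```python
# Sketch of the migration-script naming convention described above.
# The zero-padded two-digit filename (as in the `01.sql` template) is an
# assumption for illustration.

def migration_filename(target_schema_version: int) -> str:
    """Name a migration script after the schema version it migrates to."""
    if target_schema_version < 1:
        raise ValueError("schema versions start at 1")
    return f"{target_schema_version:02d}.sql"

# A change that bumps the bootstrapped schema from 3 to 4 would mean
# updating initdb.d *and* adding this migration script:
print(migration_filename(4))   # 04.sql
```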
diff --git a/docs/deploy.md b/docs/deploy.md
index 6948dc2..5c941ff 100644
--- a/docs/deploy.md
+++ b/docs/deploy.md
@@ -1,10 +1,8 @@
 Deployments and Local Bootstrap
 ===============================
 
-We use different deployment strategies for environments like Docker
-Swarm, Kubernetes or a local-machine. The [local development and testing](guides/local-dev-and-testing.md) guide is
-recommended for local-machine, while
-[Kubernetes](https://kubernetes.io/) and [Docker Swarm](https://docs.docker.com/engine/swarm/) are recommended for production.
+There are different deployment strategies for environments such as Docker Swarm, Kubernetes, or a local machine. The [local development and testing](guides/local-dev-and-testing.md) guide is
+recommended for a local machine, while [Kubernetes](https://kubernetes.io/) and [Docker Swarm](https://docs.docker.com/engine/swarm/) are recommended for production.
 
 The production deployment repositories are:
diff --git a/docs/dictionary/wordlist.txt b/docs/dictionary/wordlist.txt
index 266fbda..1713479 100644
--- a/docs/dictionary/wordlist.txt
+++ b/docs/dictionary/wordlist.txt
@@ -363,4 +363,5 @@ Mina's
 SPRINGFRAMEWORK
 env
 programmatically
-assignees
+incrementing
+assignees
\ No newline at end of file
diff --git a/docs/encryption.md b/docs/encryption.md
index 6773ea2..1fcd575 100644
--- a/docs/encryption.md
+++ b/docs/encryption.md
@@ -46,8 +46,7 @@ The advantages of the format are, among others:
   segments surrounding the portion. The file itself is not decrypted and
   re-encrypted.
-In order to encrypt files using this standard we recommend the following
-tools:
+For encrypting files using this standard, the following tools are recommended:
 
 - - samtools extension
 - - python library
diff --git a/docs/guides/deploy-k8s.md b/docs/guides/deploy-k8s.md
index 5891428..a20a1e0 100644
--- a/docs/guides/deploy-k8s.md
+++ b/docs/guides/deploy-k8s.md
@@ -67,7 +67,7 @@ The table below reflects the minimum required resources to run the services in t
 | sftpinbox | 100m | 128Mi | - |
 | doa | 100m | 128Mi | - |
 
-Here we provide minimal lists of variables that need to be configured, in addition to the defaults, in the respective `values.yml` file of each of the Helm charts for:
+Below are minimal lists of variables that must be configured, in addition to the defaults, in the respective `values.yml` file of each of the Helm charts for:
 
 - [SDA services](#sda-services-chart)
 - [RabbitMQ](#rabbitmq-chart)
 
@@ -253,7 +253,7 @@ Certain services, such as `inbox`, `download`, and `auth`, are configured to exp
 
 - download
 - auth
 
-In addition, Kubernetes allows you to define [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) to control the communication between Pods. Network Policies are crucial for enforcing security measures within your cluster. They enable you to specify which Pods can communicate with each other and define rules for ingress and egress traffic.
+In addition, Kubernetes allows defining [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) to control the communication between Pods. Network Policies are crucial for enforcing security measures within the cluster, specifying which Pods can communicate with each other and defining rules for ingress and egress traffic.
-Here are two recommended basic examples of a Network Policy for namespace isolation and allowing traffic to inbox ingress, a similar policies needs to be in place for `download` and `auth` service:
+Here are two recommended basic examples of a Network Policy, for namespace isolation and for allowing traffic to the inbox ingress; similar policies need to be in place for the `download` and `auth` services:
 
 ```yaml
diff --git a/docs/guides/deploy-swarm.md b/docs/guides/deploy-swarm.md
index d3d075c..c4eb980 100644
--- a/docs/guides/deploy-swarm.md
+++ b/docs/guides/deploy-swarm.md
@@ -7,4 +7,4 @@ This guide explains how to deploy the Sensitive Data Archive (SDA) in docker swa
 > If you have feedback to give on the content you would like to see, please contact us on
 > [github](https://github.com/neicnordic/neic-sda)!
 
-Before this guide is written - you can already get some useful clues on how to deploy SDA in a docker swarm setup through examples of docker swarm compatible docker compose files, as well as some useful development tools, in the [Docker Swarm deployment repo](https://github.com/neicnordic/LocalEGA-deploy-swarm/).
+Until this guide is written, useful clues on how to deploy SDA in a Docker Swarm setup can be found in the [Docker Swarm deployment repo](https://github.com/neicnordic/LocalEGA-deploy-swarm/), which contains examples of Docker Swarm compatible Docker Compose files as well as some useful development tools.
diff --git a/docs/guides/local-dev-and-testing.md b/docs/guides/local-dev-and-testing.md
index cdd2d83..e5a02bc 100644
--- a/docs/guides/local-dev-and-testing.md
+++ b/docs/guides/local-dev-and-testing.md
@@ -2,37 +2,37 @@
 
 ## Guide summary
 
-This guide provides a brief introduction on how to locally install and run the Sensitive Data Archive (SDA) components on your development system, and run all the tests locally.
+This guide provides a brief introduction on how to locally install and run the Sensitive Data Archive (SDA) components on a development system, and run all the tests locally.
-This guide should get you started by setting up your environment to build and deploy the services of SDA, run all tests in the code base, and run several development related "shortcut" actions from the commandline using custom development helper scripts.
+This guide covers setting up an environment to build and deploy the services of SDA, running all tests in the code base, and running several development-related "shortcut" actions from the command line using custom development helper scripts.
 
-In addition the guide includes a few tips and tricks on how to explore the services running locally after you've deployed them, to learn more about how they operate together and find more documentation.
+In addition, the guide includes a few tips and tricks on how to explore the services once they are deployed locally, to learn more about how they operate together and find more documentation.
 
 ## Local security / zone considerations
 
 Normally a SDA deployment will consist of at least two different security zones, separating internet facing services from the services accessing the encrypted file archive with more sensitive config information. In this guide and the development deployment tools, these security zones are not implemented.
 
-Hence your testing, staging and production deployments will most likely differ in deployment strategy and end-2-end testing approaches compared to what is documented in this guide.
+Hence, actual testing, staging and production deployments will most likely differ in deployment strategy and end-2-end testing approaches from what is documented in this guide.
 ## SDA local development and testing helpers
 
-The SDA code itself contains some very useful development helpers, that let's you very easily:
+The SDA code itself contains some very useful development helpers that make it easy to:
 
- - Check that you have all required libraries, compilers, docker tools, and if not tries to install them
+ - Check that all required libraries, compilers and docker tools are present, and install any that are missing
 - Build the services locally
 - Lint the code
 - Run the tests for each service
 - Run integration tests for services together by deploying the service containers
 - ...and some more useful actions
 
-Once you have cloned the [neicnordic/sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/) to your development system, you can start to
+After cloning the [neicnordic/sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/) repository to a development system, one can start to
 explore these tools as explained in [this page](sda-dev-test-doc.md).
 
 ## What's next, once everything is up and running?
 
-Once you've been able to automagically deploy and run all the tests and integration tests, thinking *"that was too easy, what really happened?"*, you could go in multiple direction from here to learn more:
+After automagically deploying and running all the tests and integration tests, and thinking *"that was too easy, what really happened?"*, there are multiple directions to go from here to learn more:
 
 - List all the services running with ```$ docker ps```
 - Peek into logs of services with ```$ docker logs container-id-or-name```
diff --git a/docs/guides/tls.md b/docs/guides/tls.md
index a10701e..e707c7d 100644
--- a/docs/guides/tls.md
+++ b/docs/guides/tls.md
@@ -5,6 +5,4 @@
 
 > If you have feedback to give on the content, please contact us on
 > [github](https://github.com/neicnordic/neic-sda)!
-If you have spent time on the internet you have used tls encryption,
-which is the encryption used for https requests.
-Setting up TLS for a system can be a little bit tricky though.
+TLS is the encryption protocol used to secure HTTPS requests; setting it up for a whole system, however, can be a little bit tricky.
diff --git a/docs/guides/troubleshooting.md b/docs/guides/troubleshooting.md
index dc4e168..5607677 100644
--- a/docs/guides/troubleshooting.md
+++ b/docs/guides/troubleshooting.md
@@ -1,6 +1,6 @@
 # Troubleshooting
 
-In this guide we aim to give some general tips on how to troubleshoot and restore services to working order.
+This guide aims to provide general tips on troubleshooting and restoring services to working order.
 
 ## After deployment checklist
 
@@ -58,4 +58,4 @@ Finally, when all files have been ingested, the submission portal should allow f
 
 - the dataset has the status `registered` in the `sda.dataset_event_log`
 - the dataset gets the status `released` in the `sda.dataset_event_log`, this might take a while depending on what date was chosen in the submitter portal.
 
-Once all the submission steps have been verified, we can assume that the pipeline part of the deployment is working properly.
+Once all the submission steps have been verified, the pipeline part of the deployment can be assumed to be working properly.
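The dataset checks above amount to looking for `registered` and `released` events for the dataset in `sda.dataset_event_log`. A minimal sketch of that check, assuming each row read for the dataset yields an event name (the query and column names are illustrative assumptions):

```python
# Sketch: decide whether a dataset has passed the troubleshooting checks
# above, given the events read for it, e.g. via something like:
#   SELECT event FROM sda.dataset_event_log WHERE dataset_id = %s;
# (exact column names are an assumption for illustration).

def dataset_status(events: list[str]) -> str:
    """Return the most advanced status seen for a dataset."""
    if "released" in events:
        return "released"
    if "registered" in events:
        return "registered"
    return "unknown"

print(dataset_status(["registered"]))              # registered
print(dataset_status(["registered", "released"]))  # released
```

Remember that `released` may appear only after the release date chosen in the submitter portal has passed, so an `unknown` or `registered` result is not necessarily an error.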
diff --git a/docs/index.md b/docs/index.md
index c572b5f..4ca55ba 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -7,17 +7,17 @@ The NeIC Sensitive Data Archive (SDA) is an encrypted data archive, implemented
 
 The modular architecture of SDA supports both stand alone deployment of an archive, and the use case of deploying a Federated node in the [Federated European Genome-phenome Archive network (FEGA)](https://ega-archive.org/about/projects-and-funders/federated-ega/), serving discoverable sensitive datasets in the main [EGA web portal](https://ega-archive.org).
 
 > NOTE:
-> Throughout this documentation, we can refer to [Central
-> EGA](https://ega-archive.org/) as `CEGA`, or `CentralEGA`, and *any*
-> `FederatedEGA` instance also know as: `FEGA`, `LEGA` or
-> `LocalEGA`. In the context of NeIC we will refer to the Federated EGA as the
-> `Sensitive Data Archive` or `SDA`.
+> Throughout this documentation, [Central
+> EGA](https://ega-archive.org/) may be referred to as `CEGA` or `CentralEGA`, and *any*
+> `FederatedEGA` instance is also known as `FEGA`, `LEGA`, or `LocalEGA`.
+> Within the context of NeIC, the Federated EGA is referred to as
+> the `Sensitive Data Archive` or `SDA`.
 
 Organisation of the NeIC SDA Operations Handbook
 ------------------------------------------------
 
-This operations handbook is organized in four main parts, that each has it's own main section in the left navigation menu. Here we provide a condensed summary, follow the links below or use the menu navigation to each section's own detailed introduction page:
+This operations handbook is organized in four main parts, each with its own main section in the left navigation menu. Below is a condensed summary; follow the links or use the menu navigation to reach each section's detailed introduction page:
 
 1. **Structure**: Provides overview material for how the services can be deployed in different constellations and highlights communication paths.
diff --git a/docs/structure.md b/docs/structure.md
index 527bec4..87df9e7 100644
--- a/docs/structure.md
+++ b/docs/structure.md
@@ -91,7 +91,7 @@ The NeIC SDA is targeting both types of setup but also to allow for the possibil
 
 ### Container deployment options
 
-The components of SDA are all container based using Docker standards for building container images. They can be deployed in a range of different ways depending on your local needs. The SDA developers are currently aware of the following alternatives in use:
+The components of SDA are all container based, using Docker standards for building container images. They can be deployed in a range of different ways depending on local needs. The SDA developers are currently aware of the following alternatives in use:
 
 1. Kubernetes (OpenShift)
 2. Docker Swarm
diff --git a/docs/submission.md b/docs/submission.md
index 5afd555..6e2b751 100644
--- a/docs/submission.md
+++ b/docs/submission.md
@@ -130,32 +130,29 @@ completes and the checksum is valid, a message of completion is sent to
 
 > **Important**
 > If a file disappears or is overwritten in the inbox before ingestion is completed, ingestion may not be possible.
 
-If any of the above steps generates an error, we exit the workflow and
-log the error. In case the error is related to a misuse from the user,
-such as submitting the wrong checksum or tampering with the encrypted
-file, the error is forwarded to `CentralEGA` in order to be displayed in
-the Submission Interface.
+If any of the above steps results in an error, the workflow is terminated and the error is logged. If the error stems from user misuse, such as submitting an incorrect checksum or tampering with the encrypted file, it is forwarded to `CentralEGA` for display in the Submission Interface.
+
 
 Submission Inbox
 ----------------
 
-`CentralEGA` contains a database of users, with IDs and passwords. We
-have developed several solutions allowing user authentication against
-CentralEGA user database:
+`CentralEGA` contains a database of users, with IDs and passwords. Multiple solutions
+have been developed to facilitate user authentication
+against the CentralEGA user database:
 
-- [Apache Mina Inbox](submission.md##sftp-inbox);
+- [Apache Mina Inbox](submission.md#sftp-inbox);
 - [S3 Proxy Inbox](submission.md#s3-proxy-inbox);
 - [TSD File API](submission.md#tsd-file-api).
 
-Each solution uses CentralEGA's user IDs, but will also be extended to
-use Elixir IDs (of which we strip the `@elixir-europe.org` suffix).
+Each solution uses CentralEGA's user IDs, and will also be extended to
+support Elixir IDs, from which the `@elixir-europe.org` suffix is stripped.
 
 The procedure is as follows: the inbox is started without any created
 user. When a user wants to log into the inbox (via `sftp`, `s3` or
-`https`), the inbox service looks up the username in a local queries the
-CentralEGA REST endpoint. Upon return, we store the user credentials in
-the local cache and create the user's home directory. The user now gets
-logged in if the password or public key authentication succeeds.
+`https`), the inbox service looks up the username in a local cache or queries the
+CentralEGA REST endpoint. Once the response returns, the user's credentials are
+stored in the local cache and the user's home directory is created.
+The user is then logged in if the password or public key authentication succeeds.
 
 {% include-markdown "services/sftpinbox.md"
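The inbox login flow described above (check a local cache, fall back to the CentralEGA REST endpoint, then cache the credentials and create a home directory) can be sketched roughly as follows. All names here (`fetch_from_cega`, the cache dict, the home-directory layout) are hypothetical illustrations, not the actual inbox implementation:

```python
# Rough sketch of the inbox login flow described above. All names are
# hypothetical illustrations, not the actual inbox API.
from pathlib import Path
import tempfile

credential_cache: dict[str, str] = {}

def fetch_from_cega(username: str) -> str:
    # Stand-in for the HTTPS call to CentralEGA's user REST endpoint.
    return f"hashed-password-for-{username}"

def login(username: str, inbox_root: Path) -> str:
    # 1. Look the user up in the local cache first.
    creds = credential_cache.get(username)
    if creds is None:
        # 2. Cache miss: query CentralEGA, cache the returned credentials,
        #    and create the user's home directory.
        creds = fetch_from_cega(username)
        credential_cache[username] = creds
        (inbox_root / username).mkdir(parents=True, exist_ok=True)
    # 3. Password or public-key authentication would be checked here.
    return creds

root = Path(tempfile.mkdtemp())
login("jane", root)                  # cache miss: queries CEGA, creates home dir
print((root / "jane").is_dir())      # True
print("jane" in credential_cache)    # True
```

A second login for the same user hits the cache and skips the remote call, which is the point of the design: CentralEGA remains the source of truth while the inbox avoids a round trip on every authentication attempt.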