consistent language: removed "we", "you" and past tense (#103)
aaperis authored Dec 21, 2023
2 parents 7da0ca4 + b6cc30c commit a2a119f
Showing 14 changed files with 51 additions and 67 deletions.
14 changes: 6 additions & 8 deletions docs/connection.md
@@ -10,8 +10,7 @@ The RabbitMQ message brokers of each SDA instance are the **only**
components with the necessary credentials to connect to the `CentralEGA`
message broker.

-We call `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes know as `sda-mq`),
-the RabbitMQ message brokers of, respectively, `CentralEGA` and `SDA`/`FederatedEGA`.
+The message brokers for `CentralEGA` and `SDA/FederatedEGA` are denoted as `CEGAMQ` and `LocalMQ` (Local Message Broker, sometimes referred to as `sda-mq`), respectively.

Local Message Broker
--------------------
@@ -26,7 +25,7 @@ Local Message Broker
The following environment variables can be used to configure the broker:

> NOTE:
-> We use [RabbitMQ](https://hub.docker.com/_/rabbitmq) >= `3.8.16` including
+> [RabbitMQ](https://hub.docker.com/_/rabbitmq) >= `3.8.16` is used, including
> the management plugins.
| Variable | Description |
@@ -107,8 +106,7 @@ Service will wait for messages to arrive.
`CEGAMQ` receives notifications from `LocalMQ` using a *shovel*.
Everything that is published to its `to_cega` exchange gets forwarded to
CentralEGA (using the routing key based on the name
-`files.<internal_queue_name>`). We propagate the different status of the
-workflow to CentralEGA, using the following routing keys:
+`files.<internal_queue_name>`). The various workflow statuses are transmitted to CentralEGA via the following routing keys:

| Name | Purpose |
|-----------------|:-------------------------------------------|
@@ -117,8 +115,8 @@ workflow to CentralEGA, using the following routing keys:
| files.inbox | For inbox file operations |
| files.verified | For files ready to request accessionID |

-Note that we do not need at the moment a queue to store the completed
-message, nor the errors, as we forward them to `CentralEGA`.
+Note that there is currently no need for a queue to store completed
+messages or errors, as they are forwarded to `CentralEGA`.

![RabbitMQ setup](./static/CEGA-LEGA.png)
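
As a rough sketch of the mechanism described above (the broker URIs and the
destination exchange name below are placeholders, not the actual federation
configuration), a dynamic shovel definition could look like:

```javascript
{
  // Parameter value for e.g.: rabbitmqctl set_parameter shovel cega_forward '<this JSON>'
  "src-protocol": "amqp091",
  "src-uri": "amqp://localmq.example.org",   // LocalMQ (placeholder)
  "src-exchange": "to_cega",
  "src-exchange-key": "#",                   // forward everything published to to_cega
  "dest-protocol": "amqp091",
  "dest-uri": "amqps://cegamq.example.org",  // CentralEGA broker (placeholder)
  "dest-exchange": "localega"                // assumed destination exchange name
}
```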

@@ -272,7 +270,7 @@ when the `accession ID` has been set (in case of Federated EGA this also means b
}
```

-The message sent from the `finalize` service to the `completed` queue.
+The following message is sent from the `finalize` service to the `completed` queue:

```javascript
{
  // ...
```
7 changes: 3 additions & 4 deletions docs/dataout.md
@@ -2,8 +2,7 @@ Data Retrieval API
==================

> NOTE:
-> We maintain two Data Retrieval API solutions, for which REST APIs are the
-> same.
+> Two Data Retrieval API solutions are maintained, with identical REST APIs.
SDA-DOA
-------
@@ -88,8 +87,8 @@ Data Retrieval API can be run with connection to an AAI or without. If connectio
set.

> NOTE:
-> By default we use LifeScience AAI as JWT for authentication
-> `OPENID_CONFIGURATION_URL` is set to:
+> The default JWT for authentication is LifeScience AAI,
+> and the `OPENID_CONFIGURATION_URL` is set to:
> <https://proxy.aai.lifescience-ri.eu/.well-known/openid-configuration>
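
As an illustration only, pointing the service at another provider would amount
to setting the variable in the service environment (the issuer URL here is a
hypothetical placeholder):

```
OPENID_CONFIGURATION_URL=https://aai.example.org/.well-known/openid-configuration
```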
If connected to an AAI provider the current implementation is based on
18 changes: 6 additions & 12 deletions docs/db.md
@@ -1,9 +1,7 @@
Database Setup
==============

-We use a Postgres database (version 15+ ) to store intermediate data, in
-order to track progress in file ingestion. The `lega` database schema is
-documented below.
+A Postgres database (version 15+) is used to store intermediate data, in
+order to track progress in file ingestion. The `lega` database schema is
+documented below.

> NOTE:
> Source code repository for DB component is available at:
@@ -13,15 +11,15 @@ The database container will initialize and create the necessary database
structure and functions if started with an empty area. Procedures for *backing up the database* are important, but are considered out of scope for
the secure data archive project.

-Look at [the SQL definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
-if you are also interested in the database triggers.
+Refer to [the SQL definitions](https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql/initdb.d)
+for details on the database triggers.

Configuration
-------------

Security is hardened (see the example rule after this list):

-- We do not use 'trust' even for local connections
+- The 'trust' authentication method is not used, even for local connections
- Requiring password authentication for all
- Enforcing TLS communication
- Enforcing client-certificate verification
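
As a sketch only (not the shipped configuration), a `pg_hba.conf` rule
matching the requirements above might read:

```
# TLS required, password (scram) authentication, client certificate verified
hostssl  all  all  0.0.0.0/0  scram-sha-256  clientcert=verify-full
```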
@@ -195,13 +193,9 @@ versions/migrations or none.
> Any changes done to database schema initialization should be reflected
> in a schema migration script.
-Whenever you need to change the database schema, we recommended changing
-both the database initialization scripts (and bumping the bootstrapped
-schema version) as well as creating the corresponding migration script
-to perform the changes on a database in use.
+For any required changes to the database schema, it is advisable to update
+both the database initialization scripts (incrementing the bootstrapped
+schema version) and to create the corresponding migration script that
+performs the changes on a database already in use.

Migration scripts should be placed in `/migratedb.d/` in the *sensitive-data-archive* repo
-(<https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql>). We recommend naming them
-corresponding to the schema version they provide migration to. There is
+(<https://github.com/neicnordic/sensitive-data-archive/tree/main/postgresql>). Naming these scripts after the
+schema version they migrate to is recommended. There is
an "empty" migration script (`01.sql`) that can be used as a
template.
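
As a purely illustrative sketch of such a migration script (the version
numbers, table and column names here are assumptions, not the repository's
actual layout):

```sql
-- Hypothetical migration from schema version 3 to 4
DO $$
BEGIN
  IF (SELECT MAX(version) FROM lega.dbschema_version) = 3 THEN
    -- the actual schema changes would go here, e.g.:
    -- ALTER TABLE lega.files ADD COLUMN note TEXT;
    INSERT INTO lega.dbschema_version (version, applied, description)
    VALUES (4, now(), 'Describe the change here');
  END IF;
END
$$;
```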
6 changes: 2 additions & 4 deletions docs/deploy.md
@@ -1,10 +1,8 @@
Deployments and Local Bootstrap
===============================

-We use different deployment strategies for environments like Docker
-Swarm, Kubernetes or a local-machine. The [local development and testing](guides/local-dev-and-testing.md) guide is
-recommended for local-machine, while
-[Kubernetes](https://kubernetes.io/) and [Docker Swarm](https://docs.docker.com/engine/swarm/) are recommended for production.
+There are different deployment strategies for environments such as Docker Swarm, Kubernetes, or a local machine. The [local development and testing](guides/local-dev-and-testing.md) guide is
+recommended for a local machine, while [Kubernetes](https://kubernetes.io/) and [Docker Swarm](https://docs.docker.com/engine/swarm/) are recommended for production.

The production deployment repositories are:

3 changes: 2 additions & 1 deletion docs/dictionary/wordlist.txt
@@ -363,4 +363,5 @@ Mina's
SPRINGFRAMEWORK
env
programmatically
-assignees
+incrementing
+assignees
3 changes: 1 addition & 2 deletions docs/encryption.md
@@ -46,8 +46,7 @@ The advantages of the format are, among others:
segments surrounding the portion. The file itself is not decrypted
and re-encrypted.

-In order to encrypt files using this standard we recommend the following
-tools:
+For encrypting files using this standard, the following tools are recommended
+(a usage sketch follows the list):

- <https://github.com/samtools/htslib-crypt4gh/> - samtools extension
- <https://github.com/EGA-archive/crypt4gh> - python library
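
As an example of the second tool, encrypting a file with the python CLI looks
roughly like this (the key and file names are placeholders):

```
crypt4gh encrypt --recipient_pk recipient.pub < data.bam > data.bam.c4gh
```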
4 changes: 2 additions & 2 deletions docs/guides/deploy-k8s.md
@@ -67,7 +67,7 @@ The table below reflects the minimum required resources to run the services in t
| sftpinbox | 100m | 128Mi | - |
| doa | 100m | 128Mi | - |

-Here we provide minimal lists of variables that need to be configured, in addition to the defaults, in the respective `values.yml` file of each of the Helm charts for:
+Below are minimal lists of variables that need to be configured, in addition to the defaults, in the respective `values.yml` file of each of the Helm charts for:

- [SDA services](#sda-services-chart)
- [RabbitMQ](#rabbitmq-chart)
@@ -253,7 +253,7 @@ Certain services, such as `inbox`, `download`, and `auth`, are configured to exp
- download
- auth

-In addition, Kubernetes allows you to define [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) to control the communication between Pods. Network Policies are crucial for enforcing security measures within your cluster. They enable you to specify which Pods can communicate with each other and define rules for ingress and egress traffic.
+In addition, Kubernetes allows defining [Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/) to control the communication between Pods. Network Policies are crucial for enforcing security measures within the cluster. They specify which Pods can communicate with each other and define rules for ingress and egress traffic.
Here are two recommended basic examples of a Network Policy, for namespace isolation and for allowing traffic to the inbox ingress; similar policies need to be in place for the `download` and `auth` services:

```yaml
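# NOTE: the concrete policies are truncated in this view; what follows is an
# illustrative namespace-isolation sketch only (the namespace name is a
# placeholder, not the actual deployment's namespace).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: sda
spec:
  podSelector: {}           # apply to every Pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # only allow traffic from Pods in the same namespace
```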
2 changes: 1 addition & 1 deletion docs/guides/deploy-swarm.md
@@ -7,4 +7,4 @@ This guide explains how to deploy the Sensitive Data Archive (SDA) in docker swa
> If you have feedback to give on the content you would like to see, please contact us on
> [github](https://github.com/neicnordic/neic-sda)!
-Before this guide is written - you can already get some useful clues on how to deploy SDA in a docker swarm setup through examples of docker swarm compatible docker compose files, as well as some useful development tools, in the [Docker Swarm deployment repo](https://github.com/neicnordic/LocalEGA-deploy-swarm/).
+Until this guide is written, useful clues on how to deploy SDA in a docker swarm setup can already be found in the examples of docker-swarm-compatible docker compose files, as well as some useful development tools, in the [Docker Swarm deployment repo](https://github.com/neicnordic/LocalEGA-deploy-swarm/).
16 changes: 8 additions & 8 deletions docs/guides/local-dev-and-testing.md
@@ -2,37 +2,37 @@

## Guide summary

-This guide provides a brief introduction on how to locally install and run the Sensitive Data Archive (SDA) components on your development system, and run all the tests locally.
+This guide provides a brief introduction on how to locally install and run the Sensitive Data Archive (SDA) components on a development system, and run all the tests locally.

-This guide should get you started by setting up your environment to build and deploy the services of SDA, run all tests in the code base, and run several development related "shortcut" actions from the commandline using custom development helper scripts.
+This guide covers setting up an environment to build and deploy the services of SDA, run all tests in the code base, and run several development-related "shortcut" actions from the commandline using custom development helper scripts.

-In addition the guide includes a few tips and tricks on how to explore the services running locally after you've deployed them, to learn more about how they operate together and find more documentation.
+In addition, the guide includes a few tips and tricks on how to explore the services running locally after they have been deployed, to learn more about how they operate together and find more documentation.


## Local security / zone considerations

Normally a SDA deployment will consist of at least two different security zones, separating internet facing services from the services accessing the encrypted file archive with more sensitive config information. In this guide and the development deployment tools, these security zones are not implemented.

-Hence your testing, staging and production deployments will most likely differ in deployment strategy and end-2-end testing approaches compared to what is documented in this guide.
+Hence, actual testing, staging and production deployments will most likely differ in deployment strategy and end-to-end testing approaches compared to what is documented in this guide.


## SDA local development and testing helpers

-The SDA code itself contains some very useful development helpers, that let's you very easily:
+The SDA code itself contains some very useful development helpers that make it easy to:

-- Check that you have all required libraries, compilers, docker tools, and if not tries to install them
+- Check that all required libraries, compilers and docker tools are present, and attempt to install any that are missing
- Build the services locally
- Lint the code
- Run the tests for each service
- Run integration tests for services together by deploying the service containers
- ...and some more useful actions

-Once you have cloned the [neicnordic/sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/) to your development system, you can start to
+Once the [neicnordic/sensitive-data-archive](https://github.com/neicnordic/sensitive-data-archive/) repository has been cloned to a development system, one can start to
explore these tools as explained in [this page](sda-dev-test-doc.md).

## What's next, once everything is up and running?

-Once you've been able to automagically deploy and run all the tests and integration tests, thinking *"that was too easy, what really happened?"*, you could go in multiple direction from here to learn more:
+After automagically deploying and running all the tests and integration tests, and thinking *"that was too easy, what really happened?"*, there are multiple directions to go from here to learn more:

- List all the services running with ```$ docker ps```
- Peek into logs of services with ```$ docker logs container-id-or-name```
4 changes: 1 addition & 3 deletions docs/guides/tls.md
@@ -5,6 +5,4 @@
> If you have feedback to give on the content, please contact us on
> [github](https://github.com/neicnordic/neic-sda)!
-If you have spent time on the internet you have used tls encryption,
-which is the encryption used for https requests.
-Setting up TLS for a system can be a little bit tricky though.
+TLS is the encryption protocol used to secure HTTPS requests, and anyone who
+has spent time on the internet has used it. Setting up TLS for a system can
+be a little bit tricky though.
4 changes: 2 additions & 2 deletions docs/guides/troubleshooting.md
@@ -1,6 +1,6 @@
# Troubleshooting

-In this guide we aim to give some general tips on how to troubleshoot and restore services to working order.
+This guide aims to provide general tips on troubleshooting and restoring services to a functional state.

## After deployment checklist

@@ -58,4 +58,4 @@ Finally, when all files have been ingested, the submission portal should allow f
- the dataset has the status `registered` in the `sda.dataset_event_log`
- the dataset gets the status `released` in the `sda.dataset_event_log`; this might take a while depending on what date was chosen in the submitter portal (see the query sketch below).

-Once all the submission steps have been verified, we can assume that the pipeline part of the deployment is working properly.
+Once all the submission steps have been verified, the pipeline part of the deployment can be assumed to be working properly.
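
As a sketch, these statuses can also be inspected directly in the database
(the column names and dataset ID here are assumptions):

```sql
-- Hypothetical check of the event history for one dataset
SELECT event, event_date
  FROM sda.dataset_event_log
 WHERE dataset_id = 'EGAD00000000001'
 ORDER BY event_date;
```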
12 changes: 6 additions & 6 deletions docs/index.md
@@ -7,17 +7,17 @@ The NeIC Sensitive Data Archive (SDA) is an encrypted data archive, implemented
The modular architecture of SDA supports both stand alone deployment of an archive, and the use case of deploying a Federated node in the [Federated European Genome-phenome Archive network (FEGA)](https://ega-archive.org/about/projects-and-funders/federated-ega/), serving discoverable sensitive datasets in the main [EGA web portal](https://ega-archive.org).

> NOTE:
-> Throughout this documentation, we can refer to [Central
-> EGA](https://ega-archive.org/) as `CEGA`, or `CentralEGA`, and *any*
-> `FederatedEGA` instance also know as: `FEGA`, `LEGA` or
-> `LocalEGA`. In the context of NeIC we will refer to the Federated EGA as the
-> `Sensitive Data Archive` or `SDA`.
+> Throughout this documentation, [Central EGA](https://ega-archive.org/) may be
+> referred to as `CEGA` or `CentralEGA`, and *any* `FederatedEGA` instance may
+> also be known as `FEGA`, `LEGA`, or `LocalEGA`. Within the context of NeIC,
+> the Federated EGA is referred to as the `Sensitive Data Archive` or `SDA`.

Organisation of the NeIC SDA Operations Handbook
------------------------------------------------

-This operations handbook is organized in four main parts, that each has it's own main section in the left navigation menu. Here we provide a condensed summary, follow the links below or use the menu navigation to each section's own detailed introduction page:
+This operations handbook is organized in four main parts, each with its own main section in the left navigation menu. Here is a condensed summary; follow the links below or use the menu navigation to reach each section's detailed introduction page:

1. **Structure**: Provides overview material for how the services can be deployed in different constellations and highlights communication paths.

2 changes: 1 addition & 1 deletion docs/structure.md
@@ -91,7 +91,7 @@ The NeIC SDA is targeting both types of setup but also to allow for the possibil

### Container deployment options

-The components of SDA are all container based using Docker standards for building container images. They can be deployed in a range of different ways depending on your local needs. The SDA developers are currently aware of the following alternatives in use:
+The components of SDA are all container based, using Docker standards for building container images. Deployment options vary depending on local requirements. The SDA developers are currently aware of the following alternatives in use:

1. Kubernetes (OpenShift)
2. Docker Swarm
23 changes: 10 additions & 13 deletions docs/submission.md
@@ -130,32 +130,29 @@ completes and the checksum is valid, a message of completion is sent to
> **Important**
> If a file disappears or is overwritten in the inbox before ingestion is completed, ingestion may not be possible.
-If any of the above steps generates an error, we exit the workflow and
-log the error. In case the error is related to a misuse from the user,
-such as submitting the wrong checksum or tampering with the encrypted
-file, the error is forwarded to `CentralEGA` in order to be displayed in
-the Submission Interface.
+Should any of the above steps result in an error, the workflow is terminated
+and the error is logged. If the error is attributed to user misuse, such as
+providing an incorrect checksum or tampering with the encrypted file, it is
+reported to `CentralEGA` for display in the Submission Interface.


Submission Inbox
----------------

-`CentralEGA` contains a database of users, with IDs and passwords. We
-have developed several solutions allowing user authentication against
-CentralEGA user database:
+`CentralEGA` contains a database of users, with IDs and passwords. Multiple
+solutions have been developed to facilitate user authentication against the
+CentralEGA user database:

- [Apache Mina Inbox](submission.md#sftp-inbox);
- [S3 Proxy Inbox](submission.md#s3-proxy-inbox);
- [TSD File API](submission.md#tsd-file-api).

-Each solution uses CentralEGA's user IDs, but will also be extended to
-use Elixir IDs (of which we strip the `@elixir-europe.org` suffix).
+Each solution uses CentralEGA's user IDs and is planned to be extended to
+support Elixir IDs, from which the `@elixir-europe.org` suffix is removed.

The procedure is as follows: the inbox is started without any created
user. When a user wants to log into the inbox (via `sftp`, `s3` or
`https`), the inbox service looks up the username in a local cache or queries the
-CentralEGA REST endpoint. Upon return, we store the user credentials in
-the local cache and create the user's home directory. The user now gets
-logged in if the password or public key authentication succeeds.
+CentralEGA REST endpoint. Upon return, the user's credentials are stored in
+the local cache and a home directory is created for the user. The user is
+then logged in if the password or public key authentication succeeds.
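
As a rough sketch of that flow (not the actual inbox implementation; the
endpoint path, response shape and directory layout are assumptions):

```javascript
const fs = require("node:fs/promises");

const cache = new Map(); // local credential cache

async function lookupUser(username) {
  if (cache.has(username)) return cache.get(username);
  // Query the CentralEGA REST endpoint for this user (path is hypothetical).
  const res = await fetch(`${process.env.CEGA_ENDPOINT}/username/${username}`);
  if (!res.ok) throw new Error(`lookup failed for ${username}`);
  const user = await res.json();
  cache.set(username, user);                                 // cache credentials
  await fs.mkdir(`/inbox/${username}`, { recursive: true }); // home directory
  return user; // password or public-key verification happens after this
}
```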

{%
include-markdown "services/sftpinbox.md"
%}
