docs: add architecture documentation #3184
Draft: fengelniederhammer wants to merge 20 commits into main from arc42
+689 −0
Changes from 16 commits
Commits
fe95ef1 wip arc42 (fengelniederhammer)
5246c6c wip arc42 (fengelniederhammer)
7887d92 wip arc42 (fengelniederhammer)
0055772 arc42 building blocks (fengelniederhammer)
2c511df arc42 runtime view (fengelniederhammer)
2f8a837 arc42 runtime view (fengelniederhammer)
7185251 arc42 runtime view (fengelniederhammer)
13bd0a4 rename directory (fengelniederhammer)
b8bced7 arc42 deployment view (fengelniederhammer)
105e352 arc42 crosscutting concepts (fengelniederhammer)
157d0c0 arc42 ena deposition runtime view (fengelniederhammer)
9864127 arc42 risks (fengelniederhammer)
f1d8c68 arc42 quality requirements (fengelniederhammer)
e43ed72 arc42 quality requirements (fengelniederhammer)
b5b9d86 arc42 introduction (fengelniederhammer)
79e3358 arc42 solution strategy (fengelniederhammer)
d5a1ea2 arc42 more intro (fengelniederhammer)
47ab988 arc42 (fengelniederhammer)
3f8b8cb arc42 ena deposition risk and technical debt (fengelniederhammer)
4ebf870 arc42 adr (fengelniederhammer)
@@ -0,0 +1,9 @@
# Introduction And Goals

Also see the top-level [README.md](../README.md) for a high-level overview of the project.

Loculus is a software package to power microbial genomic databases.

This is an overview of important use cases:

![Use Cases](plantuml/01_use_cases.svg)
@@ -0,0 +1,18 @@
# Architecture Constraints

Loculus is developed under the following constraints:

### Open Source Software

We decided to develop Loculus under an open source license.
The code is publicly available.

Some reasons why we chose to develop Loculus as open source software:
* to increase transparency of the project,
* to allow others to contribute,
* to let others who use the software see how it works.

### Configurability

Loculus is designed to be highly configurable.
It should be usable for different organisms and different use cases.
@@ -0,0 +1,6 @@
# Context and Scope

This section puts Loculus into context with the outside world and defines the scope of the project.
All external participants are listed in the diagram below:

![Context View](plantuml/03_context_view.svg)
@@ -0,0 +1,11 @@
# Solution Strategy

This section describes important decisions that were made to solve the problem:
* Loculus uses [LAPIS](https://github.com/GenSpectrum/LAPIS) and [SILO](https://github.com/GenSpectrum/LAPIS-SILO) to provide fast access to the sequence data.
* Loculus implements a central HTTP API to store and retrieve data.
  This API encapsulates the data storage in a Postgres database.
  All other services interact with this API.
  The API is mostly agnostic to organism-specific logic.
* A preprocessing pipeline handles the organism-specific logic, such as alignment and translation.
  We provide a Nextclade-based pipeline, but maintainers can plug in their own pipeline.

@@ -0,0 +1,62 @@
# Building Block View

In the following diagrams, the arrows point from the actor to the system component that is used by the actor.
Data flow may be in the opposite direction
(e.g. in the case of a download: the actor requests a download from the website, the website sends the data to the
actor).

## Overview

This diagram provides a high-level overview of the components of Loculus
and how they interact with each other and with external participants.

![Building Block View](plantuml/05_level_1.svg)

* Users can either
  * use the website to browse the data and download sequences
  * or they can use LAPIS directly to query the data (e.g. for automated analysis).
* Submitters can
  * log in via Keycloak
  * submit new sequence data via the website
  * or they can use the API directly to automate their submission process.
* The backend infrastructure stores and processes the data.
* LAPIS / SILO provides the query engine for the sequence data that is stored in the backend infrastructure.
* The backend infrastructure also fetches sequence data from / uploads sequence data to INSDC services.
* The website and the backend infrastructure use Keycloak to verify the identity of users.

## LAPIS / SILO

This diagram shows how Loculus utilizes
[LAPIS](https://github.com/GenSpectrum/LAPIS) and
[SILO](https://github.com/GenSpectrum/LAPIS-SILO).

![LAPIS / SILO](plantuml/05_level_2_lapis.svg)

* LAPIS provides an HTTP API to query the sequence data.
* LAPIS is used by the website, but it can also be used by users directly.
* The SILO API is a query engine that stores the data in memory to provide fast access.
  LAPIS accesses it via HTTP.
* The SILO preprocessing fetches data from the Loculus backend at regular intervals,
  processes it into a format that the SILO API can load, and stores the result in a shared volume (on disk).
* The SILO API will pick up the processed data and load it into memory.
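
The hand-off between the SILO preprocessing and the SILO API via a shared volume can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual implementation: `fetch_released_data`, the JSON output format, and the `run_*.json` file naming are all hypothetical and chosen only for this example.

```python
import json
import os
import tempfile
from pathlib import Path


def fetch_released_data():
    """Hypothetical stand-in for the HTTP call to the Loculus backend."""
    return [{"accession": "LOC_1", "sequence": "ACGT"}]


def preprocess(records):
    """Placeholder transformation into a loadable format (assumed here: JSON)."""
    return {record["accession"]: record["sequence"] for record in records}


def write_atomically(shared_dir: Path, run_id: int, data: dict) -> Path:
    """Write to a temp file first, then rename, so readers never see partial output."""
    target = shared_dir / f"run_{run_id:06d}.json"
    fd, tmp_name = tempfile.mkstemp(dir=shared_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(data, tmp)
    os.replace(tmp_name, target)
    return target


def load_latest(shared_dir: Path):
    """What the query engine side might do: pick up the newest finished run."""
    runs = sorted(shared_dir.glob("run_*.json"))
    if not runs:
        return None
    return json.loads(runs[-1].read_text())


def run_once(shared_dir: Path, run_id: int) -> Path:
    """One preprocessing cycle: fetch, transform, publish to the shared volume."""
    return write_atomically(shared_dir, run_id, preprocess(fetch_released_data()))
```

In such a setup, `run_once` would be called at a regular interval by the preprocessing side, while the query engine side would call `load_latest` to load the newest complete result into memory. Writing to a temporary file and renaming it is one way to make sure a half-written result is never loaded.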

## Loculus Backend Infrastructure

This diagram shows the backend infrastructure of Loculus.

![Backend Infrastructure](plantuml/05_level_2_backend.svg)

The "Loculus Backend" is the central HTTP API.
It encapsulates the data storage.
All data is stored in a Postgres database.
Several other components interact with the backend:
* The website
  * sends data to the backend (e.g. new sequence data, newly created groups)
  * requests data from the backend (e.g. some parts of sequence data, groups)
* Submitters can use the API directly to submit new sequence data.
* The preprocessing pipeline fetches unprocessed data, processes it and resubmits it to the backend.
* The Ingest service fetches data from NCBI and submits it to the backend.
  * Ingest must be explicitly enabled for each organism.
* The ENA deposition service checks whether new data has been uploaded to Loculus and submits it to ENA.
  * ENA deposition must be explicitly enabled for each organism.
* The SILO preprocessing fetches all sequence data from the backend and loads it into SILO.
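
The "fetch unprocessed data, process it, resubmit it" loop of a preprocessing pipeline can be illustrated with a small Python sketch. It runs against an in-memory `FakeBackend` instead of the real HTTP API; the method names `extract_unprocessed` and `submit_processed`, the record fields, and the toy nucleotide check are assumptions made for this example and are not taken from the Loculus code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBackend:
    """In-memory stand-in for the central HTTP API (hypothetical interface)."""

    unprocessed: list = field(default_factory=list)
    processed: dict = field(default_factory=dict)

    def extract_unprocessed(self, batch_size: int) -> list:
        batch = self.unprocessed[:batch_size]
        self.unprocessed = self.unprocessed[batch_size:]
        return batch

    def submit_processed(self, results: list) -> None:
        for result in results:
            self.processed[result["accession"]] = result


def process_entry(entry: dict) -> dict:
    """Organism-specific logic would live here; this toy check only validates symbols."""
    sequence = entry["sequence"].upper()
    errors = [] if set(sequence) <= set("ACGTN") else ["invalid nucleotide symbol"]
    return {"accession": entry["accession"], "sequence": sequence, "errors": errors}


def run_pipeline(backend: FakeBackend, batch_size: int = 2) -> int:
    """Fetch batches until the backend has no unprocessed data left."""
    handled = 0
    while batch := backend.extract_unprocessed(batch_size):
        backend.submit_processed([process_entry(entry) for entry in batch])
        handled += len(batch)
    return handled
```

Because the pipeline only talks to the backend through two calls, a maintainer-provided pipeline could replace `process_entry` with organism-specific logic without the backend needing to know about it.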
@@ -0,0 +1,27 @@
# Runtime view

## Sequence Entry Lifecycle

The following diagram shows a prototypical lifecycle of sequence data in Loculus:
A submitter uploads data on the website, the backend infrastructure processes it,
and finally the data is available for querying via LAPIS.

![Submission Process](plantuml/06_submission_process.svg)

The [backend runtime view](../backend/docs/runtime_view.md) provides a more detailed view of what happens in the backend
during the submission process.

## Submission Details

The next diagram shows in more detail how the user interacts with Loculus when the preprocessing pipeline rejects uploaded data:

![Submission Details](plantuml/06_user_submission_details.svg)

Users are asked to edit erroneous data and resubmit it before they can approve it.
Once the data has been reprocessed successfully, they can approve it, and it will be available for querying via LAPIS.
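
The edit, reprocess and approve flow described above can be modelled as a small state machine. The following Python sketch is only an illustration: the status and event names are invented for this example and are not necessarily the ones used by the Loculus backend.

```python
from enum import Enum, auto


class Status(Enum):
    # Illustrative status names, not taken from the Loculus code base.
    RECEIVED = auto()
    IN_PROCESSING = auto()
    HAS_ERRORS = auto()
    AWAITING_APPROVAL = auto()
    RELEASED = auto()


# Allowed transitions, keyed by (current status, event).
TRANSITIONS = {
    (Status.RECEIVED, "start_processing"): Status.IN_PROCESSING,
    (Status.IN_PROCESSING, "processing_failed"): Status.HAS_ERRORS,
    (Status.IN_PROCESSING, "processing_succeeded"): Status.AWAITING_APPROVAL,
    (Status.HAS_ERRORS, "edit_and_resubmit"): Status.RECEIVED,
    (Status.AWAITING_APPROVAL, "approve"): Status.RELEASED,
}


def apply_event(status: Status, event: str) -> Status:
    """Return the next status, or raise if the event is not allowed right now."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in status {status.name}") from None


def replay(events, status: Status = Status.RECEIVED) -> Status:
    """Apply a sequence of events, starting from a freshly received entry."""
    for event in events:
        status = apply_event(status, event)
    return status
```

In this model, erroneous data cannot be approved directly: the only way out of `HAS_ERRORS` is to edit and resubmit, which sends the entry through processing again.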

## ENA deposition

![ENA deposition](plantuml/06_ena_deposition.svg)

TODO: describe this.
@@ -0,0 +1,37 @@
# Deployment View

All artifacts of Loculus are available as Docker images.
Thus, Loculus can be operated in any environment that supports Docker containers.
Because the configuration processing is extensive, we provide a [Helm](https://helm.sh/) chart that does most of that work,
so we suggest operating Loculus in a Kubernetes cluster.

## High Level Overview

In a production environment, you will most likely want persistent databases.
We recommend hosting the databases external to your Loculus cluster, as shown in the following diagram:

![Deployment Overview](plantuml/07_deployment_overview.svg)

For local development, we use [k3d](https://k3d.io/) to spin up a local cluster.
There, the databases are also hosted within the cluster, because they don't need to be persistent.

## Cluster Internals

The following diagram sketches the internal structure of the deployed cluster.
Only connections to/from outside the cluster are marked with arrows here.
All other connections are omitted for simplicity.

![Cluster Details](plantuml/07_cluster_details.svg)

Inside the cluster, we assume that [Traefik](https://traefik.io/) is running as an ingress controller.
[k3s](https://k3s.io/) and k3d already come with Traefik installed by default.
We configured Traefik to expose the relevant services to the public:
* the website,
* the backend,
* LAPIS,
* Keycloak.

We only need a single instance of the website, the backend and Keycloak (and their respective databases).
The other services (LAPIS, SILO, preprocessing pipeline, ingest and ENA deposition) have to be configured
and deployed per organism that the Loculus instance supports.
We use Helm to generate those multiple service instances.
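
To make the "once per instance" versus "once per organism" split concrete, here is a small Python sketch of the expansion that the Helm templates perform conceptually. It is not the Helm chart itself; the `<service>-<organism>` naming scheme, the flag names, and the organism keys are assumptions chosen for this example.

```python
SHARED_SERVICES = ["website", "backend", "keycloak"]
PER_ORGANISM_SERVICES = ["lapis", "silo", "preprocessing", "ingest", "ena-deposition"]
OPTIONAL_SERVICES = {"ingest", "ena-deposition"}  # only deployed when enabled


def plan_deployments(organisms: dict) -> list:
    """Return deployment names: shared services once, the rest once per organism.

    `organisms` maps an organism key to a dict of feature flags, e.g.
    {"organism_a": {"ingest": True}}.
    """
    names = list(SHARED_SERVICES)
    for organism, flags in sorted(organisms.items()):
        for service in PER_ORGANISM_SERVICES:
            if service in OPTIONAL_SERVICES and not flags.get(service, False):
                continue
            names.append(f"{service}-{organism}")
    return names
```

For two organisms where only the first enables ingest, this yields the three shared services plus four deployments for the first organism and three for the second.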
@@ -0,0 +1,18 @@
# Crosscutting Concepts

## Logging

Log messages are written directly to stdout so that they can be collected by the container orchestrator.

## Request Tracing

Where possible, APIs should implement request ids:
* The API should accept a request id in the request header.
* The API must include the request id in the response header. If no request id is provided, the API should generate one.
* The API must include the request id in all log messages.

This allows requests to be traced through the system.
It is also helpful if services log the request id that they receive from a service that they consume.

In Spring Boot, implementing request ids is quite straightforward with `@RequestScope`.
Also see [the implementation in the backend](https://github.com/loculus-project/loculus/blob/cbbbc9746604679df225059af6683ebcb568e038/backend/src/main/kotlin/org/loculus/backend/log/RequestId.kt).
@@ -0,0 +1,5 @@
# Architecture Decisions

ADRs...

Check Nuclino
@@ -0,0 +1,26 @@
# Quality Requirements

Following the [ISO-25010](https://iso25000.com/index.php/en/iso-25000-standards/iso-25010) standard, we define the following quality requirements for our system:

## Performance Efficiency

* Time behavior: When a submitter uploads a sequence, the sequence should be available for querying within 10 minutes.
* Time behavior: When a user queries a sequence, the query should return within 1 second.

## Interaction Capability

* Operability: A maintainer should be able to set up a new Loculus instance by reading the documentation.

## Security

* Integrity: Only submitters belonging to the respective group should be able to make changes to sequence data.

## Transparency

We also identified two quality requirements that don't fit into the ISO-25010 standard:

* The Loculus project is transparent. Important decisions are publicly documented.
  Users can comprehend how Loculus works and how the data is processed.
* It is traceable who submitted which data and when.
  This is important so that submitters can be credited appropriately for their work
  (e.g. by citing their data in a publication).
@@ -0,0 +1,16 @@
# Risks and Technical Debt

## Configuration Processing

We use a `values.yaml` file as the main input source for the Helm chart that configures a Loculus instance.

We leveraged the powerful templating capabilities of Helm to generate the configuration files for the individual artifacts.
This works well, because we can distribute the mostly redundant configuration values efficiently.

However, this became quite complex and hard to maintain over time.
It is untested and hard to debug if something goes wrong.
It is also (as of now) mostly undocumented.

Some parts of the configuration are redundant and could be simplified.
Also, the Helm chart contains a lot of default values
that are not suitable for general Loculus instances and will result in unexpected behavior if not overridden.
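
One way to make the risk of silently inherited defaults visible is to compare the chart defaults with the values a maintainer actually supplies. The following Python sketch shows the idea on plain dictionaries; it is not part of the Loculus tooling, and the keys used in the usage example are hypothetical.

```python
def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on `defaults` (similar in spirit to how Helm merges values)."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def untouched_defaults(defaults: dict, overrides: dict, prefix: str = "") -> list:
    """List dotted paths of default values that the user did not override."""
    paths = []
    for key, value in defaults.items():
        path = f"{prefix}{key}"
        if key not in overrides:
            paths.append(path)
        elif isinstance(value, dict) and isinstance(overrides[key], dict):
            paths.extend(untouched_defaults(value, overrides[key], prefix=f"{path}."))
    return paths


# Usage with made-up keys: the instance name and database port are inherited unchanged.
example_defaults = {"name": "Example", "db": {"host": "localhost", "port": 5432}}
example_overrides = {"db": {"host": "db.internal"}}
inherited = untouched_defaults(example_defaults, example_overrides)
```

A report like `inherited` could be reviewed when setting up a new instance to decide, per value, whether the default is really intended.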
@@ -0,0 +1,5 @@
# Glossary

See glossary on the documentation page:
* https://loculus.org/introduction/glossary/
* [source file](../docs/src/content/docs/introduction/glossary.md)
@@ -0,0 +1,4 @@
# Architecture Documentation

This folder documents the architecture of Loculus.
It is based on the template provided by https://arc42.org/.
@@ -0,0 +1 @@
plantuml.jar
@@ -0,0 +1,31 @@
@startuml

title Loculus Use Cases
left to right direction

actor User as user
actor Submitter as submitter
actor Maintainer as maintainer

rectangle Loculus {
    usecase "Upload data" as upload
    usecase "Revise data" as revise
    usecase "Browse data" as browse
    usecase "Download data" as download

    usecase "Configure new organism" as configure
    usecase "Host own instance" as host
    usecase "Sync data with INSDC" as insdc
}

submitter --> upload
submitter --> revise

user --> browse
user --> download

maintainer --> configure
maintainer --> host
maintainer --> insdc

@enduml
ENA deposition was written as an optional component; however, it still needs to submit all data and keep submission state. Therefore, we duplicate all records and keep them in both the backend db schema and the ENA deposition schema, which creates unnecessary database bloat.
Although the two schemas are in the same db, they behave as separate dbs: only the backend pod directly queries the public db schema, while the ena-deposition and ingest (see below) pods query the ena-deposition schema.
Potentially, the ingest and ena-submission pods should be merged, as they both interact with INSDC and are optional. Additionally, ingest currently queries the ENA deposition schema directly to ensure that it does not reingest sequences that we submitted.
I added something 👍