docs: add architecture documentation #3184

fengelniederhammer · 2024-11-06T09:19:50Z

Summary

This adds architecture documentation following the https://docs.arc42.org/ template.

Screenshot

PR Checklist

All necessary documentation has been adapted.
~~- [ ] The implemented feature is covered by an appropriate test.~~

architecture_docs/plantuml/06_ena_deposition.puml

fhennig · 2024-11-06T10:35:37Z

Starting point: https://github.com/loculus-project/loculus/blob/arc42/architecture_docs/README.md

I just read through it all, and it looks great already! I really like the four different views (block view, runtime, deployment, ...).

As someone more recently joining, I'm quite curious about things that probably go into section 1:

- What is the intended user/submitter audience?
  - Small lab groups with limited deployment skills?
- What are some use cases that the software should support?
  - E.g. a Pathoplexus-like use case, using it just for analysis of ENA data, using it just in your lab, etc.
  - I think these different use cases also influence the architecture quite a bit, given the quite modular setup we have (e.g. you don't need the ENA submission bit, or you don't need auth, or you can even use LAPIS on its own).
- What is maybe not in the feature scope?
- Solution strategy: could that be "build a web app"? I'm not sure whether that is something that should go in there.

Maybe also worth adding somewhere: LAPIS is not just a "headless Loculus", right? Maybe that's just a "historic" thing, but I think LAPIS is fairly independent of Loculus in some ways. It might be interesting to have that in there somewhere too, as context or as a driving factor.

Just some random assorted thoughts! Not sure if any of this is really in the scope of the architecture docs.

But I really like the structure and what's there so far!

anna-parker · 2024-11-06T10:54:42Z

architecture_docs/11_risks_and_technical_debt.md

+
+Some parts of the configuration are redundant and could be simplified.
+Also, the Helm chart contains a lot of default values 
+that are not suitable for general Loculus instances and will result in unexpected behavior if not overwritten.


ENA deposition was written as an optional component, but it still needs to submit all data and keep the submission state. We therefore duplicate all records, keeping them in both the backend DB schema and the ENA deposition schema, which creates unnecessary database bloat.

Although the two schemas are in the same DB, they behave as separate DBs: only the backend pod queries the public DB schema directly, while the ena-deposition and ingest (see below) pods query the ena-deposition schema.

Potentially, the ingest and ena-submission pods should be merged, as both interact with INSDC and both are optional. Additionally, ingest currently queries the ENA deposition schema directly to ensure that it does not re-ingest sequences that we submitted.
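To make the duplication concrete, here is a minimal sketch, not the actual Loculus code. It assumes hypothetical table names (`sequence_entries`, `submissions`), hypothetical accessions, and uses SQLite's attached databases as a stand-in for the two schemas. It only illustrates that the same record ends up stored twice and that ingest consults the deposition side to skip sequences we submitted ourselves.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two separate namespaces."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no schemas, so an attached database stands in for the
    # ena-deposition schema (assumption made for this sketch only).
    conn.execute("ATTACH DATABASE ':memory:' AS ena_deposition")
    conn.execute(
        "CREATE TABLE main.sequence_entries (accession TEXT PRIMARY KEY, data TEXT)"
    )
    conn.execute(
        "CREATE TABLE ena_deposition.submissions ("
        "accession TEXT PRIMARY KEY, data TEXT, insdc_accession TEXT)"
    )
    return conn


def backend_store(conn: sqlite3.Connection, accession: str, data: str) -> None:
    """The backend keeps its own copy of the record."""
    conn.execute(
        "INSERT INTO main.sequence_entries VALUES (?, ?)", (accession, data)
    )


def record_deposition(
    conn: sqlite3.Connection, accession: str, data: str, insdc_accession: str
) -> None:
    """The deposition service stores a second copy plus its submission state."""
    conn.execute(
        "INSERT INTO ena_deposition.submissions VALUES (?, ?, ?)",
        (accession, data, insdc_accession),
    )


def filter_new_for_ingest(
    conn: sqlite3.Connection, insdc_accessions: list[str]
) -> list[str]:
    """Ingest reads the deposition side to skip records we submitted ourselves."""
    rows = conn.execute(
        "SELECT insdc_accession FROM ena_deposition.submissions"
    ).fetchall()
    already_submitted = {row[0] for row in rows}
    return [a for a in insdc_accessions if a not in already_submitted]


def duplicated_count(conn: sqlite3.Connection) -> int:
    """Count records that are stored on both sides."""
    return conn.execute(
        "SELECT COUNT(*) FROM main.sequence_entries AS b "
        "JOIN ena_deposition.submissions AS d ON b.accession = d.accession"
    ).fetchone()[0]
```

In this sketch, every deposited record shows up in `duplicated_count`, which is the bloat described above, and `filter_new_for_ingest` is the cross-component read that couples ingest to the deposition schema.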

I added something 👍

fengelniederhammer · 2024-11-06T11:11:19Z

architecture_docs/03_context_and_scope.md

+We provide instructions how to install Loculus, and we host instances ourselves,
+but other maintainers can host their own instances as well.
+We offer guidance and documentation and are open for feature requests, but we do not provide direct support for custom instances.
+We cannot forsee all possible configurations and environments that Loculus might be deployed in.


Does anyone have a suggestion for how to define the scope better? What is in scope, and what is not?

fengelniederhammer · 2024-11-06T12:20:41Z

architecture_docs/09_architecture_decisions.md

+We decided to:
+* Use NCBI to download data, because it's [datasets cli](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/)
+  is most convenient to use.
+* Use ENA to upload data, because TODO????????


@anna-parker @corneliusroemer Do you know why this was decided?

Because they have a submission API that is publicly documented and no other INSDC database has this.

anna-parker · 2024-11-06T12:22:34Z

architecture_docs/11_risks_and_technical_debt.md

+  where it duplicates the data from the backend to keep track of which sequences have already been uploaded. 
+2.  The ingest service accesses the same schema as the deposition to check which ingested sequences have been uploaded by Loculus.
+
+A solution to the first problem would be to adapt the backend such that it can track which sequences have been uploaded to ENA.


This was decided against in order to keep ENA deposition optional, but yes, it might be a better solution.

fengelniederhammer added 11 commits November 6, 2024 10:18

wip arc42

fe95ef1

wip arc42

5246c6c

wip arc42

7887d92

arc42 building blocks

0055772

arc42 runtime view

2c511df

arc42 runtime view

2f8a837

arc42 runtime view

7185251

rename directory

13bd0a4

arc42 deployment view

b8bced7

arc42 crosscutting concepts

105e352

arc42 ena deposition runtime view

157d0c0

fengelniederhammer commented Nov 6, 2024

fengelniederhammer added 4 commits November 6, 2024 11:13

arc42 risks

9864127

arc42 quality requirements

f1d8c68

arc42 quality requirements

e43ed72

arc42 introduction

b5b9d86

arc42 solution strategy

79e3358

anna-parker reviewed Nov 6, 2024

fengelniederhammer added 2 commits November 6, 2024 12:00

arc42 more intro

d5a1ea2

arc42

47ab988

fengelniederhammer commented Nov 6, 2024

fengelniederhammer added 2 commits November 6, 2024 13:01

arc42 ena deposition risk and technical debt

3f8b8cb

arc42 adr

4ebf870

fengelniederhammer commented Nov 6, 2024

anna-parker reviewed Nov 6, 2024
