-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: add architecture documentation #3184
base: main
Are you sure you want to change the base?
Conversation
Starting point: https://github.com/loculus-project/loculus/blob/arc42/architecture_docs/README.md I just read through it all, looks great already! I really like the 4 different view (block view, runtime, deployment, ..). As someone more recently joining, I'm quite curious about things that probably go into section 1:
Maybe also worth adding somewhere: LAPIS is not just a "headless loculus", right? maybe that's just a "historic" thing, but I think LAPIS is pretty independent from loculus in some ways, might be interesting to have that in there somewhere too as a context or driving factor. Just some random assorted thoughts! Not sure if any of this is really in the scope of the architecture docs. But I really like the structure and what's there so far! |
|
||
Some parts of the configuration are redundant and could be simplified. | ||
Also, the Helm chart contains a lot of default values | ||
that are not suitable for general Loculus instances and will result in unexpected behavior if not overwritten. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ena deposition was written as an optional component, however it still needs to submit all data and keep submission state. Therefore we duplicate all records and keep them in the backend db schema and the ena deposition schema - this creates unnecessary database bloat.
Although the two schemas are in the same db they behave as separate dbs with only the backend pod directly querying the public db schema and the ena-deposition and ingest (see below) pod querying the ena-deposition schema.
Potentially the ingest and ena-submission pod should be merged together as they both interact with INSDC and are optional. Additionally, ingest queries the ena deposition schema directly at the moment to ensure it does not reingest sequences that we submitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added something 👍
We provide instructions how to install Loculus, and we host instances ourselves, | ||
but other maintainers can host their own instances as well. | ||
We offer guidance and documentation and are open for feature requests, but we do not provide direct support for custom instances. | ||
We cannot forsee all possible configurations and environments that Loculus might be deployed in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does someone have a suggestion how to define the scope better? What's in the scope, what's not in the scope?
We decided to: | ||
* Use NCBI to download data, because it's [datasets cli](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/download-and-install/) | ||
is most convenient to use. | ||
* Use ENA to upload data, because TODO???????? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anna-parker @corneliusroemer Do you know why this was decided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because they have a submission API that is publicly documented and no other INSDC database has this.
where it duplicates the data from the backend to keep track of which sequences have already been uploaded. | ||
2. The ingest service accesses the same schema as the deposition to check which ingested sequences have been uploaded by Loculus. | ||
|
||
A solution to the first problem would be to adapt the backend such that it can track which sequences have been uploaded to ENA. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was decided against to keep ena deposition optional but yes it might be a better solution
Summary
This adds architecture documentation following the https://docs.arc42.org/ template.
Screenshot
PR Checklist
- [ ] The implemented feature is covered by an appropriate test.