Refactor/Rewrite motherlode #5

Open
motey opened this issue Apr 22, 2020 · 2 comments
Labels
Status: Suggested (This issue is a suggestion for doing something new or different in CovidGraph)
Tag: Documentation (About CovidGraph Documentation)
Tag: Help Wanted (Extra attention is needed)

Comments

@motey
Member

motey commented Apr 22, 2020

Current Status

Motherlode is currently a proof-of-concept script. It works, but its structure is not suited to large-scale extension in the future.

Desired Status

Motherlode should be broken down into separate classes, be easy to extend, and offer a more pleasant onboarding for new devs.
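To make the "separate classes" idea concrete for the discussion, here is one possible shape, offered only as a sketch. Every name in it (`DataLoader`, `LoaderRegistry`, `depends_on`, `ExampleLoader`) is hypothetical and not taken from the current motherlode code.

```python
from abc import ABC, abstractmethod


class DataLoader(ABC):
    """Hypothetical base class: one subclass per data source."""

    name = ""          # unique identifier of the loader
    depends_on = ()    # names of loaders that must have run before this one

    @abstractmethod
    def run(self):
        """Load the data source into the graph."""


class LoaderRegistry:
    """Hypothetical registry that new devs extend by registering a subclass."""

    def __init__(self):
        self.loaders = {}

    def register(self, loader):
        if loader.name in self.loaders:
            raise ValueError(f"duplicate loader name: {loader.name}")
        self.loaders[loader.name] = loader

    def missing_dependencies(self):
        """Map each loader name to the dependencies that are not registered."""
        missing = {}
        for name, loader in self.loaders.items():
            unknown = [d for d in loader.depends_on if d not in self.loaders]
            if unknown:
                missing[name] = unknown
        return missing


class ExampleLoader(DataLoader):
    """Placeholder loader used only to show how a new source would plug in."""

    name = "example"
    depends_on = ("base",)

    def run(self):
        print("loading example data")
```

With this layout, adding a data source would mean writing one subclass and registering it, and `missing_dependencies()` gives an early check before anything is loaded.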

Tasks

[ ] Discuss possible structure/technologies with focus on future features
[ ] Declare/Define and document structure
[ ] Implement changes

Issues to take into account:
#8
#7

Hint: no holds barred. A change of platform/language is possible if it serves the goal. Discussion is open.

@motey added the Tag: Documentation, Status: Suggested, and Tag: Help Wanted labels (and added, then removed, Type: Question) on Apr 22, 2020
@frankschmitt
Collaborator

Some ideas off the top of my head:

  • use Neo4j to keep track of dependencies / order the source systems (currently, motherlode determines this itself, but since we require a running Neo4j instance anyway, we can just as well use its graph algorithms for tracking this info)
  • parallelize loading. Currently, all loaders run strictly sequentially; loaders that don't depend on each other can be run in parallel (if the Neo4j instance and Docker host can handle the load)
  • evaluate existing FOSS tools (e.g. Apache Beam, Apache NiFi, Apache Kafka)
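The first two ideas can be sketched together. The example below does not use Neo4j's graph algorithms; as a stand-in it uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to order loaders and a thread pool to run independent ones in parallel. The function name `run_loaders`, the `dependencies` mapping, and the `run_one` callback are assumptions for illustration, not part of motherlode.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_loaders(dependencies, run_one, max_workers=4):
    """Run each loader only after all loaders it depends on have finished.

    dependencies: dict mapping loader name -> set of loader names it depends on
    run_one: callable taking a loader name (stand-in for starting a loader)
    Returns the loader names in the order they finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    finished = []
    pending = {}  # future -> loader name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Start every loader whose dependencies are all done.
            for name in sorter.get_ready():
                pending[pool.submit(run_one, name)] = name
            # Block until at least one running loader completes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any loader failure
                name = pending.pop(future)
                finished.append(name)
                sorter.done(name)  # may unlock dependent loaders
    return finished
```

For example, `run_loaders({"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}, print)` starts `a` first, then `b` and `c` together, and `d` last. `max_workers` is the knob for how much load the Neo4j instance and Docker host are asked to handle.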

@motey
Member Author

motey commented May 8, 2020

> use Neo4j to keep track of dependencies / order the source systems (currently, motherlode determines this itself, but since we require a running Neo4j instance anyway, we can just as well use its graph algorithms for tracking this info)

Had the same idea, but this would make bootstrapping motherlode harder.
Also, wiping the database and refilling it via motherlode would no longer be possible.

On the other hand, knowing which data sources have been loaded (and even being able to connect data to its data source) is pretty compelling. Maybe a hybrid approach would be a good solution. This could be achieved by extending the :LoadingLog functionality (see create_log_node(dataloader_name, image)).
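A rough sketch of what such a hybrid could look like: motherlode keeps the loader list itself (so bootstrapping and wipe-and-refill still work on an empty database), while :LoadingLog nodes record what has been loaded. Only the `:LoadingLog` label and the `dataloader_name`/`image` parameters come from the existing code; the property names `finished_ok` and `loaded_at` and both helper functions are assumptions. The code only builds the query and parameters, it does not talk to a database.

```python
from datetime import datetime, timezone


def build_log_query(dataloader_name, image, finished_ok):
    """Return (cypher, params) recording one loader run as a :LoadingLog node.

    The property names are assumptions for illustration, not the actual
    schema written by motherlode's create_log_node.
    """
    cypher = (
        "CREATE (log:LoadingLog {"
        "dataloader_name: $dataloader_name, image: $image, "
        "finished_ok: $finished_ok, loaded_at: $loaded_at})"
    )
    params = {
        "dataloader_name": dataloader_name,
        "image": image,
        "finished_ok": finished_ok,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
    return cypher, params


def loaders_still_to_run(all_loaders, log_rows):
    """List loaders that have no successful run among the given log rows.

    all_loaders: loader names known to motherlode itself (not stored in Neo4j)
    log_rows: dicts read back from :LoadingLog nodes; empty after a wipe
    """
    done = {row["dataloader_name"] for row in log_rows if row["finished_ok"]}
    return [name for name in all_loaders if name not in done]
```

After a database wipe `log_rows` is empty, so every loader is scheduled again; on a populated database only the loaders without a successful log entry remain.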

> parallelize loading. Currently, all loaders run strictly sequentially; loaders that don't depend on each other can be run in parallel (if the Neo4j instance and Docker host can handle the load)

YES! I will create an issue for that.
