
Scheduler IoW Notes

Current Challenges

  • build files (implnet_{jobs,ops}_*.py) are tracked in the repo, which bloats the git history and makes PRs harder to review
  • multiple organizations have stored their configurations in the repo, placing a higher burden on maintainers
  • the build is driven by environment variables and multiple coupled components instead of one build script, making it harder to debug, test, and refactor

Current understanding for alignment

Steps

  1. Build the gleanerconfig.yml
    • This config builds upon a gleanerconfigPREFIX.yaml file that serves as the base template (see the sketch after these steps)
    • Each organization has a nabuconfig.yaml which specifies configuration and context for how to retrieve triple data and how to store it in minio
  2. Generate the jobs/, ops/, sch/, and repositories/ directories, which contain the Python files that describe when to run each job
  3. Generate the workspace.yaml file, which describes the relative path to the Python file that references all the jobs
    • This can likely be eliminated during the refactor
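To make the steps concrete, here is a minimal sketch of what steps 1 and 3 could look like inside a single Python build program. It assumes PyYAML; the file paths, the `sources` key, the merge logic, and the exact workspace.yaml layout are illustrative assumptions rather than the current schema.

```python
# A sketch only: file names below mirror the steps above, but the merge logic
# and the "sources" key are assumptions about the config shape.
from pathlib import Path

import yaml


def build_gleaner_config(base_template: Path, org_sources: Path, output: Path) -> None:
    """Step 1: merge an organization's source list into the base gleaner template."""
    base = yaml.safe_load(base_template.read_text())
    org = yaml.safe_load(org_sources.read_text())
    # Hypothetical merge: append the organization's sources to the template's list
    base.setdefault("sources", []).extend(org.get("sources", []))
    output.write_text(yaml.safe_dump(base, sort_keys=False))


def build_workspace(jobs_file: str, output: Path) -> None:
    """Step 3: write the workspace.yaml that points Dagster at the generated jobs file."""
    workspace = {"load_from": [{"python_file": {"relative_path": jobs_file}}]}
    output.write_text(yaml.safe_dump(workspace, sort_keys=False))


if __name__ == "__main__":
    build_gleaner_config(
        Path("gleanerconfigPREFIX.yaml"),
        Path("orgs/example/sources.yaml"),  # hypothetical per-organization path
        Path("gleanerconfig.yml"),
    )
    build_workspace("repositories/repository.py", Path("workspace.yaml"))
```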

Ideas for improvement

  • Condense code into one central Python build program
    • Use https://github.com/docker/docker-py to control the containers instead of shell scripts; having the whole data pipeline in one language makes it easier to test and debug (see the first sketch after this list)
    • By using a CLI library like https://typer.tiangolo.com/ we can validate argument correctness and fail early, instead of reading in the arguments and only failing after containers are spun up
  • Move all build files to the root of the repo to make them easier for end users to find
    • (i.e. makefiles, the build/ directory, etc.)
  • Refactor such that individual organizations store their configuration outside the repo.
    • The Python build program should be able to read the configuration files from an arbitrary path that the user specifies
  • Add type annotations and docstrings for easier long-term maintenance
  • Use Jinja templating instead of writing raw text to the output files (see the templating sketch after this list)
  • Currently jobs are run by hard coding them into a Python file via a template
    • It is unclear whether this scales to huge datasets; it is probably best to use a generator so we do not need to load everything into the AST at once (also shown in the templating sketch below)
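Below is a minimal sketch of the central build program idea, assuming typer and docker-py as suggested above. The container image name, the arguments passed to the container, and the mount layout are placeholders, not the actual Gleaner invocation; the point is that argument validation happens before any container is started.

```python
# A sketch of the "one central build program" idea, assuming typer and docker-py
# are installed. Image names, container arguments, and paths are placeholders.
from pathlib import Path

import docker
import typer

app = typer.Typer()


@app.command()
def harvest(
    gleaner_config: Path = typer.Option(..., exists=True, readable=True,
                                        help="Path to the generated gleanerconfig.yml"),
    image: str = typer.Option("example/gleaner:latest",  # placeholder image name
                              help="Container image to run"),
) -> None:
    """Validate arguments up front, then run the harvest container."""
    client = docker.from_env()
    # typer has already verified that gleaner_config exists and is readable,
    # so a bad path fails here, before any container is spun up.
    logs = client.containers.run(
        image,
        ["--cfg", "/config/gleanerconfig.yml"],  # placeholder container arguments
        volumes={str(gleaner_config.parent.resolve()): {"bind": "/config", "mode": "ro"}},
        remove=True,
    )
    typer.echo(logs.decode())


if __name__ == "__main__":
    app()
```

Because the CLI and the container orchestration live in one Python program, the same code path can be exercised in tests without a shell wrapper.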
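And a sketch of the Jinja templating plus generator idea: each source is read from the config and rendered to a job file one at a time, so nothing needs to be accumulated in a single large in-memory structure. The template text and the `sources`/`name` fields are illustrative assumptions about the config and job file shape, not the scheduler's actual format.

```python
# A sketch of rendering job files with Jinja via a generator; the template text
# and config keys are assumptions, not the scheduler's real job format.
from pathlib import Path
from typing import Iterator

import yaml
from jinja2 import Environment

JOB_TEMPLATE = """\
@job
def implnet_job_{{ name }}():
    harvest_{{ name }}()
"""


def render_jobs(gleaner_config_path: str) -> Iterator[tuple[str, str]]:
    """Yield (source name, rendered job code) pairs, one source at a time."""
    env = Environment()
    template = env.from_string(JOB_TEMPLATE)
    with open(gleaner_config_path) as f:
        config = yaml.safe_load(f)
    for source in config.get("sources", []):  # assumed key name
        name = source["name"]
        yield name, template.render(name=name)


if __name__ == "__main__":
    Path("jobs").mkdir(exist_ok=True)
    for name, code in render_jobs("gleanerconfig.yml"):
        Path(f"jobs/implnet_jobs_{name}.py").write_text(code)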
```