This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

AlexandreKempf / HackDuck Public archive

Machine learning data flow for reproducible data science



IDEAL HACKDUCK PROJECT

Run a model from a REST app (MLflow):

save a GitHub folder for each project
easily get predictions on a batch of data

FEATURES:

seed for reproducibility
map arguments to loop over a list
MLflow integration (logs parameters automatically; can also log metrics and artifacts)
all the advantages of Prefect
handles subflows
task bank for basic operations
unit tests handled by ward
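
The seeding and argument-mapping features above can be illustrated with a small standard-library sketch. This is not HackDuck's implementation: `run_task`, `map_task`, and `noisy_threshold` are hypothetical names introduced here, and the real package builds on Prefect rather than plain function calls.

```python
import random


def run_task(task, seed, **kwargs):
    """Seed the RNG, then call the task, so repeated runs give identical results."""
    random.seed(seed)
    return task(**kwargs)


def map_task(task, seed, mapped_name, values, **fixed_kwargs):
    """Call the task once per value of the mapped argument (hypothetical helper)."""
    return [
        run_task(task, seed, **{mapped_name: value}, **fixed_kwargs)
        for value in values
    ]


def noisy_threshold(threshold, scale):
    # Hypothetical task: the threshold plus seeded random noise in [0, scale)
    return threshold + scale * random.random()


# Two runs with the same seed produce the same outputs
first = map_task(noisy_threshold, seed=42, mapped_name="threshold",
                 values=[1, 5, 10], scale=0.1)
second = map_task(noisy_threshold, seed=42, mapped_name="threshold",
                  values=[1, 5, 10], scale=0.1)
assert first == second
```

Re-seeding before every task call keeps each call reproducible on its own, at the cost of giving every mapped call the same random stream.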

TODO:

map over subflows ?
run it in a Docker container
save the versions of all requirements (needed to rerun the flow)
save Python files inside mlruns/..., commit them to git, and save the git commit
be able to rerun a previous flow (save args, kwargs, and output reference)
deploy to production through Travis CI, which creates the MLflow git repo
generate examples for people to use
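
As a rough illustration of the "save versions" and "save git commit" items, the sketch below collects the information a rerun would need. None of this exists in HackDuck: `snapshot_requirements`, `current_git_commit`, and `build_run_record` are hypothetical names, and the record layout is an assumption.

```python
import json
import subprocess
import sys
from importlib import metadata


def snapshot_requirements():
    """Return {package name: installed version} for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            versions[name] = dist.version
    return versions


def current_git_commit():
    """Return the current commit hash, or None when git or a repository is missing."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_run_record(args, kwargs):
    """Bundle everything assumed necessary to rerun a flow later."""
    return {
        "python": sys.version.split()[0],
        "requirements": snapshot_requirements(),
        "git_commit": current_git_commit(),
        "args": list(args),
        "kwargs": dict(kwargs),
    }


record = build_run_record([], {"threshold": 5})
serialized = json.dumps(record, indent=2)  # could be stored next to the run
```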

USE IT:

hackduck config.yaml --threshold 5

or

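from HackDuck import run_flow
import yaml

# Load the flow configuration; the file is closed once it has been read
with open('config.yaml', 'r') as f:
    config = yaml.load(f, Loader=yaml.FullLoader)

# Run the flow, overriding 'threshold' as the command line form does
run_flow(config, {'threshold': 5})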
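
The two forms above pass the same override: `--threshold 5` on the command line corresponds to `{'threshold': 5}` in Python. A minimal sketch of how such flags might be turned into that dictionary follows. `parse_overrides` is a hypothetical helper, not part of HackDuck, and the actual `hackduck` script may parse its arguments differently.

```python
import ast


def parse_overrides(argv):
    """Turn ['--threshold', '5'] into {'threshold': 5} (hypothetical helper)."""
    if len(argv) % 2 != 0:
        raise ValueError("expected '--name value' pairs")
    overrides = {}
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"expected a '--name' flag, got {flag!r}")
        try:
            # Interpret numbers, lists, and other Python literals
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a literal stays a plain string
            value = raw
        overrides[flag[2:]] = value
    return overrides


overrides = parse_overrides(["--threshold", "5"])
assert overrides == {"threshold": 5}
```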
