Update the readme examples to include tuning and setting global logger #45

Merged: 5 commits, Aug 2, 2024
7 changes: 5 additions & 2 deletions .github/workflows/CI.yml
@@ -60,7 +60,10 @@ jobs:
JULIA_NUM_THREADS: '2'
MLFLOW_TRACKING_URI: "http://localhost:5000/api"
- uses: julia-actions/julia-processcoverage@v1
-      - uses: codecov/codecov-action@v3
+      - uses: codecov/codecov-action@v4
+        with:
+          files: lcov.info
+          token: ${{ secrets.CODECOV_TOKEN }}
+          fail_ci_if_error: false
+          verbose: true


100 changes: 88 additions & 12 deletions README.md
@@ -6,8 +6,8 @@

[ci-dev]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml
[ci-dev-img]: https://github.com/pebeto/MLJFlow.jl/actions/workflows/CI.yml/badge.svg?branch=dev "Continuous Integration (CPU)"
-[codecov-dev]: https://codecov.io/github/JuliaAI/MLJFlow.jl?branch=dev
-[codecov-dev-img]: https://codecov.io/gh/JuliaAI/MLJFlow.jl/branch/dev/graphs/badge.svg?branch=dev "Code Coverage"
+[codecov-dev]: https://codecov.io/github/JuliaAI/MLJFlow.jl
+[codecov-dev-img]: https://codecov.io/github/JuliaAI/MLJFlow.jl/graph/badge.svg?token=TBCMJOK1WR "Code Coverage"

[MLJ](https://github.com/alan-turing-institute/MLJ.jl) is a Julia framework for
combining and tuning machine learning models. MLJFlow is a package that extends
@@ -22,7 +22,7 @@ metrics, log parameters, log artifacts, etc.).
This project is part of the GSoC 2023 program. The proposal description can be
found [here](https://summerofcode.withgoogle.com/programs/2023/projects/iRxuzeGJ).
The entire workload is divided into three different repositories:
[MLJ.jl](https://github.com/alan-turing-institute/MLJ.jl),
[MLFlowClient.jl](https://github.com/JuliaAI/MLFlowClient.jl) and this one.

## Features
@@ -33,14 +33,14 @@ The entire workload is divided into three different repositories:
- [x] Provides a wrapper `Logger` for MLFlowClient.jl clients and associated
metadata; instances of this type are valid "loggers", which can be passed to MLJ
functions supporting the `logger` keyword argument.

- [x] Provides MLflow integration with MLJ's `evaluate!`/`evaluate` method (model
**performance evaluation**)

- [x] Extends MLJ's `MLJ.save` method, to save trained machines as retrievable MLflow
client artifacts

-- [ ] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
+- [x] Provides MLflow integration with MLJ's `TunedModel` wrapper (to log **hyper-parameter
tuning** workflows)

- [ ] Provides MLflow integration with MLJ's `IteratedModel` wrapper (to log **controlled
@@ -60,8 +60,8 @@ shell/console, run `mlflow server` to launch an mlflow service on a local server.
Refer to the [MLflow documentation](https://www.mlflow.org/docs/latest/index.html) for
necessary background.
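For example, assuming the `mlflow` Python package is already installed (e.g. with `pip install mlflow`), a local tracking service listening on the default host and port can be started like this:

```shell
# Launch a local MLflow tracking server on the default host/port.
# The server keeps running until interrupted with Ctrl-C.
mlflow server --host 127.0.0.1 --port 5000
```

The logger examples below assume the service is then reachable at `http://127.0.0.1:5000/api`.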

-We assume MLJDecisionTreeClassifier is in the user's active Julia package
-environment.
+**Important.** For the examples that follow, we assume `MLJ`, `MLJDecisionTreeClassifier`
+and `MLFlowClient` are in the user's active Julia package environment.

```julia
using MLJ # Requires MLJ.jl version 0.19.3 or higher
@@ -73,7 +73,7 @@ instance. The experiment name and artifact location are optional.
```julia
logger = MLJFlow.Logger(
"http://127.0.0.1:5000/api";
-    experiment_name="MLJFlow test",
+    experiment_name="test",
artifact_location="./mlj-test"
)
```
@@ -89,25 +89,54 @@ model = DecisionTreeClassifier(max_depth=4)
Now we call `evaluate` as usual but provide the `logger` as a keyword argument:

```julia
-evaluate(model, X, y, resampling=CV(nfolds=5), measures=[LogLoss(), Accuracy()], logger=logger)
+evaluate(
+    model,
+    X,
+    y,
+    resampling=CV(nfolds=5),
+    measures=[LogLoss(), Accuracy()],
+    logger=logger,
+)
```

Navigate to "http://127.0.0.1:5000" in your browser and select the "Experiment" matching
the name above ("test"). Select the single run displayed to see the logged results of the
performance evaluation.


### Logging outcomes of model tuning

Continuing with the previous example:

```julia
r = range(model, :max_depth, lower=1, upper=5)
tmodel = TunedModel(
model,
tuning=Grid(),
    range=r,
resampling=CV(nfolds=9),
measures=[LogLoss(), Accuracy()],
logger=logger,
)

mach = machine(tmodel, X, y) |> fit!
```

Return to the browser page (refreshing if necessary) and you will find five more
performance evaluations logged, one for each value of `max_depth` evaluated in tuning.
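After fitting, the outcome of tuning can also be inspected locally in the usual MLJ way. As a sketch (continuing from the tuning example above, so `mach` is the fitted tuned machine):

```julia
# Best model found by the grid search, and its hyper-parameter:
best = fitted_params(mach).best_model
best.max_depth

# Performance estimate associated with the best model:
report(mach).best_history_entry
```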


### Saving and retrieving trained machines as MLflow artifacts

Let's train the model on all data and save the trained machine as an MLflow artifact:

```julia
mach = machine(model, X, y) |> fit!
-run = MLJBase.save(logger, mach)
+run = MLJ.save(logger, mach)
```

-Notice that in this case `MLJBase.save` returns a run (and instance of `MLFlowRun` from
-MLFlowClient.jl).
+Notice that in this case `MLJ.save` returns a run (an instance of `MLFlowRun` from
+MLFlowClient.jl).

To retrieve an artifact we need to use the MLFlowClient.jl API, and for that we need to
know the MLflow service that our `logger` wraps:
@@ -129,3 +158,50 @@ We can predict using the deserialized machine:
```julia
predict(mach2, X)
```

### Setting a global logger

Set `logger` as the global logging target by running `default_logger(logger)`. Then,
unless explicitly overridden, all loggable workflows will log to `logger`. In particular,
to *suppress* logging, you will need to specify `logger=nothing` in your calls.

So, for example, if we run the following setup:

```julia
using MLJ

# using a new experiment name here:
logger = MLJFlow.Logger(
"http://127.0.0.1:5000/api";
experiment_name="test global logging",
artifact_location="./mlj-test"
)

default_logger(logger)

X, y = make_moons(100) # a table and a vector with 100 rows
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
model = DecisionTreeClassifier()
```

Then the following call is automatically logged:

```julia
evaluate(model, X, y)
```

But the following is *not* logged:

```julia
evaluate(model, X, y; logger=nothing)
```

To save a machine when a default logger is set, one can use the following syntax:

```julia
mach = machine(model, X, y) |> fit!
MLJ.save(mach)
```

Retrieve the saved machine as described earlier.