Merge pull request #43 from alan-turing-institute/dev
For a 0.3.4 release
ablaom authored May 7, 2020
2 parents adc66b5 + ca2b161 commit 3bf16f8
Showing 9 changed files with 405 additions and 81 deletions.
2 changes: 2 additions & 0 deletions .travis.yml
@@ -2,6 +2,8 @@
language: julia
os:
- linux
env:
- JULIA_NUM_THREADS=30
julia:
- 1.0
- 1.1
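
Presumably the new `JULIA_NUM_THREADS` setting is there so the test suite exercises multithreaded code paths. A quick local sanity check (a sketch only; the thread count and launch command are illustrative):

```julia
# Launch Julia with the variable set, e.g.:  JULIA_NUM_THREADS=4 julia --project
using Base.Threads

@info "Julia is running with $(nthreads()) thread(s)"
```
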
4 changes: 3 additions & 1 deletion Project.toml
@@ -1,14 +1,15 @@
name = "MLJTuning"
uuid = "03970b2e-30c4-11ea-3135-d1576263f10f"
authors = ["Anthony D. Blaom <[email protected]>"]
version = "0.3.3"
version = "0.3.4"

[deps]
ComputationalResources = "ed09eef8-17a6-5b46-8889-db040fac31e3"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
MLJBase = "a7f614a8-145f-11e9-1d2a-a57a1082229d"
MLJModelInterface = "e80e1ace-859a-464e-9ed9-23947d8ae3ea"
ProgressMeter = "92933f4c-e287-5a05-a399-4b506db050ca"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
RecipesBase = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"

@@ -17,6 +18,7 @@ ComputationalResources = "^0.3"
Distributions = "^0.22,^0.23"
MLJBase = "^0.12.2,^0.13"
MLJModelInterface = "^0.2"
ProgressMeter = "^1.1"
RecipesBase = "^0.8"
julia = "^1"

69 changes: 41 additions & 28 deletions README.md
@@ -11,7 +11,7 @@ learning models.

- [Who is this repo for?](#who-is-this-repo-for)
- [What's provided here?](#what-is-provided-here)
- [How do I implement a new tuning strategy?](#How-do-I-implement-a-new-tuning-strategy)
- [How do I implement a new tuning strategy?](#how-do-i-implement-a-new-tuning-strategy)

*Note:* This component of the [MLJ
stack](https://github.com/alan-turing-institute/MLJ.jl#the-mlj-universe)
@@ -31,9 +31,8 @@ hyperparameter optimization tasks from there.
MLJTuning is the place for developers to integrate hyperparameter
optimization algorithms (here called *tuning strategies*) into MLJ,
either by adding code to [/src/strategies](/src/strategies), or by
importing MLJTuning into a third-party package and implementing
MLJTuning's [tuning strategy
interface](#implementing-a-new-tuning-strategy).
importing MLJTuning into a third-pary package and implementing
MLJTuning's [tuning strategy interface](#how-do-i-implement-a-new-tuning-strategy).

MLJTuning is a component of the [MLJ
stack](https://github.com/alan-turing-institute/MLJ.jl#the-mlj-universe)
@@ -57,27 +56,36 @@ This repository contains:
strategy) before training the optimal model on all supplied data

- an abstract **[tuning strategy
interface]((#implementing-a-new-tuning-strategy))** to allow
interface](#how-do-i-implement-a-new-tuning-strategy)** to allow
developers to conveniently implement common hyperparameter
optimization strategies, such as those in Table 1 (already
implemented) and the following: Latin hypercubes, bandit, simulated
annealing, Bayesian optimization using Gaussian processes,
structured tree Parzen estimators, multi-objective (Pareto)
optimization, genetic algorithms, AD-powered gradient descent
methods

- the **implementations** of the tuning strategy interface given
below, which come pre-loaded into
[MLJ](https://github.com/alan-turing-institute/MLJ.jl)
optimization strategies, such as:

- [x] search models generated by an arbitrary iterator, eg `models = [model1,
model2, ...]` (built-in `Explicit` strategy)

- [x] grid search (built-in `Grid` strategy)

tuning strategy | type | providing package
---------------------|---------------------|-------------------------
explicit search | `Explicit` | MLJTuning.jl
grid search | `Grid` | MLJTuning.jl
random search | `RandomSearch` | MLJTuning.jl
- [ ] Latin hypercubes

- [x] random search (built-in `RandomSearch` strategy)

> Table 1. Implemented tuning strategies
- [ ] bandit

- [ ] simulated annealing

- [ ] Bayesian optimization using Gaussian processes

- [ ] structured tree Parzen estimators

- [ ] multi-objective (Pareto) optimization

- [ ] genetic algorithms

- [ ] AD-powered gradient descent methods

- a selection of **implementations** of the tuning strategy interface,
currently all those accessible from
[MLJ](https://github.com/alan-turing-institute/MLJ.jl) itself.

- the code defining the MLJ functions `learning_curves!` and `learning_curve` as
these are essentially one-dimensional grid searches
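
To make the wrapper and the built-in strategies above concrete, here is a minimal usage sketch (assumptions: MLJ and DecisionTree.jl are installed, the `@load` line may need adjusting to your MLJ version, and the dataset and hyperparameter are purely illustrative):

```julia
using MLJ

X, y = @load_iris                        # small built-in dataset

tree = @load DecisionTreeClassifier      # assumes DecisionTree.jl is installed
r = range(tree, :max_depth, lower=1, upper=5)

tuned_tree = TunedModel(model=tree,
                        tuning=Grid(resolution=5),
                        resampling=CV(nfolds=3),
                        range=r,
                        measure=cross_entropy)

mach = machine(tuned_tree, X, y)
fit!(mach)                               # runs the search, then refits the best model
fitted_params(mach).best_model           # optimal hyperparameters found
```

Swapping `tuning=Grid(resolution=5)` for `tuning=RandomSearch()` (together with, say, `n=10`) exercises the random search strategy introduced in this release.
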
@@ -129,8 +137,8 @@ begin, on the basis of the specific strategy and a user-specified
- An *evaluation* is the value returned by some call to the
`evaluate!` method, when passed the resampling strategy (e.g.,
`CV(nfolds=9)`) and performance measures specified by the user when
specifying the tuning task (e.g., `cross_entropy`,
`accuracy`, `mae`). Recall that such a value is a named tuple of vectors
specifying the tuning task (e.g., `cross_entropy`,
`accuracy`). Recall that such a value is a named tuple of vectors
with keys `measure`, `measurement`, `per_fold`, and
`per_observation`. See [Evaluating Model
Performance](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/)
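
For instance (a sketch only, assuming the built-in `ConstantClassifier` is available from MLJ; any probabilistic classifier would do), the components of such an evaluation can be inspected directly:

```julia
using MLJ

X, y = @load_iris
e = evaluate(ConstantClassifier(), X, y,
             resampling=CV(nfolds=9),
             measures=[cross_entropy])

e.measure          # the measures, as supplied
e.measurement      # one aggregated value per measure
e.per_fold         # one vector of per-fold values per measure
e.per_observation  # per-observation values, where the measure reports them
```
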
@@ -361,7 +369,7 @@ algorithm (available to the `models!` method). Be sure to make this
object mutable if it needs to be updated by the `models!` method.

The `state` is a place to record the outcomes of any necessary
initialization of the tuning algorithm (performed by `setup`) and a
initialization of the tuning algorithm (performed by `setup`) and a
place for the `models!` method to save and read transient information
that does not need to be recorded in the history.
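
Purely as an illustration (the type and field names below are hypothetical, not part of the interface), such a `state` might be a small mutable container:

```julia
# Hypothetical `state` for a tuning strategy: built once by `setup`, read and
# mutated by `models!`, and never written to the history itself.
mutable struct CandidatePool
    remaining::Vector{Any}   # candidate models not yet issued by `models!`
    n_issued::Int            # transient bookkeeping for the strategy
end
```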

@@ -370,6 +378,11 @@ is `fit!` the first time, and not on subsequent calls (unless
`force=true`). (Specifically, `MLJBase.fit(::TunedModel, ...)` calls
`setup` but `MLJBase.update(::TunedModel, ...)` does not.)

The `verbosity` is an integer indicating the level of logging: `0`
means logging should be restricted to warnings, `-1`, means completely
silent.
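
A hypothetical pair of helpers respecting this convention (not part of the MLJTuning interface) might look like:

```julia
# `0` means warnings only; `-1` (or lower) means completely silent.
maybe_warn(verbosity, msg) = (verbosity >= 0 && @warn msg; nothing)
maybe_info(verbosity, msg) = (verbosity >= 1 && @info msg; nothing)
```
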
@@ -397,7 +410,7 @@ MLJTuning.models!(tuning::MyTuningStrategy, model, history, state, verbosity)
```

This is the core method of a new implementation. Given the existing
`history` and `state`, it must return a vector ("batch") of new
`history` and `state`, it must return a vector ("batch") of *new*
model instances to be evaluated. Any number of models can be returned
(and this includes an empty vector or `nothing`, if models have been
exhausted) and the evaluations will be performed in parallel (using
@@ -425,7 +438,7 @@ case).
If a tuning strategy implementation needs to pass additional
"metadata" along with each model, to be passed to `result` for
recording in the history, then instead of model instances, `models!`
should return a vector of *tuples* of the form `(m, metadata)`, where
should return a vector of *tuples* of the form `(m, metadata)`, where
`m` is a model instance, and `metadata` the associated data. See the
discussion above on `result`.
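
As a sketch only (the strategy `MySimpleSearch`, its field, the assumption that the history starts out as `nothing`, and the assumption that new strategies subtype `MLJTuning.TuningStrategy` as the built-in ones do are all illustrative; only the `models!` signature above is taken from this document), a strategy that issues one model per batch from a precomputed list might be written:

```julia
import MLJTuning

# Hypothetical strategy: candidate models are precomputed and carried by the
# strategy object itself; `state` (whatever `setup` returned) is not needed here.
struct MySimpleSearch <: MLJTuning.TuningStrategy
    candidates::Vector{Any}
end

function MLJTuning.models!(tuning::MySimpleSearch, model, history, state, verbosity)
    n_done = history === nothing ? 0 : length(history)    # assumes history starts as `nothing`
    n_done >= length(tuning.candidates) && return nothing # candidates exhausted
    return [tuning.candidates[n_done + 1]]                # next batch (of one model)
end
```
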

Expand Down Expand Up @@ -493,7 +506,7 @@ model

- `tuning_report(::MyTuningStrategy, ...)` is a method the implementer
may overload. It should return a named tuple with `history` as one
of the keys (the format being up to the implementation). The fallback is
of the keys (the format being up to the implementation). The fallback is
to return the raw history:

```julia
@@ -532,7 +545,7 @@ MLJTuning.DEFAULT_N` to check the current value.
The most rudimentary tuning strategy just evaluates every model
generated by some iterator, such iterators constituting the only kind
of supported range. The models generated must all have a common type
and, in the implementation below, the type information is conveyed by
and, in the implementation below, the type information is conveyed by
the specified prototype `model` (which is otherwise ignored). The
fallback implementations for `result`, `best` and `report_history`
suffice.
1 change: 1 addition & 0 deletions src/MLJTuning.jl
@@ -25,6 +25,7 @@ import Distributions
import ComputationalResources: CPU1, CPUProcesses,
CPUThreads, AbstractResource
using Random
using ProgressMeter


## CONSTANTS
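
The new ProgressMeter dependency presumably drives progress reporting while candidate models are evaluated. A generic sketch of that library's API (not MLJTuning's actual code; the loop body is a stand-in):

```julia
using ProgressMeter

candidates = 1:20                                     # stand-in for a batch of models
p = Progress(length(candidates), desc="Evaluating: ")
for c in candidates
    sleep(0.05)                                       # stand-in for an `evaluate!` call
    next!(p)                                          # advance the progress bar
end
```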