Monitoring convergence in gradient-based optimization #293
We're in the process of figuring out what features to put in v5 from the roadmap, and we can certainly look at adding this. It's most likely going to be a breaking change, but that's ok for v5, and there are a few others under consideration (they'll be marked out on the projects page under v5 when we've made the decisions). The other breaking changes are required so we can work on adding online learning and expanding other functionality.

As a rough idea, I think we'd need two interfaces,

```java
interface StoppingCriterionFactory<T extends Output<T>> extends Configurable,
    ProtoSerializable, Provenanceable<ConfiguredObjectProvenance> {
    StoppingCriterion configure(FeatureIDMap fmap, OutputIDInfo<T> outputInfo);
}
```

and

```java
interface StoppingCriterion {
    boolean step(int stepCount, double lossValue, Parameters model);
    boolean epochCompleted(int epochCount, double lossValue, Parameters model);
}
```

Unfortunately I don't think we can get away with something simpler, as the stopping criterion needs to be stateful.
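For illustration, a minimal sketch of what an implementation of that proposed `StoppingCriterion` could look like, stopping when the relative reduction of the epoch loss falls below a tolerance. The interface is the hypothetical one sketched above (it does not exist in Tribuo today), `Parameters` is `org.tribuo.math.Parameters`, and returning `true` is assumed to mean "stop training":

```java
import org.tribuo.math.Parameters;

// Sketch only: StoppingCriterion is the hypothetical interface proposed above,
// and a return value of true is assumed to mean "stop training".
public final class RelativeLossReductionCriterion implements StoppingCriterion {
    private final double tolerance;
    private double previousEpochLoss = Double.POSITIVE_INFINITY;

    public RelativeLossReductionCriterion(double tolerance) {
        this.tolerance = tolerance;
    }

    @Override
    public boolean step(int stepCount, double lossValue, Parameters model) {
        // Only check at epoch boundaries; per-step losses are too noisy for SGD.
        return false;
    }

    @Override
    public boolean epochCompleted(int epochCount, double lossValue, Parameters model) {
        boolean converged = false;
        if (Double.isFinite(previousEpochLoss)) {
            double relativeReduction =
                (previousEpochLoss - lossValue) / Math.max(Math.abs(previousEpochLoss), 1e-12);
            converged = relativeReduction >= 0.0 && relativeReduction < tolerance;
        }
        previousEpochLoss = lossValue;
        return converged;
    }
}
```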
Thanks for the update and context here. It would be great to see these features in v5! I agree that passing in just a double for the gradient norm may be restrictive. While the gradient norm is technically sufficient to monitor convergence in smooth convex problems like linear and logistic regression, in practice it needs to be implemented with a hard stop (e.g. max epochs), and in other problem settings it may not be sufficient on its own.

As a rough breakdown, I think problem-specific stopping criteria can be a function of:
- the (training or validation) loss,
- the model (its parameters and/or gradients),
- the (training or validation) data.

The general input to a problem-specific stopping criterion is then a loss, a model, and a dataset. In addition, for stopping criteria that depend on past (training or validation) losses, we would need to store and recall these values. I am still familiarizing myself with Tribuo's architecture and defer to your judgment as to whether and how this might be accomplished. I'll continue to dig in here - meanwhile, thanks for the perspective and happy to discuss further.
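As a sketch of the "store and recall past losses" point, and again assuming the hypothetical `StoppingCriterion` interface proposed earlier in the thread (the class name here is made up), a patience-style criterion could keep a small window of past epoch losses:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.tribuo.math.Parameters;

// Hypothetical patience-based criterion: stop if the epoch loss has not improved
// by at least minDelta relative to the loss `patience` epochs ago.
public final class LossPatienceCriterion implements StoppingCriterion {
    private final int patience;
    private final double minDelta;
    private final Deque<Double> recentLosses = new ArrayDeque<>();

    public LossPatienceCriterion(int patience, double minDelta) {
        this.patience = patience;
        this.minDelta = minDelta;
    }

    @Override
    public boolean step(int stepCount, double lossValue, Parameters model) {
        return false; // only evaluated at epoch boundaries
    }

    @Override
    public boolean epochCompleted(int epochCount, double lossValue, Parameters model) {
        recentLosses.addLast(lossValue);
        if (recentLosses.size() <= patience) {
            return false;
        }
        // Compare the current loss to the oldest loss in the window.
        double oldest = recentLosses.removeFirst();
        return (oldest - lossValue) < minDelta;
    }
}
```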
Yeah, so the training loss I think can go through the method call and be computed by the trainer. Max iteration caps & epoch caps are straightforward and stateless, so those are fine; the main difficulty there is making sure that the provenance/configuration knows how to evolve v4 style information into v5 style, as I don't want to break users who want to migrate upwards.

Anyway, I think we can do this, but there's a bunch of design space to think about.
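To illustrate why the epoch cap is the easy, stateless case, reproducing the current fixed-epoch behaviour against the proposed (hypothetical) interface would only take a few lines:

```java
import org.tribuo.math.Parameters;

// Hypothetical: reproduces today's fixed-epoch behaviour as a StoppingCriterion.
// It needs no internal state because the epoch count is passed in on every call.
public final class MaxEpochsCriterion implements StoppingCriterion {
    private final int maxEpochs;

    public MaxEpochsCriterion(int maxEpochs) {
        this.maxEpochs = maxEpochs;
    }

    @Override
    public boolean step(int stepCount, double lossValue, Parameters model) {
        return false;
    }

    @Override
    public boolean epochCompleted(int epochCount, double lossValue, Parameters model) {
        return epochCount >= maxEpochs;
    }
}
```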
It seems like early stopping (validation loss monitoring) requires more extensive design consideration, and understandably so given the overall data flow. The validation data just has to have the same features as the training data; it does seem like this interacts somewhat with planned work on transformations.

In terms of epoch vs. step-based, in the implementations I've seen the convergence criterion is usually only checked once per epoch, if not even less frequently (i.e. once every few epochs).

Regarding the loss function for validation vs training, I would think it's fine to assume these are the same. The idea of the validation loss is to keep an unbiased estimate of the population risk, but this only makes sense if the loss is the same as what is being minimized on the training data.

Thanks again for the follow-up here and excited to hear that you think this sounds doable.
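On the "check less frequently" point, one option (sketched against the hypothetical interface from earlier; the wrapper name is made up) is a decorator that only consults an inner criterion every k epochs, which keeps expensive checks such as a validation pass off the per-epoch path:

```java
import org.tribuo.math.Parameters;

// Hypothetical decorator: delegates to an inner StoppingCriterion only once
// every `checkInterval` epochs.
public final class IntervalCriterion implements StoppingCriterion {
    private final StoppingCriterion inner;
    private final int checkInterval;

    public IntervalCriterion(StoppingCriterion inner, int checkInterval) {
        this.inner = inner;
        this.checkInterval = checkInterval;
    }

    @Override
    public boolean step(int stepCount, double lossValue, Parameters model) {
        return false;
    }

    @Override
    public boolean epochCompleted(int epochCount, double lossValue, Parameters model) {
        if (epochCount % checkInterval != 0) {
            return false;
        }
        return inner.epochCompleted(epochCount, lossValue, model);
    }
}
```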
Adding my voice to the desire for better observability, but for a decidedly more mundane reason: I need to be able to graph convergence for my human users who want to eyeball over/underfitting.
For that use case do you want to save out intermediate checkpoints?
@Craigacp - Intermediate checkpoints would be complementary (but not core to my specific use case), and having control over which conditions would trigger such a save would be a big boon. The intermediate model with the best performance on the evaluation set is important to preserve - especially in overfit cases.
Ok. Intermediate checkpointing is something that we could add as part of adding online learning to the gradient based models, because the checkpoint is equivalent to a model which can continue to be trained. That's also planned for v5, and in the checkpointing case is easier to implement than the full online learning problem we're considering. |
Is your feature request related to a problem? Please describe.
Currently, model training via gradient-based optimization in Tribuo terminates after a fixed number of epochs. The main problem with a maximum iteration count as a stopping criterion is that it bears no relation to the optimality of the current iterate. It is difficult to know a priori how many epochs will be sufficient for a given training problem, and there are costs to over- or under-estimating this number (especially to underestimating it).
Describe the solution you'd like
Ideally, for iterative gradient-based optimization we would be able to use problem-specific stopping criteria such as a threshold on relative reduction of the loss or the norm of the gradient. Typically these are accompanied by a (large) max-epoch cutoff to bound computation time and catch cases where the loss diverges. For stochastic algorithms we could also consider early stopping rules, for example based on the loss on a held-out validation set.
Are there any plans to implement zero- or first-order stopping criteria for optimizers extending AbstractSGDTrainer? Are there other workarounds for checking convergence of the optimizer in the case of linear and logistic regression?
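As a concrete illustration of the kind of rule described above (plain Java with no Tribuo API; the class and parameter names are made up), a combined check might use a gradient-norm threshold, a relative loss-reduction threshold, and a hard max-epoch cap:

```java
// Hypothetical standalone helper illustrating the combined stopping rule:
// stop on a small gradient norm, a small relative loss reduction, or a hard epoch cap.
public final class ConvergenceCheck {
    private final double gradTolerance;
    private final double lossTolerance;
    private final int maxEpochs;

    public ConvergenceCheck(double gradTolerance, double lossTolerance, int maxEpochs) {
        this.gradTolerance = gradTolerance;
        this.lossTolerance = lossTolerance;
        this.maxEpochs = maxEpochs;
    }

    public boolean shouldStop(int epoch, double previousLoss, double currentLoss, double[] gradient) {
        double sumSq = 0.0;
        for (double g : gradient) {
            sumSq += g * g;
        }
        double gradNorm = Math.sqrt(sumSq);
        double relativeReduction =
            Math.abs(previousLoss - currentLoss) / Math.max(Math.abs(previousLoss), 1e-12);
        return epoch >= maxEpochs || gradNorm < gradTolerance || relativeReduction < lossTolerance;
    }
}
```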
Describe alternatives you've considered
An alternative to implementing new stopping criteria could be to (optionally) report some metric(s) relevant to the specific training problem after training is "completed" according to the max-epoch rule. These could include the norm of the gradient or a sequence of loss values at each epoch.
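A minimal sketch of that reporting alternative (a hypothetical helper, not a Tribuo class): record the loss after each epoch so the sequence can be inspected or plotted once the max-epoch rule terminates training.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: accumulates per-epoch losses during training so they can be
// reported or graphed after training "completes" under the max-epoch rule.
public final class LossHistory {
    private final List<Double> epochLosses = new ArrayList<>();

    public void record(double epochLoss) {
        epochLosses.add(epochLoss);
    }

    public List<Double> losses() {
        return Collections.unmodifiableList(epochLosses);
    }

    // Relative reduction between the last two recorded epochs, NaN if fewer than two.
    public double lastRelativeReduction() {
        int n = epochLosses.size();
        if (n < 2) {
            return Double.NaN;
        }
        double prev = epochLosses.get(n - 2);
        double curr = epochLosses.get(n - 1);
        return Math.abs(prev - curr) / Math.max(Math.abs(prev), 1e-12);
    }
}
```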
One alternative that does not work in general is to change the optimization algorithm from the standard SGD. All optimizers implement some form of iterative, gradient-based optimization, so they all face the same problem of enforcing an appropriate stopping criterion.