Add multi-precision support #23

Open · wants to merge 23 commits into base: main
Changes from 20 commits
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "FluxNLPModels"
uuid = "31fab0eb-bb78-4d15-8993-a8083bba6d27"
authors = ["Farhad Rahbarnia <[email protected]>"]
version = "0.0.1"
version = "0.1.0"

[deps]
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
3 changes: 1 addition & 2 deletions README.md
@@ -4,9 +4,8 @@
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://JuliaSmoothOptimizers.github.io/FluxNLPModels.jl/dev)
[![Build Status](https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/workflows/CI/badge.svg)](https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/actions)
[![Codecov](https://codecov.io/gh/JuliaSmoothOptimizers/FluxNLPModels.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaSmoothOptimizers/FluxNLPModels.jl)
<!-- TODO check the links -->

This package serves as an NLPModels interface to the [Flux.jl](https://github.com/FluxML/Flux.jl) deep learning framework. It enables seamless integration between Flux's neural network architectures and NLPModels' optimization tools for natural language processing tasks.
This package serves as an NLPModels interface to the [Flux.jl](https://github.com/FluxML/Flux.jl) deep learning framework. It enables seamless integration between Flux's neural network architectures and NLPModels' optimization tools for non-linear programming (NLP) problems.
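
For illustration, a minimal sketch of the intended workflow (the chain, random data, and loader settings below are placeholders, not part of the package):

```julia
using Flux, FluxNLPModels, NLPModels

# A small classifier and random stand-in data
chain = Chain(Dense(784, 32, relu), Dense(32, 10))
x = rand(Float32, 784, 256)
y = Flux.onehotbatch(rand(0:9, 256), 0:9)
train_loader = Flux.DataLoader((x, y), batchsize = 64, shuffle = true)
test_loader = Flux.DataLoader((x, y), batchsize = 64)

# Wrap the Flux model as an NLPModel
nlp = FluxNLPModel(chain, train_loader, test_loader; loss_f = Flux.logitcrossentropy)

f = obj(nlp, nlp.w)   # objective (loss) value at the current weights
g = grad(nlp, nlp.w)  # gradient at the current weights
```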

## Installation

1 change: 0 additions & 1 deletion docs/make.jl
@@ -1,4 +1,3 @@
#TODO redo this section
using Documenter, FluxNLPModels

makedocs(
17 changes: 5 additions & 12 deletions docs/src/index.md
@@ -1,19 +1,18 @@
#TODO redo this section
# FluxNLPModels.jl

## Compatibility
Julia ≥ 1.6.

## How to install
TODO: this section needs work since our package is not yet register

This module can be installed with the following command:
```julia
# pkg> add FluxNLPModels
# pkg> test FluxNLPModels
pkg> add FluxNLPModels
```

## Synopsis
FluxNLPModels exposes neural network models as optimization problems conforming to the NLPModels.jl API. FluxNLPModels is an interface between [Flux.jl](https://github.com/FluxML/Flux.jl)'s classification neural networks and [NLPModels.jl](https://github.com/JuliaSmoothOptimizers/NLPModels.jl.git).

FluxNLPModels exposes neural network models as optimization problems conforming to the [NLPModels API](https://github.com/JuliaSmoothOptimizers/NLPModels.jl). FluxNLPModels is an interface between [Flux.jl](https://github.com/FluxML/Flux.jl)'s classification neural networks and [NLPModels.jl](https://github.com/JuliaSmoothOptimizers/NLPModels.jl).

A `FluxNLPModel` gives the user access to:
- The values of the neural network variables/weights `w`;
@@ -25,13 +24,7 @@ In addition, it provides tools to:
- Retrieve the current minibatch;
- Measure the neural network's loss at the current `w`.
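
For instance, a minimal sketch (assuming `nlp` is a `FluxNLPModel` built as in the tutorial):
```julia
w = nlp.w                                  # current weights
f = NLPModels.obj(nlp, w)                  # loss on the current training minibatch
FluxNLPModels.minibatch_next_train!(nlp)   # move to the next training minibatch
```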

## How to use
Check the tutorials
<!-- Check the [tutorial](https://juliasmoothoptimizers.github.io/FluxNLPModels.jl/stable/). -->

# Bug reports and discussions

If you think you found a bug, feel free to open an [issue]<!--(https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/issues). --> TODO: add repo link
Focused suggestions and requests can also be opened as issues. Before opening a pull request, please start an issue or a discussion on the topic.
If you encounter any bugs or have suggestions for improvement, please open an [issue](https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/issues). For general questions or discussions related to this repository and the [JuliaSmoothOptimizers](https://github.com/JuliaSmoothOptimizers) organization, feel free to start a discussion [here](https://github.com/JuliaSmoothOptimizers/Organization/discussions).

If you have a question that is not suited for a bug report, feel free to start a discussion [here](#TODO). This forum is for general discussion about this repository and the [JuliaSmoothOptimizers](https://github.com/JuliaSmoothOptimizers). Questions about any of our packages are welcome.
65 changes: 22 additions & 43 deletions docs/src/tutorial.md
@@ -1,4 +1,5 @@
# FluxNLPModels.jl Tutorial

## Setting up
This step-by-step example assumes prior knowledge of [Julia](https://julialang.org/) and [Flux.jl](https://github.com/FluxML/Flux.jl).
See the [Julia tutorial](https://julialang.org/learning/) and the [Flux.jl tutorial](https://fluxml.ai/Flux.jl/stable/models/quickstart/#man-quickstart) for more details.
@@ -39,23 +40,13 @@ using JSOSolvers
First, an NN model needs to be defined in Flux.jl.
Our model is very simple: it consists of one "hidden layer" with 32 "neurons", each connected to every input pixel. Each hidden neuron applies a ReLU nonlinearity and is connected to every "neuron" in the output layer. The output layer produces raw scores (logits); the softmax applied inside the logitcrossentropy loss used below turns them into probabilities, i.e., positive numbers that add up to 1.

We have two ways of defining the models:

1. **Direct Definition**: You can directly define the model in your code, specifying the layers and their connections using Flux's syntax. This approach allows for more flexibility and customization.
```@example FluxNLPModel
model = Flux.Chain(Dense(28^2=> 32, relu), Dense(32=>10))
```

2. **Method-Based Definition**: Alternatively, you can create a method that returns the model. This method can encapsulate the specific architecture and parameters of the model, making it easier to reuse and manage. It provides a convenient way to define and initialize the model when needed.
```@example FluxNLPModel
function build_model(; imgsize = (28, 28, 1), nclasses = 10)
  return Chain(Dense(prod(imgsize), 32, relu), Dense(32, nclasses))
end
```
One can create a method that returns the model. This method can encapsulate the specific architecture and parameters of the model, making it easier to reuse and manage. It provides a convenient way to define and initialize the model when needed.



Both approaches have their advantages, and you can choose the one that suits your needs and coding style.
```@example FluxNLPModel
function build_model(; imgsize = (28, 28, 1), nclasses = 10)
  return Chain(Dense(prod(imgsize), 32, relu), Dense(32, nclasses))
end
```

### Loss function

@@ -65,35 +56,15 @@ We can define any loss function that we need; here we use Flux's built-in logitcrossentropy
const loss = Flux.logitcrossentropy
```

We also define a loss function `loss_and_accuracy`.
```@example FluxNLPModel
function loss_and_accuracy(data_loader, model, device)
  acc = 0
  ls = 0.0f0
  num = 0
  for (x, y) in data_loader
    x, y = device(x), device(y)
    ŷ = model(x)
    ls += loss(ŷ, y, agg = sum)
    acc += sum(onecold(ŷ) .== onecold(y)) ## Decode the output of the model
    num += size(x)[end]
  end
  return ls / num, acc / num
end
```


### Load datasets and define minibatch
In this section, we will cover the process of loading datasets and defining minibatches for training your model using Flux. Loading and preprocessing data is an essential step in machine learning, as it allows you to train your model on real-world examples.

We will specifically focus on loading the MNIST dataset. We will divide the data into training and testing sets, ensuring that we have separate data for model training and evaluation.

Additionally, we will define minibatches, which are subsets of the dataset that are used during the training process. Minibatches enable efficient training by processing a small batch of examples at a time, instead of the entire dataset. This technique helps in managing memory resources and improving convergence speed.



```@example FluxNLPModel
function getdata(batchsize)
function getdata(bs)
  ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

  # Loading Dataset
@@ -108,14 +79,13 @@ function getdata(batchsize)
  ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

  # Create DataLoaders (mini-batch iterators)
  train_loader = DataLoader((xtrain, ytrain), batchsize = batchsize, shuffle = true)
  test_loader = DataLoader((xtest, ytest), batchsize = batchsize)
  train_loader = DataLoader((xtrain, ytrain), batchsize = bs, shuffle = true)
  test_loader = DataLoader((xtest, ytest), batchsize = bs)

  return train_loader, test_loader
end
```


### Transferring to FluxNLPModels

```@example FluxNLPModel
@@ -129,9 +99,6 @@ end
nlp = FluxNLPModel(model, train_loader, test_loader; loss_f = loss)
```




## Tools associated with a FluxNLPModel
The problem dimension `n`, where `w` ∈ ℝⁿ:
```@example FluxNLPModel
@@ -154,4 +121,16 @@ The length of `w` must be `nlp.meta.nvar`.
```@example FluxNLPModel
g = similar(w)
NLPModels.grad!(nlp, w, g)
```

## Train a neural network with JSOSolvers.R2

```@example FluxNLPModel
max_time = 60.0 # run for at most 1 minute
callback = (nlp, solver, stats) -> FluxNLPModels.minibatch_next_train!(nlp)

solver_stats = R2(nlp; callback, max_time)
test_accuracy = FluxNLPModels.accuracy(nlp) # check the accuracy
```
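
## Multi-precision models (sketch)

The multi-precision support introduced in this pull request lets the `FluxNLPModel` constructor accept a vector of chains that share one architecture but use different element types; `obj` and `grad!` then dispatch to the chain whose precision matches `eltype(w)`. A rough sketch of how this could look, assuming the constructor's consistency checks accept the Float32 MNIST data alongside a Float64 copy of the model (`f64` is Flux's element-type converter):

```julia
model32 = build_model()   # Float32 weights (Flux's default)
model64 = f64(model32)    # the same architecture converted to Float64

nlp_mp = FluxNLPModel([model32, model64], train_loader, test_loader; loss_f = loss)

w = nlp_mp.w                               # flat weights (Float64 here, taken from the last chain)
f_hi = NLPModels.obj(nlp_mp, w)            # evaluated with the Float64 chain
f_lo = NLPModels.obj(nlp_mp, Float32.(w))  # evaluated with the Float32 chain
```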
42 changes: 31 additions & 11 deletions src/FluxNLPModels.jl
@@ -29,44 +29,63 @@ A FluxNLPModel has fields
"""
mutable struct FluxNLPModel{T, S, C <: Chain, F <: Function} <: AbstractFluxNLPModel{T, S}
  meta::NLPModelMeta{T, S}
  chain::C
  chain::Vector{C}
  counters::Counters
  loss_f::F
  size_minibatch::Int #TODO remove this
  training_minibatch_iterator #TODO remove this, right now we pass the data
  test_minibatch_iterator #TODO remove this
  size_minibatch::Int
  training_minibatch_iterator
  test_minibatch_iterator
  current_training_minibatch
  current_test_minibatch
  rebuild # this is used to create the rebuild of flat function
  rebuild # rebuild functions from Flux.destructure (one per chain)
  current_training_minibatch_status
  current_test_minibatch_status
  w::S
  Types::Vector{DataType}
end

"""
FluxNLPModel(chain_ANN data_train=MLDatasets.MNIST.traindata(Float32), data_test=MLDatasets.MNIST.testdata(Float32); size_minibatch=100)
FluxNLPModel(chain_ANN, data_train=MLDatasets.MNIST.traindata(Float32), data_test=MLDatasets.MNIST.testdata(Float32); size_minibatch=100)

Build a `FluxNLPModel` from the neural network represented by `chain_ANN`.
`chain_ANN` is a neural network built with [Flux.jl](https://fluxml.ai/); see the Flux.jl documentation for more details.
The other data required are: an iterator over the training dataset `data_train`, an iterator over the test dataset `data_test` and the size of the minibatch `size_minibatch`.
Suppose `(xtrn,ytrn) = Fluxnlp.data_train`
"""

function FluxNLPModel(
  chain_ANN::C,
  data_train,
  data_test;
  kwargs...
) where {C <: Chain}
  return FluxNLPModel([chain_ANN], data_train, data_test; kwargs...)
end

function FluxNLPModel(
  chain_ANN::T,
  chain_ANN::Vector{T},
  data_train,
  data_test;
  current_training_minibatch = first(data_train),
  current_test_minibatch = first(data_test),
  current_training_minibatch = [],
  current_test_minibatch = [],
  size_minibatch::Int = 100,
  loss_f::F = Flux.mse, #Flux.crossentropy,
) where {T <: Chain, F <: Function}
  x0, rebuild = Flux.destructure(chain_ANN)
  d = Flux.destructure.(chain_ANN)
  rebuild = [del[2] for del in d]
  x0 = d[end][1]
  Types = eltype.([del[1] for del in d])
  n = length(x0)
  meta = NLPModelMeta(n, x0 = x0)
  if (isempty(data_train) || isempty(data_test))
    error("train or test data is empty")
  end

  if (isempty(current_training_minibatch) || isempty(current_test_minibatch))
    current_training_minibatch = first(data_train)
    current_test_minibatch = first(data_test)
  end
  test_types_consistency(Types, data_train, data_test)
  test_devices_consistency(chain_ANN, data_train, data_test)
  return FluxNLPModel(
    meta,
    chain_ANN,
@@ -81,6 +100,7 @@ function FluxNLPModel(
    nothing,
    nothing,
    x0,
    Types,
  )
end

64 changes: 39 additions & 25 deletions src/FluxNLPModels_methods.jl
@@ -1,58 +1,74 @@
"""
f = obj(nlp, x)
f = obj(nlp, w)

Evaluate `f(x)`, the objective function of `nlp` at `x`.
Evaluate `f(w)`, the objective function of `nlp` at `w`.

# Arguments
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct
- `w::AbstractVector{T}`: is the vector of weights/variables;
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct;
- `w::AbstractVector{T}`: is the vector of weights/variables.

# Output
- `f_w`: the new objective function
- `f_w`: the objective value at `w`.

"""
function NLPModels.obj(nlp::AbstractFluxNLPModel{T, S}, w::AbstractVector{T}) where {T, S}
function NLPModels.obj(nlp::AbstractFluxNLPModel{T, S}, w::AbstractVector) where {T, S}
  increment!(nlp, :neval_obj)
  set_vars!(nlp, w)
  x, y = nlp.current_training_minibatch
  return nlp.loss_f(nlp.chain(x), y)
  type_ind = find_type_index(nlp, w)
  return nlp.loss_f(nlp.chain[type_ind](x), y)
end

"""
g = grad!(nlp, x, g)
g = grad!(nlp, w, g)

Evaluate `∇f(w)`, the gradient of the objective function at `w` in place.

Evaluate `∇f(x)`, the gradient of the objective function at `x` in place.
# Arguments
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct;
- `w::AbstractVector{T}`: is the vector of weights/variables;
-`g::AbstractVector{T}`: the gradient vector
- `g::AbstractVector{T}`: the gradient vector.

# Output
- `g`: the gradient at point x
- `g`: the gradient at point `w`.

"""
function NLPModels.grad!(
  nlp::AbstractFluxNLPModel{T, S},
  w::AbstractVector{T},
  g::AbstractVector{T},
  w::AbstractVector,
  g::AbstractVector,
) where {T, S}
  @lencheck nlp.meta.nvar w g
  increment!(nlp, :neval_grad)
  x, y = nlp.current_training_minibatch
  g .= gradient(w_g -> local_loss(nlp, x, y, w_g), w)[1]
  #check_weights_data_type(w,x)
  increment!(nlp, :neval_grad)
  type_ind = find_type_index(nlp, w)
  nlp.chain[type_ind] = nlp.rebuild[type_ind](w)

  g .= gradient(w_g -> local_loss(nlp, nlp.rebuild[type_ind], x, y, w_g), w)[1]
  return g
end

function NLPModels.grad(
  nlp::AbstractFluxNLPModel{T, S},
  w::AbstractVector,
) where {T, S}
  g = similar(w)
  return grad!(nlp, w, g)
end

"""
objgrad!(nlp, x, g)
objgrad!(nlp, w, g)

Evaluate both `f(x)`, the objective function of `nlp` at `x` and `∇f(x)`, the gradient of the objective function at `x` in place.
Evaluate both `f(w)`, the objective function of `nlp` at `w`, and `∇f(w)`, the gradient of the objective function at `w` in place.

# Arguments
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct;
- `w::AbstractVector{T}`: is the vector of weights/variables;
-`g::AbstractVector{T}`: the gradient vector
- `g::AbstractVector{T}`: the gradient vector.

# Output
- `f_w`, `g`: the new objective function, and the gradient at point x
- `f_w`, `g`: the objective value and the gradient at point `w`.

"""
function NLPModels.objgrad!(
@@ -61,14 +77,12 @@ function NLPModels.objgrad!(
  g::AbstractVector{T},
) where {T, S}
  @lencheck nlp.meta.nvar w g
  #both updates
  increment!(nlp, :neval_obj)
  increment!(nlp, :neval_grad)
  set_vars!(nlp, w)

  x, y = nlp.current_training_minibatch
  f_w = nlp.loss_f(nlp.chain(x), y)
  g .= gradient(w_g -> local_loss(nlp, x, y, w_g), w)[1]
  f_w = obj(nlp, w)
  grad!(nlp, w, g)

  return f_w, g
end