Add multi-precision support #23

Open · wants to merge 23 commits into base: main
Changes from 20 commits
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "FluxNLPModels"
uuid = "31fab0eb-bb78-4d15-8993-a8083bba6d27"
authors = ["Farhad Rahbarnia <[email protected]>"]
version = "0.0.1"
version = "0.1.0"

[deps]
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
3 changes: 1 addition & 2 deletions README.md
@@ -4,9 +4,8 @@
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://JuliaSmoothOptimizers.github.io/FluxNLPModels.jl/dev)
[![Build Status](https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/workflows/CI/badge.svg)](https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/actions)
[![Codecov](https://codecov.io/gh/JuliaSmoothOptimizers/FluxNLPModels.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/JuliaSmoothOptimizers/FluxNLPModels.jl)
<!-- TODO check the links -->

This package serves as an NLPModels interface to the [Flux.jl](https://github.com/FluxML/Flux.jl) deep learning framework. It enables seamless integration between Flux's neural network architectures and NLPModels' optimization tools for natural language processing tasks.
This package serves as an NLPModels interface to the [Flux.jl](https://github.com/FluxML/Flux.jl) deep learning framework. It enables seamless integration between Flux's neural network architectures and NLPModels' optimization tools for non-linear programming (NLP) problems.
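
For illustration, a minimal sketch of the intended workflow (the chain, random data, and loader settings below are placeholders, not part of the package):

```julia
using Flux, FluxNLPModels, NLPModels

# A small classifier and random stand-in data
chain = Chain(Dense(784, 32, relu), Dense(32, 10))
x = rand(Float32, 784, 256)
y = Flux.onehotbatch(rand(0:9, 256), 0:9)
train_loader = Flux.DataLoader((x, y), batchsize = 64, shuffle = true)
test_loader = Flux.DataLoader((x, y), batchsize = 64)

# Wrap the Flux model as an NLPModel
nlp = FluxNLPModel(chain, train_loader, test_loader; loss_f = Flux.logitcrossentropy)

f = obj(nlp, nlp.w)   # objective (loss) value at the current weights
g = grad(nlp, nlp.w)  # gradient at the current weights
```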

## Installation

1 change: 0 additions & 1 deletion docs/make.jl
@@ -1,4 +1,3 @@
#TODO redo this section
using Documenter, FluxNLPModels

makedocs(
17 changes: 5 additions & 12 deletions docs/src/index.md
@@ -1,19 +1,18 @@
#TODO redo this section
# FluxNLPModels.jl

## Compatibility
Julia ≥ 1.6.

## How to install
TODO: this section needs work since our package is not yet register

This module can be installed with the following command:
```julia
# pkg> add FluxNLPModels
# pkg> test FluxNLPModels
pkg> add FluxNLPModels
```

## Synopsis
FluxNLPModels exposes neural network models as optimization problems conforming to the NLPModels.jl API. FluxNLPModels is an interface between [Flux.jl](https://github.com/FluxML/Flux.jl)'s classification neural networks and [NLPModels.jl](https://github.com/JuliaSmoothOptimizers/NLPModels.jl.git).

FluxNLPModels exposes neural network models as optimization problems conforming to the [NLPModels API](https://github.com/JuliaSmoothOptimizers/NLPModels.jl). FluxNLPModels is an interface between [Flux.jl](https://github.com/FluxML/Flux.jl)'s classification neural networks and [NLPModels.jl](https://github.com/JuliaSmoothOptimizers/NLPModels.jl).

A `FluxNLPModel` gives the user access to:
- The values of the neural network variables/weights `w`;
@@ -25,13 +24,7 @@ In addition, it provides tools to:
- Retrieve the current minibatch;
- Measure the neural network's loss at the current `w`.
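
For instance, a minimal sketch (assuming `nlp` is a `FluxNLPModel` built as in the tutorial):
```julia
w = nlp.w                                  # current weights
f = NLPModels.obj(nlp, w)                  # loss on the current training minibatch
FluxNLPModels.minibatch_next_train!(nlp)   # move to the next training minibatch
```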

## How to use
Check the tutorials
<!-- Check the [tutorial](https://juliasmoothoptimizers.github.io/FluxNLPModels.jl/stable/). -->

# Bug reports and discussions

If you think you found a bug, feel free to open an [issue]<!--(https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/issues). --> TODO: add repo link
Focused suggestions and requests can also be opened as issues. Before opening a pull request, please start an issue or a discussion on the topic.
If you encounter any bugs or have suggestions for improvement, please open an [issue](https://github.com/JuliaSmoothOptimizers/FluxNLPModels.jl/issues). For general questions or discussions related to this repository and the [JuliaSmoothOptimizers](https://github.com/JuliaSmoothOptimizers) organization, feel free to start a discussion [here](https://github.com/JuliaSmoothOptimizers/Organization/discussions).

If you have a question that is not suited for a bug report, feel free to start a discussion [here](#TODO). This forum is for general discussion about this repository and the [JuliaSmoothOptimizers](https://github.com/JuliaSmoothOptimizers). Questions about any of our packages are welcome.
65 changes: 22 additions & 43 deletions docs/src/tutorial.md
@@ -1,4 +1,5 @@
# FluxNLPModels.jl Tutorial

## Setting up
This step-by-step example assumes prior knowledge of [Julia](https://julialang.org/) and [Flux.jl](https://github.com/FluxML/Flux.jl).
See the [Julia tutorial](https://julialang.org/learning/) and the [Flux.jl tutorial](https://fluxml.ai/Flux.jl/stable/models/quickstart/#man-quickstart) for more details.
@@ -39,23 +40,13 @@ using JSOSolvers
First, an NN model needs to be defined in Flux.jl.
Our model is very simple: it consists of one "hidden layer" with 32 "neurons", each connected to every input pixel. Each hidden neuron applies a ReLU nonlinearity and is connected to every "neuron" in the output layer. The output layer produces raw scores (logits); the softmax applied inside the logitcrossentropy loss used below turns them into probabilities, i.e., positive numbers that add up to 1.

We have two ways of defining the models:

1. **Direct Definition**: You can directly define the model in your code, specifying the layers and their connections using Flux's syntax. This approach allows for more flexibility and customization.
```@example FluxNLPModel
model = Flux.Chain(Dense(28^2=> 32, relu), Dense(32=>10))
```

2. **Method-Based Definition**: Alternatively, you can create a method that returns the model. This method can encapsulate the specific architecture and parameters of the model, making it easier to reuse and manage. It provides a convenient way to define and initialize the model when needed.
```@example FluxNLPModel
function build_model(; imgsize = (28, 28, 1), nclasses = 10)
  return Chain(Dense(prod(imgsize), 32, relu), Dense(32, nclasses))
end
```
One can create a method that returns the model. This method can encapsulate the specific architecture and parameters of the model, making it easier to reuse and manage. It provides a convenient way to define and initialize the model when needed.



Both approaches have their advantages, and you can choose the one that suits your needs and coding style.
```@example FluxNLPModel
function build_model(; imgsize = (28, 28, 1), nclasses = 10)
  return Chain(Dense(prod(imgsize), 32, relu), Dense(32, nclasses))
end
```

### Loss function

@@ -65,35 +56,15 @@ We can define any loss function that we need; here we use Flux's built-in logitcrossentropy
const loss = Flux.logitcrossentropy
```

We also define a loss function `loss_and_accuracy`.
```@example FluxNLPModel
function loss_and_accuracy(data_loader, model, device)
  acc = 0
  ls = 0.0f0
  num = 0
  for (x, y) in data_loader
    x, y = device(x), device(y)
    ŷ = model(x)
    ls += loss(ŷ, y, agg = sum)
    acc += sum(onecold(ŷ) .== onecold(y)) ## Decode the output of the model
    num += size(x)[end]
  end
  return ls / num, acc / num
end
```


### Load datasets and define minibatch
In this section, we will cover the process of loading datasets and defining minibatches for training your model using Flux. Loading and preprocessing data is an essential step in machine learning, as it allows you to train your model on real-world examples.

We will specifically focus on loading the MNIST dataset. We will divide the data into training and testing sets, ensuring that we have separate data for model training and evaluation.

Additionally, we will define minibatches, which are subsets of the dataset that are used during the training process. Minibatches enable efficient training by processing a small batch of examples at a time, instead of the entire dataset. This technique helps in managing memory resources and improving convergence speed.



```@example FluxNLPModel
function getdata(batchsize)
function getdata(bs)
  ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"

  # Loading Dataset
@@ -108,14 +79,13 @@ function getdata(batchsize)
  ytrain, ytest = onehotbatch(ytrain, 0:9), onehotbatch(ytest, 0:9)

  # Create DataLoaders (mini-batch iterators)
  train_loader = DataLoader((xtrain, ytrain), batchsize = batchsize, shuffle = true)
  test_loader = DataLoader((xtest, ytest), batchsize = batchsize)
  train_loader = DataLoader((xtrain, ytrain), batchsize = bs, shuffle = true)
  test_loader = DataLoader((xtest, ytest), batchsize = bs)

  return train_loader, test_loader
end
```


### Transferring to FluxNLPModels

```@example FluxNLPModel
@@ -129,9 +99,6 @@ end
nlp = FluxNLPModel(model, train_loader, test_loader; loss_f = loss)
```




## Tools associated with a FluxNLPModel
The problem dimension `n`, where `w` ∈ ℝⁿ:
```@example FluxNLPModel
@@ -154,4 +121,16 @@ The length of `w` must be `nlp.meta.nvar`.
```@example FluxNLPModel
g = similar(w)
NLPModels.grad!(nlp, w, g)
```

## Train a neural network with JSOSolvers.R2

```@example FluxNLPModel
max_time = 60.0 # run for at most 1 minute
callback = (nlp, solver, stats) -> FluxNLPModels.minibatch_next_train!(nlp)

solver_stats = R2(nlp; callback, max_time)
test_accuracy = FluxNLPModels.accuracy(nlp) # check the accuracy
```
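
## Multi-precision models (sketch)

The multi-precision support introduced in this pull request lets the `FluxNLPModel` constructor accept a vector of chains that share one architecture but use different element types; `obj` and `grad!` then dispatch to the chain whose precision matches `eltype(w)`. A rough sketch of how this could look, assuming the constructor's consistency checks accept the Float32 MNIST data alongside a Float64 copy of the model (`f64` is Flux's element-type converter):

```julia
model32 = build_model()   # Float32 weights (Flux's default)
model64 = f64(model32)    # the same architecture converted to Float64

nlp_mp = FluxNLPModel([model32, model64], train_loader, test_loader; loss_f = loss)

w = nlp_mp.w                               # flat weights (Float64 here, taken from the last chain)
f_hi = NLPModels.obj(nlp_mp, w)            # evaluated with the Float64 chain
f_lo = NLPModels.obj(nlp_mp, Float32.(w))  # evaluated with the Float32 chain
```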
42 changes: 31 additions & 11 deletions src/FluxNLPModels.jl
@@ -29,44 +29,63 @@ A FluxNLPModel has fields
"""
mutable struct FluxNLPModel{T, S, C <: Chain, F <: Function} <: AbstractFluxNLPModel{T, S}
  meta::NLPModelMeta{T, S}
  chain::C
  chain::Vector{C}
  counters::Counters
  loss_f::F
  size_minibatch::Int #TODO remove this
  training_minibatch_iterator #TODO remove this, right now we pass the data
  test_minibatch_iterator #TODO remove this
  size_minibatch::Int
  training_minibatch_iterator
  test_minibatch_iterator
  current_training_minibatch
  current_test_minibatch
  rebuild # this is used to create the rebuild of flat function
  rebuild # rebuild functions from Flux.destructure (one per chain)
  current_training_minibatch_status
  current_test_minibatch_status
  w::S
  Types::Vector{DataType}
end

"""
FluxNLPModel(chain_ANN data_train=MLDatasets.MNIST.traindata(Float32), data_test=MLDatasets.MNIST.testdata(Float32); size_minibatch=100)
FluxNLPModel(chain_ANN, data_train=MLDatasets.MNIST.traindata(Float32), data_test=MLDatasets.MNIST.testdata(Float32); size_minibatch=100)

Build a `FluxNLPModel` from the neural network represented by `chain_ANN`.
`chain_ANN` is a neural network built with [Flux.jl](https://fluxml.ai/); see the Flux.jl documentation for more details.
The other data required are: an iterator over the training dataset `data_train`, an iterator over the test dataset `data_test` and the size of the minibatch `size_minibatch`.
Suppose `(xtrn,ytrn) = Fluxnlp.data_train`
"""

function FluxNLPModel(
  chain_ANN::C,
  data_train,
  data_test;
  kwargs...
) where {C <: Chain}
  return FluxNLPModel([chain_ANN], data_train, data_test; kwargs...)
end

function FluxNLPModel(
  chain_ANN::T,
  chain_ANN::Vector{T},
  data_train,
  data_test;
  current_training_minibatch = first(data_train),
  current_test_minibatch = first(data_test),
  current_training_minibatch = [],
  current_test_minibatch = [],
  size_minibatch::Int = 100,
  loss_f::F = Flux.mse, #Flux.crossentropy,
) where {T <: Chain, F <: Function}
  x0, rebuild = Flux.destructure(chain_ANN)
  d = Flux.destructure.(chain_ANN)
  rebuild = [del[2] for del in d]
  x0 = d[end][1]
  Types = eltype.([del[1] for del in d])
  n = length(x0)
  meta = NLPModelMeta(n, x0 = x0)
  if (isempty(data_train) || isempty(data_test))
    error("train or test data is empty")
  end

  if (isempty(current_training_minibatch) || isempty(current_test_minibatch))
    current_training_minibatch = first(data_train)
    current_test_minibatch = first(data_test)
  end
  test_types_consistency(Types, data_train, data_test)
  test_devices_consistency(chain_ANN, data_train, data_test)
  return FluxNLPModel(
    meta,
    chain_ANN,
@@ -81,6 +100,7 @@ function FluxNLPModel(
    nothing,
    nothing,
    x0,
    Types,
  )
end

64 changes: 39 additions & 25 deletions src/FluxNLPModels_methods.jl
@@ -1,58 +1,74 @@
"""
f = obj(nlp, x)
f = obj(nlp, w)

Evaluate `f(x)`, the objective function of `nlp` at `x`.
Evaluate `f(w)`, the objective function of `nlp` at `w`.

# Arguments
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct
- `w::AbstractVector{T}`: is the vector of weights/variables;
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct;
- `w::AbstractVector{T}`: is the vector of weights/variables.

# Output
- `f_w`: the new objective function
- `f_w`: the objective value at `w`.

"""
function NLPModels.obj(nlp::AbstractFluxNLPModel{T, S}, w::AbstractVector{T}) where {T, S}
function NLPModels.obj(nlp::AbstractFluxNLPModel{T, S}, w::AbstractVector) where {T, S}
  increment!(nlp, :neval_obj)
  set_vars!(nlp, w)
  x, y = nlp.current_training_minibatch
  return nlp.loss_f(nlp.chain(x), y)
  type_ind = find_type_index(nlp, w)
  return nlp.loss_f(nlp.chain[type_ind](x), y)
end

"""
g = grad!(nlp, x, g)
g = grad!(nlp, w, g)

Evaluate `∇f(w)`, the gradient of the objective function at `w` in place.

Evaluate `∇f(x)`, the gradient of the objective function at `x` in place.
# Arguments
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct;
- `w::AbstractVector{T}`: is the vector of weights/variables;
-`g::AbstractVector{T}`: the gradient vector
- `g::AbstractVector{T}`: the gradient vector.

# Output
- `g`: the gradient at point x
- `g`: the gradient at point `w`.

"""
function NLPModels.grad!(
  nlp::AbstractFluxNLPModel{T, S},
  w::AbstractVector{T},
  g::AbstractVector{T},
  w::AbstractVector,
  g::AbstractVector,
) where {T, S}
  @lencheck nlp.meta.nvar w g
  increment!(nlp, :neval_grad)
  x, y = nlp.current_training_minibatch
  g .= gradient(w_g -> local_loss(nlp, x, y, w_g), w)[1]
  #check_weights_data_type(w,x)
  increment!(nlp, :neval_grad)
  type_ind = find_type_index(nlp, w)
  nlp.chain[type_ind] = nlp.rebuild[type_ind](w)

  g .= gradient(w_g -> local_loss(nlp, nlp.rebuild[type_ind], x, y, w_g), w)[1]
  return g
end

function NLPModels.grad(
  nlp::AbstractFluxNLPModel{T, S},
  w::AbstractVector,
) where {T, S}
  g = similar(w)
  return grad!(nlp, w, g)
end

"""
objgrad!(nlp, x, g)
objgrad!(nlp, w, g)

Evaluate both `f(x)`, the objective function of `nlp` at `x` and `∇f(x)`, the gradient of the objective function at `x` in place.
Evaluate both `f(w)`, the objective function of `nlp` at `w`, and `∇f(w)`, the gradient of the objective function at `w` in place.

# Arguments
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct
- `nlp::AbstractFluxNLPModel{T, S}`: the FluxNLPModel data struct;
- `w::AbstractVector{T}`: is the vector of weights/variables;
-`g::AbstractVector{T}`: the gradient vector
- `g::AbstractVector{T}`: the gradient vector.

# Output
- `f_w`, `g`: the new objective function, and the gradient at point x
- `f_w`, `g`: the objective value and the gradient at point `w`.

"""
function NLPModels.objgrad!(
@@ -61,14 +77,12 @@ function NLPModels.objgrad!(
  g::AbstractVector{T},
) where {T, S}
  @lencheck nlp.meta.nvar w g
  #both updates
  increment!(nlp, :neval_obj)
  increment!(nlp, :neval_grad)
  set_vars!(nlp, w)

  x, y = nlp.current_training_minibatch
  f_w = nlp.loss_f(nlp.chain(x), y)
  g .= gradient(w_g -> local_loss(nlp, x, y, w_g), w)[1]
  f_w = obj(nlp, w)
  grad!(nlp, w, g)

  return f_w, g
end