Automating neural network configuration with Keras Tuner
2020-06-09
deep-learning
frameworks
deep-neural-network
hyperparameter-tuning
hyperparameters
keras
keras-tuner
training-process

Machine learning has been around for many decades. Starting with the Rosenblatt Perceptron in the 1950s, followed by Multilayer Perceptrons and a variety of other machine learning techniques like Support Vector Machines, we arrived in the age of deep neural networks in 2012.

In the last few years, we have seen an explosion of machine learning research: a wide variety of neural network architectures has been invented and published, and the same goes for research into tuning neural networks - i.e., finding what set of hyperparameters works best given a certain problem scenario. That's why training a neural network is often considered to be more of an art than a science - intuition built through experience often guides the deep learning engineer in picking the right configuration for their model.

However, I do believe that this is going to end. Not deep learning itself, but the need for so much knowledge in order to train a deep neural network successfully. In fact, training ML models is being commoditized... and in today's blog, we'll cover one of the ways in which this is currently happening, namely, with the Keras Tuner. Keras Tuner is a library which allows deep learning engineers to define neural networks with the Keras framework, define a search space for both model parameters (i.e. architecture) and model hyperparameters (i.e. configuration options), and search for the best configuration before training the final model.

We'll first cover the supervised machine learning process and illustrate hyperparameter tuning and its difficulties in more detail. Subsequently, we'll provide some arguments as to why automating hyperparameter tuning can lead to better end results in possibly less time. Then, we introduce the Keras Tuner, and close off with a basic example so that you can get basic experience. In another blog post, we'll cover the Keras Tuner building blocks, which will help you gain a deeper understanding of automated hyperparameter tuning.

Update 08/Dec/2020: added references to PCA article.


[toc]


Training neural networks: what is (hyper)parameter tuning?

Let's take a step back. Before we can understand automated parameter and hyperparameter tuning, we must first understand what tuning is in the first place.

That's why we'll first look at the high-level supervised machine learning process that we use throughout this website to explain how training a neural network works.

Here it is:

In your machine learning workflow, you have selected or extracted features and targets for your model based on a priori analysis of your dataset - perhaps using dimensionality reduction techniques like PCA. Using those features, you will be able to train your machine learning model - visible in green. You do so iteratively:

  • Before training starts, you initialize the weights of your neural network in a random or almost-random way.
  • In the forward pass, you feed all your samples (often, in minibatches) to the machine learning model, which generates predictions.
  • With a loss function, the predictions are compared to the true targets, and a loss value emerges.
  • In the backward pass, the error is propagated backwards through the network, so that it becomes clear how much each neuron (and hence each weight) contributes to the error.
  • With an optimizer, such as gradient descent or an adaptive optimizer like Adam, the weights are changed a tiny bit.
  • A new iteration starts, where we expect that the model performs a little bit better. This goes on until the model has improved sufficiently for it to be used in practice.
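
To make these steps concrete, here is a minimal sketch of the same loop for a model with just two weights. It is not how Keras implements training: the toy dataset (targets following y = 3x + 1), the learning rate and the number of iterations are all assumptions chosen for illustration.

```python
import random

random.seed(42)

# Toy dataset (assumption for illustration): targets follow y = 3x + 1
samples = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]

# Initialize the weights in an (almost) random way
w = random.uniform(-0.5, 0.5)
b = 0.0
learning_rate = 0.1
losses = []

for iteration in range(200):
    # Forward pass: generate predictions for all samples
    predictions = [w * x + b for x, _ in samples]

    # Loss function: mean squared error between predictions and targets
    errors = [p - y for p, (_, y) in zip(predictions, samples)]
    loss = sum(e * e for e in errors) / len(samples)
    losses.append(loss)

    # Backward pass: gradient of the loss with respect to each weight
    grad_w = sum(2 * e * x for e, (x, _) in zip(errors, samples)) / len(samples)
    grad_b = sum(2 * e for e in errors) / len(samples)

    # Optimizer: plain gradient descent changes the weights a tiny bit
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f'w = {w:.3f}, b = {b:.3f}')
print(f'first loss = {losses[0]:.4f}, last loss = {losses[-1]:.8f}')
```

After enough iterations, the learned weights end up close to the values that generated the data, and the loss is much lower than at the start.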

Neural network architecture and configuration

If you look at how we build models, you'll generally see that doing so consists of three individual steps:

  1. Creating the model skeleton (in Keras, this happens through the Sequential API or the Functional API).
  2. Instantiating the model: using the skeleton and configuration options to create a trainable model.
  3. Fitting data to the model: starting the training process.

Tuning parameters in your neural network

In step (1), you add various layers of your neural network to the skeleton, such as the Convolutional Neural Network created here with Keras:

# Create the model
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(no_classes, activation='softmax'))

Here, the architectural choices you make (such as the number of filters for a Conv2D layer, kernel size, or the number of output nodes for your Dense layer) determine how many parameters your neural network has. Those parameters are the weights (and biases) of your neural network:

The parameters of a neural network are typically the weights of the connections. In this case, these parameters are learned during the training stage. So, the algorithm itself (and the input data) tunes these parameters.

Robin, at StackExchange
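
As a rough illustration of how architectural choices translate into a number of parameters, the sketch below counts the weights and biases of the ConvNet above by hand. It assumes a 28x28x1 input, 10 output classes, and the Keras Conv2D defaults of a stride of 1 without padding; the helper functions are hypothetical names introduced here and are not part of Keras.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    # Each unit has one weight per input plus one bias
    return (inputs + 1) * units


# Assumed input shape: 28x28 pixels, 1 channel
height, width, channels = 28, 28, 1
total = 0

for filters in (32, 64, 128):
    total += conv2d_params(3, 3, channels, filters)
    # A 3x3 kernel without padding shrinks each spatial dimension by 2
    height, width, channels = height - 2, width - 2, filters

flattened = height * width * channels
total += dense_params(flattened, 128)
total += dense_params(128, 10)  # assumed: 10 classes

print(f'Flattened features: {flattened}')
print(f'Trainable parameters: {total}')
```

Changing a single architectural choice, such as the number of filters in the last Conv2D layer, changes the size of the flattened output and with it the number of parameters that training has to learn.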

Tuning hyperparameters in your neural network

However, things don't end there. Rather, in step (2), you'll configure the model during instantiation by setting a wide range of configuration options. Those options include, but are not limited to:

  • The optimizer that is used during training: e.g., whether you are using Gradient Descent or an adaptive optimizer like Adam.
  • The learning rate that is used during optimization: i.e., how large a step the optimizer takes when it updates the weights based on the error contribution that was found.
  • The batch size: i.e., the number of samples that is fed forward through the model before the weights are updated.
  • The number of epochs (i.e., full passes over the training data) that will be used for training the neural network.

Here's why they are called _hyper_parameters:

The hyper parameters are typically the learning rate, the batch size or the number of epochs. The are so called "hyper" because they influence how your parameters will be learned. You optimize these hyper parameters as you want (depends on your possibilities): grid search, random search, by hand, using visualisations… The validation stage help you to both know if your parameters have been learned enough and know if your hyper parameters are good.

Robin, at StackExchange

As Robin suggests, hyperparameters can be selected (and optimized) in multiple ways. The easiest way of doing so is by hand: you, as a deep learning engineer, select a set of hyperparameters that you will subsequently alter in an attempt to make the model better.
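
Tuning by hand can be sketched as follows. This is a toy stand-in, not a neural network: the function being minimized, the candidate learning rates and the number of steps are assumptions made for illustration. It does show how one hyperparameter decides whether the single parameter `w` is learned well, slowly, or not at all.

```python
def train(learning_rate, steps=20):
    """Minimize the toy loss (w - 2)^2 with gradient descent; return the final loss."""
    w = 0.0
    for _ in range(steps):
        gradient = 2 * (w - 2.0)
        w -= learning_rate * gradient
    return (w - 2.0) ** 2


# Candidate learning rates that an engineer might try one by one
candidates = [1.1, 0.4, 0.01]
results = {lr: train(lr) for lr in candidates}

for lr, final_loss in results.items():
    print(f'learning rate {lr}: final loss {final_loss:.6g}')

best = min(results, key=results.get)
print(f'Best learning rate: {best}')
```

Here the largest learning rate overshoots and diverges, the smallest one converges too slowly within the step budget, and the one in between wins.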

However, can't we do this in a better way when training a Keras model?


Automating (hyper)parameter tuning for faster & better experimentation: introducing the Keras Tuner

As you might have expected: yes, we can! :) Let's introduce Keras Tuner to the scene. In true engineering fashion, the description of what it does is really short but provides all the details:

A hyperparameter tuner for Keras, specifically for tf.keras with TensorFlow 2.0.

Keras-tuner on GitHub

If you already want to look around, you could visit their website, and if not, let's take a look at what it does.

Automatically tuning (hyper)parameters of your Keras model through search spaces

Keras Tuner can be used for automatically tuning the parameters and hyperparameters of your Keras model. It does so by means of a search space. If you are used to a bit of mathematics, you are well aware of what a space represents. If not, you can likely still imagine what we mean when we talk about a two-dimensional or a three-dimensional space - and that's why we use such a space as an example.

Indeed, in the case of a 2D space - where the axes represent e.g. the hyperparameter learning rate and the parameter (or, more strictly, contributing factor to the number of parameters) number of layers, you can visualize the space as follows:

Here, all the intersections between the two axes (dimensions) are possible combinations of hyperparameters that can be selected for the model. For example, learning rate [latex]LR[/latex] and number of layers [latex]N[/latex] can be [latex](LR = 10^{-3}, N = 4)[/latex], but also [latex](LR = 10^{-2}, N = 2)[/latex] is possible, and so on. Here, we have two dimensions (which benefits visualization), but the more tunable options you add to your model, the more dimensions will be added to your search space.
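
A small sketch of such a two-dimensional space is given below. The candidate values are assumptions picked for illustration; the point is that the space is the set of all combinations, and that every extra tunable option multiplies its size.

```python
from itertools import product

# Two dimensions: learning rate and number of layers (illustrative values)
learning_rates = [1e-2, 1e-3, 1e-4]
layer_counts = [2, 3, 4]

# Every intersection of the two axes is one candidate configuration
search_space = list(product(learning_rates, layer_counts))
print(f'2D search space: {len(search_space)} combinations')
for lr, n in search_space:
    print(f'  LR = {lr}, N = {n}')

# Adding a third tunable option adds a dimension and multiplies the size
batch_sizes = [32, 64]
larger_space = list(product(learning_rates, layer_counts, batch_sizes))
print(f'3D search space: {len(larger_space)} combinations')
```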

Hopefully, you now understand how you construct a search space yourself when you want Keras Tuner to look for the best set of hyperparameters and parameters for your neural network.

You can use a wide range of HyperParameters building block styles for creating your search space:

  • Boolean values, which are set to true or false
  • Choice values, which represent an array of choices from which one value is chosen for a set of hyperparameters.
  • Fixed values, which aren't tunable but rather are fixed as they are.
  • Float values, which represent floating-point values (such as the learning rate above).
  • Integer values, which represent integer values to be tuned (such as the number of layers above).

Although the choice values and float/integer values look a lot like each other, they are different - in the sense that you can specify a range in the latter. However, that's too much detail for now - we will cover all the tunable HyperParameters in that different blog post we already mentioned before. At this point, it's important that you understand that using Keras Tuner will allow you to construct a search space by means of the building blocks mentioned before.

Putting bounds to your search space

And it's also important that you understand that it does so within constraints set by the user. That is, searching the hyperparameter space cannot go on indefinitely. Keras Tuner allows you to constrain searching: by setting a maximum number of trials, you can tell the tuner to cut off tuning after some time.

There's one thing missing, still. It's nice that we have a search space, but how exactly does Keras Tuner perform the search operation?

Applying various search strategies

By means of a search strategy!

It's as if you've lost something, and there are multiple approaches you can take to find it again. As with anything, there are many ways in which you can do a particular thing... and the same is true for searching through your hyperparameter space :)

We'll cover the various search strategies in more detail in that other blog post that we've mentioned. Here's a brief overview of the search strategies that are supported by Keras Tuner:

  • Random search: well, this one is pretty easy. For every dimension in your search space, this algorithm will select a random value, train the model, and report the results.
  • Bayesian optimization: this one views hyperparameter tuning as the optimization of a black-box function, and builds a probabilistic model of the results of past trials in order to decide which combination to try next.
  • Hyperband: this one attempts to reduce the total tuning time by training many configurations only briefly, then only taking the best of them for longer training, in a competition-style fashion.
  • Sklearn: allowing you to tune hyperparameters for Scikit-learn models as well, using cross-validated hyperparameter search.
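
The simplest of these strategies, random search bounded by a maximum number of trials, can be sketched in a few lines. This is not Keras Tuner's implementation: the `random_search` and `toy_objective` functions are hypothetical, and the objective is a made-up formula that stands in for "train the model and report validation accuracy".

```python
import math
import random


def toy_objective(config):
    # Stand-in for training a model; this made-up score peaks at LR = 1e-3, N = 3
    lr_penalty = abs(math.log10(config['learning_rate']) + 3) * 0.05
    layer_penalty = abs(config['num_layers'] - 3) * 0.02
    return 0.99 - lr_penalty - layer_penalty


def random_search(search_space, objective, max_trials, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float('-inf')
    for _ in range(max_trials):
        # For every dimension in the search space, select a random value
        config = {name: rng.choice(values) for name, values in search_space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


search_space = {
    'learning_rate': [1e-2, 1e-3, 1e-4],
    'num_layers': [2, 3, 4],
}

best_config, best_score = random_search(search_space, toy_objective, max_trials=5)
print(best_config, round(best_score, 3))
```

The `max_trials` argument is what bounds the search: the loop stops after that many sampled configurations, whether or not the best one has been found.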

A basic example of using Keras Tuner

Now let's take a look at using Keras Tuner for optimizing your Keras model. We will be building a simple ConvNet, as we have seen in the Conv2D tutorial. We'll subsequently tune its hyperparameters with Keras Tuner for a limited number of epochs, and finally train the best model fully. We'll keep it simple: we're only going to construct a one-dimensional search space based on the learning rate for the Adam optimizer.

Make sure that Keras Tuner is installed by executing pip install -U keras-tuner first in your machine learning environment :)

Imports, model configuration, and loading the data

Open up your IDE and create a file, e.g. called tuning.py. Here, you're going to write down your code. We'll start with the imports (such as tensorflow.keras and kerastuner), define the model configuration options and load the data. If you have no experience in doing so, I recommend that you first read the Conv2D post, as I explain these things there in more detail. Here's the code that you'll add first:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from kerastuner.tuners import RandomSearch

# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 28, 28, 1
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
validation_split = 0.2
verbosity = 1

# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()

# Reshape data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)

# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Scale data
input_train = input_train / 255
input_test = input_test / 255

In brief, what it does:

  • Loading all the modules and libraries that you'll be using today.
  • Defining all the hyperparameters that we will not be tuning today, and other configuration options.
  • Loading the MNIST dataset, and reshaping it into Conv2D-compatible format.
  • Casting the data into float32 format, which allows GPU owners to train their models faster.
  • Scaling the data into the [latex][0, 1][/latex] range, which benefits the training process.
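
If you want to see what the reshaping, casting and scaling steps do without downloading MNIST, the sketch below applies them to a handful of random images. The random array is a hypothetical stand-in for the real dataset and only mimics its shape and value range.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 4 grayscale images with pixel values 0-255
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
print('Before:', images.shape, images.dtype)

# Add the channels dimension that Conv2D expects
images = images.reshape(images.shape[0], 28, 28, 1)

# Parse numbers as floats
images = images.astype('float32')

# Scale data into the [0, 1] range
images = images / 255

print('After: ', images.shape, images.dtype)
print('Value range:', images.min(), 'to', images.max())
```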

Defining the model-building function

Keras Tuner allows you to perform your experiments in two ways. The first, and more scalable, approach is a HyperModel class, but we don't use it today - as Keras Tuner itself introduces people to automated hyperparameter tuning via model-building functions.

Those functions are nothing more than a Python def where you create the model skeleton and compile it, as you would do usually. However, here, you also construct your search space - that space we explained above. For example, I make the learning rate hyperparameter tunable by specifying it as follows: hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4]).

Here's the code for the model-building function. If you've used Keras before, you instantly recognize what it does!

# MODEL BUILDING FUNCTION
def build_model(hp):
  # Create the model
  model = Sequential()
  model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
  model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
  model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dense(no_classes, activation='softmax'))

  # Display a model summary
  model.summary()

  # Compile the model
  model.compile(loss=loss_function,
                optimizer=Adam(
                  hp.Choice('learning_rate',
                            values=[1e-2, 1e-3, 1e-4])),
                metrics=['accuracy'])
  
  # Return the model
  return model

Performing tuning

Now, it's time to perform tuning. As we've constructed our search space, we must first define our search strategy - and it will be RandomSearch today:

# Perform tuning
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='tuning_dir',
    project_name='machinecurve_example')

We pass the model-building function, which contains both our model and our search space. Our goal is to maximize validation accuracy (Keras Tuner automatically infers whether an objective should be maximized or minimized based on its name). We also tell the tuner that it should perform at most 5 trials, and that it should perform 3 executions per trial. The latter ensures that it's not simply variance that causes a hyperparameter to be 'best', as more instances of better performance tend to suggest that performance is actually better. The directory and project_name arguments are set so that checkpoints of the tuning operations are saved.
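
Why multiple executions per trial help can be illustrated with a small simulation. It assumes that the scores of the executions within a trial are averaged, and it replaces real training runs with a hypothetical `noisy_val_accuracy` function that returns a fixed accuracy plus random noise.

```python
import random
import statistics

rng = random.Random(1)


def noisy_val_accuracy(true_accuracy=0.98, noise=0.01):
    # Stand-in for one training run: same configuration, different random outcome
    return rng.gauss(true_accuracy, noise)


# Score of a trial with 1 execution, repeated many times
single_runs = [noisy_val_accuracy() for _ in range(1000)]

# Score of a trial with 3 executions whose results are averaged
averaged_runs = [
    statistics.mean(noisy_val_accuracy() for _ in range(3))
    for _ in range(1000)
]

spread_single = statistics.stdev(single_runs)
spread_averaged = statistics.stdev(averaged_runs)
print(f'Spread with 1 execution per trial:  {spread_single:.4f}')
print(f'Spread with 3 executions per trial: {spread_averaged:.4f}')
```

The averaged scores fluctuate less, so a configuration is less likely to come out on top by luck alone. The price is that every trial now costs three training runs instead of one.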

Now that we have configured our search strategy, it's time to print a summary of it and actually perform the search operation:

# Display search space summary
tuner.search_space_summary()

# Perform random search
tuner.search(input_train, target_train,
             epochs=5,
             validation_split=validation_split)

Here, we instruct Keras Tuner to perform hyperparameter tuning with our training set, for 5 epochs per trial, and to make sure to make a validation split (of 20%, in our case, given how we have configured our model).
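
To get a feeling for what this configuration costs, the arithmetic below works out the split sizes and an upper bound on the amount of training during the search. It assumes the 60,000-sample MNIST training set and the Keras default batch size of 32, which applies because no batch_size is passed to the search call above.

```python
import math

num_samples = 60000        # size of the MNIST training set
validation_split = 0.2
search_batch_size = 32     # Keras default when no batch_size is given
epochs_per_execution = 5
max_trials = 5
executions_per_trial = 3

# Keras holds out the last fraction of the samples for validation
num_val = round(num_samples * validation_split)
num_train = num_samples - num_val
steps_per_epoch = math.ceil(num_train / search_batch_size)

# Upper bound: every trial is executed in full
max_training_runs = max_trials * executions_per_trial
max_epochs = max_training_runs * epochs_per_execution

print(f'Training samples:   {num_train}')
print(f'Validation samples: {num_val}')
print(f'Steps per epoch:    {steps_per_epoch}')
print(f'At most {max_training_runs} training runs, {max_epochs} epochs in total')
```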

Fully train the best model

Once the search is complete, you can get the best model, and train it fully as per your configuration:

# Get best model
models = tuner.get_best_models(num_models=1)
best_model = models[0]

# Fit data to model
history = best_model.fit(input_train, target_train,
            batch_size=batch_size,
            epochs=no_epochs,
            verbose=verbosity,
            validation_split=validation_split)

# Generate generalization metrics
score = best_model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

That's it! :) You should now have a fully working Keras Tuner based hyperparameter tuner. If you run python tuning.py, of course while having all the dependencies installed onto your system, the tuning and eventually the training process should begin.

Full model code

If you wish to obtain the full model code, that's of course also possible. Here you go:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras.losses import sparse_categorical_crossentropy
from tensorflow.keras.optimizers import Adam
from kerastuner.tuners import RandomSearch

# Model configuration
batch_size = 50
img_width, img_height, img_num_channels = 28, 28, 1
loss_function = sparse_categorical_crossentropy
no_classes = 10
no_epochs = 25
validation_split = 0.2
verbosity = 1

# Load MNIST data
(input_train, target_train), (input_test, target_test) = mnist.load_data()

# Reshape data
input_train = input_train.reshape(input_train.shape[0], img_width, img_height, 1)
input_test = input_test.reshape(input_test.shape[0], img_width, img_height, 1)

# Determine shape of the data
input_shape = (img_width, img_height, img_num_channels)

# Parse numbers as floats
input_train = input_train.astype('float32')
input_test = input_test.astype('float32')

# Scale data
input_train = input_train / 255
input_test = input_test / 255

# MODEL BUILDING FUNCTION
def build_model(hp):
  # Create the model
  model = Sequential()
  model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
  model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
  model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
  model.add(Flatten())
  model.add(Dense(128, activation='relu'))
  model.add(Dense(no_classes, activation='softmax'))

  # Display a model summary
  model.summary()

  # Compile the model
  model.compile(loss=loss_function,
                optimizer=Adam(
                  hp.Choice('learning_rate',
                            values=[1e-2, 1e-3, 1e-4])),
                metrics=['accuracy'])
  
  # Return the model
  return model

# Perform tuning
tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=5,
    executions_per_trial=3,
    directory='tuning_dir',
    project_name='machinecurve_example')

# Display search space summary
tuner.search_space_summary()

# Perform random search
tuner.search(input_train, target_train,
             epochs=5,
             validation_split=validation_split)

# Get best model
models = tuner.get_best_models(num_models=1)
best_model = models[0]

# Fit data to model
history = best_model.fit(input_train, target_train,
            batch_size=batch_size,
            epochs=no_epochs,
            verbose=verbosity,
            validation_split=validation_split)

# Generate generalization metrics
score = best_model.evaluate(input_test, target_test, verbose=0)
print(f'Test loss: {score[0]} / Test accuracy: {score[1]}')

Summary

In this blog post, you've been introduced to automated tuning of your neural network parameters and hyperparameters. Over the next years, this will become an increasingly important aspect of machine learning, in my opinion - because why leave to humans what computers could do better? Maybe machine learning configuration will even become commoditized because of such progress! The benefit of having read this post (and of possibly deepening your understanding by performing some Google searches) is that you're now aware of this trend, and can steer your learning towards staying on top of the machine learning wave :)

What's more, you've also been able to get some practical experience with a code example using Keras Tuner. I hope you've learnt something today, and that it will help your machine learning endeavors :) If you have any questions, remarks, or other comments, please feel free to leave a comment in the comments section below. Thank you for reading MachineCurve today and happy engineering! 😎


References

Keras tuner. (n.d.). https://keras-team.github.io/keras-tuner/

Data Science Stack Exchange. (n.d.). Model parameters & hyper parameters of neural network & their tuning in training & validation stage. https://datascience.stackexchange.com/questions/17635/model-parameters-hyper-parameters-of-neural-network-their-tuning-in-training