A Detailed Beginner’s Guide to Keras Tuner

Timothy Lu
Published in Heartbeat
10 min read · Feb 23, 2023

Introduction

Have you ever wondered how to use Bayesian optimization with TensorFlow? Curious how to design your deep neural networks’ depth and shape? Well, here’s the guide for you.

An image of a flowing blue and purple mesh against a black background.
Image Courtesy of UrielSC

A big part of data science is tuning our models and improving upon them over time. But instead of just tuning specific hyperparameters, we can also decide how our network is shaped. In this article, I will go through some basic concepts of creating a neural network using TensorFlow and then explore how we might improve upon our model’s architecture using Keras Tuner.

Motivation

I was motivated to write this article because I found the current resources out there a bit nebulous. There is little discussion about the basics to help you even get started (like imports and other available options).

While we walk through a simple implementation of Keras Tuner on a small DNN, the following principles I’ve compiled can also be applied to the development of larger DNNs and CNNs. For example, we can test how many layers of convolution and pooling are optimal for a CNN, or we can optimize our design with different types and shapes of kernels. With DNNs, we can test varying depths, sizes, and other parameters.
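As a taste of where this can go, here is a minimal sketch of what tuning a small CNN might look like with the same tool. The builder function, layer names, and ranges here are purely illustrative, and hp.Int and hp.Choice are explained step by step later in this guide:

from tensorflow import keras
from tensorflow.keras import layers

def cnn_builder(hp):
    model = keras.Sequential()
    # Try 1-3 blocks of convolution + pooling
    for i in range(hp.Int('num_conv_blocks', 1, 3)):
        model.add(layers.Conv2D(
            filters=hp.Int('filters_' + str(i), min_value=16, max_value=64, step=16),
            kernel_size=hp.Choice('kernel_' + str(i), values=[3, 5]),
            activation='relu'))
        model.add(layers.MaxPooling2D())
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model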

Goal

Changing the shape, optimization, and overall design of our network will allow us to build a model that performs more accurately and efficiently. The shape of a given network impacts a lot of things; training time, convergence, and performance are just a few examples.

This guide will start at the very beginning and go through the following:

  • Importing the necessary packages to optimize your neural network;
  • Loading some test data;
  • Building out a rudimentary network with an early stopping callback;
  • Discussing the tuning process; and
  • Interpreting the results.

The goal, by the end of this article, is to give you a starting point into the wonderful world of Keras Tuner!

STEP 1: Imports!

I highly recommend having the following packages installed: pandas, NumPy, TensorFlow, keras_tuner, scikit-learn, and seaborn. We will first import these packages, quickly load in some data, and then start building a really simple network using the iris dataset. The reason for using the iris dataset is that it gives us a simple entry point into TensorFlow and allows us to focus on the tuner rather than on modeling the results.

# Data collection and transformation packages
import pandas as pd
import numpy as np

# TensorFlow and Keras Tuner
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.utils import to_categorical
import keras_tuner as kt

# sklearn for the dataset and train/test split
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

# Small visualization at the end
import seaborn as sns

STEP 2: Iris data set-up

First things first, we need to load our data from sklearn, create our train/test splits, and make sure our target is categorical.

If you aren’t already familiar with the iris dataset, it consists of four measurements (sepal length and width, petal length and width) for three different species of iris flower. The species are labeled as one of the following: [0, 1, 2]. Feel free to explore as you see fit to get a better understanding of the data.

After we’ve loaded and split the data, we need to encode our target variable. Because the targets are [0, 1, 2], our TensorFlow model would otherwise treat them as a “continuous” variable, which is not the intention. To account for this, we will use the Keras function to_categorical. This will allow our neural network to understand our labels as we intend them and create predictions accordingly. I am making a new variable here so that we can use np.argmax later to convert our predictions back to a class.
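To make that concrete, here is a tiny toy illustration of what to_categorical does and how np.argmax reverses it:

from tensorflow.keras.utils import to_categorical
import numpy as np

labels = np.array([0, 1, 2, 1])    # raw class labels
one_hot = to_categorical(labels)   # one row per sample, one column per class
print(one_hot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
print(np.argmax(one_hot, axis=1))  # back to [0 1 2 1]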

# Load iris data from sklearn
iris = datasets.load_iris()
X = iris.data    # X is an array of 4 features with 150 records
y = iris.target  # y is a list of 150 targets labeled either 0, 1, or 2

# Create our train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.33,
                                                    random_state=42)

# Make sure our targets are categorical for the model
y_train_cat = to_categorical(y_train)
y_test_cat = to_categorical(y_test)
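If you want to follow the earlier suggestion to explore the data, here is one optional way to peek at it using the pandas import from Step 1 (nothing later depends on this):

# Optional: a quick look at the features as a DataFrame
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = iris.target
print(iris_df.head())
print(iris_df['species'].value_counts())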

STEP 3: Building our first model

We will now build a simple one-layer neural network just to test performance.

# Create sequential model
model = keras.Sequential()

# Add an input layer; build it any which way
model.add(layers.Dense(5, activation='relu', input_shape=(4,)))

# Add an output layer
model.add(layers.Dense(3, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# Use early stopping to prevent overfitting once the change
# in validation loss is minimal
early_stop = EarlyStopping(monitor='val_loss', patience=1)
model.summary()

We can take a look at our model summary below:

An image of Python code output from calling model.summary() on a single-layer Keras/TensorFlow model.
Source: Image by author

Now we will fit the model and take a look at the results using sklearn’s classification report:

model.fit(X_train, y_train_cat, epochs=100,
          validation_data=(X_test, y_test_cat),
          callbacks=[early_stop],
          verbose=1)

# Using argmax to get the index of the highest value for each prediction
y_pred = np.argmax(model.predict(X_test), axis=1)
print(classification_report(y_test, y_pred))

The Python code output of Scikit-learn’s classification report; four columns including precision, recall, f1-score, and support.
Source: Image by author

That’s not a bad start at all! But can we do better? Let’s tune this model with Keras Tuner!

STEP 4: Building out and using the tuner

In this section we will simply talk about running the tuner. Afterwards, we’ll go through the code step-by-step and explain what each part is doing.

First, we define a function based on the features we wish to explore. We can define things like the number of dense layers to test, a range of sizes for each of those layers, a range of dropout rates, and a choice between different optimizers. I will show the function below and then explore what each of these parts is doing. After you write the function, you run the tuner code itself, which will tune based off of that model.

def model_builder(hp):
    # Tuning our model
    model = keras.Sequential()

    model.add(layers.Dense(units=hp.Int('dense-bot', min_value=3,
                                        max_value=5, step=1),
                           activation='relu'))

    for i in range(hp.Int('num_dense_layers', 1, 3)):
        model.add(layers.Dense(units=hp.Int('dense_' + str(i), min_value=3,
                                            max_value=5, step=1),
                               activation='relu'))
        model.add(layers.Dropout(hp.Choice('dropout_' + str(i),
                                           values=[0.0, 0.1, 0.2])))

    model.add(layers.Dense(3, activation='softmax'))

    hp_optimizer = hp.Choice('Optimizer', values=['Adam', 'SGD'])
    hp_learning_rate = hp.Choice('learning_rate',
                                 values=[1e-1, 1e-2, 1e-3])
    if hp_optimizer == 'Adam':
        optimizer = keras.optimizers.Adam(learning_rate=hp_learning_rate)
    elif hp_optimizer == 'SGD':
        optimizer = keras.optimizers.SGD(learning_rate=hp_learning_rate,
                                         momentum=0.9, nesterov=True)

    model.compile(optimizer=optimizer, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

We then instantiate a tuner from kt.tuners using the desired method of optimization. We can choose Bayesian Optimization, Random Search, Hyperband, or define a custom tuner class. The optimization method determines how the tuner explores the search space we defined in model_builder.

Our tuner will then work through different hyperparameter options and find the “best” model for us. We will discuss the difference between these three options in the next portion.

tuner = kt.tuners.BayesianOptimization(
    model_builder,               # The model function we wrote
    seed=777,                    # Seed for reproducibility
    objective='val_loss',        # Which objective to optimize
    max_trials=30,               # Number of trials to run
    directory='.',               # Directory to save tuning files to
    overwrite=True,              # Whether to overwrite previous runs
    project_name='tuning-iris')  # Name of the tuning files

tuner.search_space_summary()  # Prints the search space

# This will perform the actual search
# Search on our training data, validate with test data
# Use early stopping to prevent overfitting
tuner.search(X_train, y_train_cat, epochs=10,
             validation_data=(X_test, y_test_cat),
             callbacks=[early_stop])

As the tuner runs, you’ll see it go through a series of trials and between each trial, it’ll give you a little summary of its progress. You can use these status updates to gauge performance and get an idea of what the search is discovering. Some trials will be better than others (as is the nature of the search), and this will continue until the tuner has completed the number of trials you requested.

An image of the tuner’s per-trial progress output.
Source: Image by author

Once the tuner has completed the trials, you’ll probably be eager for the results, right? Fortunately, we can access the best models directly, automatically compile them, and have them ready to use, all in one go!

# Get the top 2 models.
models = tuner.get_best_models(num_models=2)
best_model = models[0]

# Build the model.
# Needed for `Sequential` models without a specified `input_shape`.
# None lets Keras accept any batch size; 4 is the number of features.
best_model.build(input_shape=(None, 4))
best_model.summary()

An image of Python code output from calling model.summary() on a three-layer Keras/TensorFlow model.
Source: Image by author

# Alternatively, we can view a summary of the tuning results
tuner.results_summary()
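If you’d also like to see exactly which hyperparameter values won, Keras Tuner exposes them directly. Here is a minimal sketch (the hyperparameter names match the ones we defined in model_builder):

# Retrieve the winning hyperparameter values from the search
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(best_hps.values)                   # dict of every tuned hyperparameter
print(best_hps.get('num_dense_layers'))  # e.g., how many hidden layers won
print(best_hps.get('Optimizer'))         # 'Adam' or 'SGD'

# We can also build a fresh, compiled model from these values
fresh_model = tuner.hypermodel.build(best_hps)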

STEP 5: Understanding our tuner

Now let’s break down each part of the tuner function, including the tuner itself:

# Start by creating the function
def model_builder(hp):
    # Let it know what model we want; in this case, sequential
    model = keras.Sequential()

    # This is our initial input layer.
    # This will generally match the data we are using
    model.add(layers.Dense(units=hp.Int('dense-bot', min_value=3,
                                        max_value=5, step=1),
                           activation='relu'))

    # This is the magical chunk here
    # We are not sure how many layers is ideal
    # So we tell the tuner: hey, try anywhere from 1-3 layers
    for i in range(hp.Int('num_dense_layers', 1, 3)):
        model.add(layers.Dense(units=hp.Int('dense_' + str(i), min_value=3,
                                            max_value=5, step=1),
                               activation='relu'))
        model.add(layers.Dropout(hp.Choice('dropout_' + str(i),
                                           values=[0.0, 0.1, 0.2])))

    # This is our output layer, just like normal
    # There are 3 potential outcomes, so we set it as 3 nodes
    model.add(layers.Dense(3, activation='softmax'))

#... continued below

This first chunk sets up the network architecture. We start by instantiating a sequential model, as we would normally with Keras. We then define our first layer explicitly, because we will have at least one hidden layer. (Note that we don’t specify an input_shape here, which is why we call build() with an explicit shape on the best model later.)

You’ll notice units defined here using hp.Int(). This is a special hyperparameter method built into Keras Tuner. All it really does is return a number (in this case, an integer), but it lets the tuner treat that number as something to search over. By using hp.Int('dense-bot', min_value=3, max_value=5, step=1) we are saying: “test a dense layer called 'dense-bot' with a minimum node count of 3, a maximum node count of 5, incrementing the nodes by 1 step at a time.”
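hp.Int is only one of the hyperparameter types Keras Tuner offers. Here is a small standalone sketch of a few others; the names 'lr', 'use_batchnorm', and 'activation' are made up purely for illustration:

import keras_tuner as kt

hp = kt.HyperParameters()

# A few of the other hyperparameter types keras_tuner provides
lr = hp.Float('lr', min_value=1e-4, max_value=1e-1, sampling='log')  # continuous range
use_bn = hp.Boolean('use_batchnorm')                                 # True or False
act = hp.Choice('activation', values=['relu', 'tanh'])               # pick from a list

print(lr, use_bn, act)  # outside a search, these simply return default values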

In the next chunk, we get into the real magic of Keras Tuner!

We define hp.Int('num_dense_layers', 1, 3) and loop over this range. This is saying that we want an additional 1-3 layers. In each of these layers, we want a Dense layer that is between 3 to 5 nodes in size, and a Dropout layer that drops either 0%, 10% or 20% of the input units. The tuner will now build variations of these models and, using Bayesian optimization (which we defined when we instantiated tuner), will determine the optimal hyperparameter values and network architecture.

Next, we add our output layer, which we define as having three nodes and a softmax activation function (for our categorical output).

#... continued from the code above

    # Decide between one of two optimizers
    # Whether 'Adam' or 'SGD', apply the appropriate parameters
    hp_optimizer = hp.Choice('Optimizer', values=['Adam', 'SGD'])
    hp_learning_rate = hp.Choice('learning_rate',
                                 values=[1e-1, 1e-2, 1e-3])
    if hp_optimizer == 'Adam':
        optimizer = keras.optimizers.Adam(learning_rate=hp_learning_rate)
    elif hp_optimizer == 'SGD':
        optimizer = keras.optimizers.SGD(learning_rate=hp_learning_rate,
                                         momentum=0.9, nesterov=True)

    # Compile and return the model
    model.compile(optimizer=optimizer, loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

Here, we test both the Adam and SGD optimizers, along with a few learning rates for each; for SGD we also set momentum and Nesterov acceleration. We choose our usual loss function and metric for a multi-class problem. We finish by compiling the model and returning it so that the tuner can evaluate it.

Got all that? Great, now let’s talk about the tuner itself!

kt.tuners.BayesianOptimization(
...
)

When defining the tuner itself, we have three main optimization options: BayesianOptimization, RandomSearch, and Hyperband (for more on creating a custom tuner, see the documentation here). RandomSearch is similar to a grid search but randomly samples hyperparameter combinations, which keeps the computation manageable. BayesianOptimization uses the well-known Bayesian optimization technique, letting earlier trials guide which combinations to try next. Hyperband is a little special: it samples many combinations, trains each for only a few epochs, and then spends the remaining budget on the most promising ones, which saves time and compute. Choose your search strategy based on what fits your needs!
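For reference, here is roughly what swapping in the other two tuners looks like; this is a sketch whose arguments mirror the BayesianOptimization call above, with max_epochs and factor being Hyperband-specific knobs:

# Random search: same style of arguments as our BayesianOptimization call
random_tuner = kt.tuners.RandomSearch(
    model_builder,
    objective='val_loss',
    max_trials=30,
    directory='.',
    overwrite=True,
    project_name='tuning-iris-random')

# Hyperband: the budget is set by max_epochs/factor instead of max_trials
hyperband_tuner = kt.tuners.Hyperband(
    model_builder,
    objective='val_loss',
    max_epochs=30,   # the most epochs any single candidate model is trained for
    factor=3,        # downsampling rate between Hyperband brackets
    directory='.',
    overwrite=True,
    project_name='tuning-iris-hyperband')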

This is just the beginning! There are plenty of ways to build, customize, and configure this function for all kinds of neural networks. Remember, how you configure your early stopping callback can also impact the overall fit.
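For instance, a more patient setup than the patience=1 callback we used above might look like this (a sketch; the exact values are up to you):

# A more forgiving early stopping configuration:
# wait 10 epochs for val_loss to improve, then roll back to the best weights
early_stop = EarlyStopping(monitor='val_loss',
                           patience=10,
                           restore_best_weights=True)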

STEP 6: Using Tuned Model

Let’s wrap this up by looking at the results of our tuning. We’ll use the same data as we did with our baseline model above, only this time we’ll use the model we created with Keras Tuner.

First we fit the model:

# Same parameters as before
best_model.fit(X_train, y_train_cat, epochs=100,
               validation_data=(X_test, y_test_cat),
               callbacks=[early_stop],
               verbose=1)

Then we make our predictions and print the results:

y_pred = np.argmax(best_model.predict(X_test), axis=1)
print(classification_report(y_test, y_pred))
The Python code output of Scikit-learn’s classification report; four columns including precision, recall, f1-score, and support.
Source: Image by author

Wow! That’s crazy! Let’s get a look at those results with a quick Seaborn heatmap:

sns.heatmap(confusion_matrix(y_test, y_pred), annot=True)
A confusion matrix graphic, as plotted with Scikit-learn’s confusion_matrix and Seaborn’s heatmap; plot shows perfect matches along the diagonal and has a magma-toned colorbar at the right.
Source: Image by author

Conclusion

Clearly, the tuning worked! Of course, this is a relatively small toy dataset, but you can imagine how useful Keras Tuner is with bigger datasets as well.

I do hope that this tutorial gave you the building blocks to begin using Keras Tuner yourself! And remember, this is just the beginning! We haven’t even begun working with model history, TensorBoard, or experiment tracking tools like Comet to truly understand our improvements. In the next article, we’ll cover model evaluation in more detail and how to really get the most out of the tuner.

Thanks for reading!

