Optimal Deep Learning model selection using wandb.ai

In this post I use Weights and Biases’ wandb.ai ‘sweep’ feature to automatically select the best Deep Learning model out of a set of models created through Grid Search. I chanced upon the Weights and Biases site while training and fine-tuning the T5 transformer model on Kaggle for my post GenerativeAI: Using T5 Transformer model to summarise Indian Philosophy. During this process Kaggle had requested a token from wandb.ai.

Out of curiosity, I started to explore this Weights and Biases (W&B) machine learning site and was impressed with its visualisation capabilities, so I decided to give Weights and Biases a try. The live visualisation features are quite interesting, and they make it very easy to select the optimal model when doing a Grid search or Random search through combinations of hyperparameters.

For this purpose, I used my processed T20 match dataset which I had used to compute the Win Probability of T20 teams. For more details please see my post GooglyPlusPlus: Win Probability using Deep Learning and player embeddings

Searching through high-dimensional hyperparameter spaces to find the most performant model can quickly get unwieldy. Hyperparameter sweeps provide an organised and efficient way to automatically search through combinations of hyperparameter values (e.g. learning rate, batch size, epochs, dropout, optimizer type) to find the optimal values.
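To see how quickly such a search space multiplies out, here is a minimal sketch (the hyperparameter values below are purely illustrative, not the grid used later in this post):

from itertools import product

# an illustrative grid: 5 x 4 x 3 x 2 = 120 candidate models
learning_rates = [0.001, 0.005, 0.008, 0.01, 0.03]
batch_sizes = [256, 512, 1024, 2048]
dropouts = [0.05, 0.1, 0.2]
optimizers = ['adam', 'sgd']

grid = list(product(learning_rates, batch_sizes, dropouts, optimizers))
print(len(grid))    # 120 combinations, each needing a full training run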

Here are the steps:

a) Install, import

!pip install wandb -qU
import wandb
from wandb.keras import WandbCallback
wandb.login()
import pandas as pd
import numpy as np
from zipfile import ZipFile
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import regularizers
from pathlib import Path
import matplotlib.pyplot as plt
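Note: wandb.login() will prompt for an API key from your wandb.ai account the first time it is called; this is the same token that Kaggle asked for, as mentioned above.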

b) Load the dataset


df1=pd.read_csv('t20.csv')
print("Shape of dataframe=",df1.shape)

# 80/20 train-test split
train_dataset = df1.sample(frac=0.8, random_state=0)
test_dataset = df1.drop(train_dataset.index)

# select the predictor columns
train_dataset1 = train_dataset[['batsmanIdx','bowlerIdx','ballNum','ballsRemaining','runs','runRate','numWickets','runsMomentum','perfIndex']]
test_dataset1 = test_dataset[['batsmanIdx','bowlerIdx','ballNum','ballsRemaining','runs','runRate','numWickets','runsMomentum','perfIndex']]

# separate the target label
train_labels = train_dataset.pop('isWinner')
test_labels = test_dataset.pop('isWinner')

# summary statistics of the training predictors
a = train_dataset1.describe()
stats = a.transpose()   # transpose() is a method; without the parentheses only a reference is assigned
a

Shape of dataframe= (1359888, 10)
batsmanIdx	bowlerIdx	ballNum	ballsRemaining	runs	runRate	numWickets	runsMomentum	perfIndex
count	1.087910e+06	1.087910e+06	1.087910e+06	1.087910e+06	1.087910e+06	1.087910e+06	1.087910e+06	1.087910e+06	1.087910e+06
mean	2.561058e+03	1.939449e+03	1.185352e+02	6.001942e+01	8.110290e+01	1.611611e+00	2.604912e+00	2.886850e-01	9.619675e+00
std	1.479446e+03	1.095097e+03	6.934078e+01	3.514725e+01	4.977998e+01	2.983874e+00	2.195410e+00	6.066070e-01	4.602859e+00
min	1.000000e+00	1.000000e+00	1.000000e+00	1.000000e+00	-5.000000e+00	-5.000000e+00	0.000000e+00	3.571429e-02	0.000000e+00
25%	1.230000e+03	9.400000e+02	5.900000e+01	3.000000e+01	4.100000e+01	1.043478e+00	1.000000e+00	1.058824e-01	6.539326e+00
50%	2.492000e+03	1.919000e+03	1.170000e+02	5.900000e+01	7.800000e+01	1.300000e+00	2.000000e+00	1.408451e-01	9.246753e+00
75%	3.868000e+03	2.884000e+03	1.770000e+02	9.000000e+01	1.170000e+02	1.590312e+00	4.000000e+00	2.352941e-01	1.218349e+01
max	5.226000e+03	3.848000e+03	2.860000e+02	1.610000e+02	2.780000e+02	2.510000e+02	1.000000e+01	1.100000e+01	6.600000e+01
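Note that the count row shows 1,087,910 observations, i.e. the 80% training split of the 1,359,888-row dataframe (0.8 × 1,359,888 ≈ 1,087,910).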

c) Define the Deep Learning model

import pandas as pd
import numpy as np
from keras.layers import Input, Embedding, Flatten, Dense, Reshape, Concatenate, Dropout
from keras.models import Model

tf.random.set_seed(432)
# create input layers for each of the predictors
batsmanIdx_input = Input(shape=(1,), name='batsmanIdx')
bowlerIdx_input = Input(shape=(1,), name='bowlerIdx')
ballNum_input = Input(shape=(1,), name='ballNum')
ballsRemaining_input = Input(shape=(1,), name='ballsRemaining')
runs_input = Input(shape=(1,), name='runs')
runRate_input = Input(shape=(1,), name='runRate')
numWickets_input = Input(shape=(1,), name='numWickets')
runsMomentum_input = Input(shape=(1,), name='runsMomentum')
perfIndex_input = Input(shape=(1,), name='perfIndex')

# Set the embedding size
no_of_unique_batsman = len(df1["batsmanIdx"].unique())
print(no_of_unique_batsman)
no_of_unique_bowler = len(df1["bowlerIdx"].unique())
print(no_of_unique_bowler)
# fourth-root heuristic for embedding size (computed for reference; output_dim is fixed at 16 below)
embedding_size_bat = no_of_unique_batsman ** (1/4)
embedding_size_bwl = no_of_unique_bowler ** (1/4)


# create embedding layers for the two categorical predictors
batsmanIdx_embedding = Embedding(input_dim=no_of_unique_batsman+1, output_dim=16, input_length=1)(batsmanIdx_input)
batsmanIdx_flatten = Flatten()(batsmanIdx_embedding)
bowlerIdx_embedding = Embedding(input_dim=no_of_unique_bowler+1, output_dim=16, input_length=1)(bowlerIdx_input)
bowlerIdx_flatten = Flatten()(bowlerIdx_embedding)

# concatenate all the predictors
x = keras.layers.concatenate([batsmanIdx_flatten,bowlerIdx_flatten, ballNum_input, ballsRemaining_input, runs_input, runRate_input, numWickets_input, runsMomentum_input, perfIndex_input])

# add hidden layers
x = Dense(64, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(32, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(16, activation='relu')(x)
x = Dropout(0.1)(x)
x = Dense(8, activation='relu')(x)
x = Dropout(0.1)(x)
# add output layer
output = Dense(1, activation='sigmoid', name='output')(x)
print(output.shape)
# create model

# Initialize a new W&B run
#run = wandb.init(project='t20', group='cricket')

model = Model(inputs=[batsmanIdx_input,bowlerIdx_input, ballNum_input, ballsRemaining_input, runs_input, runRate_input, numWickets_input, runsMomentum_input, perfIndex_input], outputs=output)
model.summary()

# Initialize a new W&B run
run = wandb.init(
    # set the wandb project where this run will be logged
    project="t20",
    group='cricket',

    # track hyperparameters and run metadata
    config={
    "learning_rate": 0.02,
    "dropout": 0.01,
    "batch_size": 1024,
    "epochs": 5,
    }
)


5226
3848
(None, 1)
Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
==================================================================================================
 batsmanIdx (InputLayer)     [(None, 1)]                  0         []                            
                                                                                                  
 bowlerIdx (InputLayer)      [(None, 1)]                  0         []                            
                                                                                                  
 embedding (Embedding)       (None, 1, 16)                83632     ['batsmanIdx[0][0]']          
                                                                                                  
 embedding_1 (Embedding)     (None, 1, 16)                61584     ['bowlerIdx[0][0]']           
                                                                                                  
 flatten (Flatten)           (None, 16)                   0         ['embedding[0][0]']           
                                                                                                  
 flatten_1 (Flatten)         (None, 16)                   0         ['embedding_1[0][0]']         
                                                                                                  
 ballNum (InputLayer)        [(None, 1)]                  0         []                            
                                                                                                  
 ballsRemaining (InputLayer  [(None, 1)]                  0         []                            
 )                                                                                                
                                                                                                  
 runs (InputLayer)           [(None, 1)]                  0         []                            
                                                                                                  
 runRate (InputLayer)        [(None, 1)]                  0         []                            
                                                                                                  
 numWickets (InputLayer)     [(None, 1)]                  0         []                            
                                                                                                  
 runsMomentum (InputLayer)   [(None, 1)]                  0         []                            
                                                                                                  
 perfIndex (InputLayer)      [(None, 1)]                  0         []                            
                                                                                                  
 concatenate (Concatenate)   (None, 39)                   0         ['flatten[0][0]',             
                                                                     'flatten_1[0][0]',           
                                                                     'ballNum[0][0]',             
                                                                     'ballsRemaining[0][0]',      
                                                                     'runs[0][0]',                
                                                                     'runRate[0][0]',             
                                                                     'numWickets[0][0]',          
                                                                     'runsMomentum[0][0]',        
                                                                     'perfIndex[0][0]']           
                                                                                                  
 dense (Dense)               (None, 64)                   2560      ['concatenate[0][0]']         
                                                                                                  
 dropout (Dropout)           (None, 64)                   0         ['dense[0][0]']               
                                                                                                  
 dense_1 (Dense)             (None, 32)                   2080      ['dropout[0][0]']             
                                                                                                  
 dropout_1 (Dropout)         (None, 32)                   0         ['dense_1[0][0]']             
                                                                                                  
 dense_2 (Dense)             (None, 16)                   528       ['dropout_1[0][0]']           
                                                                                                  
 dropout_2 (Dropout)         (None, 16)                   0         ['dense_2[0][0]']             
                                                                                                  
 dense_3 (Dense)             (None, 8)                    136       ['dropout_2[0][0]']           
                                                                                                  
 dropout_3 (Dropout)         (None, 8)                    0         ['dense_3[0][0]']             
                                                                                                  
 output (Dense)              (None, 1)                    9         ['dropout_3[0][0]']           
                                                                                                  
==================================================================================================
Total params: 150529 (588.00 KB)
Trainable params: 150529 (588.00 KB)
Non-trainable params: 0 (0.00 Byte)
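As a sanity check, the parameter counts tally with the vocabulary sizes printed above: the batsman embedding has (5226 + 1) × 16 = 83,632 weights, the bowler embedding (3848 + 1) × 16 = 61,584, and the first Dense layer (39 × 64) + 64 = 2,560.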

d) Create a Training script

def get_optimizer(lr=1e-2, optimizer="adam"):
    "Select optimizer between adam and sgd with momentum"
    if optimizer.lower() == "adam":
        return tf.keras.optimizers.Adam(learning_rate=lr)
    if optimizer.lower() == "sgd":
        return tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.1)
    raise ValueError(f"Unknown optimizer: {optimizer}")

def train(model, batch_size=1024, epochs=10, lr=1e-2, optimizer='adam', log_freq=10):

    # Compile model like you usually do.
    tf.keras.backend.clear_session()
    model.compile(loss="binary_crossentropy",
                  optimizer=get_optimizer(lr, optimizer),
                  metrics=["accuracy"])

    # callback setup - WandbCallback streams the metrics to W&B after every epoch
    cbs = [WandbCallback(data_type='auto', log_batch_frequency=None)]

    # train the model
    history = model.fit([train_dataset1['batsmanIdx'], train_dataset1['bowlerIdx'], train_dataset1['ballNum'], train_dataset1['ballsRemaining'], train_dataset1['runs'],
           train_dataset1['runRate'], train_dataset1['numWickets'], train_dataset1['runsMomentum'], train_dataset1['perfIndex']], train_labels, epochs=epochs, batch_size=batch_size, callbacks=cbs,
          validation_data=([test_dataset1['batsmanIdx'], test_dataset1['bowlerIdx'], test_dataset1['ballNum'], test_dataset1['ballsRemaining'], test_dataset1['runs'],
           test_dataset1['runRate'], test_dataset1['numWickets'], test_dataset1['runsMomentum'], test_dataset1['perfIndex']], test_labels), verbose=1)
    return history

e) Define the sweep for Grid Search

#Grid search
sweep_config = {
    'method': 'grid'
    }

metric = {
    'name': 'val_loss',
    'goal': 'minimize'
    }

sweep_config['metric'] = metric
# Optimizers - Adam, SGD
parameters_dict = {
    'optimizer': {
        'values': ['adam', 'sgd']
        },
    'dropout': {
          'values': [0.1, 0.05]
        },
    }

sweep_config['parameters'] = parameters_dict

parameters_dict.update({
    'epochs': {
        'value': 20}
    })

# Set learning_rate, batch_size
parameters_dict.update({
    'learning_rate': {
         'values': [0.005,0.008,0.01,.03]  
      },
    'batch_size': {
        'values': [1024,2048]
      }
    })

import pprint
pprint.pprint(sweep_config)

{'method': 'grid',
 'metric': {'goal': 'minimize', 'name': 'val_loss'},
 'parameters': {'batch_size': {'values': [1024, 2048]},
                'dropout': {'values': [0.1, 0.05]},
                'epochs': {'value': 20},
                'learning_rate': {'values': [0.005, 0.008, 0.01, 0.03]},
                'optimizer': {'values': ['adam', 'sgd']}}}
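Since ‘epochs’ is fixed at a single value, this grid defines 2 (optimizer) × 2 (dropout) × 4 (learning_rate) × 2 (batch_size) = 32 distinct hyperparameter combinations, each requiring a full 20-epoch training run.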

f) Wrap the Training Loop


def sweep_train(config_defaults=None):
    # Initialize wandb with a sample project name
    with wandb.init(config=config_defaults):  # this gets over-written in the Sweep

        # Specify the other hyperparameters to the configuration, if any
        wandb.config.architecture_name = "DL"
        wandb.config.dataset_name = "T20"
        # initialize model
        #model = T20Net(wandb.config.dropout)

        train(model,
              wandb.config.batch_size,
              wandb.config.epochs,
              wandb.config.learning_rate,
              wandb.config.optimizer)
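When this function is launched by a sweep agent, the wandb.init() call picks up the hyperparameter values the sweep has chosen for that particular run, so wandb.config.batch_size, wandb.config.epochs, wandb.config.learning_rate and wandb.config.optimizer reflect the sweep’s selection rather than config_defaults.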

g) Initialise Sweep and Run Agent

# register the sweep and get back a sweep id
sweep_id = wandb.sweep(sweep_config, project="sweeps-keras-t20")
# run the agent; count=10 limits this agent to 10 of the 32 grid combinations
wandb.agent(sweep_id, sweep_train, count=10)
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
wandb: Agent Starting Run: zbaaq0bn with config:
wandb: 	batch_size: 1024
wandb: 	dropout: 0.1
wandb: 	epochs: 20
wandb: 	learning_rate: 0.005
wandb: 	optimizer: adam

Epoch 19/20
wandb: Adding directory to artifact (/content/wandb/run-20231004_065327-zbaaq0bn/files/model-best)... Done. 0.0s
1063/1063 [==============================] - 15s 14ms/step - loss: 0.3073 - accuracy: 0.8490 - val_loss: 0.3093 - val_accuracy: 0.8479
Epoch 20/20
wandb: Adding directory to artifact (/content/wandb/run-20231004_065327-zbaaq0bn/files/model-best)... Done. 0.0s
1063/1063 [==============================] - 18s 17ms/step - loss: 0.3052 - accuracy: 0.8502 - val_loss: 0.3068 - val_accuracy: 0.8490
Waiting for W&B process to finish... (success).
Run history:

accuracy	▁▅▅▆▆▆▇▇▇▇▇▇▇███████
epoch	▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
loss	█▅▄▃▃▃▃▂▂▂▂▂▂▂▁▁▁▁▁▁
val_accuracy	▁▂▃▄▄▅▅▅▆▆▆▇▇▇▇▇▇███
val_loss	█▆▅▅▄▄▃▃▃▃▃▂▂▂▂▂▂▁▁▁

Run summary:

accuracy	0.85022
best_epoch	19
best_val_loss	0.30681
epoch	19
loss	0.30521
val_accuracy	0.849
val_loss	0.30681

...
...
wandb: Agent Starting Run: 4qtyxzq9 with config:
wandb: 	batch_size: 1024
wandb: 	dropout: 0.1
wandb: 	epochs: 20
wandb: 	learning_rate: 0.008
wandb: 	optimizer: sgd
...
...

Epoch 18/20
1063/1063 [==============================] - 13s 12ms/step - loss: 0.2672 - accuracy: 0.8697 - val_loss: 0.2819 - val_accuracy: 0.8624
Epoch 19/20
wandb: Adding directory to artifact (/content/wandb/run-20231004_070920-4qtyxzq9/files/model-best)... Done. 0.0s
1063/1063 [==============================] - 14s 13ms/step - loss: 0.2669 - accuracy: 0.8697 - val_loss: 0.2813 - val_accuracy: 0.8635
Epoch 20/20
1063/1063 [==============================] - 13s 12ms/step - loss: 0.2650 - accuracy: 0.8707 - val_loss: 0.2957 - val_accuracy: 0.8557
Waiting for W&B process to finish... (success).
6.805 MB of 6.818 MB uploaded (0.108 MB deduped)
Run history:

accuracy	▁▂▃▃▄▄▄▄▄▄▄▄▅▅▄▆▅▆▆█
epoch	▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
loss	█▇▆▆▅▅▅▅▅▅▅▄▄▄▄▄▄▃▃▁
val_accuracy	▇▅▅▁█▅▇▆▆▅█▅▅▆▃▇▁▇█▁
val_loss	▃▄▄▅▁▃▂▃▃▃▁▄▄▂▆▂█▁▁█

Run summary:

accuracy	0.87067
best_epoch	18
best_val_loss	0.28127
epoch	19
loss	0.26499
val_accuracy	0.85565
val_loss	0.29573
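Note how much noisier the val_accuracy and val_loss histories are for this SGD run than for the Adam run above, even though its training accuracy ends up higher.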
...
...
wandb: Agent Starting Run: lt2fknva with config:
wandb: 	batch_size: 1024
wandb: 	dropout: 0.1
wandb: 	epochs: 20
wandb: 	learning_rate: 0.01
wandb: 	optimizer: adam
Tracking run with wandb version 0.15.11
Run data is saved locally in /content/wandb/run-20231004_071359-lt2fknva
Syncing run lively-sweep-5 to Weights & Biases (docs)
...
...
Epoch 19/20
1063/1063 [==============================] - 14s 13ms/step - loss: 0.2779 - accuracy: 0.8651 - val_loss: 0.2883 - val_accuracy: 0.8607
Epoch 20/20
wandb: Adding directory to artifact (/content/wandb/run-20231004_071359-lt2fknva/files/model-best)... Done. 0.0s
1063/1063 [==============================] - 16s 15ms/step - loss: 0.2795 - accuracy: 0.8643 - val_loss: 0.2831 - val_accuracy: 0.8620
Waiting for W&B process to finish... (success).
Run history:

accuracy	▁▁▁▂▂▃▃▄▅▅▅▆▆▆▆▆▇▇█▇
epoch	▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▇▇▇██
loss	███▇▇▆▅▆▅▄▄▃▃▃▃▂▂▂▁▂
val_accuracy	▁▅▂▆▆▅▂▆▆▅▇▇▆▇▅▃▃▆▇█
val_loss	▇▆▇▅▃▅█▆▅▄▂▃▄▂▆▆▇▃▃▁

Run summary:

accuracy	0.8643
best_epoch	19
best_val_loss	0.28309
epoch	19
loss	0.27949
val_accuracy	0.86195
val_loss	0.28309
...
...

On the W&B site each of the runs is captured very nicely.

The best model is ‘lively-sweep-5’, with the lowest validation loss.

The picture below gives the validation loss for various combinations of the hyperparameters.

It is very easy to visually pick the model with the lowest loss, as shown below: it is lively-sweep-5. We can also see the values of the hyperparameters for this DL model.

Details of optimal Deep Learning model

a. Run – lively-sweep-5

b. optimizer – adam

c. learning_rate – 0.01

d. batch_size – 1024

e. dropout – 0.1
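Having identified the best combination, one could retrain with exactly these values by reusing the train() function from section d). A minimal sketch (the wandb.init call supplies the active run that WandbCallback needs; the project name and config keys here are illustrative):

# retrain with the hyperparameters of the best run, lively-sweep-5
with wandb.init(project='t20', config={'learning_rate': 0.01, 'dropout': 0.1,
                                       'batch_size': 1024, 'epochs': 20}):
    history = train(model, batch_size=1024, epochs=20, lr=0.01, optimizer='adam')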

We can see the performance of this model individually by clicking lively-sweep-5 on the left panel.

It was good fun to play around with Weights and Biases in selecting an optimal model.

See also

  1. Deconstructing Convolutional Neural Networks with Tensorflow and Keras
  2. Deep Learning from first principles in Python, R and Octave – Part 7
  3. Identifying cricketing shots using AI
  4. Introducing cricket package yorkr: Part 2-Trapped leg before wicket!
  5. Introducing cricpy:A python package to analyze performances of cricketers

To see all posts click Index of posts
