Classifier: MLP and CNN

This notebook implements a classifier of MNIST digits. This is an example where a model maps a tensor to a discrete value, i.e. $f: \mathbb{R}^{d_1\times d_2} \rightarrow \{0,\dots,9\}$. We will implement two classifiers, one a multi-layer perceptron (MLP) and the other a convolutional neural network (CNN).

MLP Classifier

Let's start with an MLP, where $x$ is the input and $W, b$ are the parameters (to be learned). We will use $\text{ReLU}$ as the non-linear activation function, which is a built-in function available in Kokoyi.

Try writing this module yourself first, then compare with the answer in the cell below:
In [1]:
\Module{MLP}{x; W, b}
L \gets |W| \\
h[0 \leq i \leq L] \gets
    \begin{cases}
        x & i = 0 \\
        \ReLU (W[i-1] \cdot h[i-1] + b[i-1]) & i < L \\
        W[i-1] \cdot h[i-1] + b[i-1] & otherwise \\
    \end{cases} \\
\Return h[L] \\
\EndModule

\[\require{color}\] \begin{array}{l} \rule[0pt]{160mm}{1.00mm}\\[0pt] \textbf{Module}\quad \mathit{\mathit{MLP}} (\mathit{x};\ \mathit{W},\ \mathit{b})\\[0pt] \rule[0pt]{160mm}{0.50mm}\\[0pt] \mathit{L} \gets |\mathit{W}|\\[0pt] \mathit{h}_{[0 \leq \mathit{i} \leq \mathit{L}]} \gets \begin{cases} \mathit{x} & \mathit{i} = 0\\[0pt] {\color{blue}\operatorname{ReLU}}({\mathit{W}}_{[{{\mathit{i} - 1}}]} \cdot {\mathit{h}}_{[{{\mathit{i} - 1}}]} + {\mathit{b}}_{[{{\mathit{i} - 1}}]}) & \mathit{i} < \mathit{L}\\[0pt] {\mathit{W}}_{[{{\mathit{i} - 1}}]} \cdot {\mathit{h}}_{[{{\mathit{i} - 1}}]} + {\mathit{b}}_{[{{\mathit{i} - 1}}]} & \mathit{otherwise}\\[0pt] \end{cases} \\[0pt] \textbf{Return}\ {\mathit{h}}_{[{{\mathit{L}}}]}\\[0pt] \rule[0pt]{160mm}{1.00mm}\\[0pt] \end{array}


Here we first get the number of layers, and then define the transformation per layer. Note how layers (and arrays in general) are defined and indexed with brackets. The pattern here is very much like recursion: 1) specify the boundary conditions (using cases), 2) write the transition function (on the right-hand side), 3) specify the iteration range (on the left-hand side).
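
For readers who prefer to see the recursion spelled out as a loop, here is a minimal plain-PyTorch sketch of the same computation (a hypothetical single-sample helper, not part of Kokoyi):

In [ ]:
import torch

def mlp_forward(x, W, b):
    # Mirrors the Kokoyi MLP module above, on a single sample.
    h = x                          # boundary condition: h[0] = x
    L = len(W)
    for i in range(1, L + 1):      # transition: compute h[i] from h[i-1]
        h = W[i - 1] @ h + b[i - 1]
        if i < L:                  # ReLU on all layers except the last
            h = torch.relu(h)
    return h                       # return h[L]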

Initialization. We have just used Kokoyi to write our module; its initialization is just like that of a standard PyTorch module, and Kokoyi automatically establishes the names of the parameters:

In [2]:
import kokoyi
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, dims):
        super().__init__()
        # One weight matrix and one bias vector per layer: dims[i] -> dims[i + 1].
        self.W = nn.ParameterList([nn.Parameter(torch.empty(dims[i + 1], dims[i])) for i in range(len(dims) - 1)])
        self.b = nn.ParameterList([nn.Parameter(torch.empty(dims[i + 1])) for i in range(len(dims) - 1)])
        for param in self.W:
            nn.init.xavier_uniform_(param)
        for param in self.b:
            nn.init.uniform_(param)

    def get_parameters(self):
        # Return the parameters as a tuple, in the order (W, b) declared by the Kokoyi module.
        return self.W, self.b

    forward = kokoyi.symbol["MLP"]

However, you can also let Kokoyi set it up and just fill in the blanks. To do so, select the cell containing a Kokoyi module and hit the button in the top menu.

Here is the default initialization code generated by Kokoyi for this model:
class MLP(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.W = None
        self.b = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (W, b)."""
        return None

    forward = kokoyi.symbol["MLP"]

Now you can build an $MLP$ classifier by instantiating one:

In [3]:
dims = [1 * 28 * 28, 256, 10]
mlp = MLP(dims)
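
As a quick sanity check, the parameter shapes follow directly from dims; with dims = [784, 256, 10] we expect W[0] of shape 256 × 784 with b[0] of size 256, and W[1] of shape 10 × 256 with b[1] of size 10 (this inspection uses only standard PyTorch attributes):

In [ ]:
for Wi, bi in zip(mlp.W, mlp.b):
    print(tuple(Wi.shape), tuple(bi.shape))  # (256, 784) (256,) then (10, 256) (10,)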

CNN Classifier

Before we move on, let's define a model that uses a CNN in addition to the MLP. A CNN processes images much more efficiently by exploiting the inductive bias inherent to visual data, namely local structure and translation invariance.

We will use a $ConvBlock$ module to extract features from the input images and generate new features to feed into the $MLP$ module. $ConvBlock$ consists of two convolutions, each followed by a rectified linear unit ($ReLU$). Note that $Flatten$ is a built-in function of Kokoyi, which flattens its input by reshaping it into a one-dimensional tensor.

In [4]:
\Module {ConvBlock}{x ; Conv2d_0, Conv2d_1} 
    \Return \ReLU(Conv2d_1(\ReLU(Conv2d_0(x)))) \\
\EndModule

\Module{CNN}{x; ConvBlock, MLP}
    h_c \gets \MaxPool2d(ConvBlock(x), 2)\\ 
    h \gets \Flatten(h_c) \\ 
    \hat{y} \gets MLP(h) \\
    \Return \hat{y} \\
\EndModule

\[\require{color}\] \begin{array}{l} \rule[0pt]{160mm}{1.00mm}\\[0pt] \textbf{Module}\quad \mathit{\mathit{ConvBlock}} (\mathit{x};\ \mathit{Conv2d_0},\ \mathit{Conv2d_1})\\[0pt] \rule[0pt]{160mm}{0.50mm}\\[0pt] \textbf{Return}\ {\color{blue}\operatorname{ReLU}}(\mathit{Conv2d_1}({\color{blue}\operatorname{ReLU}}(\mathit{Conv2d_0}(\mathit{x}))))\\[0pt] \rule[0pt]{160mm}{1.00mm}\\[0pt] \rule[0pt]{160mm}{1.00mm}\\[0pt] \textbf{Module}\quad \mathit{\mathit{CNN}} (\mathit{x};\ \mathit{ConvBlock},\ \mathit{MLP})\\[0pt] \rule[0pt]{160mm}{0.50mm}\\[0pt] \mathit{h_c} \gets {\color{blue}\operatorname{MaxPool2d}}(\mathit{ConvBlock}(\mathit{x}), 2)\\[0pt] \mathit{h} \gets {\color{blue}\operatorname{Flatten}}(\mathit{h_c})\\[0pt] \mathit{\hat{y}} \gets \mathit{MLP}(\mathit{h})\\[0pt] \textbf{Return}\ \mathit{\hat{y}}\\[0pt] \rule[0pt]{160mm}{1.00mm}\\[0pt] \end{array}


Similarly, you need to initialize the CNN module, as in the cell below, or you can use the auto-init feature.

Here is the default initialization code generated by Kokoyi for this model:
class ConvBlock(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.Conv2d_0 = None
        self.Conv2d_1 = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (Conv2d_0, Conv2d_1)."""
        return None

    forward = kokoyi.symbol["ConvBlock"]


class CNN(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.ConvBlock = None
        self.MLP = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (ConvBlock, MLP)."""
        return None

    forward = kokoyi.symbol["CNN"]
In [5]:
# Initialize the CNN module
class ConvBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, mid_channels=None):
        super().__init__()
        if not mid_channels:
            mid_channels = out_channels
        # 3x3 convolutions with stride 1 and padding 1 preserve the spatial size.
        self.conv0 = kokoyi.nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        self.conv1 = kokoyi.nn.Conv2d(mid_channels, out_channels, 3, 1, 1)

    def get_parameters(self):
        # Returned in the order (Conv2d_0, Conv2d_1) declared by the Kokoyi module.
        return self.conv0, self.conv1

    forward = kokoyi.symbol["ConvBlock"]


class CNN(torch.nn.Module):
    def __init__(self, channels, linear_dims):
        super().__init__()
        self.ConvBlock = ConvBlock(channels[0], channels[2], channels[1])
        self.MLP = MLP(linear_dims)

    def get_parameters(self):
        return self.ConvBlock, self.MLP

    forward = kokoyi.symbol["CNN"]

Note that we call kokoyi.nn.Conv2d, not torch.nn.Conv2d, in the ConvBlock definition. NN modules in Kokoyi are basically the same as NN modules in torch, but with some changes to facilitate auto-batching: as you may have noticed, all definitions in Kokoyi operate on a single sample, and auto-batching refers to the Kokoyi compiler's ability to batch samples automatically during training. There is a separate note on how to port PyTorch modules into Kokoyi, as this is a slightly advanced topic. For now, all you need to know is that configuring a Kokoyi module takes the same parameters as in PyTorch.
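
For example, the two constructors below take identical configuration arguments; the torch version is shown for comparison only, while the kokoyi version is the one used in ConvBlock above:

In [ ]:
conv_torch = torch.nn.Conv2d(1, 32, 3, 1, 1)    # in_channels, out_channels, kernel_size, stride, padding
conv_kokoyi = kokoyi.nn.Conv2d(1, 32, 3, 1, 1)  # same arguments, adapted for Kokoyi's auto-batching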

Now you can build a $CNN$ classifier by instantiating one. With padding 1, the two 3×3 convolutions preserve the 28×28 spatial size, the 2×2 max-pool halves it to 14×14, and there are 32 output channels, so the flattened feature vector fed into the $MLP$ has 32 * 14 * 14 = 6272 entries:

In [6]:
cnn = CNN([1, 32, 32], [32 * 14 * 14, 256, 10])

Loss. We use the standard cross entropy loss:

In [7]:
loss(\hat{y}, y) \gets \CrossEntropy(\hat{y}, y) \\

\[\require{color}\] \begin{array}{l} \mathit{loss}(\mathit{\hat{y}}, \mathit{y})\gets {\color{blue}\operatorname{CrossEntropy}}(\mathit{\hat{y}}, \mathit{y})\\[0pt] \end{array}
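
As a quick numeric illustration (assuming Kokoyi's $CrossEntropy$ behaves like PyTorch's F.cross_entropy on raw logits):

In [ ]:
# Cross entropy of one 3-class prediction: -log softmax(logits)[label]
logits = torch.tensor([[2.0, 0.5, -1.0]])
label = torch.tensor([0])
print(F.cross_entropy(logits, label))  # tensor(0.2414), i.e. -log(0.7856)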


Training loop and data pipeline

We will use MNIST from torchvision, which is basically a collection of 2D images of handwritten digits and their class labels (from 0 to 9). Let's continue with some basic setup.

In [ ]:
import torchvision
import torchvision.transforms as transforms

batch_size = 32

train_mnist = torchvision.datasets.MNIST(root='data/mnist', train=True, download=True,
                                                transform=transforms.ToTensor())
train_data_loader = torch.utils.data.DataLoader(train_mnist,
                                                batch_size=batch_size,
                                                shuffle=True, pin_memory=True, num_workers=4)
test_mnist = torchvision.datasets.MNIST(root='data/mnist', train=False, download=True,
                                        transform=transforms.ToTensor())
# No need to shuffle the test set; the order does not affect accuracy.
test_data_loader = torch.utils.data.DataLoader(test_mnist,
                                               batch_size=batch_size,
                                               shuffle=False, pin_memory=True, num_workers=4)

# Use GPU if possible
device_name = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using device:', device_name)
device = torch.device(device_name)
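
Each batch from the loader is a stack of images of shape (batch, 1, 28, 28) together with a vector of labels; a quick peek confirms this:

In [ ]:
batch_X, batch_y = next(iter(train_data_loader))
print(batch_X.shape, batch_y.shape)  # torch.Size([32, 1, 28, 28]) torch.Size([32])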

Now we can start training both the $MLP$ and $CNN$ models using a standard SGD training loop.

In [ ]:
from torch.optim import SGD

# Put our models on device
cnn = cnn.to(device)
mlp = mlp.to(device)
# Use SGD with momentum
optimizer_cnn = SGD(cnn.parameters(), lr=1e-2, momentum=0.9)
optimizer_mlp = SGD(mlp.parameters(), lr=1e-2, momentum=0.9)

for epoch in range(5):
    cnn_train_acc = 0
    mlp_train_acc = 0
    for batch_X, batch_y in train_data_loader:        
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        
        # zero the parameter gradients
        optimizer_cnn.zero_grad()
        optimizer_mlp.zero_grad()
        
        # forward; batch_level tells Kokoyi's auto-batcher how the inputs are batched
        cnn_pred_y = cnn(batch_X, batch_level=[1]).squeeze()
        mlp_pred_y = mlp(torch.flatten(batch_X, 1), batch_level=[1]).squeeze()
        
        # loss
        cnn_loss = kokoyi.symbol['loss'](cnn_pred_y, batch_y)
        mlp_loss = kokoyi.symbol['loss'](mlp_pred_y, batch_y)
        
        # backward + optimize
        cnn_loss.backward()
        optimizer_cnn.step()
        
        # backward + optimize
        mlp_loss.backward()
        optimizer_mlp.step()
        
        # count correct predictions for training accuracy
        cnn_train_acc += (torch.argmax(cnn_pred_y, dim=1) == batch_y).sum().item()
        mlp_train_acc += (torch.argmax(mlp_pred_y, dim=1) == batch_y).sum().item()

    # Normalize by the dataset size rather than assuming all batches are full.
    cnn_train_acc = cnn_train_acc / len(train_mnist)
    mlp_train_acc = mlp_train_acc / len(train_mnist)
    print('=' * 20 + ' epoch ' + str(epoch) + ' ' + '=' * 20)
    print('CNN Training accuracy : %.6f' % (cnn_train_acc))
    print('MLP Training accuracy : %.6f' % (mlp_train_acc))

Finally, we can measure the models' test accuracy with similar code.

In [ ]:
cnn_test_acc = 0
mlp_test_acc = 0

# Disable gradient tracking during evaluation.
with torch.no_grad():
    for batch_X, batch_y in test_data_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        cnn_p = cnn(batch_X, batch_level=[1])
        mlp_p = mlp(torch.flatten(batch_X, 1), batch_level=[1])
        # Softmax is monotonic, so it does not change the argmax; it is applied
        # here only to turn the logits into probabilities.
        cnn_pred_y = F.softmax(cnn_p.squeeze(), dim=1)
        mlp_pred_y = F.softmax(mlp_p.squeeze(), dim=1)
        cnn_test_acc += (torch.argmax(cnn_pred_y, dim=1) == batch_y).sum().item()
        mlp_test_acc += (torch.argmax(mlp_pred_y, dim=1) == batch_y).sum().item()

# Divide by the actual number of test samples; the last batch may be smaller than batch_size.
print('CNN Test accuracy: %.6f' % (cnn_test_acc / len(test_mnist)))
print('MLP Test accuracy: %.6f' % (mlp_test_acc / len(test_mnist)))