This notebook implements a classifier of MNIST digits. This is an example where a model maps a tensor to a discrete value, i.e. $f: \mathbb{R}^{d_1\times d_2} \rightarrow \{0,\ldots,9\}$. We will implement two classifiers, one a multi-layer perceptron (MLP) and the other a convolutional neural network (CNN).
Let's start with an MLP, where $x$ is the input and $W, b$ are the parameters (to be learned). We will use $\text{ReLU}$ as the non-linear activation function, which is a built-in function available in Kokoyi.
\Module{MLP}{x; W, b}
L \gets |W| \\
h[0 \leq i \leq L] \gets
\begin{cases}
x & i = 0 \\
\ReLU (W[i-1] \cdot h[i-1] + b[i-1]) & i < L \\
W[i-1] \cdot h[i-1] + b[i-1] & otherwise \\
\end{cases} \\
\Return h[L] \\
\EndModule
Here we first get the number of layers $L$, and then define the transformation per layer. Note how layers (and arrays in general) are defined and indexed with brackets. The pattern here is very much like recursion: 1) specify the boundary conditions (using cases), 2) write the transition function (on the right-hand side), and 3) take care of the iteration range (on the left-hand side).
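For intuition only, here is a rough plain-PyTorch sketch of what the $MLP$ module above computes on a single sample $x$; the actual forward pass is generated by Kokoyi from the definition, and the function name below is just illustrative.
import torch
import torch.nn.functional as F

def mlp_forward_sketch(x, W, b):
    # Rough single-sample reading of the Kokoyi MLP module above:
    # h[0] = x; h[i] = ReLU(W[i-1] h[i-1] + b[i-1]) for 0 < i < L; no ReLU on the last layer.
    L = len(W)
    h = x
    for i in range(1, L + 1):
        z = W[i - 1] @ h + b[i - 1]
        h = F.relu(z) if i < L else z
    return h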
Initialization. We have just used Kokoyi to write our module. Its initialization is just like that of a standard PyTorch module, and Kokoyi automatically establishes the names of the modules:
import kokoyi
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(torch.nn.Module):
    def __init__(self, dims):
        super().__init__()
        # One weight matrix and one bias vector per layer, bound to W and b in the Kokoyi definition.
        self.W = nn.ParameterList([nn.Parameter(torch.empty(dims[i + 1], dims[i])) for i in range(len(dims) - 1)])
        self.b = nn.ParameterList([nn.Parameter(torch.empty(dims[i + 1])) for i in range(len(dims) - 1)])
        for param in self.W:
            nn.init.xavier_uniform_(param)
        for param in self.b:
            nn.init.uniform_(param)

    def get_parameters(self):
        # Return the parameters as a tuple in the order (W, b), matching the module signature.
        return self.W, self.b

    # The forward pass is generated by Kokoyi from the MLP module defined above.
    forward = kokoyi.symbol["MLP"]
However, you can also let Kokoyi set up the skeleton and do the filling yourself: while on a cell containing a Kokoyi module, just hit the button in the top menu.
class MLP(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.W = None
        self.b = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (W, b)."""
        return None

    forward = kokoyi.symbol["MLP"]
Now you can build an $MLP$ classifier by instantiating one:
dims = [1 * 28 * 28, 256, 10]
mlp = MLP(dims)
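As an optional sanity check, we can print the parameter shapes returned by get_parameters; with dims = [784, 256, 10] we expect a 256×784 and a 10×256 weight matrix.
# Optional sanity check: inspect the shapes returned by get_parameters().
W, b = mlp.get_parameters()
for i, (w_i, b_i) in enumerate(zip(W, b)):
    print(f'layer {i}: W {tuple(w_i.shape)}, b {tuple(b_i.shape)}')
# With dims = [784, 256, 10] this should print (256, 784)/(256,) and (10, 256)/(10,).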
Before we move on, let's define a model that uses a CNN in addition to the MLP. CNNs process images much more efficiently by exploiting the inductive bias inherent in visual data.
We will use a $ConvBlock$ module to extract features from the input images and generate new features to feed into the $MLP$ module. $ConvBlock$ consists of two convolutions, each followed by a rectified linear unit ($ReLU$). Note that $Flatten$ is a built-in function of Kokoyi, which flattens its input by reshaping it into a one-dimensional tensor.
\Module{ConvBlock}{x; Conv2d_0, Conv2d_1}
\Return \ReLU(Conv2d_1(\ReLU(Conv2d_0(x)))) \\
\EndModule
\Module{CNN}{x; ConvBlock, MLP}
h_c \gets \MaxPool2d(ConvBlock(x), 2)\\
h \gets \Flatten(h_c) \\
\hat{y} \gets MLP(h) \\
\Return \hat{y} \\
\EndModule
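Again for intuition only, the $CNN$ module above corresponds roughly to the following single-sample computation in plain PyTorch; conv_block and mlp below stand in for the two sub-modules, and the function is illustrative rather than part of the model.
import torch
import torch.nn.functional as F

def cnn_forward_sketch(x, conv_block, mlp):
    # Rough single-sample reading of the Kokoyi CNN module above.
    h_c = F.max_pool2d(conv_block(x), 2)  # 2x2 max pooling halves the spatial dims
    h = torch.flatten(h_c)                # flatten to a 1-D feature vector
    return mlp(h)                         # class scores from the MLP head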
Similarly, you need to initialize the CNN module, as in the cell below, or you can use the auto-init feature.
class ConvBlock(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.Conv2d_0 = None
        self.Conv2d_1 = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (Conv2d_0, Conv2d_1)."""
        return None

    forward = kokoyi.symbol["ConvBlock"]

class CNN(torch.nn.Module):
    def __init__(self):
        """ Add your code for parameter initialization here (not necessarily the same names)."""
        super().__init__()
        self.ConvBlock = None
        self.MLP = None

    def get_parameters(self):
        """ Change the following code to return the parameters as a tuple in the order of (ConvBlock, MLP)."""
        return None

    forward = kokoyi.symbol["CNN"]
# Initialize the CNN module
class ConvBlock(torch.nn.Module):
    def __init__(self, in_channels, out_channels, mid_channels=None):
        super().__init__()
        if not mid_channels:
            mid_channels = out_channels
        # 3x3 convolutions with stride 1 and padding 1, configured with the same arguments as torch.nn.Conv2d.
        self.conv0 = kokoyi.nn.Conv2d(in_channels, mid_channels, 3, 1, 1)
        self.conv1 = kokoyi.nn.Conv2d(mid_channels, out_channels, 3, 1, 1)

    def get_parameters(self):
        # Returned in the order (Conv2d_0, Conv2d_1) expected by the Kokoyi module.
        return self.conv0, self.conv1

    forward = kokoyi.symbol["ConvBlock"]

class CNN(torch.nn.Module):
    def __init__(self, channels, Linear_dims):
        super().__init__()
        self.ConvBlock = ConvBlock(channels[0], channels[2], channels[1])
        self.MLP = MLP(Linear_dims)

    def get_parameters(self):
        return self.ConvBlock, self.MLP

    forward = kokoyi.symbol["CNN"]
Note that we call kokoyi.nn.Conv2d, not torch.nn.Conv2d, in the ConvBlock definition. NN modules in Kokoyi are essentially the same as NN modules in torch, but with some changes to facilitate auto-batching: as you may have noticed, all definitions in Kokoyi are written for a single sample, and auto-batching refers to the Kokoyi compiler's ability to batch samples automatically during training. There is a separate note on how to port PyTorch modules into Kokoyi, as this is a slightly more advanced topic. For now, all you need to know is that configuring a Kokoyi module takes the same parameters as in PyTorch.
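This is also why, in the training and evaluation loops below, the modules defined above on a single sample are called directly on whole batches, with the extra batch_level=[1] argument.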
Now you can build a $CNN$ classifier by instantiating one:
cnn = CNN([1, 32, 32], [32 * 14 * 14, 256, 10])
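To see where the $32 \cdot 14 \cdot 14$ comes from: a $1 \times 28 \times 28$ MNIST image goes through $ConvBlock$, whose $3 \times 3$, stride-1, padding-1 convolutions keep the $28 \times 28$ spatial size while expanding to 32 channels; the $2 \times 2$ max pooling then halves this to $14 \times 14$, so flattening gives $32 \cdot 14 \cdot 14 = 6272$ features for the $MLP$ head.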
Loss. We use the standard cross-entropy loss:
loss(\hat{y}, y) \gets \CrossEntropy(\hat{y}, y) \\
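Here $\hat{y}$ is the vector of raw class scores (logits) produced by the model and $y$ is the integer label. Assuming $\CrossEntropy$ behaves like the standard PyTorch cross entropy on logits, it combines a softmax with the negative log-likelihood of the true class, i.e. $\CrossEntropy(\hat{y}, y) = -\log \frac{\exp(\hat{y}[y])}{\sum_{j} \exp(\hat{y}[j])}$, which is why the models above return raw scores rather than probabilities.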
import torchvision
import torchvision.transforms as transforms

batch_size = 32
train_mnist = torchvision.datasets.MNIST(root='data/mnist', train=True, download=True,
                                         transform=transforms.ToTensor())
train_data_loader = torch.utils.data.DataLoader(train_mnist,
                                                batch_size=batch_size,
                                                shuffle=True, pin_memory=True, num_workers=4)
test_mnist = torchvision.datasets.MNIST(root='data/mnist', train=False, download=True,
                                        transform=transforms.ToTensor())
test_data_loader = torch.utils.data.DataLoader(test_mnist,
                                               batch_size=batch_size,
                                               shuffle=True, pin_memory=True, num_workers=4)
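Each batch produced by these loaders is a pair of an image tensor of shape $(32, 1, 28, 28)$ (pixel values scaled to $[0, 1]$ by ToTensor) and a label tensor of shape $(32,)$; for the $MLP$, each image is flattened to $1 \cdot 28 \cdot 28 = 784$ features before the forward pass.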
# Use GPU if possible
if torch.cuda.is_available():
    device_name = 'cuda'
else:
    device_name = 'cpu'
print('Using device: ', device_name)
device = torch.device(device_name)
Now we can start training both our $MLP$ model and $CNN$ model using the standard SGD training loop.
from torch.optim import SGD

# Put our models on device
cnn = cnn.to(device)
mlp = mlp.to(device)

# Use SGD with momentum
optimizer_cnn = SGD(cnn.parameters(), lr=1e-2, momentum=0.9)
optimizer_mlp = SGD(mlp.parameters(), lr=1e-2, momentum=0.9)

for epoch in range(5):
    cnn_train_acc = 0
    mlp_train_acc = 0
    for batch_X, batch_y in train_data_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        # zero the parameter gradients
        optimizer_cnn.zero_grad()
        optimizer_mlp.zero_grad()
        # forward
        cnn_pred_y = cnn(batch_X, batch_level=[1]).squeeze()
        mlp_pred_y = mlp(torch.flatten(batch_X, 1), batch_level=[1]).squeeze()
        # loss
        cnn_loss = kokoyi.symbol['loss'](cnn_pred_y, batch_y)
        mlp_loss = kokoyi.symbol['loss'](mlp_pred_y, batch_y)
        # backward + optimize
        cnn_loss.backward()
        optimizer_cnn.step()
        mlp_loss.backward()
        optimizer_mlp.step()
        # count train acc
        cnn_train_acc += sum(torch.where(torch.argmax(cnn_pred_y, dim=1) == batch_y, 1, 0))
        mlp_train_acc += sum(torch.where(torch.argmax(mlp_pred_y, dim=1) == batch_y, 1, 0))
    # normalize by the number of training samples
    cnn_train_acc = 1. * cnn_train_acc / len(train_mnist)
    mlp_train_acc = 1. * mlp_train_acc / len(train_mnist)
    print('=' * 20 + ' epoch ' + str(epoch) + ' ' + '=' * 20)
    print('CNN Training accuracy : %.6f' % (cnn_train_acc))
    print('MLP Training accuracy : %.6f' % (mlp_train_acc))
Finally, we can test the model accuracy with similar code.
cnn_test_acc = 0
mlp_test_acc = 0
for batch_X, batch_y in test_data_loader:
    batch_X, batch_y = batch_X.to(device), batch_y.to(device)
    cnn_p = cnn(batch_X, batch_level=[1])
    mlp_p = mlp(torch.flatten(batch_X, 1), batch_level=[1])
    cnn_pred_y = F.softmax(cnn_p.squeeze(), dim=1)
    mlp_pred_y = F.softmax(mlp_p.squeeze(), dim=1)
    cnn_test_acc += sum(torch.where(torch.argmax(cnn_pred_y, dim=1) == batch_y, 1, 0))
    mlp_test_acc += sum(torch.where(torch.argmax(mlp_pred_y, dim=1) == batch_y, 1, 0))
# normalize by the true test-set size (10,000 is not a multiple of batch_size, so the last batch is smaller)
print('CNN Test accuracy: %.6f' % (1. * cnn_test_acc / len(test_mnist)))
print('MLP Test accuracy: %.6f' % (1. * mlp_test_acc / len(test_mnist)))
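Note that softmax is monotonic, so taking the argmax of the raw scores would give the same predictions; the softmax here only turns the scores into probabilities.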