Image Classification with CNN

Wimukthi Madhusanka
4 min read · Oct 6, 2021


In this article, we will see how to build an image classifier for the SVHN dataset using a Convolutional Neural Network in TensorFlow.

A snapshot of the SVHN dataset.

Let us first look at what a CNN is. A convolutional neural network (CNN, or ConvNet) is a type of artificial neural network frequently used in deep learning to analyze visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), since they are built on the shared-weight architecture of convolution kernels or filters that slide along input features and generate translation-equivariant outputs known as feature maps.

Compared to other image classification methods, CNNs require very little pre-processing. In contrast to traditional methods, the network learns to optimize its filters (or kernels) automatically rather than relying on hand-engineered features. This independence from prior knowledge and human effort in feature extraction is a significant benefit of these models.

Now let’s have a look at the dataset we are going to use.

SVHN Dataset

SVHN is a real-world image dataset that is widely used for developing machine learning and object recognition algorithms, with minimal data preprocessing and formatting requirements. It is quite similar to the MNIST dataset (e.g., the images are of small cropped digits), but it includes an order of magnitude more labelled data (over 600,000 digit images) and comes from a considerably harder, unsolved real-world problem (recognizing digits and numbers in natural scene images). The dataset was created from house numbers in Google Street View photos. You can access the dataset here.

Let’s get started!!!

To begin, we’ll perform some imports and load the dataset.
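For example, the imports might look like this (the exact imports in the original notebook may differ; scipy.io.loadmat is assumed here for reading the SVHN .mat files):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat  # assumed here for reading the SVHN .mat files

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, CSVLogger
```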

Now let’s load the data and perform the necessary preprocessing steps.
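A minimal loading sketch, assuming the cropped-digit files train_32x32.mat and test_32x32.mat have been downloaded from the SVHN site into the working directory (file names and paths are assumptions):

```python
# Read the .mat files (adjust the paths to wherever you saved them).
train = loadmat('train_32x32.mat')
test = loadmat('test_32x32.mat')

x_train, y_train = train['X'], train['y']
x_test, y_test = test['X'], test['y']

# The images arrive as (32, 32, 3, num_samples) and the labels as (num_samples, 1).
print(x_train.shape, y_train.shape)
```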

We’ll visualize some data points and see what they look like.
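One way to do this is to plot a few random training images (a sketch; note that in SVHN the label 10 denotes the digit 0):

```python
# Show five random digits with their labels.
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, idx in zip(axes, np.random.randint(0, x_train.shape[-1], size=5)):
    ax.imshow(x_train[:, :, :, idx])
    ax.set_title(f"label: {y_train[idx, 0]}")
    ax.axis('off')
plt.show()
```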

Output:

Before feeding the data into the model, we need to change its shape. Then we normalize the data by dividing by 255.0 so that the model can converge much faster.
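A sketch of these two steps, assuming the arrays were loaded from the .mat files as above:

```python
# Move the sample axis to the front: (num_samples, 32, 32, 3).
x_train = np.moveaxis(x_train, -1, 0)
x_test = np.moveaxis(x_test, -1, 0)
y_train = y_train.flatten()
y_test = y_test.flatten()

# Scale pixel values from [0, 255] to [0, 1] so training converges faster.
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
```

Note that SVHN labels run from 1 to 10 (with 10 standing for the digit 0), which is presumably why the model below uses an 11-way softmax with sparse categorical cross-entropy: class index 0 simply goes unused, so no label remapping is needed.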

Now, since all the pre-processing is completed, we can initialize the model. This model will include two Conv2D layers activated by the rectified linear unit (ReLU) function, each followed by a MaxPool2D layer, and then two ReLU-activated dense layers with an 11-way softmax output.
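A sketch of such a model; the filter counts, kernel sizes, and dense layer widths below are assumptions, only the overall layer sequence follows the description above:

```python
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPool2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPool2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(11, activation='softmax'),  # 11 classes: SVHN labels 1-10, index 0 unused
])
```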

The architecture of this model is shown below.

The model architecture of the Image classifier

Let’s build the model and look at its summary.
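With the model defined, printing the summary is just:

```python
model.summary()
```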

Output:

Model summary

Now we can compile the model and define some callbacks so that we can save the progress of model training. When compiling the model, we’re going to use sparse categorical cross-entropy (“sparse_categorical_crossentropy”) as the loss function, and accuracy as the evaluation metric.
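A possible compile call might look like this (the optimizer isn’t specified above; Adam is assumed here):

```python
model.compile(
    optimizer='adam',  # optimizer is an assumption; Adam is a common default
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
```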

Here we will use two callbacks. The ModelCheckpoint callback is used to save the weights (in a checkpoint file) after each epoch, so the model or weights can be loaded later to continue training from the saved state. The CSVLogger callback streams epoch results to a CSV file.
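A sketch of these two callbacks (the file names are hypothetical):

```python
# Save the weights to a new checkpoint file after every epoch.
checkpoint = ModelCheckpoint(
    'weights.{epoch:02d}.weights.h5',  # hypothetical path; one file per epoch
    save_weights_only=True,
)
# Append each epoch's metrics to a CSV file.
csv_logger = CSVLogger('training_log.csv')
```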

Hooray!! Finally, we can fit the model. Let’s use 30 epochs here.
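Something along these lines, assuming the preprocessed arrays and the callbacks defined above:

```python
history = model.fit(
    x_train, y_train,
    epochs=30,
    validation_data=(x_test, y_test),  # track test-set metrics for the plots below
    callbacks=[checkpoint, csv_logger],
)
```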

Now we can check the model performance using the model.evaluate function.
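Evaluating on the test set might look like this:

```python
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")
```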

Output:

Let’s try to visualize how our training and test set accuracies have changed over the number of epochs. We can also plot the loss vs. epoch graph to see how the loss has decreased over the number of epochs.
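A sketch of the two plots, using the History object returned by model.fit:

```python
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Accuracy vs. epoch for the training and test (validation) sets.
ax1.plot(history.history['accuracy'], label='training')
ax1.plot(history.history['val_accuracy'], label='test')
ax1.set_xlabel('epoch')
ax1.set_ylabel('accuracy')
ax1.legend()

# Loss vs. epoch.
ax2.plot(history.history['loss'], label='training')
ax2.plot(history.history['val_loss'], label='test')
ax2.set_xlabel('epoch')
ax2.set_ylabel('loss')
ax2.legend()

plt.show()
```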

Output:

epoch vs accuracy plot and epoch vs loss plot

Looking at these graphs, we can see that accuracy on both the training and test sets increases with the number of epochs. However, the model has not performed as well on the test set as on the training set. This is likely due to overfitting: the model has fitted the training data so closely that it generalizes less well to new instances. We can mitigate this with regularization techniques such as L1 and L2, which we will cover in another article.

If you want to know more about modelling neural networks in TensorFlow, you can visit their website at tensorflow.org.

References

Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu and A. Y. Ng. “Reading Digits in Natural Images with Unsupervised Feature Learning”. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.


Wimukthi Madhusanka

Fourth-year undergraduate at the University of Moratuwa, Faculty of Business.