Intro

After taking a break from this project, I decided to round out the year by making another attempt. There is more that I could do to refine this programme, which I come back to after discussing the solution, but overall I feel that I have completed the exercise I set myself.

I’ll remind you of the problem and go over the solution in detail below. In summary, the main features I used to complete the exercise were a NumPy array holding a boolean pattern and a lambda function to isolate the elements in the data that match this pattern. So if you want to see an application of these features, read on…

Summary of progress to date

As a recap, this project is based on an exercise from Chapter 4 of Deep Learning for Coders. Chapter 4 consists of a walkthrough on how to create a programme that can recognise a handwritten 3 or 7 based on the MNIST database, which contains thousands of images of handwritten digits.

The exercise was to implement a solution for digits 0-9. I felt this was too much of a challenge for me given my coding and math skills, so I decided to re-implement a simpler task from the chapter: creating a composite image for a digit based on the mean of all images of that digit in the dataset.

My initial attempt fell short of the mark. Later I did find a solution based on a PyTorch module. However, I was never entirely happy with it, since the point of the exercise was to develop a more fundamental understanding of how a programme could transform data to create a composite image.

So what I have now is a notebook that can accept a digit from 0-9 and create a composite image of that digit based on the averages from the MNIST database.

Why this implementation works

Before explaining how my solution works, I will briefly touch on how the MNIST database is structured. As mentioned, it is a collection of thousands of handwritten digits 0-9. Each image is in greyscale and made up of a 28x28 grid of pixels in various shades of grey.

As with my prior attempts, I am not processing the data directly. Instead, I am using a CSV file that specifies the integer represented in each sample, followed by a numeric version of the pixel shades for that sample’s 28x28 grid. The CSV file has the following structure. As you can see, each row holds an integer label and all the pixel values from position 1x1 to 28x28.

image of spreadsheet
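To make the layout concrete, here is a minimal sketch of reading a file with this shape into NumPy and splitting the label column from the pixel columns. The header names, the three-row excerpt and the use of np.loadtxt are assumptions for illustration, not the contents of the real CSV or the code in my notebook.

```python
import io
import numpy as np

# Hypothetical excerpt: the real file has one label plus 784 pixel columns per row.
csv_text = """label,1x1,1x2,1x3
5,0,128,255
0,0,64,32
5,0,200,255
"""

# skiprows=1 assumes a single header line; dtype=int keeps whole-number pixel values.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=int)

labels = data[:, 0]   # the digit each row represents
pixels = data[:, 1:]  # the pixel values belonging to that row
```

With a real file, a path string would take the place of the io.StringIO object.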

Visually each digit in the data source has an appearance like this:

image of sample digit

The broad idea in this solution is much the same as in my initial attempt. That is to say, isolate the rows for a given digit 0-9 and then, for each position from 1x1 to 28x28, take the average pixel value across those images. Visually the process would look something like this, with red dots representing the values I want to isolate:

diagram showing process of isolation and combination 1

diagram showing process of isolation and combination 2

With my initial attempt, I tried to create a dictionary for the integer labels and set all the arrays of the pixel values as keys. I could not find a way to append the arrays to the dictionary, so I decided to take a different approach.

After reviewing some chapters in Wes McKinney’s Python for Data Analysis, I noticed that it is possible to select elements of a NumPy array using a pattern of boolean values. My thought then was to create an array from the labels that marks one digit, say 5, as true and the rest as false. Since each row in the CSV file consists of the label followed by all the pixel values associated with that digit, it should then be a matter of applying this boolean pattern to each column. Visually this would look like the following, with the pattern set in the first column and then repeated across the subsequent columns:

image of iterating with boolean
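A minimal sketch of the boolean pattern, using a toy array in place of the CSV data. The array contents, the four-pixel width and the name keep_fives are all assumptions made for illustration; my notebook may differ in the details.

```python
import numpy as np

# Toy stand-in for the CSV: first column is the label, the rest are pixel values.
# Real rows would have 784 pixel columns; 4 are used here for brevity.
data = np.array([
    [5, 0, 10, 20, 30],
    [3, 1, 1, 1, 1],
    [5, 0, 30, 40, 50],
    [7, 9, 9, 9, 9],
])

labels = data[:, 0]
pixels = data[:, 1:]

# Boolean pattern: True where the label is the digit of interest, False elsewhere.
mask = labels == 5

# Hypothetical lambda that keeps only the entries of one column that match the pattern.
keep_fives = lambda column: column[mask]
first_pixel_column = keep_fives(pixels[:, 0])

# The same pattern applied to every pixel column at once.
fives = pixels[mask]
```

Indexing with the boolean array keeps the rows where the pattern is True and drops the others, so fives holds only the pixel values of the rows labelled 5.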

Some further manipulation was required before I could take the average. As it currently stands, the arrays I’m using might be thought of as vertical slices, when in fact I want horizontal slices to compare each position of the pixel value across the arrays. Fortunately, NumPy has a transpose function that makes this trivial.

image of switching orientation
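As a small illustration of the switch in orientation, the sketch below transposes a toy array. It assumes the selected data starts with one image per row, which is an assumption about the layout rather than a description of my notebook.

```python
import numpy as np

# Two hypothetical images of four pixels each, one image per row.
images = np.array([
    [0, 10, 20, 30],
    [0, 30, 40, 50],
])

# .T swaps rows and columns (np.transpose(images) gives the same result):
# each row now holds one pixel position across all images.
by_position = images.T
```

After the transpose, by_position[1] contains the second pixel of every image, which is the grouping needed to average position by position.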

The last step is just to use NumPy’s mean function, specifying an axis of 1 so that a new array is created based on the mean of each value across the arrays. For instance, here NumPy would take the values at index zero in each of the arrays and compute their mean.

illustration of taking the mean
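The averaging step can be sketched on a toy array as follows. The layout is an assumption: each row holds one pixel position and each column one image, so axis=1 averages across the images.

```python
import numpy as np

# Hypothetical layout: one row per pixel position, one column per image.
by_position = np.array([
    [0, 0],
    [10, 30],
    [20, 40],
    [30, 50],
])

# axis=1 collapses the columns, leaving one mean per pixel position.
composite = np.mean(by_position, axis=1)
```

If the data were laid out the other way round, with one image per row, axis=0 would give the same per-position means without a transpose.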

The ultimate result is an image like this in the case of 5:

image of composite 5
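Before the averaged values can be shown as an image, they have to be arranged back into the 28x28 grid. A minimal sketch, assuming the averages sit in a flat array of 784 values ordered row by row (the ordering is an assumption about the CSV columns, and the placeholder values are not real results):

```python
import numpy as np

# Placeholder for the 784 averaged pixel values; real values would come from the mean step.
composite = np.arange(28 * 28, dtype=float)

# Arrange the flat array into 28 rows of 28 pixels.
grid = composite.reshape(28, 28)
```

The resulting grid can then be passed to whatever plotting tool is used to display the image.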

As one might expect from a composite of thousands of handwritten 5s, it is rather blurry. You can try this out yourself: just download my notebook and the CSV data source, and make sure they’re in the same directory.

Further thoughts

My greatest learning from this solution is the capability of NumPy. I am still using Python’s built-in list function in part. At some point, I would like to reimplement this project using only arrays.

The purpose of the exercise as outlined in Deep Learning for Coders was to create a programme capable of recognising handwritten digits. With this exercise I have completed the simpler task I set myself rather than that one, so I may at some point return to this and attempt to properly implement a deep learning programme.