Comment on Chapter 1

Note on style - this initial blog post will be divided into three sections. In the first I will review what I understand to be the key points from the first chapter of the fastai book. In the second I will mention how, following the authors’ instructions, I setup a ‘workspace’ for DL. And finally I’ll discuss some open questions I have.

Lesson 1 - Video Lecture

Key points from Chapter 1

With this chapter Howard and Gugger provide a useful overview of the subject of Deep Learning, some extremely useful tips on how to setup a development environment for DL coding and analysis (more on that below) and provide a summary of their approach to teaching DL.

The authors make great efforts to make the subject approachable, emphasising that neither advanced qualifications nor high level coding ability are necessary to implement Deep Learning techniques. Indeed, they say right from the beginning that they intend to give readers a sense of ‘the complete game’. I can say from experience of other courses or books in this area that this is a refreshingly different approach.

They also provide a good overview of the history of the discipline - from McCulloch and Pitts’ notion of an artificial neuron to more contemporary concepts such as Parallel Distributed Processes (PDP).

For me the most interesting topic is the difference between Machine Learning and more traditional forms of programming. From what I understand, traditional programming is based on the notion that inputs from the user will go through a function (defined by the programmer) and specific output(s) will be the result.

This approach works well when, for example, we want to automate repetitive tasks. However it is not suitable for more complex or conceptual tasks, such as recognising the difference between a cat or dog, imitating a particular author’s writing style or making a good movie recommendation.

The reason traditional approaches don’t work here is that a programmer would need to specify every single aspect relevant to the task. A far better approach then is one were the machine itself can ‘learn’ i.e. the programme (or model) has a process inherent to itself that enables it to output a result that is intelligible and accurate to humans.

To make this clearer, I briefly touch on the solution that ultimately caught on, which was conceived by Arthur Samuel and called Machine Learning. ML essentially involves taking data, weighting it in some way (through labelling the data for instance) and training the programme (now referred to as a model) to recognise patterns within that data. This process will repeat, with the weights adjusting through each cycle, until the programmer considers the programme sufficiently accurate.

Interestingly once the model is trained, it can be used in the manner of a traditional programme. That means novel data can be introduced, without weighting, and the model will then make predictions - again, for example, whether a picture shows a cat or dog.

While this training process is based on repetition, training to frequently on the same data set will actually decrease accuracy. This is a situation known as over-fitting where the model makes predications to close to its training data.

My summary here may sound very theoretical but Howard and Gugger present their account in quite a practical fashion with lots of coding and real life examples.

Setting up a DL coding environment

One surprising element of this course is the great advice that Howard and Gugger offer on how to set up your working environment for DL projects. In fact, this may have been the feature that ultimately persuaded me to work through the entire course.

The best thing to do is check their site for the details link but just to give a brief summary of what they describe: Haward and Gugger have developed a framework called fastai that enables users to access DL techniques in PyTorch in a much more straightforward manner than is possible than through coding for PyTorch directly. (I’ve not personally used PyTorch but this is my understanding of what fastai does.) One consequence of this is that a GPU is required for fastai to function.

I do not have a GPU in my rather cheap Lenovo laptop but not to worry as the authors provide a very useful, and thorough, guide on how to set up a cloud system to run the course exercises, which are written in standard Jupyter Notebooks (although Google Colab versions are also available).

I would emphasise that this is in no way an intimidating or difficult process, in fact I’m almost stunned by how easy it is to set up a cloud computer.

Personally I’m using a free service called Gradient which is offered by a company known as Paperspace. While there are some restrictions (such as the cloud system shutting down automatically after 6 hours) Paperspace have integrated fastai’s Jupyter Notebooks into their service, which means you can jump right in once you’ve set it up.

The only complaint I have, and it’s very minor, is that for Paperspace do give you the option of running the course notebooks on systems without GPUs, which makes no sense since the notebooks won’t work without GPUs. Also a few times, when I’ve started the virtual machine, it seems to have automatically selected the CPU system.

Anyway I certainly can’t complain, this is a great service and I think Paperspace should be given some credit for making DL this accessible for free to basically anyone.

As a side note, another extremely useful discussion on setting up your coding environment can be found in Wes McKinney’s Python for Data Analysis. McKinney’s text focuses far more on the mechanics of data analysis, personally I see it more as a reference book than something I would read cover to cover. Nevertheless the opening chapter were he discusses the basics modules required for doing data analysis in Python is something I return to every time I setup a new PC.

Open questions

One point I’m not entirely clear on is how the theoretical notion of artificial neurons can produce intelligible results. Having done some other more mathematical focused courses on this subject I believe the reason that artificial neurons, in particular through layering, can do this is to with the fact they can be thought of as strings of matrix multiplications. This allows the system to compute linear algebra, which enable the computer to predict an answer.

I’m not completely happy with my own explanation here. I do think at some point I will have to do a deep dive on the Maths but for now this is my understanding.