Overview of CH 2 and the Drivetrain Approach

Much like chapter 1, chapter 2 mixes a discussion of theoretical concepts with practical examples. The majority of the chapter is devoted to explaining how to build an image recognition application using the Bing Image Search API.

However, the topic I found most interesting was the Drivetrain Approach, which is essentially a way to design and implement applications that use Machine Learning.

In summary, it consists of four elements:

  1. Defined Objective

  2. Levers

  3. Data

  4. Model

Howard and Gugger describe the ultimate goal of data-driven apps as actionable outcomes, i.e. such apps should do more than just produce more data; they should enable us to actively do something. This is achieved by first properly defining an objective and then considering the levers, which are the actions necessary to achieve the objective. After that, you should consider what data is needed to evaluate the result and, finally, what form the programme, or model, should take so that it can make predictions.

To explore this method more fully, and from a slightly different angle, I will look at how it might apply to a small programme I wrote myself.

link to video lecture

A programme to generate text

Howard and Gugger repeatedly emphasise that it is important to create projects while studying their text - particularly projects related to work.

One project I’d been thinking about for some time was a programme to generate text for Amazon product descriptions. These texts are generally required by online marketplaces, such as Amazon, and customers do find it useful to understand the technical aspects of the products they’re buying, but is this something worth spending a lot of human hours on? While customers do need to know the technical aspects, these could all easily be put in a table. Also, while it is probably good to have more detailed descriptions that highlight the selling points of higher-value items, do customers really read these carefully?

As I mentioned, this was a topic I’d been thinking about for some time so, before considering whether it could be a good candidate for a Machine Learning approach, I will first discuss an alternative way to approach the task. I think seeing how this differs from ML will help illuminate the difference between ML and traditional programming styles.

Generating texts based on Markov analysis

As mentioned a few times before, I’m a big fan of Allen Downey’s Think Python book. One of its exercises is to write a programme that can imitate any author’s style via Markov Analysis. Markov Analysis in this context refers to estimating the probability that one word (or a longer series of words) follows another in a sequence. Essentially, a programme using this concept would process a sample of the author’s writing, determine the probability of one word (or a series of words) following another, and then output a new text after randomly choosing a starting word.
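To make the idea of "the probability that one word follows another" concrete, here is a minimal sketch in Python. It is not taken from Think Python or from my own solution; the function name `transition_probabilities` and the sample sentence are made up purely for illustration, and it only looks at single words rather than longer series.

```python
from collections import Counter, defaultdict


def transition_probabilities(text):
    """Estimate P(next word | current word) from a sample text."""
    words = text.split()
    counts = defaultdict(Counter)
    # Count how often each word is immediately followed by each other word.
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # Turn the raw counts into probabilities for each current word.
    return {
        current: {w: n / sum(followers.values()) for w, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical sample text, standing in for an author's writing.
sample = "the card is fast and the card is quiet and the fan is quiet"
probs = transition_probabilities(sample)
print(probs["the"])
print(probs["is"])
```

In this sample "the" is followed by "card" two times out of three and by "fan" once, so a generator drawing on these probabilities would pick "card" after "the" about twice as often as "fan".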

Based on this case study, I wrote a solution that generates a product description for Amazon from several existing GPU product descriptions. You can see my solution here and can try it out with any text in txt or csv format.

Returning to the main discussion, what I think is interesting about this example is that it doesn’t actually use any concepts from Machine Learning but produces quite readable, albeit inaccurate, texts. In fact, as I assembled product descriptions to write this programme, I started to wonder if sellers on Amazon are already using such programmes to generate their texts.

You can read the full description of how this programme works on my GitHub but, in brief, it creates a dictionary that maps each prefix to the suffixes that follow it. Using Python’s random module, it is then able to choose a suffix to follow a prefix based on how frequently that suffix follows that prefix in the sample of the author’s work. All that’s required to start the process off is for the programme to randomly select an initial word from the author’s existing text.
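The mechanism described above can be sketched in a few lines. This is a simplified illustration under my own assumptions, not the code from the GitHub repository: the names `build_chain` and `generate`, the `order` parameter (how many words make up a prefix) and the sample sentence are all hypothetical.

```python
import random


def build_chain(words, order=2):
    """Map each prefix (a tuple of `order` words) to the words seen after it."""
    chain = {}
    for i in range(len(words) - order):
        prefix = tuple(words[i:i + order])
        # Repeated suffixes are kept on purpose: duplicates in the list make
        # random.choice pick a suffix in proportion to how often it occurred.
        chain.setdefault(prefix, []).append(words[i + order])
    return chain


def generate(chain, length=30, seed=None):
    """Generate at most `length` words, starting from a randomly chosen prefix."""
    rng = random.Random(seed)
    prefix = rng.choice(list(chain))
    output = list(prefix)
    while len(output) < length:
        suffixes = chain.get(prefix)
        if not suffixes:
            # This prefix only appeared at the very end of the sample.
            break
        word = rng.choice(suffixes)
        output.append(word)
        # Slide the prefix window forward by one word.
        prefix = prefix[1:] + (word,)
    return " ".join(output[:length])


# Hypothetical sample text; order=1 means each prefix is a single word.
sample = "the card is fast and the card is quiet and the fan is quiet"
chain = build_chain(sample.split(), order=1)
print(generate(chain, length=12, seed=0))
```

Storing every observed suffix in a plain list is the simplest way I know to get frequency-weighted choices without computing probabilities explicitly. A larger `order` tends to give more coherent text but needs a bigger sample to avoid simply copying the source.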

What I like about this programme is that it’s incredibly simple: it can create a plausible reproduction of an author’s writing by relying only on the assumption that a particular word is likely to follow another due to the author’s style. In fact, reversing the input and output of this programme could be extremely useful, i.e. using a programme to identify the author of an unknown source by examining the frequency with which certain words follow others.

However, this is supposed to be a blog post about Machine Learning so I will now consider (although not code up, well not yet anyway) how this basic programme could be enhanced by ML.

Applying ML to this Programme

Following the Drivetrain Approach, for this programme to implement ML, I would set an objective of something like “Generate Amazon product descriptions that customers find engaging”. This could be further refined to something like “Generate engaging texts that prompt customers to make purchases” or “Generate engaging texts that prompt customers to make purchases that they might not make otherwise”. I think the latter two would be extremely difficult to evaluate and might require thinking through other aspects of the online shopping experience, such as product recommendations.

Sticking with the first objective for now, a simple lever to check that our audience finds the content engaging would be to include a like button on the page. Texts that the audience finds engaging, i.e. those that get a lot of likes, could be compared with texts that don’t rank so well. By focusing on the differences between such texts, a model could be trained to understand what is and is not engaging and generate more engaging texts accordingly.

The data we would then track is whether customers like these texts more than previously generated ones.

It’s when we begin to consider the form of the model that problems begin to develop. The main one I anticipate is that what audiences find engaging and what is factually accurate don’t necessarily match. This could be a real problem for something like GPUs or CPUs, where describing the specs accurately (so that customers know what they’re buying) is extremely important.

An alternative approach might be to have professional content writers edit the texts generated by the programme. In this case, the data would be the changes the writers make to the model’s generated texts. The generated and edited texts could be compared and, over time, the model could be trained to produce texts that are both accurate and engaging. Eventually, the customer likes data could also be integrated to make even more intricate comparisons.
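As a rough sketch of what "the changes the writers make" could look like as data, the standard library’s difflib can compare a generated text with its edited version word by word. The function name `writer_edits` and both example sentences are invented for illustration; a real pipeline would need something more robust than splitting on whitespace.

```python
import difflib


def writer_edits(generated, edited):
    """Return the word-level changes a writer made to a generated text."""
    before, after = generated.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans that were replaced, deleted or inserted.
        if tag != "equal":
            edits.append((tag, " ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return edits


# Made-up example: the writer corrects a spec and swaps a vague adjective.
generated = "This card has 8GB of blazing memory"
edited = "This card has 12GB of GDDR6 memory"
for edit in writer_edits(generated, edited):
    print(edit)
```

Each tuple records the kind of change, the original words and the writer’s replacement, which is the sort of before-and-after pair that could later be collected as training data.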

Having said that, I suspect this approach might not work in the end since a team of writers, or perhaps just a single writer, may end up making too many minute adjustments for the model to ever generate fully accurate and engaging texts. But if the model could at least provide writers with a good starting point, which they could edit and adjust, that may be enough to save considerable time.

Implications

Another recommendation from Howard and Gugger is to consider what would happen if your idea worked ‘really well’. In this example, I feel that we have to be quite precise in describing the implications.

For instance, if I were to take Howard and Gugger’s recommendation at face value, we would have a model that could create truly accurate and engaging texts and this, in turn, would remove the need to have people writing such texts.

This is a stark conclusion. However, I think it is highly unlikely that such a programme could be developed, at least in the near term. I believe this to be the case because, as mentioned above, getting a programme to produce content that is both engaging and accurate would require so many subtle changes that they become impossible to model.

If we take a step back, though, and consider the possibility of creating a model that generates either accurate or compelling product descriptions, I do think that is very possible.

For instance, some descriptions will likely be very repetitive, with only minute adjustments necessary to reflect different variants of the product e.g. length, memory capacity, power requirements. Such texts could easily be generated by a simple programme using templates and spreadsheets of the different specifications. Other models could create more interesting and engaging texts. In this situation, there would still be a need for human intervention to check that the texts the model is producing are accurate and logical.
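A template-and-spreadsheet programme of the kind mentioned above could be as small as the following sketch. The template wording, the column names and the two product rows are all made up for illustration; in practice the rows would come from a real spreadsheet exported as CSV.

```python
import csv
import io

# Hypothetical template; the placeholders must match the CSV column names.
TEMPLATE = (
    "The {name} comes with {memory_gb} GB of memory, "
    "measures {length_mm} mm in length and needs a {power_w} W power supply."
)

# Made-up rows standing in for a spreadsheet exported as CSV.
SPECS_CSV = """name,memory_gb,length_mm,power_w
Example GPU A,8,242,550
Example GPU B,12,285,650
"""


def describe_products(csv_text, template=TEMPLATE):
    """Fill the template once per spreadsheet row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [template.format(**row) for row in rows]


for description in describe_products(SPECS_CSV):
    print(description)
```

Because every value is copied straight from the spreadsheet, the output is only as accurate as the spreadsheet itself, but the programme cannot invent a specification the way a generative model might.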

Even though the model here does not meet all our requirements, it would free up considerable time, which would enable human writers to focus on more complex and intricate tasks related to promoting products, such as blogging, video commentary or community building directly with customers or influencers.