Reading Notes - Ch. 2 - Fastbook

First, some advice...

Underestimating the constraints and overestimating the capabilities of deep learning may lead to frustratingly poor results. Conversely, overestimating the constraints and underestimating the capabilities of deep learning may mean you do not attempt a solvable problem because you talk yourself out of it.

When starting your deep learning journey, don't try to look for the perfect dataset or the perfect project. Start with a small project that is exciting to you and keep iterating, learning and improving!

...if you take this approach, you will be on your third iteration of learning and improving while the perfectionists are still in the planning stages!

Also, when iterating, ensure that you are spending enough time on all stages of the project, from data collection through model training all the way to deployment.

By completing the project end to end, you will see where the trickiest bits are, and which bits make the biggest difference to the final result.

Humans and Machines

Deep learning models are capable of doing some truly amazing things, from classifying images and text to generating images from captions and vice versa. However, although the results of deep learning models might be context-appropriate, there is no guarantee that they will be correct. If you don't believe me, take a look at what this deep learning model has to say about recycling!

The best way to leverage deep learning is to make it part of a process in which a human user can interact with it closely. This greatly increases the productivity of human beings and reduces the risk associated with a deep learning model operating on its own in the wild.

The Drivetrain Approach

Many accurate models are of no use to anyone, and many inaccurate models are highly useful.

  1. Consider what your objective is
  2. Think about what actions you can take to meet that objective (levers)
  3. Think about what data you have, or can acquire, that can help
  4. Build a model to determine the best action to take to obtain the best results to meet your objective

The goal of the Drivetrain Approach is to produce "actionable outcomes" instead of just predictions.

Biased Data

A model is only as good as the data used to train it. The world is full of biased data so we need to be careful not to let this bias creep in when we are creating our dataset.

Be sure to think carefully about the types of data that you might expect to see in practice in your application, and check carefully to ensure that all these types are reflected in your model's source data.

DataLoaders and DataBlock in Fastai

A DataBlock provides a template to create an instance of a DataLoaders object.

bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(128))

  • ImageBlock: the independent variable in our dataset is going to be images.
  • CategoryBlock: the dependent variable, i.e. the value we are trying to predict (type of bear) is of a categorical type.
  • get_items=get_image_files: use the get_image_files function to recursively collect image file paths from the location we provide (the path passed to dataloaders below).
  • splitter: specify how to split the dataset into training and validation sets. We put aside 20% of the data for the validation set. The seed argument ensures we get the same train/validation split every time we run this code, which is important for reproducibility.
  • get_y=parent_label: use the name of the parent folder of each image as the label.
  • item_tfms=Resize(128): resize each image to 128 pixels high and wide.
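
If the template is misconfigured, fastai's DataBlock.summary method can walk through the whole pipeline on one sample and report each step, which makes debugging easier. A sketch, assuming path points at the downloaded images (as in the dataloaders call below):

bears.summary(path)  # runs the full pipeline on a sample and prints each step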

DataLoaders is a thin class that stores instances of DataLoader passed to it.

dls = bears.dataloaders(path)

By default, it includes validation and training DataLoader objects.
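
A minimal sketch (reusing bears, path, and dls from above) of peeking at what the two loaders yield; one_batch returns a single mini-batch of tensors:

xb, yb = dls.train.one_batch()          # one mini-batch of images and labels
print(xb.shape, yb.shape)               # e.g. [64, 3, 128, 128] and [64] with the default batch size
dls.valid.show_batch(max_n=4, nrows=1)  # display a few validation samples with their labels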

Data Augmentation

In practice, rather than simply resizing images to 128x128, we randomly select a part of each image, crop to just that part, and resize the crop to the target size. On each epoch, a different part of the image is selected. This allows the model to learn from different features in the image.

This kind of augmentation can be achieved using the RandomResizedCrop transform:

item_tfms=RandomResizedCrop(128, min_scale=0.3)

The min_scale parameter specifies how much of the image to select, at minimum, each time.

When doing this, the model sees each randomly cropped part as if it were a new image. As a result, data augmentation lets us train our model on effectively more samples than our dataset contains. The model also learns to recognize objects at different positions and scales within an image.
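
The book demonstrates this by passing unique=True to show_batch, which repeats a single image so each copy in the grid shows a different random crop:

bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = bears.dataloaders(path)
# unique=True repeats the same image, so the grid shows different random crops
dls.train.show_batch(max_n=4, nrows=1, unique=True)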

Other examples of data augmentation that can be used are:

  • Rotation
  • Flipping
  • Perspective warping
  • Brightness changes
  • Contrast changes
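
fastai bundles standard versions of these (flips, rotation, perspective warping, and brightness/contrast changes) in the aug_transforms helper. Passed as batch_tfms, they are applied to whole batches at a time (on the GPU if one is available). A sketch following the book's pattern:

bears = bears.new(
    item_tfms=Resize(128),
    batch_tfms=aug_transforms(mult=2))  # mult=2 exaggerates the effects so they are easy to see
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)  # one image, eight augmented variants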

Using Your Model to Clean Your Data

It is possible to use the predictions a trained model makes to clean the dataset by observing:

  1. The wrong predictions that the model makes (especially if the model is confident about the incorrect prediction)
  2. The right predictions that the model makes with very low confidence

This approach can make it easy to find mislabelled images in the dataset.
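
fastai provides a notebook widget, ImageClassifierCleaner, built on exactly this idea: it sorts images by loss so the most suspicious ones appear first, and lets you mark each image for deletion or relabelling. A sketch, assuming a trained Learner named learn (training is not covered in these notes):

import shutil
from fastai.vision.widgets import ImageClassifierCleaner

cleaner = ImageClassifierCleaner(learn)  # learn: an assumed, already-trained Learner
cleaner  # in a notebook, this line renders the review widget

# After reviewing, apply the marked decisions to the files on disk:
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()                     # delete images marked as junk
for idx, cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)  # move relabelled images to the right folder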

Avoiding Disaster

Examples of other things to consider when using deep learning as a part of a bigger system:

  • Managing multiple versions of models
  • A/B testing
  • Canarying
  • Refreshing data (let dataset keep growing vs. delete some old data)
  • Data labeling
  • Monitoring
  • Detecting model rot

Problems that can occur in production:

  • Out-of-domain data: data the model sees in production that is very different from what it saw during training. There is no complete technical solution to this problem and it needs to be handled while rolling out the product.

  • Domain shift: data the model sees in production changes over time.

The following deployment process can be used to mitigate the risks:

  1. Manual process: run the deep learning model, but don't use its predictions to drive any actions yet. Instead, have human beings verify the predictions and ensure they make sense.

  2. Limited scope deployment: use the model with close human supervision in a limited scope (e.g. a small geographical area or a short period of time).

  3. Gradual expansion: steadily increase the scope of the rollout. Ensure there are reporting systems in place that alert you to significant changes in the actions being taken compared to the earlier manual process. Think about everything that could go wrong, decide what measure or report would reflect that problem, and include it in your regular reporting.

Models may change the behavior of the system they are part of. Training a model on biased data can start a positive feedback loop (e.g. a predictive policing model used to concentrate policing effort may be trained on past data about police activity that is itself biased; the activity it drives then becomes new, even more biased data).

"The same technology that's the source of so much excitement in my career is being used in law enforcement in ways that could mean that in the coming years, my son, who is 7 now, is more likely to be profiled or arrested - or worse - for no reason other than his race and where we live."

Source: https://www.nytimes.com/2017/12/02/opinion/sunday/intelligent-policing-and-my-innocent-children.html
