Reading Notes - Ch. 7 - Fastbook

You should aim for an iteration time of no more than a couple of minutes per experiment.

The more experiments you can do, the better!

If it takes more than a couple of minutes:

  • use a cut-down version of your dataset (e.g. Imagenette instead of the full ImageNet)
  • simplify your model

Normalization

Normalized data has a mean of 0 and a standard deviation of 1.

Using the Normalize transform in batch_tfms when creating a data block will apply the transform to the whole mini-batch at once.

Fastai provides the mean and standard deviation of ImageNet as imagenet_stats.

batch_tfms=Normalize.from_stats(*imagenet_stats)

Normalization is especially important when using pretrained models.
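A quick sanity check (a sketch, assuming dls is a DataLoaders built with the Normalize transform above): the per-channel mean and standard deviation of a batch should come out close to 0 and 1.

x, y = dls.one_batch()
# Channel-wise statistics of one normalized mini-batch; should be roughly 0 and 1.
print(x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3]))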

Progressive resizing

Gradually using larger and larger images as you train.

This can be considered a form of data augmentation.

Starting with small images helps complete training much faster.

Ending with large images makes the final accuracy much higher.

The process of increasing the image size and training for more epochs can be repeated as many times as you like.
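A rough sketch of the pattern (the next section walks through it in full for the Cassava dataset), assuming a get_dls helper that builds dataloaders for a given image size:

learn = cnn_learner(get_dls(size=128), resnet50, metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)     # train on small images first

learn.dls = get_dls(size=224)    # swap in dataloaders that produce larger images
learn.fine_tune(5, 1e-3)         # continue training on the larger images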

Normalization and Progressive resizing for Cassava Leaf Disease Classification

import fastai
from fastai.vision.all import *

print(fastai.__version__)
2.5.0
import pandas as pd

df = pd.read_csv('wandb_cassava_train_val_split.csv')
df.head()
image_id label is_val
0 1000015157.jpg 0 False
1 1000201771.jpg 3 False
2 100042118.jpg 1 False
3 1000723321.jpg 1 False
4 1000812911.jpg 3 False

We use the is_val column in the dataframe to split between our training and validation set.

def splitter(df):
    train = df.index[~df['is_val']].tolist()
    valid = df.index[df['is_val']].tolist()
    return train, valid

The image files are located inside the train_images folder.

path = Path()
def get_x(row):
    return path/'train_images'/row['image_id']

def get_y(row):
    return row['label']

To make progressive resizing easier, we define a get_dls function which will provide dataloaders for the image size, presize and batch size that we specify.

def get_dls(size=224, presize=460, bs=64):
    db = DataBlock(
                  blocks = (ImageBlock, CategoryBlock),
                  get_x = get_x,
                  get_y = get_y,
                  splitter = splitter,
                  item_tfms = [Resize(presize)],
                  batch_tfms = [*aug_transforms(size=size), Normalize.from_stats(*imagenet_stats)]
                )
    return db.dataloaders(df, bs=bs)

We start with a batch size of 64 (since that's all my GPU allows), with images first presized to 460 and then resized to 224 by the batch augmentations.

dls = get_dls(size=224, presize=460, bs=64)
dls.show_batch()

We start with a pretrained resnet50 model.

learn = cnn_learner(dls, resnet50, metrics=accuracy)

We use the LR finder to find the best learning rate.

learn.lr_find()
SuggestedLRs(valley=0.0020892962347716093)

With the pretrained layers frozen, we train the new head of the model for 15 epochs.

learn.fit_one_cycle(15, 2e-3)
epoch train_loss valid_loss accuracy time
0 1.334056 0.934803 0.714887 03:13
1 0.831297 0.691847 0.774480 03:12
2 0.635915 0.549400 0.813508 03:14
3 0.549636 0.516897 0.831269 03:14
4 0.498941 0.508619 0.827530 03:13
5 0.484030 0.480811 0.831269 03:14
6 0.456951 0.486462 0.832905 03:14
7 0.414097 0.470169 0.843421 03:14
8 0.402916 0.436164 0.849965 03:14
9 0.384304 0.428638 0.858144 03:13
10 0.352178 0.434263 0.857911 03:14
11 0.334097 0.422378 0.857911 03:14
12 0.321164 0.425513 0.860481 03:13
13 0.320289 0.423510 0.863286 03:13
14 0.299428 0.422989 0.864922 03:13

We can now unfreeze all layers and train the model with larger images.

learn.unfreeze()

But first, we need to find the best learning rate again since we have more parameters to train.

learn.lr_find()
SuggestedLRs(valley=1.4454397387453355e-05)

We now use the get_dls function to create dataloaders with images presized to 560 and then augmented to size 448 (twice the previous size).

learn.dls = get_dls(size=448, presize=560, bs=32)

We train with discriminative learning rates so that the parameters in earlier layers are not changed as much as those in later layers (especially the newly attached head).

learn.fit_one_cycle(5, lr_max=slice(1e-6,1e-4))
epoch train_loss valid_loss accuracy time
0 0.381326 0.376921 0.876373 13:58
1 0.342633 0.372094 0.877074 13:50
2 0.338595 0.369840 0.877308 13:51
3 0.304467 0.356885 0.881281 13:51

learn.validate()
(#2) [0.35694417357444763,0.8815143704414368]

Using normalization and progressive resizing helps surpass the baseline accuracy of 0.879645.

Test Time Augmentation

Just like we augment images during training, we can augment images during validation (create multiple versions of each image with different augmentations) and consider the average or maximum of the predictions for each image.

This doesn't increase training time, but increases time required for validation/inference.

accuracy(*learn.tta()).item()
0.885019838809967

Test time augmentation gives us a slight increase in accuracy.

Mixup

Mixup is a data augmentation technique that can be used if you don't have much data and don't have a pretrained model that was trained on data similar to your dataset.

Mixup creates 'virtual' training examples by taking a linear combination of two random images and their respective one-hot encoded labels.
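A minimal sketch of that combination (not fastai's internal implementation), with the mixing weight drawn from a Beta distribution:

import torch

# x1, x2: image tensors; y1, y2: their one-hot encoded labels.
def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing weight
    x = lam * x1 + (1 - lam) * x2   # blended image
    y = lam * y1 + (1 - lam) * y2   # blended (soft) label
    return x, y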

When using mixup, models need to be trained for longer.

Overfitting becomes less of a problem since we're always showing the model a random combination of two images.

Mixup can easily be used in fastai using the MixUp callback.

We train using mixup for 50 epochs and use the SaveModelCallback to save the model that obtains the highest accuracy.

learn.fit_one_cycle(50, lr_max=slice(1e-6,1e-4),
    cbs=[
        MixUp(0.2),
        SaveModelCallback(monitor='accuracy', fname="final-model-mixup")
    ]
)
epoch train_loss valid_loss accuracy time
0 0.563171 0.364460 0.875906 10:45
1 0.560961 0.365941 0.876139 10:42
2 0.545700 0.366716 0.874737 10:42
3 0.553164 0.364982 0.879411 10:43
4 0.529518 0.362979 0.879645 10:42
5 0.506076 0.360623 0.878944 10:42
6 0.530476 0.359260 0.882215 10:42
7 0.536703 0.361020 0.880580 10:42
8 0.516699 0.360086 0.881047 10:42
9 0.496925 0.361238 0.878710 10:42
10 0.507886 0.358501 0.879177 10:42
11 0.510365 0.358715 0.881281 10:43
12 0.530395 0.359560 0.878944 10:42
13 0.491285 0.361192 0.879177 10:42
14 0.522682 0.361729 0.877775 10:42
15 0.508480 0.362230 0.879177 10:42
16 0.499569 0.357935 0.881281 10:42
17 0.501868 0.360290 0.880813 10:42
18 0.473372 0.359328 0.879645 10:42
19 0.500815 0.358407 0.880813 10:42
20 0.487852 0.365024 0.876139 10:42
21 0.475025 0.359043 0.881514 10:42
22 0.497047 0.364012 0.876840 10:43
23 0.492802 0.361568 0.880112 10:43
24 0.513045 0.361230 0.880112 10:43
25 0.506585 0.360267 0.878476 10:43
26 0.477366 0.358040 0.880580 10:42
27 0.505896 0.360870 0.880346 10:42
28 0.484040 0.361964 0.879878 10:43
29 0.475160 0.359512 0.880813 10:42
30 0.467336 0.359745 0.881982 10:43
31 0.521411 0.359965 0.880580 10:43
32 0.481626 0.360192 0.883618 10:43
33 0.469224 0.362771 0.880346 10:42
34 0.493307 0.360698 0.879411 10:42
35 0.499327 0.360638 0.881514 10:42
36 0.498453 0.361014 0.879878 10:43
37 0.480062 0.360784 0.880346 10:43
38 0.480351 0.360703 0.880580 10:43
39 0.483412 0.359774 0.881748 10:42
40 0.464054 0.359994 0.881281 10:43
41 0.477166 0.361671 0.881047 10:42
42 0.456126 0.362255 0.878243 10:42
43 0.506936 0.360502 0.879878 10:43
44 0.477292 0.361211 0.881748 10:42
45 0.470658 0.360269 0.882215 10:43
46 0.475890 0.362398 0.882449 10:43
47 0.487197 0.361504 0.880346 10:42
48 0.484013 0.361592 0.880346 10:43
49 0.478416 0.363075 0.879645 10:43
Better model found at epoch 0 with accuracy value: 0.8759055733680725.
Better model found at epoch 1 with accuracy value: 0.8761392831802368.
Better model found at epoch 3 with accuracy value: 0.8794111013412476.
Better model found at epoch 4 with accuracy value: 0.8796447515487671.
Better model found at epoch 6 with accuracy value: 0.8822154998779297.
Better model found at epoch 32 with accuracy value: 0.883617639541626.

We can now load the model that obtained the highest accuracy and check its performance using test time augmentation.

learn.load('final-model-mixup')
learn.validate()
(#2) [0.3601917326450348,0.883617639541626]
accuracy(*learn.tta(use_max=True)).item()
0.8847861886024475

Unfortunately, mixup does not show any improvement in accuracy for this model.

Label Smoothing

Label smoothing replaces 1s in the target with a value slightly less than 1, and 0s with a value slightly higher than 0.
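A small worked example (a sketch, assuming a smoothing parameter eps = 0.1 and 5 classes): the 0s become eps/N and the 1 becomes 1 - eps + eps/N.

import torch

eps, n_classes, correct = 0.1, 5, 2
target = torch.full((n_classes,), eps / n_classes)  # wrong classes get eps/N
target[correct] = 1 - eps + eps / n_classes         # correct class gets 1 - eps + eps/N
print(target)  # tensor([0.0200, 0.0200, 0.9200, 0.0200, 0.0200])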

The reasoning behind this is that using labels of 1/0 encourages overfitting as models keep trying to push outputs for the correct class higher and higher. This can especially be a problem if the data is not labelled correctly.

Using label smoothing encourages the model to be less confident, resulting in a more robust model even when there is mislabelled data, and better generalization at inference time.

Label smoothing can be used in fastai using the LabelSmoothingCrossEntropy loss function.
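For example (a sketch, not run as part of these notes), the loss function can be passed to the learner directly:

learn = cnn_learner(dls, resnet50, metrics=accuracy,
                    loss_func=LabelSmoothingCrossEntropy())
learn.fit_one_cycle(5, 2e-3)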
