Reading Notes - Ch. 7 - Fastbook

You should aim for an iteration time of no more than a couple of minutes per experiment.

The more experiments you can do, the better!

If it takes more than a couple of minutes:

  • use a cut-down version of your dataset (e.g. Imagenette instead of the full ImageNet)
  • simplify your model

Normalization

Normalized data has a mean of 0 and a standard deviation of 1.

Using the Normalize transform in batch_tfms when creating a data block will apply the transform to the whole mini-batch at once.

Fastai provides the mean and standard deviation of ImageNet as imagenet_stats.

batch_tfms=Normalize.from_stats(*imagenet_stats)

Normalization is especially important when using pretrained models.
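A quick sanity check (a sketch, assuming dls is a DataLoaders built with the Normalize transform above): the per-channel mean and standard deviation of a batch should come out close to 0 and 1.

x, y = dls.one_batch()
# Channel-wise statistics of one normalized mini-batch; should be roughly 0 and 1.
print(x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3]))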

Progressive resizing

Gradually using larger and larger images as you train.

This can be considered a form of data augmentation.

Starting with small images helps complete training much faster.

Ending with large images makes the final accuracy much higher.

The process of increasing the image size and training for more epochs can be repeated as many times as you like.
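A rough sketch of the pattern (the next section walks through it in full for the Cassava dataset), assuming a get_dls helper that builds dataloaders for a given image size:

learn = cnn_learner(get_dls(size=128), resnet50, metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)     # train on small images first

learn.dls = get_dls(size=224)    # swap in dataloaders that produce larger images
learn.fine_tune(5, 1e-3)         # continue training on the larger images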

Normalization and Progressive resizing for Cassava Leaf Disease Classification

import fastai
from fastai.vision.all import *

print(fastai.__version__)
2.5.0
import pandas as pd

df = pd.read_csv('wandb_cassava_train_val_split.csv')
df.head()
image_id label is_val
0 1000015157.jpg 0 False
1 1000201771.jpg 3 False
2 100042118.jpg 1 False
3 1000723321.jpg 1 False
4 1000812911.jpg 3 False

We use the is_val column in the dataframe to split between our training and validation set.

def splitter(df):
    train = df.index[~df['is_val']].tolist()
    valid = df.index[df['is_val']].tolist()
    return train, valid

The image files are located inside the train_images folder.

path = Path()
def get_x(row):
    return path/'train_images'/row['image_id']

def get_y(row):
    return row['label']

To make progressive resizing easier, we define a get_dls function which will provide dataloaders for the image size, presize and batch size that we specify.

def get_dls(size=224, presize=460, bs=64):
    db = DataBlock(
                  blocks = (ImageBlock, CategoryBlock),
                  get_x = get_x,
                  get_y = get_y,
                  splitter = splitter,
                  item_tfms = [Resize(presize)],
                  batch_tfms = [*aug_transforms(size=size), Normalize.from_stats(*imagenet_stats)]
                )
    return db.dataloaders(df, bs=bs)

We start with a batch size of 64 (since that's all my GPU allows), with images first presized to 460 and then resized to 224 by the batch augmentations.

dls = get_dls(size=224, presize=460, bs=64)
dls.show_batch()

We start with a pretrained resnet50 model.

learn = cnn_learner(dls, resnet50, metrics=accuracy)

We use the LR finder to find the best learning rate.

learn.lr_find()
SuggestedLRs(valley=0.0020892962347716093)

With the pretrained layers frozen, we train the new head of the model for 15 epochs.

learn.fit_one_cycle(15, 2e-3)
epoch train_loss valid_loss accuracy time
0 1.334056 0.934803 0.714887 03:13
1 0.831297 0.691847 0.774480 03:12
2 0.635915 0.549400 0.813508 03:14
3 0.549636 0.516897 0.831269 03:14
4 0.498941 0.508619 0.827530 03:13
5 0.484030 0.480811 0.831269 03:14
6 0.456951 0.486462 0.832905 03:14
7 0.414097 0.470169 0.843421 03:14
8 0.402916 0.436164 0.849965 03:14
9 0.384304 0.428638 0.858144 03:13
10 0.352178 0.434263 0.857911 03:14
11 0.334097 0.422378 0.857911 03:14
12 0.321164 0.425513 0.860481 03:13
13 0.320289 0.423510 0.863286 03:13
14 0.299428 0.422989 0.864922 03:13

We can now unfreeze all layers and train the model with larger images.

learn.unfreeze()

But first, we need to find the best learning rate again since we have more parameters to train.

learn.lr_find()
SuggestedLRs(valley=1.4454397387453355e-05)

We now use the get_dls function to create dataloaders with images presized to 560 and then augmented to size 448 (twice the previous size).

learn.dls = get_dls(size=448, presize=560, bs=32)

We train with discriminative learning rates so that the parameters in earlier layers are not changed as much as those in later layers (especially the newly attached head).

learn.fit_one_cycle(5, lr_max=slice(1e-6,1e-4))
epoch train_loss valid_loss accuracy time
0 0.381326 0.376921 0.876373 13:58
1 0.342633 0.372094 0.877074 13:50
2 0.338595 0.369840 0.877308 13:51
3 0.304467 0.356885 0.881281 13:51

learn.validate()
(#2) [0.35694417357444763,0.8815143704414368]

Using normalization and progressive resizing helps surpass the baseline accuracy of 0.879645.

Test Time Augmentation

Just like we augment images during training, we can augment images during validation (create multiple versions of each image with different augmentations) and consider the average or maximum of the predictions for each image.

This doesn't increase training time, but increases time required for validation/inference.

accuracy(*learn.tta()).item()
0.885019838809967

Test time augmentation gives us a slight increase in accuracy.

Mixup

Mixup is a data augmentation technique that can be used if you don't have much data and don't have a pretrained model that was trained on data similar to your dataset.

Mixup creates 'virtual' training examples by taking a linear combination of two random images and their respective one-hot encoded labels.
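A minimal sketch of that combination (not fastai's internal implementation), with the mixing weight drawn from a Beta distribution:

import torch

# x1, x2: image tensors; y1, y2: their one-hot encoded labels.
def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()  # mixing weight
    x = lam * x1 + (1 - lam) * x2   # blended image
    y = lam * y1 + (1 - lam) * y2   # blended (soft) label
    return x, y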

When using mixup, models need to be trained for longer.

Overfitting becomes less of a problem since we're always showing the model a random combination of two images.

Mixup can easily be used in fastai using the MixUp callback.

We train using mixup for 50 epochs and use the SaveModelCallback to save the model that obtains the highest accuracy.

learn.fit_one_cycle(50, lr_max=slice(1e-6,1e-4),
    cbs=[
        MixUp(0.2),
        SaveModelCallback(monitor='accuracy', fname="final-model-mixup")
    ]
)
epoch train_loss valid_loss accuracy time
0 0.563171 0.364460 0.875906 10:45
1 0.560961 0.365941 0.876139 10:42
2 0.545700 0.366716 0.874737 10:42
3 0.553164 0.364982 0.879411 10:43
4 0.529518 0.362979 0.879645 10:42
5 0.506076 0.360623 0.878944 10:42
6 0.530476 0.359260 0.882215 10:42
7 0.536703 0.361020 0.880580 10:42
8 0.516699 0.360086 0.881047 10:42
9 0.496925 0.361238 0.878710 10:42
10 0.507886 0.358501 0.879177 10:42
11 0.510365 0.358715 0.881281 10:43
12 0.530395 0.359560 0.878944 10:42
13 0.491285 0.361192 0.879177 10:42
14 0.522682 0.361729 0.877775 10:42
15 0.508480 0.362230 0.879177 10:42
16 0.499569 0.357935 0.881281 10:42
17 0.501868 0.360290 0.880813 10:42
18 0.473372 0.359328 0.879645 10:42
19 0.500815 0.358407 0.880813 10:42
20 0.487852 0.365024 0.876139 10:42
21 0.475025 0.359043 0.881514 10:42
22 0.497047 0.364012 0.876840 10:43
23 0.492802 0.361568 0.880112 10:43
24 0.513045 0.361230 0.880112 10:43
25 0.506585 0.360267 0.878476 10:43
26 0.477366 0.358040 0.880580 10:42
27 0.505896 0.360870 0.880346 10:42
28 0.484040 0.361964 0.879878 10:43
29 0.475160 0.359512 0.880813 10:42
30 0.467336 0.359745 0.881982 10:43
31 0.521411 0.359965 0.880580 10:43
32 0.481626 0.360192 0.883618 10:43
33 0.469224 0.362771 0.880346 10:42
34 0.493307 0.360698 0.879411 10:42
35 0.499327 0.360638 0.881514 10:42
36 0.498453 0.361014 0.879878 10:43
37 0.480062 0.360784 0.880346 10:43
38 0.480351 0.360703 0.880580 10:43
39 0.483412 0.359774 0.881748 10:42
40 0.464054 0.359994 0.881281 10:43
41 0.477166 0.361671 0.881047 10:42
42 0.456126 0.362255 0.878243 10:42
43 0.506936 0.360502 0.879878 10:43
44 0.477292 0.361211 0.881748 10:42
45 0.470658 0.360269 0.882215 10:43
46 0.475890 0.362398 0.882449 10:43
47 0.487197 0.361504 0.880346 10:42
48 0.484013 0.361592 0.880346 10:43
49 0.478416 0.363075 0.879645 10:43
Better model found at epoch 0 with accuracy value: 0.8759055733680725.
Better model found at epoch 1 with accuracy value: 0.8761392831802368.
Better model found at epoch 3 with accuracy value: 0.8794111013412476.
Better model found at epoch 4 with accuracy value: 0.8796447515487671.
Better model found at epoch 6 with accuracy value: 0.8822154998779297.
Better model found at epoch 32 with accuracy value: 0.883617639541626.

We can now load the model that obtained the highest accuracy and check its performance using test time augmentation.

learn.load('final-model-mixup')
learn.validate()
(#2) [0.3601917326450348,0.883617639541626]
accuracy(*learn.tta(use_max=True)).item()
0.8847861886024475

Unfortunately, mixup does not show any improvement in accuracy for this model.

Label Smoothing

Label smoothing replaces 1s in the target with a value slightly less than 1, and 0s with a value slightly higher than 0.
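A small worked example (a sketch, assuming a smoothing parameter eps = 0.1 and 5 classes): the 0s become eps/N and the 1 becomes 1 - eps + eps/N.

import torch

eps, n_classes, correct = 0.1, 5, 2
target = torch.full((n_classes,), eps / n_classes)  # wrong classes get eps/N
target[correct] = 1 - eps + eps / n_classes         # correct class gets 1 - eps + eps/N
print(target)  # tensor([0.0200, 0.0200, 0.9200, 0.0200, 0.0200])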

The reasoning behind this is that using labels of 1/0 encourages overfitting as models keep trying to push outputs for the correct class higher and higher. This can especially be a problem if the data is not labelled correctly.

Using label smoothing encourages the model to be less confident, resulting in a more robust model even when there is mislabelled data, and better generalization at inference time.

Label smoothing can be used in fastai using the LabelSmoothingCrossEntropy loss function.
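For example (a sketch, not run as part of these notes), the loss function can be passed to the learner directly:

learn = cnn_learner(dls, resnet50, metrics=accuracy,
                    loss_func=LabelSmoothingCrossEntropy())
learn.fit_one_cycle(5, 2e-3)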
