Reading Notes - Ch. 7 - Fastbook
You should aim for an iteration time of no more than a couple of minutes.
The more experiments you can do, the better!
If it takes more than a couple of minutes:
- cut down your dataset (e.g. use Imagenette instead of the full ImageNet; see the sketch after this list)
- simplify your model
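As a minimal sketch of the first option (assuming fastai's built-in `URLs.IMAGENETTE_160` dataset constant), swapping in Imagenette takes only a couple of lines:

```python
from fastai.vision.all import *

# Imagenette is a 10-class subset of ImageNet; the 160px variant keeps
# each training run short enough for rapid experimentation.
path = untar_data(URLs.IMAGENETTE_160)
dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(160))
```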
Normalization
Normalized data has a mean of 0 and a standard deviation of 1.
Using the `Normalize` transform in `batch_tfms` when creating a data block will apply the transform to the whole mini-batch at once.
Fastai provides the mean and standard deviation of ImageNet as `imagenet_stats`.
batch_tfms=Normalize.from_stats(*imagenet_stats)
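As a quick sanity check (a sketch assuming a `dls` object built with the transform above), the per-channel statistics of a normalized batch should come out close to 0 and 1:

```python
# Grab one batch and inspect the per-channel mean and std;
# after Normalize these should be roughly 0 and 1 respectively.
x, y = dls.one_batch()
print(x.mean(dim=[0, 2, 3]), x.std(dim=[0, 2, 3]))
```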
Normalization is especially important when using pretrained models.
Progressive resizing
Gradually using larger and larger images as you train
This can be considered a form of data augmentation.
Starting with small images helps complete training much faster.
Ending with large images makes the final accuracy much higher.
The process of increasing the image size and training for more epochs can be repeated as many times as you like.
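A minimal sketch of the pattern, assuming a helper like the `get_dls` function defined in the Cassava example below and the usual fastai imports:

```python
# Phase 1: train quickly on small images with the pretrained layers frozen.
learn = cnn_learner(get_dls(size=128), resnet50, metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)

# Phase 2: swap in dataloaders with larger images and keep training.
learn.dls = get_dls(size=224)
learn.fine_tune(5, 1e-3)
```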
Normalization and Progressive resizing for Cassava Leaf Disease Classification
import fastai
from fastai.vision.all import *
print(fastai.__version__)
2.5.0
import pandas as pd
df = pd.read_csv('wandb_cassava_train_val_split.csv')
df.head()
| | image_id | label | is_val |
|---|---|---|---|
0 | 1000015157.jpg | 0 | False |
1 | 1000201771.jpg | 3 | False |
2 | 100042118.jpg | 1 | False |
3 | 1000723321.jpg | 1 | False |
4 | 1000812911.jpg | 3 | False |
We use the `is_val` column in the dataframe to split between our training and validation sets.
def splitter(df):
    train = df.index[~df['is_val']].tolist()
    valid = df.index[df['is_val']].tolist()
    return train, valid
The image files are located inside the `train_images` folder.
path = Path()
def get_x(row):
    return path/'train_images'/row['image_id']

def get_y(row):
    return row['label']
To make progressive resizing easier, we define a `get_dls` function which provides dataloaders for the image size, presize, and batch size that we specify.
def get_dls(size=224, presize=460, bs=64):
    db = DataBlock(
        blocks = (ImageBlock, CategoryBlock),
        get_x = get_x,
        get_y = get_y,
        splitter = splitter,
        item_tfms = [Resize(presize)],
        batch_tfms = [*aug_transforms(size=size), Normalize.from_stats(*imagenet_stats)]
    )
    return db.dataloaders(df, bs=bs)
We start with a batch size of 64 (since that's all my GPU allows), with images first presized to 460 and then resized to 224 after augmentation.
dls = get_dls(size=224, presize=460, bs=64)
dls.show_batch()
We start with a pretrained `resnet50` model.
learn = cnn_learner(dls, resnet50, metrics=accuracy)
We use the LR finder to find the best learning rate.
learn.lr_find()
SuggestedLRs(valley=0.0020892962347716093)
With the pretrained layers frozen, we train the new head of the model for 15 epochs.
learn.fit_one_cycle(15, 2e-3)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 1.334056 | 0.934803 | 0.714887 | 03:13 |
1 | 0.831297 | 0.691847 | 0.774480 | 03:12 |
2 | 0.635915 | 0.549400 | 0.813508 | 03:14 |
3 | 0.549636 | 0.516897 | 0.831269 | 03:14 |
4 | 0.498941 | 0.508619 | 0.827530 | 03:13 |
5 | 0.484030 | 0.480811 | 0.831269 | 03:14 |
6 | 0.456951 | 0.486462 | 0.832905 | 03:14 |
7 | 0.414097 | 0.470169 | 0.843421 | 03:14 |
8 | 0.402916 | 0.436164 | 0.849965 | 03:14 |
9 | 0.384304 | 0.428638 | 0.858144 | 03:13 |
10 | 0.352178 | 0.434263 | 0.857911 | 03:14 |
11 | 0.334097 | 0.422378 | 0.857911 | 03:14 |
12 | 0.321164 | 0.425513 | 0.860481 | 03:13 |
13 | 0.320289 | 0.423510 | 0.863286 | 03:13 |
14 | 0.299428 | 0.422989 | 0.864922 | 03:13 |
We can now unfreeze all layers and train the model with larger images.
learn.unfreeze()
But first, we need to find the best learning rate again since we have more parameters to train.
learn.lr_find()
SuggestedLRs(valley=1.4454397387453355e-05)
We now use the `get_dls` function to create dataloaders for images presized to 560 and then augmented to size 448 (twice the previous size).
learn.dls = get_dls(size=448, presize=560, bs=32)
We train with discriminative learning rates so that the parameters in earlier layers are not changed as much as those in later layers (especially the newly attached head).
learn.fit_one_cycle(5, lr_max=slice(1e-6,1e-4))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.381326 | 0.376921 | 0.876373 | 13:58 |
1 | 0.342633 | 0.372094 | 0.877074 | 13:50 |
2 | 0.338595 | 0.369840 | 0.877308 | 13:51 |
3 | 0.304467 | 0.356885 | 0.881281 | 13:51 |
learn.validate()
(#2) [0.35694417357444763,0.8815143704414368]
Using normalization and progressive resizing helps surpass the baseline accuracy of 0.879645.
Test Time Augmentation
Just like we augment images during training, we can augment images during validation (create multiple versions of each image with different augmentations) and consider the average or maximum of the predictions for each image.
This doesn't increase training time, but increases time required for validation/inference.
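A toy illustration of how the predictions get combined (purely for intuition, not fastai code): given class probabilities from several augmented versions of the same image, we either average them or take the element-wise maximum.

```python
import torch

# Predictions for one image over 4 classes, from 3 augmented versions.
aug_preds = torch.tensor([[0.10, 0.70, 0.10, 0.10],
                          [0.20, 0.60, 0.10, 0.10],
                          [0.10, 0.50, 0.30, 0.10]])

print(aug_preds.mean(dim=0))        # average over augmentations
print(aug_preds.max(dim=0).values)  # element-wise maximum over augmentations
```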
accuracy(*learn.tta()).item()
0.885019838809967
Test time augmentation gives us a slight increase in accuracy.
Mixup
Mixup is a data augmentation technique that can be used if you don't have much data and don't have a pretrained model that was trained on data similar to your dataset.
Mixup creates 'virtual' training examples by taking a linear combination of two random images and their respective one-hot encoded labels.
When using mixup, models need to be trained for longer.
Overfitting becomes less of a problem since we're always showing the model a random combination of two images.
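A minimal sketch of what a single mixed example looks like, using torch directly (fastai's `MixUp` callback handles all of this for us during training); the 0.2 here mirrors the `MixUp(0.2)` used below:

```python
import torch

# Mixing weight λ drawn from a Beta(α, α) distribution, here with α = 0.2.
lam = torch.distributions.Beta(0.2, 0.2).sample()

# Two random images and their one-hot encoded labels (3 classes for brevity).
x1, x2 = torch.rand(3, 224, 224), torch.rand(3, 224, 224)
y1, y2 = torch.tensor([1., 0., 0.]), torch.tensor([0., 0., 1.])

x_mixed = lam * x1 + (1 - lam) * x2  # 'virtual' training image
y_mixed = lam * y1 + (1 - lam) * y2  # soft label with the same weights
```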
Mixup can easily be used in fastai with the `MixUp` callback.
We train using mixup for 50 epochs and use the `SaveModelCallback` to save the model that obtains the highest accuracy.
learn.fit_one_cycle(50, lr_max=slice(1e-6,1e-4),
    cbs=[
        MixUp(0.2),
        SaveModelCallback(monitor='accuracy', fname="final-model-mixup")
    ]
)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.563171 | 0.364460 | 0.875906 | 10:45 |
1 | 0.560961 | 0.365941 | 0.876139 | 10:42 |
2 | 0.545700 | 0.366716 | 0.874737 | 10:42 |
3 | 0.553164 | 0.364982 | 0.879411 | 10:43 |
4 | 0.529518 | 0.362979 | 0.879645 | 10:42 |
5 | 0.506076 | 0.360623 | 0.878944 | 10:42 |
6 | 0.530476 | 0.359260 | 0.882215 | 10:42 |
7 | 0.536703 | 0.361020 | 0.880580 | 10:42 |
8 | 0.516699 | 0.360086 | 0.881047 | 10:42 |
9 | 0.496925 | 0.361238 | 0.878710 | 10:42 |
10 | 0.507886 | 0.358501 | 0.879177 | 10:42 |
11 | 0.510365 | 0.358715 | 0.881281 | 10:43 |
12 | 0.530395 | 0.359560 | 0.878944 | 10:42 |
13 | 0.491285 | 0.361192 | 0.879177 | 10:42 |
14 | 0.522682 | 0.361729 | 0.877775 | 10:42 |
15 | 0.508480 | 0.362230 | 0.879177 | 10:42 |
16 | 0.499569 | 0.357935 | 0.881281 | 10:42 |
17 | 0.501868 | 0.360290 | 0.880813 | 10:42 |
18 | 0.473372 | 0.359328 | 0.879645 | 10:42 |
19 | 0.500815 | 0.358407 | 0.880813 | 10:42 |
20 | 0.487852 | 0.365024 | 0.876139 | 10:42 |
21 | 0.475025 | 0.359043 | 0.881514 | 10:42 |
22 | 0.497047 | 0.364012 | 0.876840 | 10:43 |
23 | 0.492802 | 0.361568 | 0.880112 | 10:43 |
24 | 0.513045 | 0.361230 | 0.880112 | 10:43 |
25 | 0.506585 | 0.360267 | 0.878476 | 10:43 |
26 | 0.477366 | 0.358040 | 0.880580 | 10:42 |
27 | 0.505896 | 0.360870 | 0.880346 | 10:42 |
28 | 0.484040 | 0.361964 | 0.879878 | 10:43 |
29 | 0.475160 | 0.359512 | 0.880813 | 10:42 |
30 | 0.467336 | 0.359745 | 0.881982 | 10:43 |
31 | 0.521411 | 0.359965 | 0.880580 | 10:43 |
32 | 0.481626 | 0.360192 | 0.883618 | 10:43 |
33 | 0.469224 | 0.362771 | 0.880346 | 10:42 |
34 | 0.493307 | 0.360698 | 0.879411 | 10:42 |
35 | 0.499327 | 0.360638 | 0.881514 | 10:42 |
36 | 0.498453 | 0.361014 | 0.879878 | 10:43 |
37 | 0.480062 | 0.360784 | 0.880346 | 10:43 |
38 | 0.480351 | 0.360703 | 0.880580 | 10:43 |
39 | 0.483412 | 0.359774 | 0.881748 | 10:42 |
40 | 0.464054 | 0.359994 | 0.881281 | 10:43 |
41 | 0.477166 | 0.361671 | 0.881047 | 10:42 |
42 | 0.456126 | 0.362255 | 0.878243 | 10:42 |
43 | 0.506936 | 0.360502 | 0.879878 | 10:43 |
44 | 0.477292 | 0.361211 | 0.881748 | 10:42 |
45 | 0.470658 | 0.360269 | 0.882215 | 10:43 |
46 | 0.475890 | 0.362398 | 0.882449 | 10:43 |
47 | 0.487197 | 0.361504 | 0.880346 | 10:42 |
48 | 0.484013 | 0.361592 | 0.880346 | 10:43 |
49 | 0.478416 | 0.363075 | 0.879645 | 10:43 |
Better model found at epoch 0 with accuracy value: 0.8759055733680725.
Better model found at epoch 1 with accuracy value: 0.8761392831802368.
Better model found at epoch 3 with accuracy value: 0.8794111013412476.
Better model found at epoch 4 with accuracy value: 0.8796447515487671.
Better model found at epoch 6 with accuracy value: 0.8822154998779297.
Better model found at epoch 32 with accuracy value: 0.883617639541626.
We can now load the model that obtained the highest accuracy and check its performance using test time augmentation.
learn.load('final-model-mixup')
learn.validate()
(#2) [0.3601917326450348,0.883617639541626]
accuracy(*learn.tta(use_max=True)).item()
0.8847861886024475
Unfortunately, mixup does not show any improvement in accuracy for this model.
Label Smoothing
Label smoothing replaces 1s in the target with a value slightly less than 1, and 0s with a value slightly higher than 0.
The reasoning behind this is that using labels of 1/0 encourages overfitting as models keep trying to push outputs for the correct class higher and higher. This can especially be a problem if the data is not labelled correctly.
Using label smoothing encourages the model to be less confident, which results in a model that is more robust even when there is mislabelled data and that generalizes better at inference.
Label smoothing can be used in fastai via the `LabelSmoothingCrossEntropy` loss function.
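A minimal sketch of how we might plug it in, assuming the same `dls` and architecture as above (fastai's default smoothing is ε = 0.1):

```python
# With N classes and smoothing ε, the target for the correct class becomes
# 1 - ε + ε/N, and every other class gets ε/N instead of 0.
learn = cnn_learner(dls, resnet50,
                    loss_func=LabelSmoothingCrossEntropy(),
                    metrics=accuracy)
learn.fit_one_cycle(5, 2e-3)
```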