Convolutions in Fastai

Chapter 13 of Fastbook dives into the details of convolutional neural networks and how the convolution operation that they're built on works.

The paper "A guide to convolution arithmetic for deep learning" has excellent low-level details of how different types of convolutions work.

In this post, I explore how these different types of convolution operations can be applied with fastai.

Convolution operations

The behavior of a convolutional layer depends on the following properties:

  1. kernel size
  2. stride
  3. padding

In addition to these, we'll also look at two more properties: transpose and dilation.

Let us create a 5x5 tensor of random values to use as input for the convolutional layers below.

from fastai.vision.all import *

t = torch.rand(5, 5).unsqueeze(0).unsqueeze(0)
t.shape
torch.Size([1, 1, 5, 5])

Convolution layers expect the first dimension to be the batch size, and the second to be the number of channels. We use the unsqueeze function to add a dimension of size 1 for each of these.
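As a quick illustrative sketch (plain PyTorch, not something fastai requires), the same shape can be produced with None indexing, which is equivalent to calling unsqueeze twice:

import torch

x = torch.rand(5, 5)
x_batched = x[None, None]  # add batch and channel dimensions of size 1
x_batched.shape
torch.Size([1, 1, 5, 5])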

No padding and strides of size 1

Image from https://github.com/vdumoulin/conv_arithmetic

conv1 = ConvLayer(1, 3, ks=3, stride=1, padding=0)
  • The first parameter specifies the number of channels in the input.
  • The second parameter specifies how many filters we want to create in this layer. This will be equal to the number of channels in the output since each channel is created by one filter.
  • The ks parameter specifies the size of the filters we want - 3x3 in this case. We need to provide just one integer because filters are always square. The default value of this parameter is 3.
  • The stride parameter specifies the size of the stride. The default value of this parameter is 1.
  • The padding parameter specifies how much padding we apply around the input.
res = conv1(t)
res.shape
torch.Size([1, 3, 3, 3])

The batch size in the output remains the same as that in the input.

The number of channels has increased to 3 since we created a convolutional layer with 3 filters.

With no padding and strides of size 1, the dimensions of the output are input size - kernel size + 1, which in our case is 5 - 3 + 1 = 3.
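To double-check this formula, here is a minimal sketch using plain PyTorch's nn.Conv2d (so no default padding is applied behind the scenes):

import torch
from torch import nn

# valid convolution: output size = input size - kernel size + 1
conv_check = nn.Conv2d(1, 3, kernel_size=3, stride=1, padding=0)
conv_check(torch.rand(1, 1, 5, 5)).shape
torch.Size([1, 3, 3, 3])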

Zero padding and strides of size 1

Image from https://github.com/vdumoulin/conv_arithmetic

conv2 = ConvLayer(1, 3)

We don't specify the stride parameter since its default value is 1.

We haven't specified the ks parameter so the default value of 3 is used.

res = conv2(t)
res.shape
torch.Size([1, 3, 5, 5])

By default, if we are not using transposed convolutions (more on that in a bit), fastai automatically applies enough padding to keep the spatial dimensions of the output the same as those of the input.

This kind of padding is also commonly known as half padding or same padding.
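For odd kernel sizes, the same effect can be reproduced in plain PyTorch by padding with ks // 2 on each side; the following is just a sketch of that arithmetic, not fastai's exact implementation:

import torch
from torch import nn

# half/same padding for an odd kernel: pad by ks // 2 on each side
ks = 3
conv_same = nn.Conv2d(1, 3, kernel_size=ks, stride=1, padding=ks // 2)
conv_same(torch.rand(1, 1, 5, 5)).shape
torch.Size([1, 3, 5, 5])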

There is another type of padding called full padding, which allows us to increase the dimensions of the output. Full padding can be achieved by using regular zero padding of size k - 1 (where k is the kernel size).

Image from https://github.com/vdumoulin/conv_arithmetic

conv3 = ConvLayer(1, 3, padding=(2,2))

res = conv3(t)
res.shape
torch.Size([1, 3, 7, 7])

With full padding, the dimensions of the output are input size + kernel size - 1, which in this case is 5 + 3 - 1 = 7.
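The equivalent plain PyTorch sketch, padding by k - 1 = 2 on each side:

import torch
from torch import nn

# full padding: pad by k - 1 on each side, so output = input + kernel - 1
k = 3
conv_full = nn.Conv2d(1, 3, kernel_size=k, stride=1, padding=k - 1)
conv_full(torch.rand(1, 1, 5, 5)).shape
torch.Size([1, 3, 7, 7])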

Strided convolutions

By specifying a value for the stride parameter greater than 1, we can perform strided convolutions.

Strided convolutions are useful for decreasing the dimensions of the output.

Image from https://github.com/vdumoulin/conv_arithmetic

conv4 = ConvLayer(1, 3, stride=2)

res = conv4(t)
res.shape
torch.Size([1, 3, 3, 3])

We can also have strided convolutions with no padding applied to the input.

Image from https://github.com/vdumoulin/conv_arithmetic

conv5 = ConvLayer(1, 3, stride=2, padding=0)

res = conv5(t)
res.shape
torch.Size([1, 3, 2, 2])
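Both strided results follow the general formula output = floor((input + 2 * padding - kernel) / stride) + 1, assuming fastai's default padding for ks=3 here is 1 (which matches the shapes above). A small helper to check both cases:

import math

def conv_out_size(n, k, s, p):
    # spatial output size of a standard convolution
    return math.floor((n + 2 * p - k) / s) + 1

conv_out_size(5, 3, 2, 1)  # 3, matches conv4 (stride=2, default padding)
conv_out_size(5, 3, 2, 0)  # 2, matches conv5 (stride=2, padding=0)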

Transposed Convolutions

Also known as: fractionally strided convolutions, deconvolutions.

Transposed convolutions allow us to increase the dimensions of the output compared to the input.

Image from https://github.com/vdumoulin/conv_arithmetic

conv6 = ConvLayer(1, 3, transpose=True)

We can use transposed convolutions in fastai by setting transpose=True.

res = conv6(t)
res.shape
torch.Size([1, 3, 7, 7])
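With no padding, the output size of a transposed convolution is (input - 1) * stride + kernel, which here gives (5 - 1) * 1 + 3 = 7. A rough shape-only sketch in plain PyTorch (fastai's ConvLayer also adds normalization and activation by default):

import torch
from torch import nn

# transposed convolution, stride 1, no padding:
# output = (input - 1) * stride + kernel = (5 - 1) * 1 + 3 = 7
tconv = nn.ConvTranspose2d(1, 3, kernel_size=3, stride=1, padding=0)
tconv(torch.rand(1, 1, 5, 5)).shape
torch.Size([1, 3, 7, 7])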

We can also use a stride bigger than 1. To visualize this operation, imagine inserting zeros between the values of the input.

Image from https://github.com/vdumoulin/conv_arithmetic

conv7 = ConvLayer(1, 3, transpose=True, stride=2)

res = conv7(t)
res.shape
torch.Size([1, 3, 11, 11])
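That zero-insertion picture can be sketched directly: stuff zeros between the input values (5 becomes 9 along each axis) and then apply a regular stride-1 convolution with full padding, which gives the same 11x11 spatial size. This is a shape-level illustration only, not the exact computation fastai performs:

import torch
from torch import nn

x = torch.rand(1, 1, 5, 5)
z = torch.zeros(1, 1, 9, 9)
z[:, :, ::2, ::2] = x                       # insert zeros between input values
conv_vis = nn.Conv2d(1, 3, kernel_size=3, stride=1, padding=2)  # full padding
conv_vis(z).shape
torch.Size([1, 3, 11, 11])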

Dilated Convolutions

Regular convolution operations work on elements in the input that are next to each other. Dilated convolutions skip elements in the input.

Image from https://github.com/vdumoulin/conv_arithmetic

conv8 = ConvLayer(1, 3, dilation=2)

The dilation parameter controls the spacing between the input elements that each filter tap sees: a dilation of d leaves a gap of d - 1 elements between consecutive taps.

A value of dilation=1 corresponds to normal convolutions.

res = conv8(t)
res.shape
torch.Size([1, 3, 3, 3])
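The shape works out because a dilation of 2 spreads the 3x3 kernel over a 5x5 receptive field (effective kernel size k + (k - 1) * (dilation - 1) = 5); assuming fastai's default padding of 1 for ks=3, the output is 5 + 2 - 5 + 1 = 3. A plain PyTorch sketch of the same shapes:

import torch
from torch import nn

# dilated convolution: effective kernel = 3 + (3 - 1) * (2 - 1) = 5
conv_dil = nn.Conv2d(1, 3, kernel_size=3, dilation=2, padding=1)
conv_dil(torch.rand(1, 1, 5, 5)).shape
torch.Size([1, 3, 3, 3])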