Convolutions in Fastai
Chapter 13 of Fastbook dives into the details of convolutional neural networks and how the convolution operation that they're built on works.
A guide to convolution arithmetic for deep learning has excellent low-level details of how different types of convolutions work.
In this post, I explore how these different types of convolution operations can be applied with fastai.
Convolution operations
The behavior of a convolutional layer depends on the following properties:
- kernel size
- stride
- padding
In addition to these, we'll also look at two more properties: transpose and dilation.
Let us create a 5x5 tensor with random values to use as an input for the convolutional layers we will create.
from fastai.vision.all import *
t = torch.rand(5, 5).unsqueeze(0).unsqueeze(0)
t.shape
torch.Size([1, 1, 5, 5])
Convolution layers expect the first dimension to be the batch size, and the second to be the number of channels. We use the unsqueeze function to add a dimension of 1 for each of these.
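As a plain-Python sketch of what those two unsqueeze calls do (each one wraps the data in one more size-1 axis, turning a (5, 5) tensor into a (1, 1, 5, 5) one), illustrated with a small 2x2 example:

```python
data = [[0.1, 0.2], [0.3, 0.4]]   # shape (2, 2)
with_channel = [data]              # shape (1, 2, 2): channel axis added
with_batch = [with_channel]        # shape (1, 1, 2, 2): batch axis added
print(len(with_batch), len(with_batch[0]), len(with_batch[0][0]))  # 1 1 2
```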
No padding and strides of size 1
Image from https://github.com/vdumoulin/conv_arithmetic
conv1 = ConvLayer(1, 3, ks=3, stride=1, padding=0)
- The first parameter specifies the number of channels in the input.
- The second parameter specifies how many filters we want to create in this layer. This will be equal to the number of channels in the output since each channel is created by one filter.
- The ks parameter specifies the size of the filters we want - 3x3 in this case. We need to provide just one integer because filters are always square. The default value of this parameter is 3.
- The stride parameter specifies the size of the stride. The default value of this parameter is 1.
- The padding parameter specifies how much padding we apply around the input.
res = conv1(t)
res.shape
torch.Size([1, 3, 3, 3])
The batch size in the output remains the same as that in the input.
The number of channels has increased to 3 since we created a convolutional layer with 3 filters.
With no padding and strides of size 1, the dimensions of the output are input size - kernel size + 1, which in our case is 5 - 3 + 1 = 3.
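This arithmetic can be checked directly (a plain-Python sketch of the standard valid-convolution formula):

```python
# Valid convolution (no padding), stride 1: output size is n - k + 1
# along each spatial dimension.
n, k = 5, 3   # input size and kernel size from conv1 above
out = n - k + 1
print(out)  # 3, matching the 3x3 spatial output above
```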
Zero padding and strides of size 1
Image from https://github.com/vdumoulin/conv_arithmetic
conv2 = ConvLayer(1, 3)
We don't specify the stride parameter since its default value is 1.
We haven't specified the ks parameter, so the default value of 3 is used.
res = conv2(t)
res.shape
torch.Size([1, 3, 5, 5])
Unless we are using transposed convolutions (more on those in a bit), fastai automatically applies an appropriate amount of padding by default to ensure that the input and output dimensions are equal.
This kind of padding is also commonly known as half padding or same padding.
There is another type of padding called full padding, which allows us to increase the dimensions of the output. Full padding can be achieved by using regular zero padding of size k - 1 (where k is the kernel size).
Image from https://github.com/vdumoulin/conv_arithmetic
conv3 = ConvLayer(1, 3, padding=(2,2))
res = conv3(t)
res.shape
torch.Size([1, 3, 7, 7])
With full padding, the dimensions of the output are input size + kernel size - 1, which in this case is 5 + 3 - 1 = 7.
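Both padding schemes follow the same one-dimensional arithmetic; a small plain-Python sketch (the formula matches the convolution arithmetic guide linked above):

```python
def out_size(n, k, padding):
    # Stride-1 convolution output size along one spatial dimension.
    return n + 2 * padding - k + 1

n, k = 5, 3
print(out_size(n, k, (k - 1) // 2))  # 5: half/"same" padding preserves the size
print(out_size(n, k, k - 1))         # 7: full padding gives n + k - 1
```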
Strided convolutions
By specifying a value for the stride parameter greater than 1, we can perform strided convolutions. Strided convolutions are useful for decreasing the dimensions of the output.
Image from https://github.com/vdumoulin/conv_arithmetic
conv4 = ConvLayer(1, 3, stride=2)
res = conv4(t)
res.shape
torch.Size([1, 3, 3, 3])
We can also have strided convolutions with no padding applied to the input.
Image from https://github.com/vdumoulin/conv_arithmetic
conv5 = ConvLayer(1, 3, stride=2, padding=0)
res = conv5(t)
res.shape
torch.Size([1, 3, 2, 2])
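The stride enters the output-size formula as a floor division; a plain-Python sketch covering both examples above (this is the standard formula, as documented for PyTorch's nn.Conv2d):

```python
def out_size(n, k, stride=1, padding=0):
    # Floor-division output-size formula for a strided convolution
    # along one spatial dimension.
    return (n + 2 * padding - k) // stride + 1

n, k = 5, 3
print(out_size(n, k, stride=2, padding=1))  # 3: conv4 (fastai's default padding of 1)
print(out_size(n, k, stride=2, padding=0))  # 2: conv5 (no padding)
```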
Transposed Convolutions
Also known as: fractionally strided convolutions, deconvolutions.
Transposed convolutions allow us to increase the dimensions of the output compared to the input.
Image from https://github.com/vdumoulin/conv_arithmetic
conv6 = ConvLayer(1, 3, transpose=True)
We can use transposed convolutions in fastai by setting transpose=True.
res = conv6(t)
res.shape
torch.Size([1, 3, 7, 7])
We can also use a stride bigger than 1. To visualize this operation, imagine adding zero padding between all values in the input.
Image from https://github.com/vdumoulin/conv_arithmetic
conv7 = ConvLayer(1, 3, transpose=True, stride=2)
res = conv7(t)
res.shape
torch.Size([1, 3, 11, 11])
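The output sizes of both transposed examples follow the standard transposed-convolution formula (as documented for PyTorch's nn.ConvTranspose2d, with output_padding of 0; judging by the shapes above, fastai's ConvLayer applies no padding by default when transpose=True):

```python
def transpose_out_size(n, k, stride=1, padding=0):
    # Output size of a transposed convolution along one spatial
    # dimension (output_padding assumed to be 0).
    return (n - 1) * stride - 2 * padding + k

n, k = 5, 3
print(transpose_out_size(n, k, stride=1))  # 7: conv6
print(transpose_out_size(n, k, stride=2))  # 11: conv7
```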
Dilated Convolutions
Regular convolution operations work on elements in the input that are next to each other. Dilated convolutions skip elements in the input.
Image from https://github.com/vdumoulin/conv_arithmetic
conv8 = ConvLayer(1, 3, dilation=2)
The spacing between kernel elements is controlled by the dilation parameter: a dilation of d means the kernel samples every d-th input element, skipping d - 1 elements in between. A value of dilation=1 corresponds to normal convolutions.
res = conv8(t)
res.shape
torch.Size([1, 3, 3, 3])
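Dilation enlarges the kernel's effective receptive field without adding parameters; a plain-Python sketch of the arithmetic (assuming fastai's default padding of (ks - 1) // 2 = 1, which matches the 3x3 output above):

```python
def dilated_out_size(n, k, dilation=1, stride=1, padding=0):
    # A dilation of d spreads a k-wide kernel over d*(k - 1) + 1 input
    # elements; the usual convolution arithmetic then applies.
    effective_k = dilation * (k - 1) + 1
    return (n + 2 * padding - effective_k) // stride + 1

n, k = 5, 3
print(dilated_out_size(n, k, dilation=2, padding=1))  # 3: conv8
```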