Lecture 10
How to generate new data of certain types
Models: autoencoders, variational autoencoders (VAEs), and generative adversarial networks (GANs).
We'll talk about autoencoders and VAEs today, and GANs if we have time.
There are two ways of thinking of an image autoencoder:
Both are considered unsupervised learning tasks, since no labels are involved.
However, we do have a dataset of unlabelled images.
Idea: In order to learn to generate images, we’ll learn to reconstruct images from a low-dimensional representation.
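An image autoencoder has two components:
Encoder: maps an image to a low-dimensional representation (an embedding).
Decoder: reconstructs the image from that low-dimensional representation.
A good low-dimensional representation should allow us to reconstruct everything about the image.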
What would the architecture of the encoder look like?
We can use downsampling (e.g. strided convolutions or pooling) to reduce the dimensionality of the data.
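A minimal sketch of such an encoder in PyTorch. It assumes 1x28x28 grayscale inputs (MNIST-sized digits) and a 64-dimensional embedding. The layer sizes are illustrative choices, not taken from the lecture. Each stride-2 convolution roughly halves the height and width:

```python
import torch
import torch.nn as nn

# Hypothetical encoder for 1x28x28 grayscale images (assumed MNIST-sized).
encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=7),                       # 7x7 -> 1x1
)

x = torch.randn(4, 1, 28, 28)  # a batch of 4 random stand-in images
z = encoder(x)
print(z.shape)  # torch.Size([4, 64, 1, 1])
```

Flattening with `z.flatten(1)` gives a (4, 64) matrix: 64 numbers per image instead of 784 pixels.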
What would the architecture of the decoder look like?
We need to be able to increase the image resolution.
We haven’t learned how to do this yet!
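Transpose convolutions are used to increase the resolution of a feature map.
This is useful for the decoder of an image autoencoder, and for pixel-wise prediction problems.
A prediction problem where we label the content of each pixel (e.g. semantic segmentation) is known as a pixel-wise prediction problem.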
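To make the output format of a pixel-wise prediction concrete, here is a sketch of a per-pixel classification loss in PyTorch. It assumes 5 hypothetical classes and uses random stand-in tensors. The model must output one score per class for every pixel, so its output has shape (N, C, H, W) with the same H and W as the input image:

```python
import torch
import torch.nn as nn

num_classes = 5  # hypothetical number of pixel labels
logits = torch.randn(2, num_classes, 64, 64)         # (N, C, H, W): a score per class per pixel
labels = torch.randint(0, num_classes, (2, 64, 64))  # (N, H, W): one integer label per pixel

# CrossEntropyLoss accepts (N, C, H, W) scores with (N, H, W) targets
# and averages the loss over every pixel in the batch.
loss = nn.CrossEntropyLoss()(logits, labels)
pred = logits.argmax(dim=1)  # (N, H, W): predicted class of each pixel
print(loss.shape, pred.shape)
```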
Q: How do we generate pixel-wise predictions?
We need to be able to upsample features, i.e., to obtain high-resolution features from low-resolution features.
We need an operation that reverses the shape change of a convolution: a transpose convolution (a.k.a. "deconvolution", although it is not the mathematical inverse of a convolution).
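The shapes in the examples that follow can be predicted from the output-size formulas in the PyTorch documentation for Conv2d and ConvTranspose2d. They are shown here for one spatial dimension, with dilation 1 and output_padding = 0 unless stated. For input size $H_{\text{in}}$, kernel size $k$, stride $s$ and padding $p$:

```latex
\begin{aligned}
\text{Conv2d:} \quad H_{\text{out}} &= \left\lfloor \frac{H_{\text{in}} + 2p - k}{s} \right\rfloor + 1 \\
\text{ConvTranspose2d:} \quad H_{\text{out}} &= (H_{\text{in}} - 1)\,s - 2p + k + \text{output\_padding} \\[4pt]
k=5,\ s=1,\ p=0: \quad & 64 \to 64 - 5 + 1 = 60 \to (60 - 1) + 5 = 64 \\
k=5,\ s=2,\ p=0: \quad & 64 \to \lfloor 59/2 \rfloor + 1 = 30 \to 29 \cdot 2 + 5 = 63
\end{aligned}
```

With stride 2, the floor maps both 63 and 64 to 30, so the transpose convolution cannot tell which size to restore.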
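import torch

x = torch.randn(2, 8, 64, 64)   # batch of 2, 8 channels, 64x64
conv = torch.nn.Conv2d(in_channels=8,
                       out_channels=8,
                       kernel_size=5)
y = conv(x)
y.shape
torch.Size([2, 8, 60, 60])

# Transpose convolution with the same settings maps 60x60 back to 64x64
convt = torch.nn.ConvTranspose2d(in_channels=8,
                                 out_channels=8,
                                 kernel_size=5)
x_up = convt(y)
x_up.shape
torch.Size([2, 8, 64, 64])
should get the same shape back!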
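x = torch.randn(2, 8, 64, 64)
conv = torch.nn.Conv2d(in_channels=8,
                       out_channels=8,
                       kernel_size=5,
                       padding=2)   # padding=2 keeps 64x64 for a 5x5 kernel
y = conv(x)
y.shape
torch.Size([2, 8, 64, 64])

# Transpose convolution with matching padding
convt = torch.nn.ConvTranspose2d(in_channels=8,
                                 out_channels=8,
                                 kernel_size=5,
                                 padding=2)
x_up = convt(y)
x_up.shape
torch.Size([2, 8, 64, 64])
should get the same shape back!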
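x = torch.randn(2, 8, 64, 64)
conv = torch.nn.Conv2d(in_channels=8,
                       out_channels=8,
                       kernel_size=5,
                       stride=2)
y = conv(x)
y.shape
torch.Size([2, 8, 30, 30])

# Transpose convolution with matching stride
convt = torch.nn.ConvTranspose2d(in_channels=8,
                                 out_channels=8,
                                 kernel_size=5,
                                 stride=2)
x_up = convt(y)
x_up.shape
torch.Size([2, 8, 63, 63])
… almost the same shape …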
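The missing row and column come from the floor in the convolution's output size: 63x63 and 64x64 inputs both give 30x30 outputs. A minimal sketch of one fix uses ConvTranspose2d's `output_padding` argument to add the extra row and column back:

```python
import torch

x = torch.randn(2, 8, 64, 64)
conv = torch.nn.Conv2d(in_channels=8, out_channels=8, kernel_size=5, stride=2)
y = conv(x)  # 64x64 -> 30x30 (a 63x63 input would also give 30x30)

# output_padding=1 adds one extra row/column to the output size,
# choosing 64x64 instead of 63x63. It must be smaller than the stride.
convt = torch.nn.ConvTranspose2d(in_channels=8, out_channels=8, kernel_size=5,
                                 stride=2, output_padding=1)
x_up = convt(y)
print(x_up.shape)  # torch.Size([2, 8, 64, 64])
```

Only the shape is recovered, not the original values: `convt` has its own learned weights and is not the inverse of `conv`. Alternatively, a ConvTranspose2d without `output_padding` can be called as `convt(y, output_size=[64, 64])`.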
https://www.mdpi.com/2072-4292/9/6/522/htm
Recall that we want a model that generates images that look like our training data.
Autoencoders are not used for supervised learning. The task is not to predict something about the image!
Autoencoders are considered a generative model.
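To tie the pieces together, here is a hedged sketch of a complete convolutional autoencoder and one training step. The architecture, the 1x28x28 input size, the Adam optimizer and the learning rate are all assumptions for illustration. The decoder mirrors the encoder with transpose convolutions and uses `output_padding` to reach 14 and 28 exactly. The loss compares the reconstruction with the input itself, so no labels are needed:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Hypothetical convolutional autoencoder for 1x28x28 images (assumed MNIST-sized)."""

    def __init__(self):
        super().__init__()
        # Encoder: image -> low-dimensional embedding of shape 64x1x1
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
            nn.Conv2d(32, 64, 7),                       # 7 -> 1
        )
        # Decoder: embedding -> image, mirroring the encoder
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 7),                                         # 1 -> 7
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1),  # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),   # 14 -> 28
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

images = torch.rand(8, 1, 28, 28)  # stand-in batch of unlabelled images
recon = model(images)
loss = criterion(recon, images)     # the target is the input itself
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(recon.shape, loss.item())
```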
The dimensionality reduction means that there will be structure in the embedding space.
If the dimensionality of the embedding space is not too large, similar images should map to similar locations.
Q: Can we pick a random point in the embedding space, and decode it to get an image of a digit?
A: Unfortunately not necessarily. Can we figure out why not?
Overfitting can occur if the size of the embedding space is too large.
If the dimensionality of the embedding space is small, then the neural network needs to map similar images to similar locations.
If the dimensionality of the embedding space is too large, then the neural network can simply memorize the images!
Q: Why do autoencoders produce blurry images?
Hint: it has to do with the use of the MSELoss.
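One way to see the hint, sketched with made-up 1-D "images": suppose the same stroke could plausibly appear at either of two neighbouring positions. Under MSE, the prediction that minimizes the expected loss is the average of the plausible targets. That average is a smeared (blurry) stroke rather than either sharp one:

```python
import torch

# Hypothetical 1-D "images": the same stroke drawn one pixel apart.
a = torch.tensor([0., 0., 1., 0., 0.])
b = torch.tensor([0., 0., 0., 1., 0.])

def expected_mse(pred):
    # Expected MSE when the true image is equally likely to be a or b.
    return 0.5 * ((pred - a) ** 2).mean() + 0.5 * ((pred - b) ** 2).mean()

blurry = (a + b) / 2  # [0, 0, 0.5, 0.5, 0]
print(expected_mse(a).item())       # sharp guess
print(expected_mse(blurry).item())  # blurry average
```

The sharp guess scores roughly 0.2 and the blurry average roughly 0.1, so MSE rewards hedging between the possibilities.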
Read more: https://ieeexplore.ieee.org/document/8461664