Neural Style Transfer Explained

2018-04-29 Deep Learning Read time: 4 minutes

Welcome, everyone, to the journey into neural style transfer. Let’s dive in. 😃

Level one : Introduction

With the rise of deep learning, people have built some amazing things. Neural style transfer (also known as artistic style transfer) is one of them. It was introduced in 2015 by Gatys et al. I’m not going to put you to sleep by discussing the details of the paper at this level. Instead, let’s talk about what can be done with this technique. I believe you would all like to see your own picture painted in your favorite artist’s style. That is not an easy task if you do not have enough painting skills. Don’t worry, deep learning has the power to reincarnate your favorite artist and have him or her paint your picture. Sounds amazing, right? Let me show you an example.

Here is your image (in the literature it is known as the Content Image):

Content Image

And this is your favorite artist’s painting (known as the Style Image):

Style Image

Now let’s do the magic ✨

This is the resulting image:

Result Image

Looks great, right? Let’s move on to level two.

Level two : Understanding Concepts

From this level onwards you need some understanding of how Convolutional Neural Networks (CNNs) work. If you don’t have it, don’t worry, here are some resources for learning CNNs.

Understanding CNN (from cs231n)
Convolutional Neural Networks (by Siraj Raval)

For those who already understand CNNs, follow me. Others, follow the links above and see you in a couple of days 👋

Any neural-network-based model has three main components, namely:

Data.
Network Architecture.
Loss Function.

Let’s understand each of the components above and how it relates to our topic.

Data

For this task we need three inputs:

Style Image (S).
Content Image (C).
Random Noise (G).

Both the Style Image and the Content Image were explained in level one. Let’s talk about the Random Noise. In simple words, it is the initialization of the final generated image: the optimizer starts from this noise and changes it step by step.

Random Noise
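
As a minimal sketch of that initialization, the snippet below creates a noise image with the same shape as the content image. It assumes images are stored as NumPy float arrays with pixel values scaled to [0, 1]; the helper name `init_generated_image` and the tiny 4x4 image are hypothetical and only there for illustration.

```python
import numpy as np

def init_generated_image(content_image, seed=0):
    """Return a random-noise image with the same shape as the content image."""
    rng = np.random.default_rng(seed)
    # Uniform noise, assuming pixel values are floats scaled to the [0, 1] range.
    noise = rng.uniform(0.0, 1.0, size=content_image.shape)
    return noise.astype(np.float32)

# Hypothetical 4x4 RGB content image, used only to show the shapes involved.
C = np.zeros((4, 4, 3), dtype=np.float32)
G = init_generated_image(C)
print(G.shape)  # (4, 4, 3)
```

The only requirement is that G has the same shape as the content image, since both are fed through the same network and compared layer by layer.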

Network architecture

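For this task we use a pre-trained CNN such as VGG-16 or VGG-19. Its weights stay fixed; the network is only used to extract the activations that the loss functions below compare.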

Loss Function

The loss function for the whole operation is as follows,

$Loss(G)= \alpha*Loss(C,G) + \beta*Loss(S,G)$

Loss(G) : Total Loss
Loss(C,G) : Content Loss
Loss(S,G) : Style Loss
alpha and beta : hyperparameters that weight the content loss and the style loss

In simplified form, neural style transfer comes down to two steps.

Initialize the Generated Image (G) randomly (this is the noise image).
Use an optimizer to minimize Loss(G) and update G:

$G = G - \gamma * \frac{\partial Loss(G)}{\partial G}$

gamma : learning rate (step size) of the update
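
The two steps can be sketched as a plain gradient-descent loop. To keep the example runnable without a CNN, it assumes toy pixel-space stand-ins for the two losses (squared distance from G to C and from G to S), so the gradient can be written by hand. In real style transfer both losses are computed on CNN activations and the gradient comes from automatic differentiation. The function name `toy_loss_and_grad` and the values of `alpha`, `beta` and `gamma` are illustrative assumptions.

```python
import numpy as np

def toy_loss_and_grad(G, C, S, alpha, beta):
    """Toy stand-in: Loss(G) = alpha*||G - C||^2 + beta*||G - S||^2 in pixel space."""
    loss = alpha * np.sum((G - C) ** 2) + beta * np.sum((G - S) ** 2)
    grad = 2.0 * alpha * (G - C) + 2.0 * beta * (G - S)  # dLoss/dG, derived by hand
    return loss, grad

rng = np.random.default_rng(0)
C = rng.uniform(size=(4, 4, 3))  # hypothetical content image
S = rng.uniform(size=(4, 4, 3))  # hypothetical style image
G = rng.uniform(size=(4, 4, 3))  # step 1: initialize G with random noise

alpha, beta, gamma = 1.0, 1.0, 0.1  # illustrative values, not tuned
losses = []
for step in range(100):
    loss, grad = toy_loss_and_grad(G, C, S, alpha, beta)
    losses.append(loss)
    G = G - gamma * grad  # step 2: G = G - gamma * dLoss/dG
```

With this toy loss the loop settles on the weighted average of C and S, which makes it easy to see how alpha and beta pull the result toward one image or the other.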

Content Loss

We use a single layer, layer[L], of the CNN to compute the content loss. Layer[L] should be neither too shallow nor too deep in the network. We feed both the Content Image and the Generated Image (initially the random noise) into the CNN and take the activations of layer[L] for each. Let’s say,
a[C] is the activation of the Content Image and
a[G] is the activation of the Generated Image at layer[L].

If the difference between a[C] and a[G] is small, we can say that both images have similar content. We measure the difference with the squared Frobenius norm, that is, the sum of squared element-wise differences:

$Loss(C,G) = || a[C] - a[G] ||_F^2$
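
A minimal NumPy sketch of this formula, assuming the activations of layer[L] have already been extracted as arrays of shape (nh, nw, nc). The function name `content_loss` and the toy activations are hypothetical.

```python
import numpy as np

def content_loss(a_C, a_G):
    """Squared Frobenius norm of the difference between two activation tensors."""
    return float(np.sum((a_C - a_G) ** 2))

# Hypothetical layer[L] activations with shape (nh, nw, nc) = (2, 2, 3).
a_C = np.ones((2, 2, 3))
a_G = np.zeros((2, 2, 3))
print(content_loss(a_C, a_G))  # 12.0, all 12 entries differ by 1
print(content_loss(a_C, a_C))  # 0.0, identical activations
```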

Style Loss

For the Content Loss we used only layer[L]. For the Style Loss we use multiple layers, but for now let’s look at how to calculate it for one layer (layer[L]).
Like above, we feed both the Style Image and the Generated Image into the CNN and get their activations at layer[L]. Instead of comparing those activations directly, we check how the activations are correlated across the different channels of layer[L]. In the literature this is known as the Gram Matrix or Style Matrix. We need to calculate the Gram Matrix for both the Style Image and the Generated Image. The entry for channels i and j at layer[L], computed from the style activations by summing over all spatial positions k, is as follows. (Careful: G with indices ij denotes a Gram Matrix here, not the Generated Image.)

$G_{ij}^{[L][S]} = \sum_k a_{ik}^{[S]} \cdot a_{jk}^{[S]}$

i, j = (1 to nc) and k = (1 to nh*nw), where nc is the number of channels and nh, nw are the height and width of the activations at layer[L]
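
Here is a small NumPy sketch of the Gram Matrix. It assumes the activations are stored as an array of shape (nh, nw, nc), with i and j indexing channels and k running over the nh*nw spatial positions. The function name `gram_matrix` and the toy activations are hypothetical.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix of activations a with shape (nh, nw, nc)."""
    n_h, n_w, n_c = a.shape
    # Flatten the spatial grid: row i of A holds channel i at all nh*nw positions.
    A = a.reshape(n_h * n_w, n_c).T  # shape (nc, nh*nw)
    # Entry (i, j) is the sum over positions k of A[i, k] * A[j, k].
    return A @ A.T                   # shape (nc, nc)

# Hypothetical activations: 2x2 spatial grid, 3 channels.
a_S = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
gram = gram_matrix(a_S)
print(gram.shape)  # (3, 3)
print(gram[0, 1])  # 144.0 = 0*1 + 3*4 + 6*7 + 9*10
```

The result is an nc x nc matrix, so its size depends only on the number of channels and not on the image size.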

After calculating the Gram Matrix for both the Style and the Generated Image, we measure the difference at layer[L] with the squared Frobenius norm, just as we did for the Content Loss.

$Loss(S,G)^{[L]} = || G^{[L][S]} - G^{[L][G]} ||_F^2$
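
The single-layer style loss can be sketched in NumPy as follows, assuming activations of shape (nh, nw, nc) and no normalization constant, to match the formula as written here. The helper names `gram_matrix` and `layer_style_loss` are hypothetical.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix (nc x nc) of activations a with shape (nh, nw, nc)."""
    n_h, n_w, n_c = a.shape
    A = a.reshape(n_h * n_w, n_c).T
    return A @ A.T

def layer_style_loss(a_S, a_G):
    """Squared Frobenius norm of the difference between the two Gram matrices."""
    return float(np.sum((gram_matrix(a_S) - gram_matrix(a_G)) ** 2))

# Hypothetical activations: 2x2 spatial grid, 2 channels.
print(layer_style_loss(np.ones((2, 2, 2)), np.zeros((2, 2, 2))))  # 64.0

# Rearranging the spatial positions leaves the Gram matrix unchanged.
a_S = np.arange(12, dtype=np.float64).reshape(2, 2, 3)
a_flipped = a_S[::-1, ::-1, :]
print(layer_style_loss(a_S, a_flipped))  # 0.0
```

The second example shows why this works as a style measure: the Gram Matrix records which channels fire together, not where in the image they fire.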

For more visually pleasing results we combine the style losses of different layers.

$Loss(S,G) = \sum_l \lambda^{[l]} * Loss(S,G)^{[l]}$

lambda[l] : weight given to layer l
Layer l = [conv1, conv2, conv3 ….. ]
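
A possible way to combine the layers, sketched in NumPy. It assumes the activations of each chosen layer are held in dictionaries keyed by layer name and that every layer has its own weight (giving all layers the same weight reproduces a single lambda). The names `style_loss`, `acts_S`, `acts_G` and `layer_weights`, as well as the toy values, are hypothetical.

```python
import numpy as np

def gram_matrix(a):
    """Gram matrix (nc x nc) of activations a with shape (nh, nw, nc)."""
    n_h, n_w, n_c = a.shape
    A = a.reshape(n_h * n_w, n_c).T
    return A @ A.T

def layer_style_loss(a_S, a_G):
    """Style loss of one layer: squared Frobenius norm of the Gram difference."""
    return float(np.sum((gram_matrix(a_S) - gram_matrix(a_G)) ** 2))

def style_loss(acts_S, acts_G, layer_weights):
    """Weighted sum of the per-layer style losses.

    acts_S, acts_G: dicts mapping layer name -> activations (nh, nw, nc).
    layer_weights:  dict mapping layer name -> weight (lambda) of that layer.
    """
    return sum(w * layer_style_loss(acts_S[name], acts_G[name])
               for name, w in layer_weights.items())

# Hypothetical activations for two layers of different sizes.
acts_S = {"conv1": np.ones((2, 2, 2)), "conv2": np.ones((1, 1, 3))}
acts_G = {"conv1": np.zeros((2, 2, 2)), "conv2": np.zeros((1, 1, 3))}
layer_weights = {"conv1": 0.5, "conv2": 0.5}

total = style_loss(acts_S, acts_G, layer_weights)
print(total)  # 36.5 = 0.5 * 64 + 0.5 * 9
```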

Now we can use the following equation to calculate the Total Loss,

$Loss(G)= \alpha*Loss(C,G) + \beta*Loss(S,G)$

Use your favorite optimizer to minimize the total loss and update the noise image.

Thank you for coming on this journey with me. See you in another post. Until then, happy learning!!