lyttonhao / neural-style-mmd

MXNet Code For Demystifying Neural Style Transfer (IJCAI 2017)

Python 95.58% Shell 4.42%

neural-style maximum-mean-discrepancy batch-normalization

neural-style-mmd's Introduction

Neural-Style-MMD

This repository holds the MXNet code for the paper

Demystifying Neural Style Transfer, Yanghao Li, Naiyan Wang, Jiaying Liu, and Xiaodi Hou, International Joint Conference on Artificial Intelligence (IJCAI), 2017

[Arxiv Preprint]

Introduction

Neural-Style-MMD implements a neural style transfer algorithm based on a new interpretation. Instead of matching Gram matrices as in the original neural style transfer method, this repo provides two alternative style losses: a Maximum Mean Discrepancy (MMD) loss and a Batch Normalization (BN) statistics loss. The paper also shows that matching Gram matrices is equivalent to minimizing an MMD with a specific polynomial kernel. Details can be found in the paper. Our implementation is based on the neural-style example of MXNet.
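The equivalence between Gram matrix matching and a polynomial-kernel MMD can be checked numerically. The following is a minimal NumPy sketch, not code from this repo: it assumes features are stored as an N x M matrix (N channels, M spatial positions), takes the kernel to be k(x, y) = (x^T y)^2, and leaves out the normalization constants, so the two quantities agree only up to a constant factor in general. The function names are illustrative.

```python
import numpy as np

def gram_loss(F, S):
    """Squared Frobenius distance between the Gram matrices F F^T and S S^T."""
    return np.sum((F @ F.T - S @ S.T) ** 2)

def poly_mmd_unnormalized(F, S):
    """Sum form of the biased MMD^2 with kernel k(x, y) = (x^T y)^2.

    Each column of F and S is one sample (one spatial position).
    """
    k_ff = (F.T @ F) ** 2   # k(f_i, f_j) for all pairs of columns of F
    k_ss = (S.T @ S) ** 2   # k(s_i, s_j) for all pairs of columns of S
    k_fs = (F.T @ S) ** 2   # k(f_i, s_j) for all cross pairs
    return k_ff.sum() + k_ss.sum() - 2.0 * k_fs.sum()

rng = np.random.default_rng(0)
N, M = 4, 6                       # toy sizes: 4 channels, 6 positions
F = rng.standard_normal((N, M))   # stand-in for generated-image features
S = rng.standard_normal((N, M))   # stand-in for style-image features
print(np.isclose(gram_loss(F, S), poly_mmd_unnormalized(F, S)))
```

The check works because the squared Frobenius norm of F F^T equals the sum of squared inner products between columns of F, and likewise for the cross term.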

Prerequisites

Before running this code, you should make the following preparations:

Install MXNet following the instructions and install the Python interface. This repo is tested on commit 01cde1.
Download the pre-trained VGG-19 model into the model folder:

wget https://github.com/dmlc/web-data/raw/master/mxnet/neural-style/model/vgg19.params

Usage

Basic Usage:

python neural-style.py --mmd-kernel linear --gpu 0 --style-weight 5.0 --content-image input/brad_pitt.jpg --style-image input/starry_night.jpg --output brad_pitt-starry_night --output-folder output_images

We support four individual transfer methods: three MMD kernels (linear, poly, and Gaussian) and a BN Statistics Matching method. The code also supports fusing different transfer methods with specified weights.
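To make the BN statistics matching idea concrete, here is a minimal NumPy sketch of a loss that compares per-channel means and standard deviations of two feature matrices. It is an illustration under assumptions (N x M layout, averaging over channels, the name bn_statistics_loss), not the loss as implemented in this repo.

```python
import numpy as np

def bn_statistics_loss(F, S):
    """Mean over channels of squared differences in per-channel mean and std.

    F and S are (channels x positions); they may have different numbers
    of positions because only per-channel statistics are compared.
    """
    mu_f, mu_s = F.mean(axis=1), S.mean(axis=1)
    sd_f, sd_s = F.std(axis=1), S.std(axis=1)
    return np.mean((mu_f - mu_s) ** 2 + (sd_f - sd_s) ** 2)

F = np.arange(12.0).reshape(3, 4)      # toy features: 3 channels, 4 positions
print(bn_statistics_loss(F, F))        # identical statistics
print(bn_statistics_loss(F, F + 2.0))  # every channel mean shifted by 2
```

Shifting all activations by a constant changes only the means, so the second call measures the squared shift.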

Options

--mmd-kernel: Specify the MMD kernel (linear, poly, Gaussian) or a comma-separated combination, e.g. linear,poly.
--bn-loss: Whether to use the BN method.
--multi-weights: The weights when fusing different transfer methods, e.g. 0.5,0.5.
--style-weight: How much to weight the style loss term. It is equivalent to the balance factor gamma in the paper when the content weight is fixed at 1.0.
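As a rough illustration of how a comma-separated kernel list and weight list could be paired and turned into one fused loss, consider the sketch below. The helpers parse_fusion and fused_style_loss are hypothetical, and the equal-weight default is an assumption; how neural-style.py actually parses these options, and whether the BN loss takes part in the weights, is not shown here.

```python
def parse_fusion(mmd_kernel, multi_weights=None):
    """Pair each kernel name with a weight, e.g. 'linear,poly' with '0.5,0.5'."""
    kernels = [k.strip() for k in mmd_kernel.split(",") if k.strip()]
    if multi_weights is None:
        # Assumption for this sketch: equal weights when none are given.
        weights = [1.0 / len(kernels)] * len(kernels)
    else:
        weights = [float(w) for w in multi_weights.split(",")]
    if len(weights) != len(kernels):
        raise ValueError("need exactly one weight per transfer method")
    return list(zip(kernels, weights))

def fused_style_loss(losses, fusion):
    """Weighted sum of per-method style losses (losses maps name -> value)."""
    return sum(weight * losses[name] for name, weight in fusion)

fusion = parse_fusion("linear,poly", "0.5,0.5")
print(fusion)
print(fused_style_loss({"linear": 2.0, "poly": 4.0}, fusion))
```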

You can run python neural-style.py with -h to see more options.

neural-style-mmd's Issues

Have you ever tried the setting of --bn_loss?

I have tried the bn-loss option, but it always returns NaN for the gradient and the loss. I would like to know what is wrong with it.

confusion in Linear Kernel Loss functions

I'm trying to understand the code, but I can't follow your loss calculation function. Can you please explain what it does? It doesn't seem to do anything for the linear kernel, since that part is commented out in mmd_loss.py.
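One general fact that may help when reading a linear-kernel MMD: with k(x, y) = x^T y, the biased MMD^2 reduces to the squared distance between the two mean feature vectors, so no explicit kernel matrix is needed. Whether this explains the commented-out code in mmd_loss.py is not confirmed here. The sketch below, with illustrative function names and an assumed (channels x positions) layout, checks the identity numerically.

```python
import numpy as np

def linear_mmd2_pairs(F, S):
    """Biased MMD^2 with the linear kernel, computed from all sample pairs."""
    return (F.T @ F).mean() + (S.T @ S).mean() - 2.0 * (F.T @ S).mean()

def linear_mmd2_means(F, S):
    """The same quantity, computed from the per-channel mean vectors only."""
    diff = F.mean(axis=1) - S.mean(axis=1)
    return diff @ diff

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 6))   # 4 channels, 6 positions
S = rng.standard_normal((4, 9))   # same channels, different number of positions
print(np.isclose(linear_mmd2_pairs(F, S), linear_mmd2_means(F, S)))
```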

Why cannot get good results with Gaussian kernel?

Thanks for sharing the code. However, the results with the Gaussian kernel seem terrible.

this is with style weight 2.0.

Batch Processing

Is there any way to process folders containing multiple files (for video)?
Maybe even process them in order, like other neural style transfer implementations do.
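One way to approach this without changing the repo is a small wrapper that calls neural-style.py once per frame, in sorted order. The sketch below uses only the command-line options listed in the README; the helper names, the accepted file extensions, and the default values are assumptions, and the wrapper has not been tested against the repo.

```python
import os
import subprocess

IMAGE_EXTS = (".jpg", ".jpeg", ".png")  # assumed set of frame formats

def build_commands(frame_dir, style_image, output_folder,
                   kernel="linear", style_weight=5.0, gpu=0):
    """Return one neural-style.py command per image in frame_dir, sorted by name."""
    commands = []
    for name in sorted(os.listdir(frame_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in IMAGE_EXTS:
            continue  # skip anything that is not an image
        commands.append([
            "python", "neural-style.py",
            "--mmd-kernel", kernel,
            "--gpu", str(gpu),
            "--style-weight", str(style_weight),
            "--content-image", os.path.join(frame_dir, name),
            "--style-image", style_image,
            "--output", stem,
            "--output-folder", output_folder,
        ])
    return commands

def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.check_call(command)

# Example (not executed here):
# run_all(build_commands("frames", "input/starry_night.jpg", "output_images"))
```

Sorting is lexicographic, so frame names should be zero-padded (frame_001.jpg, frame_002.jpg) to keep video order. Each frame is stylized independently, so nothing here enforces consistency between frames.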

How do you set the bandwidth in Gaussian Kernel

Hi, I wonder how you set the bandwidth parameter of the Gaussian kernel here.
https://github.com/lyttonhao/Neural-Style-MMD/blob/master/mmd_loss.py#L65
Can you give me an explanation? Thank you!
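For context, a common heuristic is to tie the Gaussian bandwidth to the scale of the pairwise distances in the data. The sketch below sets sigma^2 to the mean squared distance between cross pairs and uses k(x, y) = exp(-||x - y||^2 / (2 sigma^2)). This is an assumption made for illustration, not a description of what the linked line in mmd_loss.py does, and the function names are hypothetical.

```python
import numpy as np

def squared_distances(X, Y):
    """Pairwise squared L2 distances between columns of X (N x Mx) and Y (N x My)."""
    xx = np.sum(X ** 2, axis=0)[:, None]
    yy = np.sum(Y ** 2, axis=0)[None, :]
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.maximum(xx + yy - 2.0 * (X.T @ Y), 0.0)

def gaussian_mmd2(F, S, sigma2=None):
    """Biased MMD^2 with a Gaussian kernel over the columns of F and S."""
    d_ff = squared_distances(F, F)
    d_ss = squared_distances(S, S)
    d_fs = squared_distances(F, S)
    if sigma2 is None:
        # Heuristic (assumption): bandwidth from the mean squared cross distance.
        sigma2 = d_fs.mean()

    def k(d):
        return np.exp(-d / (2.0 * sigma2))

    return k(d_ff).mean() + k(d_ss).mean() - 2.0 * k(d_fs).mean()

rng = np.random.default_rng(0)
F = rng.standard_normal((4, 6))
S = rng.standard_normal((4, 8)) + 3.0   # shifted, so the two sets clearly differ
print(gaussian_mmd2(F, S))
```

A bandwidth far smaller or far larger than the typical pairwise distance makes the kernel values saturate near 0 or 1, which weakens the gradient signal, so some data-dependent choice is usually preferred over a fixed constant.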

The paper might have a mistake

In the paper "Demystifying Neural Style Transfer", there might be a mistake, which will make Equation (8) incorrect.

For a layer L (the paper uses a lowercase l) in the loss network, NL is the number of feature maps in layer L. All the feature maps in layer L have the same size for a given input image.

Given input images of different sizes, the feature maps at the same layer will also differ in size. For example, if the style image is 512x512 and the content image is 256x256, a feature map of the style image at layer 4_2 (using VGG-19 as an example) will be 4 times the size of the corresponding feature map of the content image.

In the right column of page 2 of the paper, ML is the size of a feature map at layer L for the content image and the generated image. For the style image, the size of a feature map at layer L is typically different. Therefore, the matrix that stores the activations of the style image at layer L cannot be NL x ML.

If my understanding is correct, then the deduction in Equation (8) is incorrect.
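As a point of reference for this discussion, the biased MMD estimate itself does not require equal sample counts. The block below writes it for M_F positions in the generated image and M_S positions in the style image; these two symbols are introduced here for illustration and are not the paper's notation. With the kernel k(x, y) = (x^T y)^2, the estimate equals a Gram matrix distance in which each Gram matrix is normalized by its own position count. This is a general identity and not a statement about what Equation (8) in the paper says.

```latex
\mathrm{MMD}^2[F, S]
  = \frac{1}{M_F^2} \sum_{i=1}^{M_F} \sum_{j=1}^{M_F} k(f_i, f_j)
  + \frac{1}{M_S^2} \sum_{i=1}^{M_S} \sum_{j=1}^{M_S} k(s_i, s_j)
  - \frac{2}{M_F M_S} \sum_{i=1}^{M_F} \sum_{j=1}^{M_S} k(f_i, s_j)

% With k(x, y) = (x^\top y)^2, F \in \mathbb{R}^{N \times M_F}, S \in \mathbb{R}^{N \times M_S}:
\mathrm{MMD}^2[F, S]
  = \left\lVert \frac{1}{M_F} F F^\top - \frac{1}{M_S} S S^\top \right\rVert_F^2
```

When M_F = M_S, both normalizers coincide and the expression reduces to a single common factor in front of the usual Gram matrix difference.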

gnorm = mx.nd.norm(model_executor.data_grad).asscalar()

An error occurs when running "gnorm = mx.nd.norm(model_executor.data_grad).asscalar()" in /mnt/d/mahao/codes/Neural-Style-MMD/neural-style.py:

MXNetError: Check failed: reinterpret_cast( params.info->callbacks[kCustomOpForward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast(ctx.is_train), params.info->contexts[kCustomOpForward]):

python version problem

neural-style.py line 270
print np.prod(content_array[0].shape)
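The quoted line uses the Python 2 print statement, which is a syntax error under Python 3. A minimal sketch of the function-call form follows; the content_array here is a stand-in with an arbitrary shape, since the real array in neural-style.py is not shown.

```python
import numpy as np

# Stand-in for the content_array in neural-style.py (shape chosen arbitrarily).
content_array = [np.zeros((1, 3, 4, 5))]

# Python 2 statement form, a SyntaxError under Python 3:
#     print np.prod(content_array[0].shape)
# Function-call form, valid in Python 3 and also accepted by Python 2
# when printing a single value:
print(np.prod(content_array[0].shape))  # prints 60 for this stand-in shape
```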

mmd_loss line 43

Hi, in line 43 of mmd_loss.py, you wrote dot(x, x.T). I think it should be dot(x.T, x). Is that correct?
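Which product is appropriate depends on how x is laid out, which is not shown here. The sketch below assumes x is (channels x positions) and shows what each product yields under that assumption: dot(x, x.T) is the channel-by-channel Gram matrix, while dot(x.T, x) holds the linear kernel values between every pair of positions. If x is stored as (positions x channels), the roles swap.

```python
import numpy as np

N, M = 3, 5                                        # 3 channels, 5 positions
x = np.arange(N * M, dtype=float).reshape(N, M)    # assumed layout: N x M

gram = np.dot(x, x.T)      # shape (N, N): inner products between channels
kernel = np.dot(x.T, x)    # shape (M, M): inner products between positions

print(gram.shape, kernel.shape)

# Both matrices share the same trace (the squared Frobenius norm of x).
print(np.isclose(np.trace(gram), np.trace(kernel)))

# Summing all linear kernel values equals the squared norm of the column sum.
col_sum = x.sum(axis=1)
print(np.isclose(kernel.sum(), col_sum @ col_sum))
```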