

Neural-Style-MMD

This repository holds the MXNet code for the paper

Demystifying Neural Style Transfer, Yanghao Li, Naiyan Wang, Jiaying Liu, and Xiaodi Hou, International Joint Conference on Artificial Intelligence (IJCAI), 2017

[Arxiv Preprint]

Introduction

Neural-Style-MMD presents a neural style transfer algorithm based on a new interpretation: style transfer as matching feature distributions. Instead of matching Gram matrices as in the original neural style transfer method, this repo provides two ways to implement style transfer: a Maximum Mean Discrepancy (MMD) loss and a Batch Normalization (BN) statistics loss. The paper also shows that matching Gram matrices is equivalent to minimizing MMD with a specific polynomial kernel, the second-order kernel k(x, y) = (x^T y)^2. Details can be found in the paper. Our implementation is based on the neural-style example of MXNet.
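To make the Gram/MMD connection concrete, here is a small NumPy sketch (not code from this repo). It treats each column of an N x M feature map as one sample, uses the second-order polynomial kernel k(x, y) = (x^T y)^2, and checks that the squared Frobenius distance between Gram matrices equals M^2 times the biased MMD^2 estimate when both feature maps have the same number of positions M. The function names are illustrative.

```python
import numpy as np


def gram_distance(fx, fy):
    """Squared Frobenius distance between Gram matrices.

    fx, fy: arrays of shape (N, M), N channels by M spatial positions.
    """
    gx = fx @ fx.T  # (N, N) Gram matrix of fx
    gy = fy @ fy.T  # (N, N) Gram matrix of fy
    return float(np.sum((gx - gy) ** 2))


def poly2_kernel(a, b):
    """k(x, y) = (x^T y)^2 for every pair of columns x of a and y of b."""
    return (a.T @ b) ** 2


def mmd2_biased(fx, fy, kernel):
    """Biased MMD^2 estimate; each column (spatial position) is one sample."""
    return float(kernel(fx, fx).mean() + kernel(fy, fy).mean()
                 - 2.0 * kernel(fx, fy).mean())


rng = np.random.default_rng(0)
N, M = 8, 50
fx = rng.standard_normal((N, M))
fy = rng.standard_normal((N, M))
lhs = gram_distance(fx, fy)
rhs = M ** 2 * mmd2_biased(fx, fy, poly2_kernel)
print(np.isclose(lhs, rhs))  # True
```

The Gram-based style loss used in practice also carries a constant normalization factor, which only rescales the MMD term.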

Prerequisites

Before running this code, you should make the following preparations:

  • Install MXNet following the official instructions, including the Python interface. This repo is tested on commit 01cde1.

  • Download the pre-trained VGG-19 model into the model folder:

wget https://github.com/dmlc/web-data/raw/master/mxnet/neural-style/model/vgg19.params

Usage

Basic Usage:

python neural-style.py --mmd-kernel linear --gpu 0 --style-weight 5.0 --content-image input/brad_pitt.jpg --style-image input/starry_night.jpg --output brad_pitt-starry_night --output-folder output_images

We support four single transfer methods: three MMD kernels (linear, poly, and Gaussian) and a BN statistics matching method. The code can also fuse different transfer methods with specified weights.
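The four single methods can be sketched in NumPy as follows. This is an illustration, not the repo's mmd_loss.py: features are (N, M) arrays with one column per spatial position, the poly parameters c and d and the Gaussian bandwidth sigma2 are assumed defaults, and the BN loss is a simplified form that matches per-channel means and standard deviations. The dictionary keys are hypothetical and need not match the option strings the script accepts.

```python
import numpy as np


def linear_kernel(a, b):
    """k(x, y) = x^T y between columns of a (N, Ma) and b (N, Mb)."""
    return a.T @ b


def poly_kernel(a, b, c=0.0, d=2):
    """k(x, y) = (x^T y + c)^d; c and d are assumed, illustrative defaults."""
    return (a.T @ b + c) ** d


def gaussian_kernel(a, b, sigma2=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma2)); sigma2 is an assumed bandwidth."""
    sq = (np.sum(a ** 2, axis=0)[:, None] + np.sum(b ** 2, axis=0)[None, :]
          - 2.0 * a.T @ b)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma2))


def mmd2(fx, fy, kernel):
    """Biased MMD^2 between the column sets of fx and fy."""
    return float(kernel(fx, fx).mean() + kernel(fy, fy).mean()
                 - 2.0 * kernel(fx, fy).mean())


def bn_stat_loss(fx, fy):
    """Simplified BN statistics loss: match per-channel mean and std (rows are channels)."""
    dmu = fx.mean(axis=1) - fy.mean(axis=1)
    dsd = fx.std(axis=1) - fy.std(axis=1)
    return float(np.mean(dmu ** 2 + dsd ** 2))


# Hypothetical registry of the four single transfer methods.
STYLE_LOSSES = {
    "linear": lambda fx, fy: mmd2(fx, fy, linear_kernel),
    "poly": lambda fx, fy: mmd2(fx, fy, poly_kernel),
    "gaussian": lambda fx, fy: mmd2(fx, fy, gaussian_kernel),
    "bn": bn_stat_loss,
}
```

For example, `STYLE_LOSSES["gaussian"](style_feats, generated_feats)` returns one scalar style loss for a layer; the two feature maps may have different numbers of positions because every term is an average.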

Options

  • --mmd-kernel: The MMD kernel to use (linear, poly, or Gaussian), or a comma-separated combination of them, e.g. linear,poly.
  • --bn-loss: Whether to use the BN method.
  • --multi-weights: The weights when fusing different transfer methods, e.g. 0.5,0.5.
  • --style-weight: How much to weight the style loss term. With content-weight fixed at 1.0, it is equivalent to the balance factor gamma in the paper.
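A minimal sketch of how these options could combine, assuming fusion is a weighted sum of per-method style losses and that the total objective is content_weight * content_loss + style_weight * style_loss. The helper names and the loss values below are hypothetical; the script's actual parsing and loss assembly may differ.

```python
def parse_csv(value, cast=str):
    """Split a comma-separated option value such as 'linear,poly' or '0.5,0.5'."""
    return [cast(v.strip()) for v in value.split(",") if v.strip()]


def fused_style_loss(method_losses, methods, weights):
    """Weighted sum of per-method style losses (assumed fusion rule)."""
    if len(methods) != len(weights):
        raise ValueError("need one weight per transfer method")
    return sum(w * method_losses[m] for m, w in zip(methods, weights))


def total_loss(content_loss, style_loss, style_weight, content_weight=1.0):
    """With content_weight fixed at 1.0, style_weight plays the role of gamma."""
    return content_weight * content_loss + style_weight * style_loss


methods = parse_csv("linear,poly")      # as in --mmd-kernel linear,poly
weights = parse_csv("0.5,0.5", float)   # as in --multi-weights 0.5,0.5
# Made-up per-method loss values, for illustration only.
style = fused_style_loss({"linear": 2.0, "poly": 4.0}, methods, weights)
print(total_loss(content_loss=1.0, style_loss=style, style_weight=5.0))  # 16.0
```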

Run python neural-style.py -h to see more options.

Contributors

lyttonhao

Issues

confusion in Linear Kernel Loss functions

I'm trying to understand the code, but I can't follow your loss calculation. Could you explain what it does? For the linear kernel it doesn't seem to do anything, because that part is commented out in mmd_loss.py.
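One mathematical fact that may help when reading a linear-kernel loss: with k(x, y) = x^T y, the biased MMD^2 collapses to the squared distance between the mean feature vectors, so no kernel matrix is needed. Whether this explains the commented-out code in mmd_loss.py is an assumption; the sketch below (hypothetical function names, NumPy, features as (N, M) arrays) only checks the identity.

```python
import numpy as np


def linear_mmd2_kernel_form(fx, fy):
    """Biased MMD^2 with k(x, y) = x^T y, via explicit kernel matrices."""
    kxx = (fx.T @ fx).mean()
    kyy = (fy.T @ fy).mean()
    kxy = (fx.T @ fy).mean()
    return float(kxx + kyy - 2.0 * kxy)


def linear_mmd2_mean_form(fx, fy):
    """Same quantity: squared distance between per-channel mean features."""
    diff = fx.mean(axis=1) - fy.mean(axis=1)
    return float(diff @ diff)


rng = np.random.default_rng(1)
fx = rng.standard_normal((16, 40))  # 16 channels, 40 positions
fy = rng.standard_normal((16, 60))  # the number of positions may differ
print(np.isclose(linear_mmd2_kernel_form(fx, fy), linear_mmd2_mean_form(fx, fy)))  # True
```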

Batch Processing

Is there any way to process folders containing multiple files (e.g. video frames)?
Ideally in order, as other neural style transfer implementations do.
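The script stylizes one content image per run, as in the usage example above. A hypothetical wrapper (not part of this repo) could loop over a folder of frames in sorted filename order and call neural-style.py once per frame using only the options documented above. Frame names are assumed to be zero-padded (e.g. frame_0001.png) so that sorted order equals playback order, and --output is assumed to take a name without extension, as in the usage example; build_commands and run_all are illustrative names.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_commands(frame_dir, style_image, output_folder, style_weight=5.0,
                   mmd_kernel="linear", gpu=0):
    """Build one neural-style.py command per frame, in sorted filename order."""
    frames = sorted(p for p in Path(frame_dir).iterdir()
                    if p.suffix.lower() in IMAGE_EXTS)
    commands = []
    for frame in frames:
        commands.append([
            sys.executable, "neural-style.py",
            "--mmd-kernel", mmd_kernel,
            "--gpu", str(gpu),
            "--style-weight", str(style_weight),
            "--content-image", str(frame),
            "--style-image", str(style_image),
            # Assumed: output name without extension, as in the usage example.
            "--output", frame.stem,
            "--output-folder", str(output_folder),
        ])
    return commands


def run_all(commands):
    """Run the commands one after another; stop on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(build_commands("frames", "input/starry_night.jpg", "output_frames"))` stylizes every frame in frames/ with the same style image. Each frame is processed independently, so nothing here enforces consistency between frames.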

The paper might have a mistake

In the paper "Demystifying Neural Style Transfer", there might be a mistake, which will make Equation (8) incorrect.

For a layer L in the loss network (the paper uses a lowercase l), N_L is the number of feature maps in layer L. All feature maps in layer L have the same size for a given input image.

Given input images of different sizes, the feature maps at the same layer will have different sizes. For example, if the style image is 512x512 and the content image is 256x256, a feature map of the style image at layer 4_2 (using VGG-19 as an example) will be 4 times as large as the corresponding feature map of the content image.

In the right column of page 2 of the paper, M_L is the size of a feature map at layer L for the content image and the generated image. For the style image, the size of a feature map at layer L is typically different, so the matrix holding the style image's activations at layer L cannot be N_L x M_L.

If my understanding is correct, then the derivation of Equation (8) is incorrect.
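A quick numerical check related to this point (a sketch, not the paper's derivation): when the style and generated feature maps have different numbers of positions, the unnormalized Gram distance no longer equals a single M^2 multiple of MMD^2. If each Gram matrix is instead divided by its own number of positions, the squared Frobenius distance equals the biased MMD^2 with k(x, y) = (x^T y)^2 even for unequal sizes. This per-image normalization is an assumption of the sketch, not necessarily the one the paper or repo uses; function names are illustrative.

```python
import numpy as np


def normalized_gram_distance(fx, fy):
    """||Fx Fx^T / Mx - Fy Fy^T / My||_F^2 for (N, Mx) and (N, My) features."""
    gx = fx @ fx.T / fx.shape[1]
    gy = fy @ fy.T / fy.shape[1]
    return float(np.sum((gx - gy) ** 2))


def poly2_mmd2(fx, fy):
    """Biased MMD^2 with k(x, y) = (x^T y)^2, columns as samples."""
    def k(a, b):
        return (a.T @ b) ** 2
    return float(k(fx, fx).mean() + k(fy, fy).mean() - 2.0 * k(fx, fy).mean())


rng = np.random.default_rng(2)
fs = rng.standard_normal((8, 4 * 64))  # style features: 4x as many positions
fg = rng.standard_normal((8, 64))      # generated-image features
print(np.isclose(normalized_gram_distance(fs, fg), poly2_mmd2(fs, fg)))  # True
```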

gnorm = mx.nd.norm(model_executor.data_grad).asscalar()

An error occurs when running "gnorm = mx.nd.norm(model_executor.data_grad).asscalar()" in /mnt/d/mahao/codes/Neural-Style-MMD/neural-style.py:

MXNetError: Check failed: reinterpret_cast( params.info->callbacks[kCustomOpForward])( ptrs.size(), const_cast<void**>(ptrs.data()), const_cast<int*>(tags.data()), reinterpret_cast<const int*>(req.data()), static_cast(ctx.is_train), params.info->contexts[kCustomOpForward]):

mmd_loss line 43

Hi, in line 43 of mmd_loss.py, you wrote dot(x, x.T), I think it should be dot(x.T, x), correct?
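Which product is right depends on the layout of x and on what the line is meant to compute. The sketch below assumes x has one row per spatial position and one column per channel, shape (M, N); the actual layout in mmd_loss.py may differ. Under that assumption, dot(x, x.T) gives the M x M pairwise linear-kernel matrix that an MMD estimate averages over, while dot(x.T, x) gives the N x N channel Gram matrix. Their squared Frobenius norms coincide, which is the identity behind the Gram/poly-MMD equivalence.

```python
import numpy as np

# Assumed layout for this sketch: rows are spatial positions, columns are channels.
M, N = 6, 3
x = np.arange(M * N, dtype=float).reshape(M, N)

pairwise = x @ x.T  # (M, M): linear kernel x_i^T x_j between positions
gram = x.T @ x      # (N, N): channel-by-channel Gram matrix

print(pairwise.shape, gram.shape)  # (6, 6) (3, 3)
```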
