Giter Club home page Giter Club logo

artistic-video-partial-conv-depth-loss's Introduction

Stable Video Style Transfer Based on Partial Convolution with Depth-Aware Supervision

[Paper] [Code] [Supplementary Video]

Overview

This code is written by Songhua Liu, State Key Laboratory of New Technology of Computer Software, Nanjing University and it is the official pytorch implementation of paper Stable Video Style Transfer Based on Partial Convolution with Depth-Aware Supervision, published on ACM MM 2020. The baseline in the paper is Cycle-GAN, and this project is developed based on their opensource pytorch implementation. The system overview is shown below.

As shown in the figure, there are three modules beyond the contribution of this repository: optical flow estimation module, occlusion and motion mask estimation module, and depth estimation module. In the paper, optical flow estimation module is FlowNet 2.0, motion mask estimation module is a forward-backward consistency checker from here, and depth estimation module is from paper Single Image Depth Perception in the Wild.

There is already a depth estimation module in this repository from here. As for optical flow and mask estimation modules, you can follow their guide to set them up. You can also try other estimation algorithms freely.

Some examples are shown below. For more results, please refer to our Supplementary Video.

  • Video to VanGogh:

  • Video to Monet:

  • Video to Chinese watercolor:

Prerequisites

  • Linux or macOS
  • Python 3
  • PyTorch 0.4+ and other dependencies (torchvision, visdom, dominate, and other common python libs)

Getting Started

  • Clone this repository:

    git clone https://github.com/Huage001/Artistic-Video-Partial-Conv-Depth-Loss
    cd Artistic-Video-Partial-Conv-Depth-Loss
  • Dataset preparation:

    • For training of the image stage, there should be four folders named trainA, trainB, testA, and testB under root of the dataset. Please put style images in trainA and testA, and put content images in trainB and testB. The structure of image dataset is shown in datasets/image.
    • For training of the video stage, there should be five folders named frame, flow, mask, style, last_fake. Frame, optical flow, and mask of one video clip should be put in a separated sub-folder under the corresponding folder. Pay attention to the relative order of video clips in these three folders. Style images should be put under style. As for last_fake, you should run the test of stage 1 for the last frame of each video clip and then put the generated results under this folder. Also, please pay attention the relative order. A sample structure is shown in datasets/video.
    • Provided datasets for this paper will be ready soon.
  • Train/Test:

    • Before training, start visdom server:

      python -m visdom.server
      
    • We provide simple start-up scripts for training and testing of image stage and video stage:

      • Start training for image stage:

        python simple_train_image.py
      • Start testing for image stage (required before video stage):

        python simple_test_image.py
      • Start training for video stage:

        python simple_train_video.py
        
      • Start testing for video stage:

        python simple_test_video.py
        
    • For more options, you may need to modify scripts above.

    • Pre-trained models will be ready soon.

Acknowledgments

We sincerely thank:

  • Junyan Zhu et al. providing baseline and its implementation for this paper.
  • Guilin Liu et al. providing partial convolution implementation.
  • Weifeng Chen et al. providing depth estimation solution and YifJiang providing pre-trained depth estimation model.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.