
fast-neural-style: Fast Style Transfer in Pytorch! 🎨

An implementation of fast-neural-style in PyTorch! Style transfer learns the aesthetic style of a style image, usually an artwork, and applies it to another content image. This repository contains code that can be used for:

  1. fast image-to-image aesthetic style transfer,
  2. image-to-video aesthetic style transfer, and for
  3. training the style-learning transformation network

This implementation follows the style transfer approach outlined in the paper Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi, and Fei-Fei Li, along with the supplementary material detailing the exact model architecture of that paper. The idea is to train a separate feed-forward neural network (called the Transformation Network) to transform/stylize an image, using backpropagation to learn its parameters, instead of directly manipulating the pixels of the generated image as discussed in A Neural Algorithm of Artistic Style (aka the neural-style paper) by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. The use of a feed-forward transformation network allows for fast stylization of images, around 1000x faster than neural-style.

This implementation makes some modifications to Johnson et al.'s proposed architecture (a minimal sketch of the resulting convolutional block is shown after the list), particularly:

  1. The use of reflection padding in every convolutional layer, instead of a single large reflection padding before the first convolution layer
  2. Ditching the Tanh output. The generated images are the raw outputs of the last convolutional layer. While the Tanh model produces visually pleasing results, it fails to transfer the vibrant and loud colors of the style image (i.e. generated images are usually darker). This, however, makes for a good retro style effect.
  3. Use of Instance Normalization instead of Batch Normalization after convolutional and deconvolutional layers, as discussed in the paper Instance Normalization: The Missing Ingredient for Fast Stylization by Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky.
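
To make modifications 1 and 3 concrete, here is a minimal sketch of a single convolutional block in PyTorch: reflection padding, convolution, Instance Normalization, ReLU, and no Tanh at the output. This is an illustration only; the exact layer definitions live in transformer.py and may differ in detail.

  import torch.nn as nn

  class ConvBlock(nn.Module):
      # Reflection-padded convolution followed by Instance Normalization,
      # as described in modifications 1 and 3 above (sketch only).
      def __init__(self, in_channels, out_channels, kernel_size, stride=1):
          super().__init__()
          self.pad = nn.ReflectionPad2d(kernel_size // 2)   # pad every conv layer
          self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride)
          self.norm = nn.InstanceNorm2d(out_channels, affine=True)
          self.relu = nn.ReLU()

      def forward(self, x):
          return self.relu(self.norm(self.conv(self.pad(x))))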

The original Caffe pretrained weights of VGG16 were used for this implementation, instead of the pretrained VGG16 in PyTorch's model zoo.
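
Loading those weights into torchvision's VGG16 definition looks roughly like the following sketch (the actual loading code is in vgg.py; vgg16-00b39a1b.pth is Johnson et al.'s converted Caffe weight file):

  import torch
  from torchvision import models

  # Sketch: load the Caffe-converted VGG16 weights into torchvision's VGG16.
  # strict=False is used because only the convolutional features are needed.
  vgg16 = models.vgg16(pretrained=False)
  vgg16.load_state_dict(torch.load("vgg16-00b39a1b.pth"), strict=False)
  features = vgg16.features.eval()  # feature extractor for the perceptual losses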

Image Stylization

It took about 1.5 seconds for a GTX 1060 to stylize University of the Philippines Diliman - Oblation (1400×936) by LeAnne Jazul/Rappler. From top to right: Udnie Style, Mosaic Style, Tokyo Ghoul Style, Original Picture, Udnie Style with Original Color Preservation

Video Stylization

It took 6 minutes and 43 seconds to stylize a 2:11, 24 fps, 1280x720 video on a GTX 1080 Ti.

More videos in this YouTube playlist. Unfortunately, YouTube's compression isn't friendly to style transfer videos, possibly because each frame is shaky with respect to its adjacent frames, hence the obvious loss in video quality. Raw and lossless output videos can be downloaded from my Dropbox folder or Google Drive folder.

Webcam Demo

webcam.py can output 1280x720 videos at a rate of at least 4-5 frames per second on a GTX 1060.

Requirements

Most of the code here assumes access to a CUDA-capable GPU, at least a GTX 1050 Ti or a GTX 1060.

Data Files

Dependencies

Usage

All arguments, parameters, and options are hardcoded inside these 5 Python files. Before using the code, please arrange your files and folders as defined below.

Training Style Transformation Network

train.py: trains the transformation network that learns the style of the style image. Each model in the transforms folder was trained for roughly 23 minutes, with a single pass (1 epoch) over 40,000 training images and a batch size of 4, on a GTX 1080 Ti. A sketch of the loss computation is given after the options list below.

python train.py

Options

  • TRAIN_IMAGE_SIZE: sets the dimension (height and width) of training images. More GPU memory is needed to train with larger images. Default is 256px.
  • DATASET_PATH: folder containing the MS-COCO train2014 images. Default is "dataset"
  • NUM_EPOCHS: number of training epochs. Default is 1 with 40,000 training images
  • STYLE_IMAGE_PATH: path of the style image
  • BATCH_SIZE: training batch size. Default is 4
  • CONTENT_WEIGHT: Multiplier weight of the loss between content representations and the generated image. Default is 8
  • STYLE_WEIGHT: Multiplier weight of the loss between style representations and the generated image. Default is 50
  • ADAM_LR: learning rate of the adam optimizer. Default is 0.001
  • SAVE_MODEL_PATH: path of pretrained-model weights and transformation network checkpoint files. Default is "models/"
  • SAVE_IMAGE_PATH: save path of sample transformed training images. Default is "images/out/"
  • SAVE_MODEL_EVERY: frequency of saving the checkpoint and sample transformed images. 1 iteration is defined as 1 batch pass. Default is 500; with a batch size of 4, that is every 2,000 images
  • SEED: random seed to keep training variation as small as possible
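
For orientation, here is a sketch of the loss computation at the heart of train.py. The content loss compares relu2_2 features of the generated and content images; the style loss compares Gram matrices of VGG feature maps. The exact set of style layers and the Gram normalization shown here are illustrative assumptions; the defaults match CONTENT_WEIGHT and STYLE_WEIGHT above.

  import torch
  import torch.nn.functional as F

  def gram(x):
      # Gram matrix of a (batch, channels, height, width) feature map
      b, c, h, w = x.shape
      f = x.view(b, c, h * w)
      return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

  def perceptual_loss(gen_feats, content_feats, style_grams,
                      content_weight=8, style_weight=50):
      # Content loss on relu2_2; style loss on Gram matrices of tapped layers
      content_loss = content_weight * F.mse_loss(
          gen_feats["relu2_2"], content_feats["relu2_2"])
      style_loss = style_weight * sum(
          F.mse_loss(gram(gen_feats[layer]), style_grams[layer])
          for layer in style_grams)
      return content_loss + style_loss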

transformer.py: contains the architecture definition of the transformation network. It includes 2 models, TransformerNetwork() and TransformerNetworkTanh(). TransformerNetwork doesn't have an extra output layer, while TransformerNetworkTanh, as the name implies, has a Tanh output layer with a default output multiplier of 150. TransformerNetwork faithfully copies the style and colorization of the style image, while the Tanh model produces images with darker colors, which gives a retro style effect. A sketch of the Tanh output head is given after the options below.

Options

  • norm: sets the normalization layer to either Instance Normalization "instance" or Batch Normalization "batch". Default is "instance"
  • tanh_multiplier: output multiplier of the Tanh model. The bigger the number, the brighter the image. Default is 150
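
The effect of the Tanh head can be seen in isolation with the short sketch below. With the default multiplier of 150, outputs lie in (-150, 150) rather than the full 0-255 image range, which is a plausible (though here assumed) reason for the darker images mentioned above.

  import torch

  # Sketch of the Tanh output head: tanh squashes the raw convolutional
  # output to (-1, 1) and tanh_multiplier rescales it.
  tanh_multiplier = 150
  raw = torch.randn(1, 3, 256, 256)           # stand-in for the last conv output
  styled = tanh_multiplier * torch.tanh(raw)  # what TransformerNetworkTanh emits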

experimental.py: contains the model definitions of the experimental transformer network architectures. These experimental transformer networks largely borrow ideas from the papers Aggregated Residual Transformations for Deep Neural Networks (more commonly known as ResNeXt) and Densely Connected Convolutional Networks (more commonly known as DenseNet). These experimental networks are designed to be lightweight, with the goal of minimizing the compute and memory needed for better real-time performance.

See table below for the comparison of different transformer networks.

See transforms folder for some pretrained weights. For more pretrained weights, see my Gdrive or Dropbox.

Stylizing Images

stylize.py: Loads a pre-trained transformer network weight and applies style (1) to a content image or (2) to the images inside a folder. A minimal usage sketch is given after the options below.

python stylize.py

Options

  • STYLE_TRANSFORM_PATH: path of the pre-trained weights of the transformation network. Sample pre-trained weights are available in the transforms folder, including their implementation parameters.
  • PRESERVE_COLOR: set to True if you want to preserve the original image's color after applying style transfer. Default value is False
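
A minimal, self-contained stylization sketch is shown below. The weight path and the uint8 BGR pre/post-processing are assumptions for illustration; the repository's actual pre- and post-processing live in utils.py and stylize.py.

  import cv2
  import torch
  from transformer import TransformerNetwork

  device = "cuda" if torch.cuda.is_available() else "cpu"
  net = TransformerNetwork().to(device)
  net.load_state_dict(torch.load("transforms/mosaic.pth", map_location=device))  # hypothetical path
  net.eval()

  img = cv2.imread("content.jpg")  # BGR uint8, shape (H, W, 3)
  x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0).float().to(device)
  with torch.no_grad():
      y = net(x).squeeze(0).clamp(0, 255)
  cv2.imwrite("styled.jpg", y.permute(1, 2, 0).cpu().numpy().astype("uint8"))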

Stylizing Videos

video.py: Extracts all frames of a video, applies fast style transfer to each frame, and combines the styled frames into an output video. The output video doesn't retain the original audio. Optionally, you may use FFmpeg to merge the output video with the original video's audio; a sketch is given after the options below.

python video.py

Options

  • VIDEO_NAME: path of the original video
  • FRAME_SAVE_PATH: parent folder of the save path of the extracted original video frames. Default is "frames/"
  • FRAME_CONTENT_FOLDER: folder of the save path of the extracted original video frames. Default is "content_folder/"
  • FRAME_BASE_FILE_NAME: base file name of the extracted original video frames. Default is "frame"
  • FRAME_BASE_FILE_TYPE: file type of the extracted original video frames. Default is ".jpg"
  • STYLE_FRAME_SAVE_PATH: path of the styled frames. Default is "style_frames/"
  • STYLE_VIDEO_NAME: name (or save path) of the output styled video. Default is "helloworld.mp4"
  • STYLE_PATH: pretrained weight of the style of the transformation network to use for video style transfer. Default is "transforms/aggressive.pth"
  • BATCH_SIZE: batch size for stylizing the extracted original video frames. A 1080 Ti (11 GB) can handle a batch size of 20 for 720p videos, and 80 for 480p videos. Default is 1
  • USE_FFMPEG (Optional): set to True if you want to use FFmpeg to extract the original video's audio and encode the styled video with the original audio.
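
If you would rather merge the audio manually, something along the lines of the sketch below works; the file names are illustrative (the styled video defaults to helloworld.mp4, as noted above; the original file name is hypothetical).

  import subprocess

  # Sketch: copy the styled video stream and take the audio from the original.
  subprocess.run([
      "ffmpeg",
      "-i", "helloworld.mp4",   # styled output video (no audio)
      "-i", "original.mp4",     # original video with audio (hypothetical name)
      "-c", "copy",             # no re-encoding
      "-map", "0:v:0",          # video from the first input
      "-map", "1:a:0",          # audio from the second input
      "styled_with_audio.mp4",
  ], check=True)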

Stylizing Webcam

webcam.py: Captures and saves the webcam output image, performs style transfer, and saves the styled image again; the styled image is then read back and shown in a window. A sketch of the capture loop is given after the options below.

python webcam.py

Options

  • STYLE_TRANSFORM_PATH: pretrained weights of the transformation network to use for webcam style transfer. Default is "transforms/aggressive.pth"
  • WIDTH: width of the webcam output window. Default is 1280
  • HEIGHT: height of the webcam output window. Default is 720
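
The capture loop amounts to roughly the following sketch. Note that OpenCV treats the requested WIDTH/HEIGHT as hints; cameras that don't support the requested resolution fall back to one they do (e.g. 640x480).

  import cv2

  WIDTH, HEIGHT = 1280, 720
  cap = cv2.VideoCapture(0)                   # default webcam
  cap.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)    # a request, not a guarantee
  cap.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)

  while True:
      ok, frame = cap.read()
      if not ok:
          break
      # ... run the transformer network on `frame` here (sketch) ...
      cv2.imshow("fast-neural-style", frame)
      if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
          break

  cap.release()
  cv2.destroyAllWindows()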

Files and Folder Structure

master_folder
 ~ dataset 
    ~ train2014
        coco*.jpg
        ...
 ~ frames
    ~ content_folder
        frame*.jpg
        ...
 ~ images
    ~ out
        *.jpg
      *.jpg
 ~ models
    *.pth
 ~ style_frames
    frames*.jpg
 ~ transforms
    *.pth
 *.py

Comparison of Different Transformer Networks

Network | size (KB) | no. of parameters | final loss (millions)
transformer/TransformerNetwork | 6,573 | 1,679,235 | 9.88
experimental/TransformerNetworkDenseNet | 1,064 | 269,731 | 11.37
experimental/TransformerNetworkUNetDenseNetResNet | 1,062 | 269,536 | 12.32
experimental/TransformerNetworkV2 | 6,573 | 1,679,235 | 10.05
experimental/TransformerResNextNetwork | 1,857 | 470,915 | 10.31
experimental/TransformerResNextNetwork_Pruned(0.3) | 44 | 8,229 | 19.29
experimental/TransformerResNextNetwork_Pruned(1.0) | 260 | 63,459 | 12.72

TransformerResNextNetwork and TransformerResNextNetwork_Pruned(1.0) provide the best tradeoff between compute, memory size, and performance.

Todo!

  • FFmpeg support for encoding videos with video style transfer
  • Color-preserving Real-time Style Transfer
  • Webcam demo of fast-neural-style
  • Web-app deployment of fast-neural-style (ONNX)

Citation

  @misc{rusty2018faststyletransfer,
    author = {Rusty Mina},
    title = {fast-neural-style: Fast Style Transfer in Pytorch!},
    year = {2018},
    howpublished = {\url{https://github.com/iamRusty/fast-neural-style-pytorch}},
    note = {commit xxxxxxx}
  }

Attribution

This implementation borrowed some implementation details from:

Related Work

License

Copyright (c) 2018 Rusty Mina. Free for academic or research use, as long as proper attribution is given and this copyright notice is retained.


fast-neural-style-pytorch's Issues

new style image

Hello
Thank you for sharing your source code.
I want to use my own style image when training the network.
Could you tell me how I can do this?

IndentationError

Forgive me because I'm a noob at Python, but when I try to run the video.py script I get IndentationError: unexpected indent:

File "/home/harish/fast-neural-style-pytorch/stylize.py", line 117
torch.cuda.empty_cache()
^
IndentationError: unexpected indent.

When I tried removing the line, I get the same error at the next line:

File "/content/stylize.py", line 119
generated_tensor = net(content_batch.to(device)).detach()
^
IndentationError: unexpected indent

Can you help me with this, please?

VGG weight file not available on AWS

Got a FORBIDDEN response for the AWS file. Switched to the weights file which is now included with torchvision:

# !wget https://s3-us-west-2.amazonaws.com/jcjohns-models/vgg16-00b39a1b.pth
!wget https://download.pytorch.org/models/vgg16-397923af.pth
!cp vgg16-397923af.pth vgg16-00b39a1b.pth

"AttributeError: can't set attribute"

This error (in Colaboratory) is fixed by upgrading the Pillow package; skip Pillow==4.1.1:

/usr/local/lib/python3.6/dist-packages/PIL/JpegImagePlugin.py in SOF(self, marker)
    144     n = i16(self.fp.read(2))-2
    145     s = ImageFile._safe_read(self.fp, n)
--> 146     self.size = i16(s[3:]), i16(s[1:])
    147 
    148     self.bits = i8(s[0])

Share style images?

Can you share the original style images that were used to generate each of the pretrained weights in the /transforms folder? Currently the repo only has the original images for a few of the styles. (Not super urgent, but it would be helpful to see where the styles come from and to have the source images in case we want to train our own networks.)

Custom model

Hello, very good work. I was able to make it work without problems.
I want to try creating a model using a different style, some other painting or artist.
How can I create my own model / vgg16-00b39a1b.pth?

If you have any guide to experimenting with creating models it would be of great help. I will continue investigating, thanks.

webcam.py: can't adjust resolution?

I'm on ubuntu 18.04 and using "droidcam" so I can use my phone as a webcam.

running "python webcam.py" Produces this error:
Traceback (most recent call last):
File "webcam.py", line 4, in
import utils
File "/home/al/fast-neural-style-pytorch/utils.py", line 121
tuple_with_path = (*original_tuple, path)
^
SyntaxError: invalid syntax

However, I'm able to work around this by using "python3". What really has me confused is that I seem unable to adjust the window size/resolution. I've tried editing the two lines in webcam.py, and I've tried adding the command line options, but I always seem to wind up at 640x480.
Furthermore, I can change the style_transform_path in webcam.py, which only makes this weirder.

Any guidance would be greatly appreciated.

About content loss

Hi,

Thanks for this project.

In your code, the content loss is computed as:
content_loss = CONTENT_WEIGHT * MSELoss(content_features['relu2_2'], generated_features['relu2_2'])

I think this might be wrong, it should be:
content_loss = CONTENT_WEIGHT * MSELoss(generated_features['relu2_2'], content_features['relu2_2'])

Please correct me if I am wrong.

vgg16_features.load_state_dict(torch.load(vgg_path), strict=False)

I have no issues with stylize.py (with pretrained models). My problem is with train.py. I think I followed the instructions: downloaded vgg16-00b39a1b.pth, downloaded train2014, installed PyTorch, OpenCV, etc. (installed with conda).
I have Debian 11 stable, Python 3.7.4, PyTorch 1.4.0, torchvision 0.5.0, cudatoolkit 10.1.243.

$ python train.py

Traceback (most recent call last):
  File "train.py", line 168, in <module>
    train()
  File "train.py", line 50, in train
    VGG = vgg.VGG16().to(device)
  File "/home/mrfabiolo/style_transfer/fast-neural-style-pytorch/vgg.py", line 33, in __init__
    vgg16_features.load_state_dict(torch.load(vgg_path), strict=False)
  File "/home/mrfabiolo/miniconda3/lib/python3.7/site-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/mrfabiolo/miniconda3/lib/python3.7/site-packages/torch/serialization.py", line 692, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.

Same problem with device = "cpu" or device = "cuda". I have a GeForce GTX 750 card.

Value error while trying to run video.py

While trying to run video.py, I get the following error. What am I doing wrong?

Traceback (most recent call last):
  File "video.py", line 84, in <module>
    video_transfer(VIDEO_NAME, STYLE_PATH)
  File "video.py", line 34, in video_transfer
    stylize_folder(style_path, FRAME_SAVE_PATH, STYLE_FRAME_SAVE_PATH, batch_size=BATCH_SIZE)
  File "/home/user/fast-neural-style-pytorch/stylize.py", line 115, in stylize_folder
    for content_batch, _, path in image_loader:
ValueError: need more than 2 values to unpack

What is the license schema?

Hello :)

I was wondering if I could use this for commercial use? So far I have been using it for demo purposes and would like to work on it more, but wanted to check with you first. What is the license schema for this project, before committing more time into it?

train my own style image

Can you tell me how to train my style image with the COCO dataset, and how to adjust the params? Thank you.

Fixing the save/load generated images

Hi,

Thanks for this project.

I don't completely understand why the whitening is happening on the generated image, but I don't think it has anything to do with the VideoCapture.

I don't have a real fix, but I think this is slightly better than writing and reading the file:

# old code
#utils.saveimg(generated_image, str(count+1) + ".png")
#img2 = cv2.imread(str(count+1) + ".png", cv2.IMREAD_UNCHANGED)

# don't really understand why this is required
img2 = cv2.imdecode(cv2.imencode(".png", generated_image)[1], cv2.IMREAD_UNCHANGED)

Train Notebook: Request for Improvements

This still needs a lot of changes to make it a one-click-run-all interface.
Some of the changes requested are:

  • First cell should contain all of the wheel installations
  • First cell should also download all of the required local files (e.g. image file)
  • There should be separate cells for hyperparameter (global settings) and imports
  • Images should be displayed every show_iter
  • A link directing to Google Colab is also welcome. It is most likely placed on top (1st markdown cell) of the notebook.

This notebook can be used as a reference for the first 4 bullets. Can you do it @p-rit?

Trying TransformerNetworkTanh and error when running train.py

Getting the following error when trying it. Hope you can help resolve this?

Device: cuda
Traceback (most recent call last):
  File "C:\Temp\fast-neural-style-pytorch-master\train.py", line 172, in <module>
    train()
  File "C:\Temp\fast-neural-style-pytorch-master\train.py", line 52, in train
    TransformerNetwork = transformer.TransformerNetworkTanh.to(device)
  File "C:\Users\eform\anaconda3\lib\site-packages\torch\nn\modules\module.py", line 673, in to
    return self._apply(convert)
AttributeError: 'str' object has no attribute '_apply'
