Semantic Recognized Real-time Camera Style Transfer

Introduction

This repository extends my final project for an Image Recognition course. It aims to build an application that performs semantically recognized, real-time, arbitrary multi-style transfer on camera input. Specifically, I use human segmentation to dynamically apply different styles to people and to the background.
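One plausible way to apply two styles with a segmentation mask is per-pixel alpha compositing. The frame is stylized once with the human style and once with the background style, and the two results are blended with the human mask. The NumPy sketch below assumes both stylized frames and the mask share the same resolution. `composite_styles` is a hypothetical helper, not a function from this repository, and the repository may combine the styles differently (for example, in feature space).

```python
import numpy as np


def composite_styles(human_stylized: np.ndarray,
                     background_stylized: np.ndarray,
                     human_mask: np.ndarray) -> np.ndarray:
    """Blend two stylized frames with a per-pixel human mask (hypothetical helper).

    human_stylized, background_stylized: float arrays of shape (H, W, 3).
    human_mask: float array of shape (H, W) in [0, 1]; 1 = human, 0 = background.
    """
    if human_stylized.shape != background_stylized.shape:
        raise ValueError("stylized frames must have the same shape")
    # Add a trailing channel axis so the (H, W) mask broadcasts over RGB.
    m = np.clip(human_mask, 0.0, 1.0)[..., None]
    return m * human_stylized + (1.0 - m) * background_stylized


# Toy 2x2 frame: left column is "human", right column is "background".
human = np.full((2, 2, 3), 1.0)
background = np.zeros((2, 2, 3))
mask = np.array([[1.0, 0.0], [1.0, 0.0]])
out = composite_styles(human, background, mask)
print(out[..., 0])
```

A soft mask (values strictly between 0 and 1 near the silhouette) gives a smoother transition between the two styles than a hard 0/1 mask.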

Reference Repositories

Human Segmentation

The human segmentation implementation used in this application is borrowed from the repository thuyngch/Human-Segmentation-PyTorch. Specifically, I use a UNet with a ResNet18 backbone for segmentation.
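A segmentation network of this kind usually outputs per-class scores rather than a ready-to-use mask. The sketch below shows one common post-processing step: a softmax over the class axis, followed by an optional threshold. It assumes a two-class output (background, human) with the human class in channel 1. Both that channel layout and `logits_to_human_mask` are hypothetical and are not taken from thuyngch/Human-Segmentation-PyTorch.

```python
from typing import Optional

import numpy as np


def logits_to_human_mask(logits: np.ndarray,
                         human_channel: int = 1,
                         threshold: Optional[float] = None) -> np.ndarray:
    """Turn raw logits of shape (num_classes, H, W) into an (H, W) human mask.

    ASSUMPTION: one logit per class, and `human_channel` indexes the "human"
    class; check the borrowed model for its real output layout.
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=0, keepdims=True)
    human_prob = probs[human_channel]
    if threshold is None:
        return human_prob  # soft mask with values in [0, 1]
    return (human_prob >= threshold).astype(np.float32)  # hard 0/1 mask


# Two classes (background, human) on a 1x2 image.
logits = np.array([[[2.0, -1.0]],    # background logits
                   [[-1.0, 3.0]]])   # human logits
print(logits_to_human_mask(logits, threshold=0.5))
```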

Real-time Arbitrary Style Transfer

Three real-time arbitrary style transfer implementations are borrowed from:

Application Architecture

Usage

Download network weights and install timm for segmentation

Pre-trained weights for all style transfer networks are already included in the repository. The segmentation network weights are too large to include, so download them from here and place them in the model_checkpoints directory of each of AdaIN_DynamicMask, AvatarNet_DynamicMask and SANet_DynamicMask.

Then install timm, which the segmentation network requires, from any one of the three subdirectories, e.g.

cd AdaIN_DynamicMask
pip install -e models/pytorch-image-models

Start web camera application

For AdaIN_DynamicMask and AvatarNet_DynamicMask, use

python webcam.py --human_style "path to style image for human" --background_style "path to style image for background" [--ratio "number between 0 and 1"]

The optional ratio argument adjusts the style strength: 0 means the output will be the same as the original image, and 1 gives the strongest style effect.
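The command-line interface documented above can be mirrored with Python's standard argparse module. This is a hedged sketch, not the actual parser in webcam.py: only the three flag names come from this README. The default ratio of 1.0, the range check, and the example style paths are assumptions made for illustration.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the documented webcam.py flags."""
    parser = argparse.ArgumentParser(description="Semantic-aware webcam style transfer")
    parser.add_argument("--human_style", required=True,
                        help="path to the style image applied to people")
    parser.add_argument("--background_style", required=True,
                        help="path to the style image applied to the background")
    # ASSUMPTION: default of 1.0 (full style strength); the real default may differ.
    parser.add_argument("--ratio", type=float, default=1.0,
                        help="style strength between 0 and 1")
    args = parser.parse_args(argv)
    if not 0.0 <= args.ratio <= 1.0:
        parser.error("--ratio must be between 0 and 1")  # exits with status 2
    return args


# Hypothetical style image paths, used only for this example.
args = parse_args(["--human_style", "styles/human.jpg",
                   "--background_style", "styles/bg.jpg", "--ratio", "0.6"])
print(args.human_style, args.background_style, args.ratio)
```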

For SANet_DynamicMask, the borrowed implementation currently doesn't support style strength adjustment, so the ratio argument has no effect. I plan to add this feature in the future.
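In the AdaIN paper, style strength is controlled by interpolating in feature space between the content features and the AdaIN output before decoding. At ratio 0 the decoder receives the unmodified content features, so the result is a reconstruction of the original frame. The NumPy sketch below illustrates that interpolation on toy feature maps. It assumes the ratio flag plays the role of the paper's alpha, omits the encoder and decoder, and does not confirm that AdaIN_DynamicMask or AvatarNet_DynamicMask blend in exactly this way. `adain` and `blend_strength` are illustrative names.

```python
import numpy as np


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive instance normalization on feature maps of shape (C, H, W).

    Each content channel is normalized over its spatial positions, then
    rescaled to the corresponding style channel's mean and standard deviation.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = np.sqrt(style.var(axis=(1, 2), keepdims=True) + eps)
    return s_std * (content - c_mean) / c_std + s_mean


def blend_strength(content: np.ndarray, style: np.ndarray, ratio: float) -> np.ndarray:
    """Interpolate between content features (ratio=0) and the AdaIN output (ratio=1)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return ratio * adain(content, style) + (1.0 - ratio) * content


rng = np.random.default_rng(0)
content = rng.normal(size=(4, 8, 8))
style = rng.normal(loc=2.0, scale=3.0, size=(4, 8, 8))
t = blend_strength(content, style, ratio=1.0)
print(t.mean(axis=(1, 2)))  # matches the per-channel means of `style`
```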

Results and evaluation

AdaIN

AvatarNet

SANet

Runtime Profile

The runtime profile was measured on a server with a GTX 1080 Ti card by repeatedly reading a single content image (I can't use a camera on the server :( , and my laptop doesn't have a powerful GPU). The content image is 1280×720 and the style images are 400×400.
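The measurement method, which reuses one content image instead of live camera frames, can be sketched as a simple timing loop. This is a hedged sketch rather than the benchmark script behind the numbers here. `measure_fps` and the stand-in workload are hypothetical. For a PyTorch CUDA pipeline, the processing function should call `torch.cuda.synchronize()` before returning, because GPU kernels launch asynchronously.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def measure_fps(process: Callable[[T], object], frame: T,
                warmup: int = 5, iters: int = 50) -> float:
    """Feed the same frame repeatedly and return the average frames per second.

    For GPU pipelines, `process` should synchronize the device before
    returning; otherwise asynchronous kernel launches inflate the result.
    """
    # Warm-up runs exclude one-off costs such as CUDA init or cuDNN autotuning.
    for _ in range(warmup):
        process(frame)
    start = time.perf_counter()
    for _ in range(iters):
        process(frame)
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")


# Stand-in workload; replace with the real segmentation + stylization step.
fps = measure_fps(lambda f: sum(x * x for x in f), list(range(10_000)))
print(f"{fps:.1f} FPS")
```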

"Gatys et al." refers to the original, optimization-based neural style transfer approach proposed by Gatys et al.

The AvatarNet implementation appears to have an efficiency problem, as indicated by its low GPU utilization.

albertpi-git / semantic-recognized-realtime-camera-style-transfer