SRNet

Update (15th January 2022): Paths to download data files have been updated.

Update (27th August 2020) :

A bug related to variable image sizes has been fixed. You can now train with variable image sizes, which improves generations significantly.

Training is now significantly faster. Pull all changes and train as usual.

Update (26th July 2020) :

  • Pre-trained weights have been uploaded. Please refer to the Pre-trained weights section for usage.

  • The latest commit makes a few modifications to the model. Pull all changes before using the pre-trained weights.


This repository presents SRNet (Liang Wu et al.), a neural network that tackles the problem of text editing in images. It marks the inception of an area of research that could automate advanced editing mechanisms in the future.

SRNet is a twin-discriminator generative adversarial network that can edit text in any image while preserving the background context, font style and color. The demo below showcases one such use case: movie poster editing.

L - Source ; R - Modified


Architecture changes

This implementation of SRNet introduces two main changes.

  1. Training: The original SRNet suffers from instability, and the generator loss hides it during training. This imbalance affects skeleton (t_sk) generation the most. The effect shows up when the generator produces a run of bad t_sk generations: instead of recovering, it grows worse and finally ends in mode collapse. The culprit is the min-max loss. A textbook remedy is to keep the discriminator ahead of the generator at all times, and this implementation does the same.

  2. Generator: To accommodate a design constraint in the original network, I have added three extra convolution layers to the decoder_net.

Incorporating these changes improved t_sk generations dramatically and increased stability. However, this also increased training time by ~15%.
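
One common way to keep a discriminator ahead of its generator is to give it several updates for every generator update. The sketch below only illustrates that scheduling idea in plain Python. The function name update_schedule and the parameter d_steps_per_g are hypothetical, and this README does not say whether train.py uses this exact mechanism.

```python
from typing import Iterator, Tuple


def update_schedule(total_steps: int, d_steps_per_g: int = 2) -> Iterator[Tuple[int, str]]:
    """Yield (step, network) pairs for an alternating GAN training loop.

    Hypothetical helper, not part of this repository: the discriminator ("D")
    is updated d_steps_per_g times before each generator ("G") update.
    """
    if d_steps_per_g < 1:
        raise ValueError("d_steps_per_g must be at least 1")
    cycle = d_steps_per_g + 1  # d_steps_per_g discriminator steps + 1 generator step
    for step in range(total_steps):
        # The last slot of every cycle belongs to the generator.
        network = "G" if step % cycle == d_steps_per_g else "D"
        yield step, network


if __name__ == "__main__":
    for step, network in update_schedule(6, d_steps_per_g=2):
        print(step, network)
```

In a real training loop, each "D" step would run a discriminator optimizer update and each "G" step a generator update. A larger d_steps_per_g gives the discriminator a bigger head start at the cost of more compute per generator step.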


Usage

A virtual environment is the most convenient way to set up the model for training or inference. You can use virtualenv for this. The rest of this guide assumes that you are in one.

  • Clone this repository:
    $ git clone https://github.com/Niwhskal/SRNet.git
    
    $ cd SRNet
  • Install requirements (Make sure you are on python3.xx):
    $ pip3 install -r requirements.txt

Data setup

This repository provides a bash script that spares you from synthesizing the data manually, as the original implementation requires. The default configuration parameters set up a dataset that is sufficient to train a robust model.

  • Grant execute permission to the bash script:
    $ chmod +x data_script.sh
  • Set up training data by executing:
    $ ./data_script.sh

The bash script downloads background data and a word list, then runs a datagenerator script that synthesizes training data. Finally, it modifies paths to enable straightforward training. A detailed description of data synthesis is provided by youdao-ai in the original datagenerator repository.

If you wish to synthesize data with different fonts, add custom .ttf files to the fonts directory before running datagen.py. Examine the flow of data_script.sh and change it accordingly.
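
As a minimal sketch of the font step, the helper below copies every .ttf file from a folder of your own into a fonts directory. The function name add_custom_fonts is hypothetical, and the location of the fonts directory is not spelled out here, so it is passed in as a parameter. Check data_script.sh for the real path.

```python
import shutil
from pathlib import Path


def add_custom_fonts(source_dir, fonts_dir):
    """Copy every .ttf file from source_dir into fonts_dir.

    Hypothetical helper: fonts_dir must point at whatever fonts directory
    the data generator reads from (an assumption, see data_script.sh).
    Returns the names of the copied files in sorted order.
    """
    source, target = Path(source_dir), Path(fonts_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for font in sorted(source.glob("*.ttf")):
        shutil.copy2(font, target / font.name)  # keeps file metadata
        copied.append(font.name)
    return copied
```

Run it before datagen.py so that the new fonts are present when synthesis starts.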

Training

  • Once the data is set up, you can immediately begin training:
    $ python3 train.py

If you wish to resume training or use a checkpoint, update its path and run train.py.

If you are interested in experimenting, modify the hyperparameters in cfg.py.

Prediction

To predict, you need to provide a pair of inputs: the source image (i_s) and the custom text rendered in grayscale on a plain background (i_t). Examples can be found in SRNet/custom_feed/labels. Place all such pairs in a folder.

  • Inference can be carried out by running:
    $ python3 predict.py --input_dir *data_dir* --save_dir *destination_dir* --checkpoint *path_to_ckpt*
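
Before running inference, it can help to confirm that every source image has its matching text image. The sketch below assumes a naming scheme in which a pair shares a prefix and ends in _i_s.png and _i_t.png (for example 001_i_s.png and 001_i_t.png). That scheme and the helper name find_input_pairs are assumptions, so compare them against the files in SRNet/custom_feed/labels before relying on this.

```python
from pathlib import Path


def find_input_pairs(input_dir, source_suffix="_i_s", text_suffix="_i_t"):
    """Return (pairs, missing) for the images in input_dir.

    Hypothetical helper with an assumed naming scheme:
    <prefix>_i_s.png is the source and <prefix>_i_t.png is the rendered text.
    pairs   -> list of (source_name, text_name) tuples
    missing -> source file names that have no matching text image
    """
    folder = Path(input_dir)
    pairs, missing = [], []
    for source in sorted(folder.glob(f"*{source_suffix}.png")):
        prefix = source.stem[: -len(source_suffix)]
        text = folder / f"{prefix}{text_suffix}.png"
        if text.exists():
            pairs.append((source.name, text.name))
        else:
            missing.append(source.name)
    return pairs, missing
```

If missing is non-empty, add the absent i_t images before passing the folder to predict.py with --input_dir.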

Pre-trained weights

You can download my pre-trained weights here

Some results from the example directory:

Source Result

Demo

Code for the demo is hastily written and is quite slow. If anyone is interested in trying it out or would like to contribute to it, open an issue, submit a pull request or send me an email at [email protected]. I can host it for you.

References

  • Editing Text in the Wild: An innovative idea of using GANs in an unorthodox manner.

  • Youdao-ai's original repository: The original TensorFlow implementation, which helped me understand the paper from a different perspective. Also, credit to youdao for the data synthesis code. If you are interested in understanding how data is synthesized for training, examine that repository.

  • SynthText project: This work provides the background dataset that is instrumental for data synthesis.

  • Streamlit docs: One of the best libraries to build and publish apps. Severely underrated.
