StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

This repository implements the paper StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis, a pioneering work that explores "stroke tokens", a visual representation for vector graphics that is inherently rich in visual semantics, naturally compatible with LLMs, and highly compressed.

Model Architecture

VQ-Stroke

The VQ-Stroke module comprises two main stages: a "Code to Matrix" stage that transforms SVG code into a matrix format suitable for model input, and a "Matrix to Token" stage that transforms the matrix data into stroke tokens.
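To make the "Code to Matrix" idea concrete, here is a minimal sketch that turns an SVG path string into fixed-width numeric rows. It is an illustration under stated assumptions, not the repository's preprocessing code: the function name `path_to_matrix`, the nine-column row layout, and the restriction to absolute `M`, `L`, and `C` commands with explicit command letters are all hypothetical choices made for this example.

```python
import re

# Assumed command vocabulary for this sketch; the real pipeline may differ.
CMD_INDEX = {"M": 0, "L": 1, "C": 2}
ARG_COUNT = {"M": 2, "L": 2, "C": 6}


def path_to_matrix(d):
    """Convert an absolute-coordinate SVG path string into fixed-width rows.

    Each row is [cmd, x0, y0, cx1, cy1, cx2, cy2, x1, y1], where (x0, y0) is
    the point before the command and (x1, y1) the point after it. Control
    points are zero-filled for M and L (an illustrative layout only).
    """
    tokens = re.findall(r"[MLC]|-?\d*\.?\d+(?:[eE][-+]?\d+)?", d)
    rows = []
    current = (0.0, 0.0)
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd not in CMD_INDEX:
            raise ValueError(f"expected a command letter, got {cmd!r}")
        n = ARG_COUNT[cmd]
        args = [float(t) for t in tokens[i + 1 : i + 1 + n]]
        if len(args) != n:
            raise ValueError(f"command {cmd} needs {n} numbers")
        i += 1 + n
        control = args[:4] if cmd == "C" else [0.0] * 4
        end = (args[-2], args[-1])
        rows.append([CMD_INDEX[cmd], *current, *control, *end])
        current = end
    return rows


matrix = path_to_matrix("M 0 0 L 10 0 C 10 5 5 10 0 10")
for row in matrix:
    print(row)
```

Every command becomes one row of the same width, which is what lets a sequence of drawing commands be stacked into a matrix and passed to the "Matrix to Token" stage.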

Overview of VQ-Stroke.

Overview of Down-Sample Blocks and Up-Sample Blocks.

Automatic Evaluation Results

Setup

We verified reproducibility in the following environment:

  • Python 3.10.13
  • CUDA 11.1

Environment Installation

Prepare your environment with the following commands:

git clone https://github.com/ProjectNUWA/StrokeNUWA.git
cd StrokeNUWA

conda create -n strokenuwa python=3.9
conda activate strokenuwa

# install PyTorch
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# install requirements
pip install -r requirements.txt

Model Preparation

We use Flan-T5 (3B) as our backbone. Download the model into the ./ckpt directory.
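One possible way to fetch the backbone is `snapshot_download` from the `huggingface_hub` package, sketched below. This assumes the 3B checkpoint is the one published on the Hugging Face Hub as `google/flan-t5-xl`. The helper name `download_backbone` and the `./ckpt/flan-t5-xl` subdirectory are illustrative and not taken from this repository, so check the training config for the path it actually expects.

```python
from pathlib import Path


def download_backbone(repo_id="google/flan-t5-xl", local_dir="./ckpt/flan-t5-xl"):
    """Download a model snapshot from the Hugging Face Hub into local_dir.

    Both defaults are assumptions made for this sketch.
    """
    # Imported lazily so this file can be imported without the dependency.
    from huggingface_hub import snapshot_download

    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Returns the local path of the downloaded snapshot.
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_backbone()` once from the repository root should place the weights under `./ckpt`. It requires network access and several gigabytes of free disk space.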

Dataset Preparation

FIGR-8-SVG Dataset

Download the raw FIGR-8 dataset from [Link] and follow the Iconshop preprocessing steps to further process the dataset. (We thank @Ronghuan Wu, the author of Iconshop, for providing the preprocessing scripts.)

Model Training and Inference

Step 1: Training the VQ-Stroke

python scripts/train_vq.py -cn example

VQ-Stroke Inference

python scripts/test_vq.py -cn config_test CKPT_PATH=/path/to/ckpt TEST_DATA_PATH=/path/to/test_data

Step 2: Training the EDM

After training VQ-Stroke, we first create the EDM training data by running inference over the full training set to obtain the "Stroke" tokens, and then use these tokens to further train the Flan-T5 model.
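As a rough illustration of this hand-off, the sketch below renders codebook indices as special-token strings and packs them into pickled text-to-text records. Everything here is an assumption made for the example: the `<stroke_N>` token format, the `source`/`target` field names, the helper names, and the sample prompt are hypothetical, and the actual schema of the repository's pickle file may differ.

```python
import pickle


def indices_to_stroke_text(indices, prefix="stroke"):
    """Render codebook indices as token strings such as <stroke_17>."""
    return " ".join(f"<{prefix}_{i}>" for i in indices)


def build_edm_records(samples):
    """Build text-to-text records from (prompt, codebook_indices) pairs.

    The 'source'/'target' keys are placeholders, not the repository's schema.
    """
    return [
        {"source": prompt, "target": indices_to_stroke_text(indices)}
        for prompt, indices in samples
    ]


# Hypothetical sample: a text prompt and the indices VQ-Stroke produced for it.
records = build_edm_records([("a coffee cup", [17, 4, 4, 230])])

# Round-trip through pickle in memory to show the records serialize cleanly.
restored = pickle.loads(pickle.dumps(records))
print(restored[0]["target"])
```

If the stroke tokens are added to the Flan-T5 tokenizer as new special tokens, the model's embedding matrix has to be resized to match before fine-tuning.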

We provide an example script, example.sh, and sample training data, example_dataset/data_sample_edm.pkl, for reference.

Acknowledgement

We thank the authors of the following open-source projects:

  • Hugging Face
  • LLaMA-X
