Attention Refocusing

[Website][Demo(Coming soon)]

This is the official implementation of the paper "Grounded Text-to-Image Synthesis with Attention Refocusing"

Setup

conda create --name ldm_layout python==3.8.0
conda activate ldm_layout
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
pip install git+https://github.com/CompVis/taming-transformers.git
pip install git+https://github.com/openai/CLIP.git
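
After installation, a quick import check can catch a broken environment before a long run. The sketch below is not part of the repository. It assumes the CLIP and taming-transformers packages install under the module names `clip` and `taming`; adjust the names if your installation differs.

```python
import importlib.util


def check_environment(packages=("torch", "torchvision", "clip", "taming")):
    """Return a dict mapping each top-level module name to whether it can be found.

    The default module names are assumptions about how the setup steps
    above install their packages, not something this README states.
    """
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If `torch` is found, `torch.cuda.is_available()` additionally reports whether the CUDA build can see a GPU.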

Inference

Download the GLIGEN model checkpoint and put it in gligen_checkpoints.

Run with the HRS/DrawBench prompts:

python guide_gligen.py --ckpt [model_checkpoint]  --file_save [save_path] \
                       --type [category] --box_pickle [saved_boxes] --use_gpt4

Where

  • --ckpt: Path to the GLIGEN checkpoint
  • --file_save: Path to save the generated images
  • --type: The category to test (options include counting, spatial, color, size)
  • --box_pickle: Path to save the generated layout from GPT-4
  • --use_gpt4: Whether to use GPT-4 to generate the layout. If you use GPT-4, set your OpenAI API key first:
export OPENAI_API_KEY='your-api-key'
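
The options above can be assembled and checked programmatically before launching a run. This is a minimal sketch, not code from the repository: `build_command` and `KNOWN_CATEGORIES` are hypothetical names, and the category list only covers the four options named in this README (the script may accept others).

```python
import os
import sys

# Categories named in this README; the script itself may accept more.
KNOWN_CATEGORIES = ("counting", "spatial", "color", "size")


def build_command(ckpt, file_save, category, box_pickle, use_gpt4=False, env=None):
    """Build the argument list for guide_gligen.py, failing early on obvious mistakes."""
    env = os.environ if env is None else env
    if category not in KNOWN_CATEGORIES:
        raise ValueError(
            f"unknown category {category!r}; expected one of {KNOWN_CATEGORIES}"
        )
    # GPT-4 layout generation needs the API key exported beforehand.
    if use_gpt4 and not env.get("OPENAI_API_KEY"):
        raise RuntimeError("--use_gpt4 requires OPENAI_API_KEY to be set")
    cmd = [
        sys.executable, "guide_gligen.py",
        "--ckpt", ckpt,
        "--file_save", file_save,
        "--type", category,
        "--box_pickle", box_pickle,
    ]
    if use_gpt4:
        cmd.append("--use_gpt4")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `guide_gligen.py`.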

For instance, to generate images according to the layouts and prompts of the counting category:

python guide_gligen.py --ckpt gligen_checkpoints/diffusion_pytorch_model.bin --file_save counting_500 \
                       --type counting --box_pickle ../data_evaluate_LLM/gpt_generated_box/counting.p
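
Before running on a saved layout file, it can help to look at what the pickle contains. This README does not document the layout format, so the sketch below makes no assumption about it and only reports the top-level type, length, and a few sample entries. `summarize_boxes` is a hypothetical helper, not part of the repository. Unpickling can execute arbitrary code, so only load files you trust.

```python
import pickle


def summarize_boxes(path, max_items=3):
    """Load a pickle file and describe its top-level structure.

    No layout schema is assumed; only generic container
    information is reported.
    """
    with open(path, "rb") as f:
        data = pickle.load(f)
    summary = {"type": type(data).__name__}
    if hasattr(data, "__len__"):
        summary["length"] = len(data)
    if isinstance(data, dict):
        summary["sample"] = list(data.items())[:max_items]
    elif isinstance(data, (list, tuple)):
        summary["sample"] = list(data[:max_items])
    return summary
```

For example, `summarize_boxes("../data_evaluate_LLM/gpt_generated_box/counting.p")` shows whether the file is a dict or a list and how many entries it holds.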

To run with user input text prompts:

export OPENAI_API_KEY='your-api-key'
python inference.py

We provide the GPT-4-generated layouts for the HRS and DrawBench benchmarks in the HRS boxes and DrawBench boxes.
We also provide generated images from the baselines (Stable Diffusion, Attend-and-Excite, MultiDiffusion, Layout-guidance, and GLIGEN) and from our method here.

Evaluation

To set up the environment, download the detector models, and run the evaluation for each category, see the evaluation instructions.

Acknowledgments

This project is built on the following resources:

  • GLIGEN: Our code is built upon the foundational work provided by GLIGEN.

  • HRS: The evaluation component of our project has been adopted from HRS.
