🎨 Visual Style Prompting with Swapping Self-Attention

: Training-free text-to-stylized-image generation

ArXiv | 📖 Paper | ✨ Project page

Authors: Jaeseok Jeong1,2*, Junho Kim1*, Yunjey Choi1, Gayoung Lee1, Youngjung Uh2†
1NAVER AI Lab, 2Yonsei University
*Equal contribution, †Corresponding author

🔆 Abstract

In the evolving domain of text-to-image generation, diffusion models have emerged as powerful tools in content creation. Despite their remarkable capability, existing models still face challenges in achieving controlled generation with a consistent style, requiring costly fine-tuning or often inadequately transferring the visual elements due to content leakage. To address these challenges, we propose a novel approach, visual style prompting, to produce a diverse range of images while maintaining specific style elements and nuances. During the denoising process, we keep the query from original features while swapping the key and value with those from reference features in the late self-attention layers. This approach allows for the visual style prompting without any fine-tuning, ensuring that generated images maintain a faithful style. Through extensive evaluation across various styles and text prompts, our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that resulting images match the text prompts most accurately.
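
The key/value swap described in the abstract can be illustrated with a minimal sketch. It is a toy, single-head version in plain Python (no learned projections, no batching, no diffusion model), and every function name in it is hypothetical rather than taken from this repository. It shows only the core idea: the query comes from the image being generated, while the key and value come from the reference image.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of token vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_out = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim_out)]
        )
    return outputs


def swapped_self_attention(q_orig, k_orig, v_orig, k_ref, v_ref, swap):
    """Keep the original query; optionally take key/value from the reference.

    `swap` stands in for "this is one of the late self-attention layers"
    (an assumption of this sketch; layer selection is not modeled here).
    """
    if swap:
        return attention(q_orig, k_ref, v_ref)
    return attention(q_orig, k_orig, v_orig)


# Toy features: 2 tokens for the generated image, 3 tokens for the reference.
q_orig = [[1.0, 0.0], [0.0, 1.0]]
k_orig = [[1.0, 0.0], [0.0, 1.0]]
v_orig = [[1.0, 1.0], [2.0, 2.0]]
k_ref = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
v_ref = [[5.0, 5.0], [5.0, 5.0], [5.0, 5.0]]

stylized = swapped_self_attention(q_orig, k_orig, v_orig, k_ref, v_ref, swap=True)
plain = swapped_self_attention(q_orig, k_orig, v_orig, k_ref, v_ref, swap=False)
```

With `swap=True`, each output row is a weighted average of reference values only, so the output keeps the spatial layout implied by the original query while drawing its content from the reference features. With `swap=False`, the layer behaves as ordinary self-attention.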


🤗 HuggingFace Demo


✨ Requirements

> pytorch 1.13.1
> pip install --upgrade diffusers accelerate transformers einops kornia gradio triton xformers==0.0.16
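
To confirm what is actually installed after running the command above, a small check like the following may help. It is a sketch using only the Python standard library; the package list is copied from the pip command, and `torch` is assumed to be the distribution name under which PyTorch is installed.

```python
from importlib import metadata

# Package names taken from the pip command above; "torch" is the PyPI
# distribution name for PyTorch (an assumption about how it was installed).
REQUIRED = ["torch", "diffusers", "accelerate", "transformers",
            "einops", "kornia", "gradio", "triton", "xformers"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:<14}{version or 'NOT INSTALLED'}")
```

Compare the printed versions against the pins listed above (pytorch 1.13.1, xformers 0.0.16).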

✨ Usage

👉 w/ Predefined styles in config file

> python vsp_script.py --style fire

👉 w/ ControlNet

> python vsp_control-edge_script.py --style fire --controlnet_scale 0.5 --canny_img_path assets/edge_dir
> python vsp_control-depth_script.py --style fire --controlnet_scale 0.5 --depth_img_path assets/depth_dir
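
To compare several ControlNet strengths, the two commands above can be generated programmatically. The sketch below only builds and prints command lines; the script names and flags are the ones shown above, while the helper name and the sweep values other than 0.5 are hypothetical.

```python
import shlex


def controlnet_command(kind, style, scale, img_dir):
    """Build the argv list for the edge or depth ControlNet script."""
    scripts = {
        "edge": ("vsp_control-edge_script.py", "--canny_img_path"),
        "depth": ("vsp_control-depth_script.py", "--depth_img_path"),
    }
    script, path_flag = scripts[kind]
    return ["python", script, "--style", style,
            "--controlnet_scale", str(scale), path_flag, img_dir]


# Hypothetical sweep values; only 0.5 appears in the commands above.
commands = [controlnet_command("edge", "fire", scale, "assets/edge_dir")
            for scale in (0.3, 0.5, 0.7)]
for argv in commands:
    print(shlex.join(argv))  # pass argv to subprocess.run(...) to execute it
```

Building an argv list instead of a single string avoids shell-quoting problems if a style name or directory contains spaces.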

👉 w/ User image

> python vsp_real_script.py --img_path assets/real_dir --tar_obj cat --output_num 5
  • Save your images in the style_name.png format.
    • e.g., The starry night.png
  • For better results, you can add a more detailed style description to the prompt of the inference image only, by editing the code directly.
    • vsp_real_script.py -> def create_prompt real_img
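
The naming convention above means that the style name is simply the file name without its extension. The following sketch shows that mapping with `pathlib`; it is an illustration under that assumption, not the parsing code used by `vsp_real_script.py`, and both helper names are hypothetical.

```python
from pathlib import Path


def style_name_from_path(img_path):
    """Return the file name without its extension, e.g. 'The starry night'."""
    return Path(img_path).stem


def list_style_images(img_dir):
    """Map each .png in img_dir to the style name implied by its file name."""
    return {p.name: style_name_from_path(p)
            for p in sorted(Path(img_dir).glob("*.png"))}


print(style_name_from_path("assets/real_dir/The starry night.png"))
# The starry night
```

Running `list_style_images("assets/real_dir")` before launching the script is a quick way to spot files whose names would yield an unintended style description.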

✨ Misc

👉 How to visualize the attention map?

  1. Save the attention map.
> python visualize_attention_src/save_attn_map_script.py
  2. Visualize the attention map.
> python visualize_attention_src/visualize_attn_map_script.py

📚 Citation

@article{jeong2024visual,
  title={Visual Style Prompting with Swapping Self-Attention},
  author={Jeong, Jaeseok and Kim, Junho and Choi, Yunjey and Lee, Gayoung and Uh, Youngjung},
  journal={arXiv preprint arXiv:2402.12974},
  year={2024}
}

✨ License

Visual Style Prompting with Swapping Self-Attention
Copyright (c) 2024-present NAVER Cloud Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
