long_stable_diffusion's Introduction

Long Stable Diffusion: Long-form text to images

e.g. story -> Stable Diffusion -> illustrations

Right now, Stable Diffusion can only take in a short prompt. What if you want to illustrate a full story? Cue Long Stable Diffusion, a pipeline of generative models that does just that with a single bash script!

Come at me with an example?

Yep! We just published Never Hire a Herd of Goats to Mow your Lawn, an AI-generated story illustrated by this repo.

Goat illustrations

Steps

  1. Start with long-form text that you want accompanying images for, e.g. a story to illustrate.
  2. Ask GPT-3, via the OpenAI API, for several illustration ideas for the beginning, middle, and end of the text.
  3. "Translate" the ideas from English to "prompt-English", e.g. add suffixes like "trending on art station" for better results.
  4. The "prompt-English" prompts are put through Stable Diffusion to generate the images.
  5. All the images and prompts are dumped into a .docx, for easy copy-pasting.
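
The steps above can be sketched as a small pipeline. This is only an illustration, not the code in run.py: the function names, the `STYLE_SUFFIXES` tuple, and the two callables standing in for GPT-3 and Stable Diffusion are all hypothetical, and step 3 is reduced to appending a suffix, whereas the repo describes a few-shot translation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical suffix list; the README names only this one example.
STYLE_SUFFIXES: Tuple[str, ...] = ("trending on art station",)


def to_prompt_english(idea: str, suffixes: Sequence[str] = STYLE_SUFFIXES) -> str:
    """Step 3, simplified: append style suffixes to a plain-English idea."""
    base = idea.strip().rstrip(".")
    return ", ".join([base, *suffixes])


def illustrate(
    text: str,
    ask_for_ideas: Callable[[str], List[str]],
    generate_image: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Steps 2-4: ideas -> prompt-English -> one image per prompt."""
    ideas = ask_for_ideas(text)  # stand-in for the GPT-3 call
    prompts = [to_prompt_english(idea) for idea in ideas]
    # stand-in for Stable Diffusion; returns (prompt, image) pairs for step 5
    return [(prompt, generate_image(prompt)) for prompt in prompts]


if __name__ == "__main__":
    # Stub models so the sketch runs without any API key or GPU.
    fake_ideas = lambda text: ["A goat eating a lawn."]
    fake_image = lambda prompt: f"<image for: {prompt}>"
    for prompt, image in illustrate("some story", fake_ideas, fake_image):
        print(prompt, "->", image)
```

Passing the model calls in as plain callables keeps the sketch runnable offline and makes each stage easy to swap out.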

Purpose

I made this to automate myself, i.e. to prompt AI for illustrations to accompany AI-generated stories for the Stories by AI podcast. Come check us out! And please suggest ways to improve: comments and pull requests are always welcome :)

This was also just a weekend hackathon project to reward myself for doing a lot of work the past couple of months, and for feeling guilty about not using my wonderful and beautiful Titan RTXs to their full potential.

Run

This bash script runs what you need. It assumes 2 GPUs with 24GB memory each. See the Multi-processing Multi-GPU Note below to change this assumption for your compute needs. I had too much fun with multiprocessing and making it faster.

bash run.sh -f three_little_pigs

To run your own text, replace three_little_pigs with the name of your new .txt file, placed in the texts/ folder.

bash run.sh -f <name_of_txtfile_in_texts_dir>

What you need before you run it like that

  • Install the requirements
  • Make sure you set your OpenAI API key, e.g. in terminal export OPENAI_TOKEN=<your_token>
  • Make sure you have run 'huggingface-cli login' with a valid token
  • Make sure you have access to https://huggingface.co/CompVis/stable-diffusion-v1-4
  • To use the "extracts" method, install nltk and run nltk.download('punkt') in a Python shell
  • Then, put your favorite story or article in a .txt file in the texts/ folder
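
Two of the items above can be checked before launching a run. The sketch below is an assumption-laden helper, not part of the repo: `preflight_problems` is a hypothetical name, and it only checks the `OPENAI_TOKEN` variable and the presence of the .txt file in texts/. It does not verify the Hugging Face login, model access, or the nltk data.

```python
import os
from pathlib import Path
from typing import List, Mapping


def preflight_problems(
    text_name: str,
    texts_dir: str = "texts",
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    # The README asks for the OpenAI key to be exported as OPENAI_TOKEN.
    if not env.get("OPENAI_TOKEN"):
        problems.append("OPENAI_TOKEN is not set")

    # run.sh -f <name> expects texts/<name>.txt to exist.
    text_path = Path(texts_dir) / f"{text_name}.txt"
    if not text_path.is_file():
        problems.append(f"missing text file: {text_path}")

    return problems


if __name__ == "__main__":
    for problem in preflight_problems("three_little_pigs"):
        print("WARNING:", problem)
```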

Method Selection

Currently two methods for generating the image prompts from text are supported.

  • "sections": Inputs the entire text to GPT-3 and tells it to generate images for the start, middle, or end of the text.
  • "extracts": Splits the text from the .txt file into smaller chronological bits of text, and then generates an image prompt for each bit of text.
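
To make the "extracts" idea concrete, here is a minimal sketch of splitting a text into chronological chunks. It is an assumption about the general approach, not the repo's implementation: the function names and the `sentences_per_extract` parameter are hypothetical, and a naive regex stands in for nltk's punkt sentence tokenizer so the example runs without extra downloads.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for a real sentence tokenizer such as nltk's punkt.

    Splits on whitespace that follows '.', '!' or '?'.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def make_extracts(text: str, sentences_per_extract: int = 3) -> List[str]:
    """Group consecutive sentences into extracts, preserving story order."""
    if sentences_per_extract < 1:
        raise ValueError("sentences_per_extract must be at least 1")
    sentences = split_sentences(text)
    return [
        " ".join(sentences[i : i + sentences_per_extract])
        for i in range(0, len(sentences), sentences_per_extract)
    ]


if __name__ == "__main__":
    story = "One pig built with straw. One built with sticks. One built with bricks. The wolf came."
    for number, extract in enumerate(make_extracts(story, 2), start=1):
        print(number, extract)
```

Each extract would then get its own image prompt, so the number of images grows with the length of the text rather than being fixed at start, middle, and end.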

Additional methods, not yet implemented:

  • "summary": Generates a summary from the .txt file, then prompts GPT-3 to generate image prompts from the summary.
  • "summary+extracts": A combination of the "summary" and "extracts" methods, where both the summary and the extract are fed into GPT-3 to generate image prompts.

Output Selection

Currently one type of output is supported:

  • "docx": A Word file with the images and prompts.

Additional output formats yet to be implemented are:

  • "txt": Just a text file with the image prompts (does not run stable diffusion).
  • "images": Just image PNG files with their title being the prompt.
  • "html": A self-contained HTML page with the original text and suggested images
  • "markdown": A markdown file with the original text and image embeds
  • "latex": A LaTeX file with the original text and '' components for the images
  • "pdf": A self-contained PDF document with the original text and images, compiled from LaTeX

Files and folders

  • run_two_gpus.sh: This is the main entry script into the program to parallelize across GPUs easily.

  • run.py: Where most of the magic happens: getting image prompts from GPT-3, making images from those prompts (using stable diffusion, multithreading), saving all those and also dumping those images and prompts to a docx file. This is what run_two_gpus.sh calls.

  • stable_diffusion.py: Just runs stable diffusion if you want to use it by itself (I do). run.py calls it.

  • dump_docx.py: Just dumps image prompts and images into a single docx for a particular text. Again, it's useful if you want to use it by itself on the saved images and prompts. I do, because I'm actually overwriting the file when multiprocessing and sometimes will just use this as a postprocessing step. Yes, you can join those and change that but I don't really care, since sometimes my GPUs misbehave and I'll need to rerun it anyways.

  • texts/: Folder to put your texts in, as a .txt file.

  • image_prompts/: Generated image prompts by GPT-3 based on your text.

  • images/: Generated images by Stable Diffusion based on GPT-3's image prompts.

  • docx/: Microsoft Word document for a text with images and their prompts all in one.

  • clean_lexica.py: Preprocessing step for Stable Diffusion prompts from Lexica - clean up the prompts and put them into a single file.

  • effective_prompts_fs.txt: Effective "prompt-English" to use for few-shot translation from English GPT-3 prompts to prompt-English (1884 tokens).
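
As an illustration of the few-shot translation that effective_prompts_fs.txt supports, the sketch below assembles a few-shot prompt from example pairs. The pair format, the "English:" / "Prompt:" labels, and the function name are assumptions made for this example; the actual layout of effective_prompts_fs.txt may differ.

```python
from typing import List, Sequence, Tuple


def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], idea: str) -> str:
    """Build a completion-style prompt: worked examples, then the new idea.

    `examples` holds (plain English, prompt-English) pairs. The language
    model is expected to continue the text after the final "Prompt:".
    """
    lines: List[str] = []
    for english, prompt_english in examples:
        lines.append(f"English: {english}")
        lines.append(f"Prompt: {prompt_english}")
        lines.append("")  # blank line between examples
    lines.append(f"English: {idea}")
    lines.append("Prompt:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example pair, not taken from the repo's file.
    examples = [("a cat on a sofa", "a cat on a sofa, trending on art station")]
    print(build_few_shot_prompt(examples, "three pigs building a house"))
```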

Multi-processing Multi-GPU Note

Multi-processing is optimized for 2 Titan RTXs, with 24GB RAM each. Changing the number of GPUs to parallelize on is a simple edit in run_two_gpus.sh: just copy the first line and change CUDA_VISIBLE_DEVICES to the appropriate GPU id.

Changing the number of processes for each GPU is an argument that can be passed in through run_two_gpus.sh as -n <num_processes_per_gpu> for each run. This is an int used in run.py. I've found that my GPUs can handle 3, but are happier with 2.
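
The relationship between GPUs, processes per GPU, and prompts can be sketched as a round-robin assignment. This is a hypothetical illustration, not how run.py or run_two_gpus.sh actually divide the work: `assign_prompts` and `worker_env` are made-up names, and only the `CUDA_VISIBLE_DEVICES` variable comes from the note above.

```python
from typing import Dict, List, Sequence, Tuple

Worker = Tuple[int, int]  # (gpu_id, process_slot_on_that_gpu)


def assign_prompts(
    prompts: Sequence[str],
    gpu_ids: Sequence[int],
    processes_per_gpu: int,
) -> Dict[Worker, List[str]]:
    """Spread prompts round-robin over every (GPU, process slot) pair."""
    if not gpu_ids or processes_per_gpu < 1:
        raise ValueError("need at least one GPU and one process per GPU")
    workers = [(gpu, slot) for gpu in gpu_ids for slot in range(processes_per_gpu)]
    assignment: Dict[Worker, List[str]] = {worker: [] for worker in workers}
    for index, prompt in enumerate(prompts):
        assignment[workers[index % len(workers)]].append(prompt)
    return assignment


def worker_env(gpu_id: int) -> Dict[str, str]:
    """Environment override that pins a worker process to a single GPU."""
    return {"CUDA_VISIBLE_DEVICES": str(gpu_id)}


if __name__ == "__main__":
    # 2 GPUs with 2 processes each, matching the setup described above.
    plan = assign_prompts(["a", "b", "c", "d", "e"], gpu_ids=[0, 1], processes_per_gpu=2)
    for (gpu, slot), assigned in plan.items():
        print(worker_env(gpu), "slot", slot, assigned)
```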

Complete

  • Pipeline of asking GPT-3 for image prompts
  • Image prompts to stable diffusion
  • Multiprocessing to max out a single GPU
  • GPU multiprocessing stable diffusion
  • Docx dump of images and image prompts
  • Translation layer between English prompt and "prompt English" (lexica)
  • Flesh out readme
  • Open source

Todo

  • Walkthrough video of code
  • Replace stable_diffusion.py with txt2img.py from CompVis stable-diffusion repo
  • Support for configuring image generation (based on txt2img.py)
  • Support for different content types (fiction/blog post/essay/news article)
  • 'summary+extracts' method
  • output to txt
  • output to markdown
  • output to html
  • output to latex
  • output to pdf
  • refactor from a sequence of scripts to a python library

Future

  • Translation from English to 'prompt English' can be improved with: finetuned model with several million data samples (instead of 36)

long_stable_diffusion's People

Contributors

andreykurenkov, sharonzhou

long_stable_diffusion's Issues

Stable Diffusion Master Tutorials List - Including SDXL 0.9 - 43 Tutorials - Not An Issue Thread

Expert-Level Tutorials on Stable Diffusion: Master Advanced Techniques and Strategies

Greetings everyone. I am Dr. Furkan Gözükara. I am an Assistant Professor in the Software Engineering department of a private university (I have a PhD in Computer Engineering). My professional programming skill is unfortunately C#, not Python :)

My linkedin : https://www.linkedin.com/in/furkangozukara

Our channel address, if you would like to subscribe: https://www.youtube.com/@SECourses

Our discord to get more help : https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

I am keeping this list up-to-date. I have new video ideas coming up and am trying to find time to make them.

I am open to any criticism you have. I am constantly trying to improve the quality of my tutorial guide videos. Please leave comments with both your suggestions and what you would like to see in future videos.

All videos have manually fixed subtitles and properly prepared video chapters. You can watch with these perfect subtitles or look for the chapters you are interested in.

Since my profession is teaching, I usually do not skip any of the important parts. Therefore, you may find my videos a little bit longer.

Playlist link on YouTube: Stable Diffusion Tutorials, Automatic1111 Web UI & Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Video to Anime

1.) Automatic1111 Web UI - PC - Free

How To Install Python, Setup Virtual Environment VENV, Set Default Python System Path & Install Git

2.) Automatic1111 Web UI - PC - Free

Easiest Way to Install & Run Stable Diffusion Web UI on PC by Using Open Source Automatic Installer

3.) Automatic1111 Web UI - PC - Free

How to use Stable Diffusion V2.1 and Different Models in the Web UI - SD 1.5 vs 2.1 vs Anything V3

4.) Automatic1111 Web UI - PC - Free

Zero To Hero Stable Diffusion DreamBooth Tutorial By Using Automatic1111 Web UI - Ultra Detailed

5.) Automatic1111 Web UI - PC - Free

DreamBooth Got Buffed - 22 January Update - Much Better Success Train Stable Diffusion Models Web UI

6.) Automatic1111 Web UI - PC - Free

How to Inject Your Trained Subject e.g. Your Face Into Any Custom Stable Diffusion Model By Web UI

7.) Automatic1111 Web UI - PC - Free

How To Do Stable Diffusion LORA Training By Using Web UI On Different Models - Tested SD 1.5, SD 2.1

8.) Automatic1111 Web UI - PC - Free

8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI

9.) Automatic1111 Web UI - PC - Free

How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial

10.) Automatic1111 Web UI - PC - Free

How To Generate Stunning Epic Text By Stable Diffusion AI - No Photoshop - For Free - Depth-To-Image

11.) Python Code - Hugging Face Diffusers Script - PC - Free

How to Run and Convert Stable Diffusion Diffusers (.bin Weights) & Dreambooth Models to CKPT File

12.) NMKD Stable Diffusion GUI - Open Source - PC - Free

Forget Photoshop - How To Transform Images With Text Prompts using InstructPix2Pix Model in NMKD GUI

13.) Google Colab Free - Cloud - No PC Is Required

Transform Your Selfie into a Stunning AI Avatar with Stable Diffusion - Better than Lensa for Free

14.) Google Colab Free - Cloud - No PC Is Required

Stable Diffusion Google Colab, Continue, Directory, Transfer, Clone, Custom Models, CKPT SafeTensors

15.) Automatic1111 Web UI - PC - Free

Become A Stable Diffusion Prompt Master By Using DAAM - Attention Heatmap For Each Used Token - Word

16.) Python Script - Gradio Based - ControlNet - PC - Free

Transform Your Sketches into Masterpieces with Stable Diffusion ControlNet AI - How To Use Tutorial

17.) Automatic1111 Web UI - PC - Free

Sketches into Epic Art with 1 Click: A Guide to Stable Diffusion ControlNet in Automatic1111 Web UI

18.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required

Ultimate RunPod Tutorial For Stable Diffusion - Automatic1111 - Data Transfers, Extensions, CivitAI

19.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required

How To Install DreamBooth & Automatic1111 On RunPod & Latest Libraries - 2x Speed Up - cudDNN - CUDA

20.) Automatic1111 Web UI - PC - Free

Fantastic New ControlNet OpenPose Editor Extension & Image Mixing - Stable Diffusion Web UI Tutorial

21.) Automatic1111 Web UI - PC - Free

Automatic1111 Stable Diffusion DreamBooth Guide: Optimal Classification Images Count Comparison Test

22.) Automatic1111 Web UI - PC - Free

Epic Web UI DreamBooth Update - New Best Settings - 10 Stable Diffusion Training Compared on RunPods

23.) Automatic1111 Web UI - PC - Free

New Style Transfer Extension, ControlNet of Automatic1111 Stable Diffusion T2I-Adapter Color Control

24.) Automatic1111 Web UI - PC - Free

Generate Text Arts & Fantastic Logos By Using ControlNet Stable Diffusion Web UI For Free Tutorial

25.) Automatic1111 Web UI - PC - Free

How To Install New DREAMBOOTH & Torch 2 On Automatic1111 Web UI PC For Epic Performance Gains Guide

26.) Automatic1111 Web UI - PC - Free

Training Midjourney Level Style And Yourself Into The SD 1.5 Model via DreamBooth Stable Diffusion

27.) Automatic1111 Web UI - PC - Free

Video To Anime - Generate An EPIC Animation From Your Phone Recording By Using Stable Diffusion AI

28.) Python Script - Jupyter Based - PC - Free

Midjourney Level NEW Open Source Kandinsky 2.1 Beats Stable Diffusion - Installation And Usage Guide

29.) Automatic1111 Web UI - PC - Free

RTX 3090 vs RTX 3060 Ultimate Showdown for Stable Diffusion, ML, AI & Video Rendering Performance

30.) Kohya Web UI - Automatic1111 Web UI - PC - Free

Generate Studio Quality Realistic Photos By Kohya LoRA Stable Diffusion Training - Full Tutorial

31.) Kaggle NoteBook - Free

DeepFloyd IF By Stability AI - Is It Stable Diffusion XL or Version 3? We Review and Show How To Use

32.) Python Script - Automatic1111 Web UI - PC - Free

How To Find Best Stable Diffusion Generated Images By Using DeepFace AI - DreamBooth / LoRA Training

33.) Kohya Web UI - RunPod - Paid

How To Install And Use Kohya LoRA GUI / Web UI on RunPod IO With Stable Diffusion & Automatic1111

34.) PC - Google Colab - Free

Mind-Blowing Deepfake Tutorial: Turn Anyone into Your Favorite Movie Star! PC & Google Colab - roop

35.) Automatic1111 Web UI - PC - Free

Stable Diffusion Now Has The Photoshop Generative Fill Feature With ControlNet Extension - Tutorial

36.) Automatic1111 Web UI - PC - Free

Human Cropping Script & 4K+ Resolution Class / Reg Images For Stable Diffusion DreamBooth / LoRA

37.) Automatic1111 Web UI - PC - Free

Stable Diffusion 2 NEW Image Post Processing Scripts And Best Class / Regularization Images Datasets

38.) Automatic1111 Web UI - PC - Free

How To Use Roop DeepFake On RunPod Step By Step Tutorial With Custom Made Auto Installer Script

39.) RunPod - Automatic1111 Web UI - Cloud - Paid - No PC Is Required

How To Install DreamBooth & Automatic1111 On RunPod & Latest Libraries - 2x Speed Up - cudDNN - CUDA

40.) Automatic1111 Web UI - PC - Free + RunPod

Zero to Hero ControlNet Tutorial: Stable Diffusion Web UI Extension | Complete Feature Guide

41.) Automatic1111 Web UI - PC - Free + RunPod

The END of Photography - Use AI to Make Your Own Studio Photos, FREE Via DreamBooth Training

42.) Google Colab - Gradio - Free

How To Use Stable Diffusion XL (SDXL 0.9) On Google Colab For Free

43.) Local - PC - Free - Gradio

Stable Diffusion XL (SDXL) Locally On Your PC - 8GB VRAM - Easy Tutorial With Automatic Installer

Lexica dump files

Thanks for sharing this amazing project.

I was wondering how to download/generate the lexica data dump CSVs that are referenced in your code. I'm also trying to build some projects out that leverage this data, and was wondering what the best way to get samples of this data would be.

Thanks for the great work, and for sharing this with the community!

ask for advice

Thanks for sharing this amazing project.
How to ensure that the generated character is the same in every frame, such as people or pigs?
