
sd-webui-rpg-diffusionmaster's Introduction

RPG-DiffusionMaster Extension for Stable Diffusion WebUI

This repository hosts an extension for Stable Diffusion WebUI that integrates the functionalities of RPG-DiffusionMaster. It brings additional changes and enhancements, enabling users of WebUI to interact with RPG-DiffusionMaster more seamlessly.

For more information, check the official repo or the following paper:

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui
Affiliations: Peking University, Stanford University, Pika Labs

Introduction

Currently in an early phase of development, this extension employs LLMs (such as GPT-4 and Gemini Pro) for regional planning. It passes the split ratios and regional prompts generated by the LLM to Regional Prompter for image generation, similar to the official repository.
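
Purely for illustration, the kind of planning output involved might look like the sketch below (field names are hypothetical, not the extension's actual data structures):

    # Purely illustrative; field names are hypothetical, not the extension's API.
    split_ratio = "1,1,1;1,1,1"  # Regional Prompter matrix: two rows, two columns -> four regions
    regional_prompts = [
        "top-left region description",
        "top-right region description",
        "bottom-left region description",
        "bottom-right region description",
    ]
    # Regional Prompter expects one prompt per region, separated by BREAK.
    prompt = " BREAK ".join(regional_prompts)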

Installation

Prior to installing this extension, ensure that the Regional Prompter extension is already set up on your system. This extension has not yet been added to the WebUI extensions index, so it must be installed manually using its URL on the WebUI Extensions tab.

Usage

  1. Navigate to the txt2img tab.
  2. Choose RPG DiffusionMaster from the Script dropdown menu.
  3. Select your desired LLM and configure the settings for RPG-DiffusionMaster.
  4. Press the "Apply to Prompt" button and wait briefly as the extension processes the prompt through the LLM and adjusts the Regional Prompter configurations accordingly.
  5. Review the adjusted settings and the final prompt in the Prompt textbox. You can then modify parameters like image size, CFG Scale, Steps, etc., before generating your images.

To-Do List 💪

  • Integrate local LLM support.

Differences from the Official Implementation

  • Adds support for the Azure OpenAI GPT-4 model and Gemini Pro.
  • Alters the logic to enhance stability when extracting regional prompts.

Acknowledgements

A huge thank you to Ling Yang for the foundational RPG-DiffusionMaster implementation, and to AUTOMATIC1111 and Regional Prompter for their exceptional contributions and codebases.

sd-webui-rpg-diffusionmaster's People

Contributors

bitcodingwalkin, zydxt, yangling0818, betadoggo, w-e-w


sd-webui-rpg-diffusionmaster's Issues

Error loading script, ModuleNotFoundError

Seems like the install needs some love. Here's a traceback:

*** Error loading script: rpg_diffusionmaster.py
    Traceback (most recent call last):
      File "D:\AIA\Tools\SDUI\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "D:\AIA\Tools\SDUI\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 940, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "D:\AIA\Tools\SDUI\extensions\sd-webui-rpg-diffusionmaster\scripts\rpg_diffusionmaster.py", line 2, in <module>
        from rpg_lib.llm_agents import llm_factory
      File "D:\AIA\Tools\SDUI\extensions\sd-webui-rpg-diffusionmaster\rpg_lib\llm_agents.py", line 7, in <module>
        import google.generativeai as genai
    ModuleNotFoundError: No module named 'google.generativeai'
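
WebUI extensions usually declare their Python dependencies in an install.py that runs at startup; a minimal sketch of that pattern follows (the package list is an assumption, not the extension's actual requirements):

    # install.py -- minimal sketch; the package name is an assumption, not the
    # extension's actual dependency list. A1111's launch helpers install
    # anything that is missing when the WebUI starts.
    import launch

    if not launch.is_installed("google.generativeai"):
        launch.run_pip("install google-generativeai", "google-generativeai for sd-webui-rpg-diffusionmaster")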

Please support using local LLM

Hi Zydxt

Please add support for using a local LLM, since the GPT-4 API and Gemini are a bit expensive. I can't wait to test a local LLM to generate high-res images.

Thank you so much
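
For context, a minimal sketch of what a local-LLM backend could look like using llama-cpp-python; the model path, context size, and system prompt are placeholders, and this is not the extension's actual implementation:

    # Sketch only: a local backend via llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path="/path/to/model.gguf", n_ctx=4096)

    def plan_regions(user_prompt: str) -> str:
        """Ask the local model for split ratios and per-region prompts."""
        out = llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "Split the scene into regions and describe each one."},
                {"role": "user", "content": user_prompt},
            ],
            max_tokens=512,
        )
        return out["choices"][0]["message"]["content"]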

Error loading script: llm.py

I already have regional-prompter installed and have tried a couple of times to install this, but I keep getting this error message:

*** Error loading script: llm.py
    Traceback (most recent call last):
      File "S:\Ai\stable-diffusion-webui\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "S:\Ai\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "S:\Ai\stable-diffusion-webui\extensions\sd-webui-rpg-diffusionmaster\scripts\llm.py", line 5, in <module>
        from scripts.enums import LLMType, PromptVersion
    ImportError: cannot import name 'LLMType' from 'scripts.enums' (S:\Ai\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\enums.py)

*** Error loading script: rpg.py
    Traceback (most recent call last):
      File "S:\Ai\stable-diffusion-webui\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "S:\Ai\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "S:\Ai\stable-diffusion-webui\extensions\sd-webui-rpg-diffusionmaster\scripts\rpg.py", line 2, in <module>
        from scripts.llm import llm_factory
      File "S:\Ai\stable-diffusion-webui\extensions\sd-webui-rpg-diffusionmaster\scripts\llm.py", line 5, in <module>
        from scripts.enums import LLMType, PromptVersion
    ImportError: cannot import name 'LLMType' from 'scripts.enums' (S:\Ai\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\enums.py)
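
The traceback shows Python resolving scripts.enums to the copy shipped by sd-webui-controlnet, since both extensions expose a scripts package. A sketch of one possible workaround, loading the module by file path instead of by package name (paths and the module alias are illustrative, not the extension's actual code):

    # Sketch: load this extension's enums by file path so the name never
    # collides with another extension's "scripts" package.
    import importlib.util
    import os

    def load_by_path(alias: str, path: str):
        spec = importlib.util.spec_from_file_location(alias, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

    ext_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    enums = load_by_path("rpg_enums", os.path.join(ext_dir, "scripts", "enums.py"))
    LLMType, PromptVersion = enums.LLMType, enums.PromptVersion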

Install error regarding Azure Openai

Tried to install the extension today and I get this:

*** Error loading script: rpg_diffusionmaster.py
    Traceback (most recent call last):
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/modules/scripts.py", line 382, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/modules/script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/scripts/rpg_diffusionmaster.py", line 2, in <module>
        from rpg_lib.llm_agents import llm_factory
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/rpg_lib/llm_agents.py", line 4, in <module>
        from openai import AzureOpenAI, OpenAI
    ImportError: cannot import name 'AzureOpenAI' from 'openai' (/home/matt/llm/stable-diffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/openai/__init__.py)
   

According to a quick search I can run openai migrate but even doing that didn't remove the error.
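
For reference, the AzureOpenAI class only exists in openai 1.0 and later, so upgrading the openai package inside the WebUI virtual environment is the usual fix. A minimal sketch of the v1 client, with placeholder endpoint, key, API version, and deployment name:

    # Sketch of the openai>=1.0 Azure client; all credentials are placeholders.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<azure-api-key>",
        api_version="<api-version>",
    )
    response = client.chat.completions.create(
        model="<deployment-name>",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)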

Image Fusion

It seems like you only implemented the regional prompting, but how do you merge the different regions into one whole picture? The original RPG-DiffusionMaster repository implements (or calls) such functionality.

Error when installing on A1111

*** Error loading script: rpg_diffusionmaster.py
    Traceback (most recent call last):
      File "/Users/andu/stable-diffusion-webui/modules/scripts.py", line 508, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "/Users/andu/stable-diffusion-webui/modules/script_loading.py", line 13, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/Users/andu/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/scripts/rpg_diffusionmaster.py", line 2, in <module>
        from rpg_lib.llm_agents import llm_factory
      File "/Users/andu/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/rpg_lib/llm_agents.py", line 10, in <module>
        from llama_cpp import Llama
    ModuleNotFoundError: No module named 'llama_cpp'

My info:

version: v1.9.3-12-gc96025b9 • python: 3.10.12 • torch: 2.1.0 • xformers: N/A • gradio: 3.41.2 • checkpoint: 71776d06b6

ValueError: Returned dictionary included some keys as Components. Either all keys must be Components to assign Component values, or return a List of values to assign output values in order.

[LOG_END]
    Traceback (most recent call last):
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
        output = await app.get_blocks().process_api(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
        data = self.postprocess_data(fn_index, result["prediction"], state)
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1288, in postprocess_data
        predictions = convert_component_dict_to_list(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 485, in convert_component_dict_to_list
        raise ValueError(
    ValueError: Returned dictionary included some keys as Components. Either all keys must be Components to assign Component values, or return a List of values to assign output values in order.
    Traceback (most recent call last):
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
        output = await app.get_blocks().process_api(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
        data = self.postprocess_data(fn_index, result["prediction"], state)
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1288, in postprocess_data
        predictions = convert_component_dict_to_list(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 485, in convert_component_dict_to_list
        raise ValueError(
    ValueError: Returned dictionary included some keys as Components. Either all keys must be Components to assign Component values, or return a List of values to assign output values in order.
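
For context, Gradio raises this when a callback returns a dictionary that mixes output-Component keys with plain keys; either every key must be a Component or the callback must return a plain list of values in output order. A minimal standalone sketch of the rule (component names are illustrative, not the extension's actual UI):

    # Standalone sketch of the rule behind this error.
    import gradio as gr

    with gr.Blocks() as demo:
        inp = gr.Textbox(label="Prompt")
        out = gr.Textbox(label="Adjusted prompt")
        btn = gr.Button("Apply to Prompt")

        def bad(prompt):
            # Mixing a Component key with a plain key raises the ValueError above.
            return {out: prompt.upper(), "debug": "extra"}

        def good(prompt):
            # Either every key is an output Component, or return a plain
            # list/tuple of values in output order.
            return {out: prompt.upper()}

        btn.click(good, inputs=inp, outputs=out)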

Use common prompt

First of all, this is a really fun extension. Thank you for making this. I see some issues with how the prompts are created though.

Just looking at the structure of the final prompt coming from the LLM, it looks like "Use common prompt" should be checked inside RP, because I consistently get one more prompt region than there are RP regions, and the first one is usually just the original prompt that I put into the LLM. Furthermore, it often gets the regions in the wrong order, which might be improved by giving more examples to the LLM.

For instance, this is my input prompt. I deliberately describe things in the wrong order:
"A photo of inside of a rustic cabin, the afternoon sun shines through a window on the left, a cat sits on the carpet below the window, to the right there is a cozy fireplace and a portrait of a lady above it"

This is the RP layout. Simple 4 quadrants, as expected:
"1,1,1; 1,1,1"

This is the output prompt:
"A photo of inside of a rustic cabin, the afternoon sun shines through a window on the left, a cat sits on the carpet below the window, to the right there is a cozy fireplace and a portrait of a lady above it BREAK
Afternoon sun piercing through the rustic casement window, casting a warm, golden glow across the aged wooden walls. BREAK
A fluffy cat, its fur softly reflecting the sunlight, seated serenely on a well-worn, braided carpet below the window. BREAK
A cozy fireplace with logs glowing, framed by a rugged stone mantle, exuding warmth and comfort. BREAK
An antique portrait of a lady with delicate features, framed above the fireplace, adding a touch of historical elegance."

First, I need to manually turn on "Use common prompt" because the first region is just the original prompt. Then I need to reorder them so that it is: common, window, portrait, cat, fireplace. After I do that, I get a nice resulting image that matches the original prompt.
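
For illustration, a small standalone sketch (not the extension's code) of the bookkeeping described above: split the LLM output on BREAK and treat the first chunk as the common prompt when there is one chunk more than the regions defined by the ratio:

    # Standalone illustration using the example above.
    llm_output = (
        "A photo of inside of a rustic cabin, ... BREAK "
        "Afternoon sun piercing through the rustic casement window ... BREAK "
        "A fluffy cat seated on a braided carpet below the window ... BREAK "
        "A cozy fireplace with glowing logs ... BREAK "
        "An antique portrait of a lady above the fireplace ..."
    )

    chunks = [c.strip() for c in llm_output.split("BREAK")]
    num_regions = 4  # 2x2 matrix from "1,1,1; 1,1,1"
    if len(chunks) == num_regions + 1:
        common, regions = chunks[0], chunks[1:]  # enable "Use common prompt"
    else:
        common, regions = "", chunks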


Any plans for a comfy node?

Thanks for the great work. Are you planning on creating a ComfyUI node for this? A lot of workflows (and art in general) would be significantly improved.

KoboldCPP

You only have Gemini and GPT as options. Can you add support for KoboldCpp instead, so we can use free options rather than paid ones with this? See the sketch below for the kind of local endpoint I mean.
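
For reference, KoboldCpp exposes an OpenAI-compatible endpoint, so a hedged sketch of reaching a local instance could look like the following (the URL, port, and model name are assumptions about a local setup; the extension does not currently implement this):

    # Sketch only: reaching a local KoboldCpp instance through its
    # OpenAI-compatible endpoint.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:5001/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model="koboldcpp",  # placeholder; the server serves whatever model it loaded
        messages=[{"role": "user", "content": "Split this scene into regions: ..."}],
    )
    print(response.choices[0].message.content)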

Gemini Pro API sometimes gives errors due to strict safety settings

By default, the Gemini Pro API has strict safety settings which block questionable prompts and return an error before image generation begins. This cannot be disabled on the user end and requires new code, specifically a "BLOCK_NONE" threshold for all four safety-settings categories. I don't know how to write the Python code myself.

Please refer to the official documentation regarding this issue.

https://ai.google.dev/tutorials/python_quickstart#safety_settings

Full document with code examples at the bottom: https://ai.google.dev/docs/safety_setting_gemini

Note that it states, "Adjusting to lower safety settings will trigger a more in-depth review process of your application." I'm not certain what this means.
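
For reference, a minimal sketch of the BLOCK_NONE configuration described in the linked documentation (the API key and prompt are placeholders):

    # Sketch of relaxing the Gemini safety thresholds with google-generativeai.
    import google.generativeai as genai

    genai.configure(api_key="<gemini-api-key>")
    model = genai.GenerativeModel("gemini-pro")

    safety_settings = [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
    ]

    response = model.generate_content(
        "Plan the regions for: ...",
        safety_settings=safety_settings,
    )
    print(response.text)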

Gemini Pro API is a great alternative to local LLMs because it allows for 60 queries per minute, i.e. 1 query per second, for free to every user.
