
sd-webui-rpg-diffusionmaster's Introduction

RPG-DiffusionMaster Extension for Stable Diffusion WebUI

This repository hosts an extension for Stable Diffusion WebUI that integrates the functionalities of RPG-DiffusionMaster. It brings additional changes and enhancements, enabling users of WebUI to interact with RPG-DiffusionMaster more seamlessly.

For more information, check the official repo or the following paper:

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui
Affiliations: Peking University, Stanford University, Pika Labs

Introduction

Currently in an early phase of development, this extension employs LLMs (such as GPT-4 and Gemini Pro) for regional planning. It passes the split ratios and regional prompts generated by the LLM to Regional Prompter for image generation, similar to the official repository.
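
Purely for illustration, the kind of planning output involved might look like the sketch below (field names are hypothetical, not the extension's actual data structures):

    # Purely illustrative; field names are hypothetical, not the extension's API.
    split_ratio = "1,1,1;1,1,1"  # Regional Prompter matrix: two rows, two columns -> four regions
    regional_prompts = [
        "top-left region description",
        "top-right region description",
        "bottom-left region description",
        "bottom-right region description",
    ]
    # Regional Prompter expects one prompt per region, separated by BREAK.
    prompt = " BREAK ".join(regional_prompts)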

Installation

Prior to installing this extension, ensure that the Regional Prompter extension is already set up on your system. This extension has not yet been added to the WebUI extensions index, so it must be installed manually using its URL on the WebUI Extensions tab.

Usage

  1. Navigate to the txt2img tab.
  2. Choose RPG DiffusionMaster from the Script dropdown menu.
  3. Select your desired LLM and configure the settings for RPG-DiffusionMaster.
  4. Press the "Apply to Prompt" button and wait briefly as the extension processes the prompt through the LLM and adjusts the Regional Prompter configurations accordingly.
  5. Review the adjusted settings and the final prompt in the Prompt textbox. You can then modify parameters like image size, CFG Scale, Steps, etc., before generating your images.

To-Do List 💪

  • Integrate local LLM support.

Differences from the Official Implementation

  • Adds support for the Azure OpenAI GPT-4 model and Gemini Pro.
  • Alters the logic to enhance stability when extracting regional prompts.

Acknowledgements

A huge thank you to Ling Yang for the foundational RPG-DiffusionMaster implementation, and to AUTOMATIC1111 and Regional Prompter for their exceptional contributions and codebases.

sd-webui-rpg-diffusionmaster's People

Contributors

bitcodingwalkin, zydxt, yangling0818, betadoggo, w-e-w


sd-webui-rpg-diffusionmaster's Issues

Error loading script, ModuleNotFoundError

Seems like the install needs some love. Here's a traceback:

*** Error loading script: rpg_diffusionmaster.py
    Traceback (most recent call last):
      File "D:\AIA\Tools\SDUI\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "D:\AIA\Tools\SDUI\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 940, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "D:\AIA\Tools\SDUI\extensions\sd-webui-rpg-diffusionmaster\scripts\rpg_diffusionmaster.py", line 2, in <module>
        from rpg_lib.llm_agents import llm_factory
      File "D:\AIA\Tools\SDUI\extensions\sd-webui-rpg-diffusionmaster\rpg_lib\llm_agents.py", line 7, in <module>
        import google.generativeai as genai
    ModuleNotFoundError: No module named 'google.generativeai'
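
WebUI extensions usually declare their Python dependencies in an install.py that runs at startup; a minimal sketch of that pattern follows (the package list is an assumption, not the extension's actual requirements):

    # install.py -- minimal sketch; the package name is an assumption, not the
    # extension's actual dependency list. A1111's launch helpers install
    # anything that is missing when the WebUI starts.
    import launch

    if not launch.is_installed("google.generativeai"):
        launch.run_pip("install google-generativeai", "google-generativeai for sd-webui-rpg-diffusionmaster")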

Please support using local LLM

Hi Zydxt

Please add support for using a local LLM, since the GPT-4 API and Gemini are a bit expensive. I can't wait to test a local LLM to generate high-res images.

Thank you so much
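
For context, a minimal sketch of what a local-LLM backend could look like using llama-cpp-python; the model path, context size, and system prompt are placeholders, and this is not the extension's actual implementation:

    # Sketch only: a local backend via llama-cpp-python.
    from llama_cpp import Llama

    llm = Llama(model_path="/path/to/model.gguf", n_ctx=4096)

    def plan_regions(user_prompt: str) -> str:
        """Ask the local model for split ratios and per-region prompts."""
        out = llm.create_chat_completion(
            messages=[
                {"role": "system", "content": "Split the scene into regions and describe each one."},
                {"role": "user", "content": user_prompt},
            ],
            max_tokens=512,
        )
        return out["choices"][0]["message"]["content"]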

Error loading script: llm.py

I already have regional-prompter installed and have tried a couple of times to install this, but I keep getting this error message:

*** Error loading script: llm.py
    Traceback (most recent call last):
      File "S:\Ai\stable-diffusion-webui\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "S:\Ai\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "S:\Ai\stable-diffusion-webui\extensions\sd-webui-rpg-diffusionmaster\scripts\llm.py", line 5, in <module>
        from scripts.enums import LLMType, PromptVersion
    ImportError: cannot import name 'LLMType' from 'scripts.enums' (S:\Ai\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\enums.py)

*** Error loading script: rpg.py
    Traceback (most recent call last):
      File "S:\Ai\stable-diffusion-webui\modules\scripts.py", line 469, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "S:\Ai\stable-diffusion-webui\modules\script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "S:\Ai\stable-diffusion-webui\extensions\sd-webui-rpg-diffusionmaster\scripts\rpg.py", line 2, in <module>
        from scripts.llm import llm_factory
      File "S:\Ai\stable-diffusion-webui\extensions\sd-webui-rpg-diffusionmaster\scripts\llm.py", line 5, in <module>
        from scripts.enums import LLMType, PromptVersion
    ImportError: cannot import name 'LLMType' from 'scripts.enums' (S:\Ai\stable-diffusion-webui\extensions\sd-webui-controlnet\scripts\enums.py)
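
The traceback shows Python resolving scripts.enums to the copy shipped by sd-webui-controlnet, since both extensions expose a scripts package. A sketch of one possible workaround, loading the module by file path instead of by package name (paths and the module alias are illustrative, not the extension's actual code):

    # Sketch: load this extension's enums by file path so the name never
    # collides with another extension's "scripts" package.
    import importlib.util
    import os

    def load_by_path(alias: str, path: str):
        spec = importlib.util.spec_from_file_location(alias, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module

    ext_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    enums = load_by_path("rpg_enums", os.path.join(ext_dir, "scripts", "enums.py"))
    LLMType, PromptVersion = enums.LLMType, enums.PromptVersion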

Install error regarding Azure Openai

Tried to install the extension today and I get this:

*** Error loading script: rpg_diffusionmaster.py
    Traceback (most recent call last):
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/modules/scripts.py", line 382, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/modules/script_loading.py", line 10, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/scripts/rpg_diffusionmaster.py", line 2, in <module>
        from rpg_lib.llm_agents import llm_factory
      File "/home/matt/llm/stable-diffusion/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/rpg_lib/llm_agents.py", line 4, in <module>
        from openai import AzureOpenAI, OpenAI
    ImportError: cannot import name 'AzureOpenAI' from 'openai' (/home/matt/llm/stable-diffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/openai/__init__.py)
   

According to a quick search I can run openai migrate but even doing that didn't remove the error.
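
For reference, the AzureOpenAI class only exists in openai 1.0 and later, so upgrading the openai package inside the WebUI virtual environment is the usual fix. A minimal sketch of the v1 client, with placeholder endpoint, key, API version, and deployment name:

    # Sketch of the openai>=1.0 Azure client; all credentials are placeholders.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",
        api_key="<azure-api-key>",
        api_version="<api-version>",
    )
    response = client.chat.completions.create(
        model="<deployment-name>",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(response.choices[0].message.content)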

Image Fusion

It seems like you only implemented the regional prompting, but how do you merge the different regions into one whole picture? The original RPG-DiffusionMaster repository implements (or calls) such functionality.

Error when installing on A1111

*** Error loading script: rpg_diffusionmaster.py
    Traceback (most recent call last):
      File "/Users/andu/stable-diffusion-webui/modules/scripts.py", line 508, in load_scripts
        script_module = script_loading.load_module(scriptfile.path)
      File "/Users/andu/stable-diffusion-webui/modules/script_loading.py", line 13, in load_module
        module_spec.loader.exec_module(module)
      File "<frozen importlib._bootstrap_external>", line 883, in exec_module
      File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
      File "/Users/andu/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/scripts/rpg_diffusionmaster.py", line 2, in <module>
        from rpg_lib.llm_agents import llm_factory
      File "/Users/andu/stable-diffusion-webui/extensions/sd-webui-rpg-diffusionmaster/rpg_lib/llm_agents.py", line 10, in <module>
        from llama_cpp import Llama
    ModuleNotFoundError: No module named 'llama_cpp'

My info:

version: v1.9.3-12-gc96025b9 • python: 3.10.12 • torch: 2.1.0 • xformers: N/A • gradio: 3.41.2 • checkpoint: 71776d06b6

ValueError: Returned dictionary included some keys as Components. Either all keys must be Components to assign Component values, or return a List of values to assign output values in order.

[LOG_END]
    Traceback (most recent call last):
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
        output = await app.get_blocks().process_api(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
        data = self.postprocess_data(fn_index, result["prediction"], state)
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1288, in postprocess_data
        predictions = convert_component_dict_to_list(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 485, in convert_component_dict_to_list
        raise ValueError(
    ValueError: Returned dictionary included some keys as Components. Either all keys must be Components to assign Component values, or return a List of values to assign output values in order.
    Traceback (most recent call last):
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
        output = await app.get_blocks().process_api(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
        data = self.postprocess_data(fn_index, result["prediction"], state)
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1288, in postprocess_data
        predictions = convert_component_dict_to_list(
      File "/home/ubuntu/sd/stable-diffusion-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 485, in convert_component_dict_to_list
        raise ValueError(
    ValueError: Returned dictionary included some keys as Components. Either all keys must be Components to assign Component values, or return a List of values to assign output values in order.
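
For context, Gradio raises this when a callback returns a dictionary that mixes output-Component keys with plain keys; either every key must be a Component or the callback must return a plain list of values in output order. A minimal standalone sketch of the rule (component names are illustrative, not the extension's actual UI):

    # Standalone sketch of the rule behind this error.
    import gradio as gr

    with gr.Blocks() as demo:
        inp = gr.Textbox(label="Prompt")
        out = gr.Textbox(label="Adjusted prompt")
        btn = gr.Button("Apply to Prompt")

        def bad(prompt):
            # Mixing a Component key with a plain key raises the ValueError above.
            return {out: prompt.upper(), "debug": "extra"}

        def good(prompt):
            # Either every key is an output Component, or return a plain
            # list/tuple of values in output order.
            return {out: prompt.upper()}

        btn.click(good, inputs=inp, outputs=out)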

Use common prompt

First of all, this is a really fun extension. Thank you for making this. I see some issues with how the prompts are created though.

Just looking at the structure of the final prompt coming from the LLM, it looks like "Use common prompt" should be checked inside RP, because I consistently get one more prompt region than there are RP regions, and the first one is usually just the original prompt that I put into the LLM. Furthermore, it often gets the regions in the wrong order, which might be improved by giving more examples to the LLM.

For instance, this is my input prompt. I deliberately describe things in the wrong order:
"A photo of inside of a rustic cabin, the afternoon sun shines through a window on the left, a cat sits on the carpet below the window, to the right there is a cozy fireplace and a portrait of a lady above it"

This is the RP layout. Simple 4 quadrants, as expected:
"1,1,1; 1,1,1"

This is the output prompt:
"A photo of inside of a rustic cabin, the afternoon sun shines through a window on the left, a cat sits on the carpet below the window, to the right there is a cozy fireplace and a portrait of a lady above it BREAK
Afternoon sun piercing through the rustic casement window, casting a warm, golden glow across the aged wooden walls. BREAK
A fluffy cat, its fur softly reflecting the sunlight, seated serenely on a well-worn, braided carpet below the window. BREAK
A cozy fireplace with logs glowing, framed by a rugged stone mantle, exuding warmth and comfort. BREAK
An antique portrait of a lady with delicate features, framed above the fireplace, adding a touch of historical elegance."

First, I need to manually turn on "Use common prompt" because the first region is just the original prompt. Then I need to reorder them so that it is: common, window, portrait, cat, fireplace. After I do that, I get a nice resulting image that matches the original prompt.
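
For illustration, a small standalone sketch (not the extension's code) of the bookkeeping described above: split the LLM output on BREAK and treat the first chunk as the common prompt when there is one chunk more than the regions defined by the ratio:

    # Standalone illustration using the example above.
    llm_output = (
        "A photo of inside of a rustic cabin, ... BREAK "
        "Afternoon sun piercing through the rustic casement window ... BREAK "
        "A fluffy cat seated on a braided carpet below the window ... BREAK "
        "A cozy fireplace with glowing logs ... BREAK "
        "An antique portrait of a lady above the fireplace ..."
    )

    chunks = [c.strip() for c in llm_output.split("BREAK")]
    num_regions = 4  # 2x2 matrix from "1,1,1; 1,1,1"
    if len(chunks) == num_regions + 1:
        common, regions = chunks[0], chunks[1:]  # enable "Use common prompt"
    else:
        common, regions = "", chunks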


Any plans for a comfy node?

Thanks for the great work. Are you planning on creating a ComfyUI node for this? A lot of workflows (and art in general) would be significantly improved.

KoboldCPP

You only have Gemini and GPT as options. Can you add support for KoboldCpp instead, so we can use free options rather than paid ones with this? See the sketch below for the kind of local endpoint I mean.
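
For reference, KoboldCpp exposes an OpenAI-compatible endpoint, so a hedged sketch of reaching a local instance could look like the following (the URL, port, and model name are assumptions about a local setup; the extension does not currently implement this):

    # Sketch only: reaching a local KoboldCpp instance through its
    # OpenAI-compatible endpoint.
    from openai import OpenAI

    client = OpenAI(base_url="http://127.0.0.1:5001/v1", api_key="not-needed")
    response = client.chat.completions.create(
        model="koboldcpp",  # placeholder; the server serves whatever model it loaded
        messages=[{"role": "user", "content": "Split this scene into regions: ..."}],
    )
    print(response.choices[0].message.content)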

Gemini Pro API sometimes gives errors due to strict safety settings

By default, the Gemini Pro API has strict safety settings which block questionable prompts and return an error before image generation begins. This cannot be disabled on the user end and requires new code, specifically a "BLOCK_NONE" threshold for all four safety-settings categories. I don't know how to write the Python code myself.

Please refer to the official documentation regarding this issue.

https://ai.google.dev/tutorials/python_quickstart#safety_settings

Full document with code examples at the bottom: https://ai.google.dev/docs/safety_setting_gemini

Note that it states, "Adjusting to lower safety settings will trigger a more in-depth review process of your application." I'm not certain what this means.
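
For reference, a minimal sketch of the BLOCK_NONE configuration described in the linked documentation (the API key and prompt are placeholders):

    # Sketch of relaxing the Gemini safety thresholds with google-generativeai.
    import google.generativeai as genai

    genai.configure(api_key="<gemini-api-key>")
    model = genai.GenerativeModel("gemini-pro")

    safety_settings = [
        {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
        {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
    ]

    response = model.generate_content(
        "Plan the regions for: ...",
        safety_settings=safety_settings,
    )
    print(response.text)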

Gemini Pro API is a great alternative to local LLMs because it allows for 60 queries per minute, i.e. 1 query per second, for free to every user.
