varunshenoy / opendream

An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨

License: MIT License

Python 37.11% Shell 0.35% HTML 2.69% CSS 1.55% JavaScript 58.29%

ai automatic-1111 diffusion image-generation stable-diffusion

opendream's Introduction

Opendream: A Web UI For the Rest of Us 💭 🎨

Opendream brings much-needed and familiar features, such as layering, non-destructive editing, portability, and easy-to-write extensions, to your Stable Diffusion workflows. Check out our demo video.

Getting started

Prerequisites: Make sure you have Node installed. You can download it here.
Clone this repository.
Navigate to this project within your terminal and run sh ./run_opendream.sh. After ~30 seconds, both the frontend and backend of the Opendream system should be up and running.

Features

Diffusion models have emerged as powerful tools in the world of image generation and manipulation. While they offer significant benefits, these models are often considered black boxes due to their inherent complexity. The current diffusion image generation ecosystem is defined by tools that allow one-off image manipulation tasks to control these models - text2img, in-painting, pix2pix, among others.

For example, popular interfaces like Automatic1111, Midjourney, and Stability.AI's DreamStudio only support destructive editing: each edit "consumes" the previous image. This means users cannot easily build off of previous images or run multiple experiments on the same image, limiting their options for creative exploration.

Layering and Non-destructive Editing

Non-destructive editing is a method of image manipulation that preserves the original image data while allowing users to make adjustments and modifications without overwriting previous work. This approach facilitates experimentation and provides more control over the editing process by using layers and masks. In Opendream, deleting a layer also deletes every layer created after it. This guarantees that all layers currently on the canvas are a product of other existing layers, and it allows a workflow to be "replayed" deterministically.

Like Photoshop, Opendream supports non-destructive editing out of the box. Learn more about the principles of non-destructive editing in Photoshop here.
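The deletion rule and the "replay" idea described above can be illustrated with a small, self-contained sketch. This is not Opendream's actual implementation: the `Layer` and `Canvas` classes, the `OPS` table, and the string "results" are hypothetical stand-ins chosen only to show the behavior. Only the operation names dream and mask_and_inpaint come from this README, and their signatures here are assumed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Layer:
    """Hypothetical stand-in for a canvas layer (not Opendream's real class)."""
    name: str
    op: str
    params: dict
    result: str = ""


class Canvas:
    """Ordered stack of layers; each layer records how it was created."""

    def __init__(self, ops: Dict[str, Callable]):
        self.ops = ops
        self.layers: List[Layer] = []

    def add(self, name: str, op: str, **params) -> Layer:
        # Run the operation against the layers that already exist.
        layer = Layer(name, op, params)
        layer.result = self.ops[op](self.layers, **params)
        self.layers.append(layer)
        return layer

    def delete(self, name: str) -> None:
        # Deleting a layer also drops every layer created after it,
        # so nothing on the canvas depends on a missing layer.
        for index, layer in enumerate(self.layers):
            if layer.name == name:
                del self.layers[index:]
                return
        raise KeyError(name)

    def replay(self) -> "Canvas":
        # Rebuild the canvas from the recorded operations and parameters.
        fresh = Canvas(self.ops)
        for layer in self.layers:
            fresh.add(layer.name, layer.op, **layer.params)
        return fresh


# Toy operations that return strings instead of images (assumed signatures).
OPS = {
    "dream": lambda layers, prompt: f"image({prompt})",
    "mask_and_inpaint": lambda layers, prompt: f"inpaint({layers[-1].result}, {prompt})",
}

canvas = Canvas(OPS)
canvas.add("base", "dream", prompt="a castle")
canvas.add("moat", "mask_and_inpaint", prompt="add a moat")
canvas.add("dragon", "mask_and_inpaint", prompt="add a dragon")
replayed = canvas.replay()
canvas.delete("moat")  # also removes "dragon"
```

Because every layer stores the operation and parameters that produced it, replaying is just re-running those records in order. With real diffusion models, determinism would additionally depend on fixing random seeds, which this sketch does not model.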

Save and Share Workflows

Users can also save their current workflows into a portable file format that can be opened up at a later time or shared with collaborators. In this context, a "state" is just a JSON file describing all of the current layers and how they were created.
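As a rough illustration of what such a state file could contain, the sketch below serializes a list of layer records to JSON and reads it back. The field names (`version`, `layers`, `id`, `op`, `params`) and the helper functions are assumptions made for this example; the real Opendream file format may differ.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

WORKFLOW_VERSION = 1  # hypothetical version tag, not part of Opendream


def dump_workflow(layers: List[Dict[str, Any]]) -> str:
    """Serialize layer records (how each layer was created) to JSON text."""
    state = {"version": WORKFLOW_VERSION, "layers": layers}
    return json.dumps(state, indent=2, sort_keys=True)


def load_workflow(text: str) -> List[Dict[str, Any]]:
    """Parse JSON text back into the list of layer records."""
    state = json.loads(text)
    if state.get("version") != WORKFLOW_VERSION:
        raise ValueError("unsupported workflow version")
    return state["layers"]


def save_workflow_file(layers: List[Dict[str, Any]], path: str) -> None:
    """Write the workflow to disk so it can be shared or reopened later."""
    Path(path).write_text(dump_workflow(layers), encoding="utf-8")


def open_workflow_file(path: str) -> List[Dict[str, Any]]:
    """Read a workflow file written by save_workflow_file."""
    return load_workflow(Path(path).read_text(encoding="utf-8"))


# Example records; the structure is illustrative only.
example_layers = [
    {"id": "layer-1", "op": "dream", "params": {"prompt": "a castle"}},
    {"id": "layer-2", "op": "mask_and_inpaint", "params": {"prompt": "add a moat"}},
]
restored = load_workflow(dump_workflow(example_layers))
```

Storing the operation and its parameters, rather than only the pixels, is what would let a collaborator re-run or modify the workflow after opening the file.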

Support Simple to Write, Easy to Install Extensions

As the open-source ecosystem flourishes around these models and tools, extensibility has also become a major concern. While Automatic1111 does offer extensions, they are often difficult to program, use, and install. Automatic1111 is also far from being as full-featured as an application like Adobe Photoshop.

As new features for Stable Diffusion, like ControlNet, are released, users should be able to seamlessly integrate them into their artistic workflows with minimal overhead and time.

Opendream makes writing and using new diffusion features as simple as writing a Python function. Keep reading to learn how.

Extensions

From the get-go, Opendream supports two key primitive operations baked into the core system: dream and mask_and_inpaint. In this repository, extensions for instruct_pix2pix, controlnet_canny, controlnet_openpose, and sam (Segment Anything) are provided.

Any image manipulation logic can be easily written as an extension. With extensions, you can also decide how certain operations work. For example, you can override the dream operation to use OpenAI's DALL-E instead or call a serverless endpoint on a service like AWS or Replicate. Here's an example using Baseten.

Loading an Existing Extension

There are two ways to load extensions.

Install a pre-written one through the Web UI.
(Manual) Download a valid extension file (or write one yourself!) and add it to the opendream/extensions folder. Instructions for writing your own extension are below.

Here is a sampling of currently supported extensions. You can use the links to install any given extension through the Web UI.

Extension	Link
OpenAI's DALL-E	File
Serverless Stable Diffusion	File
Instruct Pix2Pix	File
ControlNet Canny	File
ControlNet Openpose	File
Segment Anything	File
PhotoshopGPT	Gist

Note that extensions may have their own requirements you would need to include in the requirements.txt file. For example, you would need to add openai if you want to use the DALL-E extension.

Feel free to make a PR if you create a useful extension!

Writing Your Own Extension

Users can write their own extensions as follows:

Create a new Python file in the opendream/extensions folder.
Write a method with type hints and a @opendream.define_op decorator. This decorator registers this method with the Opendream backend.

The method has a few requirements:

Parameters must have type hints. These enable the backend to generate a schema for the input, which is parsed into form components on the frontend. Valid types are str, int, float, Layer, MaskLayer, and ImageLayer.
The only valid return types are a Layer or a list of Layer objects.
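To make the requirements above concrete, here is a self-contained sketch of how a decorator can register a function and derive an input schema from its type hints. It does not import Opendream: `define_op`, `REGISTRY`, the `Layer` classes, and the example `tint` operation below are simplified stand-ins written for this illustration, and the real `@opendream.define_op` decorator may work differently.

```python
import inspect
from typing import Callable, Dict, get_type_hints


class Layer:
    """Stand-in for a layer type; holds only a text label in this sketch."""

    def __init__(self, label: str = ""):
        self.label = label


class ImageLayer(Layer):
    pass


class MaskLayer(Layer):
    pass


# Parameter types listed as valid in the README.
VALID_PARAM_TYPES = (str, int, float, Layer, MaskLayer, ImageLayer)

# Hypothetical registry mapping operation names to functions and schemas.
REGISTRY: Dict[str, dict] = {}


def define_op(func: Callable) -> Callable:
    """Register func and build a schema from its parameter type hints."""
    hints = get_type_hints(func)
    schema = {}
    for name in inspect.signature(func).parameters:
        if name not in hints:
            raise TypeError(f"parameter '{name}' needs a type hint")
        if hints[name] not in VALID_PARAM_TYPES:
            raise TypeError(f"parameter '{name}' has an unsupported type")
        # A frontend could map these type names to form components.
        schema[name] = hints[name].__name__
    REGISTRY[func.__name__] = {"func": func, "schema": schema}
    return func


@define_op
def tint(image: ImageLayer, color: str, strength: float = 0.5) -> Layer:
    # Toy operation: returns a new layer describing the edit.
    return Layer(f"tint({image.label}, {color}, {strength})")


tint_schema = REGISTRY["tint"]["schema"]
tint_result = REGISTRY["tint"]["func"](ImageLayer("photo"), "red")
```

In this sketch the schema for `tint` maps each parameter name to a type name, which is the kind of information a frontend would need to render a text box, a number input, or a layer picker. Return-type validation is omitted here for brevity.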

Contributions and Licensing

Opendream was built by Varun Shenoy, Eric Lou, Shashank Rammoorthy, and Rahul Shiv as a part of Stanford's CS 348K.

Feel free to provide any contributions you deem necessary or useful. This project is licensed under the MIT License.

opendream's Issues

Dimensions shouldn't be fixed

There are many places in the code where we implicitly or explicitly support only 512 x 512 images. If we want to add tools like "generative fill", we need to drop this dependency.

Mikubill/sd-webui-controlnet#1464

Add object segmentation network

Be able to examine params of layers and edit them

Should be able to see prompt etc and experiment with slight changes.

Options should be to "create new layer with changes" or "update layer"

Export

should allow for options to download in various formats/resolutions or maybe simple stub/default for now

Create an extension for LoRAs

Create a simple Python extension file that can handle Stable Diffusion with LoRAs. It might make sense to create a separate command (not dream).

Add button to download a specific image in UI

Multiple layers output

operators should return multiple layers by default (not just one)

have UI update when you set layers to be visible or not

masking in frontend

how do we create a mask? follow user mouse? etc

Import workflow JSON should have filepicker pop up

Docs

add a more comprehensive README (outlining design decisions)

Run jobs in parallel

Once a task is given, the modal should disappear and push the job onto some queue on the backend. This way you can launch a bunch of diffuse tasks at once.

Validate form before submitting

Should not be sending empty strings

Edit layers and check for dependencies

In case you create a mask layer that you don't like, for instance

Create one layer per output image

Use case: network produces several images. Code currently places all images in one layer. We don't want this.

How to add checkpoint models?

How can I load local models and extensions easily, as in the Automatic1111 web UI? The workflow is great, but after starting the app there is very little I can use. Is there a tutorial? Thank you!

Missing requirements

The README should state that this requires npm.

Might be missing other requirements? I'm still trying to get this running and hitting issues where it seems like no extensions are loaded.

Move layers on canvas, add z-indexing

SAM allow for variable number of masks

and SAM allow for semantic masks

How to get it working in ec2?

I am a beginner in the AI world and am trying to get this UI running on an EC2 instance, but I am having difficulty getting it to work. I managed to launch the UI using sh run_opendream.sh, and it loads on port 3000. However, nothing works beyond loading the UI. The error shown in the browser's inspector seems to point to something on port 8000. How do I get this working on EC2?

Support some upscalers

allow for deleting layers on UI

Workflow demo

Hello. I just came across this project and it seems interesting.

Perhaps a simple "hello world" style workflow demo would be helpful: for example, a screencast of running the app, installing a new model and a new extension, and updating them, if possible. It would also help to show what the app is capable of. Reading around the repo, I'm not sure how the app works or how to install models, if that is possible at all.

I realize most projects have limited manpower to do multiple things, but demos and plenty of docs are crucial to encourage people to try things out.

Anyway,
Best of luck

"'str' object has no attribute 'get_image'"

Attempted to add an instruct_pix2pix layer using the default install.

Nothing in the logfile other than the above error.

INFO:     127.0.0.1:61305 - "GET /schema/instruct_pix2pix HTTP/1.1" 200 OK
INFO:     127.0.0.1:61305 - "POST /operation/instruct_pix2pix/ HTTP/1.1" 307 Temporary Redirect
'str' object has no attribute 'get_image'
INFO:     127.0.0.1:61305 - "POST /operation/instruct_pix2pix HTTP/1.1" 500 Internal Server Error

Mac M2, latest main commit.

workflow.json: https://gist.github.com/mmastrac/15d1f94100a8b312fb96d0dae6ca063e

What more logs or other details can I attach to help debug it?

I'd love a colab notebook for this as well.

Thanks!

SAM error with bad dims

generating mask for (758, 876, 4)
set_torch_image input must be BCHW with long side 1024.

with body.png. test.png works fine.