timothybrooks / instruct-pix2pix Goto Github PK

License: Other

Python 99.29% Shell 0.71%

instruct-pix2pix's Introduction

InstructPix2Pix: Learning to Follow Image Editing Instructions

Project Page | Paper | Data

PyTorch implementation of InstructPix2Pix, an instruction-based image editing model, based on the original CompVis/stable_diffusion repo.

InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks*, Aleksander Holynski*, Alexei A. Efros
UC Berkeley
*denotes equal contribution

TL;DR: quickstart

Follow the instructions below to download and run InstructPix2Pix on your own images. These instructions have been tested on a GPU with >18GB VRAM. If you don't have a GPU, you may need to change the default configuration, or check out other ways of using the model.

Set up a conda environment, and download a pretrained model:

conda env create -f environment.yaml
conda activate ip2p
bash scripts/download_checkpoints.sh

Edit a single image:

python edit_cli.py --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg"

# Optionally, you can specify parameters to tune your result:
# python edit_cli.py --steps 100 --resolution 512 --seed 1371 --cfg-text 7.5 --cfg-image 1.2 --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg"

Or launch your own interactive editing Gradio app:

python edit_app.py

(For advice on how to get the best results by tuning parameters, see the Tips section).

Setup

Install all dependencies with:

conda env create -f environment.yaml

Download the pretrained models by running:

bash scripts/download_checkpoints.sh

Generated Dataset

Our image editing model is trained on a generated dataset consisting of 454,445 examples. Each example contains (1) an input image, (2) an editing instruction, and (3) an output edited image. We provide two versions of the dataset, one in which each pair of edited images is generated 100 times, and the best examples are chosen based on CLIP metrics (Section 3.1.2 in the paper) (clip-filtered-dataset), and one in which examples are randomly chosen (random-sample-dataset).

For the released version of this dataset, we've additionally filtered prompts and images for NSFW content. After NSFW filtering, the GPT-3 generated dataset contains 451,990 examples. The final image-pair datasets contain:

	# of image editing examples	Dataset size
`random-sample-dataset`	451990	727GB
`clip-filtered-dataset`	313010	436GB

To download one of these datasets, along with the entire NSFW-filtered text data, run the following command with the appropriate dataset name:

bash scripts/download_data.sh clip-filtered-dataset

Training InstructPix2Pix

InstructPix2Pix is trained by fine-tuning from an initial StableDiffusion checkpoint. The first step is to download a Stable Diffusion checkpoint. For our trained models, we used the v1.5 checkpoint as the starting point. To download the same ones we used, you can run the following script:

bash scripts/download_pretrained_sd.sh

If you'd like to use a different checkpoint, point to it in the config file configs/train.yaml, on line 8, after ckpt_path:.

Next, we need to change the config to point to our downloaded (or generated) dataset. If you're using the clip-filtered-dataset from above, you can skip this. Otherwise, you may need to edit lines 85 and 94 of the config (data.params.train.params.path, data.params.validation.params.path).

Finally, start a training job with the following command:

python main.py --name default --base configs/train.yaml --train --gpus 0,1,2,3,4,5,6,7

Creating your own dataset

Our generated dataset of paired images and editing instructions is made in two phases: First, we use GPT-3 to generate text triplets: (a) a caption describing an image, (b) an edit instruction, (c) a caption describing the image after the edit. Then, we turn pairs of captions (before/after the edit) into pairs of images using Stable Diffusion and Prompt-to-Prompt.

(1) Generate a dataset of captions and instructions

We provide our generated dataset of captions and edit instructions here. If you plan to use our captions+instructions, skip to step (2). Otherwise, if you would like to create your own text dataset, please follow steps (1.1-1.3) below. Note that generating very large datasets using GPT-3 can be expensive.

(1.1) Manually write a dataset of instructions and captions

The first step of the process is fine-tuning GPT-3. To do this, we made a dataset of 700 examples broadly covering of edits that we might want our model to be able to perform. Our examples are available here. These should be diverse and cover a wide range of possible captions and types of edits. Ideally, they should avoid duplication or significant overlap of captions and instructions. It is also important to be mindful of limitations of Stable Diffusion and Prompt-to-Prompt in writing these examples, such as inability to perform large spatial transformations (e.g., moving the camera, zooming in, swapping object locations).

Input prompts should closely match the distribution of input prompts used to generate the larger dataset. We sampled the 700 input prompts from the LAION Improved Aesthetics 6.5+ dataset and also use this dataset for generating examples. We found this dataset is quite noisy (many of the captions are overly long and contain irrelevant text). For this reason, we also considered MSCOCO and LAION-COCO datasets, but ultimately chose LAION Improved Aesthetics 6.5+ due to its diversity of content, proper nouns, and artistic mediums. If you choose to use another dataset or combination of datasets as input to GPT-3 when generating examples, we recommend you sample the input prompts from the same distribution when manually writing training examples.

(1.2) Finetune GPT-3

The next step is to finetune a large language model on the manually written instructions/outputs to generate edit instructions and edited caption from a new input caption. For this, we finetune GPT-3's Davinci model via the OpenAI API, although other language models could be used.

To prepare training data for GPT-3, one must first create an OpenAI developer account to access the needed APIs, and set up the API keys on your local device. Also, run the prompts/prepare_for_gpt.py script, which forms the prompts into the correct format by concatenating instructions and captions and adding delimiters and stop sequences.

python dataset_creation/prepare_for_gpt.py --input-path data/human-written-prompts.jsonl --output-path data/human-written-prompts-for-gpt.jsonl

Next, finetune GPT-3 via the OpenAI CLI. We provide an example below, although please refer to OpenAI's official documentation for this, as best practices may change. We trained the Davinci model for a single epoch. You can experiment with smaller less expensive GPT-3 variants or with open source language models, although this may negatively affect performance.

openai api fine_tunes.create -t data/human-written-prompts-for-gpt.jsonl -m davinci --n_epochs 1 --suffix "instruct-pix2pix"

You can test out the finetuned GPT-3 model by launching the provided Gradio app:

python prompt_app.py --openai-api-key OPENAI_KEY --openai-model OPENAI_MODEL_NAME

(1.3) Generate a large dataset of captions and instructions

We now use the finetuned GPT-3 model to generate a large dataset. Our dataset cost thousands of dollars to create. See prompts/gen_instructions_and_captions.py for the script which generates these examples. We recommend first generating a small number of examples (by setting a low value of --num-samples) and gradually increasing the scale to ensure the results are working as desired before increasing scale.

python dataset_creation/generate_txt_dataset.py --openai-api-key OPENAI_KEY --openai-model OPENAI_MODEL_NAME

If you are generating at a very large scale (e.g., 100K+), it will be noteably faster to generate the dataset with multiple processes running in parallel. This can be accomplished by setting --partitions=N to a higher number and running multiple processes, setting each --partition to the corresponding value.

python dataset_creation/generate_txt_dataset.py --openai-api-key OPENAI_KEY --openai-model OPENAI_MODEL_NAME --partitions=10 --partition=0

(2) Turn paired captions into paired images

The next step is to turn pairs of text captions into pairs of images. For this, we need to copy some pre-trained Stable Diffusion checkpoints to stable_diffusion/models/ldm/stable-diffusion-v1/. You may have already done this if you followed the instructions above for training with our provided data, but if not, you can do this by running:

bash scripts/download_pretrained_sd.sh

For our model, we used checkpoint v1.5, and the new autoencoder, but other models may work as well. If you choose to use other models, make sure to change point to the corresponding checkpoints by passing in the --ckpt and --vae-ckpt arguments. Once all checkpoints have been downloaded, we can generate the dataset with the following command:

python dataset_creation/generate_img_dataset.py --out_dir data/instruct-pix2pix-dataset-000 --prompts_file path/to/generated_prompts.jsonl

This command operates on a single GPU (typically a V100 or A100). To parallelize over many GPUs/machines, set --n-partitions to the total number of parallel jobs and --partition to the index of each job.

python dataset_creation/generate_img_dataset.py --out_dir data/instruct-pix2pix-dataset-000 --prompts_file path/to/generated_prompts.jsonl --n-partitions 100 --partition 0

The default parameters match that of our dataset, although in practice you can use a smaller number of steps (e.g., --steps=25) to generate high quality data faster. By default, we generate 100 samples per prompt and use CLIP filtering to keep a max of 4 per prompt. You can experiment with fewer samples by setting --n-samples. The command below turns off CLIP filtering entirely and is therefore faster:

python dataset_creation/generate_img_dataset.py --out_dir data/instruct-pix2pix-dataset-000 --prompts_file path/to/generated_prompts.jsonl --n-samples 4 --clip-threshold 0 --clip-dir-threshold 0 --clip-img-threshold 0 --n-partitions 100 --partition 0

After generating all of the dataset examples, run the following command below to create a list of the examples. This is needed for the dataset onject to efficiently be able to sample examples without needing to iterate over the entire dataset directory at the start of each training run.

python dataset_creation/prepare_dataset.py data/instruct-pix2pix-dataset-000

Evaluation

To generate plots like the ones in Figures 8 and 10 in the paper, run the following command:

python metrics/compute_metrics.py --ckpt /path/to/your/model.ckpt

Tips

If you're not getting the quality result you want, there may be a few reasons:

Is the image not changing enough? Your Image CFG weight may be too high. This value dictates how similar the output should be to the input. It's possible your edit requires larger changes from the original image, and your Image CFG weight isn't allowing that. Alternatively, your Text CFG weight may be too low. This value dictates how much to listen to the text instruction. The default Image CFG of 1.5 and Text CFG of 7.5 are a good starting point, but aren't necessarily optimal for each edit. Try:
- Decreasing the Image CFG weight, or
- Increasing the Text CFG weight, or
Conversely, is the image changing too much, such that the details in the original image aren't preserved? Try:
- Increasing the Image CFG weight, or
- Decreasing the Text CFG weight
Try generating results with different random seeds by setting "Randomize Seed" and running generation multiple times. You can also try setting "Randomize CFG" to sample new Text CFG and Image CFG values each time.
Rephrasing the instruction sometimes improves results (e.g., "turn him into a dog" vs. "make him a dog" vs. "as a dog").
Increasing the number of steps sometimes improves results.
Do faces look weird? The Stable Diffusion autoencoder has a hard time with faces that are small in the image. Try cropping the image so the face takes up a larger portion of the frame.

Comments

Our codebase is based on the Stable Diffusion codebase.

BibTeX

@article{brooks2022instructpix2pix,
  title={InstructPix2Pix: Learning to Follow Image Editing Instructions},
  author={Brooks, Tim and Holynski, Aleksander and Efros, Alexei A},
  journal={arXiv preprint arXiv:2211.09800},
  year={2022}
}

Other ways of using InstructPix2Pix

InstructPix2Pix on HuggingFace:

A browser-based version of the demo is available as a HuggingFace space. For this version, you only need a browser, a picture you want to edit, and an instruction! Note that this is a shared online demo, and processing time may be slower during peak utilization.

InstructPix2Pix on Replicate:

Replicate provides a production-ready cloud API for running the InstructPix2Pix model. You can run the model from any environment using a simple API call with cURL, Python, JavaScript, or your language of choice. Replicate also provides a web interface for running the model and sharing predictions.

InstructPix2Pix in Imaginairy:

Imaginairy offers another way of easily installing InstructPix2Pix with a single command. It can run on devices without GPUs (like a Macbook!).
pip install imaginairy --upgrade
aimg edit any-image.jpg --gif "turn him into a cyborg" 
It also offers an easy way to perform a bunch of edits on an image, and can save edits out to an animated GIF:
aimg edit --gif --surprise-me pearl-earring.jpg 

InstructPix2Pix in 🧨 Diffusers:

InstructPix2Pix in Diffusers is a bit more optimized, so it may be faster and more suitable for GPUs with less memory. Below are instructions for installing the library and editing an image:

Install diffusers and relevant dependencies:
pip install transformers accelerate torch

pip install git+https://github.com/huggingface/diffusers.git
Load the model and edit the image:
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, safety_checker=None)
pipe.to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
# `image` is an RGB PIL.Image
images = pipe("turn him into cyborg", image=image).images
images[0]
For more information, check the docs here.

instruct-pix2pix's People

Contributors

Stargazers

Watchers

Forkers

jackzhousz zongking123 aasim-syed jimgoo cian0 jxzhangjhu theprintsco codeaudit integritynoble techthiyanes breengles cokeroluwafemi c00renut mertcookimg bapek nialljmiller pz325 johndpope eltociear sohowj milyiyo arielreplicate saulocatharino hirajanwin nicejhonee geko1100 eternaldusk mistobaan aivscovid19 ailabteam sonu6569 abi patrickvonplaten eldadcaspi abhinav1217 surander96 mehrnooshzandi jorik041 dbuos reyreyttt78 kenitious it21712 sirbenet chz scott-coalesce suryatmodulus slopesweb barseghyanartur akankushjnvku meatfucker florianjuengermann cclauss hadryan harsha-hue mrcodechef kaylam73 sanju7699 soaringgecko ryan4daniel4 netzo92 geisajcs brycedrennan vanclesig edenilsonn marcus-arcadius borishanzju pingpong-ai-models jinoprinx tomer-rgo afkgit 04sumit04 joskid ahagai w0lramd agentmishra sandeshpatil1986 entn-at wiktoria30 tigeryy2 cjgammon hichemmaiza iuriimattos2 aladygit proven-design marcianobarros20 starcorpceo khanajmal007 hartl3y94 obwando 1jsingh techvbidut mahlac zby13857238876 rampall lidiakalugina marsh4ll99 rafayghafoor kokojo56 julianhasseusds tilakranjan

instruct-pix2pix's Issues

instruct-pix2pix - Error loading script

I installed the extension, but after restarting auto 1111 I get this error:

Error loading script: instruct-pix2pix.py
Traceback (most recent call last):
File "E:\MyProject\A.I\StableDiffusion\SD 2.0 install\stable-diffusion-webui\modules\scripts.py", line 205, in load_scripts
module = script_loading.load_module(scriptfile.path)
File "E:\MyProject\A.I\StableDiffusion\SD 2.0 install\stable-diffusion-webui\modules\script_loading.py", line 13, in load_module
exec(compiled, module.dict)
File "E:\MyProject\A.I\StableDiffusion\SD 2.0 install\stable-diffusion-webui\extensions\stable-diffusion-webui-instruct-pix2pix\scripts\instruct-pix2pix.py", line 24, in
from modules.ui_common import create_output_panel
ModuleNotFoundError: No module named 'modules.ui_common'

Сan someone tell me what to do? Maybe reinstall all over?

I'm considering converting your project into an extension for a1111 webui

I was mentioned in a discussion about this project being an extension for automatic1111/stable-diffusion-webui.

In a quick look around your code, I deteremined that it could take about a day to implement it. Really, it could take as little as fifteen minutes, but I'm not very familiar with your code, so I put an estimate for a long days worth of time to hash out little details.

But, I figured that you might be interested in doing this yourself.

So the first thing is, I noticed the attached license already gives permission for reuse as long as I release it under the same conditions.
I have no money to gain from this, nor have any plans to. If it's something I decide to take on, it'll be for the experience. So if you are not interested in doing it, may I have your blessing?

The second thing, here's some information about how extensions work, and the approach to turn it into an extension if you choose to do so.
During loading of the webui, it looks in it's directory labeled extensions, each subdirectory is considered an extension.
If a user installed the extension using a github url, or clicked on a provided name (from known extensions, from a file in the project that has urls), it installs them with the folder name as the same name as the github repository.

The first thing it checks for is a file called install.py in the project. The intention of this file is to check if dependencies are installed, and install them if they are not. This doesn't work as clean as you'd expect, because it only runs on a reboot of the app.

In my experience, users will check for an update using the extensions tab in the ui, hit the apply and restart button. This "Apply and Restart" button does a soft restart, which does not run the install.py file, but it does reload the other python scripts and javascript files.

My solution was to check and handle this in the other python files.

In your extensions project directory, it will assume that python scripts it should load are in the scripts folder, and javascript in the javascript folder.
I'm mentioning javascript here first, because it has less to mention, although I don't see that you use it, but for completeness I'll mention it.

It reads the javascipt folder, scrapes the names, and creates a html <script src=yourfilename.js> tag in the head. Since each of these files will be loaded in alphabetical order, you don't need to import them from one to another, just know they are loaded in the DOM.

For the python files, it reads the scripts folder files by name, reads it as a file and appends it into an object and runs exec. This is to create a namespace. So each file is loaded independently. It will then read each namespace for an object that has type of Scripts, which a script can inherit from modules/scripts.Scripts. Those type will be added to the scripts dropdown on either txt2img or img2img, or both, depending on your return for the show method.
For the other files, they will be in the projects namespace. But, you have options with them, such as putting it in settings, interfering with an image generation with prepocess, or postprocess, or even having it as a tab.
These can be done by using a callback defined in the projects modules/scripts_callbacks.py file.

The callback you'd be interested in is the on_ui_tabs.
https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/script_callbacks.py#L236

This callback allows you to give it a function that defines the ui. Here's an example from one that I did.

tab = MyTab(basedir)        
script_callbacks.on_ui_tabs(tab.ui)

From this file, notice that I instantiated the object first since I wasn't using a function. You don't need to separate the components and rows declaration like I did, but notice that my ui method starts with a gr.blocks.
https://github.com/Gerschel/sd-web-ui-quickcss/blob/master/scripts/quickcss.py#L128

You might have to clean-up extra files, rename some directories. sys.append some namespaces. But that's primarily it.
You can do probably do this in about 15 to 20 minutes.

If you want to know more, I've spent way too much time learning their codebase, and I can answer some questions.

How to keep image size?

Is there any command to keep the same image size?

Negative prompts

Hi, is it possible to use negative prompts?

Have you created a space on HuggingFace? if not can I create one?

First of all congrats on your work!

I've been looking if you created a space on HiggingFace but nothing was found.

Would you like to create one?

If not, can I create one?

Of course giving you full credit

Appreciate

As Automatic1111 extension?

I really like this solution and have been looking for something similar for a long time!
Could you make this as an extension for Automatic1111? The gradio is a given there too.
I would really like it!

Slight changes required to run on windows.

I managed to get this running on windows.

The two primary issues were the checkpoint script is for bash, which can be solved by either installing bash or downloading the link manually. Then you create a checkpoints directory and put the checkpoint in there.

The second issue is the specified version of transformers does not work on Windows, failing out with a CLIP issue. To solve that edit the requirements.txt and environment.yaml to change transformers from 4.19.2 to 4.25.1.

After that it should fire right up and you can use the webui fine.

Trying to run on a Mac with M1 Pro

Is there a way to run this on Mac Silicon (M1 Pro 32GB?). I tried the first conda command in your instructions, and it naturally returned with a cuda error. I will try now on Colab Pro, which uses nVidia cards and cuda.

Created a colab notebook, but there is no requirements file. Inserted !pip3 install einops k_diffusion omegaconf based on errors. But now, I get (when using inference edit_cli.py). Any suggestions? Maybe I need to install another package?

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt
Traceback (most recent call last):
File "edit_cli.py", line 128, in
main()
File "edit_cli.py", line 79, in main
model = load_model_from_config(config, args.ckpt, args.vae_ckpt)
File "edit_cli.py", line 41, in load_model_from_config
pl_sd = torch.load(ckpt, map_location="cpu")
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 777, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 282, in init
super(_open_zipfile_reader, self).init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

It works

12GB VRAM not enough? RTX 3060 - Linux Mint 21.1

When I try to generate an image from a 512x512 image (tried both jpg and png), I get the following error output. The error appears to be roughly the same whether I use the gradio webui or just straight from the command line. Any idea what might be causing this?

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/wh33t/instruct-pix2pix/edit_cli.py:128 in <module>                                         │
│                                                                                                  │
│   125                                                                                            │
│   126                                                                                            │
│   127 if __name__ == "__main__":                                                                 │
│ ❱ 128 │   main()                                                                                 │
│   129                                                                                            │
│                                                                                                  │
│ /home/wh33t/instruct-pix2pix/edit_cli.py:98 in main                                              │
│                                                                                                  │
│    95 │   │   input_image.save(args.output)                                                      │
│    96 │   │   return                                                                             │
│    97 │                                                                                          │
│ ❱  98 │   with torch.no_grad(), autocast("cuda"), model.ema_scope():                             │
│    99 │   │   cond = {}                                                                          │
│   100 │   │   cond["c_crossattn"] = [model.get_learned_conditioning([args.edit])]                │
│   101 │   │   input_image = 2 * torch.tensor(np.array(input_image)).float() / 255 - 1            │
│                                                                                                  │
│ /home/wh33t/anaconda3/envs/ip2p/lib/python3.8/contextlib.py:113 in __enter__                     │
│                                                                                                  │
│   110 │   │   # they are only needed for recreation, which is not possible anymore               │
│   111 │   │   del self.args, self.kwds, self.func                                                │
│   112 │   │   try:                                                                               │
│ ❱ 113 │   │   │   return next(self.gen)                                                          │
│   114 │   │   except StopIteration:                                                              │
│   115 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   116                                                                                            │
│                                                                                                  │
│ /home/wh33t/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py:185 in         │
│ ema_scope                                                                                        │
│                                                                                                  │
│    182 │   @contextmanager                                                                       │
│    183 │   def ema_scope(self, context=None):                                                    │
│    184 │   │   if self.use_ema:                                                                  │
│ ❱  185 │   │   │   self.model_ema.store(self.model.parameters())                                 │
│    186 │   │   │   self.model_ema.copy_to(self.model)                                            │
│    187 │   │   │   if context is not None:                                                       │
│    188 │   │   │   │   print(f"{context}: Switched to EMA weights")                              │
│                                                                                                  │
│ /home/wh33t/instruct-pix2pix/./stable_diffusion/ldm/modules/ema.py:62 in store                   │
│                                                                                                  │
│   59 │   │     parameters: Iterable of `torch.nn.Parameter`; the parameters to be                │
│   60 │   │   │   temporarily stored.                                                             │
│   61 │   │   """                                                                                 │
│ ❱ 62 │   │   self.collected_params = [param.clone() for param in parameters]                     │
│   63 │                                                                                           │
│   64 │   def restore(self, parameters):                                                          │
│   65 │   │   """                                                                                 │
│                                                                                                  │
│ /home/wh33t/instruct-pix2pix/./stable_diffusion/ldm/modules/ema.py:62 in <listcomp>              │
│                                                                                                  │
│   59 │   │     parameters: Iterable of `torch.nn.Parameter`; the parameters to be                │
│   60 │   │   │   temporarily stored.                                                             │
│   61 │   │   """                                                                                 │
│ ❱ 62 │   │   self.collected_params = [param.clone() for param in parameters]                     │
│   63 │                                                                                           │
│   64 │   def restore(self, parameters):                                                          │
│   65 │   │   """                                                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA out of memory. Tried to allocate 50.00 MiB (GPU 0; 11.75 GiB total capacity; 9.65 GiB already allocated; 30.00 MiB free; 9.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated 
memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

No scale factor applied when concat image related information

Hi, this is a minor nit and asking to see if there are explicit motivations to do so. In the process of supporting this model in Draw Things, I noticed that unlike Inpainting models, where the encoded image multiplied the "scale factor": https://github.com/runwayml/stable-diffusion/blob/main/ldm/models/diffusion/ddpm.py#L550 while in instruct pix2pix, we don't: https://github.com/timothybrooks/instruct-pix2pix/blob/main/edit_cli.py#L103

Not a big deal to me as I figured out, modified a bit and it worked exactly like expected. But want to call it out and see if there are some considerations applied so I know where my modifications should be applied (whether treat the edit model as a special case, or just modify the first layer conv2d weights).

Automatic1111 integration

Is it possible to use this with the automatic1111 webui?

I tried downloading the ckpt and used the config file, but it seems it is not enough.

No module named ddpm_edit

I'm getting the following error when running the cli tool:

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt
Global Step: 22000
Traceback (most recent call last):
File "edit_cli.py", line 128, in
main()
File "edit_cli.py", line 79, in main
model = load_model_from_config(config, args.ckpt, args.vae_ckpt)
File "edit_cli.py", line 52, in load_model_from_config
model = instantiate_from_config(config.model)
File "/ingest/ImageDiffuserService/proto/instruct2pix2pix/instruct-pix2pix/stable_diffusion/ldm/util.py", line 85, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/ingest/ImageDiffuserService/proto/instruct2pix2pix/instruct-pix2pix/stable_diffusion/ldm/util.py", line 93, in get_obj_from_str
return getattr(importlib.import_module(module, package=None), cls)
File "/home/elliot/anaconda3/envs/pytorch-env/lib/python3.8/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in _find_and_load
File "", line 973, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'ldm.models.diffusion.ddpm_edit'

I've googled "ldm.models.diffusion.ddpm_edit" and don't see any references to this module existing. any idea as to what i'm doing wrong?

huggingface instruct-pix2pix fails

In hugging face.
timbrooks/instruct-pix2pix

Trying to turn David into a cyborg with the same
settings as your readme does not work.
It returns a multi color blur.

Fix CFG ON

Text CFG Image CFG
7.5 1.2

expected scalar type Half but found Float

Running wsl ubuntu 18, RTX 2080

Initially i had #19, switched to fp16 now i'm getting (excuse the paste):

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/:)/instruct-pix2pix/edit_app.py:270 in <module>                                       │
│                                                                                                  │
│   267                                                                                            │
│   268                                                                                            │
│   269 if __name__ == "__main__":                                                                 │
│ ❱ 270 │   main()                                                                                 │
│   271                                                                                            │
│                                                                                                  │
│ /home/:)/instruct-pix2pix/edit_app.py:115 in main                                           │
│                                                                                                  │
│   112 │   model.eval().cuda()                                                                    │
│   113 │   model_wrap = K.external.CompVisDenoiser(model)                                         │
│   114 │   model_wrap_cfg = CFGDenoiser(model_wrap)                                               │
│ ❱ 115 │   null_token = model.get_learned_conditioning([""])                                      │
│   116 │   example_image = Image.open("imgs/example.jpg").convert("RGB")                          │
│   117 │                                                                                          │
│   118 │   def load_example(                                                                      │
│                                                                                                  │
│ /home/:)/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py:588 in       │
│ get_learned_conditioning                                                                         │
│                                                                                                  │
│    585 │   def get_learned_conditioning(self, c):                                                │
│    586 │   │   if self.cond_stage_forward is None:                                               │
│    587 │   │   │   if hasattr(self.cond_stage_model, 'encode') and callable(self.cond_stage_mod  │
│ ❱  588 │   │   │   │   c = self.cond_stage_model.encode(c)                                       │
│    589 │   │   │   │   if isinstance(c, DiagonalGaussianDistribution):                           │
│    590 │   │   │   │   │   c = c.mode()                                                          │
│    591 │   │   │   else:                                                                         │
│                                                                                                  │
│ /home/:)/instruct-pix2pix/./stable_diffusion/ldm/modules/encoders/modules.py:162 in encode  │
│                                                                                                  │
│   159 │   │   return z                                                                           │
│   160 │                                                                                          │
│   161 │   def encode(self, text):                                                                │
│ ❱ 162 │   │   return self(text)                                                                  │
│   163                                                                                            │
│   164                                                                                            │
│   165 class FrozenCLIPTextEmbedder(nn.Module):                                                   │
│                                                                                                  │
│ /home/rei/micromamba/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py:1110 in    │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1107 │   │   # this function, and just call forward.                                           │
│   1108 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1109 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1110 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1111 │   │   # Do not call functions when jit is used                                          │
│   1112 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1113 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/:)/instruct-pix2pix/./stable_diffusion/ldm/modules/encoders/modules.py:156 in forward │
│                                                                                                  │
│   153 │   │   batch_encoding = self.tokenizer(text, truncation=True, max_length=self.max_lengt   │
│   154 │   │   │   │   │   │   │   │   │   │   return_overflowing_tokens=False, padding="max_le   │
│   155 │   │   tokens = batch_encoding["input_ids"].to(self.device)                               │
│ ❱ 156 │   │   outputs = self.transformer(input_ids=tokens)                                       │
│   157 │   │                                                                                      │
│   158 │   │   z = outputs.last_hidden_state                                                      │
│   159 │   │   return z                                                                           │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py:1110 in    │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1107 │   │   # this function, and just call forward.                                           │
│   1108 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1109 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1110 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1111 │   │   # Do not call functions when jit is used                                          │
│   1112 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1113 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_cli │
│ p.py:722 in forward                                                                              │
│                                                                                                  │
│    719 │   │   >>> last_hidden_state = outputs.last_hidden_state                                 │
│    720 │   │   >>> pooled_output = outputs.pooler_output  # pooled (EOS token) states            │
│    721 │   │   ```"""                                                                            │
│ ❱  722 │   │   return self.text_model(                                                           │
│    723 │   │   │   input_ids=input_ids,                                                          │
│    724 │   │   │   attention_mask=attention_mask,                                                │
│    725 │   │   │   position_ids=position_ids,                                                    │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py:1110 in    │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1107 │   │   # this function, and just call forward.                                           │
│   1108 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1109 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1110 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1111 │   │   # Do not call functions when jit is used                                          │
│   1112 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1113 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_cli │
│ p.py:643 in forward                                                                              │
│                                                                                                  │
│    640 │   │   │   # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]                        │
│    641 │   │   │   attention_mask = _expand_mask(attention_mask, hidden_states.dtype)            │
│    642 │   │                                                                                     │
│ ❱  643 │   │   encoder_outputs = self.encoder(                                                   │
│    644 │   │   │   inputs_embeds=hidden_states,                                                  │
│    645 │   │   │   attention_mask=attention_mask,                                                │
│    646 │   │   │   causal_attention_mask=causal_attention_mask,                                  │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py:1110 in    │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1107 │   │   # this function, and just call forward.                                           │
│   1108 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1109 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1110 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1111 │   │   # Do not call functions when jit is used                                          │
│   1112 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1113 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_cli │
│ p.py:574 in forward                                                                              │
│                                                                                                  │
│    571 │   │   │   │   │   causal_attention_mask,                                                │
│    572 │   │   │   │   )                                                                         │
│    573 │   │   │   else:                                                                         │
│ ❱  574 │   │   │   │   layer_outputs = encoder_layer(                                            │
│    575 │   │   │   │   │   hidden_states,                                                        │
│    576 │   │   │   │   │   attention_mask,                                                       │
│    577 │   │   │   │   │   causal_attention_mask,                                                │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py:1110 in    │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1107 │   │   # this function, and just call forward.                                           │
│   1108 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1109 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1110 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1111 │   │   # Do not call functions when jit is used                                          │
│   1112 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1113 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_cli │
│ p.py:317 in forward                                                                              │
│                                                                                                  │
│    314 │   │   residual = hidden_states                                                          │
│    315 │   │                                                                                     │
│    316 │   │   hidden_states = self.layer_norm1(hidden_states)                                   │
│ ❱  317 │   │   hidden_states, attn_weights = self.self_attn(                                     │
│    318 │   │   │   hidden_states=hidden_states,                                                  │
│    319 │   │   │   attention_mask=attention_mask,                                                │
│    320 │   │   │   causal_attention_mask=causal_attention_mask,                                  │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/torch/nn/modules/module.py:1110 in    │
│ _call_impl                                                                                       │
│                                                                                                  │
│   1107 │   │   # this function, and just call forward.                                           │
│   1108 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1109 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1110 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1111 │   │   # Do not call functions when jit is used                                          │
│   1112 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1113 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /home/:)/micromamba/envs/ip2p/lib/python3.8/site-packages/transformers/models/clip/modeling_cli │
│ p.py:257 in forward                                                                              │
│                                                                                                  │
│    254 │   │                                                                                     │
│    255 │   │   attn_probs = nn.functional.dropout(attn_weights, p=self.dropout, training=self.t  │
│    256 │   │                                                                                     │
│ ❱  257 │   │   attn_output = torch.bmm(attn_probs, value_states)                                 │
│    258 │   │                                                                                     │
│    259 │   │   if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):          │
│    260 │   │   │   raise ValueError(                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: expected scalar type Half but found Float

RevengeSMPpvp.aternos.me

AssertionError: Torch not compiled with CUDA enabled

I get this error. How can I correct it?

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ F:\Work area\instruct-pix2pix\edit_cli.py:128 in <module>                                        │
│                                                                                                  │
│   125                                                                                            │
│   126                                                                                            │
│   127 if __name__ == "__main__":                                                                 │
│ ❱ 128 │   main()                                                                                 │
│   129                                                                                            │
│                                                                                                  │
│ F:\Work area\instruct-pix2pix\edit_cli.py:80 in main                                             │
│                                                                                                  │
│    77 │                                                                                          │
│    78 │   config = OmegaConf.load(args.config)                                                   │
│    79 │   model = load_model_from_config(config, args.ckpt, args.vae_ckpt)                       │
│ ❱  80 │   model.eval().cuda()                                                                    │
│    81 │   model_wrap = K.external.CompVisDenoiser(model)                                         │
│    82 │   model_wrap_cfg = CFGDenoiser(model_wrap)                                               │
│    83 │   null_token = model.get_learned_conditioning([""])                                      │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py:1 │
│ 27 in cuda                                                                                       │
│                                                                                                  │
│   124 │   │   if device is None or isinstance(device, int):                                      │
│   125 │   │   │   device = torch.device("cuda", index=device)                                    │
│   126 │   │   self.__update_properties(device=device)                                            │
│ ❱ 127 │   │   return super().cuda(device=device)                                                 │
│   128 │                                                                                          │
│   129 │   def cpu(self) -> "DeviceDtypeModuleMixin":                                             │
│   130 │   │   """Moves all model parameters and buffers to the CPU.                              │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:749 in cuda                │
│                                                                                                  │
│    746 │   │   Returns:                                                                          │
│    747 │   │   │   Module: self                                                                  │
│    748 │   │   """                                                                               │
│ ❱  749 │   │   return self._apply(lambda t: t.cuda(device))                                      │
│    750 │                                                                                         │
│    751 │   def ipu(self: T, device: Optional[Union[int, device]] = None) -> T:                   │
│    752 │   │   r"""Moves all model parameters and buffers to the IPU.                            │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:641 in _apply              │
│                                                                                                  │
│    638 │                                                                                         │
│    639 │   def _apply(self, fn):                                                                 │
│    640 │   │   for module in self.children():                                                    │
│ ❱  641 │   │   │   module._apply(fn)                                                             │
│    642 │   │                                                                                     │
│    643 │   │   def compute_should_use_set_data(tensor, tensor_applied):                          │
│    644 │   │   │   if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):           │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:641 in _apply              │
│                                                                                                  │
│    638 │                                                                                         │
│    639 │   def _apply(self, fn):                                                                 │
│    640 │   │   for module in self.children():                                                    │
│ ❱  641 │   │   │   module._apply(fn)                                                             │
│    642 │   │                                                                                     │
│    643 │   │   def compute_should_use_set_data(tensor, tensor_applied):                          │
│    644 │   │   │   if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):           │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:641 in _apply              │
│                                                                                                  │
│    638 │                                                                                         │
│    639 │   def _apply(self, fn):                                                                 │
│    640 │   │   for module in self.children():                                                    │
│ ❱  641 │   │   │   module._apply(fn)                                                             │
│    642 │   │                                                                                     │
│    643 │   │   def compute_should_use_set_data(tensor, tensor_applied):                          │
│    644 │   │   │   if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):           │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:641 in _apply              │
│                                                                                                  │
│    638 │                                                                                         │
│    639 │   def _apply(self, fn):                                                                 │
│    640 │   │   for module in self.children():                                                    │
│ ❱  641 │   │   │   module._apply(fn)                                                             │
│    642 │   │                                                                                     │
│    643 │   │   def compute_should_use_set_data(tensor, tensor_applied):                          │
│    644 │   │   │   if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):           │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:664 in _apply              │
│                                                                                                  │
│    661 │   │   │   # track autograd history of `param_applied`, so we have to use                │
│    662 │   │   │   # `with torch.no_grad():`                                                     │
│    663 │   │   │   with torch.no_grad():                                                         │
│ ❱  664 │   │   │   │   param_applied = fn(param)                                                 │
│    665 │   │   │   should_use_set_data = compute_should_use_set_data(param, param_applied)       │
│    666 │   │   │   if should_use_set_data:                                                       │
│    667 │   │   │   │   param.data = param_applied                                                │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\nn\modules\module.py:749 in <lambda>            │
│                                                                                                  │
│    746 │   │   Returns:                                                                          │
│    747 │   │   │   Module: self                                                                  │
│    748 │   │   """                                                                               │
│ ❱  749 │   │   return self._apply(lambda t: t.cuda(device))                                      │
│    750 │                                                                                         │
│    751 │   def ipu(self: T, device: Optional[Union[int, device]] = None) -> T:                   │
│    752 │   │   r"""Moves all model parameters and buffers to the IPU.                            │
│                                                                                                  │
│ C:\Users\user\miniconda3\lib\site-packages\torch\cuda\__init__.py:221 in _lazy_init              │
│                                                                                                  │
│   218 │   │   │   │   "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "        │
│   219 │   │   │   │   "multiprocessing, you must use the 'spawn' start method")                  │
│   220 │   │   if not hasattr(torch._C, '_cuda_getDeviceCount'):                                  │
│ ❱ 221 │   │   │   raise AssertionError("Torch not compiled with CUDA enabled")                   │
│   222 │   │   if _cudart is None:                                                                │
│   223 │   │   │   raise AssertionError(                                                          │
│   224 │   │   │   │   "libcudart functions unavailable. It looks like you have a broken build?   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: Torch not compiled with CUDA enabled

I am also attaching my video card information. Just to make sure that it has enough resources.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.24       Driver Version: 528.24       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:01:00.0  On |                  N/A |
| 21%   45C    P0    52W / 200W |   1398MiB /  8192MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Running on CPU?

Hi folks,

I tried to run it, but the app exit with a 'killed' message.
dmesg state that is because an Out Of Memory error.

Can I run this app in system memory, using CPU? How could I do it?

Thank you!

checkpoint

Hi, I could not download the checkpoint. Could you please share it with other ways? Thanks! @timothybrooks

Loading the model consistently fails

Platform: Windows 10 x64 v22H2
Software: Python 3.8.5
Terminal: Anaconda Powershell Prompt v22.9.0
The system just had windows installed yesterday, so it should be in a very vanilla state

When running command:

(ip2p) PS C:\Users\me\src\instruct-pix2pix> python edit_cli.py --input ..\..\Pictures\input.jpg --output ..\..\Pictures\modded.jpg --edit "turn him into a cyborg"

It receives the following error

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt
Traceback (most recent call last):
  File "edit_cli.py", line 128, in <module>
    main()
  File "edit_cli.py", line 79, in main
    model = load_model_from_config(config, args.ckpt, args.vae_ckpt)
  File "edit_cli.py", line 41, in load_model_from_config
    pl_sd = torch.load(ckpt, map_location="cpu")
  File "C:\Users\Me\anaconda3\envs\ip2p\lib\site-packages\torch\serialization.py", line 705, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "C:\Users\Me\anaconda3\envs\ip2p\lib\site-packages\torch\serialization.py", line 243, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Similarly, when running the following command:
When running command:

(ip2p) PS C:\Users\me\src\instruct-pix2pix> python edit_app.py

It receives the following error

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt
Traceback (most recent call last):
  File "edit_app.py", line 268, in <module>
    main()
  File "edit_app.py", line 109, in main
    model = load_model_from_config(config, args.ckpt, args.vae_ckpt)
  File "edit_app.py", line 78, in load_model_from_config
    pl_sd = torch.load(ckpt, map_location="cpu")
  File "C:\Users\me\anaconda3\envs\ip2p\lib\site-packages\torch\serialization.py", line 705, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "C:\Users\me\anaconda3\envs\ip2p\lib\site-packages\torch\serialization.py", line 243, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Is there any way to change the sampling method?

I noticed only two lines responsible for this:
z = K.sampling.sample_euler_ancestral(model_wrap_cfg, z, sigmas, extra_args=extra_args)
generation_params = { "ip2p": "Yes", "Prompt": instruction, "Negative Prompt": negative_prompt, "steps": steps, "sampler": "Euler A", ....

Will it be enough to make changes to these lines? Please with an example(dpm adaptive(sigma_min, sigma_max)), thank you very much.
You have created an incredible model for editing images based on instructions.

Was the model trained for 22 epochs?

The model name instruct-pix2pix-00-22000.ckpt looks like it was trained for 22 epochs. But when I trained the model, it was trained more than that. Just want to confirm how many epochs we shall train to get the final results. Thanks.

Model Drive Link

Hello,
Thank you so much for the amazing work. Model downloading is very slow even in the presence of high quality network. I have tried on both colab and local and both are requiring to download for more than 2 hours.

Can you please google drive link ?
Thanks

AttributeError: module transformers has no attribute CLIPImageProcessor

Hello, when running

import PIL
import requests
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline, EulerAncestralDiscreteScheduler

model_id = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_id, torch_dtype=torch.float16, revision="fp16", safety_checker=None)
pipe.to("cuda")
pipe.enable_attention_slicing()

i keep encountering "AttributeError: module transformers has no attribute CLIPImageProcessor"

i tried installing clip and updating transformers, but same error. The only issue similar issue i could find was

https://self-development.info/%E3%80%90stable-diffusion%E3%80%91%E3%82%B5%E3%83%B3%E3%83%80%E3%83%BC%E3%83%90%E3%83%BC%E3%83%89%E9%A2%A8%E3%81%AE%E7%94%BB%E5%83%8F%E3%82%92%E7%94%9F%E6%88%90%E3%81%99%E3%82%8B/

and google translate didnt help much lol

How to make Windows 10 recognize Bash

Windows 10 (check if bash or wsl not working)
if not working:

if you run wsl --install and see the WSL help text, please try running wsl --list --online to see a list of available distros and run wsl --install -d to install a distro. To uninstall WSL, see Uninstall legacy version of WSL or unregister or uninstall a Linux distribution.

after installation test

try
wsl bash -c "echo hi from simple script"

the echo should work.

cannot import name 'StableDiffusionInstructPix2PixPipeline' from 'diffusers'

Hi,

After cloning the repository and setting up the environment, I keep getting the following error when trying to run edit_app.py:

(ip2p) PS C:\Users\julia\instruct-pix2pix> python edit_app.py
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\julia\instruct-pix2pix\edit_app.py:9 in │
│ │
│ 6 import gradio as gr │
│ 7 import torch │
│ 8 from PIL import Image, ImageOps │
│ ❱ 9 from diffusers import StableDiffusionInstructPix2PixPipeline │
│ 10 │
│ 11 │
│ 12 help_text = """ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: cannot import name 'StableDiffusionInstructPix2PixPipeline' from 'diffusers'
(C:\Users\julia.conda\envs\ip2p\lib\site-packages\diffusers_init_.py)

Do you have a suggestion for how I could fix this? Thank you very much in advance

How to upload many photos on the Colab version?

Hi!
Have seen you have created the Colab version:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/InstructPix2Pix_using_diffusers.ipynb

Would like to know how to make it work, from providing a folder with many photos and that it works all of them with just one prompt

Appreciate

getting distored or blurred previews

wishlist - enhancement (probably out of scope)

I basically want to guide the prompt by an image.

eg.
"make it look like [IMAGE UPLOADED]"

I guess this will take another white paper to get there or must you use CLIP or image to prompt - but probably will lose something in translation.

kind of like img2img button on this webui -

https://github.com/AUTOMATIC1111/stable-diffusion-webui

AIG

i am using on rtx 2060 6gb ...although it was working b4 on automatic111 extension but now no more ? any solution to this

RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 6.00 GiB total capacity; 4.95 GiB already allocated; 0 bytes free; 5.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

RuntimeError: CUDA out of memory

RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 11.00 GiB total capacity; 10.04 GiB already allocated; 0 bytes free; 10.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I've tried closing everything else that uses any GPU memory, but it always says "0 bytes free".

Trying to use edit_app.py in PowerShell on Windows 10
conda 22.11.1
2080Ti 11GB

VectorQuantizer2 error

ImportError: cannot import name 'VectorQuantizer2' from 'taming.modules.vqvae.quantize'
I'm getting an error like ... What could be the reason for this?

How to save automatically all the images generated on a folder? In Colab

I'm using your Colab and I'm trying to figure out how to save automatically all the images generated from one prompt into a folder.

Any help?

Appreciate

Is it possible to input many images and give you the same outlook?

Hi,

I'm working on styling a video . For this, I get the frames of the video and want to use instruct-pix2pix on all the frames and output all of them with the same style.

To make it clear, the frames are on the same place with one person moving. Giving the same prompt for all the frames as input at once, will it give me the same style for all?

Let me know if I did explain me well

can you make it run in free colab?

please create a free level colab to run this

hf timbrooks/instruct-pix2pix on A10G space

I tried to duplicate space on hf using a A10G small and its not working w that arch.

Fetching 15 files: 100%|██████████| 15/15 [01:26<00:00,  5.74s/it]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_instruct_pix2pix.StableDiffusionInstructPix2PixPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
/home/user/.pyenv/versions/3.8.9/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning: 
NVIDIA A10G with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A10G GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.

Easy idea about increasing pix2pix target underestanding

Dear researchers, please also consider our newly introduce dynamic-pix2pix architecture, which increse the modeling abbility of pix2pix specially in exteremely limmited data scenarios.
For more information:
https://www.researchgate.net/publication/365448869_Dynamic-Pix2Pix_Noise_Injected_cGAN_for_Modeling_Input_and_Target_Domain_Joint_Distributions_with_Limited_Training_Data

Parameter effect and values for reproducing some examples

I get slightly lower quality compared to what is shown project page when using default parameters:

Can you give a hint for what are good values for some of the parameters and explain their effect :

for example what are

cfg_text = 7.5
cfg_image = 1.5

I got these for some of the examples inputs from the project page (images might also be slightly different then what you used)

for some other inputs like the "girl with pearl earing" the output is unchanged.

should I expect better results with other parameter settings?

Thanks

Conda problems?

when I run

conda env create -f environment.yaml

I get

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound: 
  - cudatoolkit=11.3
  - numpy=1.19.2
  - python=3.8.5
  - torchvision=0.12.0
  - pytorch=1.11.0
  - pip=20.3

My device info:
MacBook Air M1 2020

Minor typo in paper

In the appendix section "A.2. Paired Image Generation",

"We generation 100 pairs of images for each pair of captions" > "We generate 100 pairs of images for each pair of captions"

[Tutorial] Forget Photoshop - How To Transform Images With Text Prompts using InstructPix2Pix Model in NMKD GUI

This is not an issue.

I hope you can add this tutorial to the readme section of the page

Thank you so much for this new great AI model. I hope you even release its improved version soon

My tutorial could be the first one to use it with easiest way

https://youtu.be/EPRa8EZl9Os

'bash' is not recognized even after installing with pip command

'bash' is not recognized as an internal or external command,
operable program or batch file.

When running python edit_app.py after a while terminal dissapears and shows nothing

Hi!

I did all the steps required from the readme and when running python edit_app.pyit appears:

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt

and then it just closes the terminal and shows nothing. It should open the Gradio interface right?

Appreciate

ERROR: "ModuleNotFoundError: No module named 'torch.distributed.algorithms.model_averaging'"

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt Global Step: 22000 Traceback (most recent call last): File "edit_cli.py", line 129, in <module> main() File "edit_cli.py", line 80, in main model = load_model_from_config(config, args.ckpt, args.vae_ckpt) File "edit_cli.py", line 53, in load_model_from_config model = instantiate_from_config(config.model) File "/home/ubuntu/projects/txt2img/instruct-pix2pix/stable_diffusion/ldm/util.py", line 85, in instantiate_from_config return get_obj_from_str(config["target"])(**config.get("params", dict())) File "/home/ubuntu/projects/txt2img/instruct-pix2pix/stable_diffusion/ldm/util.py", line 93, in get_obj_from_str return getattr(importlib.import_module(module, package=None), cls) File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1014, in _gcd_import File "<frozen importlib._bootstrap>", line 991, in _find_and_load File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 671, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 843, in exec_module File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed File "/home/ubuntu/projects/txt2img/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 15, in <module> import pytorch_lightning as pl File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 35, in <module> from pytorch_lightning.callbacks import Callback # noqa: E402 File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/callbacks/__init__.py", line 28, in <module> from pytorch_lightning.callbacks.pruning import ModelPruning File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/callbacks/pruning.py", line 31, in <module> from pytorch_lightning.core.module import LightningModule File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/core/__init__.py", line 16, in <module> from pytorch_lightning.core.module import LightningModule File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 50, in <module> from pytorch_lightning.trainer.connectors.logger_connector.fx_validator import _FxValidator File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/trainer/__init__.py", line 17, in <module> from pytorch_lightning.trainer.trainer import Trainer File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 57, in <module> from pytorch_lightning.loops import PredictionLoop, TrainingEpochLoop File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/loops/__init__.py", line 15, in <module> from pytorch_lightning.loops.batch import TrainingBatchLoop # noqa: F401 File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/loops/batch/__init__.py", line 15, in <module> from pytorch_lightning.loops.batch.training_batch_loop import TrainingBatchLoop # noqa: F401 File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 20, in <module> from pytorch_lightning.loops.optimization.manual_loop import _OUTPUTS_TYPE as _MANUAL_LOOP_OUTPUTS_TYPE File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/__init__.py", line 15, in <module> from pytorch_lightning.loops.optimization.manual_loop import ManualOptimization # noqa: F401 File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/manual_loop.py", line 23, in <module> from pytorch_lightning.loops.utilities import _build_training_step_kwargs, _extract_hiddens File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/loops/utilities.py", line 29, in <module> from pytorch_lightning.strategies.parallel import ParallelStrategy File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/strategies/__init__.py", line 15, in <module> from pytorch_lightning.strategies.bagua import BaguaStrategy # noqa: F401 File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/strategies/bagua.py", line 30, in <module> from pytorch_lightning.strategies.ddp import DDPStrategy File "/home/ubuntu/projects/txt2img/instruct-pix2pix/venv38/lib/python3.8/site-packages/pytorch_lightning/strategies/ddp.py", line 65, in <module> from torch.distributed.algorithms.model_averaging.averagers import ModelAverager ModuleNotFoundError: No module named 'torch.distributed.algorithms.model_averaging'

After create enviroment and running python edit_app.py get an error

Hi!
I get an error trying to make it work. I git cloned the repository, installed and created the enviroment and run python edit_app.py.

Got this error:

Loading model from checkpoints/instruct-pix2pix-00-22000.ckpt Traceback (most recent call last): File "edit_app.py", line 268, in <module> main() File "edit_app.py", line 109, in main model = load_model_from_config(config, args.ckpt, args.vae_ckpt) File "edit_app.py", line 78, in load_model_from_config pl_sd = torch.load(ckpt, map_location="cpu") File "/home/zaesarpo/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/serialization.py", line 705, in load with _open_zipfile_reader(opened_file) as opened_zipfile: File "/home/zaesarpo/anaconda3/envs/ip2p/lib/python3.8/site-packages/torch/serialization.py", line 243, in __init__ super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer)) RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
Any suggestions?

Appreciate

Idk how to run it

How do i run it locally ? :(

The setup instructions are very sparse for a layman like be cuz i am very illiterate when it comes to coding, I have however installed SD locally and I was wondering if there is an easy way to get this to work in a WebGUI similar to that. Thanks !

demo collab?

Would be nice to have a collab notebook to try it out. I have unsuccessfully tried patching one together using img2img diffusers.
My level of expertise does not allow me to diagnose the errors

import requests
import torch
from PIL import Image
from io import BytesIO

from diffusers import StableDiffusionImg2ImgPipeline

# load the pipeline
device = "cuda"
model_id_or_path = "timbrooks/instruct-pix2pix"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id_or_path, torch_dtype=torch.float16)

# or download via git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
# and pass `model_id_or_path="./stable-diffusion-v1-5"`.
pipe = pipe.to(device)

# let's download an initial image
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
init_image = Image.open(BytesIO(response.content)).convert("RGB")
init_image = init_image.resize((768, 512))

prompt = "make the sky red"

images = pipe(prompt=prompt, image=init_image, strength=1, guidance_scale=7.5).images

images[0].save("red sky")

RuntimeError                              Traceback (most recent call last)

[<ipython-input-19-1db39db9ed03>](https://localhost:8080/#) in <module>
     24 prompt = "make the sky red"
     25 
---> 26 images = pipe(prompt=prompt, image=init_image, strength=1, guidance_scale=7.5).images
     27 
     28 images[0].save("red sky")

6 frames

[/usr/local/lib/python3.8/dist-packages/torch/nn/modules/conv.py](https://localhost:8080/#) in _conv_forward(self, input, weight, bias)
    457                             weight, bias, self.stride,
    458                             _pair(0), self.dilation, self.groups)
--> 459         return F.conv2d(input, weight, bias, self.stride,
    460                         self.padding, self.dilation, self.groups)
    461 

RuntimeError: Given groups=1, weight of size [320, 8, 3, 3], expected input[2, 4, 64, 96] to have 8 channels, but got 4 channels instead

CUDA out of memory

After installation with some adventures (mentioned in other issues :) ) I got Web UI to run, but not the process. I am getting CUDA out of error message and so far, googling told me about code editing to send data in batches or changing Enviromental variables.
I tried to set PYTORCH_CUDA_ALLOC_CONF to max_split_size_mb:128 and max_split_size_mb:512 with no change.

I am on windows with 2080ti

my error when I press "Load Example" button (or try to run direct python command with it). Same Happens with any other image when I load it in, add text prompt and press "Generate" button.

Traceback (most recent call last):
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\blocks.py", line 833, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\edit_app.py", line 125, in load_example
    return [example_image, example_instruction] + generate(
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\edit_app.py", line 160, in generate
    with torch.no_grad(), autocast("cuda"), model.ema_scope():
  File "C:\Users\***\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\models\diffusion\ddpm_edit.py", line 185, in ema_scope
    self.model_ema.store(self.model.parameters())
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\modules\ema.py", line 62, in store
    self.collected_params = [param.clone() for param in parameters]
  File "E:\Instruct-pix2pix\instruct-pix2pix-main\./stable_diffusion\ldm\modules\ema.py", line 62, in <listcomp>
    self.collected_params = [param.clone() for param in parameters]
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 11.00 GiB total capacity; 10.04 GiB already allocated; 0 bytes free; 10.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF```

Any recommendations on how to get pass this?

Thanks.

How to share a google colab version that accepts a folder of images for create videos?

Hi!

I changed a bit the Google Colab version: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/InstructPix2Pix_using_diffusers.ipynb

I made it so it accepts a folder of images (frames) to after create a video.

I would like to share it publicly so others can use it but without changing the main notebook. Also that when they load again the notebook it doesn't loose their changes. And also that it doesn't popup when running the cells telling it isn't secure and my showing my email as The Owner.

Just like the link I provided. Any suggestions? Have searched everywhere and the google colab support doesn't work. Also asked ChatGPT and doesn't give me a proper answer.

Any suggestion will be much appreciated!

timothybrooks / instruct-pix2pix Goto Github PK

instruct-pix2pix's Introduction

InstructPix2Pix: Learning to Follow Image Editing Instructions

Project Page | Paper | Data

TL;DR: quickstart

Set up a conda environment, and download a pretrained model:

Edit a single image:

Or launch your own interactive editing Gradio app:

Setup

Generated Dataset

Training InstructPix2Pix

Creating your own dataset

(1) Generate a dataset of captions and instructions

(1.1) Manually write a dataset of instructions and captions

(1.2) Finetune GPT-3

(1.3) Generate a large dataset of captions and instructions

(2) Turn paired captions into paired images

Evaluation

Tips

Comments

BibTeX

Other ways of using InstructPix2Pix

InstructPix2Pix on HuggingFace:

InstructPix2Pix on Replicate:

InstructPix2Pix in Imaginairy:

InstructPix2Pix in 🧨 Diffusers:

instruct-pix2pix's People

Contributors

Stargazers

Watchers

Forkers

instruct-pix2pix's Issues

Recommend Projects

Recommend Topics

Recommend Org