
deltaedit's Introduction

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

Overview

This repository contains the official PyTorch implementation of the paper:

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation, CVPR 2023

News

  • [2023-03-11] Upload the training and inference code for the facial domain (◍•ڡ•◍).

To be continued...

We will release the training and inference code for the LSUN cat, church, and horse domains later :)

Dependencies

  • Install CLIP (a quick check of the installation is sketched after this list):

    conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
    pip install ftfy regex tqdm gdown
    pip install git+https://github.com/openai/CLIP.git
  • Download pre-trained models:

    • The code relies on the Rosinality pytorch implementation of StyleGAN2.
    • Download the pre-trained StyleGAN2 generator model for the facial domain from here, and then place it into the folder ./models/pretrained_models.
    • Download the pre-trained StyleGAN2 generator model for the LSUN cat, church, horse domains from here and then place them into the folder ./models/pretrained_models/stylegan2-{cat/church/horse}.
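
After installing the dependencies, the following minimal sketch can be used to verify that CLIP loads and encodes a text prompt. It assumes the ViT-B/32 variant; check the repository code for the exact CLIP model DeltaEdit uses.

    import torch
    import clip

    # Load a CLIP model (ViT-B/32 is assumed here purely for illustration).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Encode an example editing prompt; ViT-B/32 text features are 512-dimensional.
    with torch.no_grad():
        tokens = clip.tokenize(["face with smile"]).to(device)
        text_features = model.encode_text(tokens)
    print(text_features.shape)  # torch.Size([1, 512])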

Training

Data preparation

  • DeltaEdit is trained on latent vectors.

  • For the facial domain, 58,000 real images from the FFHQ dataset are randomly selected and 200,000 fake images are sampled from the Z space of StyleGAN for training. Note that all real images are inverted with the e4e encoder.

  • Download the provided FFHQ latent vectors from here and then place all numpy files into the folder ./latent_code/ffhq.

  • Generate the 200,000 sampled latent vectors by running the following commands for each specific domain (a rough sketch of the sampling idea follows the commands):

    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname ffhq --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname cat --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname church --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname horse --samples 200000
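
The exact sampling logic lives in generate_codes.py; as a rough, hypothetical sketch of the underlying idea, drawing z-space codes and mapping them through the StyleGAN2 mapping network with the Rosinality Generator might look as follows (the import path and checkpoint filename below are assumptions, not the repository's actual values):

    import torch
    from models.stylegan2.model import Generator  # import path assumed from the repo layout

    device = "cuda"
    # 1024px FFHQ generator: 512-dim style space, 8 mapping layers.
    netG = Generator(1024, 512, 8).to(device)
    ckpt = torch.load("models/pretrained_models/stylegan2-ffhq.pt", map_location=device)  # filename assumed
    netG.load_state_dict(ckpt["g_ema"], strict=False)
    netG.eval()

    with torch.no_grad():
        z = torch.randn(16, 512, device=device)  # a small batch of z-space codes
        w = netG.get_latent(z)                   # map z -> w via the mapping network
    print(w.shape)  # torch.Size([16, 512])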

Usage

  • The main training script is placed in ./scripts/train.py.
  • Training arguments can be found at ./options/train_options.py.

For training, please run the following command:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py

Inference

  • The main inference script is placed in ./scripts/inference.py.
  • Inference arguments can be found at ./options/test_options.py.
  • Download the pretrained DeltaMapper model for editing human faces from here, and then place it into the folder ./checkpoints.
  • Some inference data are provided in ./examples.

To produce editing results, please run the following command:

CUDA_VISIBLE_DEVICES=1 python scripts/inference.py --target "chubby face","face with eyeglasses","face with smile","face with pale skin","face with tanned skin","face with big eyes","face with black clothes","face with blue suit","happy face","face with bangs","face with red hair","face with black hair","face with blond hair","face with curly hair","face with receding hairline","face with bowlcut hairstyle"

The produced results are shown below.

You can also specify your own desired target attributes via the --target flag.
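
For instance, a shorter run with custom attributes might look like the following (the target strings here are only illustrative; any free-form description can be tried):

CUDA_VISIBLE_DEVICES=0 python scripts/inference.py --target "face with purple hair","surprised face"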

Inference for real images

  • The main inference script is placed in ./scripts/inference_real.py.
  • Inference arguments can be found at ./options/test_options.py.
  • Download the pretrained DeltaMapper model for editing human faces from here, and then place it into the folder ./checkpoints.
  • Download the pretrained e4e encoder e4e_ffhq_encode.pt from e4e.
  • One test image is provided in ./test_imgs.

To produce editing results, please run the following command:

CUDA_VISIBLE_DEVICES=1 python scripts/inference_real.py --target "chubby face","face with eyeglasses","face with smile","face with pale skin","face with tanned skin","face with big eyes","face with black clothes","face with blue suit","happy face","face with bangs","face with red hair","face with black hair","face with blond hair","face with curly hair","face with receding hairline","face with bowlcut hairstyle"

Results


Acknowledgements

This code is developed based on the code of orpatashnik/StyleCLIP by Or Patashnik et al.

Citation

If you use this code for your research, please cite our paper:

@InProceedings{lyu2023deltaedit,
    author    = {Lyu, Yueming and Lin, Tianwei and Li, Fu and He, Dongliang and Dong, Jing and Tan, Tieniu},
    title     = {DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
}


deltaedit's Issues

error

Loading stylegan weights from pretrained!
Traceback (most recent call last):
  File "/home/akira/DeltaEdit/scripts/inference_real.py", line 206, in <module>
    main(opts)
  File "/home/akira/DeltaEdit/scripts/inference_real.py", line 189, in main
    fake_delta_s = net(latent_s, delta_c)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akira/DeltaEdit/./delta_mapper.py", line 60, in forward
    x_coarse = torch.cat([s_coarse, torch.stack([c_coarse]*3, dim=1)], dim=2) #[b,3,1024]
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 2 but got size 1 for tensor number 1 in the list.
I uploaded an image of my own; why am I getting this error?

RuntimeError: Function FusedLeakyReLUFunctionBackward

I really like your work. I tried applying DeltaEdit to a different domain, but during training I encountered the following error:

Traceback (most recent call last):
  File "scripts/train.py", line 63, in <module>
    main(opts)
  File "scripts/train.py", line 52, in main
    loss.backward()
  File "/root/autodl-tmp/conda/envs/Del/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/autodl-tmp/conda/envs/Del/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function FusedLeakyReLUFunctionBackward returned an invalid gradient at index 1 - got [4] but expected shape compatible with [512]

Is there any solution to this problem?

pre-trained models

Do you have plans to share pre-trained models of LSUN cats, churches, and horses?

Some questions about the implementation of the open-source code.

Thank you so much for open-sourcing the code. I have a few questions about some details:

1. I noticed that in your inference.py file, you use a file called fs3.npy. Could you explain what the data in this file means and how I can obtain it? I couldn't find any code in your repository for generating this file.

2. I also noticed that in your train.py file, you don't set a max_iters parameter, and only iterate through the dataset once. Does this mean that the delta mapper doesn't need multiple rounds of training?

3. Your open source code doesn't seem to include the code for cspace_ffhq_feat.npy and sspace_ffhq_feat.npy. Could you provide this code or tell me where I can find it?

4. When I tried editing the sample images you provided with the default parameters, I got the same results as shown in the paper for the provided texts. However, when I tried other texts, the results weren't very good. But when I used the text "Blue hair", I got a result that was close to the "Blue suit" in the paper. Have you encountered this situation before?

How to extend to other editing needs, such as a face with single eyelids.

How can I extend to other editing requirements, such as generating a face with single eyelids? I executed the following command:

python scripts/inference_real.py --target "face with big eyes","face with single eyelids"

but the output indicated that the third image, which was supposed to show a face with single eyelids, had no edit effect.

About Figure 2 in the paper

Congratulations on your paper being accepted by CVPR 2023!
Regarding Figure 2 in your paper,

  1. can you provide the implementation code?
  2. I tried hard to reproduce this result but failed. Can you elaborate on how the CLIP Delta feature is calculated here?

How to extract the features in c/s/wspace_img_feat.npy

Hi, thanks for releasing this nice work!

If I want to run inference on an in-the-wild image instead of one from the FFHQ dataset, how can I extract the parameters in the c/s/wspace_img_feat.npy files?
Could you share this preprocessing script?

Thanks

Training Error

Hello! When I run CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname ffhq --samples 200000, I get the following error:

Traceback (most recent call last):
  File "/.../DeltaEdit/generate_codes.py", line 118, in <module>
    netG = Generator(args.size, 512, 8).to(device)
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

What should I do?

Thanks in advance.

About some problems encountered in the training process

Thank you very much for open-sourcing the code, but I am having some problems with training:
1. I looked at the training code, and the whole training process only traverses the training dataset once (that is, 200,000 + 58,000 samples)? But when I got to iteration 201,000, training simply finished without any error. Is this normal?

2. I noticed in your paper that you use the remaining 12,000 FFHQ images for evaluation, but I can't find the evaluation code or dataset, and the parameter val_interval is not used. How should I evaluate my trained model?

about head pose

Hi, I tried many prompts to change the head pose, but all failed. Is this expected?
If it is expected, how can I edit the head pose?

So can we get a new w+ vector based on these three parameters?


Hello, your work is excellent!
I have a question I'd like to ask you.
As you know, we can get the new edited image from three components: the S-space style vector of StyleGAN2, the original image's e4e latent vector (w+), and the StyleGAN2 generator.
So can we get a new w+ vector based on these three parameters?
