
deltaedit's Introduction

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation

Overview

This repository contains the official PyTorch implementation of the paper:

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation, CVPR 2023

News

  • [2023-03-11] Upload the training and inference code for the facial domain (◍•ڡ•◍).

To be continued...

We will release the training and inference code for the LSUN cat, church, and horse domains later :)

Dependencies

  • Install CLIP (a quick check of the installation is sketched after this list):

    conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
    pip install ftfy regex tqdm gdown
    pip install git+https://github.com/openai/CLIP.git
  • Download pre-trained models:

    • The code relies on the Rosinality pytorch implementation of StyleGAN2.
    • Download the pre-trained StyleGAN2 generator model for the facial domain from here, and then place it into the folder ./models/pretrained_models.
    • Download the pre-trained StyleGAN2 generator model for the LSUN cat, church, horse domains from here and then place them into the folder ./models/pretrained_models/stylegan2-{cat/church/horse}.
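
After installing the dependencies, the following minimal sketch can be used to verify that CLIP loads and encodes a text prompt. It assumes the ViT-B/32 variant; check the repository code for the exact CLIP model DeltaEdit uses.

    import torch
    import clip

    # Load a CLIP model (ViT-B/32 is assumed here purely for illustration).
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Encode an example editing prompt; ViT-B/32 text features are 512-dimensional.
    with torch.no_grad():
        tokens = clip.tokenize(["face with smile"]).to(device)
        text_features = model.encode_text(tokens)
    print(text_features.shape)  # torch.Size([1, 512])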

Training

Data preparation

  • DeltaEdit is trained on latent vectors.

  • For the facial domain, 58,000 real images from the FFHQ dataset are randomly selected and 200,000 fake images are sampled from the Z space of StyleGAN for training. Note that all real images are inverted with the e4e encoder.

  • Download the provided FFHQ latent vectors from here and then place all numpy files into the folder ./latent_code/ffhq.

  • Generate the 200,000 sampled latent vectors by running the following commands for each specific domain (a rough sketch of the sampling idea follows the commands):

    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname ffhq --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname cat --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname church --samples 200000
    CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname horse --samples 200000
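
The exact sampling logic lives in generate_codes.py; as a rough, hypothetical sketch of the underlying idea, drawing z-space codes and mapping them through the StyleGAN2 mapping network with the Rosinality Generator might look as follows (the import path and checkpoint filename below are assumptions, not the repository's actual values):

    import torch
    from models.stylegan2.model import Generator  # import path assumed from the repo layout

    device = "cuda"
    # 1024px FFHQ generator: 512-dim style space, 8 mapping layers.
    netG = Generator(1024, 512, 8).to(device)
    ckpt = torch.load("models/pretrained_models/stylegan2-ffhq.pt", map_location=device)  # filename assumed
    netG.load_state_dict(ckpt["g_ema"], strict=False)
    netG.eval()

    with torch.no_grad():
        z = torch.randn(16, 512, device=device)  # a small batch of z-space codes
        w = netG.get_latent(z)                   # map z -> w via the mapping network
    print(w.shape)  # torch.Size([16, 512])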

Usage

  • The main training script is placed in ./scripts/train.py.
  • Training arguments can be found at ./options/train_options.py.

For training, please run the following command:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py

Inference

  • The main inference script is placed in ./scripts/inference.py.
  • Inference arguments can be found at ./options/test_options.py.
  • Download the pretrained DeltaMapper model for editing human faces from here, and then place it into the folder ./checkpoints.
  • Some inference data are provided in ./examples.

To produce editing results, please run the following command:

CUDA_VISIBLE_DEVICES=1 python scripts/inference.py --target "chubby face","face with eyeglasses","face with smile","face with pale skin","face with tanned skin","face with big eyes","face with black clothes","face with blue suit","happy face","face with bangs","face with red hair","face with black hair","face with blond hair","face with curly hair","face with receding hairline","face with bowlcut hairstyle"

The produced results are shown below.

You can also specify your own desired target attributes via the --target flag.
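
For instance, a shorter run with custom attributes might look like the following (the target strings here are only illustrative; any free-form description can be tried):

CUDA_VISIBLE_DEVICES=0 python scripts/inference.py --target "face with purple hair","surprised face"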

Inference for real images

  • The main inference script is placed in ./scripts/inference_real.py.
  • Inference arguments can be found at ./options/test_options.py.
  • Download the pretrained DeltaMapper model for editing human faces from here, and then place it into the folder ./checkpoints.
  • Download the pretrained e4e encoder e4e_ffhq_encode.pt from e4e.
  • One test image is provided in ./test_imgs.

To produce editing results, please run the following command:

CUDA_VISIBLE_DEVICES=1 python scripts/inference_real.py --target "chubby face","face with eyeglasses","face with smile","face with pale skin","face with tanned skin","face with big eyes","face with black clothes","face with blue suit","happy face","face with bangs","face with red hair","face with black hair","face with blond hair","face with curly hair","face with receding hairline","face with bowlcut hairstyle"

Results


Acknowledgements

This code is developed based on the code of orpatashnik/StyleCLIP by Or Patashnik et al.

Citation

If you use this code for your research, please cite our paper:

@InProceedings{lyu2023deltaedit,
    author    = {Lyu, Yueming and Lin, Tianwei and Li, Fu and He, Dongliang and Dong, Jing and Tan, Tieniu},
    title     = {DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2023},
}


deltaedit's Issues

error

Loading stylegan weights from pretrained!
Traceback (most recent call last):
  File "/home/akira/DeltaEdit/scripts/inference_real.py", line 206, in <module>
    main(opts)
  File "/home/akira/DeltaEdit/scripts/inference_real.py", line 189, in main
    fake_delta_s = net(latent_s, delta_c)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/akira/DeltaEdit/./delta_mapper.py", line 60, in forward
    x_coarse = torch.cat([s_coarse, torch.stack([c_coarse]*3, dim=1)], dim=2) #[b,3,1024]
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 2 but got size 1 for tensor number 1 in the list.
I uploaded an image of my own; why am I getting this error?

RuntimeError: Function FusedLeakyReLUFunctionBackward

I really like your work. I tried applying DeltaEdit to a different domain, but during training I encountered the following error:

Traceback (most recent call last):
  File "scripts/train.py", line 63, in <module>
    main(opts)
  File "scripts/train.py", line 52, in main
    loss.backward()
  File "/root/autodl-tmp/conda/envs/Del/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/autodl-tmp/conda/envs/Del/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Function FusedLeakyReLUFunctionBackward returned an invalid gradient at index 1 - got [4] but expected shape compatible with [512]

Is there any solution to this problem?

pre-trained models

Do you have plans to share pre-trained models of LSUN cats, churches, and horses?

Some questions about the implementation of the open-source code.

Thank you so much for open-sourcing the code. I have a few questions about some details:

1. I noticed that in your inference.py file, you use a file called fs3.npy. Could you explain what the data in this file means and how I can obtain it? I couldn't find any code in your repository for generating this file.

2. I also noticed that in your train.py file, you don't set a max_iters parameter, and only iterate through the dataset once. Does this mean that the delta mapper doesn't need multiple rounds of training?

3. Your open source code doesn't seem to include the code for cspace_ffhq_feat.npy and sspace_ffhq_feat.npy. Could you provide this code or tell me where I can find it?

4. When I tried editing the sample images you provided with the default parameters, I got the same results as shown in the paper for the provided texts. However, when I tried other texts, the results weren't very good. But when I used the text "Blue hair", I got a result that was close to the "Blue suit" in the paper. Have you encountered this situation before?

How to extend to other editing needs, such as a face with single eyelids.

How can I extend to other editing requirements, such as generating a face with single eyelids? I executed the following command:

python scripts/inference_real.py --target "face with big eyes","face with single eyelids"

but the output indicated that the third image, which was supposed to show a face with single eyelids, had no edit effect.

About Figure 2 in the paper

Congratulations on your paper being accepted by CVPR 2023!
Regarding Figure 2 in your paper,

  1. can you provide the implementation code?
  2. I tried hard to reproduce this result but failed. Can you elaborate on how the CLIP Delta feature is calculated here?

How to extract the features in c/s/wspace_img_feat.npy

Hi, thanks for releasing this nice work!

If I want to run inference on an in-the-wild image instead of one from the FFHQ dataset, how can I extract the parameters in the c/s/wspace_img_feat.npy files?
Could you share this preprocessing script?

Thanks

Training Error

Hello! When I run CUDA_VISIBLE_DEVICES=0 python generate_codes.py --classname ffhq --samples 200000, I get the following error:

Traceback (most recent call last):
  File "/.../DeltaEdit/generate_codes.py", line 118, in <module>
    netG = Generator(args.size, 512, 8).to(device)
RuntimeError: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

What should I do?

Thanks in advance.

About some problems encountered in the training process

Thank you very much for open-sourcing the code, but I am having some problems with training:
1. I looked at the training code, and the whole training process only traverses the training dataset once (that is, 200,000 + 58,000 samples)? But when I got to iteration 201,000, training simply finished without any error. Is this normal?

2. I noticed in your paper that you use the remaining 12,000 FFHQ images for evaluation, but I can't find the evaluation code or dataset, and the parameter val_interval is not used. How should I evaluate my trained model?

about head pose

Hi, I tried many prompts to change the head pose, but all failed. Is this expected?
If it is expected, how can I edit the head pose?

So can we get a new w+ vector based on these three parameters?


Hello, your work is excellent!
I have a question I'd like to ask you.
As you know, we can get the new edited image from three components: the S-space style vector of StyleGAN2, the original image's e4e latent vector (w+), and the StyleGAN2 generator.
So can we get a new w+ vector based on these three parameters?
