jojogan's People

Contributors

ak391, mchong6, vganapati

jojogan's Issues

Face not detected error.

I have tried two different images, one with a white background and one with a brown background, but both give the same error below. Do you have any suggestions for what kind of photo to use?

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-14-81c009706d54> in <module>()
     11 
     12 # aligns and crops face
---> 13 aligned_face = align_face(filepath)
     14 
     15 # my_w = restyle_projection(aligned_face, name, device, n_iters=1).unsqueeze(0)

1 frames
/content/JoJoGAN/util.py in get_landmark(filepath, predictor)
    110     img = dlib.load_rgb_image(filepath)
    111     dets = detector(img, 1)
--> 112     assert len(dets) > 0, "Face not detected, try another face image"
    113 
    114     for k, d in enumerate(dets):

AssertionError: Face not detected, try another face image
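
A minimal debugging sketch (not from the thread) that runs the same dlib calls util.py makes; trying higher upsampling factors sometimes finds small or off-angle faces that the default pass misses. The image path is hypothetical.

import dlib

detector = dlib.get_frontal_face_detector()
img = dlib.load_rgb_image("test_input/face.jpg")  # hypothetical path

for upsample in (1, 2, 3):
    dets = detector(img, upsample)
    print("upsample=%d: %d face(s) detected" % (upsample, len(dets)))

If even upsample=3 finds nothing, the image itself (resolution, rotation, occlusion) is the likely problem rather than the notebook.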

Results on arcane_jinx style

Hi! Very interesting work, and thanks for providing the resources to play with. Could you provide more details on how the jinx style was trained? There is a small difference between a model trained with the Colab (based on jinx only) and the pretrained model, as shown in the attached comparison images (test, test1).

My settings are:
num_iter = 2000
preserve_color = False

Thanks for the help.

Different results with automatic alignment and manual crop

I took a picture with a face in it and generated an image using the align_face function. Then, using the same picture, I generated another image by cropping manually as mentioned in #18 (comment). When I pass both of these through the e4e_projection function and view the final images, the results are very different, although the images that were passed in are very similar. Do you have any idea why this might be?

CUDA runs out of memory

(jojo) PS C:\Users\Admin\Documents\JoJoGAN> python train_custom_style.py --model_name sophie --alpha 0.0 --preserve_color False --num_iter 300 --device cuda
  0%|          | 0/300 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "train_custom_style.py", line 103, in <module>
    fake_feat = discriminator(img)
  File "C:\Users\Admin\MiniConda3\envs\jojo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\Documents\JoJoGAN\model.py", line 665, in forward
    out = block(out)
  File "C:\Users\Admin\MiniConda3\envs\jojo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\Documents\JoJoGAN\model.py", line 621, in forward
    out = self.conv2(out)
  File "C:\Users\Admin\MiniConda3\envs\jojo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\MiniConda3\envs\jojo\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "C:\Users\Admin\MiniConda3\envs\jojo\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Admin\Documents\JoJoGAN\model.py", line 126, in forward
    padding=self.padding,
  File "C:\Users\Admin\Documents\JoJoGAN\op\conv2d_gradfix.py", line 32, in conv2d
    ).apply(input, weight, bias)
  File "C:\Users\Admin\Documents\JoJoGAN\op\conv2d_gradfix.py", line 138, in forward
    out = F.conv2d(input=input, weight=weight, bias=bias, **common_kwargs)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 8.00 GiB total capacity; 4.95 GiB already allocated; 0 bytes free; 5.49 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there any way to lower the batch size? I don't know how to do that, or whether it would even help, but could you please look into this problem?
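
As a sketch of generic PyTorch memory-saving measures (assumptions about what applies here, since train_custom_style.py may not expose these knobs): the error message itself suggests tuning the allocator, and anything run without gradients should be wrapped in torch.no_grad().

import os
# Must be set before CUDA is initialized to take effect, as the error suggests.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch

# Frees cached allocator blocks between steps; helps fragmentation,
# not peak usage.
torch.cuda.empty_cache()

If several style images are used, processing fewer of them per step (a hypothetical change to the training loop) would lower the effective batch size.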

How to choose the swapping layers?

In the notebook, you choose [9,11,15,16,17] as the swapping layers. I'm wondering what the reasoning behind this choice is.
Thanks in advance.
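
For reference, a minimal sketch of what swapping those indices does, assuming W+ latents of shape [batch, n_latent, 512] as in the rosinality StyleGAN2 port (this illustrates the mechanics, not the exact training code):

import torch

n_latent, dim = 18, 512
face_w = torch.randn(1, n_latent, dim)   # stand-in for the input's W+ code
style_w = torch.randn(1, n_latent, dim)  # stand-in for the style's W+ code
alpha = 1.0                              # blending strength (hypothetical)

id_swap = [9, 11, 15, 16, 17]
in_latent = style_w.clone()
in_latent[:, id_swap] = alpha * face_w[:, id_swap] + (1 - alpha) * style_w[:, id_swap]
# Early W+ indices drive coarse structure; later ones drive finer,
# more color-sensitive features, so only a few late indices are mixed.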

CUDA out of memory

Hello, I tried to run "Train with your own style images" but got "CUDA out of memory" (2080 Ti, 12 GB GPU memory). Can you tell me how much GPU memory the training process needs?

IndexError: list index out of range

This happens when the device is set to cpu in Colab and the hardware accelerator is set to None.

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

IndexError Traceback (most recent call last)
in ()
24 from tqdm import tqdm
25 import lpips
---> 26 from model import *
27 from e4e_projection import projection as e4e_projection
28 from restyle_projection import projection as restyle_projection

7 frames
/usr/local/lib/python3.7/dist-packages/torch/utils/cpp_extension.py in _get_cuda_arch_flags(cflags)
1604 arch_list.append(arch)
1605 arch_list = sorted(arch_list)
-> 1606 arch_list[-1] += '+PTX'
1607 else:
1608 # Deal with lists that are ' ' separated (only deal with ';' after)

IndexError: list index out of range
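
A possible workaround sketch (an assumption, not a confirmed fix): torch's cpp_extension builds an architecture list from the visible GPUs, and with no CUDA runtime that list is empty, so arch_list[-1] fails. Pinning an architecture through the environment routes around the empty-list branch, although on a pure-CPU runtime the real fix is to avoid building the CUDA extensions at all.

import os
os.environ["TORCH_CUDA_ARCH_LIST"] = "7.5"  # hypothetical arch; set before the import below

from model import *  # importing model is what triggers the extension build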

replicate.exceptions.ModelError: name 'latent_dim' is not defined

Hello, thank you for your awesome work!

Instead of the pretrained model, I tried to convert the image by uploading a style image through the Replicate API, but an error occurred.

https://user-images.githubusercontent.com/37644126/215038745-eaffd84b-35ed-4744-ae58-ec7a2ecf99f8.png

Traceback (most recent call last):
  File "C:\\Users\\SSAFY\\PycharmProjects\\JoJoGAN\\jojogan_api.py", line 45, in <module>
    output = version.predict(**inputs)
  File "C:\\Users\\SSAFY\\anaconda3\\envs\\reboot_JoJoGAN\\lib\\site-packages\\replicate\\version.py", line 31, in predict
    raise ModelError(prediction.error)
replicate.exceptions.ModelError: name 'latent_dim' is not defined


The same error occurs on the demo page.


Ampere incompatibility?

Hello and thank you so much for the amazing project.

My problem is that the setup process described in the repo does not seem to work for Ampere GPUs (in my case an RTX 3080 Ti).

First I use the e4e/environment/e4e_env.yaml to create the Conda env. Then I follow the commands in the first cell of stylize.ipynb. However, I get ValueError: Unknown CUDA arch (8.6) or GPU not supported.

I think this may be because the default CUDA installed is 10.x, but my attempts to fix this by setting up the env differently have so far been unsuccessful. Would it be possible to add a fix for Ampere GPUs? Thanks in advance!
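
A minimal check sketch for this situation: Ampere (compute capability 8.6) needs a torch build compiled against CUDA 11.x or newer, and the build's supported architectures can be inspected directly.

import torch

print(torch.__version__, torch.version.cuda)  # CUDA 10.x builds cannot target 8.6
print(torch.cuda.get_arch_list())             # should include 'sm_86' for an RTX 3080 Ti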

Perceptual loss: LPIPS vs. StyleGAN Discriminator

Hi!

Thanks for sharing this awesome work :-)

I'm wondering about the difference in perceptual image quality when using the LPIPS model (as stated in the paper) versus the StyleGAN discriminator (as used in the updated Colab notebook) for the perceptual loss.
In your experience, what kind of effect does using the StyleGAN discriminator have on image quality, compared to using LPIPS?
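
For concreteness, a side-by-side sketch of the two options, with random tensors standing in for the generated and reference images; the feature-matching form below is the common pattern, not necessarily the exact notebook code.

import torch
import torch.nn.functional as F
import lpips

fake = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in generated image, in [-1, 1]
real = torch.rand(1, 3, 256, 256) * 2 - 1  # stand-in style reference

# Option 1: LPIPS (as in the paper) -- distance in fixed VGG feature space.
lpips_fn = lpips.LPIPS(net="vgg")
loss_lpips = lpips_fn(fake, real).mean()

# Option 2: discriminator feature matching -- L1 between intermediate
# activations of the StyleGAN discriminator.
def feat_match_loss(fake_feats, real_feats):
    return sum(F.l1_loss(f, r) for f, r in zip(fake_feats, real_feats))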

resume training

I'm interested in this great project, and I've tried to train my own model on Colab.
Please let me know how to resume training.
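
If the Colab does not expose resuming, a generic sketch (with a toy module standing in for the generator) is to checkpoint both the model and the optimizer state:

import torch
from torch import nn, optim

generator = nn.Linear(8, 8)  # stand-in for the StyleGAN generator
g_optim = optim.Adam(generator.parameters(), lr=2e-3)

# Save periodically inside the training loop:
torch.save({"g": generator.state_dict(),
            "opt": g_optim.state_dict(),
            "iter": 100}, "checkpoint.pt")

# Reload before restarting the loop:
ckpt = torch.load("checkpoint.pt")
generator.load_state_dict(ckpt["g"])
g_optim.load_state_dict(ckpt["opt"])
start_iter = ckpt["iter"] + 1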

Does not work on cuda:1

Hello. Thank you for providing great code.

I have an issue running prediction.

If I predict on cuda:1, inference does not work.

I traced the code step by step and found that op/fused_act.py loads fused_bias_act.cpp and fused_bias_act_kernel.cu, and that C++ extension cannot run on another GPU.

How can I predict with a different GPU?
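
One workaround sketch (an assumption, not a confirmed fix): remap the physical GPU with CUDA_VISIBLE_DEVICES so the process sees it as cuda:0, which sidesteps extensions that implicitly assume device 0.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must be set before torch initializes CUDA

import torch
device = torch.device("cuda:0")  # physical GPU 1, now visible as cuda:0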

Could you release training codes?

In the Colab, the cartoon data is generated from pretrained models that are downloaded from Google Drive.
Could you share how you obtained the cartoon pretrained models?
I read your paper and tried to reproduce steps 1 and 2, but I could not.

A question about how to preserve colors

The following code is used in the source to preserve color:

if preserve_color:
    id_swap = [9,11,15,16,17]
else:
    id_swap = list(range(7, generator.n_latent))

I want to know how the exact indices of these layers were determined. Are there papers that cover this, or did it take a lot of experimentation to find which layers control color?
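
One empirical way to answer this is a per-index probe: swap a single W+ index at a time and render the result to see which indices mostly move color versus structure. A sketch, where generator, face_w, and style_w are assumed to follow the repo's rosinality-style interface:

from torchvision.utils import save_image

for idx in range(generator.n_latent):
    probe = face_w.clone()
    probe[:, idx] = style_w[:, idx]
    img, _ = generator([probe], input_is_latent=True)
    save_image(img, "swap_%02d.png" % idx, normalize=True)
# Inspecting the resulting grid shows which indices change the palette
# versus which change geometry.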

Apply JoJoGAN on car

Hi all, has anyone tried to apply JoJoGAN to car images (or other kinds of images)? I tried replacing both the e4e pretrained weight file and the StyleGAN2 pretrained weight file with the car-specific ones and then fine-tuned the StyleGAN generator, but the result was not good. It seems that the generator was not fine-tuned at all...

Change the encoder to ReStyle

Hello, I would like to ask about the encoder. I read in the paper that you use ReStyle for GAN inversion, but the encoder used in the code seems to be e4e, and the Colab has not been updated. I want to reproduce the effect from the paper (using ReStyle) in the Colab. What should I do? Thanks.

Face not detected. Try a different image.

Tried using 3-4 different images, but getting the same error in all cases.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-59b843d3fc1f> in <module>()
     13 try:
---> 14     aligned_face = align_face(filepath)
     15 except:

2 frames
/content/JoJoGAN/util.py in align_face(filepath, output_size, transform_size, enable_padding)
    131     predictor = dlib.shape_predictor("models/dlibshape_predictor_68_face_landmarks.dat")
--> 132     lm = get_landmark(filepath, predictor)
    133 

/content/JoJoGAN/util.py in get_landmark(filepath, predictor)
    109 
--> 110     img = dlib.load_rgb_image(filepath)
    111     dets = detector(img, 1)

RuntimeError: Unable to open file: test_input/content/JoJoGAN/pic.jpeg

During handling of the above exception, another exception occurred:

Exception                                 Traceback (most recent call last)
<ipython-input-16-59b843d3fc1f> in <module>()
     14     aligned_face = align_face(filepath)
     15 except:
---> 16     raise Exception('Face not detected. Try a different image.')
     17 
     18 # my_w = restyle_projection(aligned_face, name, device, n_iters=1).unsqueeze(0)

Exception: Face not detected. Try a different image.
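
Note that the bare except in that cell hides the real error, which in this traceback is a wrong file path ("test_input/content/JoJoGAN/pic.jpeg"), not a detection failure. A sketch that separates the two cases, reusing the notebook's own names:

import os

assert os.path.exists(filepath), "File not found: %s" % filepath  # catches path typos first
try:
    aligned_face = align_face(filepath)
except AssertionError:
    raise Exception('Face not detected. Try a different image.')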

Hitting Google Drive limits; use torch/Hugging Face hub

We're hitting Drive limits when downloading the models. A solution other than PyDrive would be to attach the weights to a project release.

See torch hub:

https://pytorch.org/docs/stable/hub.html

"Pretrained weights can either be stored locally in the github repo, or loadable by torch.hub.load_state_dict_from_url(). If less than 2GB, it’s recommended to attach it to a project release and use the url from the release. In the example above torchvision.models.resnet.resnet18 handles pretrained, alternatively you can put the following logic in the entrypoint definition."

A similar example is the animegan2 hubconf, although the weights there are much smaller in size:

https://github.com/bryandlee/animegan2-pytorch/blob/main/hubconf.py

or the models can be hosted on huggingface, see

https://huggingface.co/models
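
A sketch of the release-hosted-weights approach (the URL below is hypothetical): attach the .pt files to a GitHub release and fetch them with torch.hub, which also caches the download locally.

import torch

url = "https://github.com/mchong6/JoJoGAN/releases/download/v1.0/stylegan2-ffhq.pt"  # hypothetical
state_dict = torch.hub.load_state_dict_from_url(url, map_location="cpu", progress=True)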

requirements.txt

Could you please add a requirements.txt file?
I would like to run the code locally in a Docker container.
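
Until one is added, a minimal starting point can be inferred from the imports visible in the tracebacks above; the entries marked as assumptions (and all version pins) would need to be checked against the Colab.

# hypothetical requirements.txt -- packages inferred from this page, versions unpinned
torch
torchvision
dlib
lpips
tqdm
ninja  # assumption: needed to JIT-compile the op/ CUDA extensions
scipy  # assumption: commonly used by FFHQ-style face alignment code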

Other than faces?

How can I run this on something other than faces? When I use a non-face image, it tells me that no face was detected.

pip dependency error

Context: running setup on Colab
Error:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
imbalanced-learn 0.8.1 requires scikit-learn>=0.24, but you have scikit-learn 0.22 which is incompatible.

style image pairs num

Hi, I have some questions about the number of fine-tuning data pairs. In the "Finetune StyleGAN" part of stylize.ipynb, I noticed that the variable random_alpha is never used. If I use only one reference style image, do I then have only one pair to fine-tune StyleGAN with? Could you please tell me what I am doing wrong? Thanks a lot.

Style modulation layers, style parameters and controlling low-level features.

Hey, great work!
I had a couple of queries:

  1. In the paper it is mentioned that there are 26 style modulation layers, but in the code it seems to be 18, as n_latent = 18 (see the counting sketch after this list).
  2. What exactly do the style parameters s(w) correspond to in the code?
  3. For the pretrained styles, is there any way to control low-level features like eyes, nose, etc. without training again? I know that while fine-tuning we can use blending (using RIS) and different masks to control them, but is there any way to do this for a model that is already fine-tuned?
  4. I see a change in results when fine-tuning the model using JoJo's photo. Is that because e4e is used instead of ReStyle in the code?
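
On question 1, one plausible reconciliation (an interpretation, not confirmed by the authors) is that 26 counts every modulated convolution including the to_rgb layers, while n_latent = 18 counts W+ entries:

import math

log_size = int(math.log2(1024))      # 10 for a 1024x1024 generator
n_resolutions = log_size - 1         # 9 resolutions: 4x4 up to 1024x1024
convs = 1 + 2 * (n_resolutions - 1)  # 17 modulated convolutions
to_rgbs = n_resolutions              # 9 modulated to_rgb convolutions
print(convs + to_rgbs)               # 26 style modulation layers
print(2 * log_size - 2)              # 18 = n_latent (W+ entries)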

ModelError

replicate.exceptions.ModelError: stack expects a non-empty TensorList

Have you ever tried decreasing the number of style modulation layers?

Thanks for the amazing work.
I am reading the paper and found that you mention 26 style modulation layers being used to map features into style space.
Are these style modulation layers the same as the MLP layers in mapping_network?
Also, if I reduce this number from 26 to 8, how much will the quality drop?
I appreciate your reply in advance.

Using another pretrained StyleGAN2

Hi,

I'm playing with your notebook (awesome work, by the way!) and I tried to feed it another pretrained GAN from Awesome Pretrained StyleGAN2.

I used the anime one (PyTorch implementation from here) but I get

TypeError                                 Traceback (most recent call last)

<ipython-input-10-2395bdddca96> in <module>()
     22 
     23 #print(ckpt)
---> 24 generator.load_state_dict(ckpt["g"], strict=False)
     25 
     26 #@title Generate results

TypeError: 'Generator' object is not subscriptable

Do I need to perform further operations on the model to make it compatible?

Thank you
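
A defensive-loading sketch (generator is assumed from the notebook; the filename is hypothetical): some pretrained files pickle the Generator module itself rather than a dict of state_dicts, which is exactly what makes ckpt["g"] fail.

import torch
from torch import nn

ckpt = torch.load("anime.pt", map_location="cpu")  # hypothetical filename
if isinstance(ckpt, dict):
    state_dict = ckpt.get("g_ema", ckpt.get("g"))  # rosinality-style checkpoint dict
elif isinstance(ckpt, nn.Module):
    state_dict = ckpt.state_dict()                 # a pickled model object
generator.load_state_dict(state_dict, strict=False)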
