
Comments (48)

oneThousand1000 commented on August 15, 2024

Hi guys, I released my EG3D inversion code for your reference; you can find it here: EG3D-projector.

luminohope commented on August 15, 2024

FYI, we added additional scripts that can preprocess in-the-wild images to be compatible with the FFHQ checkpoints.
Hope that is useful.
#18 (comment)

oneThousand1000 commented on August 15, 2024

FYI, we added additional scripts that can preprocess in-the-wild images to be compatible with the FFHQ checkpoints. Hope that is useful. #18 (comment)

Hi! I found that all the faces in FFHQ Processed Data (downloaded from the Google Drive link you provided) are rotated so that the two eyes lie on a horizontal line, but the uploaded scripts seem to do no rotation. Does this matter?

The first image below is the one from FFHQ Processed Data; I processed the raw image of 00000 using your uploaded scripts and got the second image.

I also found that the uploaded scripts output camera parameters that are different from those in dataset.json. Maybe this is caused by the missing rotation?

The camera parameters predicted by the uploaded scripts (4x4 cam2world rows, then 3x3 intrinsics rows):

[0.944381833076477, -0.011193417012691498, 0.32866042852401733, -0.828210463398311,
 -0.010220649652183056, -0.9999367594718933, -0.004687247332185507, 0.005099154064645238,
 0.32869213819503784, 0.0010674281511455774, -0.9444364905357361, 2.5698329570120664,
 0.0, 0.0, 0.0, 1.0,
 4.2647, 0.0, 0.5,
 0.0, 4.2647, 0.5,
 0.0, 0.0, 1.0]
The camera parameters in dataset.json:

[0.9422833919525146, 0.034289587289094925, 0.3330560326576233, -0.8367999667889383,
 0.03984849900007248, -0.9991570711135864, -0.009871904738247395, 0.017018394869192363,
 0.33243677020072937, 0.022573914378881454, -0.9428553581237793, 2.566997504832856,
 0.0, 0.0, 0.0, 1.0,
 4.2647, 0.0, 0.5,
 0.0, 4.2647, 0.5,
 0.0, 0.0, 1.0]

[Image 1: 00000 from FFHQ Processed Data. Image 2: the raw 00000 processed with the uploaded scripts.]

luminohope commented on August 15, 2024

Hi,

You are right that the alignment script from Deep3DFaceRecon does not do rotation alignment, but it should put the alignment in the right ballpark, which should be good enough for downstream tasks like GAN inversion.

Please note that for FFHQ preprocessing you should NOT use this script. Instead, use the FFHQ preprocessing script for a proper reproduction. The roll alignment is taken care of as part of the FFHQ preprocessing script here. The rest of the processing is the same as in the new "in-the-wild" script.

e4s2022 commented on August 15, 2024

@mlnyang, I got similar results to yours.

I use the well-aligned & cropped FFHQ images (at 1024 resolution) and resize them to 512 for the subsequent PTI inversion. To be more specific, say I choose "00999.png" as the input. Since the camera parameters (25 values = 4x4 extrinsics + 3x3 intrinsics) are provided in dataset.json, I use them directly. The camera parameters are fixed while the w latent code is trainable. The following are my results:
[Image: blurry inversion result for 00999 using the pre-aligned 1024 FFHQ image]

However, when I follow the FFHQ preprocessing steps in EG3D, which basically consist of (1) aligning & cropping the in-the-wild image to size 1500, (2) re-aligning to 1024 & center cropping to 700, and (3) resizing to 512, the results look good (see the crop-geometry sketch after the image):
[Image: sharp inversion result after the EG3D FFHQ preprocessing]
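For reference, a rough sketch of the crop geometry in steps (2) and (3), assuming a 1500x1500 aligned input; the real EG3D pipeline re-runs landmark alignment at the 1024 stage, so treat this as an outline only, not the official script:

```python
from PIL import Image

def recrop_for_eg3d(img_1500: Image.Image) -> Image.Image:
    """Outline of the 1500 -> 1024 -> 700 -> 512 recrop described above."""
    img = img_1500.resize((1024, 1024), Image.LANCZOS)  # step (2a): re-align/resize to 1024
    off = (1024 - 700) // 2                             # step (2b): 700x700 center crop
    img = img.crop((off, off, off + 700, off + 700))
    return img.resize((512, 512), Image.LANCZOS)        # step (3): final 512x512 network input

# hypothetical usage:
# out = recrop_for_eg3d(Image.open('realign1500/00999.png'))
```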

I guess the difference in the underlying preprocessing might be the reason. When you tried PTI on the joker image, how did you preprocess it?

oneThousand1000 commented on August 15, 2024

Hi!
I think the [image align code](https://github.com/Puzer/stylegan-encoder/blob/master/align_images.py) provided by [stylegan-encoder](https://github.com/Puzer/stylegan-encoder) may be useful for rotating the image, if you want to get re-aligned images similar to those in FFHQ Processed Data.

Maybe for in-the-wild images, the rotation is an unnecessary step.

I modified the code and used it to rotate and crop the image; after rotation, the resulting image is consistent with the one in FFHQ Processed Data.
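For intuition, the rotation that align code applies is driven by the eye line. A simplified sketch of just the roll computation (the actual script builds a full source quad from eye and mouth landmarks; dlib's 68-point landmark convention is assumed here):

```python
import numpy as np

def eye_roll_degrees(landmarks: np.ndarray) -> float:
    """Roll angle that would make the inter-ocular line horizontal.

    landmarks: (68, 2) array in dlib's 68-point convention.
    """
    eye_left = landmarks[36:42].mean(axis=0)   # left-eye centroid
    eye_right = landmarks[42:48].mean(axis=0)  # right-eye centroid
    dx, dy = eye_right - eye_left
    return float(np.degrees(np.arctan2(dy, dx)))

# Rotating the image by -eye_roll_degrees(lm) levels the eyes before cropping.
```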

cantonioupao commented on August 15, 2024

Is it possible to share the code for the PTI inversion?

zhangqianhui commented on August 15, 2024

What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?

oneThousand1000 commented on August 15, 2024

What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?

Yes, my code is based on the w projector of PTI. The inversion seems to work best on portraits that look straight ahead. I think I achieved the same performance as the authors'.
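For readers wiring this up themselves, one common way to build the first-step objective is an LPIPS perceptual term plus a pixel-wise term; the weighting below is illustrative, not PTI's exact configuration, and G, w_pivot, c, target, and device are assumed to already exist:

```python
import lpips  # pip install lpips
import torch.nn.functional as F

lpips_fn = lpips.LPIPS(net='alex').to(device)  # perceptual distance

synth = G.synthesis(w_pivot, c)['image']       # rendered image in [-1, 1]
loss = lpips_fn(synth, target).mean() + 0.1 * F.mse_loss(synth, target)
loss.backward()                                # w_pivot is the only trainable tensor
```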

IMG_0604.MOV

zhangqianhui commented on August 15, 2024

Ok, great!

jiaxinxie97 commented on August 15, 2024

What about your results for the first step (inversion)? Do you just use the LPIPS loss, following PTI?

Yes, my code is based on the w projector of PTI. The inversion seems to work best on portraits that look straight ahead. I think I achieved the same performance as the authors'.

IMG_0604.MOV

Hi,

I tried to invert this portrait, but it seems the optimization cannot recover the correct eyeglasses shape, even though it still produces a reasonable result in the input view. I want to ask: did you only get the eyeglasses after the pivot optimization (which I cannot achieve), so that the shape is preserved during the generator fine-tuning? Or did you add other regularization?

Thank you for your time!

fitting1000_PTI.mp4

oneThousand1000 commented on August 15, 2024

@jiaxinxie97
Hi jiaxinxie97,
I used the original EG3D checkpoints to generate a video from the latent code; the eyeglasses are reconstructed successfully, which indicates that I got the eyeglasses before the pivot optimization.

I think you can check your projector code; I used this one (both w and w_plus work). The zip file I uploaded contains the input re-aligned image and the input camera parameters, so you can check whether they are consistent with yours.

Video generated by the original EG3D checkpoints:
https://user-images.githubusercontent.com/32099648/174772757-d316bc1d-de52-49a4-a863-6de166000450.mp4

input re-aligned image and the input camera parameters:
01457.zip

jiaxinxie97 commented on August 15, 2024

Thanks! I also use the PTI repo, but strangely I cannot reconstruct the eyeglasses using w or w+ space optimization; I will check! Since the original EG3D checkpoint does not have named_buffers(), I removed the reg_loss. Will that affect the results?

oneThousand1000 commented on August 15, 2024

Thanks! I also use the PTI repo, but strangely I cannot reconstruct the eyeglasses using w or w+ space optimization; I will check! Since the original EG3D checkpoint does not have named_buffers(), I removed the reg_loss. Will that affect the results?

Hi, named_buffers() is an attribute of the synthesis network in StyleGAN2; you can find it in the StyleGAN2Backbone (self.backbone) of the TriPlaneGenerator.

Try using G.backbone.synthesis.named_buffers() instead of G.named_buffers(), and add the reg_loss back.
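For concreteness, a sketch of that substitution, using the multi-scale noise penalty from the StyleGAN2 projector that PTI builds on (G is assumed to be a loaded TriPlaneGenerator; names are illustrative):

```python
import torch
import torch.nn.functional as F

# Collect the per-layer noise buffers from the StyleGAN2 backbone, not from G itself.
noise_bufs = {
    name: buf for name, buf in G.backbone.synthesis.named_buffers()
    if 'noise_const' in name
}

def noise_regularization(noise_bufs):
    """Penalize spatial autocorrelation of the noise maps at multiple scales."""
    reg_loss = 0.0
    for v in noise_bufs.values():
        noise = v[None, None, :, :]  # 1 x 1 x H x W
        while True:
            reg_loss += (noise * torch.roll(noise, shifts=1, dims=3)).mean() ** 2
            reg_loss += (noise * torch.roll(noise, shifts=1, dims=2)).mean() ** 2
            if noise.shape[2] <= 8:
                break
            noise = F.avg_pool2d(noise, kernel_size=2)
    return reg_loss
```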

jiaxinxie97 commented on August 15, 2024

G.backbone.synthesis.named_buffers()

Hi, thank you! I got a reasonable result for the eyeglasses.

e4s2022 commented on August 15, 2024

Hi, @oneThousand1000

Did you set both z and c as trainable parameters during the GAN inversion? I guess fixing c (which can be obtained from dataset.json) and only inverting z is more reasonable. What do you think?

oneThousand1000 commented on August 15, 2024

Hi, @oneThousand1000

Did you set both z and c as trainable parameters during the GAN inversion? I guess fixing c (which can be obtained from dataset.json) and only inverting z is more reasonable. What do you think?

I set w or w_plus as the trainable parameters and fix c.
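A minimal sketch of that setup, assuming EG3D's usual dataset.json layout ({"labels": [[filename, 25 floats], ...]}) and a loaded TriPlaneGenerator G; the filename and variable names are illustrative:

```python
import json
import torch

device = 'cuda'

# Fixed camera: look up the 25-value label for the target image.
with open('dataset.json') as f:
    labels = dict(json.load(f)['labels'])
c = torch.tensor(labels['00999.png'], device=device).unsqueeze(0)  # [1, 25], never optimized

# Trainable pivot: initialize w from the mapping network, then optimize it alone.
z = torch.randn([1, G.z_dim], device=device)
w_pivot = G.mapping(z, c).detach().clone().requires_grad_(True)    # [1, num_ws, w_dim]
optimizer = torch.optim.Adam([w_pivot], lr=0.01)                   # note: c is not a parameter

img = G.synthesis(w_pivot, c)['image']  # render w_pivot for the reconstruction loss
```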

e4s2022 commented on August 15, 2024

Got it, thanks for your reply.

BTW, did you follow the FFHQ preprocessing steps in EG3D (i.e., re-align in-the-wild images to 1500 and then resize to 512), or did you directly use the well-aligned 1024 FFHQ images and just resize them to 512?

mlnyang commented on August 15, 2024

Hi @oneThousand1000,

Do you have any out-of-domain results?
I tried PTI myself with the FFHQ checkpoint; it works well on the joker image but fails on the CelebA-HQ dataset.
[Images: celeba_out (failed CelebA-HQ inversion), joker_out (successful joker inversion)]

oneThousand1000 commented on August 15, 2024

BTW, did you follow the FFHQ preprocessing steps in EG3D (i.e., re-align in-the-wild images to 1500 and then resize to 512), or did you directly use the well-aligned 1024 FFHQ images and just resize them to 512?

I followed the FFHQ preprocessing steps in EG3D.

mlnyang commented on August 15, 2024

Hi @bd20222, thanks for sharing your work.

I think that's the main reason. Actually, the joker image originally came from the PTI repo; maybe it had already been preprocessed in the FFHQ style.

e4s2022 commented on August 15, 2024

I took a look at the PTI alignment script; it seems the same as the original FFHQ one.

I inspected the EG3D preprocessing and compared it with the original FFHQ pipeline. AFAIK, there is no center-cropping step in the original FFHQ preprocessing, so the faces used in EG3D show some vertical translation relative to it. I guess the well-trained EG3D model has captured this pattern, which explains the blurry PTI inversion; the subsequent synthetic novel views then look like a mixture of two faces.

It's interesting that the joker example works well.

oneThousand1000 commented on August 15, 2024

Hi, please follow the "Preparing datasets" section in the README to get re-aligned images. According to #16 (comment), the original FFHQ dataset does not work with the camera parameters in dataset.json; you should predict the camera parameters for the original FFHQ images yourself.

e4s2022 commented on August 15, 2024

@oneThousand1000

Yeah, I agree. For those who want to directly use the well-aligned 1024 FFHQ images, you have to predict the camera parameters with Deep3DFace_pytorch yourself. But I haven't tested this on the EG3D pre-trained model.

oneThousand1000 commented on August 15, 2024

@oneThousand1000

Yeah, I agree. For those who want to directly use the well-aligned 1024 FFHQ images, you have to predict the camera parameters with Deep3DFace_pytorch yourself.

You can email the author and ask for the pose-extraction code, or refer to #18.

mlnyang commented on August 15, 2024

Oh, I see... center cropping was the problem.
I just tried other examples in PTI and they didn't work. It is strange that the joker image works well.
Thanks for your help!! :)

zhangqianhui commented on August 15, 2024

@oneThousand1000 Do you use the noise regularization loss in the first GAN inversion step?

oneThousand1000 commented on August 15, 2024

@oneThousand1000 Do you use the noise regularization loss in the first GAN inversion step?

See #28 (comment)

zhangqianhui commented on August 15, 2024

Thanks

zhangqianhui commented on August 15, 2024

@oneThousand1000 I still have one question: will the parameters of the tri-plane decoder also be tuned? I used another 3D GAN model (StyleSDF), which doesn't have the tri-plane generator, and I found that fine-tuning the MLP parameters harmed the geometry.

BiboGao commented on August 15, 2024

Hi, @oneThousand1000,

I tried to use PTI to get the pivot of an image; then, in gen_video.py, I used the pivot to set zs, which is originally set from random seeds: "zs = torch.from_numpy(np.stack([np.random.RandomState(seed).randn(G.z_dim) for seed in all_seeds])).to(device)".
I got the video, but the image is totally different from the previous one. Did I miss anything in the connection between PTI and EG3D?
I noticed you said "I optimize the latent code 'w' and use it as the pivot to finetune eg3d." Do you mean we need to generate the dataset and call "train.py" to fine-tune? If we have already fine-tuned, why do we need PTI? I thought PTI was a way to get the latent code as conditioning for EG3D.
Thanks for your help.

oneThousand1000 commented on August 15, 2024

Hi, @oneThousand1000,

I tried to use PTI to get the pivot of an image; then, in gen_video.py, I used the pivot to set zs, which is originally set from random seeds: "zs = torch.from_numpy(np.stack([np.random.RandomState(seed).randn(G.z_dim) for seed in all_seeds])).to(device)". I got the video, but the image is totally different from the previous one. Did I miss anything in the connection between PTI and EG3D? I noticed you said "I optimize the latent code 'w' and use it as the pivot to finetune eg3d." Do you mean we need to generate the dataset and call "train.py" to fine-tune? If we have already fine-tuned, why do we need PTI? I thought PTI was a way to get the latent code as conditioning for EG3D. Thanks for your help.

Hi, you need to feed the zs into the mapping network of EG3D to get the w or ws latent code, and then optimize w or ws. Please refer to https://github.com/danielroich/PTI/tree/main/training/projectors or the StyleGAN paper for the definition of the w/ws latent codes.
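A short sketch of that substitution in gen_video.py terms, assuming a loaded TriPlaneGenerator G and a [1, 25] camera tensor c (c_novel_view below is a hypothetical camera for the frame being rendered):

```python
import torch

# Map once from z to the ws latent instead of re-sampling zs from seeds;
# PTI then optimizes ws and fine-tunes G around it.
z = torch.randn([1, G.z_dim], device=device)
ws = G.mapping(z, c, truncation_psi=0.7)         # [1, num_ws, w_dim]

# After the w optimization / generator fine-tuning, render novel views from ws:
frame = G.synthesis(ws, c_novel_view)['image']
```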

BiboGao commented on August 15, 2024

I got it, thanks.

zhangqianhui commented on August 15, 2024

Thanks for your help, I have also obtained realistic results.

lyx0208 commented on August 15, 2024

@bd20222, hello! Which dataset.json do you use? I use ffhq-dataset-v2.json, but it has no camera parameters.

e4s2022 commented on August 15, 2024

@lyx0208, hi, you have to preprocess the dataset in advance; the details can be found here. As for your question, the camera parameters provided by the authors can be downloaded with:

print("Downloading cropping params...")
gdown.download('https://drive.google.com/uc?id=1KdVf2lIepGECRaANGhfuR7mDpJ5nfb9K', 'realign1500/cropping_params.json', quiet=False)

lyx0208 commented on August 15, 2024

@bd20222, got it, thanks!

oneThousand1000 commented on August 15, 2024

Hi,

You are right that the alignment script from Deep3DFaceRecon does not do rotation alignment, but it should put the alignment in the right ballpark, which should be good enough for downstream tasks like GAN inversion.

Please note that for FFHQ preprocessing you should NOT use this script. Instead, use the FFHQ preprocessing script for a proper reproduction. The roll alignment is taken care of as part of the FFHQ preprocessing script. The rest of the processing is the same as in the new "in-the-wild" script.

Thanks for your guidance!

jiaxinxie97 commented on August 15, 2024

Hi! I think the [image align code](https://github.com/Puzer/stylegan-encoder/blob/master/align_images.py) provided by [stylegan-encoder](https://github.com/Puzer/stylegan-encoder) may be useful for rotating the image, if you want to get re-aligned images similar to those in FFHQ Processed Data.

Maybe for in-the-wild images, the rotation is an unnecessary step.

I modified the code and used it to rotate and crop the image; after rotation, the resulting image is consistent with the one in FFHQ Processed Data.

Hi Yiqian,
You listed two repos for the alignment. I used the process_image function in align_multiprocess.py

def process_image(kwargs):#item_idx, item, dst_dir="realign1500", output_size=1500, transform_size=4096, enable_padding=True):
to eliminate the effect of roll. Are these operations equivalent?

oneThousand1000 commented on August 15, 2024

Hi! I think the [image align code](https://github.com/Puzer/stylegan-encoder/blob/master/align_images.py) provided by [stylegan-encoder](https://github.com/Puzer/stylegan-encoder) may be useful for rotating the image, if you want to get re-aligned images similar to those in FFHQ Processed Data.
Maybe for in-the-wild images, the rotation is an unnecessary step.
I modified the code and used it to rotate and crop the image; after rotation, the resulting image is consistent with the one in FFHQ Processed Data.

Hi Yiqian, you listed two repos for the alignment. I used the process_image function in align_multiprocess.py

def process_image(kwargs):#item_idx, item, dst_dir="realign1500", output_size=1500, transform_size=4096, enable_padding=True):

to eliminate the effect of roll. Are these operations equivalent?

Hi jiaxin,

I think the image align code outputs the same results as the process_image function in align_multiprocess.py.

Please see https://github.com/Puzer/stylegan-encoder/blob/1e7e47f9bbb0ca391cdc250af5ad2468250a803c/ffhq_dataset/face_alignment.py#L7; it seems they are the same function.

fungtion commented on August 15, 2024

@oneThousand1000 Can you share your PTI inversion code? I tried to revise it myself but always run out of memory.

oneThousand1000 commented on August 15, 2024

@oneThousand1000 Can you share your PTI inversion code? I tried to revise it myself but always run out of memory.

Sure, I will add you to my private repo.

fungtion commented on August 15, 2024

@oneThousand1000 Can you share your PTI inversion code? I tried to revise it myself but always run out of memory.

Sure, I will add you to my private repo.

Thanks.

fmac2000 commented on August 15, 2024

@oneThousand1000 Can you share your PTI inversion code? I tried to revise it myself but always run out of memory.

Sure, I will add you to my private repo.

Hey oneThousand, would it be too cheeky to ask for an invite too? I would love to try it <3 -- btw, awesome hifi3dface repo, I am going to try it out next week!

BbChip0103 commented on August 15, 2024

@oneThousand1000 Hi, oneThousand1000.
Can you invite me to your private repo? I tried to migrate PTI, but it's hard work for me T.T

oneThousand1000 commented on August 15, 2024

@oneThousand1000 Hi, oneThousand1000. Can you invite me to your private repo? I tried to migrate PTI, but it's hard work for me T.T

See #28 (comment).

juanjomon commented on August 15, 2024

oneThousand1000... great job, but it would be great if you could implement it in a Colab notebook. Indeed, I think Nvidia or someone else should implement all this in Colab, as there are many people who do not have the knowledge to set all this up. Thanks for your work.

[deleted user] commented on August 15, 2024

Hello, I would like to ask a question. I see that the paper mentions using PTI to invert an image to a latent code. I have a photo of a car and want to get its latent code for the EG3D network, but PTI does not seem to support inversion for vehicles, or at least does not provide a pre-trained model on a cars dataset. I would like to ask for advice on how to invert a vehicle image. Thanks a lot!
