vvictoryuki / freedom
[ICCV 2023] Official PyTorch implementation for the paper "FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model"
Your work is very impressive and interests me a lot. However, I would like to ask about the description of the Gram matrix in the paper. Could you please tell me why CLIP is used to obtain the feature vectors and compute the Gram matrix instead of VGG?
Hi,
Thank you very much for the paper, it is super interesting.
Do you know when you plan on releasing the code?
Best,
When are you going to release the code?
Thank you for your impressive work.
When I use the "FreeDoM-CN-style/faceID" example and run python pose2image.py --seed 1234 --timesteps 100 --prompt "young man, realitic photo" --pose_ref "./test_imgs/pose4.jpg" --id_ref "./test_imgs/id3.png"
some errors happened:
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_sd15_openpose.pth]
Traceback (most recent call last):
File "/home/user2/models/FreeDoM-main/CN/pose2image.py", line 31, in <module>
model.load_state_dict(load_state_dict('./models/control_sd15_openpose.pth', location='cuda'))
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ControlLDM:
Unexpected key(s) in state_dict: "cond_stage_model.transformer.text_model.embeddings.position_ids".
It seems the SD pre-trained model parameters failed to load, so I tried putting the model on CUDA before loading the parameters:
model = create_model('./models/cldm_v15.yaml').cpu()
model = model.cuda()
model.load_state_dict(load_state_dict('./models/control_sd15_openpose.pth', location='cuda'))
but then another error occurred:
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_sd15_openpose.pth]
Traceback (most recent call last):
File "/home/user2/models/FreeDoM-main/CN/pose2image.py", line 35, in <module>
ddim_sampler = DDIMSampler(model, add_condition_mode="face_id", ref_path=args.pose_ref, add_ref_path=args.id_ref, no_freedom=args.no_freedom)
File "/home/user2/models/FreeDoM-main/CN/cldm/ddim_hacked.py", line 128, in __init__
self.idloss = IDLoss(ref_path=add_ref_path).cuda()
File "/home/user2/models/FreeDoM-main/CN/cldm/arcface/model.py", line 12, in __init__
self.facenet.load_state_dict(torch.load("cldm/arcface/model_ir_se50.pth"))
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 1028, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 1231, in _legacy_load
return legacy_load(f)
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 1117, in legacy_load
tar.extract('storages', path=tmpdir)
File "/home/user2/anaconda3/lib/python3.10/tarfile.py", line 2081, in extract
tarinfo = self.getmember(member)
File "/home/user2/anaconda3/lib/python3.10/tarfile.py", line 1803, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"
I don't know how to solve this. Is it because I'm using the wrong pre-trained model? I downloaded it from lllyasviel/ControlNet at main on huggingface.co.
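For what it's worth, the first error is a known incompatibility: newer versions of transformers serialize a position_ids buffer that older model definitions do not declare. One possible workaround (a sketch, not the repository's code; the key name is taken from the traceback above, and passing strict=False to load_state_dict is an alternative) is to filter the offending key out of the checkpoint dictionary before loading:

```python
def strip_unexpected_keys(state_dict, unexpected_keys):
    """Return a copy of a checkpoint dict without the listed keys.

    Useful when a checkpoint saved with a newer transformers version
    contains buffers (e.g. ``...embeddings.position_ids``) that the
    current model definition does not declare.
    """
    drop = set(unexpected_keys)
    return {k: v for k, v in state_dict.items() if k not in drop}

# usage (sketch):
# sd = load_state_dict('./models/control_sd15_openpose.pth', location='cuda')
# sd = strip_unexpected_keys(sd, [
#     "cond_stage_model.transformer.text_model.embeddings.position_ids",
# ])
# model.load_state_dict(sd)
```

The second error (KeyError: "filename 'storages' not found") usually means the checkpoint file itself is not a valid PyTorch archive, e.g. a truncated download or a Git LFS pointer file; re-downloading model_ir_se50.pth and checking its file size is worth trying before changing any code.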
Thanks for sharing the great work! I'm not sure where the formula comes from; I've checked the cited paper "Denoising Diffusion Probabilistic Models", which presents a different formulation. It is also different from the Langevin dynamics sampling formula. Could you please clarify this?
Hello, I am interested in the code you posted; thank you for sharing. What puzzles me is that there is not much discussion of the scale factor in the paper.
In SD Style, rho appears to be a learning rate applied to both the gradient and the classifier-guidance effect, as shown below.
However, in Face ID, rho is equal to at.sqrt(), as follows:
So, how exactly should rho be set, and is there some mathematical theory to support it? Thank you!
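To make the question concrete, here is a toy sketch (not the repository's code; the quadratic energy and all names are made up for illustration) of how rho acts as a step size on the energy gradient in a guided update:

```python
def energy(x, c):
    # Toy quadratic energy: half the squared distance between sample x
    # and condition c (both given as plain lists of floats).
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def guided_update(x, c, rho):
    # The gradient of the toy energy w.r.t. x is simply (x - c);
    # rho scales the correction, playing the role of a learning rate.
    grad = [xi - ci for xi, ci in zip(x, c)]
    return [xi - rho * gi for xi, gi in zip(x, grad)]
```

With rho = 1 the update lands exactly on c for this energy; smaller rho takes a partial step. That is why a schedule-dependent choice such as at.sqrt() changes the effective guidance strength across timesteps.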
Hi, thanks for sharing your results!
I'm afraid I did not really get how you derived Eq. 4: if I'm not mistaken,
∇ log p(c ∣ xₜ) = − λ ∇ ℰ(c, xₜ) + λ 𝔼 [ ∇ ℰ(c, xₜ) ],
where the gradient ∇ is taken w.r.t. xₜ, and the expectation 𝔼 is over p(c ∣ xₜ).
Why have you decided to ignore the second term? Thank you in advance!
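For reference, assuming the Boltzmann form $p(c \mid x_t) = \exp(-\lambda \mathcal{E}(c, x_t)) / Z(x_t)$ with $Z(x_t) = \int \exp(-\lambda \mathcal{E}(c, x_t))\, dc$, the identity in question follows directly:

```latex
\nabla_{x_t} \log p(c \mid x_t)
  = -\lambda \nabla_{x_t} \mathcal{E}(c, x_t) - \nabla_{x_t} \log Z(x_t),
\qquad
\nabla_{x_t} \log Z(x_t)
  = \frac{1}{Z(x_t)} \int -\lambda \nabla_{x_t} \mathcal{E}(c, x_t)\,
      e^{-\lambda \mathcal{E}(c, x_t)}\, dc
  = -\lambda\, \mathbb{E}_{p(c \mid x_t)}\!\left[\nabla_{x_t} \mathcal{E}(c, x_t)\right],
```

so the second term is exactly $+\lambda\, \mathbb{E}_{p(c \mid x_t)}[\nabla_{x_t} \mathcal{E}(c, x_t)]$.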
What is the minimum VRAM required to run this? 24 GB?
Thank you so much for your work, it's very inspiring!
I met a problem when running run.sh: the error is [Errno 2] No such file or directory: './models/control_sd15_scribble.pth', and the associated code is as follows:
│ /data/0shared/yangling/zheming/FreeDoM/CN/scribble2image.py:29 in <module> │
│ │
│ 26 │
│ 27 │
│ 28 model = create_model('./models/cldm_v15.yaml').cpu() │
│ ❱ 29 model.load_state_dict(load_state_dict('./models/control_sd15_scribble.pth', location='cu │
│ 30 model = model.cuda() │
I have no idea where to download this, hope that you'll help me with it. Thank you so much!
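The missing file is a ControlNet checkpoint hosted in the lllyasviel/ControlNet repository on Hugging Face. Assuming a recent huggingface_hub is installed (the exact CLI flags may differ across versions), one way to fetch it is:

```shell
# Download the scribble ControlNet checkpoint into ./models/
# (repo path: models/control_sd15_scribble.pth in lllyasviel/ControlNet)
huggingface-cli download lllyasviel/ControlNet models/control_sd15_scribble.pth --local-dir .
```

Downloading the file manually from the repository's "Files" page on huggingface.co and placing it under ./models/ works just as well.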
Thanks for sharing the great work! I was reading the code for the project, and one thing that confused me is what eta = 0.5 (at line 302, denoising.py) means. I compared this code with Alg. 1 and found it was different.
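For context, in DDIM-style samplers eta interpolates between deterministic DDIM (eta = 0) and ancestral DDPM-like sampling (eta = 1) via the noise scale sigma. A small sketch of the standard formula (variable names are illustrative; alpha_bar denotes the cumulative product of the alphas):

```python
import math

def ddim_sigma(eta, alpha_bar_t, alpha_bar_prev):
    # Standard DDIM noise scale:
    # sigma = eta * sqrt((1 - a_prev) / (1 - a_t)) * sqrt(1 - a_t / a_prev)
    return (eta
            * math.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t))
            * math.sqrt(1.0 - alpha_bar_t / alpha_bar_prev))
```

Since sigma scales linearly with eta, eta = 0.5 injects half the corresponding DDPM-level noise at each step, which would explain a deviation from a fully deterministic Alg. 1.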
Can someone make a Colab? I am eager to try this.
Thank you so much for your code! The work is quite inspiring.
I met a problem when running the styled-image conditioning demo (SD_style/run.sh) and got the following error:
[Errno 2] No such file or directory: '/workspace/stable-diffusion/intermediates/1_1.png'
I have no idea how to work it out; I hope you can help me with it! Thanks a lot!
Is there an example of how to use facial landmarks or human keypoints?
There seem to be only style-control examples in the codebase.
Traceback (most recent call last):
safety_feature_extractor = AutoFeatureExtractor.from_pretrained(safety_model_id)
File "D:\Python\Anaconda\envs\Sd-style\lib\site-packages\transformers\models\auto\feature_extraction_auto.py", line 270, in from_pretrained
config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
File "D:\Python\Anaconda\envs\Sd-style\lib\site-packages\transformers\feature_extraction_utils.py", line 443, in get_feature_extractor_dict
raise EnvironmentError(
OSError: Can't load feature extractor for 'CompVis/stable-diffusion-safety-checker'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'CompVis/stable-diffusion-safety-checker' is the correct path to a directory containing a preprocessor_config.json file
Hi, really awesome work! I have read your paper and noticed that in Table 1 you only compare your method with TediGAN. But as you mention in your related work, there are two other, stronger training-required methods: ControlNet and T2I-Adapter. How do the FID and CLIP scores compare with those two works? In the T2I-Adapter paper, on the COCO dataset with text+sketch, the FID is 16.78. I also measured the FID of ControlNet on COCO with only 1k images, text+sketch, and got 6.09. But in Table 1 you report 70.97. So I'm a little confused, since your generation results are very good; judging from the generated figures, I think your dataset is not COCO.

We are currently working on training-efficiency and inference algorithms for ControlNet, and if such a training-required process can be replaced, there would be no motivation to keep working on training-efficiency algorithms for ControlNet. Thanks very much!