vvictoryuki / freedom
[ICCV 2023] Official PyTorch implementation for the paper "FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model"
Your work is very impressive and interests me a lot. However, I would like to ask about the description of the Gram matrix in the paper. Could you please tell me why CLIP is used to obtain the feature vectors and compute the Gram matrix instead of VGG?
Hi,
Thank you very much for the paper, it is super interesting.
Do you know when you plan on releasing the code?
Best,
When are you going to release the code?
Thank you for your impressive work.
When I use the "FreeDoM-CN-style/faceID" example and run python pose2image.py --seed 1234 --timesteps 100 --prompt "young man, realitic photo" --pose_ref "./test_imgs/pose4.jpg" --id_ref "./test_imgs/id3.png"
some errors happened:
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_sd15_openpose.pth]
Traceback (most recent call last):
File "/home/user2/models/FreeDoM-main/CN/pose2image.py", line 31, in <module>
model.load_state_dict(load_state_dict('./models/control_sd15_openpose.pth', location='cuda'))
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ControlLDM:
Unexpected key(s) in state_dict: "cond_stage_model.transformer.text_model.embeddings.position_ids".
It seems the SD pre-trained model parameters failed to load, so I tried putting the model on CUDA before loading the parameters:
model = create_model('./models/cldm_v15.yaml').cpu()
model = model.cuda()
model.load_state_dict(load_state_dict('./models/control_sd15_openpose.pth', location='cuda'))
but then another error occurred:
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_sd15_openpose.pth]
Traceback (most recent call last):
File "/home/user2/models/FreeDoM-main/CN/pose2image.py", line 35, in <module>
ddim_sampler = DDIMSampler(model, add_condition_mode="face_id", ref_path=args.pose_ref, add_ref_path=args.id_ref, no_freedom=args.no_freedom)
File "/home/user2/models/FreeDoM-main/CN/cldm/ddim_hacked.py", line 128, in __init__
self.idloss = IDLoss(ref_path=add_ref_path).cuda()
File "/home/user2/models/FreeDoM-main/CN/cldm/arcface/model.py", line 12, in __init__
self.facenet.load_state_dict(torch.load("cldm/arcface/model_ir_se50.pth"))
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 1028, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 1231, in _legacy_load
return legacy_load(f)
File "/home/user2/anaconda3/lib/python3.10/site-packages/torch/serialization.py", line 1117, in legacy_load
tar.extract('storages', path=tmpdir)
File "/home/user2/anaconda3/lib/python3.10/tarfile.py", line 2081, in extract
tarinfo = self.getmember(member)
File "/home/user2/anaconda3/lib/python3.10/tarfile.py", line 1803, in getmember
raise KeyError("filename %r not found" % name)
KeyError: "filename 'storages' not found"
I don't know how to solve this. Is it because I'm using the wrong pre-trained model? I downloaded it from lllyasviel/ControlNet at main on huggingface.co.
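For what it's worth, the first error is a known incompatibility: newer versions of transformers serialize a position_ids buffer that older model definitions do not declare. One possible workaround (a sketch, not the repository's code; the key name is taken from the traceback above, and passing strict=False to load_state_dict is an alternative) is to filter the offending key out of the checkpoint dictionary before loading:

```python
def strip_unexpected_keys(state_dict, unexpected_keys):
    """Return a copy of a checkpoint dict without the listed keys.

    Useful when a checkpoint saved with a newer transformers version
    contains buffers (e.g. ``...embeddings.position_ids``) that the
    current model definition does not declare.
    """
    drop = set(unexpected_keys)
    return {k: v for k, v in state_dict.items() if k not in drop}

# usage (sketch):
# sd = load_state_dict('./models/control_sd15_openpose.pth', location='cuda')
# sd = strip_unexpected_keys(sd, [
#     "cond_stage_model.transformer.text_model.embeddings.position_ids",
# ])
# model.load_state_dict(sd)
```

The second error (KeyError: "filename 'storages' not found") usually means the checkpoint file itself is not a valid PyTorch archive, e.g. a truncated download or a Git LFS pointer file; re-downloading model_ir_se50.pth and checking its file size is worth trying before changing any code.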
Thanks for sharing the great work! I'm not sure where the formula comes from; I've checked the cited paper "Denoising Diffusion Probabilistic Models", which presents a different formulation. It is also different from the Langevin dynamics sampling formula. Could you please clarify this?
Hello, I am interested in the code you posted; thank you for sharing. What puzzles me is that there is not much discussion of the scale factor in the paper.
In SD Style, rho appears to be a learning rate applied to both the gradient and the classifier-guidance effect, as shown below.
However, in Face ID, rho is equal to at.sqrt(), as follows:
So, how exactly should rho be set, and is there some mathematical theory to support it? Thank you!
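To make the question concrete, here is a toy sketch (not the repository's code; the quadratic energy and all names are made up for illustration) of how rho acts as a step size on the energy gradient in a guided update:

```python
def energy(x, c):
    # Toy quadratic energy: half the squared distance between sample x
    # and condition c (both given as plain lists of floats).
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def guided_update(x, c, rho):
    # The gradient of the toy energy w.r.t. x is simply (x - c);
    # rho scales the correction, playing the role of a learning rate.
    grad = [xi - ci for xi, ci in zip(x, c)]
    return [xi - rho * gi for xi, gi in zip(x, grad)]
```

With rho = 1 the update lands exactly on c for this energy; smaller rho takes a partial step. That is why a schedule-dependent choice such as at.sqrt() changes the effective guidance strength across timesteps.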
Hi, thanks for sharing your results!
I'm afraid I did not really get how you derived Eq. 4: if I'm not mistaken,
∇ log p(c ∣ xₜ) = − λ ∇ ℰ(c, xₜ) + λ 𝔼 [ ∇ ℰ(c, xₜ) ],
where the gradient ∇ is taken w.r.t. xₜ, and the expectation 𝔼 is over p(c ∣ xₜ).
Why have you decided to ignore the second term? Thank you in advance!
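For reference, assuming the Boltzmann form $p(c \mid x_t) = \exp(-\lambda \mathcal{E}(c, x_t)) / Z(x_t)$ with $Z(x_t) = \int \exp(-\lambda \mathcal{E}(c, x_t))\, dc$, the identity in question follows directly:

```latex
\nabla_{x_t} \log p(c \mid x_t)
  = -\lambda \nabla_{x_t} \mathcal{E}(c, x_t) - \nabla_{x_t} \log Z(x_t),
\qquad
\nabla_{x_t} \log Z(x_t)
  = \frac{1}{Z(x_t)} \int -\lambda \nabla_{x_t} \mathcal{E}(c, x_t)\,
      e^{-\lambda \mathcal{E}(c, x_t)}\, dc
  = -\lambda\, \mathbb{E}_{p(c \mid x_t)}\!\left[\nabla_{x_t} \mathcal{E}(c, x_t)\right],
```

so the second term is exactly $+\lambda\, \mathbb{E}_{p(c \mid x_t)}[\nabla_{x_t} \mathcal{E}(c, x_t)]$.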
What is the minimum VRAM required to run this? 24 GB?
Thank you so much for your work, it's very inspiring!
I met a problem when running run.sh: the error is [Errno 2] No such file or directory: './models/control_sd15_scribble.pth', and the associated code is as follows:
│ /data/0shared/yangling/zheming/FreeDoM/CN/scribble2image.py:29 in <module> │
│ │
│ 26 │
│ 27 │
│ 28 model = create_model('./models/cldm_v15.yaml').cpu() │
│ ❱ 29 model.load_state_dict(load_state_dict('./models/control_sd15_scribble.pth', location='cu │
│ 30 model = model.cuda() │
I have no idea where to download this, hope that you'll help me with it. Thank you so much!
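The missing file is a ControlNet checkpoint hosted in the lllyasviel/ControlNet repository on Hugging Face. Assuming a recent huggingface_hub is installed (the exact CLI flags may differ across versions), one way to fetch it is:

```shell
# Download the scribble ControlNet checkpoint into ./models/
# (repo path: models/control_sd15_scribble.pth in lllyasviel/ControlNet)
huggingface-cli download lllyasviel/ControlNet models/control_sd15_scribble.pth --local-dir .
```

Downloading the file manually from the repository's "Files" page on huggingface.co and placing it under ./models/ works just as well.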
Thanks for sharing the great work! I was reading the code for the project, and one thing that confused me is what eta = 0.5 (at line 302, denoising.py) means. I compared this code with Alg. 1 and found it was different.
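For context, in DDIM-style samplers eta interpolates between deterministic DDIM (eta = 0) and ancestral DDPM-like sampling (eta = 1) via the noise scale sigma. A small sketch of the standard formula (variable names are illustrative; alpha_bar denotes the cumulative product of the alphas):

```python
import math

def ddim_sigma(eta, alpha_bar_t, alpha_bar_prev):
    # Standard DDIM noise scale:
    # sigma = eta * sqrt((1 - a_prev) / (1 - a_t)) * sqrt(1 - a_t / a_prev)
    return (eta
            * math.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t))
            * math.sqrt(1.0 - alpha_bar_t / alpha_bar_prev))
```

Since sigma scales linearly with eta, eta = 0.5 injects half the corresponding DDPM-level noise at each step, which would explain a deviation from a fully deterministic Alg. 1.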
Can someone make a Colab? I am eager to try this.
Thank you so much for your code! The work is quite inspiring.
I met a problem when running the styled-image conditioning demo (SD_style/run.sh) and got the following error:
[Errno 2] No such file or directory: '/workspace/stable-diffusion/intermediates/1_1.png'
I have no idea how to work it out; I hope you can help me with it! Thanks a lot!
Is there an example of how to use facial landmarks or human keypoints?
There seem to be only style-control examples in the codebase.
Traceback (most recent call last):
safety_feature_extractor = AutoFeatureExtractor.from_pretrained(safety_model_id)
File "D:\Python\Anaconda\envs\Sd-style\lib\site-packages\transformers\models\auto\feature_extraction_auto.py", line 270, in from_pretrained
config_dict, _ = FeatureExtractionMixin.get_feature_extractor_dict(pretrained_model_name_or_path, **kwargs)
File "D:\Python\Anaconda\envs\Sd-style\lib\site-packages\transformers\feature_extraction_utils.py", line 443, in get_feature_extractor_dict
raise EnvironmentError(
OSError: Can't load feature extractor for 'CompVis/stable-diffusion-safety-checker'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'CompVis/stable-diffusion-safety-checker' is the correct path to a directory containing a preprocessor_config.json file
Hi, really awesome work! I have read your paper and noticed that in Table 1 you only compare your method with TediGAN. But as you mention in your related work, there are two other, stronger training-required methods: ControlNet and T2I-Adapter. How do the FID and CLIP scores compare with those two works? In the T2I-Adapter paper, on the COCO dataset with text+sketch, the FID is 16.78. I also measured the FID of ControlNet on COCO with only 1k images, text+sketch, and got 6.09. But in Table 1 you report 70.97. So I'm a little confused, since your generation results are very good; judging from the generated figures, I think your dataset is not COCO.

We are currently working on training-efficiency and inference algorithms for ControlNet, and if such a training-required process can be replaced, there would be no motivation to keep working on training-efficiency algorithms for ControlNet. Thanks very much!