tsingularity / dift
[NeurIPS'23] Emergent Correspondence from Image Diffusion
Home Page: https://diffusionfeatures.github.io
License: MIT License
Hi! Thanks for your great work.
I don't understand why the input image is repeated 8 times. Can ensemble_size be set to 1?
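For readers with the same question, here is a minimal sketch of one plausible reading of ensemble_size, based on the demo docstring's description of it as "the number of repeated images used in the batch to extract features". It assumes each repeated copy receives its own noise sample and that the resulting features are averaged. `toy_features` is a hypothetical stand-in for the diffusion U-Net, not the repository's code.

```python
import numpy as np

def toy_features(noisy_img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the U-Net: 2x2 average pooling on [C, H, W]."""
    c, h, w = noisy_img.shape
    return noisy_img.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def extract_with_ensemble(img, ensemble_size, noise_std=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Repeat the same image ensemble_size times: [E, C, H, W].
    batch = np.repeat(img[None], ensemble_size, axis=0)
    # Assumption: every copy gets an independent noise sample.
    noisy = batch + noise_std * rng.standard_normal(batch.shape)
    feats = np.stack([toy_features(x) for x in noisy])
    # Average over the ensemble axis to reduce the effect of the random noise.
    return feats.mean(axis=0)

img = np.zeros((3, 32, 32))
clean = toy_features(img)
err_1 = np.abs(extract_with_ensemble(img, 1) - clean).mean()
err_8 = np.abs(extract_with_ensemble(img, 8) - clean).mean()
print(f"mean abs error, ensemble_size=1: {err_1:.3f}, ensemble_size=8: {err_8:.3f}")
```

Under these assumptions, ensemble_size=1 still runs and returns a feature map of the same shape. It is simply a noisier estimate than the average over 8 copies.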
Hi, congrats on the great work!
I am interested in trying your Ablated Diffusion Model (ADM) baseline. Would you be able to share with us the implementation? Thank you.
Congratulations on your great work!
I'm fascinated by the Ablated Diffusion Model (ADM) baseline. Could you please release your code?
Thanks a lot!!!
Hello! When I ran extract_dift.py following the README, I came across this problem. Would you mind helping me solve it?
Thanks
Thanks to the authors for the nice work. I have some questions about obtaining feature maps from the Stable Diffusion model. According to your code, if I read it correctly, you need a text prompt, e.g., "a photo of a cat", to obtain the diffusion features.
I wonder how the authors obtain the text prompts when evaluating on the label propagation benchmarks or other benchmarks. Do you need to annotate them roughly?
Hello! Thanks for your great work.
I don't know how to use OpenCLIP to find correspondences. Could you please share this code?
Thanks again.
Thanks for your great work!
I tried running eval_davis.py with the ADM model after creating the environment with conda env create -f environment.yml.
I passed CUDA_LAUNCH_BLOCKING=1, then:
I tried deleting code that might have an effect, including gc.collect() and torch.cuda.empty_cache(), but it didn't help.
Would you mind helping me solve this?
Thanks a lot!
Thank you for this awesome project. The project webpage says that it can change the viewpoint of the image object as well, but no such demo is given. If possible, please provide a basic example of how I can do this. I really appreciate any help you can provide.
Hi! Your work is amazing and I found that it may be helpful for some of my projects. I checked your paper and I am interested in DIFT sparse feature matching. It seems that your code doesn't include this part. Could you please share this code and some execution tips? Thanks!
Thanks for sharing the code!
A question regarding the demo: does the code support batch inference?
The docstring says that the input should be a single image tensor and a single text prompt:
Args:
    img_tensor: should be a single torch tensor in the shape of [1, C, H, W] or [C, H, W]
    prompt: the prompt to use, a string
    t: the time step to use, should be an int in the range of [0, 1000]
    up_ft_index: which upsampling block of the U-Net to extract feature, you can choose [0, 1, 2, 3]
    ensemble_size: the number of repeated images used in the batch to extract features
Return:
So I was wondering how to do batch inference.
Thanks
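One possible way to batch several images while keeping the ensemble idea is sketched below. It assumes that each image is repeated ensemble_size times along the batch axis, that the features of each group of copies are averaged afterwards, and that one prompt per image would be supplied in the same order. `toy_unet_features` and `batched_extract` are hypothetical names, not functions from the repository.

```python
import numpy as np

def toy_unet_features(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the U-Net: [N, C, H, W] -> [N, C, H//2, W//2]."""
    n, c, h, w = batch.shape
    return batch.reshape(n, c, h // 2, 2, w // 2, 2).mean(axis=(3, 5))

def batched_extract(images, ensemble_size=2, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    b = images.shape[0]
    # [B, C, H, W] -> [B*E, C, H, W]; the E copies of one image stay contiguous.
    repeated = np.repeat(images, ensemble_size, axis=0)
    noisy = repeated + noise_std * rng.standard_normal(repeated.shape)
    feats = toy_unet_features(noisy)
    # Regroup to [B, E, C, h, w] and average each image's own copies.
    feats = feats.reshape(b, ensemble_size, *feats.shape[1:])
    return feats.mean(axis=1)

# Four constant images with values 0, 1, 2, 3 so the grouping is easy to check.
images = np.stack([np.full((3, 8, 8), float(i)) for i in range(4)])
out = batched_extract(images)
print(out.shape)
```

The important detail is the order of the repeat and the reshape: `np.repeat` keeps the copies of one image next to each other, so the reshape to `[B, E, ...]` never mixes features from different images.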
Hello,
Hi @Tsingularity! Thank you for this amazing work. Could you provide some intuition on the applicability of SDXL and the best layer to extract features from in SDXL?
Also, do you think the method would apply to a purely transformer-based architecture like SD3/DiT as well?
Hey! I have seen the same question in a closed issue, but the response there was about the input image size.
What I want to ask is this: the CLIP image encoder only outputs a 1D token, e.g. of size (640), but the image is actually 2D, e.g. at (256, 256) resolution. So how do you use the aligned embedding token to compute feature correlation? The dimensions don't seem to match.
Looking forward to your reply!
Hello,
Great work!
I would like to know how I would go about using the "Edit Propagation" method as seen in the last example.
Thank you very much!
Hello,
Thanks for the great work! DIFT is truly impressive and I believe it offers endless possibilities for downstream tasks.
I have a question about the Edit Propagation discussed in your paper. From my understanding, one would initially paste a sticker onto the source image, then extract a matching mask in the target image using DIFT, and subsequently apply the transformation from source mask to target mask. Am I understanding this correctly?
If so, I have a question about how the system handles features that are present in the source image but missing in the target image. For instance, on the project page, there's an example of a dog wearing a Santa hat. Given that the target image doesn't have the hat, it would seem challenging to extract the corresponding feature map from the target. Could you kindly explain this?
Thank you so much!
What commands were used to get the numbers in the paper for SPair-71k? I'm running the command suggested in the repo and getting worse results than those listed in the paper.
(dift) ehedlin@dory:dift$ python eval_spair.py --dataset_path ./SPair-71k --save_path ./spair_ft --dift_model sd --img_size 768 768 --t 261 --up_ft_index 2 --ensemble_size 8
main path: /scratch/iamerich/dift
dataset_path: ./SPair-71k
save_path: ./spair_ft
dift_model: sd
img_size: [768, 768]
t: 261
up_ft_index: 2
ensemble_size: 8
saving all test images' features...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 18/18 [15:14<00:00, 50.82s/it]
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 702/702 [00:09<00:00, 73.55it/s]
motorbike per image PCK@0.1: 22.07
motorbike per point PCK@0.1: 24.04
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 600/600 [00:11<00:00, 51.32it/s]
horse per image PCK@0.1: 26.61
horse per point PCK@0.1: 29.55
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 646/646 [00:10<00:00, 64.40it/s]
chair per image PCK@0.1: 11.92
chair per point PCK@0.1: 13.25
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 870/870 [00:16<00:00, 52.79it/s]
bottle per image PCK@0.1: 25.35
bottle per point PCK@0.1: 26.52
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 600/600 [00:16<00:00, 36.05it/s]
cat per image PCK@0.1: 59.13
cat per point PCK@0.1: 58.86
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 702/702 [00:12<00:00, 57.22it/s]
bird per image PCK@0.1: 41.07
bird per point PCK@0.1: 43.91
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 650/650 [00:10<00:00, 61.80it/s]
bicycle per image PCK@0.1: 26.51
bicycle per point PCK@0.1: 28.09
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 644/644 [00:12<00:00, 52.04it/s]
bus per image PCK@0.1: 24.37
bus per point PCK@0.1: 33.25
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 756/756 [00:22<00:00, 34.08it/s]
train per image PCK@0.1: 48.58
train per point PCK@0.1: 50.81
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 650/650 [00:11<00:00, 54.38it/s]
person per image PCK@0.1: 26.87
person per point PCK@0.1: 30.52
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 690/690 [00:13<00:00, 49.57it/s]
aeroplane per image PCK@0.1: 32.07
aeroplane per point PCK@0.1: 34.82
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 664/664 [00:10<00:00, 64.55it/s]
sheep per image PCK@0.1: 25.98
sheep per point PCK@0.1: 33.42
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 692/692 [00:19<00:00, 34.80it/s]
tvmonitor per image PCK@0.1: 23.60
tvmonitor per point PCK@0.1: 24.71
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 600/600 [00:13<00:00, 46.04it/s]
dog per image PCK@0.1: 30.61
dog per point PCK@0.1: 33.44
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 862/862 [00:12<00:00, 67.95it/s]
pottedplant per image PCK@0.1: 27.44
pottedplant per point PCK@0.1: 29.76
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 640/640 [00:14<00:00, 43.27it/s]
cow per image PCK@0.1: 39.09
cow per point PCK@0.1: 44.68
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 702/702 [00:09<00:00, 73.83it/s]
boat per image PCK@0.1: 15.73
boat per point PCK@0.1: 18.28
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 564/564 [00:09<00:00, 60.91it/s]
car per image PCK@0.1: 22.34
car per point PCK@0.1: 30.62
All per image PCK@0.1: 29.35
All per point PCK@0.1: 34.31
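The log above reports every score twice, once "per image" and once "per point". The sketch below shows how the two averages can differ, assuming the metric is the usual PCK (percentage of correct keypoints), where a predicted keypoint counts as correct if it lies within alpha times the longer side of the object bounding box from the ground truth. That threshold convention is an assumption and the repository's script may normalize differently. `pck_correct` and `pck_scores` are hypothetical helper names.

```python
import numpy as np

def pck_correct(pred, gt, bbox_hw, alpha=0.1):
    """True for each keypoint whose prediction is within alpha * max(h, w) of GT."""
    thresh = alpha * max(bbox_hw)
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    return np.linalg.norm(diff, axis=1) <= thresh

def pck_scores(pairs, alpha=0.1):
    per_image, all_points = [], []
    for pred, gt, bbox_hw in pairs:
        correct = pck_correct(pred, gt, bbox_hw, alpha)
        per_image.append(correct.mean())    # each image pair weighs the same
        all_points.extend(correct.tolist()) # each keypoint weighs the same
    return 100 * float(np.mean(per_image)), 100 * float(np.mean(all_points))

pairs = [
    # Pair 1: one keypoint, 2 px off with a 10 px threshold, so correct.
    ([[10, 10]], [[12, 10]], (100, 80)),
    # Pair 2: three keypoints, off by 5, 30 and 40 px; only the first is correct.
    ([[0, 0], [50, 50], [90, 90]], [[0, 5], [80, 50], [50, 90]], (100, 100)),
]
per_image, per_point = pck_scores(pairs)
print(f"per image: {per_image:.2f}, per point: {per_point:.2f}")
```

In this toy case the per-image score is the mean of 1/1 and 1/3, while the per-point score is 2 correct out of 4 keypoints, so pairs with few keypoints influence the two numbers differently.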
Nice work! Would you please provide the benchmark evaluation code, or point us to any similar evaluation code that you referred to?
In particular, the evaluation code for the SPair-71k, PF-WILLOW and CUB-200-2011 datasets.
While running the demo, I experienced the error above.
It worked well when I set do_classifier_free_guidance to True (it was originally set to False).
It seems that the negative prompt embedding becomes NoneType when do_classifier_free_guidance is set to False.
If I am using the code incorrectly, please tell me!
Hello, when I ran the demo, I got an error saying that stabilityai/stable-diffusion-2-1 does not appear to have a file named config.json, so config.json cannot be obtained. Where can I download this file? The full error is as follows:
/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Traceback (most recent call last):
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
OSError: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
conn.connect()
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connection.py", line 363, in connect
self.sock = conn = self._new_conn()
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7fa269a40d90>: Failed to establish a new connection: [Errno 101] Network is unreachable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
retries = retries.increment(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /stabilityai/stable-diffusion-2-1/resolve/main/unet/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa269a40d90>: Failed to establish a new connection: [Errno 101] Network is unreachable'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
r = _request_wrapper(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
response = _request_wrapper(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 395, in _request_wrapper
response = get_session().request(method=method, url=url, **params)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 66, in send
return super().send(request, *args, **kwargs)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /stabilityai/stable-diffusion-2-1/resolve/main/unet/config.json (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fa269a40d90>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"), '(Request ID: 1efb8410-3b12-49f4-b553-4de46f64da86)')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 337, in load_config
config_file = hf_hub_download(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
return _hf_hub_download_to_cache_dir(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1826, in _raise_on_head_call_error
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: An error happened while trying to locate the file on the Hub and we cannot find the requested files in the local cache. Please check your connection and try again or make sure your Internet connection is on.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pcl/DETR/SDAseg/others/DIFT-main/demo.py", line 14, in <module>
dift = SDFeaturizer()
File "/home/pcl/DETR/SDAseg/others/DIFT-main/src/models/dift_sd.py", line 192, in __init__
unet = MyUNet2DConditionModel.from_pretrained(sd_id, subfolder="unet")
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 472, in from_pretrained
config, unused_kwargs, commit_hash = cls.load_config(
File "/home/pcl/anaconda3/envs/PY310/lib/python3.10/site-packages/diffusers/configuration_utils.py", line 364, in load_config
raise EnvironmentError(
OSError: stabilityai/stable-diffusion-2-1 does not appear to have a file named config.json.
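The first exception in the traceback is "Network is unreachable", so the final config.json message appears to be a consequence of the machine not reaching huggingface.co rather than of a file missing from the model repository. One possible workaround is to copy the model files to the machine by other means and pass a local directory as sd_id, since from_pretrained also accepts a local path. The sketch below only checks that such a directory looks complete before loading. The file list is an assumed diffusers-style layout (only unet/config.json is confirmed by the traceback), and `missing_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Assumed layout of a diffusers-format Stable Diffusion folder.
# Only "unet/config.json" appears in the traceback; the rest is an assumption.
EXPECTED_FILES = [
    "model_index.json",
    "unet/config.json",
    "vae/config.json",
    "text_encoder/config.json",
    "scheduler/scheduler_config.json",
]

def missing_files(model_dir):
    """Return the expected files that are not present under model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]

# Demo on a temporary folder that contains only the U-Net config.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "unet").mkdir()
    (Path(tmp) / "unet" / "config.json").write_text("{}")
    demo_missing = missing_files(tmp)

print("still missing:", demo_missing)
```

If `missing_files` returns an empty list for the real folder, that folder path can be used in place of the stabilityai/stable-diffusion-2-1 identifier. Setting the HF_HUB_OFFLINE=1 environment variable in the shell before starting Python should additionally stop huggingface_hub from attempting network requests.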
Hi, I want to ask when the training code and the Geometric Correspondence demo code will be released.
Thanks for this impressive work. Microsoft has previously proposed the CoCosNet series of works (Cross-domain Correspondence Learning for Exemplar-based Image Translation, CoCosNet v2: Full-Resolution Correspondence Learning for Image Translation) that establishes dense correspondence for cross-domain images using GANs. The idea is also about cultivating the hidden knowledge learned alongside the generative process. Could you please have a look at the two papers and mention them in your work?
Thanks.
Hello, thanks for your great work! I am curious about the meaning of the output feature tensor in your demo, and how it can be used in the downstream tasks mentioned in your paper, such as image matching and segmentation. For instance, the output tensor of your demo has shape [2, 1280, 48, 48]: the two 48s are the H and W dimensions, and the 2 refers to the input image and output image respectively. What is the meaning of 1280?
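As an illustration of how such a tensor could be used for matching, the sketch below treats the second axis as a per-location feature (channel) vector, which is an assumption about the demo output, and finds the target location whose feature has the highest cosine similarity to a chosen source location. It uses a toy tensor with 16 channels on a 6x6 grid instead of [2, 1280, 48, 48], and `nearest_neighbor_match` is a hypothetical helper, not the repository's function.

```python
import numpy as np

def nearest_neighbor_match(src_ft, trg_ft, y, x):
    """src_ft, trg_ft: [C, h, w]. Return the (row, col) in trg_ft most similar to src_ft[:, y, x]."""
    c, h, w = trg_ft.shape
    query = src_ft[:, y, x]
    query = query / (np.linalg.norm(query) + 1e-8)
    trg = trg_ft.reshape(c, h * w)
    trg = trg / (np.linalg.norm(trg, axis=0, keepdims=True) + 1e-8)
    sim = query @ trg  # cosine similarity to every target location, shape [h*w]
    return divmod(int(sim.argmax()), w)

rng = np.random.default_rng(0)
ft = rng.standard_normal((2, 16, 6, 6))
# Make the "target" features a copy of the source shifted by 1 row and 2 columns.
ft[1] = np.roll(ft[0], shift=(1, 2), axis=(1, 2))

match = nearest_neighbor_match(ft[0], ft[1], y=2, x=3)
print(match)
```

The returned coordinates live on the feature grid. To get pixel coordinates, the position would have to be scaled by the ratio between image size and feature map size, or the feature maps upsampled to image resolution before matching.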