yuxinwenrick / diffusion_memorization
Official repo for Detecting, Explaining, and Mitigating Memorization in Diffusion Models (ICLR 2024)
Hey,
thank you for providing the code to reproduce your experiments. In addition to the list of memorized samples, could you please also provide the prompts of the non-memorized samples you used in your experiments? The paper states that the experiments were conducted on 2,000 prompts from COCO, LAION, Lexica, and randomly generated strings. Having these prompts would improve the reproducibility of the method.
Best,
Lukas
Dear Yuxin,
a question regarding the experimental setup came up. We used the prompts provided in sdv1_500_memorized.jsonl to generate images with SDv1.4. We then computed the SSCD scores between the generated images and the real images. However, the SSCD scores range from quite high to fairly low. The paper states the following:
To evaluate our detection method, we use 500 memorized prompts identified in Webster (2023) for Stable Diffusion v1 (Rombach et al., 2022), where the SSCD similarity score (Pizzi et al., 2022) between the memorized and the generated images exceeds 0.7.
Does this mean that all images generated from the 500 prompts in the JSON file achieved an SSCD score > 0.7 in your experiments? Or did you apply an additional filtering step based on the computed SSCD score to keep only the strongly memorized samples? In our experiments, only 100-120 (depending on the SSCD model) of the 500 prompts achieve a maximum SSCD score > 0.7. All SSCD scores were computed across 10 generations with different seeds. We also manually inspected the images, and some generated images show only slight memorization, so the assigned SSCD scores seem to match the actual amount of memorization.
Best,
Lukas
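For readers comparing numbers, the evaluation described above (maximum SSCD score over 10 generations per prompt, then a 0.7 threshold) can be sketched roughly as follows. This is only an illustration and not the repository's code: it assumes the SSCD score is the cosine similarity between image descriptors, and the function names, array shapes, and random toy embeddings are all hypothetical stand-ins for real SSCD features.

```python
import numpy as np

def max_similarity_per_prompt(gen_emb, real_emb):
    """Maximum cosine similarity over generations, per prompt.

    gen_emb:  (num_prompts, num_gens, dim) descriptors of generated images
    real_emb: (num_prompts, dim) descriptors of the matching real images
    Both are hypothetical inputs; real descriptors would come from an SSCD model.
    """
    gen = gen_emb / np.linalg.norm(gen_emb, axis=-1, keepdims=True)
    real = real_emb / np.linalg.norm(real_emb, axis=-1, keepdims=True)
    # Cosine similarity of every generation with its prompt's real image.
    sims = np.einsum("pgd,pd->pg", gen, real)
    return sims.max(axis=1)

def count_above(scores, threshold=0.7):
    """Number of prompts whose best generation exceeds the threshold."""
    return int((scores > threshold).sum())

# Toy data: 5 prompts, 10 generations each, 8-dimensional descriptors.
rng = np.random.default_rng(0)
real = rng.normal(size=(5, 8))
gens = rng.normal(size=(5, 10, 8))
gens[0, 3] = real[0]  # one generation of prompt 0 copies its real image
scores = max_similarity_per_prompt(gens, real)
num_memorized = count_above(scores, threshold=0.7)
```

Taking the maximum over seeds, as in the message above, is the most permissive aggregation; a mean over generations would flag fewer prompts at the same threshold.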
IndexError Traceback (most recent call last)
in <cell line: 11>()
21
22 curr_data = np.array(row[key])
---> 23 curr_data = curr_data[:num_gens, start:end]
24 curr_data = np.mean(curr_data, axis=0)
25
IndexError: too many indices for array: array is 0-dimensional, but 2 were indexed
I think I may have reproduced the non_memorized_prompts dataset incorrectly. Could you share a link to it? I could not find one on the repository page. Where does this data file come from? Thanks.
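The traceback above says that np.array(row[key]) produced a 0-dimensional array, which NumPy cannot index with two indices. One possible cause, and this is only a guess since the data file is not shown, is that row[key] holds a single scalar (or a non-list object) instead of a nested list of per-generation, per-step values. A minimal sketch of a guard that fails with a more descriptive message; the helper name slice_and_average is hypothetical and not part of the repository:

```python
import numpy as np

def slice_and_average(value, num_gens, start, end):
    """Average the first num_gens rows over the column range start:end.

    Raises a descriptive error when the stored value is not a
    2-D (generations x steps) array, which is the layout this sketch assumes.
    """
    arr = np.asarray(value)
    if arr.ndim != 2:
        raise ValueError(
            f"expected a 2-D (generations x steps) array, got shape {arr.shape}"
        )
    return arr[:num_gens, start:end].mean(axis=0)

# A nested list behaves as the indexing in the traceback expects.
ok = slice_and_average([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], num_gens=2, start=0, end=2)

# A scalar reproduces the failure mode from the traceback.
try:
    np.array(0.5)[:1, 0:1]
    scalar_indexing_failed = False
except IndexError:
    scalar_indexing_failed = True
```

Printing np.asarray(row[key]).shape for the first few rows of the file would show whether the stored values have the expected two dimensions.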
Dear Yuxin,
I have a question about the implementation details of the detection method, which seems to differ from the equation in Sec. 3.3 of your paper. The paper states that the norm of the difference between the conditional and unconditional noise, ||\epsilon(x,c) - \epsilon(x, \pi)||, is used as the detection metric. However, in lines 190~195 of local_sd_pipeline.py, it seems that the norm of each noise is calculated first and then passed to detect_mem.py. (In aug_prompt, by contrast, the computation seems to align with the paper.) Which one should I use as the detection method? Or have I misunderstood something?
Best,
Chunsan.
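To make the distinction in this question concrete, the toy sketch below contrasts the norm of the difference with the difference of separately computed norms. It does not reproduce local_sd_pipeline.py or detect_mem.py; the function names and the two-element vectors are hypothetical, and how the repository combines per-term norms downstream is not shown in this thread.

```python
import numpy as np

def norm_of_difference(eps_cond, eps_uncond):
    """L2 norm of (conditional - unconditional) noise, one value per sample."""
    diff = (eps_cond - eps_uncond).reshape(eps_cond.shape[0], -1)
    return np.linalg.norm(diff, axis=1)

def difference_of_norms(eps_cond, eps_uncond):
    """Difference of the two L2 norms computed separately, per sample."""
    flat_c = eps_cond.reshape(eps_cond.shape[0], -1)
    flat_u = eps_uncond.reshape(eps_uncond.shape[0], -1)
    return np.linalg.norm(flat_c, axis=1) - np.linalg.norm(flat_u, axis=1)

# Two predictions with equal magnitude but different direction.
eps_cond = np.array([[3.0, 0.0]])
eps_uncond = np.array([[0.0, 3.0]])

a = norm_of_difference(eps_cond, eps_uncond)   # sqrt(18), about 4.243
b = difference_of_norms(eps_cond, eps_uncond)  # 0.0
```

By the reverse triangle inequality, the absolute difference of the norms never exceeds the norm of the difference, so the two quantities agree only when the predictions point in the same direction. Whether that matters for detection depends on how the values are used afterwards.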