diffusion_memorization's People

Contributors: junf137, yuxinwenrick


Forkers: junf137

diffusion_memorization's Issues

List of non-memorized prompts

Hey,

thank you for providing the code to reproduce your experiments. In addition to the list of memorized samples, could you please also provide the prompts of the non-memorized samples you used during your experiments? The paper states that the experiments were conducted on 2,000 prompts from COCO, LAION, Lexica, and randomly generated strings. This would improve the reproducibility of the method.

Best,
Lukas

SSCD score of memorized samples

Dear Yuxin,

A question regarding the experimental setup came up. We used the prompts provided in sdv1_500_memorized.jsonl to generate images with SDv1.4. We then computed the SSCD scores between the generated images and the real images. However, the SSCD scores vary widely, from quite high to quite low. The paper states the following:

To evaluate our detection method, we use 500 memorized prompts identified in Webster (2023) for Stable Diffusion v1 (Rombach et al., 2022), where the SSCD similarity score (Pizzi et al., 2022) between the memorized and the generated images exceeds 0.7.

Does this mean that all images generated from the 500 prompts in the JSONL file achieved an SSCD score > 0.7 in your experiments? Or did you apply an additional filtering step using the computed SSCD scores to keep only the strongly memorized samples? In our experiments, only 100-120 (depending on the SSCD model) out of the 500 prompts achieve a maximum SSCD score > 0.7. All SSCD scores were computed across 10 generations with different seeds. We also manually inspected the images, and some generated images show only slight memorization, so the assigned SSCD scores seem to match the actual degree of memorization.

Best,
Lukas
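For reference, the scoring and filtering step described above can be sketched as follows. This is only an illustration of the arithmetic: SSCD scores are cosine similarities between image embeddings, and the embeddings themselves must come from an actual SSCD model (Pizzi et al., 2022), which is not shown here. The function names are hypothetical.

```python
import numpy as np

def max_sscd_over_seeds(gen_embs, ref_emb):
    """Max cosine similarity between a reference-image embedding and the
    embeddings of generations from several seeds (an SSCD-style score).

    gen_embs: (num_seeds, dim) array of embeddings of generated images.
    ref_emb:  (dim,) embedding of the real (potentially memorized) image.
    """
    gen = gen_embs / np.linalg.norm(gen_embs, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb)
    return float(np.max(gen @ ref))

def filter_memorized(max_scores, threshold=0.7):
    """Indices of prompts whose max SSCD score exceeds the threshold."""
    return [i for i, s in enumerate(max_scores) if s > threshold]
```

Under this reading, a prompt counts as memorized if any of its 10 generations exceeds 0.7, which is the interpretation the question is asking the authors to confirm.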

Question About Gustavosta/Stable-Diffusion-Prompts

  1. May I ask about the dataset Gustavosta/Stable-Diffusion-Prompts: is it this link Gustavosta/Stable-Dif? I used train.parquet and converted it to JSONL, but the result seems incorrect. Thanks for your reply. :)
    When I then run the notebook det_mem_viz.ipynb, I get the following error:

IndexError                         Traceback (most recent call last)
in <cell line: 11>()
     21
     22 curr_data = np.array(row[key])
---> 23 curr_data = curr_data[:num_gens, start:end]
     24 curr_data = np.mean(curr_data, axis=0)
     25

IndexError: too many indices for array: array is 0-dimensional, but 2 were indexed

I think I may have reproduced the non_memorized_prompts dataset incorrectly.
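That diagnosis is likely right: the error means `np.array(row[key])` produced a 0-dimensional array, which happens when the JSONL row stores a single scalar under that key instead of a per-generation list of per-step values. A minimal illustration (the field contents here are made up; the actual keys and shapes are defined by the notebook's detection outputs):

```python
import numpy as np

# A 0-dimensional array arises when row[key] holds a scalar, e.g. from a
# wrongly converted JSONL file. Indexing it with two indices raises the
# IndexError seen in the traceback.
bad = np.array(3.14)
assert bad.ndim == 0  # bad[:2, 0:2] would raise IndexError

# Expected shape: one row per generation, one column per denoising step.
good = np.array([[3.1, 3.0, 2.9],
                 [2.8, 2.7, 2.6]])
assert good.ndim == 2
mean_over_gens = np.mean(good[:2, 0:2], axis=0)  # mean across generations
```

So checking one row of the produced non_memorized_prompts.jsonl and confirming the metric fields are nested lists, not scalars, should locate the conversion bug.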

Question about dataset

  1. Detect memorization
    You may first download the memorized prompts dataset from this link and unzip it.

(May I ask where this link is? It does not seem to be on the page.)

  2. Also, in det_mem_viz.ipynb there is:
    nonmem_data = list(read_jsonlines(f"det_outputs/non_memorized_prompts.jsonl"))

May I ask where this data file comes from? Thanks.
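For anyone reconstructing that file themselves, the `read_jsonlines` helper used in the notebook presumably just iterates over a JSON Lines file (one JSON object per line). A minimal stdlib version, in case the repo's own helper is unavailable:

```python
import json

def read_jsonlines(path):
    """Minimal JSON Lines reader: yields one parsed object per line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

A compatible non_memorized_prompts.jsonl can then be produced by writing `json.dumps(obj)` plus a newline for each prompt record, though the exact fields each record needs are determined by the notebook.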

Question about implementation details of the detection method

Dear Yuxin,

I have a question about the implementation details of the detection method, which seems to differ from the equation in Sec. 3.3 of your paper. The paper states that the norm of the difference between the conditional and unconditional noise predictions is used as the detection metric, ||\epsilon(x, c) - \epsilon(x, \pi)||. However, in lines 190-195 of local_sd_pipeline.py, it seems that the norm of each noise prediction is computed separately first and then passed to detect_mem.py. (In aug_prompt, on the other hand, the implementation seems to align with the paper.) May I ask which one I should use as the detection method? Or have I misunderstood something?

Best,
Chunsan.
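For clarity, the metric as written in the paper, ||\epsilon(x, c) - \epsilon(x, \pi)||, can be sketched as below. This is an illustration of the equation only, not the repo's actual implementation (which is exactly what the question above is trying to pin down); the input arrays stand in for UNet noise predictions.

```python
import numpy as np

def detection_metric(eps_cond, eps_uncond):
    """||eps(x_t, c) - eps(x_t, pi)||_2 per denoising step, then averaged.

    eps_cond, eps_uncond: (steps, C, H, W) arrays holding the conditional
    and unconditional noise predictions for one prompt. Note the difference
    is taken BEFORE the norm, matching the paper's equation, rather than
    taking the norm of each prediction separately.
    """
    diff = (eps_cond - eps_uncond).reshape(eps_cond.shape[0], -1)
    return float(np.linalg.norm(diff, axis=1).mean())
```

The distinction matters because ||a - b|| is generally not recoverable from ||a|| and ||b|| alone, so the two code paths the question describes would produce genuinely different detection scores.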
