
layout-guidance's People

Contributors

silent-chen

layout-guidance's Issues

subject-driven issue

Dear author,

I have discovered an issue when attempting to generate a specific subject with a model fine-tuned via DreamBooth. The variable "attn_num_head_channels" is assigned a list, which results in the error "TypeError: unsupported operand type(s) for //: 'int' and 'list'". Could you please provide guidance on a potential solution?

Thank you.
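For anyone hitting the same error: in newer diffusers configs the head-count field can be a per-block list rather than a single int, which breaks expressions like `dim // attn_num_head_channels`. A minimal defensive sketch (the function name and all values below are illustrative assumptions, not code from this repo):

```python
def heads_for_block(attn_num_head_channels, block_index):
    """Return the head count for one UNet block, accepting int or list.

    Newer configs may store a per-block list (e.g. [8, 8, 8, 8]) while
    older ones store a single int; indexing the list restores the
    int // int division the original code expects.
    """
    if isinstance(attn_num_head_channels, (list, tuple)):
        return attn_num_head_channels[block_index]
    return attn_num_head_channels

# Illustrative values, not taken from any real checkpoint:
dim = 320
heads = heads_for_block([8, 8, 8, 8], 0)
print(dim // heads)  # 40
```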

Diffusers based Training-Free Layout Control with Cross-Attention Guidance

Hello Minghao Chen,

Thanks for your amazing work. After reviewing your project, I have added a diffusers-based pipeline with the aim of expanding its accessibility to a wider community.

I would greatly appreciate your review and feedback on the changes I have made. Your insight and expertise would be invaluable in ensuring that the project remains of the highest quality.

You can find the modified work at the following link:

https://github.com/nipunjindal/diffusers-layout-guidance

Setting Problems

Thank you for the exciting work!

I tried to run the demo following the instructions, but the models (CLIP, UNet, etc.) cannot be downloaded automatically in my environment. Could you share the exact settings for these models so that I can download them manually?

I'm looking forward to hearing back from you.
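In case it helps others with the same problem, a hedged sketch of loading the usual Stable Diffusion components manually with `from_pretrained` (the checkpoint id is an assumption; this repo may pin a different one):

```python
def load_sd_components(model_id="runwayml/stable-diffusion-v1-5"):
    """Download/load the tokenizer, text encoder, VAE, and UNet.

    Imports are local so this file can be read without the libraries
    installed; the default checkpoint id is an assumption.
    """
    from transformers import CLIPTextModel, CLIPTokenizer
    from diffusers import AutoencoderKL, UNet2DConditionModel

    tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
    unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
    return tokenizer, text_encoder, vae, unet
```

Once downloaded (or placed in a local folder), pointing `model_id` at that folder should let the demo run offline.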

Out-of-memory issue

Dear authors,

Thanks for your exciting work. I'm trying to run inference.py to see some generated images, but I get an OOM error even though I'm using an Nvidia 2080 Ti. How much GPU memory is appropriate for inference? Is there anything I can adjust to decrease the memory demand, such as changing the size of the generated images?
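As a rough, back-of-envelope answer to the image-size question (all constants below are typical Stable Diffusion values, assumed rather than measured from this repo): the cross-attention maps that the guidance inspects scale with the number of latent positions, so halving the image side cuts that memory by about 4x.

```python
def cross_attn_map_floats(image_side, heads=8, text_tokens=77, batch=2):
    """Rough float count of one cross-attention map at a given image size.

    Assumptions: the VAE downsamples by 8, the batch is doubled for
    classifier-free guidance, and each latent position attends to 77
    text tokens.
    """
    latent_side = image_side // 8
    spatial = latent_side * latent_side  # number of query positions
    return batch * heads * spatial * text_tokens

print(cross_attn_map_floats(512) / cross_attn_map_floats(256))  # 4.0
```

So generating at 256x256 instead of 512x512, or running the models in half precision, are the usual first things to try on an 11 GB card.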

Complete VISOR results

Hi @silent-chen , thanks for the great work!
Would you also be able to share the complete results of your method on the VISOR metric? Currently the paper reports OA and VISOR (Unconditional and Conditional), but it does not mention VISOR-{1,2,3,4}.
Thanks again!

Real Image Editing

Dear author,

First, thank you for your awesome work!
I wonder whether there is any plan for this code to support real image editing.
If not, could you share the implementation details of real image editing with layout guidance?

Thank you!

A little question about the compute_ca_loss

Hi, thanks for the nice work!

When computing the backward guidance loss, I noticed that your code uses only the second half of the batch of attention maps via

attn_map = attn_map_integrated.chunk(2)[1]

def compute_ca_loss(attn_maps_mid, attn_maps_up, bboxes, object_positions):
    loss = 0
    object_number = len(bboxes)
    if object_number == 0:
        return torch.tensor(0).float().cuda() if torch.cuda.is_available() else torch.tensor(0).float()
    for attn_map_integrated in attn_maps_mid:
        attn_map = attn_map_integrated.chunk(2)[1]   # why chunk here?
...

I am a little bit confused, could you please give me some hint? thank you so much!
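For context (this is my reading, not an authoritative answer): with classifier-free guidance the UNet runs on a doubled batch of [unconditional, conditional] inputs, so the stored attention maps come back stacked the same way, and `chunk(2)[1]` selects the conditional half — the one that actually attends to the prompt tokens. A minimal pure-Python analogue of that slicing:

```python
def chunk(seq, n):
    """Minimal analogue of torch.Tensor.chunk along the batch dim."""
    size = (len(seq) + n - 1) // n
    return [seq[i * size:(i + 1) * size] for i in range(n)]

# The doubled classifier-free-guidance batch: first half unconditional,
# second half conditional (placeholder strings stand in for tensors).
attn_maps = ["uncond_map_0", "cond_map_0"]
print(chunk(attn_maps, 2)[1])  # ['cond_map_0']
```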

Variable Issue

Dear author,

Hello. I encountered an issue while running your code. The variable "attn_num_head_channels" is set as a list, which results in the error message "TypeError: unsupported operand type(s) for //: 'int' and 'list'".

about forward guidance

I have read the code, but I couldn't find where the forward guidance is implemented. Could you please point out where it is? Thanks a lot!

Word Drop

Hi @silent-chen , thanks for your brilliant work!

I'm interested in Word Drop. As you mentioned in the paper, images generated with padding (i.e., non-word) token embeddings closely follow both the semantics and the layout of the image generated from the full text prompt.

Would you also be able to share the code of Word Drop? I can't wait to know more details about it. 😊

Thanks again!
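While waiting for the official code, here is a minimal sketch of how I read the word-drop idea from the paper: replace the embedding of the dropped word with the padding-token embedding before running the model. Every name below is an assumption for illustration, not the authors' implementation.

```python
def word_drop(token_embeddings, drop_indices, pad_embedding):
    """Replace selected token embeddings with the padding embedding."""
    out = list(token_embeddings)
    for i in drop_indices:
        out[i] = pad_embedding
    return out

# Strings stand in for embedding vectors, purely for illustration:
tokens = ["<bos>", "a", "cat", "on", "grass", "<pad>"]
print(word_drop(tokens, [2], "<pad>"))
# ['<bos>', 'a', '<pad>', 'on', 'grass', '<pad>']
```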
