
crate's People

Contributors

druvpai, leslietrue, robinwu218, sdbuch, yaodongyu


crate's Issues

Confusion about the Code Implementation

Thanks for your inspiring work! I have several points of confusion about the code implementation.

1. MSSA

1.1

In the paper's setting, the matrix $U$ stands for an orthonormal basis of a Gaussian subspace, but in the code it is implemented as an unconstrained nn.Linear() layer.

1.2

Besides, in the MSSA equation,

$$ MSSA(Z \mid U_{[K]}) = \beta \begin{bmatrix} U_1, \dots, U_K \end{bmatrix} \begin{bmatrix} SSA(Z \mid U_1) \\ \vdots \\ SSA(Z \mid U_K) \end{bmatrix}. $$

There is a $U \cdot \mathrm{SSA}$ product, where the $U$ multiplying from the left is supposed to be the same $U$ used inside SSA. But the code has a separate to_out = nn.Linear implementation, which I think destroys the theoretical property.
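For concreteness, here is a minimal sketch of the tied-weight variant the equation suggests, in which the transpose of the same $U$ that computes SSA also serves as the output projection (the class name, shapes, and defaults are illustrative, not the repo's code):

import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange

class TiedMSSA(nn.Module):
    # one nn.Linear holds the stacked bases U_1, ..., U_K; its transpose is
    # reused as the output projection instead of a separate to_out layer
    def __init__(self, dim, heads=8, dim_head=64, beta=1.0):
        super().__init__()
        self.heads, self.scale, self.beta = heads, dim_head ** -0.5, beta
        self.U = nn.Linear(dim, heads * dim_head, bias=False)

    def forward(self, z):  # z: (batch, n_tokens, dim)
        w = rearrange(self.U(z), 'b n (h d) -> b h n d', h=self.heads)  # U_k^* z per head
        attn = F.softmax(w @ w.transpose(-1, -2) * self.scale, dim=-1)
        ssa = rearrange(attn @ w, 'b h n d -> b n (h d)')  # stacked SSA(Z | U_k)
        return self.beta * F.linear(ssa, self.U.weight.t())  # left-multiply by [U_1, ..., U_K]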

2. ISTA

In the ISTA module there is a dictionary $D$ for which the theory requires $D^* D \approx I$. However, the optimization does not include this constraint, and the implementation does not enforce it either, so I think the property may not hold in practice.
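As an illustration of what enforcing the constraint could look like, one common option (which, as far as I can tell, is not in the repo) is a soft orthogonality penalty added to the training loss:

import torch

def ortho_penalty(D):
    # || D^T D - I ||_F^2; zero exactly when the dictionary satisfies D^T D = I
    gram = D.t() @ D
    eye = torch.eye(gram.shape[0], device=D.device, dtype=D.dtype)
    return ((gram - eye) ** 2).sum()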

I would appreciate a reply! Thanks again!

Taking one further step of whitebox approach

Thank you for open-sourcing such great work! I have some questions about the potential of taking one more step to make the entire CRATE pipeline white-box.

As mentioned in the paper, CRATE is trained in a supervised manner with a cross-entropy loss to update the dictionary and subspace parameters, which makes the learning somewhat more task-dependent than ReduNet. Is it theoretically rigorous to say that I can flatten the $Z$ matrix, whose $N$ columns $z$ are learned token representations whose underlying distribution is a mixture of Gaussians, then input them to ReduNet and obtain a compressed and discriminative representation of the image as a whole? Though I am not sure how the dictionary for the tokens should be updated in this case.

A question about part of the attention code

In the attention code I found an operation named to_out, and I cannot understand what functionality this operation is meant to implement.
The specific code is:
import torch
from torch import nn
from einops import rearrange

class Attention(nn.Module):
    def __init__(self, dim, heads=8, dim_head=64, dropout=0.):
        super().__init__()
        inner_dim = dim_head * heads
        # skip the output projection when a single head already matches dim
        project_out = not (heads == 1 and dim_head == dim)
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.attend = nn.Softmax(dim=-1)
        self.dropout = nn.Dropout(dropout)
        self.qkv = nn.Linear(dim, inner_dim, bias=False)
        self.to_out = nn.Sequential(
            nn.Linear(inner_dim, dim),
            nn.Dropout(dropout)
        ) if project_out else nn.Identity()

    def forward(self, x):
        # one shared projection w plays the roles of queries, keys, and values
        w = rearrange(self.qkv(x), 'b n (h d) -> b h n d', h=self.heads)
        dots = torch.matmul(w, w.transpose(-1, -2)) * self.scale
        attn = self.attend(dots)
        attn = self.dropout(attn)
        out = torch.matmul(attn, w)
        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)

After forward produces its result, the final output is passed through the to_out() operation, but I could not find the corresponding theoretical justification in the MSSA part. Could you please explain this?

Also, where in the code are the LayerNorm before the MSSA module and the LayerNorm before ISTA implemented? I could not find the corresponding code.
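For context, codebases in the vit-pytorch style (which this one resembles) typically apply those LayerNorms through a wrapper around each block rather than inside the blocks themselves; a minimal sketch of that pattern, not necessarily this repo's exact code:

from torch import nn

class PreNorm(nn.Module):
    # applies LayerNorm before the wrapped block, so the norms "before MSSA"
    # and "before ISTA" would live here rather than inside the blocks
    def __init__(self, dim, fn):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.fn = fn

    def forward(self, x, **kwargs):
        return self.fn(self.norm(x), **kwargs)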

KeyError:'model'

When I run finetune.py, I run into a problem.
The command is:

python finetune.py --bs 256 --net CRATE_small
--opt adamW --lr 5e-5 --n_epochs 200 --randomaug 1 --data cifar10 --ckpt_dir checkpoint.pth.tar --data_dir cifar-10

The error is:

File "D:\dachuang\CRATE\CRATE-main\finetune.py", line 99, in
net.load_state_dict(torch.load(args.ckpt_dir)['model'])
KeyError: 'model'

I want to know how to solve this problem.
Thanks!
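For debugging, a quick way to see what the checkpoint file actually contains is to load it and print its top-level keys before indexing into 'model' (a generic PyTorch snippet, not from the repo):

import torch

ckpt = torch.load("checkpoint.pth.tar", map_location="cpu")
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # look for 'model', 'state_dict', or similar
else:
    print(type(ckpt))  # the file may already be a raw state_dict

If the file turns out to be a raw state_dict, loading it with net.load_state_dict(ckpt), without the ['model'] index, would be the fix.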

more pretrained weights

I really appreciate your work! Will you release more pretrained weights in the future?

How CRATE differs from a Transformer

Thank you for your work. I have a question that may seem naive: where does your code mainly differ from a classic Transformer?

computing rate reduction in CRATE

@DruvPai Thank you for sharing the code!

I would really appreciate it if you could share the code to compute $R(Z)$, $R^c(Z)$ in this CRATE code base.

Specifically, I want to reproduce the results in Fig. 3 and Fig. 4a, along with $R(Z)$.

Thank you for your help.
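In the meantime, here is a minimal sketch of those quantities under the usual rate-reduction definitions, $R(Z) = \frac{1}{2}\log\det\left(I + \frac{d}{N\epsilon^2} Z Z^*\right)$ plus the class-conditional sum for $R^c(Z)$ (the value of $\epsilon$ and any normalization of $Z$ would need to be matched against the paper):

import torch

def coding_rate(Z, eps=0.5):
    # R(Z) = 1/2 logdet(I + d/(N eps^2) Z Z^T), with Z of shape (d, N)
    d, N = Z.shape
    I = torch.eye(d, device=Z.device, dtype=Z.dtype)
    return 0.5 * torch.logdet(I + (d / (N * eps ** 2)) * Z @ Z.T)

def coding_rate_per_class(Z, labels, eps=0.5):
    # R^c(Z) = sum_k (N_k / 2N) logdet(I + d/(N_k eps^2) Z_k Z_k^T)
    d, N = Z.shape
    total = Z.new_zeros(())
    for k in labels.unique():
        Zk = Z[:, labels == k]
        Nk = Zk.shape[1]
        I = torch.eye(d, device=Z.device, dtype=Z.dtype)
        total = total + (Nk / (2 * N)) * torch.logdet(I + (d / (Nk * eps ** 2)) * Zk @ Zk.T)
    return total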

requirement

Which versions of Python and torch are used?

Linear projection instead of convolution

Hey,

Is there a specific reason why CRATE uses a simple linear layer to create the patch embeddings, instead of the Conv2d that ViT uses? I don't see this mentioned in the paper.
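For what it's worth, the two are mathematically equivalent when the kernel size and stride both equal the patch size, so the choice is mostly stylistic; a quick check of the equivalence (the patch size and dimensions here are illustrative, not the repo's config):

import torch
from torch import nn
from einops import rearrange

p, dim = 16, 192
conv = nn.Conv2d(3, dim, kernel_size=p, stride=p)
lin = nn.Linear(3 * p * p, dim)

# tie the parameters so both modules compute the same map
lin.weight.data = conv.weight.data.reshape(dim, -1)
lin.bias.data = conv.bias.data

x = torch.randn(1, 3, 224, 224)
out_conv = rearrange(conv(x), 'b d h w -> b (h w) d')
patches = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (c p1 p2)', p1=p, p2=p)
print(torch.allclose(out_conv, lin(patches), atol=1e-5))  # True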

Asking for the Figure 13 and 14 code

I really appreciate your work. I would like to draw graphs similar to Figures 13 and 14. Could you provide me with the code?

Pretrained models

Could you release the models pretrained on ImageNet-1k?

Difference between crate-demo.pth and model_best.pth.tar (from CRATE-base)

Thanks for the great work!

I noticed there is a model size difference: ~300MB (crate-demo.pth) vs. ~100MB (retrained model_best.pth.tar).

I retrained the model using the repo code here, but the results are much worse compared with crate-demo.pth. Just wondering what I am missing here.

Thanks.

The white-box explanation of the CLS token

As I read the paper, the interpretation of the MSSA and ISTA components for compression and sparsification of the token representations is quite clear to me. However, I'm not so sure about the role and interpretation of $z^l_{[CLS]}$ in each layer. I don't quite understand how $z^l_{[CLS]}$ affects the compression and sparsification, and how it is transformed in each layer.

Experiment on Diffusion Models

Thanks for your inspiring work! I am especially interested in the structured denoising and structured diffusion sections. However, I could not find the experimental results related to this part in the paper. Will you release them in the future?
