decap's Issues

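About the Decoding function used in inference

In the Decoding function used for inference, next_token is computed from the logits produced by the decoder, and the text is then decoded with CLIP's tokenizer. Is there a problem here? The logits tensor has 50257 dimensions, while CLIP's SimpleTokenizer can only decode tokens up to index 49407. Could this cause an out-of-range error? My current understanding is that you trained the decoder on CLIP token indices, which makes the problem unlikely in terms of probability but does not resolve it at the dimension level. I have only just started studying this field, so please correct me if I have misunderstood anything.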
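
To make the concern concrete, here is a minimal sketch of one possible guard, not the repository's actual Decoding code. It assumes greedy (argmax) selection and restricts it to the ids the tokenizer can decode. The function name safe_greedy_token and both constants are hypothetical, chosen for illustration; the sizes are the ones quoted in the question above.

```python
# Hypothetical constants based on the numbers quoted in the issue.
GPT2_VOCAB_SIZE = 50257  # size of the decoder's logits vector
CLIP_VOCAB_SIZE = 49408  # CLIP token ids run from 0 to 49407


def safe_greedy_token(logits, valid_vocab_size=CLIP_VOCAB_SIZE):
    """Pick the highest-scoring token id among ids the tokenizer can decode.

    `logits` is a plain sequence of floats, one score per decoder output id.
    Ids at or above `valid_vocab_size` are ignored, so the result is always
    a decodable id.
    """
    if len(logits) < valid_vocab_size:
        raise ValueError("logits vector is shorter than the tokenizer vocabulary")
    # Restrict the argmax to the decodable range [0, valid_vocab_size).
    return max(range(valid_vocab_size), key=lambda i: logits[i])


if __name__ == "__main__":
    scores = [0.0] * GPT2_VOCAB_SIZE
    scores[50000] = 10.0  # an id the CLIP tokenizer cannot decode
    scores[123] = 5.0     # best score among decodable ids
    print(safe_greedy_token(scores))
```

With a real tensor, the same idea would be slicing the last dimension of the logits to the tokenizer's vocabulary size before taking the argmax or sampling. Whether the out-of-range ids ever win in practice depends on how the decoder was trained, which only the authors can confirm.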

Inference code

Hi, I'd like to express my gratitude for sharing your wonderful work.
And congratulations on your paper being accepted to ICLR 2023!

To evaluate the model's performance, I wrote my own inference code and computed scores using the pycocoevalcap repo.
But the results I got were slightly different from those reported in the paper.
So I would like to refer to your code to reproduce them. Could you share your inference code?

Thanks :)

Inference Model

Hello! Thank you for your contribution to the field.

Where can I find the model checkpoint referenced in the inference notebook, ./coco_model/coco_prefix-009.pt?

The metrics in the paper

Thanks very much for your excellent work and congratulations on your paper being accepted by ICLR 2023! I want to reproduce the work, and I am wondering if you could make public the code for calculating the metrics in the paper, such as BLEU@4, METEOR, CIDEr, and SPICE. I would appreciate it if you could take the time to reply to me.

Pretrained models on video caption

Congrats! This is nice work on zero-shot captioning.
The paper reports zero-shot video captioning results on MSR-VTT, Activity-Net, etc., but I couldn't find the code or pretrained models in this repo to reproduce them. I'd like to know whether these models and instructions for video captioning will be released.

Thanks a lot!
