motis's Introduction

Hi there 👋

😉 I am Siyu Ren.

🎓 I received my Bachelor's degree from Tongji University and my Ph.D. from Shanghai Jiao Tong University.

🔎 Currently, my research interests include efficient methods for NLP / large language models and techniques for the mechanistic understanding of LLMs.

📚 For my academic publications, please refer to https://drsy.github.io/.

motis's Issues

Cannot match up model encodings

Hi! Thanks for publishing this work, it's a great reference.

I'm trying to integrate a couple of different systems, and I need the model encodings to match. So far, I haven't been able to make that work:

Given this Python:

import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("image_1.png")).unsqueeze(0).float().to(device)
text = clip.tokenize(["a face", "a dog", "a cat"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    print(image_features.tolist()[0])

I'm trying to get the same array of floats out using Clip.mm's - (NSArray<NSNumber*>*)test_uiimagetomat:(UIImage*)image function. Try as I might, the outputs always differ, and I'm not sure where the difference comes from. As far as I can tell, the cvt methods do the same as the image preprocess, followed by normalisation with the values from CLIP.
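For reference, my understanding (from reading the openai/CLIP repository) is that the preprocess returned by clip.load for ViT-B/32 is roughly equivalent to the torchvision pipeline below; the bicubic resize is the step I would expect to differ most easily from an iOS-side implementation:

from PIL import Image
from torchvision.transforms import (Compose, Resize, CenterCrop, ToTensor,
                                    Normalize, InterpolationMode)

# Approximate equivalent of the `preprocess` returned by clip.load("ViT-B/32"):
# resize the short side to 224 with bicubic interpolation, centre-crop to
# 224x224, convert to an RGB float tensor, then normalise with CLIP's
# published mean/std.
clip_like_preprocess = Compose([
    Resize(224, interpolation=InterpolationMode.BICUBIC),
    CenterCrop(224),
    lambda img: img.convert("RGB"),
    ToTensor(),
    Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
              std=(0.26862954, 0.26130258, 0.27577711)),
])

image = clip_like_preprocess(Image.open("image_1.png")).unsqueeze(0)

If the Swift side resizes with a different interpolation (or crops in a different order), small per-dimension differences like the ones below would not surprise me.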

Here are some of the initial values from the Python code above:

[0.3502497971057892, 0.0028706961311399937, -0.46749746799468994, -0.14868411421775818, -0.03139263391494751, -0.4536064863204956

And from the Swift side:

[0.3193549513816833496, 0.0140316337347030640, -0.4410626888275146484, -0.0908056870102882385, -0.0415024310350418091, -0.4141347408294677734

I used the Quick Look preview while debugging the iOS code to save the image from the UIImage and confirm that the same image is being used. In both cases, I'm using the original ViT-B/32 CLIP image encoder. Strangely, the numbers above are roughly similar, but I'm not sure whether that's coincidental.
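One check that might help narrow this down: if the Swift output is just a preprocessing artefact (e.g. a different resize interpolation) rather than a genuinely different model, the cosine similarity between the two embeddings should still be very close to 1. A minimal sketch, using only the few values pasted above as placeholders; in practice you would compare the full 512-dimensional vectors:

import torch
import torch.nn.functional as F

# Placeholder values: only the first few dimensions pasted above.
python_features = torch.tensor([[0.3502, 0.0029, -0.4675, -0.1487]])
swift_features = torch.tensor([[0.3194, 0.0140, -0.4411, -0.0908]])

# A cosine similarity near 1.0 suggests the difference is only in image
# preprocessing; a much lower value would point at the model or weights.
print(F.cosine_similarity(python_features, swift_features).item())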

Any advice?

Wrong images returned when using the Android app

Hello, I have downloaded and installed the Android app on my phone, but whenever I try a text prompt, the returned image is always the first one. I'm wondering whether I installed the app incorrectly or whether the APK itself has a problem. Has anyone tried the app and gotten correct results?

Source code for Android sample

Hi,
The README links to the APK of an Android sample, but I couldn't find the source code. Has it been published somewhere already?

State_dict weights

Is there any chance you could upload the state_dict rather than the JIT files? Thanks.
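In the meantime, a workaround that appears to work is to extract a state_dict from the released TorchScript checkpoint yourself, since a loaded ScriptModule still exposes state_dict(). A minimal sketch, with placeholder file names:

import torch

# Load the released TorchScript (JIT) checkpoint; the file name is a placeholder.
jit_model = torch.jit.load("ViT-B-32.pt", map_location="cpu")

# A ScriptModule is still an nn.Module, so its raw weights can be saved and
# later loaded into a matching eager-mode model definition.
torch.save(jit_model.state_dict(), "ViT-B-32_state_dict.pt")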

Test result

Hi,
Are the text-image retrieval results in Table 1 of the paper obtained by fine-tuning and testing on the same dataset? Have you tried training on MSCOCO and testing on Flickr?

Training code

Could you provide the model distillation code (the two-stage approach described in the paper)?
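Even a rough sketch would help. My current guess at the first stage is plain embedding distillation, something like the loss below; this is only my own assumption, not the paper's actual code, and student_emb / teacher_emb are placeholder tensors from a student image encoder and the frozen CLIP teacher:

import torch.nn.functional as F

def stage1_distill_loss(student_emb, teacher_emb):
    # Guessed stage-1 objective: align the student's image embeddings with
    # the frozen CLIP teacher's embeddings on the unit sphere.
    student_emb = F.normalize(student_emb, dim=-1)
    teacher_emb = F.normalize(teacher_emb, dim=-1)
    return F.mse_loss(student_emb, teacher_emb)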

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.