clip-as-service-rs's Introduction

Rorical

Talk is cheap, show me your code. Code is cheap, show me your proof.

clip-as-service-rs's People

Contributors

el-file4138, rorical

clip-as-service-rs's Issues

Input tensor size mismatch: Attempting to broadcast an axis by a dimension other than 1

Hey there! 👋
I tried running this project using ViT-B-32-laion2b-s34b-b79k and ViT-L-14@336px. However, for both models I received the following error when attempting to generate text embeddings:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: 
  SessionRun(Msg("

Non-zero status code returned while running Add node. 
Name:'Add_2' 
Status Message: /Users/runner/work/1/s/onnxruntime/core/providers/cpu/math/element_wise_ops.h:560 
void onnxruntime::BroadcastIterator::Append(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. 
Attempting to broadcast an axis by a dimension other than 1. 4 by 77

   "))
', src/main.rs

The error occurs at the following location where the textual model is executed by the ONNX runtime:

let outputs = self
    .encoder
    .run([
        InputTensor::from_array(ids.into_dyn()),
        InputTensor::from_array(mask.into_dyn()),
    ])
    .unwrap();
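
The `4 by 77` in the message is consistent with a sequence axis of length 4 meeting an axis of length 77 inside the model (presumably the positional embedding, although the log does not say so). A minimal sketch of the broadcasting rule involved, using a hypothetical helper `broadcastable` and a made-up embedding width of 512 in the examples:

```rust
/// Returns true if two shapes can be broadcast together under the usual
/// NumPy/ONNX rule: aligned from the trailing axis, each pair of dimensions
/// must be equal, or one of them must be 1. Missing leading axes count as 1.
///
/// Hypothetical helper for illustration only; it is not part of this project
/// or of the ONNX runtime bindings.
fn broadcastable(a: &[usize], b: &[usize]) -> bool {
    a.iter()
        .rev()
        .zip(b.iter().rev())
        .all(|(&x, &y)| x == y || x == 1 || y == 1)
}
```

Under this rule, `broadcastable(&[1, 4, 512], &[77, 512])` is false because 4 and 77 differ and neither is 1, while `broadcastable(&[1, 77, 512], &[77, 512])` is true.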

After digging a little and comparing it to the example code found in the OpenCLIP repo, I found that the input tensor should always have a length of 77 (padded with 0).

I am not sure if or why the code works for you as-is, but for me, applying the following changes solved the issue:

  1. Adjust the tokenizer so that it always pads to a multiple of 77
tokenizer.with_padding(Some(PaddingParams {
    strategy: PaddingStrategy::BatchLongest,
    direction: PaddingDirection::Right,
    pad_to_multiple_of: Some(77), // <-- this right here
    pad_id: 0,
    pad_type_id: 0,
    pad_token: "[PAD]".to_string(),
}));
  2. Adjust the definition of the v1 and v2 vectors so that each iterator takes at most 77 elements (as the tokenizer might pad to any multiple of 77)
let v1: Vec<i32> = preprocessed
    .iter()
    .map(|i| 
        i.get_ids()
            .iter()
            .take(77) // <-- this right here
            .map(|b| *b as i32)
            .collect()
    )
    .concat();

let v2: Vec<i32> = preprocessed
    .iter()
    .map(|i| {
        i.get_attention_mask()
            .iter()
            .take(77) // <-- this right here
            .map(|b| *b as i32)
            .collect()
    })
    .concat();
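
The two changes above boil down to forcing every sequence to exactly 77 entries before the batch is flattened. A minimal, dependency-free sketch of that idea, assuming token ids arrive as `u32` slices; the names `CONTEXT_LEN`, `fit_to_context`, and `flatten_batch` are hypothetical and not part of this project:

```rust
/// Context length the text encoder expects, per the findings above.
const CONTEXT_LEN: usize = 77;

/// Truncates or right-pads one sequence to exactly `CONTEXT_LEN` entries.
/// `pad` is the fill value (0 for both the ids and the attention mask here).
fn fit_to_context(tokens: &[u32], pad: i32) -> Vec<i32> {
    let mut out: Vec<i32> = tokens
        .iter()
        .take(CONTEXT_LEN) // drop anything beyond the context length
        .map(|&t| t as i32)
        .collect();
    out.resize(CONTEXT_LEN, pad); // right-pad short sequences
    out
}

/// Flattens a batch into one row-major buffer of shape (batch, CONTEXT_LEN).
fn flatten_batch(batch: &[Vec<u32>], pad: i32) -> Vec<i32> {
    batch
        .iter()
        .flat_map(|seq| fit_to_context(seq, pad))
        .collect()
}
```

With this, a batch of `n` texts always yields `n * 77` values, so the buffer can be reshaped to `(n, 77)` regardless of how long the individual texts are. Note that plain truncation may cut off the end-of-text token for very long inputs. The `tokenizers` crate also offers `PaddingStrategy::Fixed(77)` together with truncation parameters, which might make the manual `.take(77)` unnecessary, but I have not verified that against this code.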

No idea if this is the best approach but I thought I'd just leave my findings here in case anyone else stumbles across this error (or if you @Rorical encountered this as well) 😊
