Comments (12)

yformer commented on August 11, 2024

@sulaimanvesal, for EfficientSAM we resize the input image to 1024x1024 for model input. Both the preprocessing and the postprocessing are included in the TorchScript model, so you need to include them for FastSAM-S as well. The demo hosted on our server currently runs on CPU (Intel(R) Xeon(R) Platinum 8339HC CPU @ 1.80GHz), and it does not seem that slow even with efficientsam_s_cpu.jit.

from efficientsam.

glennliu commented on August 11, 2024

I ran into the same issue and found an example in the Grounded-Segment-Anything repo.

They set batched_points to shape [B, num_box, 2, 2] and batched_points_labels to shape [B, num_box, 2]. For each box, one point label is set to 2 and the other to 3.
But I don't understand how the values in batched_points_labels are decided here.

balakv504 commented on August 11, 2024

We will add an example of multiple-bbox inference soon. Thanks for your patience.
@glennliu Yes, that is correct. Thanks for pointing that out.

balakrishnanv commented on August 11, 2024

The input_point to the model has shape [batch_size, num_masks, num_points, 2]. For multiple bounding boxes, you feed in a tensor of shape [1, num_bounding_boxes, 2, 2] (assuming you are querying one image). For EfficientSAM, the encoder runs only once and the decoder does batched inference.
Happy to provide an example in the Colab if you have issues using this API.
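
As a minimal sketch of the shape described above, the snippet below packs a list of boxes into a [1, num_bounding_boxes, 2, 2] array. It uses NumPy for illustration and assumes the boxes are given as (x1, y1, x2, y2) pixel corners; the helper name boxes_to_points is hypothetical and not part of the EfficientSAM API. The result could be converted with torch.from_numpy before being passed to the model.

```python
import numpy as np


def boxes_to_points(boxes_xyxy):
    """Pack N boxes given as (x1, y1, x2, y2) into shape [1, N, 2, 2].

    Hypothetical helper: each box becomes two points (its two corners),
    and a leading batch axis of size 1 is added for a single image.
    """
    boxes = np.asarray(boxes_xyxy, dtype=np.float32)
    if boxes.ndim != 2 or boxes.shape[1] != 4:
        raise ValueError("expected boxes with shape [N, 4]")
    # [N, 4] -> [N, 2, 2]: (x1, y1) and (x2, y2) for each box.
    points = boxes.reshape(-1, 2, 2)
    # Add the batch axis: [1, N, 2, 2].
    return points[np.newaxis, ...]


boxes = [(10, 20, 110, 220), (300, 40, 480, 200), (50, 60, 70, 90)]
batched_points = boxes_to_points(boxes)
print(batched_points.shape)
```

Because all boxes for one image share a single batch entry, this layout matches the description that the encoder runs once while the decoder handles the boxes as a batch.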

glennliu commented on August 11, 2024

I just found the related code.
So, for a bounding box, we can just set the labels to [2, 3], similar to the example in Grounded-SAM. It should work.
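
A minimal sketch of that labeling, using NumPy for illustration: every box gets the label pair [2, 3], giving an array of shape [B, num_box, 2]. Which corner takes which label is an assumption here (first point 2, second point 3, following the [2, 3] order above) and should be checked against the EfficientSAM code. The helper name box_labels is hypothetical.

```python
import numpy as np


def box_labels(num_boxes, batch_size=1):
    """Return labels of shape [batch_size, num_boxes, 2], each pair [2, 3].

    Hypothetical helper. Assumes label 2 belongs to a box's first point
    and label 3 to its second point.
    """
    pair = np.array([2, 3], dtype=np.int64)
    # Repeat the [2, 3] pair for every box in every batch entry.
    return np.tile(pair, (batch_size, num_boxes, 1))


batched_points_labels = box_labels(num_boxes=3)
print(batched_points_labels.shape)
```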

sulaimanvesal commented on August 11, 2024

One more question: the CPU version on a Core i7 with an input size of 1024x512 is quite slow. FastSAM-S (ultralytics) on the same machine and input size has an inference time of around 400 ms.

Inference using:  efficientsam_ti_cpu.jit
Input size: torch.Size([3, 512, 1024])
Preprocess Time: 79.8783 ms
Inference Time: 6939.1549 ms
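
For reference, a minimal sketch of how per-stage timings like the ones above could be collected with the Python standard library. The timed helper is hypothetical, and the two timed statements are stand-ins for real preprocessing and the real model call, so the printed numbers are not meaningful.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store elapsed wall-clock time in milliseconds under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = (time.perf_counter() - start) * 1000.0


timings = {}
with timed("Preprocess Time", timings):
    data = [x / 255.0 for x in range(1024)]  # stand-in for preprocessing
with timed("Inference Time", timings):
    total = sum(data)  # stand-in for the model call

for label, ms in timings.items():
    print(f"{label}: {ms:.4f} ms")
```

When comparing two models, the same stages should fall inside the timed region for both.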

yformer commented on August 11, 2024

@klightz, can you help @sulaimanvesal with passing multiple bounding boxes to the model as prompts?

sulaimanvesal commented on August 11, 2024

@yformer thanks for the reply. @klightz, would you please let us know how to use multiple bounding boxes as prompts, similar to FastSAM?

sulaimanvesal commented on August 11, 2024

Hi @yformer,

Any update on how to run with multiple bounding boxes? Thank you.

yformer commented on August 11, 2024

@balakv504, can you provide an example of using multiple bounding boxes as prompts?

sulaimanvesal commented on August 11, 2024

Thanks @balakrishnanv! It would be great to see an example. It would help not only in my case but for many others as well.

sulaimanvesal commented on August 11, 2024

@yformer I am pinging in case any of the authors has had a chance to provide a simple example of multi-bbox prompting. I know it's not that hard!
