
owlvit_segment_anything's Introduction

Open-vocabulary-Segment-Anything

An interesting demo combining Google's OWL-ViT with Meta's Segment Anything!

[Figure: detection result | Segment Anything result | inpainting result]

prompt: a bird with a yellow wing

Highlight

  • Detect and segment everything with language!
  • Detect objects in more detail (e.g., small objects)
  • Image-conditioned and text-conditioned detection
  • Use Stable Diffusion for inpainting

Catalog

  • OWL-ViT + Segment-Anything Demo
  • Image-conditioned detection
  • Inpainting using StableDiffusion
  • Huggingface Demo

Installation

The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.

Install Segment Anything:

python -m pip install -e segment_anything
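
SAM is used through its promptable predictor interface. The snippet below is a minimal sketch (not taken from this repository) of loading the ViT-H model from the checkpoint downloaded in the Run Demo section and segmenting from a box prompt; the image path and box coordinates are illustrative.

import numpy as np
from PIL import Image
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H SAM model (checkpoint downloaded in the "Run Demo" section below).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Segment whatever lies inside an XYXY box prompt (illustrative values).
image = np.array(Image.open("assets/demo.jpg").convert("RGB"))  # hypothetical path
predictor.set_image(image)
masks, scores, _ = predictor.predict(box=np.array([100, 100, 400, 400]),
                                     multimask_output=False)
print(masks.shape, scores)  # (1, H, W) boolean mask and its confidence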

Install OWL-ViT (OWL-ViT is included in the transformers library):

pip install transformers

More details can be found in the Segment Anything installation instructions.
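
As a quick check of the transformers install, the snippet below runs text-conditioned OWL-ViT detection on one image. It is a hedged sketch: the checkpoint name is the public Hugging Face model "google/owlvit-base-patch32", while the image path, prompt, and score threshold are illustrative.

import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

image = Image.open("assets/demo.jpg").convert("RGB")  # hypothetical path
inputs = processor(text=[["a bird with a yellow wing"]], images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale the predicted boxes back to the original image size (height, width).
target_sizes = torch.tensor([image.size[::-1]])
detections = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes)[0]
print(detections["boxes"], detections["scores"], detections["labels"])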

Run Demo

  • Download the Segment Anything checkpoint:
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
  • Run the demo (see the pipeline sketch below):
bash run_demo.sh
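
For reference, here is a hedged, self-contained sketch of the kind of pipeline run_demo.sh wraps: OWL-ViT proposes boxes for a text prompt, and each box is then passed to SAM as a prompt. The image path, prompt, and threshold are illustrative, and the actual script may differ in its details.

import numpy as np
import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection
from segment_anything import sam_model_registry, SamPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Open-vocabulary detection with OWL-ViT.
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
detector = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32").to(device)

image = Image.open("assets/demo.jpg").convert("RGB")  # hypothetical path
inputs = processor(text=[["a bird with a yellow wing"]], images=image,
                   return_tensors="pt").to(device)
with torch.no_grad():
    outputs = detector(**inputs)
target_sizes = torch.tensor([image.size[::-1]], device=device)  # (height, width)
detections = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes)[0]

# 2. Promptable segmentation with SAM: one mask per detected box.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to(device)
predictor = SamPredictor(sam)
predictor.set_image(np.array(image))  # HxWx3 uint8 RGB

for box, score in zip(detections["boxes"], detections["scores"]):
    masks, mask_scores, _ = predictor.predict(box=box.cpu().numpy(),
                                              multimask_output=False)
    print(f"box={box.tolist()} det_score={score.item():.2f} mask_area={int(masks[0].sum())}")

In the demo the resulting masks are visualized, and the catalog above also lists inpainting with Stable Diffusion as a follow-up step.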

[Figure: detection result | Segment Anything result]

Reference

Please give applause to IDEA-Research and OWL-ViT on Hugging Face.

Citation

If you find this project helpful for your research, please consider citing the following BibTeX entries.

@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@misc{minderer2022simple,
  title={Simple Open-Vocabulary Object Detection with Vision Transformers},
  author={Matthias Minderer and Alexey Gritsenko and Austin Stone and Maxim Neumann and Dirk Weissenborn and Alexey Dosovitskiy and Aravindh Mahendran and Anurag Arnab and Mostafa Dehghani and Zhuoran Shen and Xiao Wang and Xiaohua Zhai and Thomas Kipf and Neil Houlsby},
  year={2022},
  eprint={2205.06230},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

owlvit_segment_anything's People

Contributors

ngthanhtin, robo-alex


owlvit_segment_anything's Issues

Question on output labeling

Hi there!

I'm completely new to this, but I'm wondering whether it can output the x/y location and name of each label, like [bear1: 100,150, bear2: 300,512], and how much effort that would take?

I've never done anything like this before, but I have a strong use case for it (finding gray circles, called "spots", in a large petri dish with a yellow background). It looks like an easy / solved problem; I just have trouble understanding how to do it.
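
(Not an official answer, just an illustration.) The detector's post-processed output already carries per-box labels and pixel coordinates, so center points can be read off directly. A minimal sketch, assuming detections comes from processor.post_process_object_detection(...) as in the pipeline sketch under Run Demo, and queries is the list of text prompts passed to the processor:

queries = ["bear"]  # example prompt list
named_points = []
for i, (box, label_id) in enumerate(zip(detections["boxes"], detections["labels"])):
    x0, y0, x1, y1 = box.tolist()
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # box center in pixels
    named_points.append(f"{queries[int(label_id)]}{i + 1}: {cx:.0f},{cy:.0f}")
print(named_points)  # e.g. ['bear1: 100,150', 'bear2: 300,512']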

Parallel computing

Hi, I saw in demo.py that the images are processed one by one, which is time-consuming for large numbers of images. How can I process multiple images in parallel for the same text? Looking forward to your reply.
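
A hedged sketch of one way to batch the detection step: the transformers processor accepts a list of images with the same queries repeated per image, so the OWL-ViT forward pass runs on the whole batch at once (SAM still runs per image and per box). The file names and threshold are illustrative.

import torch
from PIL import Image
from transformers import OwlViTProcessor, OwlViTForObjectDetection

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
detector = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32").to(device)

paths = ["img_0.jpg", "img_1.jpg", "img_2.jpg"]  # hypothetical files
images = [Image.open(p).convert("RGB") for p in paths]
queries = ["a bird with a yellow wing"]
inputs = processor(text=[queries] * len(images), images=images,
                   return_tensors="pt").to(device)
with torch.no_grad():
    outputs = detector(**inputs)
target_sizes = torch.tensor([im.size[::-1] for im in images], device=device)
results = processor.post_process_object_detection(
    outputs=outputs, threshold=0.1, target_sizes=target_sizes)
for path, res in zip(paths, results):
    print(path, res["boxes"].shape[0], "detections")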

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder, so it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. A comparison of the whole pipeline is summarized in the tables in the MobileSAM repository.


Best Wishes,

Qiao
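
For context, the swap would look roughly like the sketch below, following the MobileSAM README; the package name (mobile_sam), model type ("vit_t"), and checkpoint file come from that project and may change between versions.

import torch
from mobile_sam import sam_model_registry, SamPredictor  # installed from the MobileSAM repo

device = "cuda" if torch.cuda.is_available() else "cpu"
mobile_sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt").to(device)
mobile_sam.eval()
predictor = SamPredictor(mobile_sam)  # same box/point prompt API as the original SAM predictor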
