
segment-any-anomaly's Introduction

Segment Any Anomaly

Open In Colab HuggingFace Space

This repository contains the official implementation of Segment Any Anomaly without Training via Hybrid Prompt Regularization, SAA+.

SAA+ aims to segment any anomaly without the need for training. We achieve this by adapting existing foundation models, namely Grounding DINO and Segment Anything, with hybrid prompt regularization.

🔥What's New

💎Framework

We found that a simple assembly of foundation models suffers from severe language ambiguity. Therefore, we introduce hybrid prompts derived from domain expert knowledge and target image context to alleviate the language ambiguity. The framework is illustrated below:

Framework
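As a minimal illustration of how property-style prompts regularize raw foundation-model outputs, consider the following Python sketch. The Region class, function names, and thresholds here are illustrative assumptions, not the repository's API: it only shows the idea of filtering candidate regions by expert priors (maximum anomaly count, maximum area) after Grounding DINO and SAM have produced candidates.

```python
from dataclasses import dataclass

@dataclass
class Region:
    score: float       # confidence for the anomaly phrase (from the detector)
    area_ratio: float  # mask area relative to the object area (from the segmenter)

def apply_property_prompts(regions, max_anomalies=2, max_area_ratio=1.0):
    # Property prompts encode expert priors: drop regions that are too
    # large to plausibly be a defect, then keep only the top-scoring candidates.
    kept = [r for r in regions if r.area_ratio <= max_area_ratio]
    kept.sort(key=lambda r: r.score, reverse=True)
    return kept[:max_anomalies]

candidates = [Region(0.9, 0.05), Region(0.8, 1.5), Region(0.4, 0.02), Region(0.7, 0.1)]
kept = apply_property_prompts(candidates)  # the 1.5-area candidate is rejected
```

The same filtering pattern extends to other priors, e.g. confidence thresholds derived from the target image context.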

Quick Start

🏦Dataset Preparation

We evaluate SAA+ on four public datasets: MVTec-AD, VisA, KSDD2, and MTD. In addition, SAA+ was among the winning entries in the VAND workshop challenge, which provides a dedicated dataset, VisA-Challenge. To prepare the datasets, please follow the instructions below:

By default, we save the data in the ../datasets directory.

cd $ProjectRoot # e.g., /home/SAA
cd ..
mkdir datasets
cd datasets

Then, follow the corresponding instructions to prepare each individual dataset.

🔨Environment Setup

You can use our script for one-click setup of the environment and downloading the checkpoints.

cd $ProjectRoot
bash install.sh

📄Reproduce the public results

MVTec-AD

python run_MVTec.py

VisA-Public

python run_VisA_public.py

VisA-Challenge

python run_VAND_workshop.py

The submission files can be found in ./result_VAND_workshop/visa_challenge-k-0/0shot.

KSDD2

python run_KSDD2.py

MTD

python run_MTD.py

📄Demo Results

Run the following command to produce the demo results:

python demo.py

Demo

🎯Performance

Results

Qualitative Results

🔨 Todo List

We have planned the following features to be added in the near future:

  • Update repository for SAA+
  • Detail the zero-shot anomaly detection framework.
  • Evaluate on other image anomaly detection datasets.
  • Add UI for easy evaluation.
  • Update Colab demo.
  • HuggingFace demo.

💘 Acknowledgements

Our work is largely inspired by the following projects. Thanks for their admirable contributions.

Stargazers over time


Citation

If you find this project helpful for your research, please consider citing the following BibTeX entry.

@article{cao_segment_2023,
  title = {Segment Any Anomaly without Training via Hybrid Prompt Regularization},
  author = {Cao, Yunkang and Xu, Xiaohao and Sun, Chen and Cheng, Yuqi and Du, Zongwei and Gao, Liang and Shen, Weiming},
  journal = {arXiv:2305.10724},
  year = {2023},
  url = {http://arxiv.org/abs/2305.10724}
}

@article{kirillov2023segany,
  title={Segment Anything}, 
  author={Kirillov, Alexander and Mintun, Eric and Ravi, Nikhila and Mao, Hanzi and Rolland, Chloe and Gustafson, Laura and Xiao, Tete and Whitehead, Spencer and Berg, Alexander C. and Lo, Wan-Yen and Doll{\'a}r, Piotr and Girshick, Ross},
  journal={arXiv:2304.02643},
  year={2023}
}

@inproceedings{ShilongLiu2023GroundingDM,
  title={Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection},
  author={Shilong Liu and Zhaoyang Zeng and Tianhe Ren and Feng Li and Hao Zhang and Jie Yang and Chunyuan Li and Jianwei Yang and Hang Su and Jun Zhu and Lei Zhang},
  year={2023}
}

segment-any-anomaly's People

Contributors

caoyunkang

segment-any-anomaly's Issues

Few shot training with SAA+

Hello,

I've been reading the paper on the model, and there isn't much mention of few-shot learning in it. However, the code has an argument k_shot; does this refer to few-shot learning? If so, could you shed some light on how it is implemented in this context? If not, how might it be done in SAA+?

Use for an image similarity task?

How can I use this to detect anomalies between two images?

That is, input two images and then detect the anomalies between them as an image-similarity comparison.

Code sharing request

Hello, could you please share the code for the paper "Dual-path Frequency Discriminators for Few-shot Anomaly Detection"?

How to support unseen defect types?

From the prompts, I found that SAA/SAA+ is quite limited to known defect types.

Here is an example: SAA/SAA+ has no concept of "stripped screws".

image_path = 'assets/screws.png'
textual_prompts = ['dirty. stripped. spot. ', 'screw'] # detect prompts, filtered phrase
property_text_prompts = 'the image of screw have 4 similar screw, with a maximum of 2 anomaly. The anomaly would not exceed 1. object area. '
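Extending coverage to an unseen defect then amounts to appending its phrase to the defect list before the prompt is assembled. A minimal sketch, assuming only the period-separated two-part prompt format shown in the example above (the helper name itself is hypothetical, not the repository's API):

```python
def build_textual_prompts(defects, obj):
    # Hypothetical helper: assemble the two-part textual prompt
    # (period-separated defect phrases, then the object name).
    return ['. '.join(defects) + '. ', obj]

# Append a new phrase for an unseen defect type such as a stripped thread.
prompts = build_textual_prompts(['dirty', 'stripped', 'spot', 'stripped thread'], 'screw')
```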


huggingface_hub.utils._errors.LocalEntryNotFoundError

When I run python run_VisA_public.py, it raises huggingface_hub.utils._errors.LocalEntryNotFoundError: "Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on."

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM, except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:


Best Wishes,

Qiao

About Train or Finetune

Does SAA+ need training or fine-tuning to segment any anomaly? What should I do if so? If not, why not?

Prompt Engineering Suggestion

I am trying to use Grounding DINO to solve a fabric anomaly-detection problem. Besides 'defect', do you have any other prompt recommendations?

Error when running demo.py

Thank you so much for your wonderful paper! I have really enjoyed it!
But when I run python demo.py and print the score and the similarity map, the following output appears and no picture is shown:

/share/home/project/Segment-Any-Anomaly/GroundingDINO/groundingdino/models/GroundingDINO/ms_deform_attn.py:31: UserWarning: Failed to load custom C++ ops. Running on CPU mode Only!
warnings.warn("Failed to load custom C++ ops. Running on CPU mode Only!")
/share/home/anaconda3/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
final text_encoder_type: bert-base-uncased
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
build_sam_vit_h
wide_resnet50_2
/share/home/anaconda3/lib/python3.10/site-packages/transformers/modeling_utils.py:763: FutureWarning: The device argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
/share/home/anaconda3/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
score is: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]

similarity map is: [[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
...
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]
[0. 0. 0. ... 0. 0. 0.]]

It seems that score, appendix = model(image) has some problem.
Thank you very much for your time and expertise. I greatly appreciate any insights or suggestions

About classification

Thank you so much for your wonderful paper! I have really enjoyed it!
I have a few questions regarding SAA:

  1. What is the model's classification result for abnormal/normal over the entire picture?
  2. What is the cross-domain effect of classification?
  3. I noticed that adding a classification loss can lead to overfitting and non-convergence during fine-tuning. Could you please share any approaches you have used to address this problem?

Thank you very much for your time and expertise. I greatly appreciate any insights or suggestions.
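On the first question, a common convention in anomaly detection (not something the paper specifies) is to reduce the pixel-level anomaly map to a single image-level score via a max or top-k mean, then threshold it. A minimal sketch, where the function names, top_k, and threshold are illustrative assumptions:

```python
import numpy as np

def image_level_score(anomaly_map, top_k=100):
    # Reduce a pixel-level anomaly map to one scalar: the mean of the
    # top-k highest pixel scores (k and the reduction are assumptions).
    flat = np.sort(anomaly_map.ravel())[::-1]
    return float(flat[:top_k].mean())

def classify(anomaly_map, threshold=0.5):
    # Flag the image as anomalous when the score crosses the threshold.
    return image_level_score(anomaly_map) >= threshold

normal = np.zeros((64, 64))
defective = np.zeros((64, 64))
defective[:10, :10] = 1.0  # a 100-pixel synthetic "defect"
```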

custom dataset

How can I run the repo on a custom dataset to get segmentations image by image?
How can I fine-tune using a custom dataset?

Reproduce results on MVTec-AD

Thank you for your great work! I tried to reproduce the results on MVTec-AD following the instructions in the README, but the pixel F1 score on MVTec-AD is 37.65 rather than the 39.40 reported in the paper. Do I need to tune some hyper-parameters to obtain the same result? Thank you!

Domain Knowledge

Hey! You've mentioned "domain expert knowledge" quite a few times in the paper, but never said where exactly it came from or how you filtered down to these prompts. Would you mind shedding some light on this part?

Failed to load custom C++ ops. Running on CPU mode Only!

Hello Dr. Cao, after setting up the environment on Linux following your instructions, I got "Failed to load custom C++ ops. Running on CPU mode Only!", as well as requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')). I couldn't find any related solution online. Have you encountered this problem as well? If not, could you describe your hardware environment?
