
personalize-sam's Issues

Multiple Object Masks in the reference image

I just wanted to confirm whether it's logical for the user to provide multiple masks (of the same object category) in the input. For example, can there be 2 cats in the image and 2 segmented regions in the binary mask corresponding to them, while passing it as a reference image? (A follow-up question: can we have multiple object categories in the same image?)

Will the model be able to figure out that there are 2 mask segments (of the same object) in the reference?

If not, is there any clever hack to pass multiple masks of a particular object type in the same input reference image?
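
One workaround, if the pipeline expects a single instance per reference mask, is to split a multi-instance binary mask into per-instance masks with connected-component analysis and run PerSAM once per instance. A minimal sketch of the splitting step (not part of the official codebase):

    import cv2
    import numpy as np

    # Load a binary mask that may contain several disconnected instances.
    mask = cv2.imread("ref_mask.png", cv2.IMREAD_GRAYSCALE)
    binary = (mask > 127).astype(np.uint8)

    # Label each connected region; label 0 is the background.
    num_labels, labels = cv2.connectedComponents(binary)
    instance_masks = [(labels == i).astype(np.uint8) * 255
                      for i in range(1, num_labels)]
    # Each entry can now serve as its own one-shot reference mask.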

Use own segmentation?

I have images of an object with segmentation masks that I created manually.
I modified the Jupyter notebook "Personalize SAM with one shot (with fine-tuning)", but the resulting segmentation is not perfect: it includes parts of the image that are not the object, and it does not fully cover the object. Is it possible to use your own masks like this? And if the result is not good enough, could you somehow use more images with ground-truth masks to improve it further?
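
One general expectation is that the reference mask is a single-channel, strictly binary image aligned with the reference photo. A quick sanity check before fine-tuning (a sketch, not from the official repo):

    import cv2

    img = cv2.imread("ref_image.jpg")
    mask = cv2.imread("ref_mask.png", cv2.IMREAD_GRAYSCALE)

    assert img is not None and mask is not None, "failed to read files"
    assert mask.shape[:2] == img.shape[:2], "mask must match image size"
    # Force strictly binary values: 0 for background, 255 for the object.
    mask = ((mask > 127) * 255).astype("uint8")
    cv2.imwrite("ref_mask_clean.png", mask)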

Problem with code in multi-object

Hello, thank you for conducting and sharing this great research!

Looking at the code in persam_f_multi_obj.py, it appears that multiple images are used during training. Why does persam_f_multi_obj.py look for the ref_image across multiple images? Shouldn't it learn from only the single ref_image that was originally set?

Reproducing DAVIS 2017 Validation Results

Thanks for the nice work! I am having trouble reproducing the 71.9 mean $\mathcal{J}\&\mathcal{F}$ result reported for PerSAM-F on the semi-supervised video object segmentation task on the DAVIS 2017 validation subset in Table 2. What hyperparameters should be used? What per-scene results should be expected?

.DS_Store in PerSeg causes crash

Minor issue, but .DS_Store files generated by macOS are included in the PerSeg dataset. This causes a crash at the end of a PerSAM run, since the script tries to process the .DS_Store file as an image.
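
A simple guard is to skip hidden files when enumerating the dataset. A sketch of the filtering (variable names are hypothetical):

    import os

    images_path = "./data/Images"
    # Skip macOS metadata such as .DS_Store (and any other dotfiles).
    obj_names = sorted(f for f in os.listdir(images_path)
                       if not f.startswith("."))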

ValueError: zero-size array to reduction operation minimum which has no identity

Hi,
Thank you for your excellent work at first!

When I use persam.py on my own dataset (a medical image dataset), I get the following error

ValueError: zero-size array to reduction operation minimum which has no identity

Apparently, SAM predicts an empty mask. What could be causing this?
Your excellent work will be a great help to my research!
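
The error typically comes from applying a reduction such as np.min to the coordinates of an all-zero mask. One defensive pattern, sketched below under the assumption that a bounding box is derived from the mask's nonzero pixels:

    import numpy as np

    def mask_to_box(mask):
        """Return [x0, y0, x1, y1] for a binary mask, or None if empty."""
        ys, xs = np.nonzero(mask)
        if xs.size == 0:  # SAM predicted an empty mask; nothing to reduce
            return None
        return [xs.min(), ys.min(), xs.max(), ys.max()]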

Finetuning with multiple reference images & masks that a user provides

Hello
How are you?
Thanks for contributing to this project.
I checked the persam_f.py script, which fine-tunes with ONLY one reference image & mask that the user provides.
Could you provide a script that fine-tunes with multiple (at most 5-8) reference images & masks provided by the user?
Thanks
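
Until an official script exists, one plausible adaptation is to extract a target feature from each reference pair and average them, so a single prompt summarizes all references. A rough sketch (ref_feats is a hypothetical list of per-reference target features, shaped like the single-shot target feature in persam_f.py):

    import torch

    # Hypothetical: ref_feats[i] is the target feature extracted from the
    # i-th (image, mask) pair, as persam_f.py does for its one reference.
    ref_feats = [torch.randn(1, 256) for _ in range(5)]  # placeholders

    # Average across references, then renormalize for cosine similarity.
    target_feat = torch.stack(ref_feats).mean(0)
    target_feat = target_feat / target_feat.norm(dim=-1, keepdim=True)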

PerSAM-f

Could you please explain why the implementation of persam_f.py does not utilize target-guided attention and target semantic prompting?

Multiple objects

Hi thanks for the cool work
as I understand the code is designed to find only 1 object in the test image

any idea how to modify it to find multiple?
(I tried to increase the topk
but I got more points in the same area

I modify the sneaker example for this test
colorful_sneaker_00

thanks for the help!
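
Increasing topk alone tends to return neighboring pixels of the same peak. A common alternative is greedy peak picking with local suppression: take the best point, blank out its neighborhood in the similarity map, and repeat. A minimal sketch (the suppression radius is a free parameter):

    import numpy as np

    def pick_peaks(sim, num_points=3, radius=30):
        """Greedily pick well-separated maxima from a 2-D similarity map."""
        sim = sim.astype(np.float64)  # work on a float copy
        w = sim.shape[1]
        points = []
        for _ in range(num_points):
            idx = int(sim.argmax())
            y, x = divmod(idx, w)
            points.append((x, y))
            # Suppress the neighborhood so the next pick is elsewhere.
            sim[max(0, y - radius):y + radius,
                max(0, x - radius):x + radius] = -np.inf
        return points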

Exporting the fine-tuned model

Hi all! Thank you very much for putting such an awesome piece of work together.

After a first look, I did not see an option to export the fine-tuned model produced by the PerSAM-F method.
Is it possible to give one or more masked images of my object, run fine-tuning using PerSAM-F, and then export the fine-tuned model so that I can use it in an online fashion for new images in other pipelines? I think this would be really useful for a lot of researchers. Could you share any pointers on how this could be done?

Thanks again for all the great efforts!
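
Since PerSAM-F freezes SAM and learns only a couple of mask-weighting scalars, exporting can be as light as saving those parameters and reloading them next to a stock SAM checkpoint. A hedged sketch (Mask_Weights stands in for the small learnable module trained in persam_f.py; its exact shape here is an assumption):

    import torch
    import torch.nn as nn

    class Mask_Weights(nn.Module):
        # Assumption: two scalars weighting SAM's multi-scale mask outputs,
        # mirroring the module fine-tuned in persam_f.py.
        def __init__(self):
            super().__init__()
            self.weights = nn.Parameter(torch.ones(2, 1) / 3)

    mask_weights = Mask_Weights()
    # ... fine-tune as in persam_f.py ...
    torch.save(mask_weights.state_dict(), "persam_f_weights.pth")

    # Later, in another pipeline, next to an unmodified SAM checkpoint:
    restored = Mask_Weights()
    restored.load_state_dict(torch.load("persam_f_weights.pth"))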

Question about the video task

As I understand it, this task uses only a few (image, mask) pairs for fine-tuning (or none at all).

But does the personalized-segmentation-on-video task require (image, mask) pairs for the entire sequence?

Is that right, or did I misunderstand? I thought it would use only a few pairs, with the model then inferring the rest of the sequence.

Thanks.

`target_embedding` without training

Hello --

Really enjoyed the paper. One clarifying question: you add the target_embedding to the query point embedding here:
https://github.com/ZrrSkywalker/Personalize-SAM/blob/main/per_segment_anything/modeling/transformer.py#L94

but you don't fine-tune the model. Do you have an intuition for why that works? Is it basically that the TwoWayAttentionBlock is now computing attention based on the "average" of the similarity between points <-> image embeddings and target_embedding <-> image embeddings?

Thanks!
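
For what it's worth, the intuition can be checked via the linearity of the pre-softmax attention logits: adding target_embedding to the queries makes the scores decompose into a point term plus a target term. A toy demonstration (shapes are illustrative, not SAM's actual dimensions):

    import torch

    point_q = torch.randn(1, 4, 64)   # query tokens from the point prompt
    target_q = torch.randn(1, 1, 64)  # broadcast target_embedding
    keys = torch.randn(1, 100, 64)    # image-embedding keys

    combined = (point_q + target_q) @ keys.transpose(-1, -2)
    separate = (point_q @ keys.transpose(-1, -2)
                + target_q @ keys.transpose(-1, -2))
    assert torch.allclose(combined, separate, atol=1e-5)
    # Before the softmax, attention therefore mixes point<->image and
    # target<->image similarities additively.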

Problem with the visualization results?

In the Hugging Face demo, take for example a cat image plus the cat's mask as the reference. If the test image is an arbitrary dog image, the dog still gets masked out. Shouldn't there be no mask output in that case?

Extending the approach to multi-shot training

I was wondering if and how this work could be extended to multi-shot segmentation.
The demo uses a single mask for training, would it be possible to use multiple masks of the same object and then fine-tune the model? Has this already been experimented with?

evaluation code for VOS

Hi Authors, thanks so much for your interesting work! It looks pretty cool : )

It would be really appreciated if you could share the evaluation code for VOS! Also, we just released the Focal-L model, so feel free to get new numbers in Table 1 for SEEM : )

Best,
Xueyan

Working on my own data

Hi,
thanks for your nice work. I would like to test training on my own data. However, from the documentation it is not clear how to do this. Could you please write a quick tutorial on how to run persam and persam_f on your own data?
Thanks
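
Until such a tutorial exists, a reasonable starting point is to mirror the PerSeg-style layout the scripts read from. A sketch of that assumption (the folder names follow the repo's data/ convention and should be double-checked against each script's arguments):

    import os
    import shutil

    # Assumed layout: one folder per object, with matching filenames
    # under Images/ and Annotations/.
    for sub in ("Images/my_object", "Annotations/my_object"):
        os.makedirs(os.path.join("data", sub), exist_ok=True)
    shutil.copy("ref_image.jpg", "data/Images/my_object/00.jpg")
    shutil.copy("ref_mask.png", "data/Annotations/my_object/00.png")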

How to discard false positive

When I use PerSAM on an image that does not contain my learned item, it will still output a prediction. How can I at least distinguish a true positive from a false positive?
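
One pragmatic filter is to threshold the confidence signals that are already available: the peak value of the cosine-similarity map and SAM's predicted IoU score, rejecting the prediction when either is low. A sketch (the threshold values are arbitrary and need tuning per dataset):

    import numpy as np

    def accept(sim_map, iou_scores, sim_thresh=0.5, iou_thresh=0.7):
        """Reject predictions whose evidence is weak on either signal."""
        return sim_map.max() >= sim_thresh and np.max(iou_scores) >= iou_thresh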

finding the positive & negative prior location

According to your code, you set the x-coordinate to the remainder when the index of the flattened tensor is divided by the width, and the y-coordinate to the quotient (lines 183-201 in persam.py). However, no matter how much I think about it, it seems like the order should be reversed. What are your thoughts on this?
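
For a row-major flattened map of width w, the quotient is the row (y) and the remainder is the column (x), which is what the code does. A two-line check against np.unravel_index:

    import numpy as np

    sim = np.random.rand(60, 80)  # height 60, width w = 80
    idx = int(sim.argmax())
    y, x = idx // 80, idx % 80    # quotient = row, remainder = column
    assert (y, x) == np.unravel_index(idx, sim.shape)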

How can this error be fixed?

training_size = (int(os.listdir(os.path.join(images_path, obj_name))) * args.training_percentage)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
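
os.listdir returns a list, so it needs len() rather than int(). A hedged fix, keeping the surrounding expression from the traceback:

    training_size = int(len(os.listdir(os.path.join(images_path, obj_name))) * args.training_percentage)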

finetune

I want to segment one specific face of a cardboard box. The mask I drew covers only that face, but after fine-tuning the model segments the whole box. How can I adjust it so that only the specified face is segmented?

Multiple objects issue

Hello, first of all, very good idea and implementation. I tried to fine-tune the Segment Anything model with multiple objects (belonging to the same class). The output was not able to predict all the objects. Any suggestion on how to improve it?

Missing features:

  • saving fine-tuned weights
  • loading fine-tuned weights for other predictions

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change in the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[comparison images]

Best Wishes,

Qiao
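
If adopted, the swap could be as small as loading MobileSAM's checkpoint through its SAM-compatible registry. A sketch based on the MobileSAM README (package and model-type names should be verified against that repo):

    from mobile_sam import sam_model_registry, SamPredictor

    # "vit_t" is MobileSAM's tiny image encoder; the rest of the API
    # mirrors segment-anything, so downstream PerSAM code is unchanged.
    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    sam.eval()
    predictor = SamPredictor(sam)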

Positive location prior

Hi there,

Thanks for this great work!

I am trying to understand the code. The paper mentions a positive location prior and a negative location prior, but the code seems to utilize only the positive location prior for getting the naive mask. I wonder which part of the code uses the negative points?

Best,
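
For reference, in the segment-anything predictor API negative points are passed through point_labels, where 1 marks foreground and 0 marks background. A minimal usage sketch (file names are placeholders):

    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    point_coords = np.array([[320, 240], [100, 400]])  # (x, y) pixels
    point_labels = np.array([1, 0])  # 1 = positive prior, 0 = negative prior
    masks, scores, logits = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,
    )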

The model seems to overlook the negative prior and instead focuses more on the positive prior

First of all, thank you so much for the great work you've done. I have a question and I'm hoping you can provide some insight.

I've noticed that when I use PerSAM to make predictions on a custom dataset, the model seems to overlook the negative prior and instead focuses more on the positive prior. It tends to over-segment: even when a large area has already been labeled by the negative prior, it still gets masked. But when I use the same labels with the official SAM model, the results are totally different and look normal.

Any thoughts on why this might be happening? Thanks in advance!

video object segmentation code

Thanks for your good idea and implementation.
I noticed that you have reported segmentation results on DAVIS. Could you please share your evaluation code?
I would really appreciate it if you could.

Questions about multiple objects

Thanks for your in-depth analysis. This is because, compared to the background, the sneaker feature is more similar to the cat feature.

If we give a one-shot image containing two objects, a sneaker and a cat, the reference sneaker feature would correctly match other sneakers with the highest scores in new images, rather than the cat, as shown in Figure 9 for video object segmentation.

Originally posted by @ZrrSkywalker in #9 (comment)

Hello, I am also having trouble with multiple objects. To describe my problems clearly:

  1. In the SAM model interface, prompt combinations such as (single point, single box) and (multiple points, single box) can be used, but the (multiple points, multiple boxes) combination breaks at the code level. [screenshot]
    I am not sure whether I am misreading the code, but I also noticed that your code seems to work around a similar problem: [screenshot]

  2. Given the above, could you explain the concrete steps for using bboxes mentioned in the Experimental Details of Section 4.2 of the paper? I assume it is done step by step: first generate a mask with point prompts, then use the bbox during the refinement step.

Looking forward to your reply.
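
For what it's worth, since SamPredictor.predict takes only one box per call, the usual pattern for several objects is to loop, pairing each box with its own points. A sketch of that per-object loop (assumes predictor already has the test image set, and the prompt values are placeholders):

    import numpy as np

    # Hypothetical prompts: one (points, labels, box) triple per object.
    objects = [
        (np.array([[120, 80]]), np.array([1]), np.array([90, 50, 200, 160])),
        (np.array([[400, 300]]), np.array([1]), np.array([350, 250, 480, 380])),
    ]

    all_masks = []
    for coords, labels, box in objects:
        masks, scores, _ = predictor.predict(
            point_coords=coords,
            point_labels=labels,
            box=box,  # one length-4 XYXY box per call
            multimask_output=False,
        )
        all_masks.append(masks[np.argmax(scores)])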

Confused about the code based on the paper

Thanks for your great work! It is useful and concise, and we want to cite your paper. But I have some confusion reading it: why do you average the target features at line 91 of persam.py, when the paper computes the similarity map first and then averages it?
target_embedding = target_feat.mean(0).unsqueeze(0)
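
Note that the similarity map is a dot product against the image features, so it is linear in the target feature: averaging the foreground features first yields the same result as computing a similarity map per feature and averaging those (up to the final renormalization). A quick numerical check:

    import torch

    target_feat = torch.randn(10, 256)    # 10 foreground pixel features
    img_feat = torch.randn(256, 64 * 64)  # flattened image features

    sim_then_mean = (target_feat @ img_feat).mean(0)
    mean_then_sim = target_feat.mean(0) @ img_feat
    assert torch.allclose(sim_then_mean, mean_then_sim, atol=1e-4)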

requirements.txt missing library

It seems that the requirements.txt file is missing some of the necessary libraries. Besides, some entries aren't necessary (such as os and warnings, which are part of the standard library).

I can contribute an MR that adds these. 👌

Few-shot version PerSAM

Thanks for your splendid work!

I saw in issue #15 that you are planning a few-shot implementation. May I ask how it is progressing?
