
personalize-sam's Issues

Multiple Object Masks in the reference image

I just wanted to confirm whether it's logical for the user to provide multiple masks (of the same object category) in the input. For example, can there be 2 cats in the image and 2 segmented regions in the binary mask corresponding to them, while passing it as a reference image? (A follow-up question: can we have multiple object categories in the same image?)

Will the model be able to figure out that there are 2 mask segments (of the same object) in the reference?

If not, is there any clever hack to pass multiple masks of a particular object type in the same input reference image?
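
One workaround, if the pipeline expects a single instance per reference mask, is to split a multi-instance binary mask into per-instance masks with connected-component analysis and run PerSAM once per instance. A minimal sketch of the splitting step (not part of the official codebase):

    import cv2
    import numpy as np

    # Load a binary mask that may contain several disconnected instances.
    mask = cv2.imread("ref_mask.png", cv2.IMREAD_GRAYSCALE)
    binary = (mask > 127).astype(np.uint8)

    # Label each connected region; label 0 is the background.
    num_labels, labels = cv2.connectedComponents(binary)
    instance_masks = [(labels == i).astype(np.uint8) * 255
                      for i in range(1, num_labels)]
    # Each entry can now serve as its own one-shot reference mask.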

Use own segmentation?

I have images of an object with segmentation masks that I created manually.
I modified the Jupyter notebook "Personalize SAM with one shot (with fine-tuning)", but the resulting segmentation is not perfect: it includes parts of the image that are not the object, and it does not fully cover the object. Is it possible to use your own masks like this? And if the result is not good enough, could you somehow use more images with ground-truth masks to improve it further?
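
One general expectation is that the reference mask is a single-channel, strictly binary image aligned with the reference photo. A quick sanity check before fine-tuning (a sketch, not from the official repo):

    import cv2

    img = cv2.imread("ref_image.jpg")
    mask = cv2.imread("ref_mask.png", cv2.IMREAD_GRAYSCALE)

    assert img is not None and mask is not None, "failed to read files"
    assert mask.shape[:2] == img.shape[:2], "mask must match image size"
    # Force strictly binary values: 0 for background, 255 for the object.
    mask = ((mask > 127) * 255).astype("uint8")
    cv2.imwrite("ref_mask_clean.png", mask)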

Problem with code in multi-object

Hello, thank you for conducting and sharing this great research!

Looking at the code in persam_f_multi_obj.py, it appears that multiple images are used during training. Why does persam_f_multi_obj.py look for the ref_image across multiple images? Shouldn't it learn from only the single ref_image that was originally set?

Reproducing DAVIS 2017 Validation Results

Thanks for the nice work! I am having trouble reproducing the 71.9 mean $\mathcal{J}\&\mathcal{F}$ result reported for PerSAM-F on the semi-supervised video object segmentation task on the DAVIS 2017 validation subset in Table 2. What hyperparameters should be used? What per-scene results should be expected?

.DS_Store in PerSeg causes crash

Minor issue, but .DS_Store files generated by macOS are included in the PerSeg dataset. This causes a crash at the end of a PerSAM run, since the script tries to process the .DS_Store file as an image.
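
A simple guard is to skip hidden files when enumerating the dataset. A sketch of the filtering (variable names are hypothetical):

    import os

    images_path = "./data/Images"
    # Skip macOS metadata such as .DS_Store (and any other dotfiles).
    obj_names = sorted(f for f in os.listdir(images_path)
                       if not f.startswith("."))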

ValueError: zero-size array to reduction operation minimum which has no identity

Hi,
Thank you for your excellent work at first!

When I use persam.py on my own dataset (a medical image dataset), I get the following error

ValueError: zero-size array to reduction operation minimum which has no identity

Apparently, SAM predicts an empty mask. What could be causing this?
Your excellent work will be a great help to my research!
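
The error typically comes from applying a reduction such as np.min to the coordinates of an all-zero mask. One defensive pattern, sketched below under the assumption that a bounding box is derived from the mask's nonzero pixels:

    import numpy as np

    def mask_to_box(mask):
        """Return [x0, y0, x1, y1] for a binary mask, or None if empty."""
        ys, xs = np.nonzero(mask)
        if xs.size == 0:  # SAM predicted an empty mask; nothing to reduce
            return None
        return [xs.min(), ys.min(), xs.max(), ys.max()]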

Finetuning with multiple reference images & masks that a user provides

Hello
How are you?
Thanks for contributing to this project.
I checked the persam_f.py script, which fine-tunes with ONLY one reference image & mask that the user provides.
Could you provide a script that fine-tunes with multiple (at most 5-8) reference images & masks provided by the user?
Thanks
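
Until an official script exists, one plausible adaptation is to extract a target feature from each reference pair and average them, so a single prompt summarizes all references. A rough sketch (ref_feats is a hypothetical list of per-reference target features, shaped like the single-shot target feature in persam_f.py):

    import torch

    # Hypothetical: ref_feats[i] is the target feature extracted from the
    # i-th (image, mask) pair, as persam_f.py does for its one reference.
    ref_feats = [torch.randn(1, 256) for _ in range(5)]  # placeholders

    # Average across references, then renormalize for cosine similarity.
    target_feat = torch.stack(ref_feats).mean(0)
    target_feat = target_feat / target_feat.norm(dim=-1, keepdim=True)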

PerSAM-f

Could you please explain why the implementation of persam_f.py does not utilize target-guided attention and target semantic prompting?

Multiple objects

Hi thanks for the cool work
as I understand the code is designed to find only 1 object in the test image

any idea how to modify it to find multiple?
(I tried to increase the topk
but I got more points in the same area

I modify the sneaker example for this test
colorful_sneaker_00

thanks for the help!
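
Increasing topk alone tends to return neighboring pixels of the same peak. A common alternative is greedy peak picking with local suppression: take the best point, blank out its neighborhood in the similarity map, and repeat. A minimal sketch (the suppression radius is a free parameter):

    import numpy as np

    def pick_peaks(sim, num_points=3, radius=30):
        """Greedily pick well-separated maxima from a 2-D similarity map."""
        sim = sim.astype(np.float64)  # work on a float copy
        w = sim.shape[1]
        points = []
        for _ in range(num_points):
            idx = int(sim.argmax())
            y, x = divmod(idx, w)
            points.append((x, y))
            # Suppress the neighborhood so the next pick is elsewhere.
            sim[max(0, y - radius):y + radius,
                max(0, x - radius):x + radius] = -np.inf
        return points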

Exporting the fine-tuned model

Hi all! Thank you very much for putting such an awesome piece of work together.

After a first look, I did not see an option to export the fine-tuned model produced by the PerSAM-F method.
Is it possible to give one or more masked images of my object, run fine-tuning using PerSAM-F, and then export the fine-tuned model so that I can use it in an online fashion for new images in other pipelines? I think this would be really useful for a lot of researchers. Could you share any pointers on how this could be done?

Thanks again for all the great efforts!
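
Since PerSAM-F freezes SAM and learns only a couple of mask-weighting scalars, exporting can be as light as saving those parameters and reloading them next to a stock SAM checkpoint. A hedged sketch (Mask_Weights stands in for the small learnable module trained in persam_f.py; its exact shape here is an assumption):

    import torch
    import torch.nn as nn

    class Mask_Weights(nn.Module):
        # Assumption: two scalars weighting SAM's multi-scale mask outputs,
        # mirroring the module fine-tuned in persam_f.py.
        def __init__(self):
            super().__init__()
            self.weights = nn.Parameter(torch.ones(2, 1) / 3)

    mask_weights = Mask_Weights()
    # ... fine-tune as in persam_f.py ...
    torch.save(mask_weights.state_dict(), "persam_f_weights.pth")

    # Later, in another pipeline, next to an unmodified SAM checkpoint:
    restored = Mask_Weights()
    restored.load_state_dict(torch.load("persam_f_weights.pth"))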

Question about the video task

As I understand it, this task uses only a few (image, mask) pairs for fine-tuning (or none at all).

But does the personalized-segmentation-on-video task require (image, mask) pairs for the entire sequence?

Is that right, or did I misunderstand? I thought it would use only a few pairs, with the model then inferring the rest of the sequence.

Thanks.

`target_embedding` without training

Hello --

Really enjoyed the paper. One clarifying question: you add the target_embedding to the query point embedding here:
https://github.com/ZrrSkywalker/Personalize-SAM/blob/main/per_segment_anything/modeling/transformer.py#L94

but you don't fine-tune the model. Do you have an intuition for why that works? Is it basically that the TwoWayAttentionBlock is now computing attention based on the "average" of the similarity between points <-> image embeddings and target_embedding <-> image embeddings?

Thanks!
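
For what it's worth, the intuition can be checked via the linearity of the pre-softmax attention logits: adding target_embedding to the queries makes the scores decompose into a point term plus a target term. A toy demonstration (shapes are illustrative, not SAM's actual dimensions):

    import torch

    point_q = torch.randn(1, 4, 64)   # query tokens from the point prompt
    target_q = torch.randn(1, 1, 64)  # broadcast target_embedding
    keys = torch.randn(1, 100, 64)    # image-embedding keys

    combined = (point_q + target_q) @ keys.transpose(-1, -2)
    separate = (point_q @ keys.transpose(-1, -2)
                + target_q @ keys.transpose(-1, -2))
    assert torch.allclose(combined, separate, atol=1e-5)
    # Before the softmax, attention therefore mixes point<->image and
    # target<->image similarities additively.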

Problem with the visualization results?

In the Hugging Face demo, take for example a cat image plus the cat's mask as the reference. If the test image is an arbitrary dog image, the dog still gets masked out. Shouldn't there be no mask output in that case?

Extending the approach to multi-shot training

I was wondering if and how this work could be extended to multi-shot segmentation.
The demo uses a single mask for training, would it be possible to use multiple masks of the same object and then fine-tune the model? Has this already been experimented with?

evaluation code for VOS

Hi Authors, thanks so much for your interesting work! It looks pretty cool : )

It would be really appreciated if you could share the evaluation code for VOS! Also, we just released the Focal-L model, so feel free to get new numbers in Table 1 for SEEM : )

Best,
Xueyan

Working on my own data

Hi,
thanks for your nice work. I would like to test training on my own data. However, from the documentation it is not clear how to do this. Could you please write a quick tutorial on how to run persam and persam_f on your own data?
Thanks
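
Until such a tutorial exists, a reasonable starting point is to mirror the PerSeg-style layout the scripts read from. A sketch of that assumption (the folder names follow the repo's data/ convention and should be double-checked against each script's arguments):

    import os
    import shutil

    # Assumed layout: one folder per object, with matching filenames
    # under Images/ and Annotations/.
    for sub in ("Images/my_object", "Annotations/my_object"):
        os.makedirs(os.path.join("data", sub), exist_ok=True)
    shutil.copy("ref_image.jpg", "data/Images/my_object/00.jpg")
    shutil.copy("ref_mask.png", "data/Annotations/my_object/00.png")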

How to discard false positive

When I use PerSAM on an image that does not contain my learned item, it will still output a prediction. How can I at least distinguish a true positive from a false positive?
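
One pragmatic filter is to threshold the confidence signals that are already available: the peak value of the cosine-similarity map and SAM's predicted IoU score, rejecting the prediction when either is low. A sketch (the threshold values are arbitrary and need tuning per dataset):

    import numpy as np

    def accept(sim_map, iou_scores, sim_thresh=0.5, iou_thresh=0.7):
        """Reject predictions whose evidence is weak on either signal."""
        return sim_map.max() >= sim_thresh and np.max(iou_scores) >= iou_thresh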

finding the positive & negative prior location

According to your code, you set the x-coordinate to the remainder when the index of the flattened tensor is divided by the width, and the y-coordinate to the quotient (lines 183-201 in persam.py). However, no matter how much I think about it, it seems like the order should be reversed. What are your thoughts on this?
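
For a row-major flattened map of width w, the quotient is the row (y) and the remainder is the column (x), which is what the code does. A two-line check against np.unravel_index:

    import numpy as np

    sim = np.random.rand(60, 80)  # height 60, width w = 80
    idx = int(sim.argmax())
    y, x = idx // 80, idx % 80    # quotient = row, remainder = column
    assert (y, x) == np.unravel_index(idx, sim.shape)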

How can this error be fixed?

training_size = (int(os.listdir(os.path.join(images_path, obj_name))) * args.training_percentage)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
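
os.listdir returns a list, so it needs len() rather than int(). A hedged fix, keeping the surrounding expression from the traceback:

    training_size = int(len(os.listdir(os.path.join(images_path, obj_name))) * args.training_percentage)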

finetune

I want to segment one specific face of a cardboard box. The mask I drew covers only that face, but after fine-tuning the model segments the whole box. How can I adjust it so that only the specified face is segmented?

Multiple objects issue

Hello, first of all, very good idea and implementation. I tried to fine-tune the Segment Anything model with multiple objects (belonging to the same class). The output was not able to predict all the objects. Any suggestion on how to improve it?

Missing features:

  • saving fine-tuned weights
  • loading fine-tuned weights for other predictions

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change in the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[comparison images]

Best Wishes,

Qiao
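
If adopted, the swap could be as small as loading MobileSAM's checkpoint through its SAM-compatible registry. A sketch based on the MobileSAM README (package and model-type names should be verified against that repo):

    from mobile_sam import sam_model_registry, SamPredictor

    # "vit_t" is MobileSAM's tiny image encoder; the rest of the API
    # mirrors segment-anything, so downstream PerSAM code is unchanged.
    sam = sam_model_registry["vit_t"](checkpoint="./weights/mobile_sam.pt")
    sam.eval()
    predictor = SamPredictor(sam)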

Positive location prior

Hi there,

Thanks for this great work!

I am trying to understand the code. The paper mentions a positive location prior and a negative location prior, but the code seems to utilize only the positive location prior for getting the naive mask. I wonder which part of the code uses the negative points?

Best,
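
For reference, in the segment-anything predictor API negative points are passed through point_labels, where 1 marks foreground and 0 marks background. A minimal usage sketch (file names are placeholders):

    import cv2
    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    image = cv2.cvtColor(cv2.imread("test.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    point_coords = np.array([[320, 240], [100, 400]])  # (x, y) pixels
    point_labels = np.array([1, 0])  # 1 = positive prior, 0 = negative prior
    masks, scores, logits = predictor.predict(
        point_coords=point_coords,
        point_labels=point_labels,
        multimask_output=True,
    )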

The model seems to overlook the negative prior and instead focuses more on the positive prior

First of all, thank you so much for the great work you've done. I have a question and I'm hoping you can provide some insight.

I've noticed that when I use PerSAM to make predictions on a custom dataset, the model seems to overlook the negative prior and instead focuses more on the positive prior. It tends to over-segment: even when a large area has already been labeled by the negative prior, it still gets masked. But when I use the same labels with the official SAM model, the results are totally different and look normal.

Any thoughts on why this might be happening? Thanks in advance!

video object segmentation code

Thanks for your good idea and implementation.
I noticed that you have reported segmentation results on DAVIS. Could you please share your evaluation code?
I would really appreciate it if you could.

Questions about multiple objects

Thanks for your in-depth analysis. This is because, compared to the background, the sneaker feature is more similar to the cat feature.

If we give a one-shot image containing two objects, a sneaker and a cat, the reference sneaker feature would correctly match other sneakers with the highest scores in new images, rather than the cat, as shown in Figure 9 for video object segmentation.

Originally posted by @ZrrSkywalker in #9 (comment)

Hello, I am also having trouble with multiple objects. To describe my problems clearly:

  1. In the SAM model interface, prompt combinations such as (single point, single box) and (multiple points, single box) can be used, but the (multiple points, multiple boxes) combination breaks at the code level. [screenshot]
    I am not sure whether I am misreading the code, but I also noticed that your code seems to work around a similar problem: [screenshot]

  2. Given the above, could you explain the concrete steps for using bboxes mentioned in the Experimental Details of Section 4.2 of the paper? I assume it is done step by step: first generate a mask with point prompts, then use the bbox during the refinement step.

Looking forward to your reply.
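
For what it's worth, since SamPredictor.predict takes only one box per call, the usual pattern for several objects is to loop, pairing each box with its own points. A sketch of that per-object loop (assumes predictor already has the test image set, and the prompt values are placeholders):

    import numpy as np

    # Hypothetical prompts: one (points, labels, box) triple per object.
    objects = [
        (np.array([[120, 80]]), np.array([1]), np.array([90, 50, 200, 160])),
        (np.array([[400, 300]]), np.array([1]), np.array([350, 250, 480, 380])),
    ]

    all_masks = []
    for coords, labels, box in objects:
        masks, scores, _ = predictor.predict(
            point_coords=coords,
            point_labels=labels,
            box=box,  # one length-4 XYXY box per call
            multimask_output=False,
        )
        all_masks.append(masks[np.argmax(scores)])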

Confused about the code based on the paper

Thanks for your great work! It is useful and concise, and we want to cite your paper. But I have some confusion reading it: why do you average the target features at line 91 of persam.py, when the paper computes the similarity map first and then averages it?
target_embedding = target_feat.mean(0).unsqueeze(0)
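
Note that the similarity map is a dot product against the image features, so it is linear in the target feature: averaging the foreground features first yields the same result as computing a similarity map per feature and averaging those (up to the final renormalization). A quick numerical check:

    import torch

    target_feat = torch.randn(10, 256)    # 10 foreground pixel features
    img_feat = torch.randn(256, 64 * 64)  # flattened image features

    sim_then_mean = (target_feat @ img_feat).mean(0)
    mean_then_sim = target_feat.mean(0) @ img_feat
    assert torch.allclose(sim_then_mean, mean_then_sim, atol=1e-4)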

requirements.txt missing library

It seems that the requirements.txt file is missing some of the necessary libraries. Besides, some entries aren't necessary (such as os and warnings, which are part of the standard library).

I can contribute an MR that adds these. 👌

Few-shot version PerSAM

Thanks for your splendid work!

I saw in issue #15 that you are planning a few-shot implementation. May I ask how it is progressing?
