syscv / sam-hq
Segment Anything in High Quality [NeurIPS 2023]
Home Page: https://arxiv.org/abs/2306.01567
License: Apache License 2.0
Hello,
Thank you for this excellent work.
Do you have any plans to release the code and weights?
Hey, I am not sure (maybe this is a local issue), but the hqsam_light weights on Hugging Face are not working, while the ones on Google Drive are.
If I add a new image to this project, can I check whether there is a car in the image, for example?
Or is the purpose here to detect whether there are objects of interest in the images?
Hey guys,
I can only download the checkpoints manually; I don't know how to deploy the model in the cloud and download the checkpoints via gdown.
Maybe the checkpoint-sharing approach could be improved.
Thanks
Hi,
The HQ-SAM paper says that SAM-B has 358M parameters, but when I count them with the code below, I get 93.9M.
model_total_params = sum(p.numel() for p in sam.parameters())
Which one is wrong?
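One way to investigate such a discrepancy is to break the count down per submodule; a minimal sketch, assuming sam is a loaded SAM model (it is also worth checking whether the paper's figure refers to model size in megabytes rather than parameter count):

# Minimal sketch: break the total down per top-level submodule
# to see where the parameters live.
for name, module in sam.named_children():
    count = sum(p.numel() for p in module.parameters())
    print(f"{name}: {count / 1e6:.1f}M parameters")
print(f"total: {sum(p.numel() for p in sam.parameters()) / 1e6:.1f}M parameters")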
Traceback (most recent call last):
File "train.py", line 694, in
main(net, train_datasets, valid_datasets, args)
File "train.py", line 327, in main
train_dataloaders, train_datasets = create_dataloaders(train_im_gt_list,
File "/home/quchunguang/datasets/sam-hq/train/utils/dataloader.py", line 71, in create_dataloaders
sampler = DistributedSampler(gos_dataset)
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/utils/data/distributed.py", line 65, in init
num_replicas = dist.get_world_size()
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
return _get_group_size(group)
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
default_pg = _get_default_group()
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
How can I solve this?
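Not an official answer, but this error means the script constructed a DistributedSampler before any process group existed. Two common fixes: launch train.py through a distributed launcher (e.g. python -m torch.distributed.launch --nproc_per_node=1 train.py ...), or, for single-process debugging, initialize a one-process group yourself before the dataloaders are created; a minimal sketch:

# Sketch: initialize a single-process default group so DistributedSampler
# can query the world size (use backend="gloo" on a CPU-only machine).
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)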
When I run demo_hqsam, the following problem arises: _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified. How can I solve it?
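Not a confirmed diagnosis, but this UnpicklingError often indicates a corrupted or incomplete checkpoint download (for example, an HTML quota page saved in place of the weights). A quick sanity check, with a hypothetical checkpoint path:

# Sketch: verify the checkpoint loads on CPU before building the model.
import torch

ckpt_path = "sam_hq_vit_l.pth"  # hypothetical path; adjust to your file
try:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    print(f"OK: {len(state_dict)} entries")
except Exception as e:
    print(f"checkpoint looks corrupted; re-download it ({e})")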
Hi, have you tested not adding the HQ tokens and instead directly using the SAM tokens with the fused image features?
Sorry, you are currently unable to view or download this file.
Too many users have recently viewed or downloaded this file. Please try to access this file again later. If the file you are trying to access is particularly large or has been shared with many people, you may have to wait up to 24 hours before you can view or download it. If you still cannot access the file after 24 hours, please contact your domain administrator.
I ran the code successfully. Interesting.
Reference: https://github.com/ChaoningZhang/MobileSAM
Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.
MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:
Best Wishes,
Qiao
Traceback (most recent call last):
File "E:\sam-hq\demo\demo_hqsam.py", line 63, in
sam = sam_model_registrymodel_type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\sam-hq\segment_anything\build_sam.py", line 28, in build_sam_vit_l
return _build_sam(
^^^^^^^^^^^
File "E:\sam-hq\segment_anything\build_sam.py", line 106, in _build_sam
state_dict = torch.load(f)
^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1172, in _load
result = unpickler.load()
^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1116, in load_tensor
wrap_storage=restore_location(storage, location),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 217, in default_restore_location
result = fn(storage, location)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 166, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
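The error message itself contains the fix: on a CPU-only machine, pass map_location to torch.load. The traceback shows the load happening in segment_anything/build_sam.py (line 106), so a minimal local patch there would be:

# In _build_sam, load the checkpoint onto CPU explicitly:
state_dict = torch.load(f, map_location=torch.device("cpu"))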
Great work! I am looking for a script that allows reproducing the IoU and boundary IoU numbers. I looked into the train folder, and there is an evaluation example shown. However, it uses the checkpoint sam_vit_l_0b3195.pth.
The masks predicted from this checkpoint are of extremely poor quality, leading me to believe I should have been using sam_hq_vit_l.pth, shown in the main readme of the repo. However, when I pass sam_hq_vit_l.pth to the checkpoint argument of train.py along with the --eval flag, it fails to load the checkpoint and errors out since the keys do not match.
Please advise how I can reproduce the results.
Hello,
How can I use a prompt for segmentation (like the demo version)?
Thank you
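For reference, a minimal sketch of prompted inference in the segment-anything-style API (assuming sam-hq keeps the upstream SamPredictor interface; the point coordinates and file names here are hypothetical):

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point prompt (label 1) at a hypothetical (x, y) location.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=False,
)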
Hi, I have found that the calculation of IoU is done by computing IoU for each individual image and then taking the average over all the images. However, shouldn't we calculate the intersection and union over all the images first, and then compute IoU?
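To make the two conventions in this question concrete, here is a small sketch contrasting per-image mean IoU with dataset-level IoU (hypothetical helpers; preds and gts are lists of boolean arrays):

import numpy as np

def per_image_miou(preds, gts):
    # Compute IoU per image, then average over images.
    ious = [np.logical_and(p, g).sum() / np.logical_or(p, g).sum()
            for p, g in zip(preds, gts)]
    return float(np.mean(ious))

def dataset_level_iou(preds, gts):
    # Accumulate intersection and union over all images first.
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(preds, gts))
    union = sum(np.logical_or(p, g).sum() for p, g in zip(preds, gts))
    return inter / union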
Thanks for this significant and interesting work.
Open-source work is helpful to developers.
In the paper, the authors write "In particular, we use both global semantic context and local fine-grained features by fusing SAM’s mask decoder features with early and late feature maps from its ViT encoder. During training, we freeze the entire pre-trained SAM parameters, while only updating our HQ-Output Token, its associated three-layer MLPs, and a small feature fusion block." Is there a demo to run the training code, or should we reproduce it ourselves?
Thanks & Regards!
Momo
Hello,
I am trying to run the software on a Mac M1. I changed the device in the demo example (demo_hqsam.py) to "cpu" and "mps", but in both cases I got the error messages below. (The code works for demo_sam.py.)
File "/Users/Projects/SegmentAnything/SAMHQ/samhq/lib/python3.11/site-packages/torch/serialization.py", line 217, in default_restore_location
result = fn(storage, location)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/Projects/SegmentAnything/SAMHQ/samhq/lib/python3.11/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
Can you upload a script to correctly export the entire model to ONNX?
Error(s) in loading state_dict for Sam:
Unexpected key(s) in state_dict: "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".
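A likely cause (hedged, since the issue gives no code): all of these unexpected keys are HQ-SAM additions (hf_token, hf_mlp, compress_vit_feat, ...), which suggests the HQ checkpoint is being loaded into the original SAM model class. Importing segment_anything from this repository rather than the upstream pip package should build the HQ decoder:

# Make sure the sam-hq copy of segment_anything is on the path, not the
# upstream "segment-anything" pip package, then build as usual:
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")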
When I try to train HQ-SAM following the instructions, I get the error "HQ-SAM: error: unrecognized arguments: --local-rank=0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 1468) of binary: /usr/bin/python3" in Colab. How can I resolve it?
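One plausible explanation (an assumption, not the repo's official fix): newer torch.distributed launchers pass --local-rank with a hyphen, while older training scripts only register --local_rank with an underscore. Registering both spellings in argparse is a common workaround:

# Accept both the old and new launcher spellings; argparse takes the
# destination name from the first long option ("local_rank").
parser.add_argument("--local_rank", "--local-rank", type=int, default=0)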
What factors could cause ONNX and PyTorch to produce different inference results?
Hi everyone,
How do I run this project on a Mac M1 after installing opencv-python pycocotools matplotlib onnxruntime onnx?
And if I add a new image, how can I predict the result for it?
Hello,
I find that the SAM mask output of the HQ-SAM network is different from the original SAM network's output.
Is the reason that the tokens interact with each other?
Thanks
Process finished with exit code 3
How do I fix this issue?
Hi! Great thanks for the work. I tried the Hugging Face demo and found that the text prompt option and the advanced options are really effective for the segmentation task I've been trying to do. However, I didn't find any guide or options for this part in the demo, so is there an alternative way to set these when I'm running my own code? Thanks for the reply!
Any plans to create a PyPI package so we can simply pip install and import the necessary packages across platforms, and use them from APIs and software?
When running the exact same input image and box prompt with the baseline MobileSAM (available in this repository) and the official MobileSAM model, there are differences in the predicted masks. I have tried to locate particular differences in the code causing this inconsistent behavior, but without luck thus far. I believe this is an urgent issue that should be investigated.
Baseline MobileSAM (SAM-HQ repository) vs Official MobileSAM (MobileSam repository):
Thank you very much for this incredible model.
I was looking at your guide for exporting the model to ONNX. I didn't understand why you don't want to export the SAM image encoder to ONNX. I think it is because you are executing the ONNX graph with onnxruntime on CPU.
However, it would be nice to have it for Triton Inference Server with the CUDA backend.
Is there a way to control the input size?
Do we need to enforce scaling to the encoder's expected resolution?
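For what it's worth, a sketch of exporting the image encoder alone via torch.onnx.export, assuming SAM's fixed 1024x1024 input resolution. This is an assumption, not the repo's official export path, and the HQ variant's encoder may also return intermediate embeddings, which would need extra output names:

import torch

dummy = torch.randn(1, 3, 1024, 1024)  # SAM's fixed input resolution
torch.onnx.export(
    sam.image_encoder,
    dummy,
    "image_encoder.onnx",
    input_names=["image"],
    output_names=["embeddings"],
    opset_version=17,
)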
I've used the training command, but every time, after a random number of epochs, I get a FileNotFoundError from the dataloader. Does anyone know the solution?
The error:
epoch: 14 learning rate: 1e-05
[ 0/333] eta: 0:14:51 training_loss: 0.1127 (0.1127) loss_mask: 0.0446 (0.0446) loss_dice: 0.0681 (0.0681) time: 2.6786 data: 0.3379 max mem: 10103
Traceback (most recent call last):
File "/content/drive/MyDrive/sam-hq/train/train.py", line 651, in
main(net, train_datasets, valid_datasets, args)
File "/content/drive/MyDrive/sam-hq/train/train.py", line 360, in main
train(args, net, optimizer, train_dataloaders, valid_dataloaders, lr_scheduler,writer)
File "/content/drive/MyDrive/sam-hq/train/train.py", line 396, in train
for data in metric_logger.log_every(train_dataloaders,1000):
File "/content/drive/MyDrive/sam-hq/train/utils/misc.py", line 237, in log_every
for obj in iterable:
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in next
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 243, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/content/drive/MyDrive/sam-hq/train/utils/dataloader.py", line 244, in getitem
File "/usr/local/lib/python3.10/dist-packages/skimage/io/_io.py", line 53, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "/usr/local/lib/python3.10/dist-packages/skimage/io/manage_plugins.py", line 207, in call_plugin
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/skimage/io/_plugins/imageio_plugin.py", line 15, in imread
return np.asarray(imageio_imread(*args, **kwargs))
File "/usr/local/lib/python3.10/dist-packages/imageio/v2.py", line 226, in imread
with imopen(uri, "ri", **imopen_args) as file:
File "/usr/local/lib/python3.10/dist-packages/imageio/core/imopen.py", line 113, in imopen
request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
File "/usr/local/lib/python3.10/dist-packages/imageio/core/request.py", line 247, in init
self._parse_uri(uri)
File "/usr/local/lib/python3.10/dist-packages/imageio/core/request.py", line 407, in _parse_uri
raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: '/content/drive/MyDrive/Iris-and-Needle-Segmentation-3/train/images/SID0615_jpg.rf.8dd4aeb70ce910df9c8716e3af21b2cd.jpg'
Root Cause (first observed failure):
[0]:
time : 2023-08-01_11:51:44
host : 6198cb800e23
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2600)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
I have a custom dataset in which each picture contains multiple objects. How should I modify the dataloader or model to train on such a dataset?
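A generic starting point (not the repo's dataloader): if the ground truth stores one instance id per object, split it into per-object binary masks and treat each object as its own training sample:

import numpy as np

def split_instances(gt):
    # gt: HxW integer map, 0 = background, k > 0 = instance id.
    # Returns one 0/255 binary mask per object.
    return [(gt == k).astype(np.uint8) * 255 for k in np.unique(gt) if k != 0]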
While running demo_hqsam.py with the sam_hq_vit_l model and vit_l as the model_type, I get the following error:
RuntimeError: Error(s) in loading state_dict for Sam: Unexpected key(s) in state_dict: "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".
This is exciting work. I read your article, and I would like to ask whether your team has tested it on remote sensing imagery, which is a relatively complex task.
I want to recognize only the glasses, but part of the face is also recognized:
https://df5apg8r0m634.cloudfront.net/images/2023/0615/c5c33275b4f08d18108e9f3756ef44f1.png
Is there a function to generate masks fully automatically, like in the example notebook?
Also, do you have any plans to add such a function?
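Upstream segment-anything exposes SamAutomaticMaskGenerator for exactly this; assuming sam-hq keeps that interface (worth verifying), usage would look like:

from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)  # image: HxWx3 RGB uint8 array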
Hello, I want to ask how to achieve the semantic segmentation effect shown in the article's images, such as identifying a person, a cow, or a car?
I am confused by the sentences below, from train/utils/loss_mask.py, line 67:
# It is crucial to calculate uncertainty based on the sampled prediction value for the points.
# Calculating uncertainties of the coarse predictions first and sampling them for points leads
# to incorrect results.
# To illustrate this: assume uncertainty_func(logits)=-abs(logits), a sampled point between
# two coarse predictions with -1 and 1 logits has 0 logits, and therefore 0 uncertainty value.
# However, if we calculate uncertainties for the coarse predictions first,
# both will have -1 uncertainty, and the sampled point will get -1 uncertainty.
Why does a point sampled between two coarse predictions with -1 and 1 logits have 0 logits, and therefore a 0 uncertainty value? Is this caused by the bilinear upsampling?
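A tiny numeric illustration of the comment's point may help (assuming the sampled point lies exactly midway between the two coarse predictions, so bilinear interpolation averages their logits):

import torch

logits = torch.tensor([-1.0, 1.0])
uncertainty = lambda l: -l.abs()  # higher value = more uncertain

sampled_logit = logits.mean()        # bilinear midpoint -> 0.0
print(uncertainty(sampled_logit))    # 0.0: maximum uncertainty (correct)
print(uncertainty(logits).mean())    # -1.0: looks confident (incorrect)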
The link says the checkpoint has been viewed and downloaded too many times and cannot be downloaded right now.
First, thank you for the impressive project!
I am trying to automatically generate masks for photometric stereo input and am encountering some problems with unclean masks.
Do you have any recommendations on how to generate better results (masks without holes, only the mask of the person/face)?
Thank you for your time!
Thanks for the great work. When will you be able to release the training code?
Thanks for creating SAM-HQ.
How can I get an output JSON file?
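One possible approach (an assumption about what you want, not a built-in feature): dump the generated masks to JSON using COCO RLE encoding via pycocotools, which is already in the install instructions. Assuming masks comes from the automatic mask generator:

import json
import numpy as np
from pycocotools import mask as mask_utils

records = []
for m in masks:
    seg = np.asfortranarray(m["segmentation"].astype(np.uint8))
    rle = mask_utils.encode(seg)
    rle["counts"] = rle["counts"].decode("utf-8")  # make it JSON-friendly
    records.append({"rle": rle, "area": int(m["area"])})

with open("masks.json", "w") as f:
    json.dump(records, f)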
Is it possible to run in real time and crop the detected objects? Can you share code for this?
Hello, the ground-truth images in my dataset are all in grayscale mode, and the mask values cannot be greater than 128. Is the masks[b_i] > 128 check in masks_to_boxes and masks_sample_points still effective?
Thanks for the excellent work! I am planning to do some fine-tuning of the SAM model with the HQSEG-44K dataset. However, when I prepared the dataset, I found several issues.
Can you share more dataset preparation details? Thanks!
Hi everyone!
I was comparing the implementation of the original mask_decoder.py and the improved mask_decoder_hq.py, and I found a difference when using the multimask_output flag.
In the original implementation (mask_decoder.py), when using multimask_output, the output was a mask of size 1, 3, 256, 256 and an iou_pred of size 1, 3.
However, with the new implementation (mask_decoder_hq.py), when using multimask_output, the output is a mask of size 1, 1, 256, 256 and an iou_pred of size 1, 1. Furthermore, this mask is the one associated with the maximum iou_pred (that is, the mask with the maximum "detection confidence").
In summary: with multimask_output==True, SAM used to return 3 masks, but SAM-HQ now returns just 1.
Is there any explanation for this implementation?
In the new implementation, masks_sam is then added to masks_hq to obtain the final masks (masks = masks_sam + masks_hq). Couldn't this be done over the 3 obtained masks instead of selecting just the one that maximizes iou_pred?
Thanks in advance and great work @lkeab!
When I use my own picture, the result is bad, even worse than the original SAM. Is this method truly useful? There is a gap between the paper's results and the real world's.
Big applause for such decent work!
Are there any guidelines for using the "hq_token_only" parameter at the inference stage?
Heuristically, the effect of this parameter seems to depend on the details of the image (or of the object to be masked). Even with the example images given in "demo", some images result in better segmentation of the object, while in other cases the segmentation of details is slightly worse.
If your team also took a heuristic approach to this parameter, could you suggest any ideas on when it works more effectively than leaving it as "False"?
Thanks in advance, and sorry if I have missed the explanation in the paper.