syscv / sam-hq
Segment Anything in High Quality [NeurIPS 2023]
Home Page: https://arxiv.org/abs/2306.01567
License: Apache License 2.0
Hello,
Thank you for this excellent work.
Do you have any plans to release the code and weights?
Hey, I am not sure (maybe this is a local issue), but the hqsam_light weights on Hugging Face are not working, while the ones on Google Drive are.
If I add a new image to this project, can I check whether there is a car in the image, for example?
Or is the purpose here to detect whether there are objects of interest in the images?
Hey guys,
I can only download the checkpoints manually; I don't know how to deploy the model in the cloud and download the checkpoints via gdown.
Maybe the checkpoint-sharing approach could be improved.
Thanks
Hi,
The HQ-SAM paper says that SAM-B has 358M parameters, but when I count them with the code below, I get 93.9M.
model_total_params = sum(p.numel() for p in sam.parameters())
Which one is wrong?
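One way to investigate such a discrepancy is to break the count down per submodule; a minimal sketch, assuming sam is a loaded SAM model (it is also worth checking whether the paper's figure refers to model size in megabytes rather than parameter count):

# Minimal sketch: break the total down per top-level submodule
# to see where the parameters live.
for name, module in sam.named_children():
    count = sum(p.numel() for p in module.parameters())
    print(f"{name}: {count / 1e6:.1f}M parameters")
print(f"total: {sum(p.numel() for p in sam.parameters()) / 1e6:.1f}M parameters")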
Traceback (most recent call last):
File "train.py", line 694, in
main(net, train_datasets, valid_datasets, args)
File "train.py", line 327, in main
train_dataloaders, train_datasets = create_dataloaders(train_im_gt_list,
File "/home/quchunguang/datasets/sam-hq/train/utils/dataloader.py", line 71, in create_dataloaders
sampler = DistributedSampler(gos_dataset)
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/utils/data/distributed.py", line 65, in init
num_replicas = dist.get_world_size()
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
return _get_group_size(group)
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
default_pg = _get_default_group()
File "/home/quchunguang/anaconda3/envs/Semantic-SAM/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 410, in _get_default_group
raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
How can I solve this?
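Not an official answer, but this error means the script constructed a DistributedSampler before any process group existed. Two common fixes: launch train.py through a distributed launcher (e.g. python -m torch.distributed.launch --nproc_per_node=1 train.py ...), or, for single-process debugging, initialize a one-process group yourself before the dataloaders are created; a minimal sketch:

# Sketch: initialize a single-process default group so DistributedSampler
# can query the world size (use backend="gloo" on a CPU-only machine).
import os
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="nccl", rank=0, world_size=1)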
When I run demo_hqsam, the following problem arises: _pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified. How can I solve it?
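Not a confirmed diagnosis, but this UnpicklingError often indicates a corrupted or incomplete checkpoint download (for example, an HTML quota page saved in place of the weights). A quick sanity check, with a hypothetical checkpoint path:

# Sketch: verify the checkpoint loads on CPU before building the model.
import torch

ckpt_path = "sam_hq_vit_l.pth"  # hypothetical path; adjust to your file
try:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    print(f"OK: {len(state_dict)} entries")
except Exception as e:
    print(f"checkpoint looks corrupted; re-download it ({e})")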
Hi, have you tested not adding the HQ tokens and instead directly using the SAM tokens with the fused image features?
Sorry, you are currently unable to view or download this file.
Too many users have recently viewed or downloaded this file. Please try to access this file again later. If the file you are trying to access is particularly large or has been shared with many people, you may have to wait up to 24 hours before you can view or download it. If you still cannot access the file after 24 hours, please contact your domain administrator.
I ran the code successfully. Interesting.
Reference: https://github.com/ChaoningZhang/MobileSAM
Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.
MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:
Best Wishes,
Qiao
Traceback (most recent call last):
File "E:\sam-hq\demo\demo_hqsam.py", line 63, in
sam = sam_model_registrymodel_type
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\sam-hq\segment_anything\build_sam.py", line 28, in build_sam_vit_l
return _build_sam(
^^^^^^^^^^^
File "E:\sam-hq\segment_anything\build_sam.py", line 106, in _build_sam
state_dict = torch.load(f)
^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1172, in _load
result = unpickler.load()
^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 1116, in load_tensor
wrap_storage=restore_location(storage, location),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 217, in default_restore_location
result = fn(storage, location)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\torch\serialization.py", line 166, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
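The error message itself contains the fix: on a CPU-only machine, pass map_location to torch.load. The traceback shows the load happening in segment_anything/build_sam.py (line 106), so a minimal local patch there would be:

# In _build_sam, load the checkpoint onto CPU explicitly:
state_dict = torch.load(f, map_location=torch.device("cpu"))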
Great work! I am looking for a script that allows reproducing the IoU and boundary IoU numbers. I looked into the train folder, and there is an evaluation example shown. However, it uses the checkpoint sam_vit_l_0b3195.pth.
The masks predicted from this checkpoint are of extremely poor quality, leading me to believe I should have been using sam_hq_vit_l.pth, shown in the main readme of the repo. However, when I pass sam_hq_vit_l.pth to the checkpoint argument of train.py along with the --eval flag, it fails to load the checkpoint and errors out since the keys do not match.
Please advise how I can reproduce the results.
Hello,
How can I use a prompt for segmentation (like the demo version)?
Thank you
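For reference, a minimal sketch of prompted inference in the segment-anything-style API (assuming sam-hq keeps the upstream SamPredictor interface; the point coordinates and file names here are hypothetical):

import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground point prompt (label 1) at a hypothetical (x, y) location.
masks, scores, logits = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=False,
)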
Hi, I have found that the calculation of IoU is done by computing IoU for each individual image and then taking the average over all the images. However, shouldn't we calculate the intersection and union over all the images first, and then compute IoU?
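To make the two conventions in this question concrete, here is a small sketch contrasting per-image mean IoU with dataset-level IoU (hypothetical helpers; preds and gts are lists of boolean arrays):

import numpy as np

def per_image_miou(preds, gts):
    # Compute IoU per image, then average over images.
    ious = [np.logical_and(p, g).sum() / np.logical_or(p, g).sum()
            for p, g in zip(preds, gts)]
    return float(np.mean(ious))

def dataset_level_iou(preds, gts):
    # Accumulate intersection and union over all images first.
    inter = sum(np.logical_and(p, g).sum() for p, g in zip(preds, gts))
    union = sum(np.logical_or(p, g).sum() for p, g in zip(preds, gts))
    return inter / union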
Thanks for this significant and interesting work.
Open-source work is helpful to developers.
In the paper, the authors write "In particular, we use both global semantic context and local fine-grained features by fusing SAM’s mask decoder features with early and late feature maps from its ViT encoder. During training, we freeze the entire pre-trained SAM parameters, while only updating our HQ-Output Token, its associated three-layer MLPs, and a small feature fusion block." Is there a demo to run the training code, or should we reproduce it ourselves?
Thanks & Regards!
Momo
Hello,
I am trying to run the software on a Mac M1. I changed the device in the demo example (demo_hqsam.py) to "cpu" and "mps", but in both cases I got the error messages below. (The code works for demo_sam.py.)
File "/Users/Projects/SegmentAnything/SAMHQ/samhq/lib/python3.11/site-packages/torch/serialization.py", line 217, in default_restore_location
result = fn(storage, location)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/Projects/SegmentAnything/SAMHQ/samhq/lib/python3.11/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
Can you upload a script to correctly export the entire model to ONNX?
Error(s) in loading state_dict for Sam:
Unexpected key(s) in state_dict: "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".
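A likely cause (hedged, since the issue gives no code): all of these unexpected keys are HQ-SAM additions (hf_token, hf_mlp, compress_vit_feat, ...), which suggests the HQ checkpoint is being loaded into the original SAM model class. Importing segment_anything from this repository rather than the upstream pip package should build the HQ decoder:

# Make sure the sam-hq copy of segment_anything is on the path, not the
# upstream "segment-anything" pip package, then build as usual:
from segment_anything import sam_model_registry

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")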
When I try to train HQ-SAM following the instructions, I get the error "HQ-SAM: error: unrecognized arguments: --local-rank=0
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 2) local_rank: 0 (pid: 1468) of binary: /usr/bin/python3" in Colab. How can I resolve it?
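One plausible explanation (an assumption, not the repo's official fix): newer torch.distributed launchers pass --local-rank with a hyphen, while older training scripts only register --local_rank with an underscore. Registering both spellings in argparse is a common workaround:

# Accept both the old and new launcher spellings; argparse takes the
# destination name from the first long option ("local_rank").
parser.add_argument("--local_rank", "--local-rank", type=int, default=0)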
What factors could cause ONNX and PyTorch to produce different inference results?
Hi everyone,
How do I run this project on a Mac M1 after installing opencv-python pycocotools matplotlib onnxruntime onnx?
And if I add a new image, how can I predict the result for it?
Hello,
I find that the SAM mask output of the HQ-SAM network is different from the original SAM network's output.
Is the reason that the tokens interact with each other?
Thanks
Process finished with exit code 3
How do I fix this issue?
Hi! Great thanks for the work. I tried the Hugging Face demo and found that the text prompt option and the advanced options are really effective for the segmentation task I've been trying to do. However, I didn't find any guide or options for this part in the demo, so is there an alternative way to set these when I'm running my own code? Thanks for the reply!
Any plans to create a PyPI package so we can simply pip install and import the necessary packages across platforms, and use them from APIs and software?
When running the exact same input image and box prompt with the baseline MobileSAM (available in this repository) and the official MobileSAM model, there are differences in the predicted masks. I have tried to locate particular differences in the code causing this inconsistent behavior, but without luck thus far. I believe this is an urgent issue that should be investigated.
Baseline MobileSAM (SAM-HQ repository) vs Official MobileSAM (MobileSam repository):
Thank you very much for this incredible model.
I was looking at your guide for exporting the model to ONNX. I didn't understand why you don't want to export the SAM image encoder to ONNX. I think it is because you are executing the ONNX graph with onnxruntime on CPU.
However, it would be nice to have it for Triton Inference Server with the CUDA backend.
Is there a way to control the input size?
Do we need to enforce scaling to the encoder's expected resolution?
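For what it's worth, a sketch of exporting the image encoder alone via torch.onnx.export, assuming SAM's fixed 1024x1024 input resolution. This is an assumption, not the repo's official export path, and the HQ variant's encoder may also return intermediate embeddings, which would need extra output names:

import torch

dummy = torch.randn(1, 3, 1024, 1024)  # SAM's fixed input resolution
torch.onnx.export(
    sam.image_encoder,
    dummy,
    "image_encoder.onnx",
    input_names=["image"],
    output_names=["embeddings"],
    opset_version=17,
)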
I've used the training command, but every time, after a random number of epochs, I get a FileNotFoundError from the dataloader. Does anyone know the solution?
The error:
epoch: 14 learning rate: 1e-05
[ 0/333] eta: 0:14:51 training_loss: 0.1127 (0.1127) loss_mask: 0.0446 (0.0446) loss_dice: 0.0681 (0.0681) time: 2.6786 data: 0.3379 max mem: 10103
Traceback (most recent call last):
File "/content/drive/MyDrive/sam-hq/train/train.py", line 651, in
main(net, train_datasets, valid_datasets, args)
File "/content/drive/MyDrive/sam-hq/train/train.py", line 360, in main
train(args, net, optimizer, train_dataloaders, valid_dataloaders, lr_scheduler,writer)
File "/content/drive/MyDrive/sam-hq/train/train.py", line 396, in train
for data in metric_logger.log_every(train_dataloaders,1000):
File "/content/drive/MyDrive/sam-hq/train/utils/misc.py", line 237, in log_every
for obj in iterable:
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in next
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataset.py", line 243, in getitem
return self.datasets[dataset_idx][sample_idx]
File "/content/drive/MyDrive/sam-hq/train/utils/dataloader.py", line 244, in getitem
File "/usr/local/lib/python3.10/dist-packages/skimage/io/_io.py", line 53, in imread
img = call_plugin('imread', fname, plugin=plugin, **plugin_args)
File "/usr/local/lib/python3.10/dist-packages/skimage/io/manage_plugins.py", line 207, in call_plugin
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/skimage/io/_plugins/imageio_plugin.py", line 15, in imread
return np.asarray(imageio_imread(*args, **kwargs))
File "/usr/local/lib/python3.10/dist-packages/imageio/v2.py", line 226, in imread
with imopen(uri, "ri", **imopen_args) as file:
File "/usr/local/lib/python3.10/dist-packages/imageio/core/imopen.py", line 113, in imopen
request = Request(uri, io_mode, format_hint=format_hint, extension=extension)
File "/usr/local/lib/python3.10/dist-packages/imageio/core/request.py", line 247, in init
self._parse_uri(uri)
File "/usr/local/lib/python3.10/dist-packages/imageio/core/request.py", line 407, in _parse_uri
raise FileNotFoundError("No such file: '%s'" % fn)
FileNotFoundError: No such file: '/content/drive/MyDrive/Iris-and-Needle-Segmentation-3/train/images/SID0615_jpg.rf.8dd4aeb70ce910df9c8716e3af21b2cd.jpg'
Root Cause (first observed failure):
[0]:
time : 2023-08-01_11:51:44
host : 6198cb800e23
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 2600)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
I have a custom dataset in which each picture contains multiple objects. How should I modify the dataloader or model to train on such a dataset?
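A generic starting point (not the repo's dataloader): if the ground truth stores one instance id per object, split it into per-object binary masks and treat each object as its own training sample:

import numpy as np

def split_instances(gt):
    # gt: HxW integer map, 0 = background, k > 0 = instance id.
    # Returns one 0/255 binary mask per object.
    return [(gt == k).astype(np.uint8) * 255 for k in np.unique(gt) if k != 0]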
While running demo_hqsam.py with the sam_hq_vit_l model and vit_l as the model_type, I get the following error:
RuntimeError: Error(s) in loading state_dict for Sam: Unexpected key(s) in state_dict: "mask_decoder.hf_token.weight", "mask_decoder.hf_mlp.layers.0.weight", "mask_decoder.hf_mlp.layers.0.bias", "mask_decoder.hf_mlp.layers.1.weight", "mask_decoder.hf_mlp.layers.1.bias", "mask_decoder.hf_mlp.layers.2.weight", "mask_decoder.hf_mlp.layers.2.bias", "mask_decoder.compress_vit_feat.0.weight", "mask_decoder.compress_vit_feat.0.bias", "mask_decoder.compress_vit_feat.1.weight", "mask_decoder.compress_vit_feat.1.bias", "mask_decoder.compress_vit_feat.3.weight", "mask_decoder.compress_vit_feat.3.bias", "mask_decoder.embedding_encoder.0.weight", "mask_decoder.embedding_encoder.0.bias", "mask_decoder.embedding_encoder.1.weight", "mask_decoder.embedding_encoder.1.bias", "mask_decoder.embedding_encoder.3.weight", "mask_decoder.embedding_encoder.3.bias", "mask_decoder.embedding_maskfeature.0.weight", "mask_decoder.embedding_maskfeature.0.bias", "mask_decoder.embedding_maskfeature.1.weight", "mask_decoder.embedding_maskfeature.1.bias", "mask_decoder.embedding_maskfeature.3.weight", "mask_decoder.embedding_maskfeature.3.bias".
This is exciting work. I read your article, and I would like to ask whether your team has tested it on remote sensing imagery, which is a relatively complex task.
I want to recognize only the glasses, but part of the face is also recognized:
https://df5apg8r0m634.cloudfront.net/images/2023/0615/c5c33275b4f08d18108e9f3756ef44f1.png
Is there a function to generate masks fully automatically, like in the example notebook?
Also, do you have any plans to add such a function?
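Upstream segment-anything exposes SamAutomaticMaskGenerator for exactly this; assuming sam-hq keeps that interface (worth verifying), usage would look like:

from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_l"](checkpoint="sam_hq_vit_l.pth")
mask_generator = SamAutomaticMaskGenerator(sam)
masks = mask_generator.generate(image)  # image: HxWx3 RGB uint8 array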
Hello, I want to ask how to achieve the semantic segmentation effect shown in the article's images, such as identifying a person, a cow, or a car?
I am confused by the sentences below, from train/utils/loss_mask.py, line 67:
# It is crucial to calculate uncertainty based on the sampled prediction value for the points.
# Calculating uncertainties of the coarse predictions first and sampling them for points leads
# to incorrect results.
# To illustrate this: assume uncertainty_func(logits)=-abs(logits), a sampled point between
# two coarse predictions with -1 and 1 logits has 0 logits, and therefore 0 uncertainty value.
# However, if we calculate uncertainties for the coarse predictions first,
# both will have -1 uncertainty, and the sampled point will get -1 uncertainty.
Why does a point sampled between two coarse predictions with -1 and 1 logits have 0 logits, and therefore a 0 uncertainty value? Is this caused by the bilinear upsampling?
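A tiny numeric illustration of the comment's point may help (assuming the sampled point lies exactly midway between the two coarse predictions, so bilinear interpolation averages their logits):

import torch

logits = torch.tensor([-1.0, 1.0])
uncertainty = lambda l: -l.abs()  # higher value = more uncertain

sampled_logit = logits.mean()        # bilinear midpoint -> 0.0
print(uncertainty(sampled_logit))    # 0.0: maximum uncertainty (correct)
print(uncertainty(logits).mean())    # -1.0: looks confident (incorrect)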
The link says the checkpoint has been viewed and downloaded too many times and cannot be downloaded right now.
First, thank you for the impressive project!
I am trying to automatically generate masks for photometric stereo input and am encountering some problems with unclean masks.
Do you have any recommendations on how to generate better results (masks without holes, only the mask of the person/face)?
Thank you for your time!
Thanks for the great work. When will you be able to release the training code?
Thanks for creating SAM-HQ.
How can I get an output JSON file?
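One possible approach (an assumption about what you want, not a built-in feature): dump the generated masks to JSON using COCO RLE encoding via pycocotools, which is already in the install instructions. Assuming masks comes from the automatic mask generator:

import json
import numpy as np
from pycocotools import mask as mask_utils

records = []
for m in masks:
    seg = np.asfortranarray(m["segmentation"].astype(np.uint8))
    rle = mask_utils.encode(seg)
    rle["counts"] = rle["counts"].decode("utf-8")  # make it JSON-friendly
    records.append({"rle": rle, "area": int(m["area"])})

with open("masks.json", "w") as f:
    json.dump(records, f)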
Is it possible to run in real time and crop the detected objects? Can you share code for this?
Hello, the ground-truth images in my dataset are all in grayscale mode, and the mask values cannot be greater than 128. Is the masks[b_i] > 128 check in masks_to_boxes and masks_sample_points still effective?
Thanks for the excellent work! I am planning to do some fine-tuning of the SAM model with the HQSEG-44K dataset. However, when I prepared the dataset, I found several issues.
Can you share more dataset preparation details? Thanks!
Hi everyone!
I was comparing the implementation of the original mask_decoder.py and the improved mask_decoder_hq.py, and I found a difference when using the multimask_output flag.
In the original implementation (mask_decoder.py), when using multimask_output, the output was a mask of size 1, 3, 256, 256 and an iou_pred of size 1, 3.
However, with the new implementation (mask_decoder_hq.py), when using multimask_output, the output is a mask of size 1, 1, 256, 256 and an iou_pred of size 1, 1. Furthermore, this mask is the one associated with the maximum iou_pred (that is, the mask with the maximum "detection confidence").
In summary: with multimask_output==True, SAM used to return 3 masks, but SAM-HQ now returns just 1.
Is there any explanation for this implementation?
In the new implementation, masks_sam is then added to masks_hq to obtain the final masks (masks = masks_sam + masks_hq). Couldn't this be done over the 3 obtained masks instead of selecting just the one that maximizes iou_pred?
Thanks in advance and great work @lkeab!
When I use my own picture, the result is bad, even worse than the original SAM. Is this method truly useful? There is a gap between the paper's results and the real world's.
Big applause for such decent work!
Are there any guidelines for using the "hq_token_only" parameter at the inference stage?
Heuristically, the effect of this parameter seems to depend on the details of the image (or of the object to be masked). Even with the example images given in "demo", some images result in better segmentation of the object, while in other cases the segmentation of details is slightly worse.
If your team also took a heuristic approach to this parameter, could you suggest any ideas on when it works more effectively than leaving it as "False"?
Thanks in advance, and sorry if I have missed the explanation in the paper.