cnos's People

Contributors

nv-nguyen


cnos's Issues

Rendering templates

Could you please provide the output of the template rendering again? The Google Drive link you provided is no longer valid. Thank you.

Poor performance on an object with two colors

First of all, thank you for sharing such an outstanding piece of work. I have an issue here: I have a pair of pliers with two distinct colors on the upper and lower parts, and CNOS only segments the yellow section on top. Do you have any suggestions to address this problem? I believe this might be an inherent limitation of the SAM model.

[image: pliers with two-colored parts; only the yellow upper section is segmented]

release pre-computed Linemod and YCBV results

Is there any plan to release the pre-computed Linemod and YCBV segmentation results? People interested in evaluating benchmarks might want to use your segmentations directly without needing to set up the code and re-run it themselves.

CAD-model-free results

Hi,

Thank you for sharing the excellent baseline for unseen object segmentation.

I wonder whether you have plans to release the CAD-free novel object segmentation results on the BOP datasets, which are mentioned in the Discussion section?

Thank you

Multi-object segmentation support

Thanks for the great work! I would now like to segment multiple objects in a single image. Is there any reference code for this?

After reading the code in detail, it seems this line may need to be changed so that ref_feats has a multi-object shape, but I am not confident about how to solve the problem:

scores=metric(decriptors[:,None,:],self.ref_feats[None,:,:])

Ultimately, I hope to obtain an all_masks numpy array that includes the masks for all objects.
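If it helps, here is a minimal sketch of how per-object scoring could look. This is an assumption-laden illustration: the names descriptors and ref_feats come from the issue, and the top-5 aggregation only mirrors the "aggavg_5" naming of the result files, not the actual code.

import torch
import torch.nn.functional as F

# Hypothetical sketch: score every proposal descriptor against per-object
# reference features and keep the best object per proposal.
def assign_objects(descriptors, ref_feats, top_k=5):
    # descriptors: (N_proposals, D), ref_feats: (N_objects, N_templates, D)
    descriptors = F.normalize(descriptors, dim=-1)
    ref_feats = F.normalize(ref_feats, dim=-1)
    sim = torch.einsum("nd,otd->not", descriptors, ref_feats)   # (N, O, T)
    topk = sim.topk(k=min(top_k, sim.shape[-1]), dim=-1).values
    scores = topk.mean(dim=-1)                                   # (N, O)
    best_scores, object_ids = scores.max(dim=-1)                 # per proposal
    return object_ids, best_scores

With the per-proposal object_ids, the corresponding binary masks could then be stacked into a single all_masks array, one mask per detection.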

GPU misconfiguration

Hi, thanks for your great work. When running run_inference.py, I got the following error about wandb.

wandb: WARNING `resume` will be ignored since W&B syncing is set to `offline`. Starting a new run with run id cizd5iz4.
wandb: Tracking run with wandb version 0.15.5
wandb: W&B syncing is set to `offline` in this directory.  
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
[2023-07-22 19:31:46,008][pytorch_lightning.utilities.rank_zero][INFO] - ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
Error executing job with overrides: ['dataset_name=', 'model.onboarding_config.rendering_type=pyrender']
Error in call to target 'pytorch_lightning.trainer.trainer.Trainer':
MisconfigurationException('You requested gpu: [0, 1, 2, 3]\n But your machine only has: [0]')
full_key: machine.trainer

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
wandb: Waiting for W&B process to finish... (failed 1).
wandb: You can sync this run to the cloud by running:
wandb: wandb sync ./datasets/bop23_challenge/results/cnos_exps/wandb/offline-run-20230722_193143-cizd5iz4

Do I have to use a wandb account to restore things?
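For context, the actual failure in the log is the GPU mismatch: the machine config requests GPUs [0, 1, 2, 3] while only one is available. Below is a hypothetical illustration of the behaviour (not the repo's own config handling) showing how clamping the request to the visible devices avoids the pytorch_lightning exception.

import torch
from pytorch_lightning import Trainer

# Hypothetical sketch: only request GPUs that actually exist on the machine,
# otherwise pytorch_lightning raises the MisconfigurationException above.
requested = [0, 1, 2, 3]
available = list(range(torch.cuda.device_count()))
usable = [d for d in requested if d in available]

trainer = Trainer(accelerator="gpu", devices=usable) if usable else Trainer(accelerator="cpu")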

Installation of ultralytics

Hi, I was installing CNOS and found an issue with the installation: the newest version of ultralytics doesn't work with CNOS, as ultralytics.yolo has been deprecated since version '8.0.136' and was later removed. Adding the '<=8.0.135' requirement solves the issue; it should be added to the README.

Training Details

Hi, thanks for your nice work!
I have some questions that I hope you can help me with:

  1. During training of the proposal stage, is SAM (or FastSAM) used in promptable mode or not? If promptable, which type of prompt do you use (point or bbox), and how do you obtain it? If not promptable, how do you deal with the over-segmentation problem?
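For reference, the result-file name mentioned in the next issue (CustomSamAutomaticMaskGenerator_...) suggests SAM runs in its prompt-free, automatic mode. Here is a minimal sketch of that mode using the segment_anything package; the checkpoint path and image are placeholders.

import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Minimal sketch of prompt-free proposal generation with SAM's automatic
# mask generator; the checkpoint path is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder HxWx3 RGB image
masks = mask_generator.generate(image)
# Each entry holds a binary "segmentation" mask plus bbox / predicted_iou metadata.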

Segmentation evaluation using bop_toolkit

Thanks for the elegant code!
After I get the segmentation results using the command python run_inference.py dataset_name=$DATASET_NAME model.onboarding_config.rendering_type=pyrender, I get a JSON file named CustomSamAutomaticMaskGenerator_template_pyrender0_aggavg_5_lmo.json.

When I try to evaluate it with bop_toolkit using scripts/eval_bop22_coco.py, an error occurs due to inappropriate name splitting.
I believe the cause is this:

  result_name = os.path.splitext(os.path.basename(result_filename))[0]
  result_info = result_name.split('_')
  method = str(result_info[0])
  dataset_info = result_info[1].split('-')
  dataset = str(dataset_info[0])
  split = str(dataset_info[1])
  split_type = str(dataset_info[2]) if len(dataset_info) > 2 else None

Can you check this and tell me how to fix it? Thanks!
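As an illustration (not an official fix): the snippet above expects result names of the form <method>_<dataset>-<split>[-<split_type>], while the CNOS output name contains extra underscores and no dataset-split token, so dataset_info[1] fails. A hypothetical workaround is to copy the file to a BOP-style name before evaluation; the names below are placeholders.

import shutil

# Hypothetical workaround: rename the CNOS output so bop_toolkit's
# "<method>_<dataset>-<split>" parsing succeeds. Names are placeholders.
src = "CustomSamAutomaticMaskGenerator_template_pyrender0_aggavg_5_lmo.json"
dst = "cnos-pyrender_lmo-test.json"  # method "cnos-pyrender", dataset "lmo", split "test"
shutil.copyfile(src, dst)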

size mismatch for pos_embed

Hello, I have a question about using DINOv2. Could you please help me? I instantiated a vit_small ViT model and tried to load the pretrained weights using the load_pretrained_weights function from utils. Here's the code I wrote:

self.vit_model = vits.__dict__['vit_small']()
load_pretrained_weights(self.vit_model, 'https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth', None)
However, I encountered the following error:
Traceback (most recent call last):
File "/data/PycharmProjects/train.py", line 124, in
model = model(aff_classes=args.num_classes)
File "/data/PycharmProjects/models/locate.py", line 89, in init
load_pretrained_weights(self.vit_model, pretrained_url, None)
File "/data/PycharmProjects/models/dinov2/dinov2/utils/utils.py", line 32, in load_pretrained_weights
msg = model.load_state_dict(state_dict, strict=False)
File "/home/ustc/anaconda3/envs/locate/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1605, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DinoVisionTransformer:
size mismatch for pos_embed: copying a param with shape torch.Size([1, 1370, 384]) from checkpoint, the shape in current model is torch.Size([1, 257, 384]).

Could you please help me understand what might be causing this issue? Thank you for your assistance.
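For what it's worth, the shapes in the error have a simple explanation: the ViT-S/14 checkpoint stores 37*37 + 1 = 1370 positional embeddings (img_size=518, patch_size=14), while a model built for 224x224 inputs only has 16*16 + 1 = 257. Below is a sketch of building the model with matching arguments, assuming the DINOv2 repo layout from the traceback; argument names are taken from the DINOv2 codebase and may differ in other versions.

# Assumes the DINOv2 repo layout used in the traceback above.
from dinov2.models import vision_transformer as vits
from dinov2.utils.utils import load_pretrained_weights

vit_model = vits.vit_small(img_size=518, patch_size=14)
load_pretrained_weights(
    vit_model,
    "https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth",
    checkpoint_key=None,
)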

precomputed results on YCBV

[image: per-pixel object-ID visualization, scene 000048, image 001087]
This is the result (scene 000048, image 001087) that I parsed from the released precomputed results sam_pbr_ycbv.json; each pixel shows the object ID.
As you can see, the "chef can" is under-segmented. There are also many false positives (detections where the actual objects are not present). Is this the expected result, or is there anything I'm missing when parsing the JSON file?
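For comparison, this is roughly how such a per-pixel object-ID image could be built from a BOP-style detection file. The field names follow the BOP COCO results convention and the image size is assumed, so adjust if the released file differs.

import json
import numpy as np
from pycocotools import mask as mask_utils

# Hypothetical parsing sketch; field names follow the BOP COCO results format.
with open("sam_pbr_ycbv.json") as f:
    detections = json.load(f)

scene_id, image_id = 48, 1087
height, width = 480, 640  # assumed YCB-V resolution
id_map = np.zeros((height, width), dtype=np.uint8)
for det in detections:
    if det["scene_id"] == scene_id and det["image_id"] == image_id:
        rle = det["segmentation"]
        if isinstance(rle["counts"], str):
            rle["counts"] = rle["counts"].encode("utf-8")
        m = mask_utils.decode(rle).astype(bool)
        id_map[m] = det["category_id"]  # each pixel shows the object ID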

Using FastSAM on custom dataset inference

I would like to know how to perform inference on a custom dataset using FastSAM instead of the standard SAM.
When using the script inference_custom.py, it does not seem possible to change the model, as the standard SAM is predetermined.

If somebody could lend me a hand, I would really appreciate it. :)

Thanks in advance.

Running custom inference on multiple GPUs

Hello, thanks for the great work. While testing the repo on my custom CAD model, I ran into memory issues, and the memory requirement seems to be huge. I tried setting CUDA_VISIBLE_DEVICES=0,1 before the bash command, but it still uses only one GPU. Let me know what I might be doing wrong, or a solution to this problem.

Limited performance on Custom Datasets

I tried this code and it works beautifully on the provided datasets. Considering the shots are cluttered with all kinds of objects, there are occlusions, etc., I am extremely impressed by the performance.
However, as soon as I move to a custom dataset, this performance is not repeatable at all. I am trying it on surgical tools, for which I have an accurate CAD model. The images I tried are close-ups of a single object on a white background with no occlusion; in other words, as simple as it gets and technically a perfect template match. However, the model either predicts only a tiny part of the object or something completely wrong, such as the entire background (everything but the object).
Could you comment on the types of objects this works well on (the objects in your datasets seem a bit more bulky, while the surgical tools are more skinny), or whether there are any tricks to improve this performance?

Problems with detecting multi-color objects (?)

Hello once again! I return with another surgical tool that I am unable to segment. The original image and segmentation are shown below:
[image: original image and segmentation; nothing is detected]

As you can see, absolutely nothing is detected. In order to make sure that it is not a problem with my CAD model, I masked out the entire tool to make it a single color. See below:
[image: tool recolored to a single color; now detected]

With the same templates, it is suddenly detected. This brings me to the conclusion that it has something to do with the SAM segmentation. I ran the tool through SAM Segment Everything and found this:
[image: SAM Segment Everything output showing fragmented masks]

I believe the problem is that the tool has multiple colors and is therefore segmented in a fragmented way instead of as one whole tool. Is there some way in this code to influence how masks are connected and then check the merged masks for a match?
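Not something CNOS does out of the box, but as a rough idea: the fragmented SAM masks could be fused into connected regions and each fused region scored against the templates as an additional candidate. A minimal sketch:

import numpy as np
from scipy import ndimage

# Hypothetical sketch: fuse fragmented SAM proposals into connected regions and
# treat each region as an extra candidate mask to score against the templates.
# `masks` is a list of boolean HxW arrays produced by SAM.
def fuse_fragments(masks):
    union = np.zeros_like(masks[0], dtype=bool)
    for m in masks:
        union |= m
    labels, num = ndimage.label(union)  # connected components of the union
    return [labels == i for i in range(1, num + 1)]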

DINOv2 for image feature extraction

Hello, I have a question. I didn't find any reference to src/model/dinov2.py. In which file do you import the DINOv2 model? Have you provided the training file?
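Independently of where the repo wires it in, the DINOv2 backbone itself can be loaded via torch.hub for feature extraction. A minimal sketch; the specific variant (ViT-L/14) and the dummy input are assumptions:

import torch

# Minimal sketch: load a DINOv2 backbone from torch.hub and extract a global
# image descriptor. The ViT-L/14 variant is an assumption here.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")
model.eval()

image = torch.randn(1, 3, 224, 224)  # placeholder; use a normalized RGB crop
with torch.no_grad():
    descriptor = model(image)  # (1, 1024) CLS-token feature for ViT-L/14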

Any standard for scaling CAD models?

I have some custom CAD models whose dimensions are originally on the order of 0.1 m. However, they don't appear at all in the rendered images, so I scaled the mesh by 1000x using Blender and then rendered it. My rendered images are attached below: some regions are getting cut off. (This is mesh_001.ply in this link.) Is there some standard for scaling the meshes so that I get correct renders?

[image: rendered templates with regions cut off]

In turn, I'm also not getting any segmented results on the test RGB image below:

[image: test RGB image with no segmentation results]
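One possible explanation, offered as an assumption: BOP-style CAD models are stored in millimetres, so a mesh authored in metres (extent around 0.1 m) is roughly 1000x too small for a renderer expecting those units. A quick way to check and rescale with trimesh:

import trimesh

# Hypothetical check-and-rescale sketch; the factor assumes the mesh is in
# metres and the renderer expects millimetres (BOP convention).
mesh = trimesh.load("mesh_001.ply")
print("extents before scaling:", mesh.extents)
mesh.apply_scale(1000.0)  # metres -> millimetres
mesh.export("mesh_001_mm.ply")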
