
cutler's Introduction

Cut and Learn for Unsupervised Image & Video Object Detection and Instance Segmentation

Cut-and-LEaRn (CutLER) is a simple approach for training object detection and instance segmentation models without human annotations. It outperforms the previous SOTA by 2.7 times for AP50 and 2.6 times for AR across 11 benchmarks.

Cut and Learn for Unsupervised Object Detection and Instance Segmentation
Xudong Wang, Rohit Girdhar, Stella X. Yu, Ishan Misra
FAIR, Meta AI; UC Berkeley
CVPR 2023

[project page] [arxiv] [colab] [bibtex]

Unsupervised video instance segmentation (VideoCutLER) is also supported. We demonstrate that video instance segmentation models can be learned without using any human annotations, without relying on natural videos (ImageNet data alone is sufficient), and even without motion estimations! The code is available here.

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
Xudong Wang, Ishan Misra, Ziyun Zeng, Rohit Girdhar, Trevor Darrell
UC Berkeley; FAIR, Meta AI
CVPR 2024

[code] [PDF] [arxiv] [bibtex]

Features

  • We propose the MaskCut approach to generate pseudo-masks for multiple objects in an image.
  • CutLER can learn unsupervised object detectors and instance segmentors solely on ImageNet-1K.
  • CutLER exhibits strong robustness to domain shifts when evaluated on 11 different benchmarks across domains like natural images, video frames, paintings, sketches, etc.
  • CutLER can serve as a pretrained model for fully/semi-supervised detection and segmentation tasks.
  • We also propose VideoCutLER, a surprisingly simple unsupervised video instance segmentation (UVIS) method that does not rely on optical flow. ImageNet-1K is all we need for training a SOTA UVIS model!

Installation

See installation instructions.

Dataset Preparation

See Preparing Datasets for CutLER.

Method Overview

Cut-and-Learn has two stages: 1) generating pseudo-masks with MaskCut and 2) learning unsupervised detectors from pseudo-masks of unlabeled data.

1. MaskCut

MaskCut provides segmentation masks for multiple instances in each image.

MaskCut Demo

Try out the MaskCut demo using Colab (no GPU needed): Open In Colab

Try out the web demo: Hugging Face Spaces (thanks to @hysts!)

If you want to run MaskCut locally, we provide demo.py, which visualizes the pseudo-masks produced by MaskCut. Run it with:

cd maskcut
python demo.py --img-path imgs/demo2.jpg \
  --N 3 --tau 0.15 --vit-arch base --patch-size 8 \
  [--other-options]

We provide a few demo images in maskcut/imgs/. To run demo.py on CPU, simply add "--cpu" to the command. For imgs/demo4.jpg, you need to use "--N 6" to segment all six instances in the image. Below, we show some visualizations of the pseudo-masks on the demo images.
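
For instance, a minimal CPU-only invocation for imgs/demo4.jpg, reusing the flags shown above, would be:

python demo.py --img-path imgs/demo4.jpg \
  --N 6 --tau 0.15 --vit-arch base --patch-size 8 --cpu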

Generating Annotations for ImageNet-1K with MaskCut

To generate pseudo-masks for ImageNet-1K using MaskCut, first set up the ImageNet-1K dataset according to the instructions in datasets/README.md, then execute the following command:

cd maskcut
python maskcut.py \
--vit-arch base --patch-size 8 \
--tau 0.15 --fixed_size 480 --N 3 \
--num-folder-per-job 1000 --job-index 0 \
--dataset-path /path/to/dataset/traindir \
--out-dir /path/to/save/annotations

As generating pseudo-masks for all 1.3 million images across 1,000 folders takes a significant amount of time, it is recommended to split the work across multiple runs. Each run processes a smaller number of image folders, controlled by "--num-folder-per-job" and "--job-index". Once all runs are completed, you can merge the resulting json files by using the following command:

python merge_jsons.py \
--base-dir /path/to/save/annotations \
--num-folder-per-job 2 --fixed-size 480 \
--tau 0.15 --N 3 \
--save-path imagenet_train_fixsize480_tau0.15_N3.json

The "--num-folder-per-job", "--fixed-size", "--tau" and "--N" of merge_jsons.py should match the ones used to run maskcut.py.

We also provide a submitit script to launch the pseudo-mask generation process with multiple nodes.

cd maskcut
bash run_maskcut_with_submitit.sh

After that, you can use "merge_jsons.py" to merge all these json files as described above.

2. CutLER

Inference Demo for CutLER with Pre-trained Models

Try out the CutLER demo using Colab (no GPU needed): Open In Colab

Try out the web demo: Hugging Face Spaces (thanks to @hysts!)

Try out Replicate demo and the API: Replicate

If you want to run CutLER demos locally,

  1. Pick a model and its config file from the model zoo, for example, model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml.
  2. We provide demo.py, which can run the builtin configs. Run it with:
cd cutler
python demo/demo.py --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_demo.yaml \
  --input demo/imgs/*.jpg \
  [--other-options]
  --opts MODEL.WEIGHTS /path/to/cutler_w_cascade_checkpoint

The configs are made for training; therefore, we need to specify MODEL.WEIGHTS with a model from the model zoo for evaluation. This command will run the inference and show visualizations in an OpenCV window.

  • To run on cpu, add MODEL.DEVICE cpu after --opts.
  • To save outputs to a directory (for images) or a file (for webcam or video), use --output; a combined example is shown below.
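
For example, a minimal sketch that runs the demo on CPU and writes the visualizations to a directory (config and checkpoint paths as above; the output path is a placeholder):

python demo/demo.py --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_demo.yaml \
  --input demo/imgs/*.jpg \
  --output /path/to/output_dir \
  --opts MODEL.WEIGHTS /path/to/cutler_w_cascade_checkpoint MODEL.DEVICE cpu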

Below, we show some visualizations of the model predictions on the demo images.

Unsupervised Model Learning

Before training the detector, it is necessary to use MaskCut to generate pseudo-masks for all ImageNet data. You can either use the pre-generated json file directly by downloading it from here and placing it under "DETECTRON2_DATASETS/imagenet/annotations/", or generate your own pseudo-masks by following the instructions in MaskCut.
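
For example, assuming the pre-generated file was downloaded to /path/to/downloads (the file name here is assumed to match the one referenced in the self-training commands below), placing it in the expected location is simply:

mkdir -p $DETECTRON2_DATASETS/imagenet/annotations/
cp /path/to/downloads/imagenet_train_fixsize480_tau0.15_N3.json $DETECTRON2_DATASETS/imagenet/annotations/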

We provide a script, train_net.py, that can train all the configs provided in CutLER. To train a model with train_net.py, first set up the ImageNet-1K dataset following datasets/README.md, then run:

cd cutler
export DETECTRON2_DATASETS=/path/to/DETECTRON2_DATASETS/
python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml

If you want to train a model using multiple nodes, you may need to adjust some model parameters and some SBATCH command options in "tools/train-1node.sh" and "tools/single-node_run.sh", then run:

cd cutler
sbatch tools/train-1node.sh \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
  MODEL.WEIGHTS /path/to/dino/d2format/model \
  OUTPUT_DIR output/

You can also convert a pre-trained DINO model to detectron2's format by yourself following this link.

Self-training

We further improve performance by self-training the model on its predictions.

First, get model predictions on ImageNet by running:

python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
  --test-dataset imagenet_train \
  --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
  MODEL.WEIGHTS output/model_final.pth \
  OUTPUT_DIR output/

Here MODEL.WEIGHTS loads the previous stage/round checkpoint, and OUTPUT_DIR is the directory where the model predictions are saved.

Second, run the following command to generate the json file for the first round of self-training:

python tools/get_self_training_ann.py \
  --new-pred output/inference/coco_instances_results.json \
  --prev-ann DETECTRON2_DATASETS/imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json \
  --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
  --threshold 0.7

Here --new-pred loads the model predictions, --prev-ann is the path to the old annotation file, and --save-path is the path where the new annotation file will be saved.

Finally, place "cutler_imagenet1k_train_r1.json" under "DETECTRON2_DATASETS/imagenet/annotations/", then launch the self-training process:

python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml \
  --train-dataset imagenet_train_r1 \
  MODEL.WEIGHTS output/model_final.pth \
  OUTPUT_DIR output/self-train-r1/

Here MODEL.WEIGHTS loads the previous stage/round checkpoint, and OUTPUT_DIR is the directory where the new checkpoints are saved.

You can repeat the steps above to perform multiple rounds of self-training, adjusting the arguments as needed (e.g., "--threshold" for rounds 1 and 2 can be set to 0.7 and 0.65, respectively; "--train-dataset" for rounds 1 and 2 can be set to "imagenet_train_r1" and "imagenet_train_r2", respectively; MODEL.WEIGHTS for rounds 1 and 2 should point to the previous stage/round checkpoints). Ensure that all annotation files are placed under DETECTRON2_DATASETS/imagenet/annotations/, and that "--train-dataset" and the json file names and locations match the ones specified in "cutler/data/datasets/builtin.py". A sketch of a second round is given below. Please refer to this instruction for guidance on using custom datasets.
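
For reference, a minimal sketch of round 2 is shown here. It assumes the prediction step above has been re-run with MODEL.WEIGHTS pointing to output/self-train-r1/model_final.pth and OUTPUT_DIR set to output/self-train-r1/; the choice of --prev-ann below (the round-1 annotation file) is also an assumption, so check tools/get_self_training_ann.py for what it expects:

python tools/get_self_training_ann.py \
  --new-pred output/self-train-r1/inference/coco_instances_results.json \
  --prev-ann DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
  --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r2.json \
  --threshold 0.65

python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml \
  --train-dataset imagenet_train_r2 \
  MODEL.WEIGHTS output/self-train-r1/model_final.pth \
  OUTPUT_DIR output/self-train-r2/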

You can also directly download the MODEL.WEIGHTS and annotations used for each round of self-training:

  • round 1: cutler_cascade_r1.pth, cutler_imagenet1k_train_r1.json
  • round 2: cutler_cascade_r2.pth, cutler_imagenet1k_train_r2.json

Unsupervised Zero-shot Evaluation

To evaluate a model's performance on 11 different datasets, please refer to datasets/README.md for instructions on preparing the datasets. Next, select a model from the model zoo, specify the "model_weights", "config_file" and the path to "DETECTRON2_DATASETS" in tools/eval.sh, then run the script.

bash tools/eval.sh
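
For reference, the values you need to set in tools/eval.sh look roughly like the following sketch (the exact variable names are whatever the script defines; the ones here simply mirror the description above, and the checkpoint path is a placeholder):

export DETECTRON2_DATASETS=/path/to/DETECTRON2_DATASETS/
model_weights=/path/to/cutler_cascade_final.pth
config_file=model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml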

Model Zoo

We show zero-shot unsupervised object detection performance (AP50 / AR) on 11 different datasets spanning a variety of domains. ^: CutLER using Mask R-CNN as the detector; *: CutLER using Cascade Mask R-CNN as the detector.

| Methods | Models | COCO | COCO20K | VOC | LVIS | UVO | Clipart | Comic | Watercolor | KITTI | Objects365 | OpenImages |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Prev. SOTA | - | 9.6 / 12.6 | 9.7 / 12.6 | 15.9 / 21.3 | 3.8 / 6.4 | 10.0 / 14.2 | 7.9 / 15.1 | 9.9 / 16.3 | 6.7 / 16.2 | 7.7 / 7.1 | 8.1 / 10.2 | 9.9 / 14.9 |
| CutLER^ | download | 21.1 / 29.6 | 21.6 / 30.0 | 36.6 / 41.0 | 7.7 / 18.7 | 29.8 / 38.4 | 20.9 / 38.5 | 31.2 / 37.1 | 37.3 / 39.9 | 15.3 / 25.4 | 19.5 / 30.0 | 17.1 / 26.4 |
| CutLER* | download | 21.9 / 32.7 | 22.4 / 33.1 | 36.9 / 44.3 | 8.4 / 21.8 | 31.7 / 42.8 | 21.1 / 41.3 | 30.4 / 38.6 | 37.5 / 44.6 | 18.4 / 27.5 | 21.6 / 34.2 | 17.3 / 29.6 |

Semi-supervised and Fully-supervised Learning

CutLER can also serve as a pretrained model for training fully supervised object detection and instance segmentation models and improves performance on COCO, including on few-shot benchmarks.

Training & Evaluation in Command Line

You can find all the semi-supervised and fully-supervised learning configs provided in CutLER under model_zoo/configs/COCO-Semisupervised.

To train a model using K% labels with train_net.py, first set up the COCO dataset according to datasets/README.md and specify K value in the config file, then run:

python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
  MODEL.WEIGHTS /path/to/cutler_pretrained_model

You can find all config files used to train supervised models under model_zoo/configs/COCO-Semisupervised. The configs are made for 8-GPU training. To train on 1 GPU, you may need to change some parameters, e.g., the number of GPUs (--num-gpus), the learning rate (SOLVER.BASE_LR) and the batch size (SOLVER.IMS_PER_BATCH); a sketch is shown below.
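
For example, a single-GPU run that scales the defaults linearly might look like the following sketch (this assumes the 8-GPU config uses a batch size of 16 and a base LR of 0.01; check the values in the config you picked and adjust accordingly):

python train_net.py --num-gpus 1 \
  --config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
  MODEL.WEIGHTS /path/to/cutler_pretrained_model \
  SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.00125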

Evaluation

To evaluate a model's performance, use

python train_net.py \
  --config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_{K}perc.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file

For more options, see python train_net.py -h.

Model Zoo

We fine-tune a Cascade R-CNN model initialized with CutLER or MoCo-v2 on varying amounts of labeled COCO data, and show results (Box / Mask AP) on the val2017 split below:

| % of labels | 1% | 2% | 5% | 10% | 20% | 30% | 40% | 50% | 60% | 80% | 100% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MoCo-v2 | 11.8 / 10.0 | 16.2 / 13.8 | 20.5 / 17.8 | 26.5 / 23.0 | 32.5 / 28.2 | 35.5 / 30.8 | 37.3 / 32.3 | 38.7 / 33.6 | 39.9 / 34.6 | 41.6 / 36.0 | 42.8 / 37.0 |
| CutLER | 16.8 / 14.6 | 21.6 / 18.9 | 27.8 / 24.3 | 32.2 / 28.1 | 36.6 / 31.7 | 38.2 / 33.3 | 39.9 / 34.7 | 41.5 / 35.9 | 42.3 / 36.7 | 43.8 / 37.9 | 44.7 / 38.5 |
| Download | model | model | model | model | model | model | model | model | model | model | model |

Both MoCo-v2 and our CutLER are trained for the 1x schedule using Detectron2, except for extremely low-shot settings with 1% or 2% labels. When training with 1% or 2% labels, we train both MoCo-v2 and our model for 3,600 iterations with a batch size of 16.
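
For reference, a sketch of overriding the low-shot schedule from the command line (this assumes the 1% config follows the {K}perc naming pattern above; the provided 1%/2% configs may already set these values, in which case the overrides are redundant):

python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/COCO-Semisupervised/cascade_mask_rcnn_R_50_FPN_1perc.yaml \
  MODEL.WEIGHTS /path/to/cutler_pretrained_model \
  SOLVER.MAX_ITER 3600 SOLVER.IMS_PER_BATCH 16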

License

The majority of CutLER, Detectron2 and DINO is licensed under the CC-BY-NC license; however, portions of the project are available under separate license terms: TokenCut, Bilateral Solver and CRF are licensed under the MIT license. If you later add other third-party code, please keep this license info updated, and please let us know if that component is licensed under something other than CC-BY-NC, MIT, or CC0.

Ethical Considerations

CutLER's wide range of detection capabilities may introduce challenges similar to those of many other visual recognition methods. Because an input image can contain arbitrary instances, its content may affect the model output.

How to get support from us?

If you have any general questions, feel free to email Xudong Wang, Ishan Misra or Rohit Girdhar. If you have code or implementation-related questions, please feel free to email us or open an issue in this codebase (we recommend opening an issue, as your questions may help others).

Citation

If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation.

@inproceedings{wang2023cut,
  title={Cut and learn for unsupervised object detection and instance segmentation},
  author={Wang, Xudong and Girdhar, Rohit and Yu, Stella X and Misra, Ishan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3124--3134},
  year={2023}
}
@article{wang2023videocutler,
  title={VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation},
  author={Wang, Xudong and Misra, Ishan and Zeng, Ziyun and Girdhar, Rohit and Darrell, Trevor},
  journal={arXiv preprint arXiv:2308.14710},
  year={2023}
}

cutler's People

Contributors

chenxwh, frank-xwang, hysts, imisra


cutler's Issues

errors occur when running cutler/demo/demo.py

hey! Thanks for your work! I came across a problem when I tried to run demo.py in the "cutler/demo" folder. I downloaded detectron2 following the Linux instructions (torch 1.10 / cuda 11.3), and the detectron2 version is 0.6.
The errors are described below; they occurred when I followed the instructions to run the demo.py of "cutler":

Traceback (most recent call last):
File "demo/demo.py", line 23, in
from predictor import VisualizationDemo
File "/root/data/CUTLER_LUKA/CutLER/cutler/demo/predictor.py", line 12, in
from engine.defaults import DefaultPredictor
File "/root/data/CUTLER_LUKA/CutLER/cutler/./engine/init.py", line 7, in
from .defaults import *
File "/root/data/CUTLER_LUKA/CutLER/cutler/./engine/defaults.py", line 41, in
from modeling import build_model
File "/root/data/CUTLER_LUKA/CutLER/cutler/./modeling/init.py", line 3, in
from .roi_heads import (
File "/root/data/CUTLER_LUKA/CutLER/cutler/./modeling/roi_heads/init.py", line 3, in
from .roi_heads import (
File "/root/data/CUTLER_LUKA/CutLER/cutler/./modeling/roi_heads/roi_heads.py", line 25, in
from .fast_rcnn import FastRCNNOutputLayers
File "/root/data/CUTLER_LUKA/CutLER/cutler/./modeling/roi_heads/fast_rcnn.py", line 11, in
from detectron2.data.detection_utils import get_fed_loss_cls_weights
ImportError: cannot import name 'get_fed_loss_cls_weights' from 'detectron2.data.detection_utils' (/opt/conda/envs/CUTLER/lib/python3.8/site-packages/detectron2/data/detection_utils.py)

Set up panoptic/semantic segmentation with cutler

Hi!
Thanks for your awesome work!

I ran maskcut and cutler on a custom dataset, and registered the dataset the same way as imagenet, so it used 2 categories, background and foreground. I am wondering how to set up custom cutler training on a dataset with more classes (or panoptic segmentation, which would include stuff).

my concern and confusion is that from maskcut we get single foreground object masks, which have no semantic meaning (class) attached to them. How to tackle this situation?

My intuition was that I could learn foreground object segmentation by training cutler on class-agnostic masks (that I get from maskcut or from somewhere), but it also seems wrong, since distinguishing classes in mrcnn is not utilised to its full extent in such a case.

looking forward to hearing your thoughts!

Kind regards,
Alexa

Self-training error

After having generated the json for the first round of training, I get this error. The given snippet asks to provide the train-dataset value for round-1 training, but it seems the script does not take a train-dataset input. Is it the "test-dataset" that should be provided?

python train_net.py --num-gpus 8 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml \
  --train-dataset imagenet_train_r1 \
  MODEL.WEIGHTS output/model_final.pth \
  OUTPUT_DIR output/self-train-r1/
(MODEL.WEIGHTS loads previous stage/round checkpoints; OUTPUT_DIR is the path to save checkpoints.)

Error:
usage: train_net.py [-h] [--config-file FILE] [--resume] [--eval-only] [--num-gpus NUM_GPUS] [--num-machines NUM_MACHINES] [--machine-rank MACHINE_RANK] [--test-dataset TEST_DATASET]
[--no-segm] [--dist-url DIST_URL]
...
train_net.py: error: unrecognized arguments: --train-dataset

I have used a very small subset of imagenet and have successfully executed everything up to the generation of cutler_imagenet1k_train_r1.json. Please help!

How to deal with the useless masks generated in MaskCut

Hi, thanks for the great work and open-sourcing the code!

I confirmed that when --N is set to a larger number than the number of objects in the demo image, some blank or misdetected masks are generated.

In general, the number of objects present varies from image to image.

In this case, how did you deal with the extra masks that are generated?

Zero-shot detection evaluation

Thanks for your great work. I am confused about the zero-shot evaluation. In the 'cascade_mask_rcnn_R_50_FPN.yaml' config file, ROI_HEADS.NUM_CLASSES is set to 1, so CutLER can only distinguish foreground objects from background. However, in a dataset such as COCO there are 80 different classes, so how does CutLER compute the AP50 metric in the zero-shot setting?

ModuleNotFoundError: No module named 'third_party'

While trying to run maskcut/demo.py, this error is repeatedly encountered:
ModuleNotFoundError: No module named 'third_party'
The git submodules have all been updated, but the error persists.
This error is also encountered in the Colab notebook for MaskCut.

Any suggestions to fix the issue?

Generating Annotations for VOC with MaskCut

Hey, thanks for your work!
I have tried to generate annotations for voc with maskcut, but there are garbled characters in the generated json file when it comes to annotation_info.
(The command i ran is python maskcut.py --vit-arch base --patch-size 8 --tau 0.15 --fixed_size 480 --N 3 --num-folder-per-job 1000 --job-index 0 --dataset-path /path/to/dataset/traindir --out-dir /path/to/save/annotations )
image
I have also changed the decoding method from ascii to gbk and utf-8, but it still does not work.
image

Do you have any idea about this?

Supervised/unsupervised custom dataset

First of all, thank you for the amazing methods introduced in the paper. As the title suggests, I’m trying to train an object detector for my custom dataset using both approaches to see which one’s better.

  1. I have generated annotations using maskcut.py with a change of fixed_size from 480 to 640, and placed the .json under ./datasets/imagenet/annotations. I also placed the images under ./datasets/imagenet/train. I then renamed the path in cutler/data/datasets/builtin.py to ‘imagenet_train_fixsize640_tau0.15_N3.json’ to reflect my file. However, it gives me an error ‘no valid images’ when I run train_net.py. Is there something I missed?

  2. For human-made annotations (coco format) to train a fully supervised model using train_net.py, I’m registering my custom dataset using ‘register_coco_instances’ and modified the config in the model zoo from ‘coco_train_2017’ to ‘my_dataset’. However, it gives me an error that my custom dataset is not yet registered.

Any help that can point me in the right direction would be greatly appreciated.

IOU threshold used for DropLoss

Hello, I am a bit confused about the droploss threshold. From my understanding, a high threshold = more loss (the model is penalized for exploring), while a low threshold encourages the model to explore. This seems to be the case when I look at the code:

weights = iou_max.le(self.droploss_iou_thresh).float()
weights = 1 - weights.ge(1.0).float()
losses = self.box_predictor.losses(predictions, proposals, weights=weights.detach())

However the wording in the paper says the opposite:

Finally, in Table 8d, we vary the IoU threshold used for DropLoss. With a high threshold, we ignore the loss for a higher number of predicted regions while encouraging the model to explore. 0.01 works best for the trade-off between exploration and detection performance.

Correct steps for self-training (custom dataset w/o annotations)

Hi and thank you for the cool work! :)

I am trying to perform unsupervised segmentation on a custom dataset (let's call it customdataset here for less confusion) using CutLER and have several questions which appeared when performing the following steps
(I referred to #16 already; it is related, but here I am asking about more things related to usage of the repo and its understanding.)

  1. Generate pseudo-masks using MaskCut -> output is a .json file.
  2. Modify dataset scripts to enable registering a custom dataset. For that I added the following in cutler/data/datasets/builtin.py:

_PREDEFINED_SPLITS_customdataset = {}
_PREDEFINED_SPLITS_customdataset["custom_dataset"] = {
    'custom_dataset_train': ("custom_dataset/images/train",
                             "custom_dataset/annotations/merged_imagenet_train_fixsize480_tau0.15_N3.json"),
}

and
def register_all_customdataset(root):
    for dataset_name, splits_per_dataset in _PREDEFINED_SPLITS_customdataset.items():
        for key, (image_root, json_file) in splits_per_dataset.items():
            # Assume pre-defined datasets live in ./datasets.
            register_coco_instances(
                key,
                _get_builtin_metadata(dataset_name),
                os.path.join(root, json_file) if "://" not in json_file else json_file,
                os.path.join(root, image_root),
            )

and
register_all_customdataset(_root)

In the file cutler/data/datasets/builtin_meta.py it is written that for custom datasets it is not necessary to write hard-coded meta-data. But when debugging errors with registration, I added the following in the function _get_builtin_metadata (where 'custom_dataset' is my addition):

elif dataset_name in ["imagenet", "kitti", "cross_domain", "lvis", "voc",
                      "coco_cls_agnostic", "objects365", "openimages", "custom_dataset"]:
    return _get_imagenet_instances_meta()

Question: Is this a correct way to handle metadata? Or should annotations created by MaskCut be used with coco_instances instead? That is, should I add my dataset name to this list here: if dataset_name in ["coco", "coco_semi"]: return _get_coco_instances_meta()
Or is it a wrong approach altogether? My CustomDataset is not real-world data and the categories do not match. At this point, if I only care about segmenting out different objects without naming them, should I use the UVO function?

  1. Use the generated pseudo-masks for performing self-training.

In the Self-Training Cutler there are 3 steps described for self-training:
step1 - "Firstly, we can get model predictions on ImageNet via running".
step2 - "Secondly, we can run the following command to generate the json file for the first round of self-training"
step3 - "Finally, place "cutler_imagenet1k_train_r1.json" under "DETECTRON2_DATASETS/imagenet/annotations/", then launch the self-training process".

Question: For custom datasets, should I skip step1 and step2? As I thought the maskCut already gives us the .json file that can be used for self-training?

I did not run step1 and step2 and directly ran the following command from step3 to train the model on a custom dataset using maskcut annotations, using the ImageNet CutLER as model weight initialization.

python train_net.py --num-gpus 1 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml \
  --train-dataset custom_dataset_train \
  MODEL.WEIGHTS http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_mrcnn_final.pth \
  OUTPUT_DIR outputs/cascade/custom_dataset_selftrain-r1

It launched the training and I got a model.

  1. (optional) Do another round of self training

Question: After the first round, do I understand correctly that I would need to run step1 (get model predictions from my newly trained model on maskcut annotations), step2 (generate a json file for that) and then step3 (launch the self-training process using the new json file)? Right? And the self-training rounds should all be done on the same data? Only the ground-truth predictions are updated, right?

  1. Inference.

I ran only one round of self-training (just on maskcut annotations) and then ran the demo to visualize the learned masks using the command

python demo/demo.py \
  --config-file model_zoo/configs/CutLER-ImageNet/mask_rcnn_R_50_FPN.yaml \
  --input ../../data/custom_dataset/images/train/*.jpg \
  --output outputs/inference/custom_dataset_selftrain1 \
  --opts MODEL.WEIGHTS outputs/custom_dataset_selftrain-r1/model_final.pth

But the demo images were crowded with the label "person" and confidence percentages.
Question: I understand that the problem must be related to using the ImageNet metadata, right? Is there a way to visualize only the segmentations, without any labels?
So far, my intuition is to create a custom Visualizer for detectron... But I still wanted to ask...

Looking forward to hearing any feedback! :)

Pretraining with CutLER

Hello, reading your paper was pretty interesting and insightful.
I was wondering how well an object detector model such as ViTDet can benefit by pretraining with CutLER?
For instance, from the ViTDet paper, the authors achieve 55.6 APbox and 49.2 APmask (table 5 in Exploring Plain Vision Transformer Backbones for Object Detection), so is it possible to pretrain a ViTDet with CutLER and finetune it in a supervised way on COCO to improve detection results?

Thanks again for the great paper.

Error with imports when using the notebook demo for maskcut

When trying out the demo for maskcut in a Jupyter notebook using the link in the readme, I cannot seem to import the metric module from third_party.TokenCut.unsupervised_saliency_detection. The system path doesn't seem to work. I have tried workarounds, but there is still an import in maskcut which I don't wish to disturb. Otherwise I haven't been able to find an elegant solution. Can you help me with it?

Custom coco dataset in self-training

Thanks for the nice work. I have a question regarding the custom coco dataset used in self-training. For my coco data, I have instances_train.py and instances_val.py, and I registered two datasets for train and val, but in the first step of self-training, --test-dataset only takes 'imagenet_train'.

Does it mean ImageNet only uses one json file for both train and validation? Or can the json file generation for self-training only be applied to the training data itself, not the val data? I am confused about it.

Clarification request about implementation details

Hello,

I have a couple of questions about sections 3.4 and 3.5:
About section 3.4:

"To de-duplicate the predictions and the ground truth from round t, we filter out ground-truth masks with an IoU > 0:5 with the predicted masks."

  1. Is this performed automatically if all annotation files are placed under DETECTRON2_DATASETS/imagenet/annotations/?

About section 3.5., I am a bit confused about which learning rate you used for which training stage. In the Detector paragraph it is written:

"We train the detector on ImageNet with initial masks and bounding boxes for 160K iterations with a batch size of 16."

  1. What is the learning rate here?

A little further it is written:

"We then optimize the detector for 160K iterations using SGD with a learning rate of 0.005, which is decreased by 5 after 80K iterations, and a batch size of 16"

Assuming that these 2 sentences refer to the training stages where you are using DropLoss:

  1. When I check the cascade_mask_rcnn_R_50_FPN.yaml file, the BASE_LR parameter is set to 0.01, and GAMMA is set to 0.02 (decreased by 50), which is not coherent with anything from your paper. Is it normal, or might it come from a typo?
  2. Did you train only once before moving to the self-training stages (do the 2 previous sentences refer to the same training stage)?

Then, in the Self-training paragraph:

"We optimize the detector using SGD with a learning rate of 0.01 for 80K iterations."

  1. When I check the cascade_mask_rcnn_R_50_FPN_self_train.yaml file, the BASE_LR parameter is set to 0.005, which was the learning rate specified for the training using the DropLoss. Is it normal or might it come from a typo?

Would it be possible to have further clarifications?
Please let me know.

Can I visualize the synthesized training data?

Thank you for the great work!

In the paper, it is mentioned that ImageCut2Video is used to create a synthetic video dataset using pseudo-annotations generated from maskcut. I understand it is generated from video_imagenet_train_fixsize480_tau0.15_N3.json, but the code appears to use internal APIs of Detectron2.

I'm wondering if there is any guide or method for me to view this synthetic data!

why use imagenet to pretrain CutLER

During training, multiple masks are generated for training. However, in ImageNet, there is only one object per image. How can this training approach be successful? Shouldn't we use the COCO dataset, which contains images with multiple objects?

Official Evaluation on COCO val2017 with MaskCut only

Hi Xudong and all,
I'm new to segmentation/detection, and I'm currently dealing with the evaluation metric of the predicted annotations generated by the MaskCut process alone on the COCO validation 2017 set, which contains 5k images. I'm wondering if you have any existing approach/code to evaluate the generated JSON file? If so, it would be a great help to me!

Thank you so much!

VOC annotations generated by maskcut

Hey, this is my json file for VOC annotations generated by maskcut, but the bbox information is different from the file you submitted on github. Therefore I visualized both results on JPEGImages/009829.jpg; the yellow boxes are mine and the red ones are yours.
image
Do you have any idea why there are these differences?
voc_annotations.zip

#28 (comment)

Regarding running CutLER on Custom dataset

Hi Folks, excellent read and amazing work! I've been trying to run the CutLER on my dataset and had some queries regarding running the experiment, but also some clarifications regarding the paper in general. Please let me know if this is not the appropriate medium for the question, I'll send a mail instead. Thanks!

  • When I create a custom dataset as mentioned I believe I'll need to run the following script to register a COCO Format dataset
    from detectron2.data.datasets import register_coco_instances
    register_coco_instances("my_dataset", {}, "json_annotation.json", "path/to/image/dir")

Where do I need to run this code snippet from? Can I just create a jupyter notebook in the CutLER folder and run these snippets? And if I do, I need to provide the annotations file as well, but I'm trying to use the MaskCut approach discussed to generate the pseudo ground truth; in that case, how do I pass the .json file to register the dataset?

  • Would it be easier to just use the naming convention of imagenet, and put my domain related images in that folder and train it with imagenet or would that make any difference? Because that approach sounds easier to me rather than registering the custom dataset.
  • In the command to run the merge_jsons.py, the savepath passes --save-path imagenet_train_fixsize480_tau0.15_N3.json however the naming convention of the json file generated by running the maskcut.py is different, so while running the merge_jsons.py are we supposed to pass the imagenet_train_fixsize480_tau0.15_N3.json or the one that was generated after running the maskcut.py
  • While doing self training on the new dataset using the given command
    python train_net.py --num-gpus 8 \
      --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml \
      --test-dataset imagenet_train \
      --eval-only TEST.DETECTIONS_PER_IMAGE 30 \
      MODEL.WEIGHTS output/model_final.pth \
      OUTPUT_DIR output/
    (MODEL.WEIGHTS loads previous stage/round checkpoints; OUTPUT_DIR is the path to save model predictions.)
    Could you please explain a little bit about the parameters
  1. test-dataset: are we supposed to pass the whole train dataset?
  2. MODEL.WEIGHTS: it is output/model_final.pth, is the output folder to be created in the cutler folder?
  3. OUTPUT_DIR: is it the same directory where we are providing the path to the model weights, and

And when we want to get the annotations using the following command
python tools/get_self_training_ann.py \
  --new-pred output/inference/coco_instances_results.json \
  --prev-ann DETECTRON2_DATASETS/imagenet/annotations/imagenet_train_fixsize480_tau0.15_N3.json \
  --save-path DETECTRON2_DATASETS/imagenet/annotations/cutler_imagenet1k_train_r1.json \
  --threshold 0.7
(--new-pred loads the model predictions; --prev-ann is the path to the old annotation file; --save-path is the path to save the new annotation file.)

Here we are passing coco_instances_results.json (the model predictions), but are we supposed to pass anything else instead if we are doing custom training on our dataset? Could you elaborate on what that file is and whether it will be generated when we train?

  • Lastly, let's say after carrying out a preliminary experiment on N images I want to run the entire Cut and Learn pipeline; what is the best way to go about this? Repeat in another folder, or will the naming convention of the newly created files handle the different runs?

I have some more theoretical doubts as well; let me know if I should add them to this issue or create a separate one. Thanks, and sorry for the extended and (possibly) trivial queries regarding semantics.

How can I self-train with 1 gpu?

Thank you for the cool work!
Now I have a question: how can I use just 1 GPU to train on my dataset?
There is a "--num-gpus" option in the arg set, I only have 1 GPU, and my dataset is pretty small.

But there is a bug when I launch the script:
python train_net.py --num-gpus 1 --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN_self_train.yaml --train-dataset imagenet_train_r1 OUTPUT_DIR ../model_output/self-train-r1/
But it doesn't work; here is the part of the log with the bug.

[07/21 15:40:26 d2.engine.train_loop]: Starting training from iteration 0
/root/autodl-tmp/project/CutLER/cutler/data/detection_utils.py:437: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:143.)
torch.stack([torch.from_numpy(np.ascontiguousarray(x)) for x in masks])
/root/autodl-tmp/project/CutLER/cutler/data/detection_utils.py:437: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:143.)
torch.stack([torch.from_numpy(np.ascontiguousarray(x)) for x in masks])
ERROR [07/21 15:40:27 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/root/autodl-tmp/project/detectron2/detectron2/engine/train_loop.py", line 155, in train
self.run_step()
File "/root/autodl-tmp/project/CutLER/cutler/engine/defaults.py", line 505, in run_step
self._trainer.run_step()
File "/root/autodl-tmp/project/CutLER/cutler/engine/train_loop.py", line 335, in run_step
loss_dict = self.model(data)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/CutLER/cutler/modeling/meta_arch/rcnn.py", line 160, in forward
features = self.backbone(images.tensor)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/modeling/backbone/fpn.py", line 139, in forward
bottom_up_features = self.bottom_up(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/modeling/backbone/resnet.py", line 445, in forward
x = self.stem(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/modeling/backbone/resnet.py", line 356, in forward
x = self.conv1(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/layers/wrappers.py", line 131, in forward
x = self.norm(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 532, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 711, in get_world_size
return _get_group_size(group)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 263, in _get_group_size
default_pg = _get_default_group()
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 347, in _get_default_group
raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
[07/21 15:40:27 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
[07/21 15:40:27 d2.utils.events]: iter: 0 lr: N/A max_mem: 1098M
Traceback (most recent call last):
File "train_net.py", line 170, in
launch(
File "/root/autodl-tmp/project/detectron2/detectron2/engine/launch.py", line 84, in launch
main_func(*args)
File "train_net.py", line 160, in main
return trainer.train()
File "/root/autodl-tmp/project/CutLER/cutler/engine/defaults.py", line 495, in train
super().train(self.start_iter, self.max_iter)
File "/root/autodl-tmp/project/detectron2/detectron2/engine/train_loop.py", line 155, in train
self.run_step()
File "/root/autodl-tmp/project/CutLER/cutler/engine/defaults.py", line 505, in run_step
self._trainer.run_step()
File "/root/autodl-tmp/project/CutLER/cutler/engine/train_loop.py", line 335, in run_step
loss_dict = self.model(data)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/CutLER/cutler/modeling/meta_arch/rcnn.py", line 160, in forward
features = self.backbone(images.tensor)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/modeling/backbone/fpn.py", line 139, in forward
bottom_up_features = self.bottom_up(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/modeling/backbone/resnet.py", line 445, in forward
x = self.stem(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/modeling/backbone/resnet.py", line 356, in forward
x = self.conv1(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/autodl-tmp/project/detectron2/detectron2/layers/wrappers.py", line 131, in forward
x = self.norm(x)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 532, in forward
world_size = torch.distributed.get_world_size(process_group)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 711, in get_world_size
return _get_group_size(group)
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 263, in _get_group_size
default_pg = _get_default_group()
File "/root/miniconda3/envs/cutler/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 347, in _get_default_group
raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

It looks like it is related to DDP.
Thanks for your help !

MaskCut leaves out some folders

Hello, thank you for sharing this nice work.

I am encountering an issue while running MaskCut on my dataset, as some folders are not being processed. I attached a picture of the masking process for a dummy dataset split into 10 folders of 2 images each.

bugMaskCut

  • This also happens when the 10 folders contain the exact same 2 pictures (1 out of 10 is left out)
  • This also happens when I process the 5 first folders only (1 out of 5 is left out)
  • This also happens when I process the 5 remaining folders (1 out of 5 is left out)

I'm not sure what could be causing this problem; would you have any suggestions?

supervised fine-tuning on my dataset: box mAP is lower than with mmdet mask-rcnn

1. Training data: my dataset, about 6k images, size 1024*1024, only box labels, no mask labels.
Config like coco-semisupervised, single GPU, modified config:
IMS_PER_BATCH=8; base_lr: 0.04/8; max_iter: 20000 (about 30 epochs); MASK_ON: false
INPUT: max_size_train: 1024; min_size_train: (1024,)
TEST: MAX_SIZE: 1024, MIN_SIZES: 1024

Pretrained model used: http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth

results of box AP:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.264
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.421
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.280
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.081
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.217
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.366
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.291
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.438
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.441
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.147
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.366
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.523

2. When I use mmdet Mask R-CNN with a BYOL self-supervised pretrained model and train for just 12 epochs, the result is much better:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.553
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.831
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.632
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.218
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.507
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.610
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.634
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.238
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.570
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.688

(1) How can I improve the mAP? Is my config file not right?
(2) If I want to do self-training with my dataset (60K unlabeled images), should I do it like this:
step 1: use maskcut.py and merge_jsons.py to generate pseudo-masks
step 2: use the pseudo-masks for unsupervised model learning
step 3: self-training

My dataset has some characteristics:
small objects; complex scenes;
image

Dependencies for pydensecrf

I was stuck installing pydensecrf just now and finally solved it. This issue is just for the convenience of others.

cython 3.0.0 currently doesn't support the compilation of pydensecrf, and directly running the following command will fail:
pip install git+https://github.com/lucasb-eyer/pydensecrf.git

Use the following instead:
pip3 install --force-reinstall cython==0.29.36
pip3 install --no-build-isolation git+https://github.com/lucasb-eyer/pydensecrf.git

Reference:
lucasb-eyer/pydensecrf#123 (comment)

GPU memory keeps increasing during training

Hi, thanks for your great work !
But when I train following the README, the GPU memory keeps increasing until it runs out of memory. The commands are as follows:

cd cutler
export DETECTRON2_DATASETS=/path/to/DETECTRON2_DATASETS/
python train_net.py --num-gpus 2 \
  --config-file model_zoo/configs/CutLER-ImageNet/cascade_mask_rcnn_R_50_FPN.yaml

Dataset

What does “The datasets are assumed to exist in a directory specified by the environment variable DETECTRON2_DATASETS. Under this directory, detectron2 will look for datasets in the structure described below, if needed.” mean?

Failed when applied to mice dataset

Hi! When I use the demo images command in the readme, MaskCut can separate the different instances:
image

Following the instructions, I typed in the following command:
python demo.py --img-path imgs/00000.jpg --N 6 --tau 0.2 --vit-arch base --patch-size 8
and changed the parameters. However, MaskCut always treats the different mice as the same instance. What can I do to solve the problem?
image

eigh() got an unexpected keyword argument 'subset_by_index'

Two warnings
UserWarning: Default upsampling behavior when mode=bicubic is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode)

UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
"The default behavior for interpolate/upsample with float scale_factor changed "

One error
TypeError: eigh() got an unexpected keyword argument 'subset_by_index'

Error when running videocutler demo

I'm trying to run the videocutler demo. I have run the maskcut and cutler demos successfully. However, when running the videocutler demo I get the following error:

Traceback (most recent call last):
  File "demo_video/demo.py", line 27, in <module>
    from mask2former import add_maskformer2_config
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/__init__.py", line 3, in <module>
    from . import modeling
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/modeling/__init__.py", line 4, in <module>
    from .pixel_decoder.msdeformattn import MSDeformAttnPixelDecoder
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/modeling/pixel_decoder/msdeformattn.py", line 19, in <module>
    from .ops.modules import MSDeformAttn
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/modeling/pixel_decoder/ops/modules/__init__.py", line 12, in <module>
    from .ms_deform_attn import MSDeformAttn
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/modeling/pixel_decoder/ops/modules/ms_deform_attn.py", line 24, in <module>
    from ..functions import MSDeformAttnFunction
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/modeling/pixel_decoder/ops/functions/__init__.py", line 12, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/home/bea/CutLER-main/videocutler/demo_video/../mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py", line 22, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory

I am running this in a conda env with:
Python 3.8
PyTorch 1.9
Cuda 10.2

I'm not very sure about my CUDA version, but I believe that I installed that one.

training slow

Hi, thanks for this good work.

I run the training on 8 A100 GPUs and find it will take almost 20 days to finish one round of training. Is that normal? How much time did it take you to train one round?

Slow when computing eigenvectors

I think it takes a long time to solve the generalized eigenvalue problems in maskcut.py:

def second_smallest_eigenvector(A, D):
    # get the second smallest eigenvector from affinity matrix
    _, eigenvectors = eigh(D-A, D, subset_by_index=[1,2])
    eigenvec = np.copy(eigenvectors[:, 0])
    second_smallest_vec = eigenvectors[:, 0]
    return eigenvec, second_smallest_vec

Maybe torch.lobpcg would be a better option.

Compatibility of Cut and Learn (CutLER) Model with Windows

I am interested in using the Cut and Learn (CutLER) Model for my project, but I am uncertain about its compatibility with the Windows operating system. Can anyone confirm if the CutLER Model can run on Windows, and if so, are there any specific steps or considerations I should be aware of?

If the CutLER Model is not compatible with Windows, I would appreciate any recommendations for alternative models with similar capabilities that are known to work seamlessly on the Windows platform. Thank you.

Can i train a "videocutler"?

First of all, thank you for your great work.

I am trying to train a "videocutler." After preparing all the necessary datasets and running the code, I encounter the following error:

Exception has occurred: KeyError
'video_id'
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/datasets/ytvis_api/ytvos.py", line 75, in createIndex
vidToAnns[ann['video_id']].append(ann)
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/datasets/ytvis_api/ytvos.py", line 66, in init
self.createIndex()
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/datasets/ytvis.py", line 173, in load_ytvis_json
ytvis_api = YTVOS(json_file)
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/datasets/ytvis.py", line 310, in
DatasetCatalog.register(name, lambda: load_ytvis_json(json_file, image_root, name))
File "/home/sujung/repo/CutLER/detectron2/detectron2/data/catalog.py", line 58, in get
return f()
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/build.py", line 92, in
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/build.py", line 92, in get_detection_dataset_dicts
dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in dataset_names]
File "/home/sujung/repo/CutLER/videocutler/train_net_video.py", line 84, in build_train_loader
dataset_dict = get_detection_dataset_dicts(
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/engine/defaults.py", line 391, in init
data_loader = self.build_train_loader(cfg)
File "/home/sujung/repo/CutLER/videocutler/train_net_video.py", line 305, in main
trainer = Trainer(cfg)
File "/home/sujung/repo/CutLER/detectron2/detectron2/engine/launch.py", line 84, in launch
main_func(*args)
File "/home/sujung/repo/CutLER/videocutler/train_net_video.py", line 313, in
launch(
KeyError: 'video_id'

Upon examining the problematic part, I noticed that the 'ann' dictionary does not contain the 'video_id' key. So, I changed 'video_id' to 'image_id' on lines 75 and 88 of 'ytvos.py.' However, when I tried running it again, I encountered the following error:

Exception has occurred: AssertionError
Dataset 'imagenet_video_train_cls_agnostic' is empty!
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/data_video/build.py", line 94, in get_detection_dataset_dicts
assert len(dicts), "Dataset '{}' is empty!".format(dataset_name)
File "/home/sujung/repo/CutLER/videocutler/train_net_video.py", line 84, in build_train_loader
dataset_dict = get_detection_dataset_dicts(
File "/home/sujung/repo/CutLER/videocutler/mask2former_video/engine/defaults.py", line 391, in init
data_loader = self.build_train_loader(cfg)
File "/home/sujung/repo/CutLER/videocutler/train_net_video.py", line 305, in main
trainer = Trainer(cfg)
File "/home/sujung/repo/CutLER/detectron2/detectron2/engine/launch.py", line 84, in launch
main_func(*args)
File "/home/sujung/repo/CutLER/videocutler/train_net_video.py", line 313, in
launch(
AssertionError: Dataset 'imagenet_video_train_cls_agnostic' is empty!

WARNING [11/06 11:08:38 mask2former_video.data_video.datasets.ytvis]: /home/sujung/repo/CutLER/videocutler/DETECTRON2_DATASETS/imagenet/annotations/video_imagenet_train_fixsize480_tau0.15_N3.json contains 1933347 annotations, but only 0 of them match to images in the file.
[11/06 11:08:38 mask2former_video.data_video.datasets.ytvis]: Loaded 0 videos in YTVIS format from /home/sujung/repo/CutLER/videocutler/DETECTRON2_DATASETS/imagenet/annotations/video_imagenet_train_fixsize480_tau0.15_N3.json

I would greatly appreciate any advice on how to resolve this issue.

About cutler/demo/demo.py

Thank you for posting your code.

When I run cutler/demo/demo.py, I get results like those shown in the images below.
I downloaded detectron2 for Linux (torch 1.12 / cuda 11.3) and use A6000 GPUs.

I followed this.
But, I have two issues.
(1) the results are strange.
(2) whenever I run this file, the results change.

Result1
demo1

Result2
demo1

Could I have missed something?

I look forward to hearing from you.

python merge_jsons.py : num-folder-per-job missing argument

Hello,

Thank you for your work.

For the MaskCut part, the following command is provided in your instructions.

python merge_jsons.py \
  --base-dir /path/to/save/annotations \
  --num-folder-per-job 2 --fixed-size 480 \
  --tau 0.15 --N 3 \
  --save-path imagenet_train_fixsize480_tau0.15_N3.json

However, the num-folder-per-job argument does not exist in the merge_jsons.py file.

How to learn on other datasets

Thanks for the authors' contribution.
We noticed that the authors mention in Features that CutLER can learn unsupervised object detectors and instance segmentors solely on ImageNet-1K.
Can CutLER be used on our own datasets, which are not included in ImageNet-1K?

maskcut.py; the meaning of arguments

Thanks for sharing your model.
I'm going to train on my custom dataset and make pseudo-masks with MaskCut.
Could I know the meaning of each argument that goes into 'maskcut.py'?

cd maskcut

python maskcut.py
--vit-arch base #1.
--patch-size 8 #2.
--tau 0.15 #3.
--fixed_size 480 #4.
--N 3 #5.
--num-folder-per-job 1000 #6.
--job-index 0 #7.
--dataset-path /path/to/dataset/traindir
--out-dir /path/to/save/annotations

#1. what is '--vit-arch' ? Are there any options other than 'base'?
#2. Is '--patch-size' the correct number of segments for the input-image?
#3. what is '--tau' ?
#4. what is '--fixed_size' ?
#5. '--N' is the number of objects that must be masked in that image, right? Is it a Maximum number??
#6. Is 'num-folder-per-job' the number of folders that make up the input-data?
#7. what is '--job-index' ?

If you could answer this questions, it would be really helpful to use the CutLer :)

Train on custom dataset

Hi, thanks for sharing this interesting work.

How can I train this model on custom datasets, and how should I prepare them? Do I need annotations at all?

Score for AP calculation

Hi,

Thank you for sharing this work. I would like to ask what is used as the score to compute AP_mask when using only MaskCut, as reported in Table 7.

maskcut with query features

Hey, thanks for your work! I'm trying to test the maskcut demo step. I find performing maskcut with key or value features (args.vit-feat ='k' or 'v') can produce reasonable segmentation results, while using the query or qkv features can't. Do you have any idea about this?

backbone change

When I replace the ViT backbone with a ResNet, MaskCut produces incorrect masks. Can the backbone only be a Transformer-based one?

Notebook not working

Running the notebook results in a missing module colormap; installing it results in a missing module easydev; after installing that, I got ImportError: cannot import name 'random_color' from 'colormap' (/usr/local/lib/python3.8/dist-packages/colormap/__init__.py)

I am wondering how you ran it; looking at the torch version, it looks like you may have run it some months ago.

Thank you for the help

Cheers,

Fra
