
stego's Introduction

STEGO: Unsupervised Semantic Segmentation by Distilling Feature Correspondences

Mark Hamilton, Zhoutong Zhang, Bharath Hariharan, Noah Snavely, William T. Freeman

This is the official implementation of the paper "Unsupervised Semantic Segmentation by Distilling Feature Correspondences".



Install

Clone this repository:

git clone https://github.com/mhamilton723/STEGO.git
cd STEGO

Install Conda Environment

Please visit the Anaconda install page if you do not already have conda installed.

conda env create -f environment.yml
conda activate stego

Download Pre-Trained Models

cd src
python download_models.py

Download Datasets

First, change the pytorch_data_dir variable to your system's PyTorch data directory, where datasets are stored.

python download_datasets.py

Once downloaded, please navigate to your PyTorch data directory and unzip the resulting files:

cd /YOUR/PYTORCH/DATA/DIR
unzip cocostuff.zip
unzip cityscapes.zip
unzip potsdam.zip
unzip potsdamraw.zip

Evaluation

To evaluate our pretrained models, please run the following in STEGO/src:

python eval_segmentation.py

One can change the evaluation parameters and model by editing STEGO/src/configs/eval_config.yml.

Training

To train STEGO from scratch, please first generate the KNN indices for the datasets of interest. Note that this requires generating a cropped dataset first, and you may need to modify crop_datasets.py to specify the dataset that you are cropping:

python crop_datasets.py
python precompute_knns.py
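
For intuition, precompute_knns.py embeds each image with the frozen backbone and stores each image's most similar other images by cosine similarity. A minimal sketch of the idea, not the repo's exact code (here model is assumed to be the frozen DINO featurizer and loader is assumed to yield batches of images):

import torch
import torch.nn.functional as F

def compute_knns(model, loader, k=7):
    feats = []
    with torch.no_grad():
        for img in loader:
            # One global descriptor per image: spatially averaged, L2-normalized features
            feats.append(F.normalize(model(img.cuda()).mean(dim=[2, 3]), dim=1).cpu())
    feats = torch.cat(feats, dim=0)      # (N, D)
    sims = feats @ feats.t()             # (N, N) cosine similarities
    return sims.topk(k, dim=1).indices   # each image's k nearest neighbors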

Then you can run the following in STEGO/src:

python train_segmentation.py

Hyperparameters can be adjusted in STEGO/src/configs/train_config.yml

To monitor training with TensorBoard, run the following from the STEGO directory:

tensorboard --logdir logs

Bringing your own data

To train STEGO on your own dataset, please create a directory in your PyTorch data root with the following structure. Note: if you do not have labels, omit the labels directory from the structure:

dataset_name
|── imgs
|   ├── train
|   |   |── unique_img_name_1.jpg
|   |   └── unique_img_name_2.jpg
|   └── val
|       |── unique_img_name_3.jpg
|       └── unique_img_name_4.jpg
└── labels
    ├── train
    |   |── unique_img_name_1.png
    |   └── unique_img_name_2.png
    └── val
        |── unique_img_name_3.png
        └── unique_img_name_4.png
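
For convenience, here is a small sketch (with hypothetical paths; adjust the data root and dataset name to your setup) that creates this layout with pathlib:

from pathlib import Path

# Hypothetical locations: replace with your pytorch data root and dataset name
root = Path("/YOUR/PYTORCH/DATA/DIR") / "dataset_name"

for split in ["train", "val"]:
    (root / "imgs" / split).mkdir(parents=True, exist_ok=True)
    (root / "labels" / split).mkdir(parents=True, exist_ok=True)  # omit if you have no labels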

Next in STEGO/src/configs/train_config.yml set the following parameters:

dataset_name: "directory"
dir_dataset_name: "dataset_name"
dir_dataset_n_classes: 5 # This is the number of object types to find

If you want to train with cropping to increase spatial resolution, run our cropping utility, crop_datasets.py.

Finally, uncomment the custom dataset code and run python precompute_knns.py from STEGO/src to generate the prerequisite KNN information for the custom dataset.

You can now train on your custom dataset using:

python train_segmentation.py

Understanding STEGO

Unsupervised semantic segmentation

Real-world images can be cluttered with multiple objects, making whole-image classification feel arbitrary. Furthermore, objects in the real world don't always fit in bounding boxes. Semantic segmentation methods aim to avoid these challenges by assigning each pixel of an image its own class label. Conventional semantic segmentation methods are notoriously difficult to train due to their dependence on densely labeled images, which can take 100x longer to create than bounding boxes or class annotations. This makes it hard to gather sizable and diverse datasets, and impossible in domains where humans don't know the structure a priori. We sidestep these challenges by learning an ontology of objects with pixel-level semantic segmentation through only self-supervision.

Deep features connect objects across images

Self-supervised contrastive learning enables algorithms to learn intelligent representations for images without supervision. STEGO builds on this work by showing that representations from self-supervised vision transformers like Caron et al.'s DINO are already aware of the relationships between objects. By computing the cosine similarity between image features, we can see that similar semantic regions such as grass, motorcycles, and sky are "linked" together by feature similarity.

Feature connection GIF
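
To see this yourself, here is a minimal sketch (assuming the DINO ViT-S/8 backbone from torch.hub; this repo's feature extraction differs in detail) that computes the cosine similarity between one patch's feature and all other patches:

import torch
import torch.nn.functional as F

# Load a self-supervised DINO ViT-S/8 backbone (assumes internet access)
model = torch.hub.load('facebookresearch/dino:main', 'dino_vits8').eval()

img = torch.randn(1, 3, 224, 224)  # stand-in for a normalized 224x224 image

with torch.no_grad():
    tokens = model.get_intermediate_layers(img, n=1)[0]  # (1, 1 + 28*28, 384)
    feats = F.normalize(tokens[:, 1:, :], dim=-1)        # drop the CLS token, unit-normalize

# Similarity between one query patch and every patch of the image; semantically
# related regions (e.g. all the grass patches) light up together
query = feats[0, 400]
similarity = (feats[0] @ query).reshape(28, 28)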

The STEGO architecture

The STEGO unsupervised segmentation system learns by distilling correspondences between images into a set of class labels using a contrastive loss. In particular, we aim to learn a segmentation that respects the induced correspondences between objects. To achieve this, we train a shallow segmentation network on top of the DINO ViT backbone with three contrastive terms that distill connections between an image and itself, similar images, and random other images, respectively. If two regions are strongly coupled by deep features, we encourage them to share the same class.

Architecture
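
In code, one such contrastive term looks roughly like the following. This is a simplified sketch of the paper's Equation 4 with zero clamping, not the repo's exact implementation (src/modules.py adds feature sampling, spatial centering, and other details):

import torch
import torch.nn.functional as F

def correspondence_loss(f1, f2, s1, s2, b):
    # f1, f2: frozen backbone features (B, C, H, W); s1, s2: segmentation codes (B, K, H, W)
    f1, f2 = F.normalize(f1, dim=1), F.normalize(f2, dim=1)
    s1, s2 = F.normalize(s1, dim=1), F.normalize(s2, dim=1)
    # Cosine similarity between every pair of spatial locations across the two inputs
    Fcorr = torch.einsum('bchw,bcij->bhwij', f1, f2)  # feature correspondences
    Scorr = torch.einsum('bkhw,bkij->bhwij', s1, s2)  # segmentation correspondences
    # Where feature similarity exceeds the shift b, push segmentation similarity up;
    # elsewhere push it down (the clamp prevents collapse)
    return -((Fcorr - b) * Scorr.clamp(min=0)).mean()

STEGO applies this term three times with different shifts b: to an image paired with itself, with its nearest neighbors, and with random other images.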

Results

We evaluate the STEGO algorithm on the CocoStuff, Cityscapes, and Potsdam semantic segmentation datasets. Because these methods see no labels, we use a Hungarian matching algorithm to find the best mapping between clusters and dataset classes. We find that STEGO is capable of segmenting complex and cluttered scenes with much higher spatial resolution and sensitivity than the prior art, PiCIE. This not only yields a substantial qualitative improvement, but also more than doubles the mean intersection over union (mIoU). For results on Cityscapes and Potsdam, see our paper.

Cocostuff results
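
For reference, the cluster-to-class matching can be computed with scipy's linear_sum_assignment; a minimal sketch, not the repo's exact evaluation code:

import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(preds, labels, n_classes):
    # preds, labels: flat integer arrays of predicted cluster ids and true class ids
    confusion = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(confusion, (preds, labels), 1)  # co-occurrence counts
    # One-to-one assignment of clusters to classes that maximizes total overlap
    row, col = linear_sum_assignment(confusion, maximize=True)
    mapping = dict(zip(row, col))
    return np.vectorize(mapping.get)(preds)   # relabel cluster ids as class ids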

Citation

@inproceedings{hamilton2022unsupervised,
	title={Unsupervised Semantic Segmentation by Distilling Feature Correspondences},
	author={Mark Hamilton and Zhoutong Zhang and Bharath Hariharan and Noah Snavely and William T. Freeman},
	booktitle={International Conference on Learning Representations},
	year={2022},
	url={https://openreview.net/forum?id=SaKO6z6Hl0c}
}

Contact

For feedback, questions, or press inquiries, please contact Mark Hamilton.

stego's People

Contributors

mhamilton723, peterklipfel


stego's Issues

The evaluation is blocked when passing the CRF module

Hello, how are you? Thanks for contributing to this project.
I am running the evaluation script with the pre-trained model. However, when it reaches the CRF module, the evaluation process is blocked.
I debugged it and found a likely problem: in your UnNormalize class implementation, the part that clones an image tensor blocks.

[screenshot]

The problem is that this part runs in a child thread, while it seems the CRF module must run in the main thread.
I tried to fix this issue but have NOT succeeded yet.
Any solution?

Object detection

Hi, this is an interesting paper. I wonder if you have tested the effectiveness of using STEGO for object detection as a downstream task. Since segmentation seems to be a more difficult and finer-grained task than detection, I wonder if it also works for detection. Thanks!

no module named 'core'

Hi, thanks for sharing your code! I ran into an error when running eval_segmentation.py. How should I deal with it? Thank you!

[screenshot]

OOM on eval

I get the following error running on a 16 GB card. I tried reducing settings in the eval config, with no luck.

 "The default behavior for interpolate/upsample with float scale_factor changed "
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1088/1088 [04:06<00:00,  4.41it/s]
Error executing job with overrides: []
Traceback (most recent call last):
  File "eval_segmentation.py", line 144, in my_app
    outputs = {k: torch.cat(v, dim=0) for k, v in outputs.items()}
  File "eval_segmentation.py", line 144, in <dictcomp>
    outputs = {k: torch.cat(v, dim=0) for k, v in outputs.items()}
RuntimeError: [enforce fail at CPUAllocator.cpp:68] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 24053760000 bytes. Error code 12 (Cannot allocate memory)

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Environment stuck when installing

Hello,

I tried setting up the environment with conda env create -f environment.yml but conda displays "Solving environment" indefinitely. I tested on two different machines (Ubuntu 18.04.4 LTS, and 20.04.4 LTS) and the same problem occurred. One attempt lasted 23 hours before I gave up.

Have you tried installing the environment? Does the installation process work for you?

Best,
Nicolas

Environment issue

Hi,
Thanks for your good work. I want to use this model on my own dataset, but creating the environment for this repository takes a very long time. Can anyone help me create the environment without problems?

Flipping during the evaluation

Hi,

Thank you for your great research. I have noticed that when you produce the cluster maps, the codes of the input image and its horizontally flipped version are averaged together. How does this make sense?

Reproduce the Potsdam results

Could you help me reproduce the results on the Potsdam dataset? I trained STEGO with the same configuration used in potsdam_test.ckpt and then evaluated the model using eval_segmentation.py, but the clustering Accuracy and IoU are low.
Using potsdam_test.ckpt, I got

'final/linear/mIoU': 74.83345866203308, 
'final/linear/Accuracy': 85.84609031677246,
'final/cluster/mIoU': 62.565261125564575, 
'final/cluster/Accuracy': 77.03110575675964

but, using my checkpoint, I got

'final/linear/mIoU': 74.89467859268188, 
'final/linear/Accuracy': 85.89659333229065, 
'final/cluster/mIoU': 47.732433676719666, 
'final/cluster/Accuracy': 64.23421502113342

The results with the linear probe look good, but not those with the cluster probe. Could you help figure out what could make the difference?

Here is my configuration used to train STEGO:

output_root: ../
pytorch_data_dir: /home/bv/datasets/external_datasets
experiment_name: exp1
log_dir: potsdam
azureml_logging: true
submitting_to_aml: false
num_workers: 24
max_steps: 5000
batch_size: 16
num_neighbors: 7
dataset_name: potsdam
dir_dataset_name: null
dir_dataset_n_classes: 5
has_labels: false
crop_type: null
crop_ratio: 0.5
res: 224
loader_crop_type: center
extra_clusters: 0
use_true_labels: false
use_recalibrator: false
model_type: vit_small
arch: dino
use_fit_model: false
dino_feat_type: feat
projection_type: nonlinear
dino_patch_size: 8
granularity: 1
continuous: true
dim: 70
dropout: true
zero_clamp: true
lr: 0.0005
pretrained_weights: null
use_salience: false
stabalize: false
stop_at_zero: true
pointwise: true
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0
correspondence_weight: 1.0
neg_inter_weight: 0.63
pos_inter_weight: 0.25
pos_intra_weight: 0.67
neg_inter_shift: 0.76
pos_inter_shift: 0.02
pos_intra_shift: 0.08
rec_weight: 0.0
repulsion_weight: 0.0
crf_weight: 0.0
alpha: 0.5
beta: 0.15
gamma: 0.05
w1: 10.0
w2: 3.0
shift: 0.0
crf_samples: 1000
color_space: rgb
reset_probe_steps: null
n_images: 5
scalar_log_freq: 10
checkpoint_freq: 50
val_freq: 100
hist_freq: 100
full_name: potsdam/potsdam_exp1

Training with custom dataset

Hi,

Has anyone here trained STEGO on a custom dataset? I tried to train the model on custom data, but its accuracy and mIoU are "nan", as shown below. Anyone who has faced this problem, please help me out. Thanks in advance!

[screenshot]

About random image correspondence

Hi,

I am confused about the random image correspondence. The paper states: "STEGO uses three instantiations of the correspondence loss of Equation 4 to train a segmentation head to distill feature relationships between an image and itself, its K-Nearest Neighbors (KNNs), and random other images. The self and KNN correspondence losses primarily provide positive, attractive, signal and random image pairs tend to provide negative, repulsive, signal."
What does a random image pair mean, given that any dataset always contains images sharing the same features, pixels, or classes? Yet the random image shown in the STEGO architecture figure is very different.
Anyone who understands this mechanism, please share your thoughts.

ViT-Base performance reproduction problem on cocostuff27

Thank you so much for sharing this wonderful work. However, I am facing difficulties reproducing the performance of the ViT-Base model.

I got the following performances after retraining the ViT-Base 5-crop setting:
{'final/linear/mIoU': 34.91545915603638, 'final/linear/Accuracy': 73.63219261169434, 'final/cluster/mIoU': 19.5334792137146, 'final/cluster/Accuracy': 50.60913562774658}

However, I think it should be (the performance of your released ViT-Base model):
{'final/linear/mIoU': 41.074731945991516, 'final/linear/Accuracy': 76.12890005111694, 'final/cluster/mIoU': 28.187400102615356, 'final/cluster/Accuracy': 56.92926645278931}.

I also re-trained your ViT-Small model with the 5-crop data setting, which gives about 23.67 mIoU, a little lower than your reported 24.5, but I think that is fine.

However, when trying to reproduce your ViT-Base performance, I only get 19.53. I changed the model type and used the "Cocostuff27 10/3 vit_base" weights. Did I miss something?

The training config is attached below:

#################### training configs ######################
num_workers: 24
max_steps: 5000
num_neighbors: 7

batch_size: 16
dataset_name: "cocostuff27"
crop_type: "five" #~
crop_ratio: .5
res: 224
loader_crop_type: "center"

extra_clusters: 0
use_true_labels: False
use_recalibrator: False
model_type: "vit_base"
arch: "dino"
use_fit_model: False
dino_feat_type: "feat"
projection_type: "nonlinear"
#projection_type: linear
dino_patch_size: 8
granularity: 1
continuous: True
dim: 70
dropout: True
zero_clamp: True

lr: 5e-4
pretrained_weights: ~
use_salience: False
stabalize: False
stop_at_zero: True

pointwise: True
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0
correspondence_weight: 1.0

######################Cocostuff27 10/3 vit_base
neg_inter_weight: 0.1538476246415498
pos_inter_weight: 1
pos_intra_weight: 0.1

neg_inter_shift: 1
pos_inter_shift: 0.2
pos_intra_shift: 0.12

Potsdam IR channel

Hello!

In src/data.py there's the following comment: "# TODO add ir channel back". I'm guessing that model was trained without the IR channel?

I ask because I'm interested in training on a custom dataset with more than RGB channels, somewhat akin to Potsdam's RGBIR. What would you recommend for doing that? Would it be as simple as creating a copy of the DirectoryDataset class and modifying it? Or would it require more elaborate changes?

Thanks,
AJ

Add instructions for using demo_segmentation.py to the README?

Hello! Could you add to the README instructions on how to try out a pretrained model on a new image or dataset? Would that be done by modifying demo_config.yml and using demo_segmentation.py? Sorry for the dumb question. Thank you!

Training fails

Running training results in the following:

Error executing job with overrides: []
Traceback (most recent call last):
  File "train_segmentation.py", line 532, in my_app
    pos_labels=True
  File "/home/ec2-user/SageMaker/STEGO/src/core.py", line 740, in __init__
    raise ValueError("could not find nn file {} please run precompute_knns".format(feature_cache_file))
ValueError: could not find nn file /home/ec2-user/SageMaker/data/nns/nns_vit_small_cocostuff27_train_None_224.npz please run precompute_knns

Running precompute_knns.py then fails because potsdam has not been unzipped. I unzipped it and ran again, and it failed with OOM:

Error executing job with overrides: []
Traceback (most recent call last):
  File "precompute_knns.py", line 84, in my_app
    normed_feats = get_feats(par_model, loader)
  File "precompute_knns.py", line 24, in get_feats
    feats = F.normalize(model.forward(img.cuda()).mean([2, 3]), dim=1)
  File "/home/ec2-user/anaconda3/envs/stego/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ec2-user/anaconda3/envs/stego/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/anaconda3/envs/stego/lib/python3.6/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/ec2-user/anaconda3/envs/stego/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/SageMaker/STEGO/src/modules.py", line 93, in forward
    feat, attn, qkv = self.model.get_intermediate_feat(img, n=n)
  File "/home/ec2-user/SageMaker/STEGO/src/dino/vision_transformer.py", line 232, in get_intermediate_feat
    x,attn,qkv = blk(x, return_qkv=True)
  File "/home/ec2-user/anaconda3/envs/stego/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/SageMaker/STEGO/src/dino/vision_transformer.py", line 107, in forward
    y, attn, qkv = self.attn(self.norm1(x))
  File "/home/ec2-user/anaconda3/envs/stego/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ec2-user/SageMaker/STEGO/src/dino/vision_transformer.py", line 83, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA out of memory. Tried to allocate 3.53 GiB (GPU 0; 15.78 GiB total capacity; 9.58 GiB already allocated; 2.15 GiB free; 12.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Reproduce Cityscapes results

Dear authors,

thank you very much for your work. Could you please provide a config file to reproduce your Cityscapes results? The values and settings in train_config.yml differ from those saved in the provided model checkpoint and from those in the paper. For example, the paper states that the output dimension is 70, while the Cityscapes model has 100 output dimensions. Also, the hyperparameters saved in the model checkpoint lack some of the variables needed for the full configuration.

Thank you very much in advance.

Performance changes drastically with input resolution

Hello,
Thank you for the great work and code implementation of STEGO. Though it works great, STEGO's performance is strongly affected by the resolution of the input image. In some cases, performance degrades as the resolution increases. I'm unable to find the reason, since it is an unsupervised approach. Could you please share your insights on this behavior?
Best,
K.Vikraman

eval gets stuck indefinitely

eval_segmentation.py gets stuck on the Potsdam data. The issue is in batched_crf(), in the following line:

outputs = pool.map(_apply_crf, zip(img_tensor.detach().cpu(), prob_tensor.detach().cpu()))

The code never proceeds further; one process waits for the others indefinitely. Any suggestions?

Confused about the spatial centering implementation

As per my understanding, in this code for spatial centering, the operation fd -= fd.mean([3, 4], keepdim=True) should have been enough.

Can you please correct my understanding or offer intuition into this segment of code, especially why fd = fd - fd.mean() + old_mean is done? Thank you.

I get an IndexError when calling .imshow()

This is my IndexError:
[screenshot]

This is my directory dataset folder; both folders contain 32 uniquely named images.
[screenshot]

I changed only 2 parameters in train_config.yml for my own dataset, following the recommendations.
[screenshot]

I checked inside the file raising the error, shown below; I'm not sure what's wrong.
[screenshot]

All libraries follow environment.yml.
How can I fix this?

Provide explanation for the demo?

with torch.no_grad():
    # Run the model on the image and on its horizontal flip (test-time augmentation)
    code1 = model(img)
    code2 = model(img.flip(dims=[3]))
    # Un-flip the second code and average, then upsample codes to the input resolution
    code = (code1 + code2.flip(dims=[3])) / 2
    code = F.interpolate(code, img.shape[-2:], mode='bilinear', align_corners=False)
    linear_probs = torch.log_softmax(model.linear_probe(code), dim=1).cpu()
    cluster_probs = model.cluster_probe(code, 2, log_probs=True).cpu()

single_img = img[0].cpu()
# Refine the probe outputs with a dense CRF before taking the per-pixel argmax
linear_pred = dense_crf(single_img, linear_probs[0]).argmax(0)
cluster_pred = dense_crf(single_img, cluster_probs[0]).argmax(0)

Hi, I am not sure why you do the first three lines for the input image. Could you please provide some comments? Thanks a lot!

Evaluation issue

First of all, thank you for the paper. I am getting issues while evaluating the model. Can you please take a look?
[screenshot]

Possible mistake in cluster probe loss function?

I'm under the impression that the loss function of the cluster probe is simply the entropy of the cluster probabilities.

However, in the ClusterLookup class in modules.py, the loss function is defined as

cluster_loss = -(cluster_probs * inner_products).sum(1).mean()

Shouldn't this instead be

cluster_loss = -(cluster_probs * cluster_probs.log()).sum(1).mean()

Or alternatively (assuming alpha = 1),

cluster_probs_log = inner_products - inner_products.exp().sum(1, keepdim=True).log()
cluster_loss = -(cluster_probs * cluster_probs_log).sum(1).mean()

Tips for tuning hyperparameters

Hi, I followed your instructions in A.11 to tune the hyperparameters for training STEGO on a custom dataset, but it is difficult to find hyperparameters that give balanced positive and negative signals. May I have some more tips for tuning the hyperparameters? Screenshots of the distributions of inter_cd, intra_cd, and neg_cd are shown below.
[screenshots]

Thanks in advance!

Download is stuck and resource not found

I ran download_datasets.py and it has been stuck at 76% of cityscapes for a day now. It doesn't crash, but nothing happens. I looked at the URL it downloads from, and it says the resource was not found!

Issues with training on custom dataset

Hello there,
I was trying to train on my custom dataset, which currently holds about 60 images for training and 60 for validation, using the following train_config.yml:

output_root: '../'
pytorch_data_dir: '/content/drive/MyDrive/custom_dataset/'
experiment_name: "exp1"
log_dir: "exp1"
azureml_logging: True
submitting_to_aml: False

# Loader params
num_workers: 1
max_steps: 10
batch_size: 16

num_neighbors: 7
dataset_name: "directory"

# Used if dataset_name is "directory"
dir_dataset_name: "train_data"
dir_dataset_n_classes: 5

has_labels: False
crop_type: "five"
crop_ratio: .5
res: 224
loader_crop_type: "center"

# Model Params
extra_clusters: 0
use_true_labels: False
use_recalibrator: False
model_type: "vit_small"
arch: "dino"
use_fit_model: False
dino_feat_type: "feat"
projection_type: "nonlinear"
#projection_type: linear
dino_patch_size: 8
granularity: 1
continuous: True
dim: 70
dropout: True
zero_clamp: True

lr: 5e-4
pretrained_weights: ~
use_salience: False
stabalize: False
stop_at_zero: True

# Feature Contrastive params
pointwise: True
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0

correspondence_weight: 1.0


# IAROA vit small 1/31/22
neg_inter_weight: 0.63
pos_inter_weight: 0.25
pos_intra_weight: 0.67
neg_inter_shift: 0.46
pos_inter_shift: 0.12
pos_intra_shift: 0.18

# Potsdam vit small 1/31/22
#neg_inter_weight: 0.63
#pos_inter_weight: 0.25
#pos_intra_weight: 0.67
#neg_inter_shift: 0.46
#pos_inter_shift: 0.02
#pos_intra_shift: 0.08

# Cocostuff27 vit small 1/31/22
#neg_inter_weight: 0.63
#pos_inter_weight: 0.25
#pos_intra_weight: 0.67
#neg_inter_shift: 0.66
#pos_inter_shift: 0.02
#pos_intra_shift: 0.08


## Cocostuff27 10/3 vit_base

#neg_inter_weight: 0.1538476246415498
#pos_inter_weight: 1
#pos_intra_weight: 0.1
#
#neg_inter_shift: 1
#pos_inter_shift: 0.2
#pos_intra_shift: 0.12


## Cocostuff27 10/3 vit_small
#neg_inter_weight: .63
#pos_inter_weight: .25
#pos_intra_weight: .67
#
#neg_inter_shift: .16
#pos_inter_shift: .02
#pos_intra_shift: .08



## Cocostuff27 10/3 moco
#neg_inter_weight: .63
#pos_inter_weight: .25
#pos_intra_weight: .67
#
#neg_inter_shift: .26
#pos_inter_shift: .36
#pos_intra_shift: .32

#pos_inter_shift: .12
#pos_intra_shift: .18

## Cocostuff27
#neg_inter_weight: .72
#pos_inter_weight: .80
#pos_intra_weight: .29
#
#neg_inter_shift: .86
#pos_inter_shift: .04
#pos_intra_shift: .34

# Cityscapes 10/3

#neg_inter_weight: 0.9058762625226623
#pos_inter_weight: 0.577453483136995
#pos_intra_weight: 1
#
#neg_inter_shift: 0.31361241889448443
#pos_inter_shift: 0.1754346515479633
#pos_intra_shift: 0.45828472207


# Cityscapes
#neg_inter_weight: .72
#pos_inter_weight: .18
#pos_intra_weight: .46
#
#neg_inter_shift: .25
#pos_inter_shift: .20
#pos_intra_shift: .25


rec_weight: 0.0
repulsion_weight: 0.0

# CRF Params
crf_weight: 0.0
alpha: .5
beta: .15
gamma: .05
w1: 10.0
w2: 3.0
shift: 0.00
crf_samples: 1000
color_space: "rgb"

reset_probe_steps: ~

# Logging params
n_images: 5
scalar_log_freq: 10
checkpoint_freq: 2
val_freq: 2
hist_freq: 2


hydra:
  run:
    dir: "."
  output_subdir: ~
  #job_logging: "disabled"
  #hydra_logging: "disabled"

I have reduced val_freq and some other frequencies to match the small dataset, but while running train_segmentation.py I encountered the following:

Global seed set to 0
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2022-06-24 14:27:21,025][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2022-06-24 14:27:21,025][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

Missing logger folder: ../logs/exp1/directory_exp1_date_Jun24_14-27-16/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

   | Name                     | Type                       | Params
-------------------------------------------------------------------------
0  | net                      | DinoFeaturizer             | 21.9 M
1  | train_cluster_probe      | ClusterLookup              | 350   
2  | cluster_probe            | ClusterLookup              | 350   
3  | linear_probe             | Conv2d                     | 355   
4  | decoder                  | Conv2d                     | 27.3 K
5  | cluster_metrics          | UnsupervisedMetrics        | 0     
6  | linear_metrics           | UnsupervisedMetrics        | 0     
7  | test_cluster_metrics     | UnsupervisedMetrics        | 0     
8  | test_linear_metrics      | UnsupervisedMetrics        | 0     
9  | linear_probe_loss_fn     | CrossEntropyLoss           | 0     
10 | crf_loss_fn              | ContrastiveCRFLoss         | 0     
11 | contrastive_corr_loss_fn | ContrastiveCorrelationLoss | 0     
-------------------------------------------------------------------------
230 K     Trainable params
21.7 M    Non-trainable params
21.9 M    Total params
87.601    Total estimated model params size (MB)
Epoch 0:   0% 0/13 [00:00<?, ?it/s] Epoch 0, global step 3: 'test/cluster/mIoU' was not in top 2
Epoch 0, global step 6: 'test/cluster/mIoU' was not in top 2
Epoch 0, global step 9: 'test/cluster/mIoU' was not in top 2
Epoch 0, global step 12: 'test/cluster/mIoU' was not in top 2
Epoch 0:   0% 0/13 [00:03<?, ?it/s, v_num=0]

and no checkpoint of the model is saved (the execution completed).
I am using Google Colab to run this training.

Any suggestions on what the problem may be (I am aware that the dataset is too small for a strong model)?

Help with precompute_knns

precompute_knns.py looks for a specific folder that does not exist: STEGO/src/datadrive/pytorch-data/cropped/cocostuff27_None_crop_0.5/img/val, which is not among the downloaded datasets. I understand that the path is generated on the basis of the config file, but how do I point it to one of the existing datasets instead?

Reproducing the ViT-Base results on Cocostuff27

Thank you so much for your excellent and inspiring work!!!

I could reproduce the exciting performance using your pre-trained models. However, I failed to reproduce the performance by re-training your models using the latest code. Could you please help me find out if I did something wrong?

What I did is as follows:

1. Changes to the original code (I think they will not affect performance):

1.1 To avoid a core dump during training, replace "import matplotlib.pyplot as plt" with:

 import matplotlib
 matplotlib.use('Agg')
 from matplotlib import pyplot as plt

1.2 In "eval_segmentation.py", change the multiprocessing Pool for the CRF to single processing, since the program gets stuck for unknown reasons on my computer.

2. Reproducing cocostuff27 using ViT-Small five-crop (I got similar performance; thank you so much, it is great work!):

2.1. In "train_config.yml", using "vit_small" model and hyperparameters under "Cocostuff27 vit small 1/31/22".
2.2. Run "crop_datasets.py" -> Change "dataset_names" to ["cocostuff27"] -> Get cropped dataset.
2.3. Run "precompute_knns.py" -> Change "dataset_names" to ["cocostuff27"] -> Get neighbors.
2.4. Run "train_segmentation.py".
2.5. Run "eval_segmentation.py" after changing "eval_config.yml" -> Change the "model_paths" to correct ckpt, and change the "run_picie" to False. I get:

{'final/linear/mIoU': 38.03836703300476, 'final/linear/Accuracy': 74.07384514808655, 'final/cluster/mIoU': 23.345062136650085, 'final/cluster/Accuracy': 46.15441858768463}
{'final/linear/mIoU': 37.097787857055664, 'final/linear/Accuracy': 73.81566762924194, 'final/cluster/mIoU': 23.430554568767548, 'final/cluster/Accuracy': 47.467902302742004}

3. Reproducing cocostuff27 using ViT-Base five-crop (I failed):

Based on the above changes:

3.1. Using "vit_base" model and hyperparameters under "Cocostuff27 10/3 vit_base", in "train_config.yml".
3.2. Run "precompute_knns.py" -> Change "dataset_names" to ["cocostuff27"] -> Get neighbors.
3.3. Run "train_segmentation.py" and get:

Attempted to log scalar metric test/linear/mIoU:
33.971819281578064
Attempted to log scalar metric test/linear/Accuracy:
72.1398413181305
Attempted to log scalar metric test/cluster/mIoU:
19.022752344608307
Attempted to log scalar metric test/cluster/Accuracy:
43.240439891815186

3.4. Run "eval_segmentation.py" after changing "eval_config.yml" -> Change the "model_paths" to correct ckpt, and change the "run_picie" to False. I get:

{'final/linear/mIoU': 36.794888973236084, 'final/linear/Accuracy': 72.87865877151489, 'final/cluster/mIoU': 20.34243792295456, 'final/cluster/Accuracy': 47.14389741420746}
{'final/linear/mIoU': 35.19098460674286, 'final/linear/Accuracy': 74.07739758491516, 'final/cluster/mIoU': 21.089258790016174, 'final/cluster/Accuracy': 48.37090075016022}

3.5. I also tried different random seeds for training:
seed = 1

{'final/linear/mIoU': 37.76582181453705, 'final/linear/Accuracy': 74.69892501831055, 'final/cluster/mIoU': 19.050808250904083, 'final/cluster/Accuracy': 42.96903908252716}
{'final/linear/mIoU': 36.56575679779053, 'final/linear/Accuracy': 74.39007759094238, 'final/cluster/mIoU': 18.84729117155075, 'final/cluster/Accuracy': 44.03853118419647}

seed = 2

{'final/linear/mIoU': 38.32502365112305, 'final/linear/Accuracy': 75.02520084381104, 'final/cluster/mIoU': 19.779230654239655, 'final/cluster/Accuracy': 46.16449475288391}
{'final/linear/mIoU': 38.56886327266693, 'final/linear/Accuracy': 74.82376098632812, 'final/cluster/mIoU': 20.10801136493683, 'final/cluster/Accuracy': 51.235431432724}

Could you please help me to find my problems at your convenience? Thank you so much in advance !!!

Custom dataset evaluation

Hi! Thank you for your code!

Can you explain the process of evaluating a custom dataset in the README?
What are picie and dark_mode? How do I launch evaluation without errors?

Thank you for your cooperation!

Information about Spatial Centering

Hello @mhamilton723,
I'm an engineering student and I'm trying to implement the STEGO algorithm in MATLAB.
I have some doubts about implementing the spatial centering operation.
I guess that the code that implements this operation is located in modules.py, lines 331-333; am I right?

[screenshot]

At the beginning the input tensor F has elements ranging from -1 to 1, but after the operations some of the tensor's elements have values slightly greater than 1 or lower than -1.
Did I miss some part of the algorithm?
Have you had the same problem? If yes, how did you solve it?

Thank you very much.

Loss function issue

Hi @mhamilton723,
I'm trying to implement your paper in MATLAB, using your pipeline but with different network architectures and types of images.

When I compute the correlation loss (Eq. 4, page 5), I get very large values (for example, -4000). The problem is that when I sum all the elements of my 384x384x5 tensor, even though every cell has a value between 0 and 1, the global sum tends to a large number.

Can you suggest a way to resolve this issue?

Problem with environment

I'm trying to install this environment on an RTX 3070 GPU, but after installation, when I run precompute_knns.py, it always raises this error:

NVIDIA GeForce RTX 3070 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA GeForce RTX 3070 GPU with PyTorch.

I've tried installing cudatoolkit 11.7 and 11.3, but I got the same error with both. I wonder what GPU the script was originally used on, and is there a solution to my problem? Thanks in advance.

Error: builtin_function_or_method

Hello, thank you for your outstanding work. When I run crop_datasets.py, I get stuck with a prompt like this:
[screenshot]

Gray images results

Hello @mhamilton723, thanks for your great efforts.
I have tried demo_segmentation.py on custom images with the provided models, but the resulting images in the cluster folder are just black and grey, as shown.
[screenshot]
What might be the reason for that? How can I visualize the results in color, as shown in your work?

Thanks in advance!

Help evaluating the Cityscapes dataset

Hello, I am new to PyTorch, PyTorch Lightning, and Hydra, so perhaps my issue comes from inexperience.

In eval_segmentation.py I cannot load cityscapes_vit_base_1.ckpt. The failure is at line 67:

for model_path in cfg.model_paths:
    model = LitUnsupervisedSegmenter.load_from_checkpoint(model_path)

I do not know why the checkpoint is not properly loaded, so the script automatically loads the pretrained DINO weights:

Since no pretrained weights have been provided, we load the reference pretrained DINO weights.
output_root: //marhamil_object_discovery_datastore_3
pytorch_data_dir: //marhamil_pytorch_datastore_3
experiment_name: exp1
log_dir: hyperdrive_benchmark_cityscapes_2
azureml_logging: true
submitting_to_aml: true
num_workers: 24
max_steps: 7000
num_neighbors: 7
batch_size: 32
dataset_name: cityscapes
crop_type: five
crop_ratio: 0.5
res: 224
loader_crop_type: center
extra_clusters: 0
use_true_labels: false
use_recalibrator: false
model_type: vit_base
arch: dino
use_fit_model: false
dino_feat_type: feat
projection_type: nonlinear
dino_patch_size: 8
granularity: 1
continuous: true
dim: 100
dropout: true
lr: 0.0005
pretrained_weights: null
use_salience: false
stabalize: false
stop_at_zero: true
pointwise: false
feature_samples: 11
neg_samples: 5
aug_alignment_weight: 0.0
correspondence_weight: 1.0
neg_inter_weight: 0.9058762625226623
pos_inter_weight: 0.577453483136995
pos_intra_weight: 1
neg_inter_shift: 0.31361241889448443
pos_inter_shift: 0.1754346515479633
pos_intra_shift: 0.45828472207
rec_weight: 0.0
repulsion_weight: 0.0
crf_weight: 0.0
alpha: 0.5
beta: 0.15
gamma: 0.05
w1: 10.0
w2: 3.0
shift: 0.0
crf_samples: 1000
color_space: rgb
reset_probe_steps: null
n_images: 5
scalar_log_freq: 10
checkpoint_freq: 50
val_freq: 100
hist_freq: 100
full_name: hyperdrive_benchmark_cityscapes_2/cityscapes_exp1

Mapping pixel to label

Hi @mhamilton723, thank you for the great work.

I was wondering whether there is a mapping between the image colormap and the labels when trying an image at test time.

Referring to the Colab notebook you published, it seems that linear_pred gives the pixelwise prediction, but when I checked the shape of model.label_cmap it shows (512, 3), which doesn't seem to match the number of labels I get from get_class_labels(model.cfg.dataset_name).

I'm not sure I'm using the right approach; basically I'm missing the link between the colormap and the class labels.

Thank you for your reply and again great work.

Training on own data without labels

Hi, I would like to train on my own data, without any labels.
When running precompute_knns.py I keep running into issues related to missing images or image formats for the labels, which I do not have. What should I do?
Thanks.

Loss explanation

Hi @mhamilton723, congratulations for the great progress.

Just a question about the loss. In your paper, the loss is the following:

L = λ_self · L_corr(x, x, b_self) + λ_knn · L_corr(x, x_knn, b_knn) + λ_rand · L_corr(x, x_rand, b_rand)

In your code, however, there are additionally a linear_loss and a cluster_loss. Could you explain how those two losses work? Did you do any ablation study on them, i.e., if the loss excluded those two terms, what would the result be?

Lots of thanks!

Failed to load the pre-trained model

Hello, how are you? Thanks for contributing to this project.
I installed the same environment as this repo.
When I ran the script eval_segmentation.py, I met the following issue:

[screenshot]

Could you guide me to fix this issue?
Thanks
