
GraphCAM: Multiple Object Localisation with Graph Transformer Networks

Locating multiple objects in real-world images is an important task in computer vision. It typically requires a large number of visual annotations such as bounding boxes or segmentation masks; however, the annotation process is labour-intensive and sometimes intractable for human experts in complex domains such as manufacturing and medicine. This paper presents a weakly semi-supervised learning framework based on Graph Transformer Networks for Class Activation Maps (CAM) that locates multiple objects in images without visual annotations. Our method overcomes the computational challenges of gradient-based CAM while integrating topological information into object localisation. Moreover, we investigate higher-order object inter-dependencies through the use of a 3D adjacency matrix for better performance. Extensive empirical experiments are conducted on MS-COCO and Pascal VOC to establish baselines and a state of the art for weakly semi-supervised multi-object localisation.


Code dependencies

conda env create -f environment.yml
conda activate wsmol
pip install -e . # install this project

Training

The training code is provided in gnn/main.py.

We provide training commands for the following architectures: vgg, efficientnet, resnet50, resnet101, and resnext50, in combination with the following CAM methods: Grad-CAM, GCN, GTN, and GTN + 3D adjacency matrices (Graph-CAM).

Make sure to cd into the gnn/ folder before running the following command:

python scripts/5_train.py \
    data/coco \
    --image-size 448 \
    --workers 8 \
    --batch-size 80 \
    --lr 0.03 \
    --learning-rate-decay 0.1 \
    --epoch_step 100 \
    --embedding metadata/COCO/embedding/coco_glove_word2vec_80x300.pkl \
    --adj-files \
        model/topology/coco_adj.pkl \
        model/topology/coco_adj_1_2.pkl \
        model/topology/coco_adj_1_3.pkl \
        model/topology/coco_adj_1_4.pkl \
        model/topology/coco_adj_2_1.pkl \
        model/topology/coco_adj_3_1.pkl \
        model/topology/coco_adj_4_1.pkl \
    -a resnext50_32x4d_swsl \
    -g -gtn \
    --device_ids 0 1 2 3 --neptune \
    -n gtn
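The several `--adj-files` arguments above each contribute one relation type to the 3D adjacency matrix consumed by the GTN. As a rough illustration only, the loading step can be sketched as below; the pickle layout, the dict fallback, and the helper name are assumptions, not the repository's documented format.

```python
import pickle
import numpy as np

def load_adjacency_stack(paths, num_classes=80):
    """Load each pickled adjacency matrix and stack them into a
    (K, C, C) tensor, where K is the number of relation types and
    C the number of classes (80 for MS-COCO)."""
    mats = []
    for path in paths:
        with open(path, "rb") as f:
            adj = pickle.load(f)
        # some adjacency pickles store a dict such as {"adj": ...};
        # fall back to the raw array otherwise (an assumption here)
        if isinstance(adj, dict):
            adj = adj.get("adj", next(iter(adj.values())))
        mats.append(np.asarray(adj, dtype=np.float32))
    stack = np.stack(mats, axis=0)
    assert stack.shape[1:] == (num_classes, num_classes)
    return stack
```

With the seven COCO files listed above, this would yield a `(7, 80, 80)` tensor, one slice per co-occurrence relation.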

For your convenience, we provide pretrained models for both MS-COCO and Pascal VOC on Dropbox.

Generate score maps

After training, the next step is to generate score maps. Put the checkpoint obtained from step 4 (Training) into the folder train_log/{experiment_name}:

Refer to training.xlsx for the list of checkpoints and the corresponding configuration used in our paper.

The following command generates score maps for our best Graph-CAM (GTN + 3D adjacency matrices) experiment on MS-COCO:

python scoremap.py \
    --dataset_name COCO \
    --architecture resnext50gtn \
    --experiment_name COCO_resnext50_swsl_gtn_1234432 \
    --wsol_method graph_cam \
    --split val \
    --batch_size 8 \
    --crop_size 448 \
    --adj-files \
        model/topology/coco_adj.pkl \
        model/topology/coco_adj_1_2.pkl \
        model/topology/coco_adj_1_3.pkl \
        model/topology/coco_adj_1_4.pkl \
        model/topology/coco_adj_2_1.pkl \
        model/topology/coco_adj_3_1.pkl \
        model/topology/coco_adj_4_1.pkl \
    --embedding gnn/model/embedding/coco_glove_word2vec_80x300.pkl \
    --gtn \
    --check_point coco_resnext50_swsl_gtn_1234432_86.9424.pth

The resulting score maps will be saved under train_log/COCO_resnext50_swsl_gtn_1234432/scoremaps.
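To sanity-check the generated score maps before evaluating, they can be inspected directly. The helper below is hypothetical and assumes one `.npy` array of shape `(C, H, W)` per image (per-class activation maps); the repository may use a different on-disk layout.

```python
import numpy as np

def summarise_scoremap(path):
    """Load one saved score map and report the class with the
    strongest activation and its peak value. The (C, H, W) .npy
    layout is an assumption about the scoremaps folder."""
    scores = np.load(path)  # per-class activation maps
    per_class_max = scores.reshape(scores.shape[0], -1).max(axis=1)
    top = int(per_class_max.argmax())
    return top, float(per_class_max[top])
```

A map whose peak activation is near zero for every class usually indicates a checkpoint/architecture mismatch, so this is a cheap check before running the full evaluation.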

Evaluate the modified MaxBoxAccV2 for multiple objects

To evaluate the modified MaxBoxAccV2 on the generated score maps, simply run evaluation.py and point to the score maps folder generated from step 5.

python evaluation.py \
    --scoremap_root train_log/COCO_resnext50_swsl_gtn_1234432/scoremaps/ \
    --dataset_name COCO \
    --split val
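Our modified metric extends the single-object MaxBoxAccV2 of Choe et al. to multiple objects. As a rough illustration of the underlying idea only, the single-box version can be sketched as follows; the function, box format, and thresholds here are illustrative assumptions, and the multi-object modification itself is not reproduced.

```python
import numpy as np

def max_box_acc(scoremap, gt_box, thresholds=(0.3, 0.5, 0.7), iou_min=0.5):
    """Single-box sketch of the MaxBoxAccV2 idea: binarise the score
    map at several fractions of its peak, take the tightest box around
    each binarised mask, and count a success if any threshold yields a
    box with IoU >= iou_min against the ground-truth box
    (x_min, y_min, x_max, y_max)."""
    x0g, y0g, x1g, y1g = gt_box
    peak = scoremap.max()
    for t in thresholds:
        ys, xs = np.where(scoremap >= t * peak)
        if len(xs) == 0:
            continue
        x0, y0, x1, y1 = xs.min(), ys.min(), xs.max(), ys.max()
        # intersection-over-union between estimated and ground-truth box
        inter = max(0, min(x1, x1g) - max(x0, x0g)) * \
                max(0, min(y1, y1g) - max(y0, y0g))
        union = (x1 - x0) * (y1 - y0) + (x1g - x0g) * (y1g - y0g) - inter
        if union > 0 and inter / union >= iou_min:
            return True
    return False
```

Taking the best threshold per image (rather than fixing one) is what makes the metric robust to score-map calibration differences between methods.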

Generate center points from score maps

Before evaluating hit-mAP, we need to generate center points using thresholds of 30, 50, and 70 from the score maps obtained in step 5.

python scoremap_to_centers.py \
    --experiment_name COCO_resnext50_swsl_gtn_1234432 \
    --threshold 30
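The thresholding step above can be sketched as follows, as a minimal illustration only: binarise a class's score map at a percentage of its peak and take the centroid of each connected blob as a candidate center. The repository's exact rule for extracting centers is not documented here, so treat this as an assumption.

```python
from collections import deque
import numpy as np

def scoremap_to_centers(scoremap, threshold=30):
    """Binarise a 2D score map at threshold% of its peak, then return
    the centroid of each 4-connected blob as a candidate object
    center (sketch only; assumed extraction rule)."""
    mask = scoremap >= scoremap.max() * threshold / 100.0
    seen = np.zeros_like(mask, dtype=bool)
    centers = []
    h, w = mask.shape
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                # flood-fill one connected component
                queue, pixels = deque([(y, x)]), []
                seen[y, x] = True
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                ys, xs = zip(*pixels)
                centers.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return centers
```

A lower threshold merges nearby objects into one blob (fewer, coarser centers), while a higher threshold splits weak activations away (more, tighter centers), which is why the three thresholds are evaluated separately.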

Evaluate the hit-mAP metric

The evaluation of hit-mAP for MS-COCO and Pascal VOC is done by coco_evaluation_centers.py and voc_evaluation_centers.py respectively. After generating center points in step 7, run the following command to evaluate hit-mAP:

python coco_evaluation_centers.py \
    --experiment_name COCO_resnext50_swsl_gtn_1234432
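The core matching rule behind a center-based metric like hit-mAP can be illustrated as follows; the box format and the exact hit criterion are assumptions about how the evaluation scripts score predicted centers, not a copy of their logic.

```python
def is_hit(center, box):
    """A predicted center (y, x) 'hits' a ground-truth object if it
    falls inside that object's box. The (x_min, y_min, x_max, y_max)
    box format and this criterion are illustrative assumptions."""
    y, x = center
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max
```

Precision/recall over such hits, averaged across classes, would then yield a mean-average-precision style score.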

References

  • "Evaluating Weakly Supervised Object Localization Methods Right" (CVPR 2020) by Junsuk Choe, Seong Joon Oh, Seungho Lee, Sanghyuk Chun, Zeynep Akata, and Hyunjung Shim. Our evaluation builds on the original WSOL code released with that paper.
