
Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection (AAAI 2024)

🔥 Official Implementation


Thang Doan, Xin Li, Sima Behpour, Wenbin He, Liang Gou, Liu Ren

Abstract

Open World Object Detection (OWOD) is a challenging and realistic task that extends beyond the scope of the standard Object Detection task. It involves detecting both known and unknown objects while integrating learned knowledge for future tasks. However, the level of "unknownness" varies significantly depending on the context. For example, a tree is typically considered part of the background in a self-driving scene, but it may be significant in a household context. We argue that this contextual information should already be embedded within the known classes. In other words, there should be a semantic or latent structural relationship between the known and the unknown items to be discovered. Motivated by this observation, we propose Hyp-OW, a method that learns and models a hierarchical representation of known items through a SuperClass Regularizer. Leveraging this representation allows us to effectively detect unknown objects using a similarity-distance-based relabeling module. Extensive experiments on benchmark datasets demonstrate the effectiveness of Hyp-OW, achieving improvements in both known and unknown detection (up to 6 percent). These findings are particularly pronounced in our newly designed benchmark, where a strong hierarchical structure exists between known and unknown objects.
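For intuition: the hyperbolic distance in the title is the geodesic distance on the Poincaré ball, which suits hierarchies well (generic concepts embed near the center, specific ones near the boundary, where distances grow rapidly). Below is a minimal illustrative sketch of the standard Poincaré distance in PyTorch; it is not the repository's actual implementation.

import torch

def poincare_distance(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Standard Poincare-ball geodesic distance:
    #   d(x, y) = arccosh(1 + 2 * ||x - y||^2 / ((1 - ||x||^2) * (1 - ||y||^2)))
    # Illustrative sketch only -- not the code used in this repository.
    sq_diff = (x - y).pow(2).sum(-1)
    sq_x = x.pow(2).sum(-1).clamp(max=1 - eps)  # keep points strictly inside the unit ball
    sq_y = y.pow(2).sum(-1).clamp(max=1 - eps)
    arg = 1 + 2 * sq_diff / ((1 - sq_x) * (1 - sq_y))
    return torch.acosh(arg.clamp(min=1 + eps))  # acosh requires arg >= 1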

Overview

[Figure: Hyp-OW overview]

Datasets

In addition to the OWOD and OWDETR split benchmarks, we introduce a third, Hierarchical Split (download here), with strong semantic correlation between the knowns and the unknowns to be discovered. This allows us to assess baselines across Low (OWDETR), Medium (OWOD), and High (Hierarchical) data-structure regimes.

Results

[Results figures]

Installation

Requirements

We trained and tested our models on Ubuntu 16.04 with CUDA 11.1/11.3, GCC 5.4.0, and Python 3.10.4.

conda create --name hypow python==3.10.4
conda activate hypow
pip install -r requirements.txt
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113
apt-get install -y build-essential ffmpeg libsm6 libxext6 zip htop vim;
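As an optional sanity check before compiling the CUDA operators, a minimal Python sketch (assuming the hypow environment above is active) to confirm the install:

import torch

print(torch.__version__)          # expect 1.12.0+cu113
print(torch.cuda.is_available())  # expect True on a CUDA-capable machine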

Backbone features

Download the self-supervised backbone from here and place it in the models folder.

cd /workspace/Hyp-OW/models/;
gdown https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth;
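Optionally, verify the download with a short Python sketch (it assumes, based on the public DINO release, that the checkpoint is a plain state_dict of backbone tensors):

import torch

state = torch.load('dino_resnet50_pretrain.pth', map_location='cpu')
print(len(state), 'parameter tensors loaded')  # assumption: a ResNet-50 state_dict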

Compiling CUDA operators

cd ./models/ops
sh ./make.sh
# unit test (should see all checking is True)
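If the build succeeds, running python test.py inside models/ops should print True for every check. This command is our reading of the upstream codebases Hyp-OW builds on (PROB and OW-DETR, which inherit Deformable DETR's operators), so treat the exact script name as an assumption.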

Data Structure

Hyp-OW/
└── data/
    └── OWOD/
        ├── JPEGImages
        ├── Annotations
        └── ImageSets
            ├── OWDETR
            ├── TOWOD
            └── HIERARCHICAL

Dataset Structure

The splits are provided inside the data/OWOD/ImageSets/ folder.

  1. Download the COCO images and annotations from the coco dataset into the data/ directory.
  2. Unzip the train2017 and val2017 folders. The directory structure should now look like:

Hyp-OW/
└── data/
    └── coco/
        ├── annotations/
        ├── train2017/
        └── val2017/

  3. Move all images from train2017/ and val2017/ to the JPEGImages folder (see the sketch after this list).
  4. Use coco2voc.py to convert the JSON annotations to XML files.
  5. Download the PASCAL VOC 2007 & 2012 images and annotations from the pascal dataset into the data/ directory.
  6. Untar the trainval 2007 and 2012 archives and the test 2007 archive.
  7. Move all the images to the JPEGImages folder and the annotations to the Annotations folder.

Currently, we follow the VOC format for data loading and evaluation.
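For step 3 above, a minimal sketch of the image move, assuming the directory layout shown in this README (adjust paths to your checkout):

from pathlib import Path
import shutil

# Move the COCO train/val images into the shared VOC-style JPEGImages folder.
src_dirs = [Path('data/coco/train2017'), Path('data/coco/val2017')]
dst = Path('data/OWOD/JPEGImages')
dst.mkdir(parents=True, exist_ok=True)
for src in src_dirs:
    for img in src.glob('*.jpg'):
        shutil.move(str(img), str(dst / img.name))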

Dataset Download (Code)

cd /workspace/Hyp-OW/data/OWOD;
wget -O JPEGImages.zip https://www.dropbox.com/s/dzy9ngucde94wga/JPEGImages.zip?dl=1;
unzip JPEGImages.zip;

gdown https://drive.google.com/uc?id=1paoJRaHTnptPaEJd3D6XiUkNetLgti6R;
unzip Annotations.zip;

mkdir /workspace/Hyp-OW/data/coco;
cd /workspace/Hyp-OW/data/coco;

gdown http://images.cocodataset.org/annotations/annotations_trainval2017.zip;
unzip annotations_trainval2017.zip;


cd /workspace;
git clone https://github.com/bot66/coco2voc.git;
cd /workspace/coco2voc;

python coco2voc.py --json_path /workspace/Hyp-OW/data/coco/annotations/instances_train2017.json   --output /workspace/Hyp-OW/data/OWOD/Annotations;
python coco2voc.py --json_path /workspace/Hyp-OW/data/coco/annotations/instances_val2017.json   --output /workspace/Hyp-OW/data/OWOD/Annotations;
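A quick optional check that the conversion produced VOC-style XML files (a sketch; the repo does not assert a particular count):

from pathlib import Path

n = len(list(Path('/workspace/Hyp-OW/data/OWOD/Annotations').glob('*.xml')))
print(n, 'XML annotation files found')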


Training

Training on a single node

To train Hyp-OW on a single node with 4 GPUs on the HIERARCHICAL split, run:

sh run.sh

Note: you may need to make the .sh files under the configs and tools directories executable by running chmod +x *.sh in each directory.

By editing the run.sh file, you can run any of the configurations defined in configs/.

Evaluation

Download our weights and place them in the exps directory. The archive contains Hyp-OW weights for each split (OWOD, OWDETR, and Hierarchical) and each task (1 to 4).

Note: you may need to make the .sh files executable by running chmod +x *.sh in each directory.

mkdir exps/;
cd exps/;
gdown https://drive.google.com/uc?id=13bMwaGa1YFmK34o_lDDVpxNkruxedMvV;
unzip -j Hyp-OW_weights.zip;

sh /workspace/Hyp-OW/run_eval.sh
## make sure the path for the weights you are loading is correct for each *.sh file

Citation

If you found this repo useful, please consider citing Hyp-OW:

@inproceedings{doan_2024_HypOW,
  title={Hyp-OW: Exploiting Hierarchical Structure Learning with Hyperbolic Distance Enhances Open World Object Detection},
  author={Doan, Thang and Li, Xin and Behpour, Sima and He, Wenbin and Gou, Liang and Ren, Liu},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  year={2024}
}

License

This project is open-sourced under the AGPL-3.0 license. See the LICENSE file for details.

For a list of other open source components included in this project, see the file 3rd-party-licenses.txt.

Acknowledgments

Built from PROB and OWDETR.

Purpose of the project

This software is a research prototype, solely developed for and published as part of the publication cited above.

Contact

Please feel free to open an issue or reach out directly if you have questions, need help, or would like further explanation. You can also write to: [email protected]

