Training ImageNet and PASCAL VOC2012 via Learning Feature Pyramids

The code is provided by Guangrun Wang, with contributions from Rongcong Chen.

Sun Yat-sen University (SYSU)

Table of Contents

  1. Introduction
  2. ImageNet
  3. PASCAL VOC2012
  4. Citation

Introduction

This repository contains the training and testing code for ImageNet and PASCAL VOC2012 via learning feature pyramids (LFP). LFP was originally proposed for human pose estimation, as described in the paper "Learning Feature Pyramids for Human Pose Estimation" (https://arxiv.org/abs/1708.01101). We extend it to semantic image segmentation.

Results

  • Segmentation Visualization:

    1. (a) input images; (b) segmentation results. [Figure: segmentation visualization]

    2. (a) images and ground truths; (b) trimap of learning feature pyramids; (c) trimap of the original ResNet. [Figure: trimaps]

    3. It achieves 81.0% mIoU on the PASCAL VOC2011 segmentation leaderboard, a significant improvement over its baseline DeepLabV2 (79.6%).
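
For readers unfamiliar with the metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from a confusion matrix. It is not the leaderboard's evaluation code. The function name `mean_iou`, the flattened-list input format, and the use of 255 as the ignored "void" label are assumptions made for illustration.

```python
def mean_iou(pred, truth, num_classes, ignore_label=255):
    """Mean IoU over the classes that occur in the prediction or the ground truth.

    `pred` and `truth` are flat sequences of integer class labels of equal length.
    `ignore_label` (assumed here to be 255) marks pixels excluded from scoring.
    """
    # confusion[t][p] counts pixels whose ground-truth class is t and predicted class is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue  # skip unlabeled pixels
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        intersection = confusion[c][c]
        truth_total = sum(confusion[c])
        pred_total = sum(row[c] for row in confusion)
        union = truth_total + pred_total - intersection
        if union > 0:  # a class absent from both pred and truth does not count
            ious.append(float(intersection) / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with 3 classes; the last pixel is void and therefore ignored.
score = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3 and 1, so `score` is their average. To score a whole dataset, the labels of all images would be accumulated into one confusion matrix before the per-class ratios are taken.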

ImageNet

  • Training script:
cd pyramid/ImageNet/
python imagenet-resnet.py   --gpu 0,1,2,3,4,5,6,7   --data_format NHWC  -d 101  --mode resnet --data  [ROOT-OF-IMAGENET-DATASET]
  • Testing script:
cd pyramid/ImageNet/
python imagenet-resnet.py   --gpu 0,1,2,3,4,5,6,7  --load [ROOT-TO-LOAD-MODEL]  --data_format NHWC  -d 101  --mode resnet --data  [ROOT-OF-IMAGENET-DATASET] --eval
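
The training and testing commands above differ only in `--load` and `--eval`. As a rough sketch, a small helper can assemble either one from the same arguments. The helper name `build_imagenet_command` and its parameter names are hypothetical. Only the flags shown in the commands above are used.

```python
def build_imagenet_command(data_root, gpus="0,1,2,3,4,5,6,7", depth=101,
                           load=None, evaluate=False):
    """Return the argument list for imagenet-resnet.py (hypothetical helper)."""
    argv = ["python", "imagenet-resnet.py", "--gpu", gpus]
    if load is not None:
        argv += ["--load", load]  # checkpoint to restore before running
    argv += ["--data_format", "NHWC", "-d", str(depth),
             "--mode", "resnet", "--data", data_root]
    if evaluate:
        argv.append("--eval")  # evaluate instead of train
    return argv


# Placeholders are kept as in the commands above; substitute real paths before use.
train_cmd = build_imagenet_command("[ROOT-OF-IMAGENET-DATASET]")
test_cmd = build_imagenet_command("[ROOT-OF-IMAGENET-DATASET]",
                                  load="[ROOT-TO-LOAD-MODEL]", evaluate=True)
```

The resulting list could be passed to `subprocess.call(train_cmd, cwd="pyramid/ImageNet/")` to launch the run from the directory used above.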

PASCAL VOC2012

  • Training script:
# Use the ImageNet classification model as the pretrained model.
# ImageNet has 1,000 categories while VOC has only 21, so first freeze all
# parameters except the last (21-channel) layer and train only that layer for adaptation,
# by adding "with freeze_variables(stop_gradient=True, skip_collection=True):" at line 206 of resnet_model_voc_aspp.py.
# Then fine-tune all the parameters.
# For evaluation on the VOC val set, the model is first trained on COCO, then on VOC train_aug.
# For evaluation on the VOC leaderboard (test set), the above model is further trained on VOC val.
# It achieves 81.0% on the VOC leaderboard.
# An example training script follows.
cd pyramid/VOC/
python resnet-msc-voc-aspp.py   --gpu 0,1,2,3,4,5,6,7  --load [ROOT-TO-LOAD-MODEL]  --data_format NHWC  -d 101  --mode resnet --log_dir [ROOT-TO-SAVE-MODEL]  --data [ROOT-OF-TRAINING-DATA]
  • Testing script:
cd pyramid/VOC/
python gr_test_pad_crf_msc_flip.py 
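
The two-stage schedule described in the training comments (first adapt only the new last layer, then fine-tune everything) can be illustrated without TensorFlow. The sketch below is a toy stand-in, not the repository's implementation, which freezes variables through Tensorpack's `freeze_variables` context manager as noted above. The parameter names `backbone` and `classifier`, the scalar values, and the helper `sgd_step` are all hypothetical.

```python
def sgd_step(params, grads, lr, trainable):
    """Apply one SGD update, leaving every parameter not in `trainable` untouched."""
    return {
        name: (value - lr * grads[name]) if name in trainable else value
        for name, value in params.items()
    }


# Hypothetical parameter groups: a pretrained backbone and a new 21-way classifier.
params = {"backbone": 1.0, "classifier": 0.0}
grads = {"backbone": 0.5, "classifier": 2.0}

# Stage 1: only the new last layer is updated; the backbone stays frozen.
stage1 = sgd_step(params, grads, lr=0.1, trainable={"classifier"})

# Stage 2: all parameters are fine-tuned.
stage2 = sgd_step(stage1, grads, lr=0.1, trainable={"backbone", "classifier"})
```

After stage 1 the `backbone` value is unchanged while `classifier` has moved. After stage 2 both have moved. In a real network each entry would be a weight tensor and the gradients would come from backpropagation.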

Citation

If you use these models in your research, please cite:

@inproceedings{yang2017learning,
        title={Learning feature pyramids for human pose estimation},
        author={Yang, Wei and Li, Shuang and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang},
        booktitle={The IEEE International Conference on Computer Vision (ICCV)},
        volume={2},
        year={2017}
    }

Dependencies

  • Python 2.7 or 3
  • TensorFlow >= 1.3.0
  • Tensorpack: the code depends on Yuxin Wu's Tensorpack. For convenience, we provide a stable version, 'tensorpack-installed', in this repository.
    # install tensorpack locally:
    cd tensorpack-installed
    python setup.py install --user
    
