Neural Baby Talk

[Figure: teaser results]

Requirements

Inference:

Data Preparation:

Evaluation:

  • coco-caption: Download the modified version of coco-caption and put it under tools/

Demo

Without detection bbox

With detection bbox

Constraint beam search

This code also includes an implementation of constrained beam search (CBS) as proposed by Peter Anderson. I'm not sure my implementation is 100% correct, but it works well in conjunction with the Neural Baby Talk code. You can refer to this paper for more details. To enable CBS while decoding, set the following flags:

--cbs True|False : Whether to use constrained beam search.
--cbs_tag_size 3 : How many detection bboxes to include in the decoded caption.
--cbs_mode all|unique|novel : Whether repeated bounding boxes are allowed. `novel` is an option only for the novel object captioning task.
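As a rough sketch, the three CBS flags can be appended to the COCO evaluation command given in the Standard Image Captioning section of this README. The helper `build_cbs_eval_command` is hypothetical and not part of this repository. Every flag and path is copied from this README, while combining them in one run is an assumption.

```python
import shlex

# Evaluation command copied from the "Evaluation (COCO)" section of this README.
BASE_EVAL_ARGS = [
    "python", "main.py",
    "--path_opt", "cfgs/normal_coco_res101.yml",
    "--batch_size", "20",
    "--cuda", "True",
    "--num_workers", "20",
    "--max_epoch", "30",
    "--inference_only", "True",
    "--beam_size", "3",
    "--start_from", "save/coco_nbt_1024",
]

# Mode names as listed for --cbs_mode (spelled "unique" here).
CBS_MODES = ("all", "unique", "novel")


def build_cbs_eval_command(tag_size=3, mode="all"):
    """Hypothetical helper: return the evaluation argv with CBS switched on."""
    if mode not in CBS_MODES:
        raise ValueError(f"cbs_mode must be one of {CBS_MODES}, got {mode!r}")
    return BASE_EVAL_ARGS + [
        "--cbs", "True",
        "--cbs_tag_size", str(tag_size),
        "--cbs_mode", mode,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; run it from the repository root.
    print(shlex.join(build_cbs_eval_command(tag_size=3, mode="all")))
```

The script only prints the command. Pass the returned list to `subprocess.run` if you want to launch the evaluation directly.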

Training and Evaluation

Data Preparation

Head to data/README.md, and prepare the data for training and evaluation.

Pretrained model

Task | Dataset | Backend | Batch size | Link
Standard image captioning | COCO | Res-101 | 100 | Pre-trained Model
Standard image captioning | Flickr30k | Res-101 | 50 | Pre-trained Model
Robust image captioning | COCO | Res-101 | 100 | Pre-trained Model
Novel object captioning | COCO | Res-101 | 100 | Pre-trained Model

Standard Image Captioning

Training (COCO)

First, modify the config file cfgs/normal_coco_res101.yml with the correct file path.

python main.py --path_opt cfgs/normal_coco_res101.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30

Evaluation (COCO)

Download the pre-trained model, extract the tar.zip file, and put it under save/.

python main.py --path_opt cfgs/normal_coco_res101.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/coco_nbt_1024

Training (Flickr30k)

Modify the config file cfgs/normal_flickr_res101.yml with the correct file path.

python main.py --path_opt cfgs/normal_flickr_res101.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30

Evaluation (Flickr30k)

Download the pre-trained model, extract the tar.zip file, and put it under save/.

python main.py --path_opt cfgs/normal_flickr_res101.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/flickr30k_nbt_1024

Robust Image Captioning

Training

Modify the config file cfgs/robust_coco.yml with the correct file path.

python main.py --path_opt cfgs/robust_coco.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30

Evaluation (robust-coco)

Download the pre-trained model, extract the tar.zip file, and put it under save/.

python main.py --path_opt cfgs/robust_coco.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/robust_coco_nbt_1024

Novel Object Captioning

Training

Modify the config file cfgs/noc_coco_res101.yml with the correct file path.

python main.py --path_opt cfgs/noc_coco_res101.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30

Evaluation (noc-coco)

Download the pre-trained model, extract the tar.zip file, and put it under save/.

python main.py --path_opt cfgs/noc_coco_res101.yml --batch_size 20 --cuda True --num_workers 20 --max_epoch 30 --inference_only True --beam_size 3 --start_from save/noc_coco_nbt_1024

Multi-GPU Training

This codebase also supports training with multiple GPUs. To enable this feature, add --mGPUs True to the command.
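For instance, the flag can be appended to the COCO training command from earlier in this README. The sketch below does this programmatically. `with_multi_gpu` is a hypothetical helper, not part of the codebase, and it assumes only that `main.py` accepts `--mGPUs True` as described above.

```python
import shlex

# Training command copied from the "Training (COCO)" section of this README.
TRAIN_COMMAND = (
    "python main.py --path_opt cfgs/normal_coco_res101.yml "
    "--batch_size 20 --cuda True --num_workers 20 --max_epoch 30"
)


def with_multi_gpu(command):
    """Hypothetical helper: append --mGPUs True unless it is already present."""
    args = shlex.split(command)
    if "--mGPUs" not in args:
        args += ["--mGPUs", "True"]
    return shlex.join(args)


if __name__ == "__main__":
    # Print the multi-GPU variant of the training command.
    print(with_multi_gpu(TRAIN_COMMAND))
```

The same helper could be applied to the Flickr30k, robust, or novel object training commands, since they differ only in the config file passed to `--path_opt`.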

Self-Critical Training and Fine-Tuning CNN

This codebase also supports self-critical training and fine-tuning the CNN. You are welcome to try this part and upload your trained model to the repo!

More Visualization Results

[Figure: teaser results]

Reference

If you use this code as part of any published research, please acknowledge the following paper:

@misc{Lu2018Neural,
author = {Lu, Jiasen and Yang, Jianwei and Batra, Dhruv and Parikh, Devi},
title = {Neural Baby Talk},
journal = {CVPR},
year = {2018}
}

Acknowledgement

We thank Ruotian Luo for his self-critical.pytorch repo.
