Official implementation for the paper "An Efficient One-stage Prefix-based Generator for Image Captioning"
In our work, we use the X-VLM image encoder, which was pre-trained on a very large corpus of image-text pairs (4M), and GPT2 as the decoder, following the baseline ClipCap.
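To make the prefix paradigm concrete, here is a minimal sketch of the idea (the class names and dimensions below are illustrative stand-ins, not this repository's API): the image encoder's output is mapped to a short sequence of prefix embeddings, which are prepended to GPT2's token embeddings so the caption is generated conditioned on the image.

import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

class MappingNetwork(nn.Module):
    # Projects one global image feature into `prefix_len` GPT2-sized embeddings.
    def __init__(self, feat_dim, prefix_len, embed_dim):
        super().__init__()
        self.prefix_len, self.embed_dim = prefix_len, embed_dim
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, prefix_len * embed_dim // 2),
            nn.Tanh(),
            nn.Linear(prefix_len * embed_dim // 2, prefix_len * embed_dim),
        )
    def forward(self, feat):  # feat: (B, feat_dim)
        return self.proj(feat).view(-1, self.prefix_len, self.embed_dim)

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
mapper = MappingNetwork(feat_dim=1024, prefix_len=10, embed_dim=gpt2.config.n_embd)

image_feat = torch.randn(1, 1024)                  # stand-in for an X-VLM image feature
prefix = mapper(image_feat)                        # (1, 10, 768)
caption_ids = tokenizer("a group of people", return_tensors="pt").input_ids
token_embeds = gpt2.transformer.wte(caption_ids)   # (1, T, 768)
inputs = torch.cat([prefix, token_embeds], dim=1)  # prefix conditions the decoder
logits = gpt2(inputs_embeds=inputs).logits         # next-token predictions

During training, the cross-entropy loss is computed only over the caption tokens; the prefix positions carry the visual conditioning.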
Example captions generated by our model (demo images omitted):
- a group of people standing next to an elephant.
- a wooden table with a vase of flowers on top of it.
- a wooden crate filled with lots of ripe and unripe bananas.
- a woman is eating a bowl of food at a table.
- a wooden table topped with wooden spoons and wooden sticks.
- a motorcycle parked in a dirt field with horses in the background.
Clone the repository, create the conda environment, and install the dependencies:
git clone https://github.com/hyfwyy/OPG.git
cd OPG
conda env create -f environment.yml
conda activate opg
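As a quick sanity check that the core dependencies resolved (this assumes environment.yml pins PyTorch and Hugging Face Transformers, as ClipCap-derived code typically requires; adjust if the environment differs):

import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())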
Download train_captions to data/coco/annotations.
Download the training and validation images and unzip them (we use the Karpathy et al. split).
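The Karpathy split is commonly distributed as a single JSON file (often named dataset_coco.json) in which every image carries a split field. Here is a minimal sketch for checking the split sizes once the annotations are in place (the file name and layout are assumptions about the standard Karpathy release, not this repository's loader):

import json
from collections import Counter

with open("data/coco/annotations/dataset_coco.json") as f:
    data = json.load(f)

# Karpathy's release uses the splits: train / restval / val / test
print(Counter(img["split"] for img in data["images"]))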
Download the pre-trained 4M checkpoint from X-VLM.
For the cross-entropy stage:
mlp+gpt2 tuning:
python train_scst.py --scst=False --device=cuda:0 --mapping_type=mlp --use_sparce_mask=True --use_aux_loss=True --threshold=0.1 --lamda=0.1
trans+gpt2 frozen:
python train_scst.py --scst=False --device=cuda:0 --mapping_type=transformer --only_prefix --use_sparce_mask=True --use_aux_loss=True --threshold=0.1 --lamda=0.1
trans+gpt2 tuning:
python train_scst.py --scst=False --device=cuda:0 --mapping_type=transformer --use_sparce_mask=True --use_aux_loss=True --threshold=0.1 --lamda=0.1
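The flags above are not documented in detail here. One plausible reading of --use_sparce_mask with --threshold (an assumption on our part, not taken from the paper) is that attention weights below the threshold are zeroed out and the survivors renormalized, with --lamda weighting an auxiliary loss added to the cross-entropy objective. A toy illustration of that reading:

import torch

attn = torch.softmax(torch.randn(1, 8, 10, 10), dim=-1)  # dummy attention weights
threshold, lamda = 0.1, 0.1

sparse = attn * (attn >= threshold).float()               # drop weak attention links
sparse = sparse / sparse.sum(dim=-1, keepdim=True).clamp_min(1e-8)

ce_loss, aux_loss = torch.tensor(2.3), torch.tensor(0.7)  # dummy loss values
total_loss = ce_loss + lamda * aux_loss                   # lamda-weighted auxiliary term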
For the CIDEr optimization stage:
python train_scst.py --scst=True --checkpoint=$checkpoint_path$ --mapping_type=mlp
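For context, CIDEr optimization with --scst=True follows the self-critical sequence training recipe (Rennie et al., 2017): a sampled caption is rewarded by how much its CIDEr score exceeds that of a greedy-decoded baseline. A minimal sketch of that policy-gradient loss (the tensors here are hypothetical stand-ins; computing CIDEr itself is out of scope):

import torch

def scst_loss(sample_logprobs, sample_cider, greedy_cider):
    # sample_logprobs: (B,) summed log-probs of each sampled caption
    # sample_cider:    (B,) CIDEr score of each sampled caption
    # greedy_cider:    (B,) CIDEr score of the greedy baseline caption
    reward = sample_cider - greedy_cider               # advantage over the baseline
    return -(reward.detach() * sample_logprobs).mean()

logp = torch.tensor([-12.3, -10.1], requires_grad=True)
loss = scst_loss(logp, torch.tensor([1.10, 0.85]), torch.tensor([0.95, 0.90]))
loss.backward()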
This repository is heavily based on the ClipCap repository. For training we used the COCO dataset.
For any inquiry, please contact us at [email protected].