im2txt_attention's Introduction

CNN-Attention-LSTM Image Caption Generator

A TensorFlow implementation of an image caption generator that combines a CNN, attention, and an LSTM. The code is adapted from Google's im2txt Show and Tell model.

Contact

Authors: Xiaobai Ma, Zhenkai Wang, Zhi Bie

Getting Started

Install Required Packages

First ensure that you have installed the following required packages:

Prepare the Training Data

To train the model you will need to provide training data in native TFRecord format. The TFRecord format consists of a set of sharded files containing serialized tf.SequenceExample protocol buffers. Each tf.SequenceExample proto contains an image (JPEG or PNG format), a caption and metadata such as the image id.

Each caption is a list of words. During preprocessing, a dictionary is created that assigns each word in the vocabulary to an integer-valued id. Each caption is encoded as a list of integer word ids in the tf.SequenceExample protos.
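
As a rough illustration of the encoding described above, the following Python sketch builds a word-to-id dictionary and encodes a caption as a list of integer ids. The function names, the `min_count` threshold, and the `<UNK>` token are assumptions made for this example; they are not taken from the preprocessing script.

```python
from collections import Counter

UNK = "<UNK>"  # hypothetical token for words outside the vocabulary


def build_vocab(captions, min_count=1):
    """Assign an integer id to every word seen at least min_count times."""
    counts = Counter(word for caption in captions for word in caption)
    # Order by descending frequency, then alphabetically, so ids are stable.
    words = sorted(
        (w for w, c in counts.items() if c >= min_count),
        key=lambda w: (-counts[w], w),
    )
    vocab = {word: idx for idx, word in enumerate(words)}
    vocab[UNK] = len(vocab)  # the unknown-word id comes last
    return vocab


def encode_caption(caption, vocab):
    """Replace each word with its id, falling back to the unknown-word id."""
    unk_id = vocab[UNK]
    return [vocab.get(word, unk_id) for word in caption]


captions = [["a", "dog", "runs"], ["a", "cat", "sleeps"]]
vocab = build_vocab(captions)
ids = encode_caption(["a", "dog", "sleeps"], vocab)
```

In this sketch, lists like `ids` play the role of the integer word ids stored in each tf.SequenceExample proto.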

We have provided a script to download and preprocess the [MSCOCO](http://mscoco.org/) image captioning data set into this format. Downloading and preprocessing the data may take several hours depending on your network and computer speed. Please be patient.

Before running the script, ensure that your hard disk has at least 150GB of available space for storing the downloaded and processed data.
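
One way to check the free space before starting is a short Python snippet like the one below. It is only a sketch: the helper name `has_enough_space` is hypothetical, and the path you pass should be wherever you plan to store the data.

```python
import shutil

REQUIRED_GB = 150  # the figure quoted above


def has_enough_space(path, required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    # "." is a placeholder; use your own data directory instead.
    print(has_enough_space("."))
```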

Modify your path variables in PathDefine.bash, then:

source PathDefine.bash

# Build the preprocessing script.
bazel build im2txt/download_and_preprocess_mscoco

# Run the preprocessing script.
bazel-bin/im2txt/download_and_preprocess_mscoco "${MSCOCO_DIR}"

To build only a subset of the MSCOCO data, first modify your path variables in PathDefine_test.bash, then:

source PathDefine_test.bash

# Build the preprocessing script.
bazel build im2txt/download_and_preprocess_mscoco_sub

# Run the preprocessing script.
bazel-bin/im2txt/download_and_preprocess_mscoco_sub "${MSCOCO_DIR}"

Download the Inception v3 Checkpoint

The model requires a pretrained Inception v3 checkpoint file to initialize the parameters of its image encoder submodel.

This checkpoint file is provided by the TensorFlow-Slim image classification library, which offers a suite of pre-trained image classification models. You can read more about the models provided by the library here.

Run the following commands to download the Inception v3 checkpoint.

#source PathDefine.bash or PathDefine_test.bash (for subset) if not done
wget "http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz"
tar -xvf "inception_v3_2016_08_28.tar.gz" -C ${INCEPTION_DIR}
rm "inception_v3_2016_08_28.tar.gz"

Training a Model

Initial Training

Run the training script.

#source PathDefine.bash or PathDefine_test.bash (for subset) if not done
# Build the model.
bazel build -c opt im2txt/...

# Run the training script.
bazel-bin/im2txt/train \
  --input_file_pattern="${MSCOCO_DIR}/train-?????-of-00256" \
  --inception_checkpoint_file="${INCEPTION_CHECKPOINT}" \
  --train_dir="${MODEL_DIR}/train" \
  --train_inception=false \
  --number_of_steps=1000000
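
The `--input_file_pattern` value above is a glob pattern in which each `?` stands for exactly one character, so `train-?????-of-00256` matches training shards with a five-character index. The Python sketch below illustrates the matching; it assumes the shards are numbered from 00000 to 00255, which is inferred from the pattern rather than stated in this README.

```python
from fnmatch import fnmatch

PATTERN = "train-?????-of-00256"


def shard_names(prefix="train", num_shards=256):
    """Generate shard file names of the form <prefix>-NNNNN-of-NNNNN."""
    return [
        f"{prefix}-{i:05d}-of-{num_shards:05d}" for i in range(num_shards)
    ]


names = shard_names()
# Keep only the names that the training pattern would pick up.
matched = [name for name in names if fnmatch(name, PATTERN)]
```

A file named for a different split or shard count, for example a hypothetical `val-00000-of-00004`, would not match the pattern and so would not be read during training.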

Generating Captions into a JSON File

Your trained model can generate captions for any JPEG or PNG image. The following command line will generate captions for an image from the test set.

#source PathDefine.bash or PathDefine_test.bash (for subset) if not done

# Build the inference binary.
bazel build -c opt im2txt/run_inference

# Ignore GPU devices (only necessary if your GPU is currently memory
# constrained, for example, by running the training script).
export CUDA_VISIBLE_DEVICES=""

# Run inference to write the generated captions to a JSON file.
bazel-bin/im2txt/run_inference \
  --checkpoint_path=${CHECKPOINT_DIR} \
  --vocab_file=${VOCAB_FILE} \
  --input_files=${IMAGE_FILE} \
  --image_dir=${IMAGE_DIR} \
  --validateGlobal=${VALIDATEGLOBAL}
