Image-Captioning

Caption an image using a deep neural network. The project extracts features from images with a model from the Visual Geometry Group (VGG), whose published networks come in 16-layer and 19-layer variants, and then generates captions for those images with our own deep neural network model.

Dataset

I trained the model on the Flickr-8K dataset of images and their labelled captions. The dataset can be found here. Fill in the form to request the data.

Within a short time, you will receive an email that contains links to two files:

  1. Flickr8k_Dataset.zip (1 GB) All photographs.
  2. Flickr8k_text.zip (2.2 MB) All text descriptions for photographs.
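
As a rough illustration of how the text descriptions can be paired with their photographs, the sketch below groups captions by image. It assumes each line of the caption file has the form `<image file>#<index><TAB><caption>`; that layout, the function name `load_captions`, and the sample file names are assumptions for illustration, so check them against your own copy of the data.

```python
from collections import defaultdict


def load_captions(text):
    """Group captions by image id.

    Assumes each non-empty line looks like
    '<image file>#<index><TAB><caption>' (an assumption, not verified here).
    """
    captions = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        image_ref, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines that do not match the assumed layout
        # 'example_1.jpg#0' -> 'example_1'
        image_id = image_ref.split("#")[0].rsplit(".", 1)[0]
        captions[image_id].append(caption.strip().lower())
    return dict(captions)


# Hypothetical sample lines, not real dataset content.
sample = (
    "example_1.jpg#0\tA child in a pink dress .\n"
    "example_1.jpg#1\tA girl walks into a building .\n"
    "example_2.jpg#0\tA dog runs on the grass .\n"
)
captions = load_captions(sample)
```

Each image id then maps to a list of lower-cased captions, which is a convenient shape for building the vocabulary later.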

Requirements

  • Python 3.5 or later
  • keras 2.1.0
  • tensorflow 1.8.0
  • nltk 3.2.4
  • numpy 1.13.3
  • Pillow 4.0.0
  • pickle (included in the Python standard library)

You also need to install pydot using pip install pydot, and GraphViz, which can be found here.

Model used for extracting features.

There are many models to choose from. In this case, we use the Oxford Visual Geometry Group (VGG) model, which achieved top results in the ImageNet competition (ILSVRC) in 2014. Read More

Model used for generating captions.

  1. Sequence Processor: This is a word embedding layer for handling the text input, followed by a Long Short-Term Memory (LSTM) recurrent neural network layer.
  2. Decoder: Both the feature extractor and sequence processor output a fixed-length vector. These are merged together and processed by a Dense layer to make a final prediction.

The Photo Feature Extractor model expects input photo features to be a vector of 4,096 elements. These are processed by a Dense layer to produce a 256-element representation of the photo.

The Sequence Processor model expects input sequences with a pre-defined length (40 words) which are fed into an Embedding layer that uses a mask to ignore padded values. This is followed by an LSTM layer with 256 memory units.
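
To make the fixed-length input concrete, here is a minimal sketch of how one caption could be turned into padded training pairs. The helper `make_training_pairs`, the padding index 0, the left-side padding, and the example token ids are assumptions for illustration; the README only states the length of 40 and that padded values are masked.

```python
MAX_LENGTH = 40  # pre-defined sequence length stated above
PAD = 0          # index reserved for padding (assumption; the masked value)


def make_training_pairs(token_ids, max_length=MAX_LENGTH):
    """Turn one encoded caption into (padded prefix, next word) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        # Keep at most max_length of the most recent tokens.
        prefix = token_ids[:i][-max_length:]
        # Left-pad with PAD so every input has the same length.
        padded = [PAD] * (max_length - len(prefix)) + prefix
        pairs.append((padded, token_ids[i]))
    return pairs


# Hypothetical ids: 1 = start marker, 5 = "dog", 9 = "runs", 2 = end marker.
caption = [1, 5, 9, 2]
pairs = make_training_pairs(caption)
```

A caption of four tokens yields three pairs, each with a 40-element input and a single target word.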

Both input models produce a 256-element vector. Both also use regularization in the form of 50% dropout, which reduces overfitting to the training dataset, as this model configuration learns very fast.
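
For readers unfamiliar with dropout, the following NumPy sketch shows the idea at a 50% rate: roughly half of the values are zeroed during training and the survivors are scaled up so the expected activation is unchanged. The `dropout` function here is a hand-written illustration, not the layer the model actually uses.

```python
import numpy as np


def dropout(x, rate=0.5, rng=None):
    """Inverted dropout: zero a fraction `rate` of values, rescale the rest."""
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate  # True where the value survives
    return x * keep / (1.0 - rate)


features = np.ones(256)  # stand-in for a 256-element vector
dropped = dropout(features, rate=0.5, rng=np.random.default_rng(0))
```

At prediction time dropout is switched off and the vector passes through unchanged.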

The Decoder model merges the vectors from both input models using an addition operation. The result is fed to a 256-neuron Dense layer and then to a final output Dense layer that makes a softmax prediction over the entire output vocabulary for the next word in the sequence.
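
The data flow through the decoder can be sketched in plain NumPy. This is only an illustration of the shapes and the addition merge described above: the weights are random rather than trained, the vocabulary size of 1000 and the ReLU activations are assumptions, the LSTM output is replaced by a random stand-in vector, and dropout is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE = 1000  # hypothetical vocabulary size


def dense(x, w, b):
    """Fully connected layer without activation."""
    return x @ w + b


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


# Random weights stand in for trained parameters.
w_photo, b_photo = rng.normal(0, 0.01, (4096, 256)), np.zeros(256)
w_hidden, b_hidden = rng.normal(0, 0.1, (256, 256)), np.zeros(256)
w_out, b_out = rng.normal(0, 0.1, (256, VOCAB_SIZE)), np.zeros(VOCAB_SIZE)

photo_features = rng.random(4096)       # 4,096-element photo feature vector
sequence_vector = rng.normal(size=256)  # stand-in for the LSTM's 256-element output

photo_vector = relu(dense(photo_features, w_photo, b_photo))  # 4096 -> 256
merged = photo_vector + sequence_vector                       # addition merge
hidden = relu(dense(merged, w_hidden, b_hidden))              # 256-neuron Dense layer
probs = softmax(dense(hidden, w_out, b_out))                  # distribution over vocabulary
next_word_id = int(np.argmax(probs))                          # most likely next word
```

Repeating this step, appending the predicted word to the input sequence each time, is one simple way to generate a full caption.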

Research papers referred to.

  1. Where to put the Image in an Image Caption Generator
  2. What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?

Contributors

thedecepticon30
