This project contains files from the Udacity Computer Vision Nanodegree.
In this project we combine a CNN and an RNN, as encoder and decoder respectively, to produce captions for images from the COCO (Common Objects in Context) dataset.
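The encoder-decoder pairing can be sketched as below. This is a minimal, self-contained illustration, not the project's actual model: the class names `EncoderCNN`/`DecoderRNN` and the tiny conv stack standing in for a pretrained backbone are assumptions for the example.

```python
import torch
import torch.nn as nn

class EncoderCNN(nn.Module):
    """Encode an image into a fixed-length feature vector.
    (Sketch only: a small conv stack stands in for the pretrained
    CNN backbone so the example runs on its own.)"""
    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # -> (B, 16, 1, 1)
        )
        self.embed = nn.Linear(16, embed_size)

    def forward(self, images):                       # (B, 3, H, W)
        features = self.conv(images).flatten(1)      # (B, 16)
        return self.embed(features)                  # (B, embed_size)

class DecoderRNN(nn.Module):
    """Generate a caption from the image feature, one word per step."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "word" of the sequence,
        # then score every vocabulary word at every timestep.
        embeds = self.word_embed(captions[:, :-1])                # (B, T-1, E)
        inputs = torch.cat([features.unsqueeze(1), embeds], 1)    # (B, T, E)
        hiddens, _ = self.lstm(inputs)                            # (B, T, H)
        return self.fc(hiddens)                                   # (B, T, V)
```

At training time the decoder is fed the ground-truth caption (teacher forcing); at inference it instead feeds back its own previous prediction at each step.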
To set up the COCO API for accessing the dataset, follow the instructions in this README file.
The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order:
Notebook 0 : Explore the Microsoft Common Objects in Context (MS COCO) dataset;
Notebook 1 : Load and pre-process data from the COCO dataset;
Notebook 2 : Training the CNN-RNN Model;
Notebook 3 : Load the trained model and generate predictions.
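The caption generation in the final notebook boils down to a greedy decoding loop: feed the previous word to the decoder, take the highest-scoring next word, and stop at the end token. A minimal sketch of that control flow, where `step_fn` and the toy vocabulary are hypothetical stand-ins for one LSTM step of the trained decoder:

```python
import numpy as np

def greedy_decode(step_fn, start_id, end_id, max_len=20):
    """Greedily sample a caption: at each step, pass the previous word id
    to step_fn (which returns one score per vocabulary word) and keep the
    argmax, until the <end> token or the length limit is reached."""
    caption = [start_id]
    for _ in range(max_len):
        scores = step_fn(caption[-1])
        word_id = int(np.argmax(scores))
        caption.append(word_id)
        if word_id == end_id:
            break
    return caption

# Toy 5-word vocabulary and a fake "decoder step" that follows a fixed
# chain of words, just to show the loop (a real step would run the LSTM).
next_word = {0: 3, 3: 4, 4: 1}                      # 0=<start>, 1=<end>
fake_step = lambda prev: np.eye(5)[next_word.get(prev, 1)]
print(greedy_decode(fake_step, start_id=0, end_id=1, max_len=10))
# → [0, 3, 4, 1]
```

Beam search is a common refinement of this loop: instead of keeping only the single argmax at each step, it keeps the k best partial captions.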
To get started, clone the repository and install the dependencies:

$ git clone https://github.com/kenkai/Image_Captioning.git
$ pip3 install -r requirements.txt
References: the Microsoft COCO dataset, "Show and Tell: A Neural Image Caption Generator" (arXiv:1411.4555v2 [cs.CV], 20 Apr 2015), and "Show, Attend and Tell" (arXiv:1502.03044v3 [cs.LG], 19 Apr 2016).