Hi there, thanks for viewing my code!
This repository contains the code for my 2018 Robotics Institute Summer Scholars (R.I.S.S.) internship project at Carnegie Mellon University. I worked under Dr. Jean Oh and Dr. Ralph Hollis, alongside Ph.D. candidate Roberto Shu and Master's student Junjiao Tian. The focus of my project was Detailed Image Captioning.
Modern approaches to Image Captioning use a neural network architecture called Show and Tell.
The Show and Tell model translates images into captions. The first part of the model, the encoder, is a convolutional neural network; this project uses the Inception V3 network for the encoder. The second part, the decoder, is a recurrent neural network that performs sequence generation; this project uses Long Short-Term Memory (LSTM) cells to mitigate the vanishing gradient problem.
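The encoder-decoder data flow described above can be sketched in a few lines. This is a toy illustration with random weights and a five-word vocabulary, not the trained Show and Tell model (which uses Inception V3 and a learned LSTM); it only shows how the image feature primes the LSTM, which then emits one word per step until an end token.

```python
import numpy as np

# Toy vocabulary; a real model uses thousands of words. All weights below are
# random, so this sketch only illustrates the data flow, not trained behavior.
VOCAB = ["<start>", "<end>", "a", "dog", "runs"]
FEAT, HID = 8, 16
rng = np.random.default_rng(0)

def encode(image):
    """Stand-in for the CNN encoder: map an image to a feature vector."""
    W = rng.normal(size=(image.size, FEAT))
    return np.tanh(image.ravel() @ W)

def lstm_step(x, h, c, W):
    """One LSTM cell step: gates computed from input x and hidden state h."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def caption(image, max_len=5):
    """Greedy decoding: feed the image feature first, then previous words."""
    W = rng.normal(size=(4 * HID, FEAT + HID))
    W_out = rng.normal(size=(len(VOCAB), HID))
    embed = rng.normal(size=(len(VOCAB), FEAT))
    h, c = np.zeros(HID), np.zeros(HID)
    h, c = lstm_step(encode(image), h, c, W)  # prime the LSTM with the image
    word, words = 0, []                       # begin from the <start> token
    for _ in range(max_len):
        h, c = lstm_step(embed[word], h, c, W)
        word = int(np.argmax(W_out @ h))      # most likely next word
        if VOCAB[word] == "<end>":
            break
        words.append(VOCAB[word])
    return words
```

In the real model, the same structure holds, but the encoder weights come from Inception V3 pretrained on ImageNet and the decoder is trained end-to-end on image-caption pairs.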
While Show and Tell learns freeform captions for many different types of images, the algorithm does not produce captions with many accurate visual details. This project proposes a method to correct this.
This repository was designed to run on a mobile robot using the Robot Operating System, controlled with human speech using Amazon Alexa custom skills. The Image Captioning model is intended to run on a remote AWS EC2 instance as a web service.
This project was developed using ROS Indigo on Ubuntu 14.04 LTS. Please install ROS Indigo, then set up and configure your ROS workspace to verify that your installation succeeded.
To create an empty workspace named my_workspace in your home directory:

```
$ cd ~/
$ mkdir -p ~/my_workspace/src
$ cd ~/my_workspace/
$ catkin_make
```
Congratulations, you have installed ROS Indigo and set up a workspace, and you are one step closer to running your very own Image Caption Machine.
This project was built to use an Echo Dot (2nd generation), but presumably any Alexa-enabled device would be fine. First, connect your Echo Dot to the internet so it can interact with skills. Second, register your Echo Dot to your Amazon account so that you can use custom skills.
You should now be able to say "Alexa, what time is it?" and Alexa will respond. The next step is to create the custom Alexa skill that will let you use the Image Caption Machine.
- Log into the Alexa Developer Console and select the Create Skill button to begin your custom skill. Name your skill (for example, my_skill), select the Custom model type, and click the Create Skill button. Choose the Start from scratch template and select the Choose button.
- With the skill created, click the JSON Editor tab in the left menu pane. In the JSON Editor tool, upload the interaction_model.json file, then press the Save Model button followed by the Build Model button. You have now successfully built your custom skill.
- To enable testing on your Echo Dot, select Endpoint in the left menu pane and select the HTTPS endpoint type. This project uses ngrok to forward a localhost Flask server running in ROS to a public HTTPS address accessible by the Alexa skill. In the Default Region text box, enter the ngrok domain name (for example, https://your-ngrok-subdomain.ngrok.io) and select the SSL option "My development endpoint is a sub-domain ...". Select the Save Endpoints button.
- You are now ready to enable testing: select the Test tab in the top menu bar and toggle on the "Test is disabled for this skill" switch. You are now all set up to use Alexa to control the Image Caption Machine.
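When the skill is invoked, Alexa POSTs a JSON intent request to the ngrok-forwarded endpoint, and the server replies with a JSON response envelope. Below is a minimal sketch of such a handler; the intent name `CaptionIntent` and the caption text are hypothetical placeholders (the real intent names come from the project's interaction_model.json, and the real caption comes from the model).

```python
# Hypothetical intent name and placeholder caption -- the actual intents are
# defined in interaction_model.json, and the caption comes from the model.
def handle_alexa_request(body):
    """Parse an Alexa JSON request and build the JSON response Alexa expects."""
    request = body.get("request", {})
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {}).get("name", "")
        if intent == "CaptionIntent":
            speech = "I see a hallway with a red door."  # placeholder caption
        else:
            speech = "Sorry, I do not know that command."
    else:  # e.g. a LaunchRequest when the user opens the skill
        speech = "Welcome to the Image Caption Machine."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": True,
        },
    }
```

In the actual project this logic would sit inside a Flask route, with ngrok exposing that route at the HTTPS endpoint configured above.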
The Image Caption model is meant to run on a remote AWS EC2 instance.
- Log into the AWS EC2 Console, and on the EC2 Dashboard, select Launch Instance, then select the Deep Learning AMI (Ubuntu) Version 16.0 or an equivalent instance image. Follow the instructions on AWS to initialize and connect to your instance.
- Once your instance is set up, follow any online tutorial to set up Flask and Apache on an AWS EC2 instance. With your server running, copy the web server from the Project Flask Server into your instance.
- Finally, download the release of the Image Caption Model and install it as a pip package (note that you must provide a pretrained model checkpoint yourself).
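Since the captioning model runs as a remote web service, the robot needs a small client that ships an image to the EC2 instance and reads the caption back. Here is a sketch of one way to do that; the server URL, route, and payload keys (`image`, `caption`) are assumptions for illustration and must match whatever your Flask server actually exposes.

```python
import base64

# Hypothetical endpoint -- replace with your EC2 host and Flask route.
SERVER = "https://your-ec2-host.example.com/caption"

def build_caption_request(image_bytes):
    """Package raw image bytes as a JSON-safe payload for the web service."""
    return {"image": base64.b64encode(image_bytes).decode("ascii")}

def decode_caption_request(payload):
    """Server side: recover the image bytes from the JSON payload."""
    return base64.b64decode(payload["image"])

def request_caption(image_bytes):
    """POST the image to the EC2 instance and return the caption text."""
    import requests  # installed below with pip
    reply = requests.post(SERVER, json=build_caption_request(image_bytes))
    reply.raise_for_status()
    return reply.json()["caption"]
```

Base64-encoding the image keeps the payload valid JSON; for large images you might instead POST the raw bytes with a multipart form.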
This repository depends on a handful of Python packages. Install these first before running the repository.

```
$ sudo pip install numpy
$ sudo pip install tensorflow
$ sudo pip install opencv-python
$ sudo pip install requests
```
You are now ready to install the repository. Clone the repository into your catkin workspace created at the beginning of the setup process.
```
$ cd ~/my_workspace/src
$ git clone http://github.com/brandontrabucco/image_caption_machine
$ cd ~/my_workspace/
$ catkin_make clean
$ catkin_make
```

(Note that catkin_make must be run from the workspace root, not from src.)
Congratulations, you are now able to run your own Image Caption Machine!
Once everything has been installed, and you have verified every component is functioning individually, you are now ready to run the Image Caption Machine! Source your catkin workspace, and run the tests launch file.
```
$ cd ~/my_workspace
$ source devel/setup.sh
$ roslaunch image_caption_machine image_caption_machine_tests.launch
```
This may not work properly for you, most likely because you are not running this code on the Ballbot. In that case, take a look at the launch/image_caption_machine.launch launch file and replace the Ballbot URDF and related resource files with equivalents for your own robot. Also, modify the navigator in src/image_caption_machine/navigator/helper.py to use your navigation API instead of ours.
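When swapping in your own navigation stack, the key is to preserve the interface that the rest of the node calls. The sketch below is a hypothetical minimal shape for such an adapter (the actual method names live in src/image_caption_machine/navigator/helper.py; adapt this skeleton to match them), plus a fake implementation useful for bench testing without a robot.

```python
class Navigator(object):
    """Adapter between the Image Caption Machine and a robot navigation API.

    Hypothetical interface -- match the real method names in
    src/image_caption_machine/navigator/helper.py.
    """

    def goto(self, x, y, theta=0.0):
        """Send the robot to a pose in the map frame."""
        raise NotImplementedError

    def is_moving(self):
        """Report whether a navigation goal is still in progress."""
        raise NotImplementedError


class FakeNavigator(Navigator):
    """Stand-in used for bench testing without a robot."""

    def __init__(self):
        self.pose = (0.0, 0.0, 0.0)

    def goto(self, x, y, theta=0.0):
        self.pose = (x, y, theta)  # teleport instead of actually navigating

    def is_moving(self):
        return False
```

A real implementation would forward goto to your planner (for example, a move_base action client on a typical ROS robot) instead of setting the pose directly.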
If you are running this project on the Ballbot, check the messages and topics being published, and make certain the navigator in src/image_caption_machine/navigator/helper.py is listening on the appropriate topics. Also make certain all the necessary libraries are installed.
Thank you to Dr. Jean Oh and Dr. Ralph Hollis for their wisdom and helpful comments throughout this project. Special thanks to Rachel Burcin, John Dolan, Ziqi Guo, and the many other key organizers of the 2018 Robotics Institute Summer Scholars (R.I.S.S.) program. Thank you to the Microdynamic Systems Lab (MSL) for providing a space to work on this project, and thank you also to RISS, MSL, the Carnegie Mellon University Robotics Institute, and National Science Foundation grant IIS-1547143 for funding this project.