
image_captioning's Introduction

Image Captioning with Deep Bidirectional LSTMs

This branch hosts the code for our paper "Image Captioning with Deep Bidirectional LSTMs", accepted at ACM MM 2016. A demonstration is also available.

Features

  • Training with Bidirectional LSTMs
  • Implemented data augmentation: multi-crops, multi-scale, vertical mirroring
  • Variant Bidirectional LSTMs: Bi-F-LSTM, Bi-S-LSTM
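
The augmentation modes listed above can be illustrated with a small sketch. It only enumerates which views of an image would be produced and is not the authors' implementation. The layout (four corner crops plus a centre crop), the square resized image, the sizes 256, 288 and 224, and the reading of mirroring as a left-right flip are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class CropSpec(NamedTuple):
    """One augmented view: resize to `scale`, crop at (top, left), optionally mirror."""
    scale: int    # side length the (square) image is resized to before cropping
    top: int      # row of the crop's upper-left corner
    left: int     # column of the crop's upper-left corner
    size: int     # side length of the square crop
    mirror: bool  # whether the crop is flipped left-right afterwards


def augmentation_specs(scales: List[int], crop_size: int) -> List[CropSpec]:
    """Enumerate corner and centre crops at each scale, with and without mirroring."""
    specs = []
    for scale in scales:
        if scale < crop_size:
            raise ValueError("each scale must be at least crop_size")
        margin = scale - crop_size
        offsets = [
            (0, 0), (0, margin), (margin, 0), (margin, margin),  # four corners
            (margin // 2, margin // 2),                          # centre
        ]
        for top, left in offsets:
            for mirror in (False, True):
                specs.append(CropSpec(scale, top, left, crop_size, mirror))
    return specs


# Illustrative values only: two scales, one crop size.
specs = augmentation_specs(scales=[256, 288], crop_size=224)
print(len(specs), "augmented views per image")
```

With two scales, five crop positions and two mirror states, each image yields 20 views in this sketch.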

Usage and Example

  • This work extends "Long-term Recurrent Convolutional Networks (LRCN)" to bidirectional LSTMs with data augmentation.
  • We provide a Flickr8K example with which you can train the proposed networks:
  • (1) Download the Flickr8K training and test images and put them in "data/flickr8K/images/"; the dataset splits can be found in "data/flickr8K/texts/".
  • (2) Create the databases with "flickr8K_to_hdf5_data_forward.py" and "flickr8K_to_hdf5_data_backward.py".
  • (3) Train the network with "multi_train_Bi_LSTM.sh".
  • (4) Run the image caption generation and image-sentence retrieval experiments with "bi_generation_retrieval.py".
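
Before running the steps above, it can help to confirm that the expected files and directories are in place. The following is a minimal sketch, not part of the repository; it assumes that the scripts live under "examples/flickr8K/" and that it is run from the repository root.

```python
from pathlib import Path
from typing import List, Sequence

# Paths named in the usage steps, relative to the repository root.
# The examples/flickr8K/ location of the scripts is an assumption.
EXPECTED = (
    "data/flickr8K/images",
    "data/flickr8K/texts",
    "examples/flickr8K/flickr8K_to_hdf5_data_forward.py",
    "examples/flickr8K/flickr8K_to_hdf5_data_backward.py",
    "examples/flickr8K/multi_train_Bi_LSTM.sh",
)


def missing_paths(repo_root: str, expected: Sequence[str] = EXPECTED) -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in expected if not (root / rel).exists()]


# Report anything missing relative to the current directory.
for rel in missing_paths("."):
    print("missing:", rel)
```

An empty result means every path in the list exists; it does not check that the image directory actually contains the dataset.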

Citation

Please cite our paper in your publications if it helps your research:
@inproceedings{wang2016image,
title={Image captioning with deep bidirectional LSTMs},
author={Wang, Cheng and Yang, Haojin and Bartz, Christian and Meinel, Christoph},
booktitle={Proceedings of the 2016 ACM on Multimedia Conference},
pages={988--997},
year={2016},
organization={ACM}}


The following is the original README of Caffe.

Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Join the chat at https://gitter.im/BVLC/caffe

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}


image_captioning's Issues

Do I need to install something additional?

tools/mask.py", line 3, in <module>
import pycocotools._mask as _mask
ImportError: No module named pycocotools._mask
hengliu@Armari1:/storage/hengliu/image_captionin
Do I need to install anything additional to make flickr8K_to_hdf5_data_forward.py work?
I hope you can help me soon!
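
One way to narrow down an error like this is to ask Python which modules it can locate before running the conversion script. The sketch below uses only the standard library. The module list is an assumption: "pycocotools._mask" comes from the traceback above, while "h5py" and "numpy" are guesses at what an HDF5 conversion script would need.

```python
import importlib.util
from typing import Dict, List


def check_modules(names: List[str]) -> Dict[str, bool]:
    """Report, for each dotted module name, whether Python can locate it."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # ImportError is raised when a parent package of a dotted name is missing.
            found[name] = False
    return found


# Module names are illustrative; adjust them to the imports your scripts use.
print(check_modules(["pycocotools._mask", "h5py", "numpy"]))
```

A value of False for "pycocotools._mask" would suggest that the package is either not installed or that its compiled extension has not been built in the environment being used.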

Message type "caffe.TransformationParameter" has no field named "multi_crop".

rzai@rzai00:~/prj/image_captioning$ bash examples/flickr8K/multi_train_Bi_LSTM.sh
I1203 11:15:46.983738 28061 caffe.cpp:217] Using GPUs 0
I1203 11:15:47.008774 28061 caffe.cpp:222] GPU 0: GeForce GTX 1080
I1203 11:15:48.340289 28061 solver.cpp:48] Initializing solver from parameters:
test_iter: 50
test_interval: 2000
base_lr: 0.01
display: 50
max_iter: 25000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 10000
snapshot: 1000
snapshot_prefix: "./examples/flickr8K/multi_Bi_LSTM_trained_models/multi_Bi_LSTM"
solver_mode: GPU
device_id: 0
random_seed: 1701
net: "./examples/flickr8K/multi_Bi_LSTM.prototxt"
train_state {
level: 0
stage: ""
}
average_loss: 100
clip_gradients: 10
I1203 11:15:48.340517 28061 solver.cpp:91] Creating training net from net file: ./examples/flickr8K/multi_Bi_LSTM.prototxt
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 24:15: Message type "caffe.TransformationParameter" has no field named "multi_crop".
F1203 11:15:48.340638 28061 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./examples/flickr8K/multi_Bi_LSTM.prototxt
*** Check failure stack trace: ***
@ 0x7f28b9064daa (unknown)
@ 0x7f28b9064ce4 (unknown)
@ 0x7f28b90646e6 (unknown)
@ 0x7f28b9067687 (unknown)
@ 0x7f28b97a8f2e caffe::ReadNetParamsFromTextFileOrDie()
@ 0x7f28b9798bcb caffe::Solver<>::InitTrainNet()
@ 0x7f28b9799c9c caffe::Solver<>::Init()
@ 0x7f28b9799fca caffe::Solver<>::Solver()
@ 0x7f28b9683253 caffe::Creator_SGDSolver<>()
@ 0x40f0fe caffe::SolverRegistry<>::CreateSolver()
@ 0x408134 train()
@ 0x405b3c main
@ 0x7f28b8070f45 (unknown)
@ 0x4063ab (unknown)
@ (nil) (unknown)
examples/flickr8K/multi_train_Bi_LSTM.sh: line 10: 28061 Aborted (core dumped) $CAFFE/build/tools/caffe train -solver ./examples/flickr8K/multi_Bi_LSTM_solver.prototxt -weights $WEIGHTS -gpu $GPU_ID
rzai@rzai00:~/prj/image_captioning$ ll examples/flickr8K/multi_Bi_LSTM.prototxt
-rw-rw-r-- 1 rzai rzai 10634 Oct 24 09:39 examples/flickr8K/multi_Bi_LSTM.prototxt
rzai@rzai00:~/prj/image_captioning$
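
The parser error says that the Caffe binary in use was built from a "caffe.proto" whose "TransformationParameter" message has no "multi_crop" field. Since the script invokes "$CAFFE/build/tools/caffe", one possible cause is that "$CAFFE" points at a different Caffe checkout than this branch. The sketch below is a rough diagnostic, not a full protobuf parser: it lists the field names found inside a message block of a ".proto" file so that two source trees can be compared.

```python
import re
from typing import List


def message_fields(proto_text: str, message: str) -> List[str]:
    """Return names that look like field declarations inside `message`.

    Simplification: nested blocks are included, so enum values would be
    listed too, and braces inside comments are not handled specially.
    """
    text = re.sub(r"//[^\n]*", "", proto_text)  # drop line comments
    start = re.search(r"\bmessage\s+%s\s*\{" % re.escape(message), text)
    if start is None:
        return []
    depth = 1
    pos = start.end()
    while pos < len(text) and depth > 0:  # walk to the matching closing brace
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
        pos += 1
    body = text[start.end():pos - 1]
    # A field looks like "<name> = <number>" followed by ";" or "[options]".
    return re.findall(r"\b(\w+)\s*=\s*\d+\s*[;\[]", body)


sample = """
message TransformationParameter {
  optional float scale = 1 [default = 1];  // pixel scaling
  optional bool mirror = 2 [default = false];
  optional uint32 crop_size = 3 [default = 0];
}
"""
print("multi_crop" in message_fields(sample, "TransformationParameter"))
```

To use it on real sources, read "src/caffe/proto/caffe.proto" from both this repository and the tree that "$CAFFE" points to, and compare the two results. The sample text above is a shortened stand-in, not a copy of either file.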

Training Loss

Could you please share the training loss you observed at the end of training? I am trying to train the model but am not getting the expected results. Thank you.

Expecting a more detailed getting-started guide (the path issue)

rzai@rzai00:~/prj/image_captioning/examples$ python flickr8K/flickr8K_to_hdf5_data_forward.py
Traceback (most recent call last):
File "flickr8K/flickr8K_to_hdf5_data_forward.py", line 23, in <module>
from coco import COCO
ImportError: No module named coco
rzai@rzai00:~/prj/image_captioning/examples$
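
An import of this form only works if the directory containing "coco.py" is on Python's module search path. Where that file lives, or whether the repository ships it at all, is not stated here, so the following sketch simply searches a directory tree for it and shows how a hit could be added to "sys.path". Both helper names are hypothetical.

```python
import os
import sys
from typing import List


def find_module_dirs(root: str, filename: str = "coco.py") -> List[str]:
    """Return absolute paths of all directories under root that contain filename."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            hits.append(os.path.abspath(dirpath))
    return sorted(hits)


def add_to_sys_path(directories: List[str]) -> None:
    """Prepend directories to sys.path so that plain imports can find them."""
    for directory in directories:
        if directory not in sys.path:
            sys.path.insert(0, directory)


# Search from the current directory; print candidates instead of guessing one.
candidates = find_module_dirs(".")
print(candidates)
```

If exactly one directory is reported, exporting it through PYTHONPATH before running the script has the same effect as calling add_to_sys_path on it.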

Out of Memory

Could you please share the minimum hardware requirements for running the code? I have an Nvidia 1060 with 6GB RAM, and it fails with the out-of-memory error "Check failed: error == cudaSuccess (2 vs. 0) out of memory".
Regards

multi_crop

Thanks for sharing the code!
When I trained the model using the ".sh" file, I got the error "caffe.TransformationParameter" has no field named "multi_crop" while parsing "multi_Bi_LSTM.prototxt". How did you add the field "multi_crop"? Is it because of the Caffe version?

Protobuf Error

Hi!
While compiling the code, I get the following error:

AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so
CXX tools/extract_features.cpp
CXX/LD -o .build_release/tools/extract_features.bin
.build_release/tools/extract_features.o: In function `int feature_extraction_pipeline<float>(int, char**)':
extract_features.cpp:(.text._Z27feature_extraction_pipelineIfEiiPPc[_Z27feature_extraction_pipelineIfEiiPPc]+0xbf0): undefined reference to `google::protobuf::internal::fixed_address_empty_string'
extract_features.cpp:(.text._Z27feature_extraction_pipelineIfEiiPPc[_Z27feature_extraction_pipelineIfEiiPPc]+0xf94): undefined reference to `google::protobuf::MessageLite::SerializeToString(std::string*) const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::AssignDescriptors(std::string const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::MessageFactory*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::GetTypeName() const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::string const&))'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::DebugString() const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::OnShutdownDestroyString(std::string const*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::MessageLite::ParseFromString(std::string const&)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::NameOfEnum(google::protobuf::EnumDescriptor const*, int)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteString(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::InitializationErrorString() const'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/extract_features.bin] Error 1

I am installing on Ubuntu 14.04 using Anaconda. I have tried compiling with protobuf installed both through conda install and through apt-get, but I can't get it to work.

Can you please help me out?
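
Undefined references to protobuf symbols at link time often mean that the headers used for compilation and the library used for linking come from different protobuf installations, which is plausible when both a conda and an apt-get copy are present. That diagnosis is an assumption, not something the log proves. The sketch below lists every "libprotobuf.so*" file in a set of library directories so that duplicate installations become visible. The directory names in the example call are guesses.

```python
import glob
import os
from typing import List


def find_protobuf_libs(lib_dirs: List[str]) -> List[str]:
    """Return all libprotobuf shared libraries found directly in the given directories."""
    hits = []
    for directory in lib_dirs:
        pattern = os.path.join(os.path.expanduser(directory), "libprotobuf.so*")
        hits.extend(glob.glob(pattern))
    return sorted(hits)


# Example directories only; replace them with the library paths on your system.
print(find_protobuf_libs(["/usr/lib/x86_64-linux-gnu", "~/anaconda2/lib"]))
```

If copies turn up in more than one directory, checking which one the build configuration's include and library paths actually select would be a reasonable next step.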

Just a quick question

How do you make sure that the sentences generated by forward propagation and backward propagation have the same length?
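
The page does not answer this question. For training data, at least, equal lengths follow naturally if the backward sequence is just the forward caption reversed and both are padded to a fixed length. The sketch below illustrates that idea only. It is an assumption about how such pairs could be prepared, not a description of the authors' scripts or of how generated sentences are handled, and the function name and padding token are hypothetical.

```python
from typing import List, Tuple


def forward_backward_pair(
    tokens: List[str], max_len: int, pad: str = "<pad>"
) -> Tuple[List[str], List[str]]:
    """Build a forward sequence and its reversed counterpart, padded to max_len."""
    forward = list(tokens[:max_len])      # truncate overly long captions
    backward = forward[::-1]              # same words, reversed order
    padding = [pad] * (max_len - len(forward))
    return forward + padding, backward + padding


fwd, bwd = forward_backward_pair(["a", "dog", "runs"], max_len=5)
print(fwd)
print(bwd)
```

Because both sequences are derived from the same token list, they always contain the same number of real tokens and the same amount of padding.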

Building this Caffe version fails, but building the newest Caffe succeeds on Ubuntu 16.04

mldl@mldlUB1604:/media/mldl/data1t/os_prj/image_captioning$ make
PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/blob.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from src/caffe/blob.cpp:4:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:124:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
pad_h, pad_w, stride_h, stride_w));
^
./include/caffe/util/cudnn.hpp:12:28: note: in definition of macro ‘CUDNN_CHECK’
cudnnStatus_t status = condition;
^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
from ./include/caffe/util/device_alternate.hpp:40,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from src/caffe/blob.cpp:4:
/usr/local/cuda-8.0/include/cudnn.h:803:27: note: declared here
cudnnStatus_t CUDNNWINAPI cudnnSetPooling2dDescriptor(
^
Makefile:518: recipe for target '.build_release/src/caffe/blob.o' failed
make: *** [.build_release/src/caffe/blob.o] Error 1
mldl@mldlUB1604:/media/mldl/data1t/os_prj/image_captioning$
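
The log shows that the installed "cudnn.h" declares "cudnnSetPooling2dDescriptor" with a "cudnnNanPropagation_t" parameter that this code does not pass, which suggests the installed cuDNN is newer than the release this Caffe snapshot was written against. To confirm which version is installed, the version macros can be read from the header. The sketch below assumes the header defines "CUDNN_MAJOR", "CUDNN_MINOR" and "CUDNN_PATCHLEVEL", and the sample text is made up for illustration.

```python
import re
from typing import Optional, Tuple


def cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from the text of a cuDNN header, if present."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            r"^\s*#\s*define\s+%s\s+(\d+)" % key, header_text, re.MULTILINE
        )
        if match is None:
            return None  # macro not found in this header
        parts.append(int(match.group(1)))
    return (parts[0], parts[1], parts[2])


# Made-up header fragment; on a real system, read the file instead, e.g.
# open("/usr/local/cuda-8.0/include/cudnn.h").read()
sample_header = """
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 10
"""
print(cudnn_version(sample_header))
```

Once the version is known, the options are to install a cuDNN release that matches the code or to build without cuDNN, depending on what the build configuration supports.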
