deepsemantic / image_captioning

Image Captioning with Deep Bidirectional LSTMs

License: Other

CMake 1.74% Makefile 0.41% C++ 53.07% Cuda 3.03% MATLAB 0.58% Python 8.49% Shell 0.25% Jupyter Notebook 32.23% Cython 0.20%

image_captioning's Introduction

Image Captioning with Deep Bidirectional LSTMs

This branch hosts the code for our paper "Image Captioning with Deep Bidirectional LSTMs", accepted at ACMMM 2016. See also the Demonstration.

Features

Training with Bidirectional LSTMs
Implemented data augmentation: multi-crops, multi-scale, vertical mirroring
Variant Bidirectional LSTMs: Bi-F-LSTM, Bi-S-LSTM
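
As a rough illustration of how the three augmentations listed above combine, the sketch below enumerates the views produced for one image. It is not the code used in this repository: the scale values (256, 288), the crop size (224), the five-crop layout (four corners plus centre) and the function names are all assumptions chosen for the example, and mirroring is taken to mean a flip about the vertical axis.

```python
from itertools import product


def crop_boxes(width, height, crop):
    """Return five (left, top, right, bottom) boxes: four corners and the centre."""
    if crop > width or crop > height:
        raise ValueError("crop size exceeds image size")
    xs = (0, width - crop)
    ys = (0, height - crop)
    boxes = [(x, y, x + crop, y + crop) for y, x in product(ys, xs)]
    cx, cy = (width - crop) // 2, (height - crop) // 2
    boxes.append((cx, cy, cx + crop, cy + crop))
    return boxes


def augmentation_plan(scales=(256, 288), crop=224, mirror=True):
    """List every (scale, box, mirrored) view generated for one image.

    Each image is assumed to be resized to scale x scale before cropping.
    """
    plan = []
    for scale in scales:
        for box in crop_boxes(scale, scale, crop):
            plan.append((scale, box, False))
            if mirror:
                plan.append((scale, box, True))  # same crop, flipped left-right
    return plan


if __name__ == "__main__":
    for view in augmentation_plan()[:4]:
        print(view)
```

With these assumed settings each image yields 2 scales x 5 crops x 2 mirror states = 20 views; the boxes can be passed to whatever image library performs the actual resize and crop.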

Usage and Example

This work extends "Long-term Recurrent Convolutional Networks (LRCN)" to bidirectional LSTMs with data augmentation.
We provide a Flickr8K example with which you can train the proposed networks:
(1) Download the Flickr8K training and test images and put them in "data/flickr8K/images/"; the dataset splits can be found in "data/flickr8K/texts/".
(2) Create the databases with "flickr8K_to_hdf5_data_forward.py" and "flickr8K_to_hdf5_data_backward.py".
(3) Train the network with "multi_train_Bi_LSTM.sh".
(4) Run the image caption generation and image-sentence retrieval experiments with "bi_generation_retrieval.py".
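
Steps (2) to (4) can be chained in a small driver such as the sketch below. The script names come from the list above; the directory "examples/flickr8K" is an assumption based on the paths that appear in the issue logs, and the location of "bi_generation_retrieval.py" in particular is a guess. The scripts may also expect a specific working directory, so treat this as a starting point only.

```python
import subprocess


def pipeline_steps(script_dir="examples/flickr8K"):
    """Return the commands for steps (2)-(4) in order, without running them.

    `script_dir` is an assumed location; adjust it to your checkout.
    """
    return [
        ["python", f"{script_dir}/flickr8K_to_hdf5_data_forward.py"],
        ["python", f"{script_dir}/flickr8K_to_hdf5_data_backward.py"],
        ["bash", f"{script_dir}/multi_train_Bi_LSTM.sh"],
        ["python", f"{script_dir}/bi_generation_retrieval.py"],
    ]


def run_pipeline(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cmd in pipeline_steps():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Running with dry_run=True only prints the commands, which is a cheap way to confirm the paths before starting a long training job.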

Citation

Please cite in your publications if it helps your research:
@inproceedings{wang2016image,
title={Image captioning with deep bidirectional LSTMs},
author={Wang, Cheng and Yang, Haojin and Bartz, Christian and Meinel, Christoph},
booktitle={Proceedings of the 2016 ACM on Multimedia Conference},
pages={988--997},
year={2016},
organization={ACM}}

The original README of Caffe follows.

Caffe

Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.

Check out the project site for all the details and step-by-step examples.

Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.

Happy brewing!

License and Citation

Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.

Please cite Caffe in your publications if it helps your research:

@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}


image_captioning's Issues

Do I need to install something additional?

tools/mask.py", line 3, in <module>
import pycocotools._mask as _mask
ImportError: No module named pycocotools._mask
hengliu@Armari1:/storage/hengliu/image_captionin
Do I need to install anything additional to make flickr8K_to_hdf5_data_forward.py work?
Hope you can help me quickly!
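
A quick way to narrow this down is to check which modules the interpreter can actually import. In the COCO Python tools, "pycocotools._mask" is a compiled Cython extension, so one plausible cause (an assumption, not confirmed by the maintainers) is that the extension was never built for this Python environment. The helper name below is hypothetical.

```python
import importlib


def is_importable(name):
    """Return True if `name` can be imported in the current interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # If the package is found but the extension is not, the Cython
    # module probably still needs to be compiled.
    for module in ("pycocotools", "pycocotools._mask"):
        print(module, "OK" if is_importable(module) else "MISSING")
```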

Message type "caffe.TransformationParameter" has no field named "multi_crop".

rzai@rzai00:/prj/image_captioning$ bash examples/flickr8K/multi_train_Bi_LSTM.sh
I1203 11:15:46.983738 28061 caffe.cpp:217] Using GPUs 0
I1203 11:15:47.008774 28061 caffe.cpp:222] GPU 0: GeForce GTX 1080
I1203 11:15:48.340289 28061 solver.cpp:48] Initializing solver from parameters:
test_iter: 50
test_interval: 2000
base_lr: 0.01
display: 50
max_iter: 25000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 10000
snapshot: 1000
snapshot_prefix: "./examples/flickr8K/multi_Bi_LSTM_trained_models/multi_Bi_LSTM"
solver_mode: GPU
device_id: 0
random_seed: 1701
net: "./examples/flickr8K/multi_Bi_LSTM.prototxt"
train_state {
level: 0
stage: ""
}
average_loss: 100
clip_gradients: 10
I1203 11:15:48.340517 28061 solver.cpp:91] Creating training net from net file: ./examples/flickr8K/multi_Bi_LSTM.prototxt
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 24:15: Message type "caffe.TransformationParameter" has no field named "multi_crop".
F1203 11:15:48.340638 28061 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./examples/flickr8K/multi_Bi_LSTM.prototxt
*** Check failure stack trace: ***
@ 0x7f28b9064daa (unknown)
@ 0x7f28b9064ce4 (unknown)
@ 0x7f28b90646e6 (unknown)
@ 0x7f28b9067687 (unknown)
@ 0x7f28b97a8f2e caffe::ReadNetParamsFromTextFileOrDie()
@ 0x7f28b9798bcb caffe::Solver<>::InitTrainNet()
@ 0x7f28b9799c9c caffe::Solver<>::Init()
@ 0x7f28b9799fca caffe::Solver<>::Solver()
@ 0x7f28b9683253 caffe::Creator_SGDSolver<>()
@ 0x40f0fe caffe::SolverRegistry<>::CreateSolver()
@ 0x408134 train()
@ 0x405b3c main
@ 0x7f28b8070f45 (unknown)
@ 0x4063ab (unknown)
@ (nil) (unknown)
examples/flickr8K/multi_train_Bi_LSTM.sh: line 10: 28061 Aborted (core dumped) $CAFFE/build/tools/caffe train -solver ./examples/flickr8K/multi_Bi_LSTM_solver.prototxt -weights $WEIGHTS -gpu $GPU_ID
rzai@rzai00:/prj/image_captioning$ ll examples/flickr8K/multi_Bi_LSTM.prototxt
-rw-rw-r-- 1 rzai rzai 10634 Oct 24 09:39 examples/flickr8K/multi_Bi_LSTM.prototxt
rzai@rzai00:~/prj/image_captioning$
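
The parser error means the caffe binary that read the prototxt was built from a caffe.proto that does not declare "multi_crop" in TransformationParameter. Since the training script calls $CAFFE/build/tools/caffe, one plausible cause (an assumption) is that $CAFFE points at a stock Caffe checkout rather than this repository. The sketch below is a rough text check of a caffe.proto file; the function name is hypothetical, and the brace matching ignores comments, so treat the result as a hint only.

```python
import os
import re


def message_has_field(proto_text, message, field):
    """Check whether `message` in a .proto source declares `field`."""
    start = re.search(r"\bmessage\s+%s\s*\{" % re.escape(message), proto_text)
    if start is None:
        return False
    # Walk forward to the brace that closes this message.
    depth, i = 1, start.end()
    while i < len(proto_text) and depth > 0:
        if proto_text[i] == "{":
            depth += 1
        elif proto_text[i] == "}":
            depth -= 1
        i += 1
    body = proto_text[start.end():i]
    # A field declaration looks like "<name> = <number>".
    return re.search(r"\b%s\s*=\s*\d+" % re.escape(field), body) is not None


if __name__ == "__main__":
    path = "src/caffe/proto/caffe.proto"  # path as it appears in the build log
    if os.path.isfile(path):
        with open(path) as f:
            print(message_has_field(f.read(), "TransformationParameter", "multi_crop"))
    else:
        print("caffe.proto not found at", path)
```

Run it from the root of the Caffe tree that $CAFFE points to; a result of False would suggest that tree is not the one shipped with this repository.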

Training Loss

Can you please share the training loss that you experienced at the end of your training? I am trying to train the model but not getting expected results. Thank you

Expecting a more detailed guide to get started ... the path issue

rzai@rzai00:/prj/image_captioning/examples$ python flickr8K/flickr8K_to_hdf5_data_forward.py
Traceback (most recent call last):
File "flickr8K/flickr8K_to_hdf5_data_forward.py", line 23, in <module>
from coco import COCO
ImportError: No module named coco
rzai@rzai00:/prj/image_captioning/examples$
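
The import fails because no directory on sys.path contains a "coco" module when the script is started from "examples". Assuming the repository ships a coco.py somewhere in its tree (not verified here), a sketch like the following can locate it and put its directory on the path; both function names are hypothetical.

```python
import os
import sys


def find_module_dirs(root, module="coco"):
    """Return the directories under `root` that contain `<module>.py`."""
    target = module + ".py"
    return sorted(d for d, _, files in os.walk(root) if target in files)


def add_to_path(root, module="coco"):
    """Prepend the first matching directory to sys.path; return all matches."""
    dirs = find_module_dirs(root, module)
    if dirs and dirs[0] not in sys.path:
        sys.path.insert(0, dirs[0])
    return dirs


if __name__ == "__main__":
    # "." is assumed to be the repository root.
    print(find_module_dirs("."))
```

Setting PYTHONPATH to the directory it reports, or running the script from that directory, should have the same effect.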

Out of Memory

Can you please share the minimum hardware requirements for the code to run? I have an Nvidia 1060 with 6GB of memory, and it gives me the out-of-memory exception "Check failed: error == cudaSuccess (2 vs. 0) out of memory".
Regards

multi_crop

Thanks for sharing the code!
But I ran into a problem when training the model with the ".sh" file. Parsing "multi_Bi_LSTM.prototxt" raised the error: "caffe.TransformationParameter" has no field named "multi_crop". How did you add the field "multi_crop"? Is it because of the Caffe version?

Is there an existing script to split Flickr8K into train/val/test?
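
The README states that the dataset splits are provided in "data/flickr8K/texts/", so a custom split may not be needed. If one is wanted anyway, a minimal sketch follows. The function name is hypothetical, and the default sizes (1000 validation and 1000 test images, the rest for training) are an assumption that mirrors the commonly used 6000/1000/1000 Flickr8K split.

```python
import random


def split_dataset(names, n_val=1000, n_test=1000, seed=0):
    """Shuffle `names` deterministically and split into train/val/test lists."""
    if n_val + n_test > len(names):
        raise ValueError("not enough items for the requested split")
    shuffled = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    names = ["img%04d.jpg" % i for i in range(8000)]
    train, val, test = split_dataset(names)
    print(len(train), len(val), len(test))
```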

Protobuf Error

Hi !
While compiling the code, I get the following error:

AR -o .build_release/lib/libcaffe.a
LD -o .build_release/lib/libcaffe.so
CXX tools/extract_features.cpp
CXX/LD -o .build_release/tools/extract_features.bin
.build_release/tools/extract_features.o: In function `int feature_extraction_pipeline<float>(int, char**)':
extract_features.cpp:(.text._Z27feature_extraction_pipelineIfEiiPPc[_Z27feature_extraction_pipelineIfEiiPPc]+0xbf0): undefined reference to `google::protobuf::internal::fixed_address_empty_string'
extract_features.cpp:(.text._Z27feature_extraction_pipelineIfEiiPPc[_Z27feature_extraction_pipelineIfEiiPPc]+0xf94): undefined reference to `google::protobuf::MessageLite::SerializeToString(std::string*) const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::AssignDescriptors(std::string const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::MessageFactory*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::GetTypeName() const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::string const&))'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::DebugString() const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::OnShutdownDestroyString(std::string const*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::MessageLite::ParseFromString(std::string const&)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::NameOfEnum(google::protobuf::EnumDescriptor const*, int)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteString(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::InitializationErrorString() const'
collect2: error: ld returned 1 exit status
make: *** [.build_release/tools/extract_features.bin] Error 1

I am installing on Ubuntu 14.04 using Anaconda. I have tried compiling with protobuf installed both through conda install and through apt-get, but I can't get it to work.

Can you please help me out?

Just a quick question

How do you make sure that the sentences generated by forward propagation and backward propagation have the same length?

Building this Caffe version fails, but building the newest Caffe succeeds on Ubuntu 16.04

mldl@mldlUB1604:/media/mldl/data1t/os_prj/image_captioning$ make
PROTOC src/caffe/proto/caffe.proto
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/blob.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from src/caffe/blob.cpp:4:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::createPoolingDesc(cudnnPoolingStruct**, caffe::PoolingParameter_PoolMethod, cudnnPoolingMode_t*, int, int, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:124:41: error: too few arguments to function ‘cudnnStatus_t cudnnSetPooling2dDescriptor(cudnnPoolingDescriptor_t, cudnnPoolingMode_t, cudnnNanPropagation_t, int, int, int, int, int, int)’
pad_h, pad_w, stride_h, stride_w));
^
./include/caffe/util/cudnn.hpp:12:28: note: in definition of macro ‘CUDNN_CHECK’
cudnnStatus_t status = condition;
^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
from ./include/caffe/util/device_alternate.hpp:40,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from src/caffe/blob.cpp:4:
/usr/local/cuda-8.0/include/cudnn.h:803:27: note: declared here
cudnnStatus_t CUDNNWINAPI cudnnSetPooling2dDescriptor(
^
Makefile:518: recipe for target '.build_release/src/caffe/blob.o' failed
make: *** [.build_release/src/caffe/blob.o] Error 1
mldl@mldlUB1604:/media/mldl/data1t/os_prj/image_captioning$
