
show_and_tell.tensorflow's Introduction

Neural Caption Generator

  • TensorFlow implementation of "Show and Tell": http://arxiv.org/abs/1411.4555
  • Borrows some code and ideas from Andrej Karpathy's NeuralTalk.
  • You need the Flickr30k dataset (images and annotations).

Code

  • make_flickr_dataset.py : Extracts VGG FC7 features of Flickr30k images and saves them in './data/feats.npy'
  • model.py : the TensorFlow version of the model (training and testing code)

Usage

  • Flickr30k Dataset Download
  • Extract VGG features of Flickr30k images (make_flickr_dataset.py)
  • Train: run train() in model.py
  • Test: run test() or test_tf() in model.py (see the sketch below this list)
  • Parameters: the VGG FC7 feature of the test image and the trained model path
  • Once you download the TensorFlow VGG net (one of the links below), you don't need Caffe for testing.
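A minimal sketch of the workflow above, using the function names this README gives for model.py; the argument order, the placeholder feature, and the model path are assumptions, not the actual signatures:

import numpy as np
import model  # this repo's model.py

model.train()  # trains on ./data/feats.npy and the Flickr30k annotations

# test() takes the VGG FC7 feature of a test image and a trained model path;
# this placeholder feature and the argument order are assumptions:
test_feat = np.load('./data/feats.npy')[0]
model.test(test_feat, './models/tensorflow/model-72')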

Downloading data/trained model

  • Extracted FC7 data: download
  • This is used by the train() function in model.py. Using it, you can skip the feature-extraction step.
  • Pretrained model: download
  • This is used by test() and test_tf() in model.py. If you do not have time for training, or if you just want to check out captioning, download and test this model.
  • TensorFlow VGG net: download
  • This file is used by test_tf() in model.py.
  • Along with the files above, you might want to download the Flickr30k annotation data from link


License

  • BSD license

show_and_tell.tensorflow's People

Contributors

jazzsaxmafia


show_and_tell.tensorflow's Issues

How do I calculate the BLEU score for the model?

Hello! I really appreciate your great implementation of the NIC model. I've revised your code and completed a training run based on it. How can I compute the model's BLEU score to see how well it works compared to state-of-the-art results? Are there any resources for computing BLEU or other scores such as METEOR or perplexity?
Thanks a lot :)
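A hedged sketch of one way to do this with NLTK (not part of this repo); the coco-caption toolkit (pycocoevalcap) also computes BLEU, METEOR, and CIDEr. Here references holds the tokenized ground-truth captions per image and hypotheses the generated ones:

from nltk.translate.bleu_score import corpus_bleu

references = [[['a', 'dog', 'is', 'running'], ['a', 'dog', 'runs']]]
hypotheses = [['a', 'dog', 'is', 'running']]

print('BLEU-4:', corpus_bleu(references, hypotheses))
print('BLEU-1:', corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))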

How long does the process take?

I'm running the feature-extraction step, but I can't see any progress on screen beyond 'np.save(feat_path, feats)
ipdb>'
I know it's a huge CNN, but how long will it take? And how can I get progress printed to the screen?
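A minimal, self-contained sketch of printing progress per batch; the names here (extract_batch, image_paths, batch_size) are assumptions for illustration, not the actual code in make_flickr_dataset.py:

import numpy as np

def extract_batch(paths):
    # stand-in for the VGG FC7 forward pass over one batch of images
    return np.zeros((len(paths), 4096))

image_paths = ['img_%d.jpg' % i for i in range(10)]
batch_size = 4
feats = np.zeros((len(image_paths), 4096))

for start in range(0, len(image_paths), batch_size):
    batch = image_paths[start:start + batch_size]
    feats[start:start + len(batch)] = extract_batch(batch)
    print('processed %d / %d images' % (start + len(batch), len(image_paths)))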

The n_lstm_steps problem

In the code, the captions are filtered: words whose frequency is below a certain threshold are not counted. I found a spot that I think is problematic. Initially the captions are unfiltered, the length of the longest caption is taken as n_lstm_steps, and shorter captions are zero-padded to that length. But the data we actually feed has been filtered, which means even the longest filtered caption is shorter than n_lstm_steps and has to be zero-padded. Any suggestions?
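A hedged sketch of one way to address this (the names here are assumptions): recompute the padded length from the captions after the rare-word filter, so n_lstm_steps matches the sequences actually fed to the LSTM rather than the longest unfiltered caption:

from collections import Counter

captions = ['a man rides a horse', 'a man walks a dog']
threshold = 2

word_counts = Counter(w for c in captions for w in c.lower().split())
filtered = [[w for w in c.lower().split() if word_counts[w] >= threshold]
            for c in captions]
maxlen = max(len(f) for f in filtered)  # size n_lstm_steps from this, not the unfiltered max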

Question about different results for the same picture

Hi

I downloaded the code and it worked very well, nice stuff.
Just one question: when testing the same picture, I get different results from run to run.
Is that just because of floating-point rounding in the weights?

Thank you.
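Floating-point rounding is usually deterministic for the same graph on the same hardware (though some GPU ops are not); more often the variation comes from RNG, e.g. sampling, dropout, or re-initialized variables. A hedged sketch for ruling out RNG (TF 1.x-style API): fix the seeds before building the graph:

import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.set_random_seed(42)  # graph-level seed; set before any ops are created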

What is the mask variable used for?

Hello,

Great job implementing the paper, and thanks!
However, I have a question about the 'mask' variable.
What is it used for in the LSTM? I do not see any related variable in the LSTM equations.

Thanks for your help!
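A minimal sketch of what the mask typically does here (TF 1.x style; the exact variable names in model.py may differ): it zeroes out the per-timestep cross-entropy at padded positions, so it appears in the loss, not in the LSTM equations themselves:

import tensorflow as tf

cross_entropy = tf.constant([2.3, 1.1, 0.7, 0.5])  # loss at each timestep
mask = tf.constant([1.0, 1.0, 1.0, 0.0])           # 0 where the caption is padding

masked = cross_entropy * mask
loss = tf.reduce_sum(masked) / tf.reduce_sum(mask)  # average over real tokens only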

test_tf: NameError: global name 'crop_image' is not defined

I'm new to TensorFlow and deep learning; I just found your code and made a test run. But I got this error:
.......
File ".../show_and_tell.tensorflow-master/model.py", line 269, in read_image
img = crop_image(path, target_height=224, target_width=224)
NameError: global name 'crop_image' is not defined

Does this mean I'm missing some library?
My environment:
Win 10 Pro, TensorFlow 0.10.0, and Python 2.7.6

Best regards,
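crop_image appears to live elsewhere in this repo (cnn_util.py), so the likely fix is importing it into model.py (from cnn_util import crop_image) rather than a missing library. A hedged sketch of an equivalent resize-then-center-crop, in case you want to define it directly:

import cv2
import numpy as np

def crop_image(path, target_height=224, target_width=224):
    image = cv2.imread(path).astype(np.float32)
    height, width = image.shape[:2]
    # resize so the short side matches the target, then center-crop the rest
    if height < width:
        image = cv2.resize(image, (int(width * target_height / height), target_height))
    else:
        image = cv2.resize(image, (target_width, int(height * target_width / width)))
    height, width = image.shape[:2]
    top = (height - target_height) // 2
    left = (width - target_width) // 2
    return image[top:top + target_height, left:left + target_width]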

ValueError: setting an array element with a sequence.

The call fc7_tf, generated_words_tf = caption_generator.build_generator(maxlen=maxlen) in the IPython demo fails at
state = tf.zeros([1, self.lstm.state_size]) in model.py.
I am using Python 2.7 and TF 1.3.0.
I am getting this error; how can I resolve it so that the code runs properly?
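A hedged sketch of the usual fix: in TF 1.x, BasicLSTMCell.state_size is an LSTMStateTuple rather than an int, so tf.zeros([1, self.lstm.state_size]) fails. Build the initial state with zero_state instead:

import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(256)

# instead of: state = tf.zeros([1, lstm.state_size])
state = lstm.zero_state(batch_size=1, dtype=tf.float32)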

Where to get the annotation file?

When I train the model, I cannot find the file 'results_20130124.token'. Would you publish the file or tell me how to generate it?

Thank you!

Hi, a suggestion of shuffling data :)

Hi,

A suggestion for shuffling data in train():

import numpy as np

# shuffle features and captions with the same permutation
index = np.arange(len(feats))
np.random.shuffle(index)
feats = feats[index]
captions = captions[index]

These lines should be executed each time an epoch finishes.

Looking forward to more discussion of the TensorFlow implementation!

Issue with ipython_demo as well as test_tf

Having issues getting this implementation running. I have downloaded the pretrained model, the TensorFlow VGG net, and the ixtoword dict, and set the correct paths.

When attempting to run the IPython demo as well as test_tf() in model.py, I get the error below.

Anyone else encounter this or have a solution?
Thanks,
Joe

ValueError Traceback (most recent call last)
in ()
31
32
---> 33 fc7_tf, generated_words_tf = caption_generator.build_generator(maxlen=maxlength)
34
35 saver = tf.train.Saver()

/home/joe/edsproject/show_and_tell.tensorflow/model.pyc in build_generator(self, maxlen)
91 image_emb = tf.matmul(image, self.encode_img_W) + self.encode_img_b
92
---> 93 state = tf.zeros([1, self.lstm.state_size])
94 #last_word = image_emb # the image instead of the first word
95 generated_words = []

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.pyc in zeros(shape, dtype, name)
1182 output = constant(zero, shape=shape, dtype=dtype, name=name)
1183 except (TypeError, ValueError):
-> 1184 shape = ops.convert_to_tensor(shape, dtype=dtypes.int32, name="shape")
1185 output = fill(shape, constant(zero, dtype=dtype), name=name)
1186 assert output.dtype.base_dtype == dtype

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.pyc in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
655
656 if ret is None:
--> 657 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
658
659 if ret is NotImplemented:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.pyc in _constant_tensor_conversion_function(v, dtype, name, as_ref)
178 as_ref=False):
179 _ = as_ref
--> 180 return constant(v, dtype=dtype, name=name)
181
182

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.pyc in constant(value, dtype, shape, name)
161 tensor_value = attr_value_pb2.AttrValue()
162 tensor_value.tensor.CopyFrom(
--> 163 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
164 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
165 const_tensor = g.create_op(

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.pyc in make_tensor_proto(values, dtype, shape)
352 else:
353 _AssertCompatible(values, dtype)
--> 354 nparray = np.array(values, dtype=np_dt)
355 # check to them.
356 # We need to pass in quantized values as tuples, so don't apply the shape

ValueError: setting an array element with a sequence.

Questions

If I understand your code correctly, you use the FC7-layer output of a pretrained VGG net as input to your model. However, your model has another trainable layer that computes the embedding from FC7. Is that correct? Couldn't you just use FC7 as the embedding?
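One reason for the extra layer: it maps the fixed 4096-d FC7 vector into the LSTM's embedding space, so the embedding size need not be 4096 and the projection can adapt during training while VGG stays frozen. A minimal sketch (TF 1.x style; the variable names follow the traceback elsewhere on this page, the sizes are assumptions):

import tensorflow as tf

dim_embed = 256
image = tf.placeholder(tf.float32, [None, 4096])  # frozen FC7 features

encode_img_W = tf.Variable(tf.random_uniform([4096, dim_embed], -0.1, 0.1))
encode_img_b = tf.Variable(tf.zeros([dim_embed]))

image_emb = tf.matmul(image, encode_img_W) + encode_img_b  # trainable projection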

KeyError: 'fc7' while running make_flickr_dataset.py

I encountered an error like the one below:
Traceback (most recent call last):
File "make_flickr_dataset.py", line 23, in
feats = cnn.get_features(annotations['image'].values)
File "/home/lei/show_and_tell.tensorflow_lei/cnn_util.py", line 80, in get_features
out = self.net.forward_all(blobs=[layers], **{'data':caffe_in})
File "/home/lei/caffe/python/caffe/pycaffe.py", line 202, in _Net_forward_all
outs = self.forward(blobs=blobs, **batch)
File "/home/lei/caffe/python/caffe/pycaffe.py", line 134, in _Net_forward
return {out: self.blobs[out].data for out in outputs}
File "/home/lei/caffe/python/caffe/pycaffe.py", line 134, in
return {out: self.blobs[out].data for out in outputs}
KeyError: 'fc7'

How do I fix it?
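A hedged sketch of a first diagnostic: the KeyError usually means the deploy prototxt in use does not define a blob named 'fc7'. Listing the net's blobs shows what it actually exposes (the file paths here are placeholders):

import caffe

net = caffe.Net('VGG_ILSVRC_16_layers_deploy.prototxt',
                'VGG_ILSVRC_16_layers.caffemodel', caffe.TEST)
print(list(net.blobs.keys()))  # should include 'fc7' for the VGG-16 deploy net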

Vector serialization

Dear author,
I have a question: the LSTM's input should be a sequence of vectors, but the CNN outputs a single vector with no sequence structure. How is the CNN's output vector turned into a sequence and fed into the LSTM? How is this process implemented?
I would appreciate an answer, and I am looking forward to your reply.
From: Kobe20
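In "Show and Tell" the CNN vector is not unrolled into a sequence: it is fed to the LSTM once, as the input at the first step, and word embeddings are the inputs at every later step. A minimal runnable sketch (TF 1.x style; names and sizes are assumptions):

import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(256)
state = lstm.zero_state(batch_size=1, dtype=tf.float32)

image_emb = tf.zeros([1, 256])        # stand-in for the projected FC7 vector
word_embs = [tf.zeros([1, 256])] * 5  # stand-ins for per-step word embeddings

with tf.variable_scope('RNN'):
    output, state = lstm(image_emb, state)  # step 0: the image
    for w in word_embs:                     # later steps: the words
        tf.get_variable_scope().reuse_variables()
        output, state = lstm(w, state)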

Code not running on the latest TF version (1.7)

Hi,
I am trying to run your code on the latest version of TF (1.7), but it throws this error:
"Traceback (most recent call last):
File "model.py", line 332, in
train() # Do not use pretrained model.
File "model.py", line 212, in train
loss, image, sentence, mask = caption_generator.build_model()
File "model.py", line 82, in build_model
output, state = self.lstm( current_emb, state ) # (batch_size, dim_hidden)
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 298, in call
*args, **kwargs)
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 714, in call
outputs = self.call(inputs, *args, **kwargs)
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 574, in call
c, h = state
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 400, in iter
"Tensor objects are not iterable when eager execution is not "
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn."
It looks like this error arises because the code was written for an old 0.x release of TF. Could you please update the code so that it runs on the latest version of TF?

Thanks
Rahul
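A hedged sketch of the usual fix: in TF 1.x the cell's state is an LSTMStateTuple (c, h), so a state hand-built with tf.zeros cannot be unpacked by c, h = state inside the cell. Building the state with zero_state restores the tuple the cell expects:

import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(256)
state = lstm.zero_state(batch_size=32, dtype=tf.float32)
c, h = state  # unpacks, because state is an LSTMStateTuple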

Learning cost becomes stagnant after 300 iterations (epochs)

Hello,

First of all, thank you for publishing this great code base. I have trained the model up to 300 epochs, but the learning cost has not decreased much and stays around 1.5 to 1.6. Did you run into this kind of situation? I am running the code on my laptop, which has an NVIDIA GPU (920M), and a single epoch takes a long time (100 epochs took about 40 hours). I need your advice: should I continue training up to 1000 epochs, or is some trick required for further iterations?
Please note that after 300 epochs the model is still not able to generate proper captions for random images (Facebook, Instagram, etc.). I believe the loss should decrease further to get a proper model.

Thanks in advance.

Regards,
Dipanjan

Error in model-72

Hi,

There's another error: "NotFoundError (see above for traceback): Tensor name "RNN/basic_lstm_cell/kernel" not found in checkpoint files ./models/tensorflow/model-72".
Please check this.

Thanks
Rahul
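A hedged sketch of a first diagnostic: LSTM variable names changed across TF versions (e.g. RNN/BasicLSTMCell/Linear/Matrix in 0.x vs RNN/basic_lstm_cell/kernel in 1.x), so a checkpoint saved under one version may not load under another. Listing the names stored in the checkpoint shows what you have; a tf.train.Saver built with an {old_name: variable} dict can then remap them:

import tensorflow as tf

reader = tf.train.NewCheckpointReader('./models/tensorflow/model-72')
for name in sorted(reader.get_variable_to_shape_map()):
    print(name)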

Where is the dict?

I've downloaded the pretrained model, but I cannot find ./data/ixtoword.npy. Would you publish the dict file? Thanks.

How can I just test this model?

I downloaded everything linked below:
Extracted FC7 data: download

Pretrained model: download

TensorFlow VGG net: download

I just want to test, so I didn't install Caffe.
When I run ipython_demo.ipynb for testing, I hit many errors (e.g., import caffe ...).

If I just want to test (not train), how can I test the model?

Thanks, and sorry for my English.
