
show_and_tell.tensorflow's Introduction

Neural Caption Generator

  • TensorFlow implementation of "Show and Tell": http://arxiv.org/abs/1411.4555
  • Borrows some code and ideas from Andrej Karpathy's NeuralTalk.
  • You need the Flickr30k dataset (images and annotations).

Code

  • make_flickr_dataset.py : Extracts VGG FC7 features of Flickr30k images and saves them in './data/feats.npy'
  • model.py : the TensorFlow version of the model (training and testing code)

Usage

  • Flickr30k Dataset Download
  • Extract VGG features of Flickr30k images (make_flickr_dataset.py)
  • Train: run train() in model.py
  • Test: run test() or test_tf() in model.py (see the sketch below this list)
  • Parameters: the VGG FC7 feature of the test image and the trained model path
  • Once you download the TensorFlow VGG net (one of the links below), you don't need Caffe for testing.
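A minimal sketch of the workflow above, using the function names this README gives for model.py; the argument order, the placeholder feature, and the model path are assumptions, not the actual signatures:

import numpy as np
import model  # this repo's model.py

model.train()  # trains on ./data/feats.npy and the Flickr30k annotations

# test() takes the VGG FC7 feature of a test image and a trained model path;
# this placeholder feature and the argument order are assumptions:
test_feat = np.load('./data/feats.npy')[0]
model.test(test_feat, './models/tensorflow/model-72')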

Downloading data/trained model

  • Extracted FC7 data: download
  • This is used by the train() function in model.py. Using it, you can skip the feature-extraction step.
  • Pretrained model: download
  • This is used by test() and test_tf() in model.py. If you do not have time for training, or if you just want to check out captioning, download and test this model.
  • TensorFlow VGG net: download
  • This file is used by test_tf() in model.py.
  • Along with the files above, you might want to download the Flickr30k annotation data from link


License

  • BSD license

show_and_tell.tensorflow's People

Contributors

jazzsaxmafia


show_and_tell.tensorflow's Issues

How do I calculate the BLEU score for the model?

Hello! I really appreciate your great implementation of the NIC model. I've revised your code and completed a training run based on it. How can I compute the model's BLEU score to see how well it works compared to state-of-the-art results? Are there any resources for computing BLEU or other scores such as METEOR or perplexity?
Thanks a lot :)
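A hedged sketch of one way to do this with NLTK (not part of this repo); the coco-caption toolkit (pycocoevalcap) also computes BLEU, METEOR, and CIDEr. Here references holds the tokenized ground-truth captions per image and hypotheses the generated ones:

from nltk.translate.bleu_score import corpus_bleu

references = [[['a', 'dog', 'is', 'running'], ['a', 'dog', 'runs']]]
hypotheses = [['a', 'dog', 'is', 'running']]

print('BLEU-4:', corpus_bleu(references, hypotheses))
print('BLEU-1:', corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0)))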

How long does the process take?

I'm running the feature-extraction step, but I can't see any progress on screen beyond 'np.save(feat_path, feats)
ipdb>'
I know it's a huge CNN, but how long will it take? And how can I get progress printed to the screen?
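A minimal, self-contained sketch of printing progress per batch; the names here (extract_batch, image_paths, batch_size) are assumptions for illustration, not the actual code in make_flickr_dataset.py:

import numpy as np

def extract_batch(paths):
    # stand-in for the VGG FC7 forward pass over one batch of images
    return np.zeros((len(paths), 4096))

image_paths = ['img_%d.jpg' % i for i in range(10)]
batch_size = 4
feats = np.zeros((len(image_paths), 4096))

for start in range(0, len(image_paths), batch_size):
    batch = image_paths[start:start + batch_size]
    feats[start:start + len(batch)] = extract_batch(batch)
    print('processed %d / %d images' % (start + len(batch), len(image_paths)))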

The n_lstm_steps problem

In the code, the captions are filtered: words whose frequency is below a certain threshold are not counted. I found a spot that I think is problematic. Initially the captions are unfiltered, the length of the longest caption is taken as n_lstm_steps, and shorter captions are zero-padded to that length. But the data we actually feed has been filtered, which means even the longest filtered caption is shorter than n_lstm_steps and has to be zero-padded. Any suggestions?
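A hedged sketch of one way to address this (the names here are assumptions): recompute the padded length from the captions after the rare-word filter, so n_lstm_steps matches the sequences actually fed to the LSTM rather than the longest unfiltered caption:

from collections import Counter

captions = ['a man rides a horse', 'a man walks a dog']
threshold = 2

word_counts = Counter(w for c in captions for w in c.lower().split())
filtered = [[w for w in c.lower().split() if word_counts[w] >= threshold]
            for c in captions]
maxlen = max(len(f) for f in filtered)  # size n_lstm_steps from this, not the unfiltered max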

Question about different results for the same picture

Hi

I downloaded the code and it worked very well, nice stuff.
Just one question: when testing the same picture, I get different results from run to run.
Is that just because of floating-point rounding in the weights?

Thank you.
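Floating-point rounding is usually deterministic for the same graph on the same hardware (though some GPU ops are not); more often the variation comes from RNG, e.g. sampling, dropout, or re-initialized variables. A hedged sketch for ruling out RNG (TF 1.x-style API): fix the seeds before building the graph:

import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.set_random_seed(42)  # graph-level seed; set before any ops are created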

What is the mask variable used for?

Hello,

Great job implementing the paper, and thanks!
However, I have a question about the 'mask' variable.
What is it used for in the LSTM? I do not see any related variable in the LSTM equations.

Thanks for your help!
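A minimal sketch of what the mask typically does here (TF 1.x style; the exact variable names in model.py may differ): it zeroes out the per-timestep cross-entropy at padded positions, so it appears in the loss, not in the LSTM equations themselves:

import tensorflow as tf

cross_entropy = tf.constant([2.3, 1.1, 0.7, 0.5])  # loss at each timestep
mask = tf.constant([1.0, 1.0, 1.0, 0.0])           # 0 where the caption is padding

masked = cross_entropy * mask
loss = tf.reduce_sum(masked) / tf.reduce_sum(mask)  # average over real tokens only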

test_tf: NameError: global name 'crop_image' is not defined

I'm new to TensorFlow and deep learning; I just found your code and made a test run. But I got this error:
.......
File ".../show_and_tell.tensorflow-master/model.py", line 269, in read_image
img = crop_image(path, target_height=224, target_width=224)
NameError: global name 'crop_image' is not defined

Does this mean I'm missing some library?
My environment:
Win 10 Pro, TensorFlow 0.10.0, and Python 2.7.6

Best regards,
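crop_image appears to live elsewhere in this repo (cnn_util.py), so the likely fix is importing it into model.py (from cnn_util import crop_image) rather than a missing library. A hedged sketch of an equivalent resize-then-center-crop, in case you want to define it directly:

import cv2
import numpy as np

def crop_image(path, target_height=224, target_width=224):
    image = cv2.imread(path).astype(np.float32)
    height, width = image.shape[:2]
    # resize so the short side matches the target, then center-crop the rest
    if height < width:
        image = cv2.resize(image, (int(width * target_height / height), target_height))
    else:
        image = cv2.resize(image, (target_width, int(height * target_width / width)))
    height, width = image.shape[:2]
    top = (height - target_height) // 2
    left = (width - target_width) // 2
    return image[top:top + target_height, left:left + target_width]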

ValueError: setting an array element with a sequence.

The call fc7_tf, generated_words_tf = caption_generator.build_generator(maxlen=maxlen) in the IPython demo fails at
state = tf.zeros([1, self.lstm.state_size]) in model.py.
I am using Python 2.7 and TF 1.3.0.
I am getting this error; how can I resolve it so that the code runs properly?
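A hedged sketch of the usual fix: in TF 1.x, BasicLSTMCell.state_size is an LSTMStateTuple rather than an int, so tf.zeros([1, self.lstm.state_size]) fails. Build the initial state with zero_state instead:

import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(256)

# instead of: state = tf.zeros([1, lstm.state_size])
state = lstm.zero_state(batch_size=1, dtype=tf.float32)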

Where to get the annotation file?

When I train the model, I cannot find the file 'results_20130124.token'. Would you publish the file or tell me how to generate it?

Thank you!

Hi, a suggestion of shuffling data :)

Hi,

A suggestion for shuffling data in train():

import numpy as np

# shuffle features and captions with the same permutation
index = np.arange(len(feats))
np.random.shuffle(index)
feats = feats[index]
captions = captions[index]

These lines should be executed each time an epoch finishes.

Looking forward to more discussion of the TensorFlow implementation!

Issue with ipython_demo as well as test_tf

Having issues getting this implementation running. I have downloaded the pretrained model, the TensorFlow VGG net, and the ixtoword dict, and set the correct paths.

When attempting to run the IPython demo as well as test_tf() in model.py, I get the error below.

Anyone else encounter this or have a solution?
Thanks,
Joe

ValueError Traceback (most recent call last)
in ()
31
32
---> 33 fc7_tf, generated_words_tf = caption_generator.build_generator(maxlen=maxlength)
34
35 saver = tf.train.Saver()

/home/joe/edsproject/show_and_tell.tensorflow/model.pyc in build_generator(self, maxlen)
91 image_emb = tf.matmul(image, self.encode_img_W) + self.encode_img_b
92
---> 93 state = tf.zeros([1, self.lstm.state_size])
94 #last_word = image_emb # the image instead of the first word
95 generated_words = []

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.pyc in zeros(shape, dtype, name)
1182 output = constant(zero, shape=shape, dtype=dtype, name=name)
1183 except (TypeError, ValueError):
-> 1184 shape = ops.convert_to_tensor(shape, dtype=dtypes.int32, name="shape")
1185 output = fill(shape, constant(zero, dtype=dtype), name=name)
1186 assert output.dtype.base_dtype == dtype

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.pyc in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype)
655
656 if ret is None:
--> 657 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
658
659 if ret is NotImplemented:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.pyc in _constant_tensor_conversion_function(v, dtype, name, as_ref)
178 as_ref=False):
179 _ = as_ref
--> 180 return constant(v, dtype=dtype, name=name)
181
182

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/constant_op.pyc in constant(value, dtype, shape, name)
161 tensor_value = attr_value_pb2.AttrValue()
162 tensor_value.tensor.CopyFrom(
--> 163 tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape))
164 dtype_value = attr_value_pb2.AttrValue(type=tensor_value.tensor.dtype)
165 const_tensor = g.create_op(

/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_util.pyc in make_tensor_proto(values, dtype, shape)
352 else:
353 _AssertCompatible(values, dtype)
--> 354 nparray = np.array(values, dtype=np_dt)
355 # check to them.
356 # We need to pass in quantized values as tuples, so don't apply the shape

ValueError: setting an array element with a sequence.

Questions

If I understand your code correctly, you use the FC7-layer output of a pretrained VGG net as input to your model. However, your model has another trainable layer that computes the embedding from FC7. Is that correct? Couldn't you just use FC7 as the embedding?
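One reason for the extra layer: it maps the fixed 4096-d FC7 vector into the LSTM's embedding space, so the embedding size need not be 4096 and the projection can adapt during training while VGG stays frozen. A minimal sketch (TF 1.x style; the variable names follow the traceback elsewhere on this page, the sizes are assumptions):

import tensorflow as tf

dim_embed = 256
image = tf.placeholder(tf.float32, [None, 4096])  # frozen FC7 features

encode_img_W = tf.Variable(tf.random_uniform([4096, dim_embed], -0.1, 0.1))
encode_img_b = tf.Variable(tf.zeros([dim_embed]))

image_emb = tf.matmul(image, encode_img_W) + encode_img_b  # trainable projection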

KeyError: 'fc7' while running make_flickr_dataset.py

I encountered an error like the one below:
Traceback (most recent call last):
File "make_flickr_dataset.py", line 23, in
feats = cnn.get_features(annotations['image'].values)
File "/home/lei/show_and_tell.tensorflow_lei/cnn_util.py", line 80, in get_features
out = self.net.forward_all(blobs=[layers], **{'data':caffe_in})
File "/home/lei/caffe/python/caffe/pycaffe.py", line 202, in _Net_forward_all
outs = self.forward(blobs=blobs, **batch)
File "/home/lei/caffe/python/caffe/pycaffe.py", line 134, in _Net_forward
return {out: self.blobs[out].data for out in outputs}
File "/home/lei/caffe/python/caffe/pycaffe.py", line 134, in
return {out: self.blobs[out].data for out in outputs}
KeyError: 'fc7'

How do I fix it?
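A hedged sketch of a first diagnostic: the KeyError usually means the deploy prototxt in use does not define a blob named 'fc7'. Listing the net's blobs shows what it actually exposes (the file paths here are placeholders):

import caffe

net = caffe.Net('VGG_ILSVRC_16_layers_deploy.prototxt',
                'VGG_ILSVRC_16_layers.caffemodel', caffe.TEST)
print(list(net.blobs.keys()))  # should include 'fc7' for the VGG-16 deploy net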

Vector serialization

Dear author,
I have a question: the LSTM's input should be a sequence of vectors, but the CNN outputs a single vector with no sequence structure. How is the CNN's output vector turned into a sequence and fed into the LSTM? How is this process implemented?
I would appreciate an answer, and I am looking forward to your reply.
From: Kobe20
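In "Show and Tell" the CNN vector is not unrolled into a sequence: it is fed to the LSTM once, as the input at the first step, and word embeddings are the inputs at every later step. A minimal runnable sketch (TF 1.x style; names and sizes are assumptions):

import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(256)
state = lstm.zero_state(batch_size=1, dtype=tf.float32)

image_emb = tf.zeros([1, 256])        # stand-in for the projected FC7 vector
word_embs = [tf.zeros([1, 256])] * 5  # stand-ins for per-step word embeddings

with tf.variable_scope('RNN'):
    output, state = lstm(image_emb, state)  # step 0: the image
    for w in word_embs:                     # later steps: the words
        tf.get_variable_scope().reuse_variables()
        output, state = lstm(w, state)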

Code not running on the latest TF version (1.7)

Hi,
I am trying to run your code on the latest version of TF (1.7), but it throws this error:
"Traceback (most recent call last):
File "model.py", line 332, in
train() # Do not use pretrained model.
File "model.py", line 212, in train
loss, image, sentence, mask = caption_generator.build_model()
File "model.py", line 82, in build_model
output, state = self.lstm( current_emb, state ) # (batch_size, dim_hidden)
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 298, in call
*args, **kwargs)
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 714, in call
outputs = self.call(inputs, *args, **kwargs)
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 574, in call
c, h = state
File "/anaconda/envs/python2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 400, in iter
"Tensor objects are not iterable when eager execution is not "
TypeError: Tensor objects are not iterable when eager execution is not enabled. To iterate over this tensor use tf.map_fn."
It looks like this error arises because the code was written for an old 0.x release of TF. Could you please update the code so that it runs on the latest version of TF?

Thanks
Rahul
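A hedged sketch of the usual fix: in TF 1.x the cell's state is an LSTMStateTuple (c, h), so a state hand-built with tf.zeros cannot be unpacked by c, h = state inside the cell. Building the state with zero_state restores the tuple the cell expects:

import tensorflow as tf

lstm = tf.nn.rnn_cell.BasicLSTMCell(256)
state = lstm.zero_state(batch_size=32, dtype=tf.float32)
c, h = state  # unpacks, because state is an LSTMStateTuple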

Learning cost becomes stagnant after 300 iterations (epochs)

Hello,

First of all, thank you for publishing this great code base. I have trained the model up to 300 epochs, but the learning cost has not decreased much and stays around 1.5 to 1.6. Did you run into this kind of situation? I am running the code on my laptop, which has an NVIDIA GPU (920M), and a single epoch takes a long time (100 epochs took about 40 hours). I need your advice: should I continue training up to 1000 epochs, or is some trick required for further iterations?
Please note that after 300 epochs the model is still not able to generate proper captions for random images (Facebook, Instagram, etc.). I believe the loss should decrease further to get a proper model.

Thanks in advance.

Regards,
Dipanjan

Error in model-72

Hi,

There's another error: "NotFoundError (see above for traceback): Tensor name "RNN/basic_lstm_cell/kernel" not found in checkpoint files ./models/tensorflow/model-72".
Please check this.

Thanks
Rahul
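A hedged sketch of a first diagnostic: LSTM variable names changed across TF versions (e.g. RNN/BasicLSTMCell/Linear/Matrix in 0.x vs RNN/basic_lstm_cell/kernel in 1.x), so a checkpoint saved under one version may not load under another. Listing the names stored in the checkpoint shows what you have; a tf.train.Saver built with an {old_name: variable} dict can then remap them:

import tensorflow as tf

reader = tf.train.NewCheckpointReader('./models/tensorflow/model-72')
for name in sorted(reader.get_variable_to_shape_map()):
    print(name)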

Where is the dict?

I've downloaded the pretrained model, but I cannot find ./data/ixtoword.npy. Would you publish the dict file? Thanks.

How can I just test this model?

I downloaded everything linked below:
Extracted FC7 data: download

Pretrained model: download

TensorFlow VGG net: download

I just want to test, so I didn't install Caffe.
When I run ipython_demo.ipynb for testing, I hit many errors (e.g., import caffe ...).

If I just want to test (not train), how can I test the model?

Thanks, and sorry for my English.
