karpathy / neuraltalk2

Efficient Image Captioning code in Torch, runs on GPU

Lua 30.47% Shell 0.20% Python 4.26% HTML 0.70% Jupyter Notebook 64.37%

neuraltalk2's Introduction

I like deep neural nets.

neuraltalk2's Issues

size mismatch, m1: [1 x 512], m2: [25088 x 4096]

When running the code on this image:

http://cs.stanford.edu/people/karpathy/neuraltalk2/imgs/img1.jpg

(from demo website)
with a CPU-pretrained model from

http://cs.stanford.edu/people/karpathy/neuraltalk2/checkpoint_v1_cpu.zip

the following error shows up.

th eval.lua -gpuid -1 -model model_id1-501-1448236541.t7_cpu.t7 -image_folder img/ -num_images -1
DataLoaderRaw loading images from folder:   img/    
listing all images in directory img/    
Image added: img/img1.jpg   
DataLoaderRaw found 1 images    
constructing clones inside the LanguageModel    
/Users/tt/torch/install/bin/luajit: /Users/tt/torch/install/share/lua/5.1/nn/Linear.lua:53: size mismatch, m1: [1 x 512], m2: [25088 x 4096] at /tmp/luarocks_torch-scm-1-8011/torch7/lib/TH/generic/THTensorMath.c:706
stack traceback:
    [C]: in function 'addmm'
    /Users/tt/torch/install/share/lua/5.1/nn/Linear.lua:53: in function 'updateOutput'
    /Users/tt/torch/install/share/lua/5.1/nn/Sequential.lua:29: in function 'forward'
    eval.lua:121: in function 'eval_split'
    eval.lua:173: in main chunk
    [C]: in function 'dofile'
    ...s/tt/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x0102587340

I'm on Mac OS X 10.10.5.

It seems that there is a mismatch between the trained weights and the activations propagated through the net.
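The shapes in the error message can be checked by hand. A layer whose matrix is 25088 x 4096 expects a flattened input of length 25088, which is what 512 channels over a 7 x 7 feature map would give (512 * 7 * 7 = 25088), whereas m1 in the error has only 512 columns. The Python sketch below only reproduces that arithmetic. The 7 x 7 spatial size is an assumption based on the usual VGG-16 layout, and `can_multiply` is a made-up helper name, not part of neuraltalk2 or Torch.

```python
def can_multiply(m1_shape, m2_shape):
    """Return True if an (a x b) matrix can be multiplied by a (c x d) matrix."""
    return m1_shape[1] == m2_shape[0]

# Shapes copied from the error message above.
m1 = (1, 512)
m2 = (25088, 4096)

# Assumption: 512 channels over a 7 x 7 spatial grid, as in the usual VGG-16 layout.
flattened = 512 * 7 * 7

print(can_multiply(m1, m2))               # False: 512 != 25088
print(flattened)                          # 25088
print(can_multiply((1, flattened), m2))   # True
```

If this reading is right, the input reaching the linear layer has lost its spatial dimensions somewhere before that point, but the log alone does not say where.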

Checkpoint crash

Hi Andrej,

I'm trying to use NeuralTalk2 on my CPU (and so with the CPU checkpoint) using the following command:
th eval.lua -model model_id1-501-1448236541.t7_cpu.t7 -image_folder ../frames/ -num_images 10 -gpuid -1

It throws the following error:
ldb must be >= MAX(K,1): ldb=0 K=768BLAS error: Parameter number 11 passed to cblas_sgemm had an invalid value

After tracing back, I've narrowed down the source of the error to this line in eval.lua (line 134):
local seq = protos.lm:sample(feats, sample_opts)

Any pointers why this might be crashing?

Thanks a lot!

Dockerfile & image available for those interested

I have created a Dockerfile for amd64 (and am working on the ARM version). It's available at https://github.com/SaMnCo/docker-neuraltalk2
It's at a really early stage and only does the captioning, but if I see interest I'll add features.

The image is available on the Docker Hub (or will be once it's built) at https://hub.docker.com/r/samnco/neuraltalk2/

Hope you'll like it, thanks again for this amazing piece of work :)

Error while running eval.lua

While evaluating images,

th eval.lua -model model_id1-501-1448236541.t7 -image_folder images/ -num_images -1

I get this error,

/home/ubuntu/torch-distro/install/bin/luajit: ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:277: unknown object
stack traceback:
    [C]: in function 'error'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:277: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:257: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:257: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:271: in function 'readObject'
    ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:294: in function 'load'
    eval.lua:68: in main chunk
    [C]: in function 'dofile'
    ...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x00406670

I have little idea how to solve this. Any help would be great!

COCO images and Torch's image reader

I tried to run your pretrained model on the COCO validation set. It didn't work, and I figured out that some images in COCO are PNGs, although they have a .jpg extension. This confuses Torch's image reader, which reports "not a JPEG file". OpenCV doesn't have this problem because it detects the image format using the header, not the filename.

Did you encounter this when working on COCO data? If so, what processing did you do?
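Header-based detection of the kind described above can be sketched in a few lines of Python. This is only an illustration of the idea, not what OpenCV or Torch actually do internally; the function names are made up. It relies on the standard file signatures: JPEG files start with the bytes FF D8 FF, and PNG files start with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A.

```python
def sniff_image_format(path):
    """Guess the image format from the first bytes of the file, ignoring its name."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header == b"\x89PNG\r\n\x1a\n":
        return "png"
    return "unknown"


def find_mislabeled(paths):
    """Return the .jpg/.jpeg paths whose content is not actually JPEG."""
    bad = []
    for p in paths:
        if p.lower().endswith((".jpg", ".jpeg")) and sniff_image_format(p) != "jpeg":
            bad.append(p)
    return bad
```

Running `find_mislabeled` over the validation folder would list the files that need to be converted or renamed before they are handed to a reader that trusts the extension.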

Getting error on CPU only mode

I followed the whole tutorial for installing the dependencies, but I can't install cutorch and cunn because I don't have an NVIDIA card. I'm on OS X 10.11.1.

I downloaded the model and tried to use it like this:

th eval.lua -gpuid -1 -model model_id1-501-1448236541.t7_cpu.t7

But I get the following error:

DataLoader loading json file:   /scr/r6/karpathy/cocotalk.json
/Users/miguel/torch/install/bin/luajit: ./misc/utils.lua:17: attempt to index local 'file' (a nil value)
stack traceback:
    ./misc/utils.lua:17: in function 'read_json'
    ./misc/DataLoader.lua:10: in function '__init'
    /Users/miguel/torch/install/share/lua/5.1/torch/init.lua:91: in function </Users/miguel/torch/install/share/lua/5.1/torch/init.lua:87>
    [C]: in function 'DataLoader'
    eval.lua:84: in main chunk
    [C]: in function 'dofile'
    ...guel/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x01035bdbd0

What can I do?

The same as "attempt to concatenate local 'ext' (a nil value)" #11

Hi, I have exactly the same issue, even though I am running the latest code, in which this issue has been patched. (I checked #12 as well, and there is no problem with my copy of "DataLoaderRaw.lua".)
OS X 10.9.5, MacBook Pro Retina, 15-inch, Late 2013 (no GPU). It seems to be caused by hidden files, but there are no hidden files in the /img/ directory where the target images are stored.

I have been trying to get "neuraltalk2" to run and it seems to be getting close, so could somebody please point out the cause of this problem?

[The error is below (exactly the same as in #11)]

$ th eval.lua -model /Users/usrname/neuraltalk2/model_id1-501-1448236541.t7_cpu.t7 -image_folder /Users/usrname/neuraltalk2/img/ -num_images 10 -gpuid -1
DataLoaderRaw loading images from folder: /Users/usrname/neuraltalk2/img/
listing all images in directory /Users/usrname/neuraltalk2/img/
DataLoaderRaw found 14 images
constructing clones inside the LanguageModel
/Users/usrname/torch/install/bin/luajit: /Users/usrname/torch/install/share/lua/5.1/image/init.lua:346: attempt to concatenate local 'ext' (a nil value)
stack traceback:
/Users/usrname/torch/install/share/lua/5.1/image/init.lua:346: in function 'load'
./misc/DataLoaderRaw.lua:82: in function 'getBatch'
eval.lua:116: in function 'eval_split'
eval.lua:173: in main chunk
[C]: in function 'dofile'
...s/usrname/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x010e9f77b0

[/img/ ls -la]
$ ls -la
total 2088
drwxr-xr-x 14 usrname staff 476 12 5 16:38 .
drwxr-xr-x 19 usrname staff 646 12 5 16:38 ..
-rw-r--r--@ 1 usrname staff 102769 12 4 23:01 1.jpg
-rw-r--r--@ 1 usrname staff 112326 12 4 22:58 10.jpg
-rw-r--r--@ 1 usrname staff 158805 12 4 22:57 11.jpg
-rw-r--r--@ 1 usrname staff 29750 12 4 22:58 12.jpg
-rw-r--r--@ 1 usrname staff 47949 12 4 22:57 2.jpg
-rw-r--r--@ 1 usrname staff 52914 12 4 22:58 3.jpg
-rw-r--r--@ 1 usrname staff 35022 12 4 22:57 4.jpg
-rw-r--r--@ 1 usrname staff 141824 12 4 22:56 5.jpg
-rw-r--r--@ 1 usrname staff 128698 12 4 22:56 6.jpg
-rw-r--r--@ 1 usrname staff 185112 12 4 22:59 7.jpg
-rw-r--r--@ 1 usrname staff 29393 12 4 22:59 8.jpg
-rw-r--r--@ 1 usrname staff 18712 12 4 22:55 9.jpg
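One detail in the output above may be relevant: DataLoaderRaw reports 14 images, while the listing shows 12 .jpg files. The count of 14 would match if the `.` and `..` entries were also being picked up, and a name without an extension is the kind of input that could leave `ext` as nil. The log does not prove this, so treat it as a hypothesis. The Python sketch below shows the kind of filtering that would exclude such entries; the extension set and function names are illustrative assumptions, not the logic in DataLoaderRaw.lua.

```python
import os

# Assumption: this set of extensions is only illustrative.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".pgm"}


def is_image_name(name):
    """True for names like '1.jpg'; False for '.', '..', dotfiles and extensionless names."""
    if name.startswith("."):
        return False
    ext = os.path.splitext(name)[1].lower()
    return ext in IMAGE_EXTENSIONS


def filter_image_names(names):
    """Keep only the directory entries that look like image files."""
    return sorted(n for n in names if is_image_name(n))


entries = [".", "..", "1.jpg", "10.jpg", ".DS_Store", "README"]
print(filter_image_names(entries))  # ['1.jpg', '10.jpg']
```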

System took control over my computer, and wants world domination

Should I reboot it?

Error when processing training images - No such file or directory IOError

Hi.

I have installed all the dependencies. I was trying to run prepro.py as mentioned in the documentation. I ran into an issue which I believe is different from issue #4 mentioned in the documentation.

Here are the contents of my coco folder after running the ipython tutorial:

>>> ls coco/*
coco/captions_train-val2014.zip  coco/coco_preprocess.ipynb  coco/coco_raw.json  coco/cocotalk.h5

coco/annotations:
captions_train2014.json  captions_val2014.json

coco/images:
captions_train2014.json  captions_val2014.json

When I run prepro.py I get the following error:

parsed input parameters:
{
  "output_json": "coco/cocotalk.json", 
  "images_root": "coco/images", 
  "input_json": "coco/coco_raw.json", 
  "word_count_threshold": 5, 
  "max_length": 16, 
  "output_h5": "coco/cocotalk.h5", 
  "num_test": 5000, 
  "num_val": 5000
}
example processed tokens:
['a', 'woman', 'riding', 'a', 'bike', 'down', 'a', 'bike', 'trail']
...
top words and their counts:
(1019751, 'a')
(224731, 'on')
...
(35371, 'woman')
total words: 6447836
number of bad words: 20059/29625 = 67.71%
number of words in vocab would be 9566
number of UNKs: 34543/6447836 = 0.54%
max length sentence in raw data:  49
sentence length distribution (count, number of words):
 0:          0   0.000000%
 1:          0   0.000000%
 ... 
 49:          4   0.000649%
inserting the special UNK token
assigned 5000 to val, 5000 to test.
encoded captions to array of size  (616767, 16)
Traceback (most recent call last):
  File "prepro.py", line 240, in <module>
    main(params)
  File "prepro.py", line 185, in main
    I = imread(os.path.join(params['images_root'], img['file_path']))
  File "/usr/local/lib/python2.7/dist-packages/scipy/misc/pilutil.py", line 154, in imread
    im = Image.open(name)
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1955, in open
    fp = __builtin__.open(fp, "rb")
IOError: [Errno 2] No such file or directory: u'coco/images/train2014/COCO_train2014_000000152328.jpg'

Could someone please help me out? Am I missing something here?

Thanks
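The listing above shows coco/images holding two JSON files rather than image folders such as train2014/, which would be consistent with the missing file in the traceback. A quick way to confirm before running prepro.py is to check every referenced path up front. The Python sketch below assumes the input JSON is a list of objects that each carry a `file_path` key (the key visible in the traceback); `missing_images` is a hypothetical helper, not part of prepro.py.

```python
import json
import os


def missing_images(input_json, images_root):
    """Return the file_path entries that do not exist under images_root."""
    with open(input_json) as f:
        entries = json.load(f)  # assumption: a list of dicts with a 'file_path' key
    return [
        e["file_path"]
        for e in entries
        if not os.path.isfile(os.path.join(images_root, e["file_path"]))
    ]
```

For the setup in this issue it would be called as `missing_images("coco/coco_raw.json", "coco/images")`; an empty list means every image the JSON refers to is in place.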

Adding notes on memory requirements?

I'm going to test this on a real computer tomorrow, but testing today on the 2GB GPU on my laptop I get an out of memory error with the 600MB pre-trained model.

I tried shutting everything else down in hope that 2GB was almost enough to run the model, but it doesn't seem to help (or even change the error message).

I tried running on the CPU using combinations of -gpuid -1 and -backend nn, but I get different errors. Here are all the errors, in order:

kyle@kyle ~/D/L/neuraltalk2 (master)> th eval.lua -model models/checkpoint_v1.t7 -image_folder images/
DataLoaderRaw loading images from folder:   images/ 
listing all images in directory images/ 
DataLoaderRaw found 8 images    
/Users/kyle/torch/install/bin/luajit: ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:99: cuda runtime error (2) : out of memory at /Users/kyle/torch/extra/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
    [C]: in function 'resizeAs'
    ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:99: in function 'createIODescriptors'
    ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:339: in function 'updateOutput'
    /Users/kyle/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    eval.lua:115: in function 'eval_split'
    eval.lua:163: in main chunk
    [C]: in function 'dofile'
    ...kyle/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010b4892d0
kyle@kyle ~/D/L/neuraltalk2 (master) [1]> th eval.lua -backend nn -model models/checkpoint_v1.t7 -image_folder images/
/Users/kyle/torch/install/bin/luajit: /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: unknown Torch class <cudnn.SpatialConvolution>
stack traceback:
    [C]: in function 'error'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:64: in main chunk
    [C]: in function 'dofile'
    ...kyle/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010f3862d0
kyle@kyle ~/D/L/neuraltalk2 (master) [1]> th eval.lua -gpuid -1 -model models/checkpoint_v1.t7 -image_folder images/
/Users/kyle/torch/install/bin/luajit: /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /Users/kyle/torch/install/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:64: in main chunk
    [C]: in function 'dofile'
    ...kyle/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010e5a42d0

read error: read 0 blocks instead of 1

I ran this command to predict captions for some images.
th eval.lua -model /home/ubuntu/neuraltalk2/model/ -image_folder ./images/ -num_images 1
I'm using an AWS server configured with Torch and Caffe.
I also tried specifying the name of the model file.
/home/ubuntu/torch-distro/install/bin/luajit: ...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:194: read error: read 0 blocks instead of 1 at /home/ubuntu/torch-distro/pkg/torch/lib/TH/THDiskFile.c:302
stack traceback:
[C]: in function 'readInt'
...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:194: in function 'readObject'
...ubuntu/torch-distro/install/share/lua/5.1/torch/File.lua:294: in function 'load'
eval.lua:68: in main chunk
[C]: in function 'dofile'
...rch-distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670

I'm getting this error and have little idea what it means.

cuda runtime error (2) : out of memory at THCStorage.cu

Hi All,
Ten days ago I managed to run the neuraltalk2 eval. Yesterday (29.12) I reinstalled Torch and its dependencies, such as cutorch. Since then I get the following error message when I try to run eval with the same parameters:

constructing clones inside the LanguageModel
/home/aron/torch/install/bin/luajit: /home/aron/torch/install/share/lua/5.1/torch/File.lua:298: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2709/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
[C]: in function 'read'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:298: in function </home/aron/torch/install/share/lua/5.1/torch/File.lua:212>
[C]: in function 'read'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:298: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:300: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
...
/home/aron/torch/install/share/lua/5.1/torch/File.lua:300: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/nngraph/gmodule.lua:402: in function 'read'
/home/aron/torch/install/share/lua/5.1/torch/File.lua:298: in function 'readObject'
/home/aron/torch/install/share/lua/5.1/nn/Module.lua:108: in function 'clone'
./misc/LanguageModel.lua:51: in function 'createClones'
eval.lua:96: in main chunk
[C]: in function 'dofile'
...aron/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

Using OpenCL instead of CUDA

Hi!

I have a Mac with an AMD video card, so I can't use CUDA. Is there any possibility of porting the project to OpenCL, which works with Intel, AMD and NVIDIA cards?

Issue when running on Jetson TX1

Good afternoon,

I'm trying to get Neuraltalk to run on a Jetson TX1.

I successfully managed to install Torch and all the other dependencies listed on the main page; however, I get this error when trying to run the command:
th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10

(of course all paths have been replaced with the correct path, I'm using the model provided)

/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:317: table index is nil
stack traceback:
/usr/local/share/lua/5.1/torch/File.lua:317: in function 'readObject'
/usr/local/share/lua/5.1/nn/Module.lua:154: in function 'read'
/usr/local/share/lua/5.1/torch/File.lua:298: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:316: in function 'readObject'
/usr/local/share/lua/5.1/torch/File.lua:347: in function 'load'
eval.lua:69: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000d055

Any help would be welcome

Regards

Can I run neuraltalk2 on a MacBook Air without CUDA?

First, I tried to use CUDA, but CUDA does not work on my MacBook.
The following problem occurs:
gimboseog-ui-MacBook-Air:neuraltalk2 JewelryKIM$ th eval.lua
/Users/JewelryKIM/torch/install/bin/luajit: ...rs/JewelryKIM/torch/install/share/lua/5.1/trepl/init.lua:383: cuda runtime error (35) : CUDA driver version is insufficient for CUDA runtime version at /tmp/luarocks_cutorch-scm-1-3737/cutorch/lib/THC/THCGeneral.c:16
stack traceback:
[C]: in function 'error'
...rs/JewelryKIM/torch/install/share/lua/5.1/trepl/init.lua:383: in function 'require'
eval.lua:58: in main chunk
[C]: in function 'dofile'
...yKIM/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x010adeebc0

I have installed CUDA version 7.5 and have not removed it.

Car model? Motorcycle or truck? Alpr ?

Is it possible to train it to recognize car models, for example whether a car is a Ferrari or a Fiat?

train.lua error , cuda runtime error (2) : out of memory

Hi all,
I tried to train the network on MSCOCO. I downloaded the dataset and then ran prepro.py, which produced cocotalk.h5 and cocotalk.json under the ./coco folder.

Then I tried to run the script:

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json

And I get the following error message:

/home/liuchang/torch/install/bin/luajit: ./misc/optim_updates.lua:65: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6827/cutorch/lib/THC/generic/THCStorage.cu:40
stack traceback:
[C]: in function 'new'
./misc/optim_updates.lua:65: in function 'adam'
/home/liuchang/neuraltalk2/train.lua:375: in main chunk
[C]: in function 'dofile'
...hang/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406620

My config:

os: ubuntu 14.04, 64bit
gpu: GeForce GTX 745, 4 GB
cuda: 7.0
cudnn:cudnn-7.0-linux-x64-v3.0-prod.tgz

I used ZeroBrane to debug train.lua step by step, and the error occurred at line 387, when my GPU memory was 99% occupied.

I wonder if my GPU memory is too small to train on MSCOCO?
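The trace ends inside `adam` at an allocation (`new`). Adam, as it is usually defined, keeps two running averages per parameter, so on top of the weights and their gradients the optimizer needs two more buffers of the same size. The Python sketch below is a rough back-of-the-envelope estimate under that assumption. It assumes 32-bit floats, ignores activations and cuDNN workspace (which can be large), and the parameter count in the example is illustrative rather than measured from this model.

```python
def training_memory_bytes(num_params, bytes_per_value=4, optimizer_buffers=2):
    """Weights + gradients + optimizer state, ignoring activations and workspace."""
    copies = 1 + 1 + optimizer_buffers
    return num_params * bytes_per_value * copies


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / float(1024 ** 3)


# Illustrative figure only: 100 million parameters.
example = training_memory_bytes(100 * 1000 * 1000)
print(round(to_gib(example), 2))  # 1.49
```

The point of the estimate is that parameter-related storage alone is about four times the size of the weights once Adam's state is allocated, which leaves correspondingly less of a 4 GB card for activations.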

Distributing training on several hosts

Does anyone know if it's possible to distribute the training on several hosts to reduce the time to run it?

I imagine Google Vision cannot run on a single host, since they must build models from a very large number of images. Consequently, they must have a way to grow a model across several simultaneous training systems. Could this work be ported to scale out?

Thanks,

RGB order?

I'm wondering if there is an RGB->BGR conversion in the code.
It looks to me like the images only go through cropping and RGB mean subtraction in prepro.py.
Then the images are fed into VGGNet.

I guess this won't make too much difference to the results, but would it be better to follow the normal workflow, i.e., to add the BGR conversion?

Thanks,
-Licheng
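A training log in another issue on this page contains the line "converting first layer conv filters from BGR to RGB...", which suggests the channel order may be handled on the filter side rather than by reordering the images. The two approaches give the same first-layer response, since a convolution sums over channels. The Python sketch below demonstrates this for a single pixel and a 1x1 filter; the numbers are arbitrary, and it says nothing about whether the mean subtraction uses a matching channel order.

```python
def conv_response(pixel, filt):
    """Response of a 1x1 filter at one pixel: a dot product over the channels."""
    return sum(p * w for p, w in zip(pixel, filt))


rgb_pixel = [10.0, 20.0, 30.0]   # R, G, B
bgr_filter = [0.5, -1.0, 2.0]    # weights laid out for B, G, R

# Option 1: reorder the image to BGR and keep the filter as it is.
bgr_pixel = rgb_pixel[::-1]
option1 = conv_response(bgr_pixel, bgr_filter)

# Option 2: keep the RGB image and reorder the filter instead.
rgb_filter = bgr_filter[::-1]
option2 = conv_response(rgb_pixel, rgb_filter)

print(option1 == option2)  # True
```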

Dependency issues. loadcaffe, protobuf, gcc etc.

Hi there.
Now, I am very inexperienced in all this shell/bash business, but I tried to get neuraltalk to work.
After I thought I had installed all the dependencies that I needed, I tried to run the provided command $ th eval.lua -model /path/to/model -image_folder /path/to/image/directory -num_images 10, but before anything happens, I immediately get the following error in my Terminal.

/Users/username/torch/install/bin/luajit: cannot open eval.lua: No such file or directory
stack traceback:
    [C]: in function 'dofile'
    ...username/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010a015f10

I was able to activate torch in the terminal and also to use luarocks to install the dependencies, so I thought I was doing something right, but now I have no idea at all what I did wrong here. Is there a way for you to tell where I went wrong in the installation process (and explain it in idiot-proof terminology, if possible ;) )?

Thanks!

OS X 10.8.5

Out of memory error when finetuning is enabled

Hello,

I am experiencing an error when I try to train a model with finetuning either from a previously saved checkpoint or from scratch.

What I did:

Stage 1 (success)

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -checkpoint_path chkp -language_eval 1

This trained a model on COCO with no finetuning. I got cider ~0.6 after 150k+ iters so I saved the checkpoint elsewhere (e.g. saved_checkpoints/coco_initial.t7) to continue with finetuning.

Stage 2 (failure)

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -checkpoint_path chkp -language_eval 1 -finetune_cnn_after 0 -start_from saved_checkpoints/coco_initial.t7

Tried to enable finetuning and load the previously saved checkpoint, but what I got was:

wrote json checkpoint to chkp/model_id.json
/home/cybernaut/torch/install/bin/luajit: ./misc/optim_updates.lua:65: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6112/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
[C]: in function 'new'
./misc/optim_updates.lua:65: in function 'adam'
train.lua:387: in main chunk
[C]: in function 'dofile'
...naut/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00405e40

The GPU I used: GeForce GTX 860M with 4 GB of memory
Installed CUDA 7 and cuDNN R3
Installed all the Torch packages mentioned in the neuraltalk2 README

Any hints or ideas, would be much appreciated!
Thank you!

cannot open eval.lua: No such file or directory

I've followed the setup till the cjson part (since I am only using CPU). However when I ran the following command I got the error. I am on Mac OSX 10.9.5

command
$ th eval.lua -model model_id1-501-1448236541.t7_cpu.t7 -image_folder ./img -num_images 1 -gpuid -1

error
../torch/install/bin/luajit: cannot open eval.lua: No such file or directory
stack traceback:
    [C]: in function 'dofile'
    ../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0105f2a1d0

Can anyone point me in the right direction? Thanks ;-)

Can it work in another direction (from text to image)?

libcudnn.dylib not found

Hello,

I've tried to make the image description work on Mac OS X El Capitan, but libcudnn.dylib is still not found, even though LD_LIBRARY_PATH is defined:

➜  neuraltalk2 git:(master) ✗ th eval.lua -model ./model_id1-501-1448236541.t7 -image_folder ./pic -num_images 17
/Users/olivier/torch/install/share/lua/5.1/cudnn/ffi.lua:574: dlopen(libcudnn.dylib, 5): image not found    
/Users/olivier/torch/install/bin/luajit: /Users/olivier/torch/install/share/lua/5.1/trepl/init.lua:383: /Users/olivier/torch/install/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
    [C]: in function 'error'
    /Users/olivier/torch/install/share/lua/5.1/trepl/init.lua:383: in function 'require'
    eval.lua:60: in main chunk
    [C]: in function 'dofile'
    ...vier/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x01079cfbd0
➜  neuraltalk2 git:(master) ✗ echo $LD_LIBRARY_PATH                                                                 
/Developpements/torch/cuda/lib:/Users/olivier/torch/install/lib
➜  neuraltalk2 git:(master) ✗ ls -l /Developpements/torch/cuda/lib 
total 233008
-rwxr-xr-x@ 1 olivier  staff  60047144 20 nov 12:04 libcudnn.4.dylib
lrwxr-xr-x@ 1 olivier  staff        16 23 nov 03:30 libcudnn.dylib -> libcudnn.4.dylib
-rw-r--r--@ 1 olivier  staff  59245464 20 nov 12:04 libcudnn_static.a
➜  neuraltalk2 git:(master) ✗ ls -l /Users/olivier/torch/install/lib
total 56720
-rwxr-xr-x  1 olivier  staff   1898340 14 déc 11:58 libTH.dylib
-rwxr-xr-x  1 olivier  staff  26156832 14 déc 15:40 libTHC.dylib
-rwxr-xr-x  1 olivier  staff     35692 14 déc 11:58 libluaT.dylib
-rwxr-xr-x  1 olivier  staff    651772 14 déc 11:58 libluajit.dylib
-rwxr-xr-x  1 olivier  staff    118084 14 déc 12:00 libqlua.dylib
-rwxr-xr-x  1 olivier  staff    170428 14 déc 12:00 libqtlua.dylib
drwxr-xr-x  3 olivier  staff       102 14 déc 11:58 lua
drwxr-xr-x  3 olivier  staff       102 14 déc 11:58 luarocks

Is there any more setup to do? Or is a file missing?

Thanks.

Issues with CPU based training

Running this on a MacBook with 2 GB of RAM on the GPU was producing out-of-memory errors:

/torch/install/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-2702/cutorch/lib/THC/THCStorage.cu:44

So I attempted to run it on the CPU, as I have 16 GB of RAM available, but when running the training with the following options I get errors...

‹master*› » th train.lua -gpuid -1 -backend nn -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json
DataLoader loading json file: coco/cocotalk.json
vocab size is 9567
DataLoader loading h5 file: coco/cocotalk.h5
read 123287 images of size 3x256x256
max sequence length in data is 16
assigned 113287 images to split train
assigned 5000 images to split val
assigned 5000 images to split test
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 1:9: Message type "caffe.NetParameter" has no field named "require".
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:537] Reading dangerously large protocol message. If the message turns out to be larger than 1073741824 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:78] The total number of bytes read was 553432081
Successfully loaded model/VGG_ILSVRC_16_layers.caffemodel
conv1_1: 64 3 3 3
conv1_2: 64 64 3 3
conv2_1: 128 64 3 3
conv2_2: 128 128 3 3
conv3_1: 256 128 3 3
conv3_2: 256 256 3 3
conv3_3: 256 256 3 3
conv4_1: 512 256 3 3
conv4_2: 512 512 3 3
conv4_3: 512 512 3 3
conv5_1: 512 512 3 3
conv5_2: 512 512 3 3
conv5_3: 512 512 3 3
fc6: 1 1 25088 4096
fc7: 1 1 4096 4096
fc8: 1 1 4096 1000
converting first layer conv filters from BGR to RGB...
/torch/install/bin/luajit: bad argument #2 to '?' (too many indices provided at /torch/pkg/torch/generic/Tensor.c:929)

What am I doing wrong? Is it possible to train this model on a MacBook with a 2 GB GPU, or on the CPU with 16 GB of system memory available?
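The layer shapes printed in the log give a feel for the memory involved. Multiplying out each shape and summing gives the number of weights in the loaded network (biases are not listed in the log, so they are left out). The Python sketch below does only that arithmetic; it does not say which layers neuraltalk2 keeps or fine-tunes.

```python
# Shapes copied from the log above.
LAYER_SHAPES = {
    "conv1_1": (64, 3, 3, 3),
    "conv1_2": (64, 64, 3, 3),
    "conv2_1": (128, 64, 3, 3),
    "conv2_2": (128, 128, 3, 3),
    "conv3_1": (256, 128, 3, 3),
    "conv3_2": (256, 256, 3, 3),
    "conv3_3": (256, 256, 3, 3),
    "conv4_1": (512, 256, 3, 3),
    "conv4_2": (512, 512, 3, 3),
    "conv4_3": (512, 512, 3, 3),
    "conv5_1": (512, 512, 3, 3),
    "conv5_2": (512, 512, 3, 3),
    "conv5_3": (512, 512, 3, 3),
    "fc6": (1, 1, 25088, 4096),
    "fc7": (1, 1, 4096, 4096),
    "fc8": (1, 1, 4096, 1000),
}


def count_weights(shape):
    """Number of values in a tensor with the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


total_weights = sum(count_weights(s) for s in LAYER_SHAPES.values())
print(total_weights)      # 138344128
print(total_weights * 4)  # 553376512 bytes as 32-bit floats
```

At 4 bytes per weight this comes to roughly 553 MB, which is close to the 553432081 bytes the log reports reading. Almost three quarters of it sits in fc6 alone (25088 x 4096 = 102760448 weights).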

Training freezes and high use of virtual memory (ubuntu)

Hi!

I installed all the dependencies and I'm able to use the pretrained MS COCO network. As a first attempt at training my own network, I created a dataset with only one image. I'm running on a g2.2xlarge instance on AWS (it has an NVIDIA GPU) and installed CUDA, cuDNN and everything else listed in the README file. What happens when I run the "th train.lua .." command is that a luajit process starts to use 100% of one CPU core and 36.7G of virtual memory. This process also can't be killed; I just have to reboot the machine from the AWS console. Is this normal? I expected the training process to be fast, given the small number of images in my dataset.

Thanks for the help and congrats! The results I've seen until here are very impressive!

CNN Feature Extraction Automatically Runs on Multiple GPUS?

It seems that the CNN feature extraction of images runs automatically on multiple GPUs, although I specify -gpuid 0. The language model, however, does run on the specified GPU.

Image Sizes: 256 to 224 with augmentation?

I am trying to figure out why you resize the images to 256 and then perform the 'augmentation step' in `net_utils.prepro` to take a random region of this image. Is there an inherent reason not to downsample to 224 in the first place?
If the subregion aspect is important (to reduce overfitting, I guess?), it should be performed before the downsampling.

I am working on a PR to make it more flexible for other sized images (particularly smaller ones) and will post it once this aspect is clear.
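For reference, the resize-then-crop pattern being discussed can be sketched as follows. This is a generic Python illustration of random cropping, not the code in `net_utils.prepro`; the function names and defaults are assumptions. With a 256-pixel image and a 224-pixel crop, each axis has 33 possible offsets (0 to 32), so there are 33 * 33 = 1089 distinct crop positions per image, which is where the augmentation effect comes from.

```python
import random


def random_crop_offsets(image_size=256, crop_size=224, rng=random):
    """Pick the top-left corner of a crop_size square inside an image_size square."""
    max_offset = image_size - crop_size
    x = rng.randint(0, max_offset)  # randint is inclusive at both ends
    y = rng.randint(0, max_offset)
    return x, y


def center_crop_offsets(image_size=256, crop_size=224):
    """Deterministic alternative: the crop centred in the image."""
    offset = (image_size - crop_size) // 2
    return offset, offset


def count_crop_positions(image_size=256, crop_size=224):
    """Number of distinct top-left corners available to the random crop."""
    return (image_size - crop_size + 1) ** 2
```

Downsampling straight to 224 would leave exactly one crop position, so the choice is a trade-off between keeping the whole image in view and getting slightly different inputs on each pass.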

Out of memory training with 6G memory GPU

Hi,

I got "out of memory" error when training with a Titan Z GPU which has 6GB memory:
th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json

The error messages are:
/blah/torch/install/bin/luajit: .../torch/install/share/lua/5.1/trepl/init.lua:363: /tmp/luarocks_cutorch-scm-1-8337/cutorch/lib/THC/THCTensorRandom.cu(20) : cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8337/cutorch/lib/THC/THCGeneral.c:241
stack traceback:
[C]: in function 'error'
.../torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
train.lua:79: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00406670

I've tried reducing batch_size to 1 and also reducing rnn_size and input_encoding_size, but none of them helped.

Any idea what I may have missed? Many thanks.

File names / file paths to the .json file?

Hi there,
would it be possible to add a "JSON only" mode? For my purposes I only really need the .json file instead of the entire html structure. And I guess that version would be the more (or most?) basic outcome for such a script.

What I imagine:

Run the shell command using something like eval-json.lua instead of eval.lua.
Images don't get renamed and don't get copied.
A .json file will be placed in the images folder alongside the images.
In the .json file an entry could look like: {"caption":"my resulting caption","image_file":"myFilename.jpg"}

Is this feasible / desirable? For me it would make sense as it would be a more generic result of the script.
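To make the request concrete, the output described above could look like the sketch below. It is written in Python purely for illustration (eval.lua itself is Lua), and the function name `write_captions_json` and the default file name `captions.json` are hypothetical; only the entry layout comes from the proposal.

```python
import json
import os
import tempfile

def write_captions_json(image_folder, captions, out_name="captions.json"):
    """Write [{"caption": ..., "image_file": ...}, ...] next to the images.

    `captions` maps an original (unrenamed) file name to its caption string.
    """
    entries = [
        {"caption": caption, "image_file": file_name}
        for file_name, caption in sorted(captions.items())
    ]
    out_path = os.path.join(image_folder, out_name)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2, ensure_ascii=False)
    return out_path

# Demo in a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as folder:
    path = write_captions_json(folder, {"myFilename.jpg": "my resulting caption"})
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
print(loaded)
```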

Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED

Hi,
I have followed all the steps and installed all dependencies required for neuraltalk2.
while running th eval.lua

DataLoaderRaw loading images from folder: /home/anilil/caffe/examples/images/
listing all images in directory /home/anilil/caffe/examples/images/
DataLoaderRaw found 3 images
constructing clones inside the LanguageModel
/home/anilil/torch/install/bin/luajit: /home/anilil/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED
stack traceback:
[C]: in function 'error'
/home/anilil/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:385: in function 'updateOutput'
/home/anilil/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
eval.lua:121: in function 'eval_split'
eval.lua:173: in main chunk
[C]: in function 'dofile'
...ilil/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670

I already tried the #19 and the #20

I already installed the latest version of Cudnn.

Live camera feed from lua-camera not working

Hi, I'm using the CPU model from your README and lua-camera to grab the image from the camera instead of reading an image from a folder. This method always gives me
"a view of a UNK UNK in a cloudy sky" as the caption.
I did this by:

1. Initializing at the start of the program

   require 'camera'
   cam = image.Camera(0)

2. And making the following change to the eval_split function

 -- local data = loader:getBatch{batch_size = opt.batch_size, split = split, seq_per_img = opt.seq_per_img} -- commented out to prevent loading files from the folder
 img = cam:forward()
 data={images=nil,infos=nil}
 data.images=torch.ByteTensor(1, 3, img:size(2), img:size(3))
 data.images[1]=img
 ... --some lines in between
 for k=1,#sents do
   print(sents[k]) --removed everything else
 end
 break --since im using only 1 frame
 end --end loop

3. Loop the function

while true do
    loss, split_predictions, lang_stats = eval_split(opt.split, {num_images = opt.num_images})
end

However, the following method works (it gives me slightly relevant captions):

1. Loop the function by adding the following line

while true do
  local folder="/home/mithul/torch/projects/neuraltalk2/cam"
  local imfile=folder.."/image.jpeg"
  os.execute("streamer -f jpeg -s 1280x720 -o "..imfile)
    load_data()
    loss, split_predictions, lang_stats = eval_split(opt.split, {num_images = opt.num_images})
    print('loss: ', loss)
end

2. Run eval.lua with "/home/mithul/torch/projects/neuraltalk2/cam" as the image folder

However, this method makes the OS switch the camera on and off for each image, which causes a delay, and it also re-initializes the loader for every frame.

I would like to know why the first method does not work while the second one does.

Training on SBU dataset

Hi all, I noticed the current implementation was trained on the MS-COCO dataset, whose captions are limited in size. I wonder if anyone has tried to train on bigger datasets, such as the SBU dataset (http://tlberg.cs.unc.edu/vicente/sbucaptions/). How should I initialize the network settings in this case? Thanks!

Thank you NeuralTalk2

Just came back this weekend from PennApps XIII (the largest college hackathon in the U.S.) to win the “Most Innovative Use of Embedded Systems” prize, beating over 1500 other student hackers!

VIA is a Visual Impairment Assistant, which aims to bring context back to the visually impaired, as distance information alone isn’t enough. We used NeuralTalk2 for the long-range context component (and rightfully credited the source code and explained that we didn't in fact write the neural engine).

Check out the project and our video demo at this Devpost link!

http://devpost.com/software/via-visual-impairment-assistant-17pg9d

Very excited to be one of the few winning teams in the biggest hackathon in the States.

link missing

http://cs.stanford.edu/people/karpathy/neuraltalk2/demo.html gives a 404

Readme claims the ipython notebook downloads the ms coco dataset, but it does not.

Possibly add instructions about how to put the dataset in the appropriate folder?

"Camera Dropped Frame"

I am using text-to-speech with NeuralTalk2 and do not need the "Camera Dropped Frame" text. I tried to locate the string with grep, both in the library on GitHub and on my machine, but could not find it, which makes me think this may be a language thing. How do I remove this string? @karpathy

"attempt to concatenate local 'ext' (a nil value)"

After I finally got it to work this morning, now there is an error again and it refuses to work. Here's what I get:

$ th eval.lua -model /Users/username/Documents/NeuralTalk2/checkpoint_v1.t7_cpu.t7 -image_folder /Users/username/Documents/NeuralTalk2/images -num_images -1 -gpuid -1
DataLoaderRaw loading images from folder:   /Users/username/Documents/NeuralTalk2/images  
listing all images in directory /Users/username/Documents/NeuralTalk2/images  
DataLoaderRaw found 21 images 
constructing clones inside the LanguageModel  
/Users/username/torch/install/bin/luajit: /Users/username/torch/install/share/lua/5.1/image/init.lua:337: attempt to concatenate local 'ext' (a nil value)
stack traceback:
  /Users/username/torch/install/share/lua/5.1/image/init.lua:337: in function 'load'
  ./misc/DataLoaderRaw.lua:74: in function 'getBatch'
  eval.lua:115: in function 'eval_split'
  eval.lua:169: in main chunk
  [C]: in function 'dofile'
  ...username/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
  [C]: at 0x010bdd1f10

The strange thing (at least to me) is that it claims to find 21 images when there are actually only 20 in the images folder. Does anybody know what's wrong here?
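One way to check where the extra entry comes from is to list everything in the folder, hidden files included, and separate out the names that lack a usual image extension. The sketch below is an assumption-laden diagnostic in Python, not part of neuraltalk2: the extension set is a guess, and a hidden file such as `.DS_Store` is only one possible culprit, not a confirmed cause.

```python
import os

# Assumed set of extensions; adjust to whatever your images actually use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".ppm", ".pgm"}

def split_by_extension(file_names):
    """Separate names with a known image extension from everything else."""
    images, others = [], []
    for name in file_names:
        ext = os.path.splitext(name)[1].lower()
        (images if ext in IMAGE_EXTENSIONS else others).append(name)
    return images, others

def scan_folder(folder):
    """List a folder (os.listdir includes hidden files) and split its entries."""
    return split_by_extension(sorted(os.listdir(folder)))

images, others = split_by_extension(["a.jpg", "b.PNG", ".DS_Store"])
print(len(images), others)  # 2 ['.DS_Store']
```

Calling `scan_folder` on the images directory and printing the second list would show any entry that is counted as a file but has no extension an image loader could recognize.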

libcudnn.so* not found

Thanks for this great code. I tried to follow your README instructions as religiously as possible. When I tried running the eval.lua script, I got:

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

But I do have the libcudnn.so* files installed, as locate libcudnn gives

/home/User/Documents/cuda/lib64/libcudnn.so
/home/User/Documents/cuda/lib64/libcudnn.so.7.0
/home/User/Documents/cuda/lib64/libcudnn.so.7.0.64
/home/User/Documents/cuda/lib64/libcudnn_static.a

So I export this path to my LD_LIBRARY_PATH as in

export LD_LIBRARY_PATH=/home/User/cuda:${LD_LIBRARY_PATH}

When I echo $LD_LIBRARY_PATH, I get

/home/User/cuda:/home/User/catkin_ws/devel/lib:/home/User/cuda/lib64:/home/User/cuda/lib64/home/User/catkin_ws/devel/lib:/home/User/catkin_ws/devel/lib/x86_64-linux-gnu:/opt/ros/indigo/lib/x86_64-linux-gnu:/usr/local/cuda-7.0/lib64:/opt/ros/indigo/lib

It appears libcudnn is now in the LD_LIBRARY_PATH. However, running again the eval script still produces

User@User:~/Documents/neuraltalk2$ th eval.lua -model ../KarpathyNN/model_id1-501-1448236541.t7 -image_folder ../MS-CoCo/test2014/ -num_images 10
/usr/local/share/lua/5.1/cudnn/ffi.lua:574: libcudnn.so: cannot open shared object file: No such file or directory
/usr/local/bin/luajit: /usr/local/share/lua/5.1/trepl/init.lua:363: /usr/local/share/lua/5.1/cudnn/ffi.lua:577: 'libcudnn.so not found in library path.
Please install CuDNN from https://developer.nvidia.com/cuDNN
Then make sure all the files named as libcudnn.so* are placed in your library load path (for example /usr/local/lib , or manually add a path to LD_LIBRARY_PATH)

stack traceback:
[C]: in function 'error'
/usr/local/share/lua/5.1/trepl/init.lua:363: in function 'require'
eval.lua:59: in main chunk
[C]: in function 'dofile'
/usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
[C]: at 0x00406260

I'm sorry for the bother but would appreciate any help.
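A quick way to verify what the report assumes is to check, entry by entry, which directories on LD_LIBRARY_PATH actually contain a file matching libcudnn.so*. Note that the `locate` output above lists /home/User/Documents/cuda/lib64, while the exported entry is /home/User/cuda. The Python sketch below is only a diagnostic aid; the function name `dirs_with_library` is hypothetical, and it does not reproduce everything the dynamic loader does (for example, the loader cache is ignored).

```python
import fnmatch
import os

def dirs_with_library(search_path, pattern="libcudnn.so*"):
    """Return the entries of a path-separator-delimited string that hold a matching file."""
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty entries and directories that do not exist
        if fnmatch.filter(os.listdir(directory), pattern):
            hits.append(directory)
    return hits

# Prints the LD_LIBRARY_PATH entries that really contain the library.
print(dirs_with_library(os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list here would mean that none of the listed directories contains the library, even if the variable looks right at first glance.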

Getting identical sequence all the time...

Hi Andrej,
I am training this model on an arbitrary dataset with arbitrary features. What I did:
remove the CNN layer and use [Linear(4096, D); ReLU()] instead.

But when sampling from the language model (lm.sample()) with different features, I always end up with exactly the same sequence. So my question is: is this normal for the first several thousand iterations, or did I overlook something?

Thanks so much!

prepro.py crashes on Unicode captions

Expanding my tag/image dataset further from Danbooru, my preprocessing step began to crash with the error:

Traceback (most recent call last):
  File "prepro.py", line 241, in <module>
    main(params)
  File "prepro.py", line 162, in main
    prepro_captions(imgs)
  File "prepro.py", line 43, in prepro_captions
    txt = str(s).lower().translate(None, string.punctuation).strip().split()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd7' in position 21: ordinal not in range(128)

While no useful information is printed about which tag/JSON entry caused the problem, my guess is that one of the tags has some Unicode in it (probably a Japanese word or emoji) and neuraltalk2/prepro.py, like char-rnn, makes an ASCII-only assumption.

Using the first suggestion I found on StackOverflow, I tried tossing in some sort of iconv-like conversion step which renders Unicode in a longer ASCII form (I think that's what it does, anyway):

@@ -34,13 +34,13 @@ import numpy as np
 from scipy.misc import imread, imresize

 def prepro_captions(imgs):
-  
+
   # preprocess all the captions
   print 'example processed tokens:'
   for i,img in enumerate(imgs):
     img['processed_tokens'] = []
     for j,s in enumerate(img['captions']):
-      txt = str(s).lower().translate(None, string.punctuation).strip().split()
+      txt = s.encode('ascii', errors='backslashreplace').lower().translate(None, string.punctuation).strip().split()
       img['processed_tokens'].append(txt)
       if i < 10 and j == 0: print txt

Seems to work. Maybe some version of that could be added?
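As an alternative to escaping, the tokenization step could keep the Unicode characters as they are. The sketch below is written for Python 3, whereas prepro.py as quoted above is Python 2, so it is not a drop-in patch; the function name `tokenize_caption` is hypothetical, and whether the rest of the preprocessing copes with non-ASCII tokens is an untested assumption.

```python
import string

# Translation table that deletes ASCII punctuation; non-ASCII characters pass through.
_DROP_PUNCTUATION = str.maketrans("", "", string.punctuation)

def tokenize_caption(caption):
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    return caption.lower().translate(_DROP_PUNCTUATION).strip().split()

tokens = tokenize_caption("A 2\u00d72 grid, with cats!")
print(tokens)  # ['a', '2×2', 'grid', 'with', 'cats']
```

Unlike the `backslashreplace` approach, this keeps u'\xd7' as a single character inside its token instead of turning it into a longer ASCII escape sequence.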

option -start_from in train.lua : what's the expected behaviour

Hi,

I have trained a model on MSCOCO without any fine-tuning. Then I wanted to add fine-tuning, so I ran:

th train.lua -input_h5 /data/training/cocotalk.h5 \
  -input_json /data/training/cocotalk.json \
  -cnn_proto /data/training/VGG_ILSVRC_16_layers_deploy.prototxt \
  -cnn_model /data/training/VGG_ILSVRC_16_layers.caffemodel \
  -checkpoint_path /data/model \
  -finetune_cnn_after 0 \
  -start_from /data/model/model_id.t7 \
  -batch_size 4

(small batch size because I am running with only 4 GB of VRAM)

As a result, a new file model_id.json was created and keeps growing (I am now at about 80k iterations).
How am I supposed to combine this new file with the original t7 file? Is this the expected behaviour?

Many thanks in advance,

Another "out of memory" issue when reading the pretrained model

In issue #3 a memory error was solved by adding the option -batch_size 1, but that did not work for me on a 16 GB iMac with CUDA installed (see the output below).

The problem already arises when reading in the pretrained model:

require 'nn';
require 'cudnn';
require 'cunn';
net = torch.load('./models/checkpoint_v1.t7', 'binary')

/usr/local/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8052/cutorch/lib/THC/THCGeneral.c:510
stack traceback:
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function </usr/local/share/lua/5.1/torch/File.lua:190>
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    [string "net = torch.load('checkpoint_v1.t7', 'binary'..."]:1: in main chunk
    [C]: in function 'xpcall'
    /usr/local/share/lua/5.1/itorch/main.lua:179: in function </usr/local/share/lua/5.1/itorch/main.lua:143>
    /usr/local/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
    /usr/local/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
    /usr/local/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
    /usr/local/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
    /usr/local/share/lua/5.1/itorch/main.lua:350: in main chunk
    [C]: in function 'require'

Any ideas?

$ th eval.lua -model ./models/checkpoint_v1.t7 -image_folder ./images
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8052/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function </usr/local/share/lua/5.1/torch/File.lua:190>
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0101364c10

$ th eval.lua -backend nn -model models/checkpoint_v1.t7 -image_folder images/
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:262: unknown Torch class <cudnn.SpatialConvolution>
stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010edaec10

$ th eval.lua -gpuid -1 -model models/checkpoint_v1.t7 -image_folder images/
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:262: unknown Torch class <torch.CudaTensor>
stack traceback:
    [C]: in function 'error'
    /usr/local/share/lua/5.1/torch/File.lua:262: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x0105d7dc10

$ th eval.lua -model ./models/checkpoint_v1.t7 -image_folder ./images -batch_size 1
/usr/local/bin/luajit: /usr/local/share/lua/5.1/torch/File.lua:270: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8052/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function </usr/local/share/lua/5.1/torch/File.lua:190>
    [C]: in function 'read'
    /usr/local/share/lua/5.1/torch/File.lua:270: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:272: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:288: in function 'readObject'
    /usr/local/share/lua/5.1/torch/File.lua:319: in function 'load'
    eval.lua:63: in main chunk
    [C]: in function 'dofile'
    /usr/local/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010845ec10

Getting error CUDNN_STATUS_NOT_INITIALIZED

Hi,

Followed the guide to the T, but when trying to launch the eval with GPU checkpoint, I'm getting the following error:

ubuntu@ip-172-31-12-54:~/neuraltalk2$ th eval.lua -model ../models/model_id1-501-1448236541.t7 -image_folder ../samples/
DataLoaderRaw loading images from folder:       ../samples/
listing all images in directory ../samples/
DataLoaderRaw found 4 images
constructing clones inside the LanguageModel
/home/ubuntu/torch/install/bin/luajit: /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:45: Error in CuDNN: CUDNN_STATUS_NOT_INITIALIZED
stack traceback:
        [C]: in function 'error'
        /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:45: in function 'getHandle'
        /home/ubuntu/torch/install/share/lua/5.1/cudnn/init.lua:53: in function 'errcheck'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:39: in function 'resetWeightDescriptors'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:338: in function 'updateOutput'
        /home/ubuntu/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
        eval.lua:121: in function 'eval_split'
        eval.lua:173: in main chunk
        [C]: in function 'dofile'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x00406670

Any idea?

Error when processing training images

Edit: See @susemeee's comment below (the image COCO_train2014_000000167126.jpg is corrupted, and you can download a replacement at https://msvocds.blob.core.windows.net/images/262993_z.jpg)

I was trying to run prepro.py but eventually ran into an issue in scipy's pilutil package (see below).

I've installed all dependencies, run the coco_preprocess.ipynb, and downloaded train2014.zip + val2014.zip and extracted them into coco/images.

Am I missing something?

$ python prepro.py --input_json coco/coco_raw.json --num_val 5000 --num_test 5000 --images_root coco/images --word_count_threshold 5 --output_json coco/cocotalk.json --output_h5 coco/cocotalk.h5
parsed input parameters:
{
  "output_json": "coco/cocotalk.json",
  "images_root": "coco/images",
  "input_json": "coco/coco_raw.json",
  "word_count_threshold": 5,
  "max_length": 16,
  "output_h5": "coco/cocotalk.h5",
  "num_test": 5000,
  "num_val": 5000
}
example processed tokens:
['a', 'woman', 'riding', 'a', 'bike', 'down', 'a', 'bike', 'trail']
... lots of info deleted for brevity ...
inserting the special UNK token
assigned 5000 to val, 5000 to test.
encoded captions to array of size  (616767, 16)
processing 0/123287 (0.00% done)
... lots of percentages deleted for brevity ...
processing 60000/123287 (48.67% done)
Traceback (most recent call last):
  File "prepro.py", line 236, in <module>
    main(params)
  File "prepro.py", line 186, in main
    Ir = imresize(I, (256,256))
  File "/usr/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 424, in imresize
    im = toimage(arr, mode=mode)
  File "/usr/local/lib/python2.7/site-packages/scipy/misc/pilutil.py", line 234, in toimage
    raise ValueError("'arr' does not have a suitable array shape for "
ValueError: 'arr' does not have a suitable array shape for any mode.
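Since the traceback does not say which file triggered the error, one option is to wrap the per-image step and record the failing file names instead of stopping at the first one. The sketch below is a generic Python illustration, not the code in prepro.py; `find_failing_images` is a hypothetical helper and `fake_resize` is a made-up stand-in for the real read-and-resize step.

```python
def find_failing_images(paths, process_one):
    """Run process_one on every path and collect (path, error message) for failures."""
    failures = []
    for path in paths:
        try:
            process_one(path)
        except (ValueError, IOError) as err:
            failures.append((path, str(err)))
    return failures

def fake_resize(path):
    # Hypothetical stand-in for the image read + resize; fails for one made-up name.
    if path == "broken.jpg":
        raise ValueError("'arr' does not have a suitable array shape for any mode.")

failures = find_failing_images(["ok.jpg", "broken.jpg"], fake_resize)
print(failures)
```

Running such a pass over the image list once would give the full set of problem files, which can then be replaced or skipped before the long preprocessing run.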

Instructions have problems

The instructions say to do the following:

luarocks install nn
luarocks install nngraph
luarocks install image

However, I cannot get the above packages through luarocks.

What should I do?

Thanks!

val_images_use option error

The option description says -1 = all, but the code does not handle that case.
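The behaviour the description promises could be sketched as follows. This is a Python illustration of the intended semantics only, not the Lua in train.lua; the function name `resolve_num_images` is hypothetical, and clamping to the number of available images is an added assumption.

```python
def resolve_num_images(requested, available):
    """Interpret a negative value such as -1 as 'use every available image'."""
    if requested < 0:
        return available
    # Assumption: never ask for more images than exist.
    return min(requested, available)

print(resolve_num_images(-1, 5000))    # 5000
print(resolve_num_images(3200, 5000))  # 3200
```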

Error when running eval.lua ("attempt to index field 'tensorOutput' (a nil value)")

Hello,

I've just downloaded the project out of curiosity, and followed the setup instructions. But note that I have no idea of what I'm doing :)

I'm running into a problem when running:
$ th eval.lua -model ./model_id1-501-1448236541.t7_cpu.t7 -image_folder ./sample-images -num_images 1 -gpuid -1

Here's my setup:
Machine: Macbook Pro, late 2011, 2.4 GHz Intel Core i5, OS X El Capitan 10.11.1
Model: I'm using the provided cpu checkpoint model in the README.
Image in folder: http://25.media.tumblr.com/Jjkybd3nSab3wr6cd1T33jjw_500.jpg

Error output:

$ th eval.lua -model ./model_id1-501-1448236541.t7_cpu.t7 -gpuid -1 -image_folder ./sample-images
DataLoaderRaw loading images from folder:       ./sample-images
listing all images in directory ./sample-images
DataLoaderRaw found 1 images
constructing clones inside the LanguageModel
/Users/***/torch/install/bin/luajit: /Users/***/torch/install/share/lua/5.1/nn/Identity.lua:13: attempt to index field 'tensorOutput' (a nil value)
stack traceback:
        /Users/***/torch/install/share/lua/5.1/nn/Identity.lua:13: in function 'func'
        /Users/***/torch/install/share/lua/5.1/nngraph/gmodule.lua:252: in function 'neteval'
        /Users/***/torch/install/share/lua/5.1/nngraph/gmodule.lua:287: in function 'forward'
        ./misc/LanguageModel.lua:266: in function 'sample'
        eval.lua:134: in function 'eval_split'
        eval.lua:169: in main chunk
        [C]: in function 'dofile'
        ...***/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
        [C]: at 0x0102a00bd0

I've also partially run the tests in test_language_model.lua, and they pass, although I commented out the cudaApiForwardTest.

Let me know if there is anything else I can provide.

Bug on ARM hardware

Hi,

I am trying to run this code on a Raspberry Pi 2, as an evaluation of building smart cameras on limited hardware. Everything goes well at installation, though some of the python dependencies have to be apt-get installed instead of pip installed, but it "seems" to be OK.
Note the installation takes ~10hrs, so I was really eager to see the processing. Unfortunately, I run into:

scozannet@ubuntu:~/neural-networks/neuraltalk2$ th eval.lua -model ../model_id1-501-1448236541.t7_cpu.t7 -image_folder ../images -num_images 10 -batch_size 1 -gpuid -1
/home/scozannet/neural-networks/torch/install/bin/luajit: ...ural-networks/torch/install/share/lua/5.1/torch/File.lua:289: table index is nil
stack traceback:
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:289: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:272: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:288: in function 'readObject'
...ural-networks/torch/install/share/lua/5.1/torch/File.lua:319: in function 'load'
eval.lua:69: in main chunk
[C]: in function 'dofile'
...orks/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x0000cff9

Any idea of what I am doing wrong? I have tried postfixing the image folder with /, changing number of images, batch size...
My image set is a copy of your demo images, stored as .jpg.

Let me know if you need anything to help find out the problem.
Many thanks in advance for your help,

Inaccurate captioning of images

Hello,

Thank you for the excellent code and guide to run the code.
I successfully ran eval.lua on a set of 12 images, but apart from a couple of images the predicted captions were inaccurate.
Why would this be happening? Also, could I use another model? Could you point me to a better or more comprehensive model that would help increase the accuracy?

Thanks!

Stage 1 (success)

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -checkpoint_path chkp -language_eval 1

Stage 2 (failure)

$ th train.lua -input_h5 coco/cocotalk.h5 -input_json coco/cocotalk.json -checkpoint_path chkp -language_eval 1 -finetune_cnn_after 0 -start_from saved_checkpoints/coco_initial.t7

[C]: at 0x00405e40
