
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Home Page: http://panderson.me/up-down-attention/

License: MIT License

Topics: vqa, visual-question-answering, captioning-images, faster-rcnn, caffe, image-captioning, mscoco, mscoco-dataset

bottom-up-attention's Introduction

bottom-up-attention

This code implements a bottom-up attention model, based on multi-gpu training of Faster R-CNN with ResNet-101, using object and attribute annotations from Visual Genome.

The pretrained model generates output features corresponding to salient image regions. These bottom-up attention features can typically be used as a drop-in replacement for CNN features in attention-based image captioning and visual question answering (VQA) models. This approach was used to achieve state-of-the-art image captioning performance on MSCOCO (CIDEr 117.9, BLEU_4 36.9) and to win the 2017 VQA Challenge (70.3% overall accuracy), as described in the paper referenced below.

Some example object and attribute predictions for salient image regions are illustrated below.

[Example images: teaser-bike, teaser-oven]

Note: This repo only includes code for training the bottom-up attention / Faster R-CNN model (section 3.1 of the paper). The actual captioning model (section 3.2) is available in a separate repo here.

Reference

If you use our code or features, please cite our paper:

@inproceedings{Anderson2017up-down,
  author = {Peter Anderson and Xiaodong He and Chris Buehler and Damien Teney and Mark Johnson and Stephen Gould and Lei Zhang},
  title = {Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering},
  booktitle = {CVPR},
  year = {2018}
}

Disclaimer

This code is modified from py-R-FCN-multiGPU, which is in turn modified from the py-faster-rcnn code. Please refer to those projects for further README information (for example, relating to other models and datasets included in the repo) and for the appropriate citations. This README only covers Faster R-CNN trained on Visual Genome.

License

bottom-up-attention is released under the MIT License (refer to the LICENSE file for details).

Pretrained features

For ease of use, we make pretrained features available for the entire MSCOCO dataset. It is not necessary to clone or build this repo to use features downloaded from the links below. Features are stored in tsv (tab-separated values) format and can be read with tools/read_tsv.py.
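
A minimal reading sketch in Python (field names and the base64/float32 encoding are assumed from tools/read_tsv.py; adjust if your download differs):

    import base64
    import csv
    import sys

    import numpy as np

    # Features fields are very large, so lift the csv field size limit.
    csv.field_size_limit(sys.maxsize)
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

    def load_tsv(path):
        """Yield one dict per image with decoded 'boxes' and 'features' arrays."""
        with open(path, 'r') as f:
            for item in csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES):
                num_boxes = int(item['num_boxes'])
                item['num_boxes'] = num_boxes
                item['image_id'] = int(item['image_id'])
                for field in ('boxes', 'features'):
                    data = base64.b64decode(item[field])
                    item[field] = np.frombuffer(data, dtype=np.float32).reshape((num_boxes, -1))
                yield item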

LINKS HAVE BEEN UPDATED TO GOOGLE CLOUD STORAGE (14 Feb 2021)

10 to 100 features per image (adaptive):

36 features per image (fixed):

Both sets of features can be recreated by running tools/generate_tsv.py with the appropriate pretrained model and with MIN_BOXES/MAX_BOXES set to either 10/100 or 36/36 respectively - refer to the Demo section below.
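
For reference, switching between the two released settings amounts to changing two box-limit constants; the snippet below assumes they are defined at module level in tools/generate_tsv.py (verify the names against your checkout):

    # Assumed constants near the top of tools/generate_tsv.py.
    MIN_BOXES = 10    # adaptive setting: 10 to 100 regions per image
    MAX_BOXES = 100
    # For the fixed setting, use 36 for both:
    # MIN_BOXES = 36
    # MAX_BOXES = 36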

Contents

  1. Requirements: software
  2. Requirements: hardware
  3. Basic installation
  4. Demo
  5. Training
  6. Testing

Requirements: software

  1. Important: Please use the version of Caffe contained within this repository.

  2. Requirements for Caffe and pycaffe (see: Caffe installation instructions)

Note: Caffe must be built with support for Python layers and NCCL!

# In your Makefile.config, make sure to have these lines uncommented
WITH_PYTHON_LAYER := 1
USE_NCCL := 1
# Unrelatedly, it's also recommended that you use CUDNN
USE_CUDNN := 1
  3. Python packages you might not have: cython, python-opencv, easydict
  4. Nvidia's NCCL library, which is used for multi-GPU training: https://github.com/NVIDIA/nccl

Requirements: hardware

Any NVIDIA GPU with at least 12GB of memory is sufficient for training Faster R-CNN with ResNet-101.

Installation

  1. Clone the repository
git clone https://github.com/peteanderson80/bottom-up-attention/
  2. Build the Cython modules

    cd $REPO_ROOT/lib
    make
  3. Build Caffe and pycaffe

    cd $REPO_ROOT/caffe
    # Now follow the Caffe installation instructions here:
    #   http://caffe.berkeleyvision.org/installation.html
    
    # If you're experienced with Caffe and have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make -j8 && make pycaffe
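
Once the build finishes, a quick optional way to confirm that the in-repo pycaffe imports cleanly is a check along these lines (paths are assumptions; run from $REPO_ROOT or adjust them):

    # Sanity check that 'make pycaffe' produced an importable module.
    import sys
    sys.path.insert(0, 'caffe/python')
    sys.path.insert(0, 'lib')

    import caffe  # fails here if the build did not succeed

    caffe.set_mode_cpu()  # or caffe.set_mode_gpu(); caffe.set_device(0)
    print('pycaffe loaded from', caffe.__file__)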

Demo

  1. Download the pretrained model and put it under data/faster_rcnn_models.

  2. Run tools/demo.ipynb to show object and attribute detections on demo images.

  3. Run tools/generate_tsv.py to extract bounding box features to a tab-separated-values (tsv) file. This will require modifying the load_image_ids function to suit your data locations. To recreate the pretrained feature files with 10 to 100 features per image, set MIN_BOXES=10 and MAX_BOXES=100. To recreate the pretrained feature files with 36 features per image, set MIN_BOXES=36 and MAX_BOXES=36, and use this alternative pretrained model instead. The alternative pretrained model was trained for fewer iterations but performance is similar.
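
The load_image_ids function is expected to return (image_path, image_id) pairs for your dataset; a hypothetical replacement might look like this (the directory layout and split name are examples only):

    import os

    def load_image_ids(split_name):
        """Hypothetical example: return a list of (image_path, image_id) tuples."""
        pairs = []
        if split_name == 'my_dataset':
            image_dir = '/data/my_dataset/images'  # assumed location
            for fname in sorted(os.listdir(image_dir)):
                if fname.endswith('.jpg'):
                    image_id = int(os.path.splitext(fname)[0])  # e.g. 000123.jpg -> 123
                    pairs.append((os.path.join(image_dir, fname), image_id))
        else:
            raise ValueError('Unknown split: %s' % split_name)
        return pairs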

Training

  1. Download the Visual Genome dataset. Extract all the json files, as well as the image directories VG_100K and VG_100K_2 into one folder $VGdata.

  2. Create symlinks for the Visual Genome dataset

    cd $REPO_ROOT/data
    ln -s $VGdata vg
  3. Generate xml files for each image in the Pascal VOC format (this will take some time). This script will extract the top 2500/1000/500 objects/attributes/relations and also does basic cleanup of the Visual Genome data. Note, however, that our training code actually only uses a subset of the annotations in the xml files, i.e., only 1600 object classes and 400 attribute classes, based on the hand-filtered vocabs found in data/genome/1600-400-20 (a short vocabulary-loading sketch appears after this list). The relevant part of the codebase is lib/datasets/vg.py. Relation labels can be included in the data layers but are currently not used.

    cd $REPO_ROOT
    ./data/genome/setup_vg.py
  4. Please download the ImageNet-pretrained ResNet-101 model manually, and put it into $REPO_ROOT/data/imagenet_models

  5. You can train your own model using ./experiments/scripts/faster_rcnn_end2end_multi_gpu_resnet_final.sh (see instructions in the file). The train (95k) / val (5k) / test (5k) splits are in data/genome/{split}.txt and were determined using data/genome/create_splits.py. To avoid val / test set contamination when pre-training for MSCOCO tasks, these splits match the 'Karpathy' COCO splits for images that appear in both datasets.

    Trained Faster-RCNN snapshots are saved under:

    output/faster_rcnn_resnet/vg/
    

    Logging outputs are saved under:

    experiments/logs/
    
  6. Run tools/review_training.ipynb to visualize the training data and predictions.
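
As a side note on the vocabularies mentioned in step 3, the hand-filtered class lists in data/genome/1600-400-20 are plain text with one label per line; a minimal loading sketch (file names assumed from that folder):

    def load_vocab(path):
        """Load a plain-text vocabulary, one label (possibly with comma-separated synonyms) per line."""
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]

    objects = load_vocab('data/genome/1600-400-20/objects_vocab.txt')        # ~1600 classes
    attributes = load_vocab('data/genome/1600-400-20/attributes_vocab.txt')  # ~400 classes
    print(len(objects), len(attributes))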

Testing

  1. The model will be tested on the validation set at the end of training, or models can be tested directly using tools/test_net.py, e.g.:

    ./tools/test_net.py --gpu 0 --imdb vg_1600-400-20_val --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel > experiments/logs/eval.log 2>&1
    

    Mean AP is reported separately for object prediction and attribute prediction (given ground-truth object detections). Test outputs are saved under:

    output/faster_rcnn_resnet/vg_1600-400-20_val/<network snapshot name>/
    

Expected detection results for the pretrained model

  Model                      objects mAP@0.5   objects weighted mAP@0.5   attributes mAP@0.5   attributes weighted mAP@0.5
  Faster R-CNN, ResNet-101   10.2%             15.1%                      7.8%                 27.8%

Note that mAP is relatively low because many classes overlap (e.g. person / man / guy), some classes can't be precisely located (e.g. street, field) and separate classes exist for singular and plural objects (e.g. person / people). We focus on performance in downstream tasks (e.g. image captioning, VQA) rather than detection performance.

bottom-up-attention's People

Contributors: alessandrosteri, bharatpublic, bharatsingh430, peteanderson80


bottom-up-attention's Issues

About test results

Hi, I just ran the test code with your trained ResNet-101 model on the test set and got the following numbers for the object detection task:

Mean AP = 0.0146
Weighted Mean AP = 0.1799
Mean Detection Threshold = 0.328

The mean AP (1.46%) is far from the number (10.2%) you reported in the table at the bottom of the README. The weighted mean AP is a bit higher than the number you reported. I am wondering whether there is a typo in your table.

thanks!

Image caption

Hi, can you release your image captioning implementation?

binascii.Error: Incorrect padding

Got binascii.Error: Incorrect padding when reading image 300104 from test2014/test2014_resnet101_faster_rcnn_genome.tsv.1 with tools/read_tsv.py. Anything wrong?

Traceback (most recent call last):
  File "read_tsv.py", line 64, in <module>
    read_and_save(os.path.join(in_dir, in_file), out_dir)
  File "read_tsv.py", line 45, in read_and_save
    item['features'] = np.frombuffer(base64.decodestring(item['features']), dtype=np.float32).reshape((item['num_boxes'], -1))
  File "/usr/lib64/python2.7/base64.py", line 321, in decodestring
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
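
For readers hitting the same error: it usually indicates a truncated field (often from an interrupted download), so re-downloading the file is the safer fix. As a last resort, padding the base64 string to a multiple of four before decoding is a possible workaround (a sketch, not an endorsed fix):

    import base64

    import numpy as np

    def decode_field(b64_string, num_boxes):
        """Base64-decode a 'boxes'/'features' field, tolerating missing '=' padding."""
        missing = len(b64_string) % 4
        if missing:
            b64_string += '=' * (4 - missing)   # pad to a multiple of 4
        buf = base64.b64decode(b64_string)
        # Will still raise if the record is genuinely truncated; re-download in that case.
        return np.frombuffer(buf, dtype=np.float32).reshape((num_boxes, -1))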

activation of the relation prediction

@peteanderson80
I saw the option "HAS_RELATION" in the cfg file. I turned it on, added a top[6] blob to the proposal_target_layer, set the param num_rel_classes to 21 (I am not sure whether this is correct for the vg_1600-400-20 dataset), and started training, but I got the following error:

File "/home/work/bottom-up-attention/tools/../lib/roi_data_layer/minibatch.py", line 55, in get_minibatch
    "Generation of gt_relations doesn't accomodate dropping objects"
AssertionError: Generation of gt_relations doesn't accomodate dropping objects

Is there something wrong with my setting?

Compilation Error

i got the following error:

token ""CUDACC_VER is no longer supported. Use CUDACC_VER_MAJOR, CUDACC_VER_MINOR, and CUDACC_VER_BUILD instead."" is not valid in preprocessor expressions

I've updated Eigen already and still get the same error...
What should I do?

example images without bounding boxes

First of all, thanks for making this project public.

In the paper and this repo's README, two example images (the bike and the oven) are used to show your model's predictions qualitatively. Since I can't figure out where those images are in the dataset (VG or COCO), could you tell us where they come from?

Thanks in advance.

GloVe word embedding for top down attention model

Thank you for the fascinating paper and code!

I notice that for the image captioning model, you decided to use the standard vocabulary and train the word embedding matrix from scratch. So I've been wondering: if I apply the approach from your previous paper on Constrained Beam Search (pretrained GloVe vectors with an expanded vocabulary), will it improve the performance of the model?

Running evaluation script on CPU

Hello,

Is it possible to run the evaluation script on the CPU? Should I still install Caffe in the way described in the README of this repository?

Thanks,
Claudio

Why use the VG data to train the Faster R-CNN model?

Thanks for sharing the models and features. I have tried the features for VQA with my own model; really surprising results indeed :)
I have two questions:

  1. As the VQA dataset is based on MSCOCO images, would it be better to train the Faster R-CNN model on the COCO dataset directly?
  2. Could a better object detection model, e.g., R-FCN or Deformable R-FCN, further improve the VQA performance?

Problem in running demo.ipynb

net = caffe.Net(prototxt, caffe.TEST, weights=weights)
Traceback (most recent call last):
File "", line 1, in
Boost.Python.ArgumentError: Python argument types in
Net.init(Net, str, int)
did not match C++ signature:
init(boost::python::api::object, std::string, std::string, int)
init(boost::python::api::object, std::string, int)

read_tsv file error

When I use the default mode, 'r+b', to open the tsv file, an error occurs:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
at the line "for item in reader:".
When I use 'r+' instead, I get:
TypeError: expected bytes-like object, not str, at np.frombuffer(base64.decodestring(item[field]), ...).
My environment is Windows. Could that matter?
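
For what it's worth, this pattern of errors is typical when the Python 2-era script runs under Python 3: the csv reader wants text mode while base64.decodestring wants bytes. A sketch that reads the file under Python 3 (placeholder filename; it also avoids the Windows overflow in csv.field_size_limit):

    import base64
    import csv

    import numpy as np

    # sys.maxsize can overflow csv.field_size_limit on Windows builds; use a 32-bit cap.
    csv.field_size_limit(2 ** 31 - 1)
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

    with open('features.tsv', 'r', newline='') as f:          # text mode for csv
        for item in csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES):
            num_boxes = int(item['num_boxes'])
            for field in ('boxes', 'features'):
                buf = base64.b64decode(item[field])            # b64decode accepts str
                item[field] = np.frombuffer(buf, dtype=np.float32).reshape((num_boxes, -1))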

I run python demo.py failed!

I downloaded the model file first. When I run demo.py I get a message like this:
I1017 11:02:41.250684 40048 net.cpp:131] Top shape: 1 2 126 14 (3528)
I1017 11:02:41.250689 40048 net.cpp:139] Memory required for data: 117482412
I1017 11:02:41.250692 40048 layer_factory.hpp:77] Creating layer rpn_cls_prob_reshape
I1017 11:02:41.250699 40048 net.cpp:86] Creating Layer rpn_cls_prob_reshape
I1017 11:02:41.250704 40048 net.cpp:408] rpn_cls_prob_reshape <- rpn_cls_prob
I1017 11:02:41.250713 40048 net.cpp:382] rpn_cls_prob_reshape -> rpn_cls_prob_reshape
I1017 11:02:41.250741 40048 net.cpp:124] Setting up rpn_cls_prob_reshape
I1017 11:02:41.250747 40048 net.cpp:131] Top shape: 1 18 14 14 (3528)
I1017 11:02:41.250751 40048 net.cpp:139] Memory required for data: 117496524
I1017 11:02:41.250756 40048 layer_factory.hpp:77] Creating layer proposal
F1017 11:02:41.250799 40048 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Python (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, BoxAnnotatorOHEM, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, InnerProductBlob, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, PSROIPooling, Parameter, Pooling, Power, RNN, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, SmoothL1LossOHEM, Softmax, SoftmaxWithLoss, SoftmaxWithLossOHEM, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
Aborted (core dumped)

Generation of 1600-400-20 vocab files

I ran the setup_vg.py script with max objects/attributes/relations set to 1600/400/20 respectively. However, the generated vocabulary files are slightly different from the ones provided in the 1600-400-20 folder. Is there any manual post-processing step?

run tools/demo.ipynb error

[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 6305:21: Message type "caffe.LayerParameter" has no field named "roi_pooling_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0216 21:21:02.787189 22074 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /bottom-up-attention/models/vg/ResNet-101/faster_rcnn_end2end/test.prototxt
*** Check failure stack trace: ***
Aborted

Could you please suggest a solution for this error? Thank you!

ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'

Hi, I ran generate_tsv.py to generate the pretrained features, but I ran into a problem.

Here is the problem:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "generate_tsv.py", line 157, in generate_tsv
net = caffe.Net(prototxt, caffe.TRAIN, weights=weights)
File "/home/lijinze/bottom-up-attention-master/tools/../lib/rpn/anchor_target_layer.py", line 27, in setup
layer_params = yaml.load(self.param_str)
File "/usr/local/lib/python2.7/dist-packages/yaml/init.py", line 72, in load
return loader.get_single_data()
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 39, in get_single_data
return self.construct_document(node)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 48, in construct_document
for dummy in generator:
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 398, in construct_yaml_map
value = self.construct_mapping(node)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 208, in construct_mapping
return BaseConstructor.construct_mapping(self, node, deep=deep)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 133, in construct_mapping
value = self.construct_object(value_node, deep=deep)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 88, in construct_object
data = constructor(self, node)
File "/usr/local/lib/python2.7/dist-packages/yaml/constructor.py", line 414, in construct_undefined
node.start_mark)
ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
in "", line 2, column 11:
'scales': !!python/tuple [4, 8, 16, 32]

Could you do me a favor? Thank you very much!

How is L2-normalization over the features implemented?

The paper states that L2 normalization of the image features is crucial for good performance. However, you just use the pool5 data, which is average-pooled into a 2048-dimensional vector in generate_tsv.py.

I'm wondering whether you have implemented L2-normalization over the features or not. If you did, please let me know how you did it. Thanks a lot!

Could you please list the platform version?

When I make pycaffe following your configuration, it always raises some annoying conflicts. Could you please list the platform versions used for your code?

Besides, my environment is :

  • Ubuntu 16.04
  • CUDA 8
  • cuDNN 5.0
  • OpenCV 2.4.13
  • MKL 2016
  • NCCL

Should that work?
Thanks!

How to load the .tsv data in TensorFlow?

I am trying to use your .tsv data as image features for image captioning. However, I have no idea how to load the .tsv data so that I can randomly batch-sample the feature items and match them with the corresponding captions. One approach I found is to convert the .tsv into .json format, making the .json file a dict with "image_ids" as the keys. However, the .json file is too large to load; in fact, I failed even to generate it due to lack of memory. I also failed to use TensorFlow's TextLineReader.

So, how do I load the .tsv data correctly?
Looking forward to your help!
Thank you!
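
One workaround (a sketch, with field names assumed from the released tsv files) is to convert each tsv once into one small .npz per image_id and let the input pipeline load features lazily by id, rather than building a single giant json:

    import base64
    import csv
    import os
    import sys

    import numpy as np

    csv.field_size_limit(sys.maxsize)
    FIELDNAMES = ['image_id', 'image_w', 'image_h', 'num_boxes', 'boxes', 'features']

    def tsv_to_npz(tsv_path, out_dir):
        """Split one feature tsv into per-image .npz files keyed by image_id."""
        os.makedirs(out_dir, exist_ok=True)
        with open(tsv_path, 'r') as f:
            for item in csv.DictReader(f, delimiter='\t', fieldnames=FIELDNAMES):
                num_boxes = int(item['num_boxes'])
                feats = np.frombuffer(base64.b64decode(item['features']),
                                      dtype=np.float32).reshape((num_boxes, -1))
                boxes = np.frombuffer(base64.b64decode(item['boxes']),
                                      dtype=np.float32).reshape((num_boxes, -1))
                np.savez(os.path.join(out_dir, '%s.npz' % item['image_id']),
                         features=feats, boxes=boxes)

At training time the captioning loader can then np.load each file by image_id at batch time (for example wrapped with tf.py_function inside a tf.data pipeline) and pair the features with the matching captions.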

When I run make -j8 && make pycaffe, I get an error.

CXX src/caffe/internal_thread.cpp
CXX src/caffe/layer.cpp
CXX src/caffe/blob.cpp
CXX src/caffe/syncedmem.cpp
CXX src/caffe/solver.cpp
CXX src/caffe/layer_factory.cpp
CXX src/caffe/data_transformer.cpp
CXX src/caffe/layers/hdf5_data_layer.cpp
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/syncedmem.cpp:3:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/syncedmem.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/blob.cpp:7:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/blob.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from src/caffe/layer_factory.cpp:8:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/layer_factory.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from src/caffe/layer.cpp:1:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/layer.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from ./include/caffe/layers/hdf5_data_layer.hpp:10,
from src/caffe/layers/hdf5_data_layer.cpp:17:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/layers/hdf5_data_layer.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/data_transformer.cpp:10:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/data_transformer.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from ./include/caffe/layer.hpp:12,
from ./include/caffe/net.hpp:12,
from ./include/caffe/solver.hpp:7,
from src/caffe/solver.cpp:6:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/solver.o] Error 1
In file included from ./include/caffe/util/math_functions.hpp:11:0,
from src/caffe/internal_thread.cpp:5:
./include/caffe/util/mkl_alternate.hpp:14:19: fatal error: cblas.h: No such file or directory
#include <cblas.h>
^
compilation terminated.
make: *** [.build_release/src/caffe/internal_thread.o] Error 1

Test 2014 adaptive features are not getting prepared

I tried creating numpy files from the test2014 variable-box features using the read_tsv script in this repo, but it reports a padding error while decoding from string to numpy array.

item[field] = np.frombuffer(base64.decodestring(item[field]),
dtype=np.float32).reshape((item['num_boxes'],-1))

It says 'Incorrect padding' inside the decodestring function. I tried adding '=' at the end, but then the dimensions of the resulting numpy array do not match num_boxes. This error occurs for every tsv file in the test2014 adaptive feature set.

Tried debugging the code:
hRPQAAAAAAAAAAAAAAALSF8D0AAAAAAAAAAEK5hkBUkI4/ubE7PwAAAABMs648eTKRO2Xq5z1mSOE+aKcrPwAAAAAAAAAAAAAAABsb3T7nhK49jVvEPirGgjqzJrM+AAAAAJLICj8B0G4+HmhvPvccLz4AAAAAq37BOoivl0AtCwg8AAAAAAAAAAAAAAAAtsQoPgAAAADbHCo7AAAAAAAAAACSgow/AAAAAFYsqT+fwoM9AAAAAIEkFkHFf6U8AAAAAAAAAACspfQ+AAAAAAAAAACbh5E9AAAAAF/CRz8AAAAAAAAAAGXGa0BfWrs/FetIPKe0RD8AAAAAzPLROwAAAAAAAAAAAAAAAHy/Rz7JO49A/5cPP8bSlz4AAAAANOKEQAAAAAAAAAAAAAAAAAAAA437192 length of string=237423

/home/juan_fernandez/scripts/read_tsv.py(74)()
-> pdb.set_trace()
(Pdb) c
Traceback (most recent call last):
File "scripts/read_tsv.py", line 74, in
pdb.set_trace()
File "/home/juan_fernandez/anaconda2/envs/py27/lib/python2.7/base64.py", line 328, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Any help would be appreciated.

Struggling with installation

Hello,

I keep failing to build the shipped Caffe with Anaconda, mainly due to linking errors related to Google protobuf. I've been struggling for about 7-8 hours and am pretty close to giving up.

So the question is: what are the modifications shipped in the caffe/ folder? Can we really not use upstream Caffe?

Training Scripts

This is nice work and the features improve results a lot! Could you please share the detailed training process for the final model?

How to set a larger batch size in training?

I modified BATCH_SIZE from 64 to 192 in faster_rcnn_end2end_resnet.yml, but I get this error:

F1206 14:38:02.929175 195836 loss_layer.cpp:19] Check failed: bottom[0]->num() == bottom[1]->num() (32 vs. 96) The data and label should have the same number.

I think I am missing something in the configuration; maybe another parameter should also be changed to match BATCH_SIZE.

Tensorflow version ?

Hi,

I have a question: will you be releasing a TensorFlow version of your code?

Can somebody share the pretrained features?

I used Chrome to download the features, but the speed is below 50 KB/s, and after downloading 50~100 MB the transfer gets interrupted; when I re-download, it starts from the beginning...
I feel helpless...

Which prototxt is for generate_tsv.py?

I want to use your model to extract features, but I don't know which prototxt I need for generate_tsv.py.
Following the 'Testing' part, I chose "models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt", but it reports this error:
cudnn.hpp:122] Check failed: status == CUDNN_STATUS_SUCCESS (3 vs. 0) CUDNN_STATUS_BAD_PARAM
I will run this model on XMedia Wikipedia and Pascal, and would appreciate more explanation about this part.
By the way, do I need to resize the image shape to 224x224x3?
Thank you

Required GPU memory

Hello
I am trying to use the pretrained model to extract image features. I have a GTX 1070 (8 GB) and get an out-of-memory error when running the network on a single image. I suspect this is a Caffe issue regarding memory management. What do you suggest to solve this without decreasing performance?

Not able to define a function in lib/fast_rcnn/test.py

I want to define a function in lib/fast_rcnn/test.py. I implemented the new function in test.py and imported test in demo.ipynb. When I access the new function as test.new_function(), it throws an error: 'module' object has no attribute 'new_function'. How can we define a new function in test.py?

image captioning task

Hi peteanderson:
I have followed your work in the cross-modal field for a long time, and your bottom-up & top-down work is a great improvement over other works. Following your paper, I implemented the top-down algorithm for image captioning, but I could not reproduce your CIDEr loss, so I just used the cross-entropy loss. If possible, I hope you will put your image captioning code on GitHub.
I hope you will roll out another wonderful article at CVPR 2018.

How about the image caption model?

Hello, the attribute extraction net using bottom-up attention that you proposed is impressive! It indeed boosts image captioning performance in the paper. Besides the attention model, I am also interested in the caption model designed in the paper. You mentioned that your caption model achieves performance comparable to the state of the art on most evaluation metrics. So, to compare with my own model, can you provide your captioning model implementation? :)
