Multi-level Scene Description Network

This is our implementation of the Multi-level Scene Description Network from Scene Graph Generation from Objects, Phrases and Region Captions. The project is based on a PyTorch version of Faster R-CNN. (Update: the model links have been updated. Sorry for the inconvenience.)

*Updates*

We have released our newly proposed scene graph generation model in our ECCV-2018 paper:

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation.

Check the github repo Factorizable Net if you are interested.

Progress

  • README for training
  • README for project settings
  • our trained RPN
  • our trained Full Model
  • Our cleansed Visual Genome Dataset
  • training codes
  • evaluation codes
  • Model acceleration (please refer to our ECCV project).
  • Multi-GPU support: we have released a beta multi-GPU version of our Factorizable Net. If you want to speed up training, please check that project.

We are still working on the project. If you are interested, please follow our project.

Project Settings

  1. Install the requirements (you can use pip or Anaconda):

    conda install pip pyyaml sympy h5py cython numpy scipy
    conda install -c menpo opencv3
    conda install -c soumith pytorch torchvision cuda80 
    pip install easydict
    
  2. Clone the MSDN repository

    git clone git@github.com:yikang-li/MSDN.git
  3. Build the Cython modules for nms and the roi_pooling layer

    cd MSDN/faster_rcnn
    ./make.sh
    cd ..
  4. Download the trained full model and trained RPN, and place them in output/trained_model

  5. Download our cleansed Visual Genome dataset and unzip it:

tar xzvf top_150_50.tgz
  • p.s. Our IPython scripts for data cleansing are also released.
  6. Download Visual Genome images

  7. Place images and cleansed annotations in the corresponding folders:

mkdir -p data/visual_genome
cd data/visual_genome
ln -s /path/to/VG_100K_images_folder VG_100K_images
ln -s /path/to/downloaded_folder top_150_50
  • p.s. You can change the default data directory by modifying __C.IMG_DATA_DIR in faster_rcnn/fast_rcnn/config.py (a programmatic override is sketched below).
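
If you prefer not to edit the file, the directory can also be overridden at import time. A minimal sketch, assuming config.py follows the usual py-faster-rcnn pattern of aliasing __C as cfg; the path below is a hypothetical example:

    # Override the image directory without editing config.py (sketch).
    from faster_rcnn.fast_rcnn.config import cfg

    cfg.IMG_DATA_DIR = '/data/visual_genome/VG_100K_images'  # hypothetical path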

Training

  • Training in multiple stages. (Single-GPU training may take about one week.)

    1. Training RPN for object proposals and caption region proposals (the shared conv layers are fixed). We also provide our pretrained RPN model.

    By default, the training is done on a small part of the full dataset:

     CUDA_VISIBLE_DEVICES=0 python train_rpn.py
    

    For full-dataset training:

     CUDA_VISIBLE_DEVICES=0 python train_rpn.py --max_epoch=10 --step_size=2 --dataset_option=normal --model_name=RPN_full_region
    

    --step_size indicates the number of epochs after which the learning rate is decayed, and --dataset_option selects the [ small | fat | normal ] subset (see the sketch below for the decay semantics).
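
    For illustration only, the decay behavior --step_size=2 implies can be reproduced with a standard PyTorch step scheduler; a minimal sketch with a hypothetical model, assuming a decay factor of 0.1 (the repo may implement the decay by hand):

        import torch
        import torch.optim as optim
        from torch.optim.lr_scheduler import StepLR

        model = torch.nn.Linear(10, 2)                         # hypothetical model
        optimizer = optim.SGD(model.parameters(), lr=0.01)
        scheduler = StepLR(optimizer, step_size=2, gamma=0.1)  # decay lr every 2 epochs

        for epoch in range(10):
            # ... train one epoch ...
            scheduler.step()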

    2. Training MSDN

    Here, we use SGD (controlled by --optimizer) by default:

     CUDA_VISIBLE_DEVICES=0 python train_hdn.py --load_RPN --saved_model_path=./output/RPN/RPN_region_full_best.h5  --dataset_option=normal --enable_clip_gradient --step_size=2 --MPS_iter=1 --caption_use_bias --caption_use_dropout --rnn_type LSTM_normal 
    
  • Alternatively, the model can be trained end-to-end from scratch (not recommended; the results are not good):

     CUDA_VISIBLE_DEVICES=0 python train_hdn.py  --dataset_option=normal --enable_clip_gradient  --step_size=3 --MPS_iter=1 --caption_use_bias --caption_use_dropout --max_epoch=11 --optimizer=1 --lr=0.001
    

Evaluation

Our pretrained full model is provided for evaluation and further development. (Please download the related files in advance.)

./eval.sh

Currently, the accuracy of our released version differs slightly from the results reported in the paper: Recall@50: 11.705%; Recall@100: 14.085%.
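
For reference, Recall@K for scene graphs counts a ground-truth triplet as recalled if it appears among the top-K scored (subject, predicate, object) predictions. A simplified sketch of the metric that ignores the box-overlap matching the full evaluation also requires:

    def recall_at_k(pred_triplets, gt_triplets, k):
        """Fraction of GT triplets found in the top-k predictions (sketch)."""
        top_k = set(pred_triplets[:k])  # predictions pre-sorted by score
        hits = sum(1 for t in gt_triplets if t in top_k)
        return float(hits) / max(len(gt_triplets), 1)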

Acknowledgement

We thank longcw for generously releasing his PyTorch implementation of Faster R-CNN.

Reference

@inproceedings{li2017msdn,
  author    = {Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
  title     = {Scene graph generation from objects, phrases and region captions},
  booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
  year      = {2017}
}

License:

The pre-trained models and the MSDN technique are released for non-commercial use.

Contact Yikang LI if you have questions.


Issues

GPU memory leakage during evaluation

Thanks for your work. I tried to run eval.sh and got a "cuda out of memory" error.
env:
-- python 2.7
-- pytorch 0.4.1
-- cuda 9.0
-- gpu nvidia titan x

I found it's caused by the class RoIPoolFunction(Function). In the forward function of this class, there are assignments such as self.output = output; if I comment these out, the code works. I guess that when the model runs in eval mode, tensors like self.output (or their grads?) won't be released, and the memory leak happens.
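
For context, the pattern described above is a known pitfall with old-style autograd Functions: tensors stashed on self outlive the forward pass. A minimal sketch of the new-style alternative, which lets autograd free the saved tensors (an illustrative Square op, not the repo's code):

    import torch
    from torch.autograd import Function

    class Square(Function):
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)  # instead of self.x = x
            return x * x

        @staticmethod
        def backward(ctx, grad_out):
            x, = ctx.saved_tensors
            return 2 * x * grad_out   # d(x^2)/dx = 2x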

Unable to build the Cython modules for nms and the roi_pooling layer

Hi!
I am studying your paper on MSDN and trying to run your model on my computer. When I try to execute cd MSDN-master/faster_rcnn followed by ./make.sh, it gives the following error:

Traceback (most recent call last):
File "setup.py", line 59, in
CUDA = locate_cuda()
File "setup.py", line 52, in locate_cuda
for k, v in cudaconfig.iteritems():
AttributeError: 'dict' object has no attribute 'iteritems'
Compiling roi pooling kernels by nvcc...
./make.sh: line 10: nvcc: command not found
Traceback (most recent call last):
File "build.py", line 3, in
from torch.utils.ffi import create_extension
File "/home/faaiz/anaconda3/lib/python3.7/site-packages/torch/utils/ffi/init.py", line 1, in
raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Please help if you know anything about the error I am getting.
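
Both tracebacks point at version mismatches: dict.iteritems() exists only on Python 2, and torch.utils.ffi raises this exact ImportError in PyTorch 1.0+, so the build scripts expect Python 2 with an older PyTorch (e.g. 0.3/0.4). A minimal sketch of the Python 3-compatible iteration, with cudaconfig as a stand-in example:

    # .iteritems() (Python 2 only) -> .items() (works on both 2 and 3)
    cudaconfig = {'home': '/usr/local/cuda'}  # stand-in example dict
    for k, v in cudaconfig.items():
        print(k, v)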

About Inverse Weight

Hello,
thanks for your beautiful code.

I want to ask how I can transform the 'objects' and 'predicate' entries from unicode to float by myself. Is there a function that can be used to do this?

Now I'm trying to use your code with ImageNet, and I can't find on the Internet how to get the inverse weight.

I'm looking forward to your answer.

Thank you.

RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

Hello, yikang-li!
When I run CUDA_VISIBLE_DEVICES=0 python train_rpn.py, I get this error:

/home/tp/MSDN-master/faster_rcnn/RPN.py:140: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
rpn_cls_prob = F.softmax(rpn_cls_score_reshape)
/home/tp/.local/lib/python2.7/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
warnings.warn(warning.format(ret))
Traceback (most recent call last):
File "train_rpn.py", line 194, in
main()
File "train_rpn.py", line 73, in main
train(train_loader, net, optimizer, epoch)
File "train_rpn.py", line 117, in train
target_net(im_data, im_info.numpy(), gt_objects.numpy()[0], gt_regions.numpy()[0])
File "/home/tp/.local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/home/tp/MSDN-master/faster_rcnn/RPN.py", line 182, in forward
self.build_loss(rpn_cls_score_reshape, rpn_bbox_pred, rpn_data)
File "/home/tp/MSDN-master/faster_rcnn/RPN.py", line 240, in build_loss
rpn_loss_box = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, size_average=False) /(fg_cnt + 1e-4)
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.cuda.LongTensor for argument #2 'other'

How do I resolve this error? I use python 2.7, cuda 9.0, pytorch 0.4.1.
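
The traceback says smooth_l1_loss received a LongTensor where a FloatTensor was expected; casting the integer-typed targets to float before the loss call is the usual fix. A minimal sketch with stand-in tensors (not the repo's exact shapes):

    import torch
    import torch.nn.functional as F

    rpn_bbox_pred = torch.randn(4, 8)            # float predictions
    rpn_bbox_targets = torch.zeros(4, 8).long()  # integer targets (the mismatch)
    loss = F.smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets.float())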

loss : nan problem

Thank you for your nice code.
I am studying your paper MSDN,
so I tried to run your code.
I ran it successfully, but then I got this problem:

Epoch: [0][1000/15000] [lr: 0.01] [Solver: SGD]
Batch_Time: 0.322s FRCNN Loss: nan RPN Loss: 0.9130
[Loss] obj_cls_loss: nan obj_box_loss: nan pred_cls_loss: nan, caption_loss: 9.0235, region_box_loss: nan, region_objectness_loss: nan
[object] tp: 0.00, tf: 0.00, fg/bg=(46/175)
[predicate] tp: 0.00, tf: 0.00, fg/bg=(99/412)
[region] tp: 0.00, tf: 0.00, fg/bg=(48/76)

Most of the losses are NaN...
There is no warning in the console.
If you know the reason for this problem, please give me some help.

thank you!!
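
A generic way to localize such failures (not from the repo): abort on the first NaN loss and clip gradients; the training script exposes clipping via --enable_clip_gradient, and lowering --lr is another common remedy. A minimal sketch with a hypothetical model:

    import torch

    model = torch.nn.Linear(10, 1)                  # hypothetical model
    loss = model(torch.randn(4, 10)).pow(2).mean()
    if torch.isnan(loss):
        raise RuntimeError('loss is NaN: lower lr or enable gradient clipping')
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)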

About training set

Hi, thank you for your nice code.
In section 4.1 of the paper, you mention that the training set contains 70,998 images; however, the provided cleansed Visual Genome dataset contains 46,164 images for training. What's the difference between the two datasets, and do they give similar results?
Thank you

Performance of model

Hi Yikang.
I am studying your code.

I ran the training code from the provided RPN model
and evaluated the trained model using 'eval.sh',

but I got lower performance than the provided full model.
My trained model's performance is as follows:
Recall@50: 8.772%
Recall@100: 10.908%

The training command used was:
CUDA_VISIBLE_DEVICES=0 python train_hdn.py
--load_RPN
--saved_model_path=./output/RPN/RPN_region_full_best.h5
--dataset_option=normal --enable_clip_gradient
--step_size=2
--MPS_iter=1
--caption_use_bias
--caption_use_dropout
--rnn_type LSTM_normal

How can I get normal performance?

Also, I found differences between the git code and the paper:
the paper's MPS_iter is 2, but the code's default is 1;
the paper's message passing method is Message_Passing_Unit_v2 (add), but the code's default is Message_Passing_Unit_v1 (mean).
Could you let me know why they are different?

Fail to download data (trained models, cleansed VG dataset)

Hi there,

firstly, thank you for your fantastic work! I fail to access the files for 1) the trained full model, 2) the trained RPN, and 3) the cleansed Visual Genome dataset in your steps 4 & 5. The Dropbox links seem down. Could you please check for me?

Thanks :)

'Tensor' object has no attribute 'astype'

File "/home/frank/MSDN/faster_rcnn/fast_rcnn/bbox_transform.py", line 78, in bbox_transform_inv_hdn
boxes = boxes.astype(deltas.dtype, copy=False)
AttributeError: 'Tensor' object has no attribute 'astype'
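
The function expects a numpy array but received a torch.Tensor, which has no .astype method. A minimal sketch of a guard that converts before the call (assuming the downstream code is numpy-based):

    import torch

    boxes = torch.randn(5, 4)        # stand-in for the offending input
    if torch.is_tensor(boxes):
        boxes = boxes.cpu().numpy()  # Tensor -> ndarray, which has .astype
    boxes = boxes.astype('float32', copy=False)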

Results for PredCls and PhrCls

Hi Yikang,

I'm looking to replicate the results for the other visual genome scene graph evaluation modes. To get the test results under your evaluation, I would need to run something like the following, right?

        if args.mode == 'sggen':
            total_cnt_t, rel_cnt_correct_t = net.evaluate(
                im_data, im_info, gt_objects.numpy()[0], gt_relationships.numpy()[0], gt_regions.numpy()[0],
                top_Ns = top_Ns, nms=True)
        elif args.mode == 'phrcls':
            total_cnt_t, rel_cnt_correct_t = net.evaluate(
                im_data, im_info, gt_objects.numpy()[0], gt_relationships.numpy()[0], gt_regions.numpy()[0],
                top_Ns = top_Ns, nms=False, use_gt_boxes=True, use_gt_regions=False)
        elif args.mode == 'predcls':
            total_cnt_t, rel_cnt_correct_t = net.evaluate(
                im_data, im_info, gt_objects.numpy()[0], gt_relationships.numpy()[0], gt_regions.numpy()[0],
                top_Ns = top_Ns, nms=False, use_gt_boxes=True, use_gt_regions=False, only_predicate=True)

I had to change a couple of things too:

  1. Hierarchical_Descriptive_Model.evaluate throws an error when use_gt_boxes=True because im_info is a 1 x 3 tensor. I got the best results when I uncommented the division by the image scale (which makes sense, as the GT boxes are then at the same scale as the ROI proposals). Can you confirm that e.g. gt_boxes_object = gt_objects[:, :4] is right?

  2. https://github.com/yikang-li/MSDN/blob/master/faster_rcnn/MSDN.py#L118 seems like it contains a bug, because only the top couple of ROIs are overwritten. Can you confirm that it should be changed to object_rois = object_rois_gt?

However, even when I did these things, I can't match your paper results for PredCls and PhrCls. For PredCls, for instance, I get around 37% R@50 and 46% R@100. Is there something else you did to get these numbers?

Thanks! -Rowan

PredCls, PhrCls

Hi, Thank you for your nice code.
Would you please tell me the method of measuring PredCls and PhrCls?

Dataset in your paper and evaluating object detection performance

Hi, I have some questions regarding the dataset and evaluating object detection.
At the moment, I'm using the normal dataset (train_normal.json, 46164 images) to train a Faster R-CNN similar to the one used in your work for object detection on Visual Genome, and I want to get the same mAP as in your paper (6.72 for Faster R-CNN only). From this issue, I know that you used a different dataset in the paper, so I have some questions:

  1. How can I get exactly the same training and testing datasets as in your paper?
  2. How did you evaluate the object detection mAP of Faster R-CNN and of your MSDN?

Thank you!

Some problems with train & eval

Thank you for your detailed project explanation.

With your provided data and code, I've been trying training and eval.

While training, your readme states:
CUDA_VISIBLE_DEVICES=0 python train_rpn_region.py
Do you mean train_rpn_region.py should be train_rpn.py?

Also, I can't find faster_rcnn.roi_data_layer.roidb.

Furthermore, in the evaluation code,
the RPN_v3 module doesn't exist. Do you mean from RPN_v3 import RPN should be from RPN import RPN?

Also, eval still doesn't work; I get the following error:

➜  MSDN git:(master) ✗ bash eval.sh
Traceback (most recent call last):
  File "train_hdn.py", line 13, in <module>
    from faster_rcnn.MSDN import Hierarchical_Descriptive_Model
  File "/home/junho/MSDN/faster_rcnn/MSDN.py", line 45, in <module>
    class Hierarchical_Descriptive_Model(MSDN_base):
NameError: name 'MSDN_base' is not defined

eval.sh is your provided eval code:

CUDA_VISIBLE_DEVICES=0 python train_hdn.py \   
	--resume_training --resume_model ./pretrained_models/HDN_1_iters_alt_normal_I_LSTM_with_bias_with_dropout_0_5_nembed_256_nhidden_512_with_region_regression_resume_SGD_best.h5 \   
	--dataset_option=normal  --MPS_iter=1 \   
	--caption_use_bias --caption_use_dropout \   
	--rnn_type LSTM_normal

I am referring to the faster-rcnn code (https://github.com/longcw/faster_rcnn_pytorch), but I still need some help because your implementation is a fork of it.
Thank you very much for releasing the code.

GRU Unit

Hi, in your code there is a GRU unit. I know it's used to update the features, but from your code it seems not exactly like what you describe in your paper. To my understanding, there would be one FC layer on object_sub and a different FC layer on object_obj. But in your code, it seems you calculate the average of object_sub and object_obj, and then use one FC layer on that average and another FC on feature_obj. Did I understand something wrong?

GRU_input_feature_object = (object_sub + object_obj) / 2.
out_feature_object = feature_obj + self.GRU_object(GRU_input_feature_object, feature_obj)	
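
For clarity, a minimal sketch of the two formulations being contrasted, with hypothetical layer names and dimensions (not the repo's actual modules):

    import torch
    import torch.nn as nn

    d = 512                           # hypothetical feature dimension
    object_sub = torch.randn(10, d)   # subject-role messages
    object_obj = torch.randn(10, d)   # object-role messages
    feature_obj = torch.randn(10, d)  # current object features

    # Reading of the paper: a separate FC per role, then merge.
    fc_sub, fc_obj = nn.Linear(d, d), nn.Linear(d, d)
    merged_paper = fc_sub(object_sub) + fc_obj(object_obj)

    # Released code: average the roles first, then one GRU update.
    gru = nn.GRUCell(d, d)
    merged_code = (object_sub + object_obj) / 2.0
    out_feature_object = feature_obj + gru(merged_code, feature_obj)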

Failed to use the caption function in faster_rcnn/MSDN.py

When I pass an img_path to the caption function in faster_rcnn/MSDN.py, I get the error:
"File "/home/wangsijin/projects/new_MSDN/faster_rcnn/RPN.py", line 127, in forward
im_data = Variable(im_data.cuda())
AttributeError: 'numpy.ndarray' object has no attribute 'cuda'"

Then I converted the ndarray to a tensor using torch.from_numpy() and got the error:
"RuntimeError: Given groups=1, weight[64, 3, 3, 3], so expected input[1, 600, 800, 3] to have 3 channels, but got 600 channels instead"

Then I used .transpose() to transpose the 1×600×800×3 tensor to a 1×3×600×800 tensor. And I got an error again:
"File "/home/wangsijin/projects/new_MSDN/faster_rcnn/network.py", line 64, in np_to_variable
v = Variable(torch.from_numpy(x).type(dtype))
RuntimeError: the given numpy array has zero-sized dimensions. Zero-sized dimensions are not supported in PyTorch"

I don't know how to solve this and hope someone can help me. Thanks a lot.
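
The three errors together suggest the forward pass expects a 1 x 3 x H x W float tensor on the GPU; a generic preprocessing sketch under that assumption (a random stand-in image, not the repo's exact pipeline):

    import numpy as np
    import torch

    img = np.random.randint(0, 255, (600, 800, 3), dtype=np.uint8)  # stand-in HxWx3 image
    x = img.astype(np.float32)
    x = np.transpose(x, (2, 0, 1))        # HWC -> CHW: (600, 800, 3) -> (3, 600, 800)
    x = torch.from_numpy(x).unsqueeze(0)  # add batch dim -> (1, 3, 600, 800)
    # x = x.cuda()                        # move to GPU if required by the model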

GPU memory usage

I am seeing a memory issue when running evaluation on CUDA 9: there is a GPU memory leak in which the image tensor does not get released after a batch.
On CUDA 8 the issue does not appear.

Using Pre-trained Models

Hi, Thank you for your nice code.
Would you please upload code that uses the pre-trained models on a query image? (i.e., one gives an image to the code, and the code, using the pre-trained models, outputs its corresponding sentences.)

Build/Dependency Problems

When trying to build faster_rcnn, I get the following error:

~/s/e/M/faster_rcnn> ./make.sh 
  File "setup.py", line 89
    print extra_postargs
                       ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(extra_postargs)?
Compiling roi pooling kernels by nvcc...
Traceback (most recent call last):
  File "build.py", line 3, in <module>
    from torch.utils.ffi import create_extension
  File "/usr/lib/python3.7/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Maybe the readme should explicitly state Python 2 if that is the issue? Maybe a requirements.txt should be provided for creating a conda environment if older package versions are needed?

I tried using a python=2.7 Anaconda env and got the same result.

facing ffi error

ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Please suggest how I can resolve this issue.

Version of code

Hi,
thank you for your code.

I want to know your versions of torch, cuda, and python.

Best regards,
thank you
