
detectron.pytorch's People

Contributors

jiasenlu, jwyang, roytseng-tw, vfdev-5, yuliang-zou

detectron.pytorch's Issues

ImportError: dynamic module does not define module export function (PyInit_bbox)

Hi, @roytseng-tw
I encountered an import problem (Python 3.5) and need help. Thanks.

$ python3 tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-C4.yml --use_tfboard --bs 4 --nw 4

Traceback (most recent call last):
  File "tools/train_net_step.py", line 25, in <module>
    from datasets.roidb import combined_roidb_for_training
  File "/home/yuekaiyu/code/Detectron.pytorch/lib/datasets/roidb.py", line 27, in <module>
    import utils.boxes as box_utils
  File "/home/yuekaiyu/code/Detectron.pytorch/lib/utils/boxes.py", line 52, in <module>
    import utils.bbox as cython_bbox
ImportError: dynamic module does not define module export function (PyInit_bbox)
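
For context: `PyInit_<name>` is the entry point Python 3 looks up in a compiled extension, whereas Python 2 extensions export `init<name>`, so this error often means the compiled module was built by a different interpreter than the one importing it. A minimal diagnostic sketch follows; the `lib/utils` location and the helper name `find_foreign_extensions` are assumptions for illustration, not part of the repository. It lists compiled modules that do not carry the running interpreter's own filename tag:

```python
import importlib.machinery
import os


def find_foreign_extensions(directory):
    """Return compiled modules in `directory` not tagged for this interpreter.

    Hypothetical helper: an untagged or differently tagged file (for example
    a plain `bbox.so`) may have been built by another Python version.
    """
    # The first entry is the interpreter-specific suffix,
    # e.g. '.cpython-35m-x86_64-linux-gnu.so' on Linux.
    native_suffix = importlib.machinery.EXTENSION_SUFFIXES[0]
    foreign = []
    for name in sorted(os.listdir(directory)):
        is_binary = name.endswith((".so", ".pyd"))
        if is_binary and not name.endswith(native_suffix):
            foreign.append(name)
    return foreign


if __name__ == "__main__":
    # Assumed location of the Cython build output in this repo.
    target = os.path.join("lib", "utils")
    if os.path.isdir(target):
        print(find_foreign_extensions(target))
```

If a file such as `bbox.so` shows up, rebuilding the extensions with the same interpreter that runs the training script is a common remedy.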

hangs in training

Thanks for your code!
I was able to successfully train configs/e2e_mask_rcnn_R-50-FPN_1x.yaml with (batch_size, learning_rate) = (8, 0.01) up to a certain number of iterations (max = ~60K). So far, the losses look quite similar to your benchmark.
Training speed is also quite comparable to Detectron.
The issue I'm having is that training hangs at a random iteration that differs from run to run: sometimes after 1K, 5K, or 60K iterations.
I'm using 4 V100 GPUs.

Any thoughts?

Negative areas found

When I try to run the training code, I get the following error:

RuntimeWarning: Negative areas found: 3

I'm running: e2e_mask_rcnn_R-101-FPN_2x.yaml

About resume

Hello, I tried to resume training with this command:

 python tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_faster_rcnn_R-101-FPN_1x.yaml --use_tfboard --load_ckpt  Outputs/e2e_faster_rcnn_R-101-FPN_1x/May02-12-15-12_faster_step/ckpt/model_step69999.pth --resume

However, it throws a runtime error:

Traceback (most recent call last):
  File "tools/train_net_step.py", line 367, in main
    optimizer.step()
  File "/home/philokey/.virtualenvs/py3/lib/python3.5/site-packages/torch/optim/sgd.py", line 94, in step
    buf.mul_(momentum).add_(1 - dampening, d_p)
RuntimeError: invalid argument 3: sizes do not match at /pytorch/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:271

How can I solve this problem?

Undefined names: CUDA, CylinderGridGenFunction

flake8 testing of https://github.com/roytseng-tw/mask-rcnn.pytorch on Python 3.6

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./lib/setup.py:93:48: F821 undefined name 'CUDA'
            self.set_executable('compiler_so', CUDA['nvcc'])
                                               ^
./lib/model/roi_crop/modules/gridgen.py:39:18: F821 undefined name 'CylinderGridGenFunction'
        self.f = CylinderGridGenFunction(self.height, self.width, lr=lr)
                 ^
2     F821 undefined name 'CylinderGridGenFunction'
2

Combine train and val?

Hi, just asking: would you consider combining training and validation?
The standard approach is: every epoch, run validation before checkpointing, and save the ckpt only if the validation accuracy/loss has improved.
I know this needs some work.
But it would be very convenient and make it easy to get started.

I am working on it now, but I guess I can only run validation after each ckpt is generated during training (based on your test_net.py: load the ckpt and validate it), i.e. model.train->model.save->model.eval.
It would be more efficient if you could provide an example that does model.train->model.eval->model.save.
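
The train -> eval -> save order asked about here can be sketched generically. This is only an outline under stated assumptions: the three callables are hypothetical placeholders, not functions from this repository, and the score is assumed to be "higher is better" (for a loss, the comparison would be flipped).

```python
def train_with_validation(num_epochs, train_one_epoch, evaluate, save_checkpoint):
    """Run train -> eval -> save, keeping only checkpoints that improve.

    All three callables are hypothetical placeholders:
      train_one_epoch(epoch) -> None
      evaluate(epoch) -> float   (higher is better, e.g. mAP)
      save_checkpoint(epoch, score) -> None
    """
    best_score = float("-inf")
    saved_epochs = []
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        score = evaluate(epoch)       # validate before saving
        if score > best_score:        # save only on improvement
            best_score = score
            save_checkpoint(epoch, score)
            saved_epochs.append(epoch)
    return best_score, saved_epochs


if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs without a model or dataset.
    fake_scores = [0.10, 0.25, 0.20, 0.30]
    best, saved = train_with_validation(
        num_epochs=len(fake_scores),
        train_one_epoch=lambda epoch: None,
        evaluate=lambda epoch: fake_scores[epoch],
        save_checkpoint=lambda epoch, score: None,
    )
    print(best, saved)
```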

Error when running test on one GPU when multiple are available.

Hi, I tried to run a test today and got the following error:

Traceback (most recent call last):
  File "tools/test_net.py", line 108, in <module>
    check_expected_results=True)
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/core/test_engine.py", line 158, in test_net_on_dataset
    args, dataset_name, proposal_file, output_dir, gpu_id=gpu_id
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/core/test_engine.py", line 253, in test_net
    cls_boxes_i, cls_segms_i, cls_keyps_i = im_detect_all(model, im, box_proposals, timers)
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, box_proposals)
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/core/test.py", line 127, in im_detect_bbox
    return_dict = model(**inputs)
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/nn/parallel/data_parallel.py", line 82, in forward
    mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
  File "/home/rizhiy/object-detection/Detectron.pytorch/lib/nn/parallel/data_parallel.py", line 82, in <listcomp>
    mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
IndexError: list index out of range

The command I used to run the test: python tools/test_net.py --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --load_ckpt Outputs/e2e_mask_rcnn_R-101-FPN_2x/Apr19-11-34-35_devbox/ckpt/model_7_29315.pth --dataset coco2017.

It appears that there is some inconsistency in the number of devices during setup.

Not sure what needs to be fixed, but as a workaround, you can just restrict python to one GPU with CUDA_VISIBLE_DEVICES=0.
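
The same workaround can be applied from inside a script or notebook instead of the shell. A minimal sketch, assuming it runs before CUDA is initialized in the process (most simply, before `import torch`); the helper name `restrict_visible_gpus` is hypothetical:

```python
import os


def restrict_visible_gpus(device_ids):
    """Expose only the given GPU ids to CUDA libraries loaded afterwards.

    Hypothetical helper; it only sets an environment variable, so it has
    no effect on a process that has already initialized CUDA.
    """
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


restrict_visible_gpus([0])
# import torch  # would now see a single device, indexed as cuda:0
```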

error when training using one GPU when multiple GPUs are available

My machine has one low-end GPU (GPU0) and 4 GPUs (1,2,3,4). If I do not specify which GPU to use and run:
python tools/train_net_step.py --dataset coco2017 --cfg configs/e2e_faster_rcnn_R-101-FPN_1x.yaml
I get this error:
path/miniconda3/lib/python3.6/site-packages/torch/cuda/__init__.py:116: UserWarning:
Found GPU1 Quadro K600 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.

  warnings.warn(old_gpu_warn % (d, name, major, capability[1]))
INFO train_net_step.py: 361: Training starts !
INFO net.py: 72: Changing learning rate 0.000000 -> 0.006667
Traceback (most recent call last):
  File "tools/train_net_step.py", line 437, in <module>
    main()
  File "tools/train_net_step.py", line 407, in main
    net_outputs = maskRCNN(**input_data)
  File "path/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "path/CODE/Detectron.pytorch/lib/nn/parallel/data_parallel.py", line 82, in forward
    mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
  File "path/CODE/Detectron.pytorch/lib/nn/parallel/data_parallel.py", line 82, in <listcomp>
    mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
IndexError: list index out of range

Don't have good results when use pre-trained Detectron model

When I run infer_simple.py with a pre-trained Detectron model, I don't get good results. The command is:
python3 tools/infer_simple.py --dataset coco --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --load_detectron configs/e2e_mask_rcnn_R-101-FPN_2x.pkl --image_dir demo/sample_images --output_dir demo/out
The object scores are very low, around 0.08, so I can't get accurate results.
What's wrong?

Undefined names

See #5

flake8 testing of https://github.com/roytseng-tw/Detectron.pytorch on Python 3.6.3

$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

./lib/core/test.py:315:13: F821 undefined name 'image_utils'
    im_ar = image_utils.aspect_ratio_rel(im, aspect_ratio)
            ^
./lib/core/test.py:402:18: F821 undefined name 'im_conv_body_only'
    im_scale_i = im_conv_body_only(model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE)
                 ^
./lib/core/test.py:465:16: F821 undefined name 'im_conv_body_only'
    im_scale = im_conv_body_only(model, im_hf, target_scale, target_max_size)
               ^
./lib/core/test.py:482:20: F821 undefined name 'im_conv_body_only'
        im_scale = im_conv_body_only(model, im, target_scale, target_max_size)
                   ^
./lib/core/test.py:491:13: F821 undefined name 'image_utils'
    im_ar = image_utils.aspect_ratio_rel(im, aspect_ratio)
            ^
./lib/core/test.py:499:20: F821 undefined name 'im_conv_body_only'
        im_scale = im_conv_body_only(
                   ^
./lib/core/test.py:569:16: F821 undefined name 'im_conv_body_only'
    im_scale = im_conv_body_only(model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE)
               ^
./lib/core/test.py:640:16: F821 undefined name 'im_conv_body_only'
    im_scale = im_conv_body_only(model, im_hf, target_scale, target_max_size)
               ^
./lib/core/test.py:658:20: F821 undefined name 'im_conv_body_only'
        im_scale = im_conv_body_only(model, im, target_scale, target_max_size)
                   ^
./lib/core/test.py:669:13: F821 undefined name 'image_utils'
    im_ar = image_utils.aspect_ratio_rel(im, aspect_ratio)
            ^
./lib/core/test.py:677:20: F821 undefined name 'im_conv_body_only'
        im_scale = im_conv_body_only(
                   ^
11    F821 undefined name 'image_utils'
11

pytorch 0.4 support ?

Hi, thanks for your great work!
I would like to know whether Detectron.pytorch will support pytorch>=0.4.

How to train with a smaller net-input size such as (640,480)?

Thanks for sharing your great work.
By printing the image size, I see that the resnet50-fpn model uses 768x1344 inputs.
However, in my case, I want to retrain this network using a smaller net-input size, such as 640x480.

I simply tried to modify the config file of e2e_mask_rcnn_R-50-FPN_2x.yaml as follows:
[image: modified config file]

But during training, Python reported a "double free or corruption" error:
[image: error message]

TensorBoard shows that this error occurred after 120 steps, not at the start of training:
[image: TensorBoard screenshot]

Can you give me some advice on solving this error?
Have you trained with a smaller net-input size?

Doubts about the loss_cls and accuracy_cls calculation

Hi,
I have some doubts about how loss_cls and accuracy_cls are computed in the fast_rcnn_losses function in lib/modeling/fast_rcnn_heads.py.
As I understand it, the calculation below seems to assume that cls_score and rois_label have the same length and a matching order, like pred [0,1,1], label [0,1,2] (just to illustrate the idea).
But in reality it is more like pred [0,1,2,3], label [0,1,2] (the number of predictions may be larger or smaller, and the order may not match).
In my experience with matterport's Mask R-CNN, before the class loss and accuracy are calculated, there is an operation that matches predictions to labels based on the nearest box. Basically, it takes the nearest predicted bbox as the 'right' prediction for each label box (which makes sense).

I have not found such an operation in the code yet, so I guess I overlooked or misunderstood something (I am new to Mask R-CNN/Faster R-CNN).
So my real question is: how do you make sure the predicted class and the label class match before calculating the loss/accuracy?
Thanks.

def fast_rcnn_losses(cls_score, bbox_pred, label_int32, bbox_targets,
                     bbox_inside_weights, bbox_outside_weights):
    device_id = cls_score.get_device()
    rois_label = Variable(torch.from_numpy(label_int32.astype('int64'))).cuda(device_id)
    loss_cls = F.cross_entropy(cls_score, rois_label)
    ........
    # class accuracy
    cls_preds = cls_score.max(dim=1)[1].type_as(rois_label)
    accuracy_cls = cls_preds.eq(rois_label).float().mean(dim=0)

    return loss_cls, loss_bbox, accuracy_cls
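
As general background on Faster R-CNN-style pipelines: each sampled RoI (proposal) is usually assigned a class label before the box head runs, typically the class of the ground-truth box it overlaps most, or background when the overlap is below a threshold. The scores and labels then have one entry per RoI, in the same order, which is what a per-row cross-entropy expects. The pure-Python sketch below only illustrates that idea; the function names and the 0.5 threshold are assumptions, not this repository's code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(rois, gt_boxes, gt_classes, fg_thresh=0.5):
    """Give every RoI the class of its best-overlapping ground-truth box.

    RoIs whose best IoU is below `fg_thresh` get label 0 (background).
    The result has exactly one label per RoI, in RoI order.
    """
    labels = []
    for roi in rois:
        overlaps = [iou(roi, gt) for gt in gt_boxes]
        if overlaps and max(overlaps) >= fg_thresh:
            labels.append(gt_classes[overlaps.index(max(overlaps))])
        else:
            labels.append(0)
    return labels


rois = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (100, 100, 110, 110)]
gt_boxes = [(0, 0, 10, 10), (52, 50, 62, 60)]
gt_classes = [1, 2]
print(assign_labels(rois, gt_boxes, gt_classes))  # -> [1, 1, 2, 0]
```

Four RoIs yield four labels even though there are only two ground-truth boxes, so the label vector always lines up with the score rows.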

mismatch of shape while loading from .pkl file

I tried inference with e2e_keypoint_rcnn_R-50-FPN_s1x.yaml using the pkl file available from Detectron at https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md

CUDA :9.0
GPU: K80
Pytorch: 0.4.0
python:2.7

Got this error:

  File "tools/infer_simple.py", line 176, in <module>
    main()
  File "tools/infer_simple.py", line 128, in main
    load_detectron_weight(maskRCNN, args.load_detectron)
  File "/home/tester/detectron/mask-rcnn.pytorch/lib/utils/detectron_weight_helper.py", line 22, in load_detectron_weight
    p_tensor.copy_(torch.Tensor(src_blobs[d_name]))
RuntimeError: The expanded size of the tensor (81) must match the existing size (2) at non-singleton dimension 0

Cannot run inference in Jupyter

@roytseng-tw Hi,
I ran inference with your infer_simple.py successfully.
In the same environment (one GPU, same machine, same folder path), I tried to run inference in Jupyter, but it gives me an error in data_parallel. Any idea?

This is what I do in Jupyter:
I use the following to load the pre-trained model; it succeeds and returns the model structure.

cfg.MODEL.NUM_CLASSES = 3
cfg_file = 'configs/e2e_mask_rcnn_R-50-C4_1x.yaml'
load_name= '/home/ubuntu/Detectron_master/Outputs/e2e_mask_rcnn_R-50-C4_1x/May04-11-28-11_ubuntu16_step/ckpt/model_step19999.pth'

cfg_from_file(cfg_file)
assert_and_infer_cfg()

maskRCNN = Generalized_RCNN()
maskRCNN.cuda()
checkpoint = torch.load(load_name, map_location=lambda storage, loc: storage)
net_utils.load_ckpt(maskRCNN, checkpoint['model'])
maskRCNN = mynn.DataParallel(maskRCNN, cpu_keywords=['im_info', 'roidb'],
                                 minibatch=True)
maskRCNN.eval()

However, when I then call it in im_detect_all:

im=cv2.imread('test.jpg')
cls_boxes, cls_segms, cls_keyps = im_detect_all(maskRCNN, im, timers=timers)

It gives me an error at mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()]):

IndexError                                Traceback (most recent call last)
<ipython-input-5-f3b1e8bf1385> in <module>()
      9 timers = defaultdict(Timer)
     10 print('entry[image]',entry['image'])
---> 11 cls_boxes, cls_segms, cls_keyps = im_detect_all(maskRCNN, im, timers=timers)

~/Detectron_master/lib/core/test.py in im_detect_all(model, im, box_proposals, timers)
     68     else:
     69         scores, boxes, im_scale, blob_conv = im_detect_bbox(
---> 70             model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, box_proposals)
     71     timers['im_detect_bbox'].toc()
     72 

~/Detectron_master/lib/core/test.py in im_detect_bbox(model, im, target_scale, target_max_size, boxes)
    133     inputs['im_info'] = [Variable(torch.from_numpy(inputs['im_info']), volatile=True)]
    134 
--> 135     return_dict = model(**inputs)
    136 
    137     if cfg.MODEL.FASTER_RCNN:

~/.local/lib/python3.5/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    355             result = self._slow_forward(*input, **kwargs)
    356         else:
--> 357             result = self.forward(*input, **kwargs)
    358         for hook in self._forward_hooks.values():
    359             hook_result = hook(self, input, result)

~/Detectron_master/lib/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
     83                 mini_inputs = [x[i] for x in inputs]
     84 
---> 85                 mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
     86                 # print('mini_kwargs',mini_kwargs)
     87                 a, b = self._minibatch_scatter(device_id, *mini_inputs, **mini_kwargs)

~/Detectron_master/lib/nn/parallel/data_parallel.py in <listcomp>(.0)
     83                 mini_inputs = [x[i] for x in inputs]
     84 
---> 85                 mini_kwargs = dict([(k, v[i]) for k, v in kwargs.items()])
     86                 # print('mini_kwargs',mini_kwargs)
     87                 a, b = self._minibatch_scatter(device_id, *mini_inputs, **mini_kwargs)

IndexError: list index out of range

RetinaNet

Is RetinaNet (or any other single-stage detector) training/inference supported? I saw some fields that correspond to RetinaNet in config.py, hence this question.

Thanks,

Error at testing with Detectron pretrained ResNet-50 architecture

Expected results

I was trying to test the Detectron ResNet50 architecture with pretrained caffe weights on the COCO-Val 2017 set and got the error below.

Update: The Detectron repo was updated with the "group batch norm" feature 12 days ago (https://github.com/facebookresearch/Detectron/tree/master/configs/04_2018_gn_baselines). I believe they also changed the model files and now only provide pkl files for the new baselines (https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md). If my assumption is true, can you upload the previous .pkl files somewhere, so that we can continue using your PyTorch implementation?

Actual results

loading annotations into memory...
Done (t=0.70s)
creating index...
index created!
loading annotations into memory...
Done (t=0.93s)
creating index...
index created!
INFO test_engine.py: 335: loading detectron weights data/pretrained_model/R-50.pkl
Traceback (most recent call last):
  File "tools/test_net.py", line 112, in <module>
    check_expected_results=True)
  File "/home/john/Desktop/cvav_proj/detectorn_roytseng/mask-rcnn.pytorch/lib/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/home/john/Desktop/cvav_proj/detectorn_roytseng/mask-rcnn.pytorch/lib/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/home/john/Desktop/cvav_proj/detectorn_roytseng/mask-rcnn.pytorch/lib/core/test_engine.py", line 158, in test_net_on_dataset
    args, dataset_name, proposal_file, output_dir, gpu_id=gpu_id
  File "/home/john/Desktop/cvav_proj/detectorn_roytseng/mask-rcnn.pytorch/lib/core/test_engine.py", line 232, in test_net
    model = initialize_model_from_cfg(args, gpu_id=gpu_id)
  File "/home/john/Desktop/cvav_proj/detectorn_roytseng/mask-rcnn.pytorch/lib/core/test_engine.py", line 336, in initialize_model_from_cfg
    load_detectron_weight(model, args.load_detectron)
  File "/home/john/Desktop/cvav_proj/detectorn_roytseng/mask-rcnn.pytorch/lib/utils/detectron_weight_helper.py", line 21, in load_detectron_weight
    p_tensor.copy_(torch.Tensor(src_blobs[d_name]))
KeyError: 'fpn_inner_res5_2_sum_w'

Detailed steps to reproduce

I've downloaded ResNet-50 model file from Detectron github page (https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/MSRA/R-50.pkl).

The command I ran is:

python tools/test_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-50-FPN_1x.yaml --load_detectron data/pretrained_model/R-50.pkl

I also get KeyError: 'conv_rpn_w' when I change the config to the R-50-C4_1x or R-50-C4_2x files.

System information

  • Operating system: Ubuntu 16.04
  • CUDA version: 9
  • cuDNN version: ?
  • GPU models (for all devices if they are not all the same): 1050 Ti
  • python version: 3.6.4 (Anaconda custom)
  • pytorch version: 0.3.4
  • Anything else that seems relevant: ?

coco eval performance

Excellent work! Have you trained from scratch and how's the performance on COCO evaluation? BTW, could you share some pre-trained weights to test on? Thanks a lot!

batch size, lr, and schedule.

According to the documentation, if I understand correctly, in some settings you changed the batch size, and thus lr proportionally, but you did not change the schedule (in terms of "iterations"). You should scale the schedule proportionally and let the solver see the same total number of images. To match curves, the x-axis should also be # of images (or equivalently, epochs), but not iterations.
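
The arithmetic behind this suggestion can be written out as a small helper. This is a sketch of the linear scaling idea only; the function and parameter names are illustrative and are not config keys of this repository, and the numbers in the example are just sample values.

```python
def scale_schedule(base_lr, base_max_iter, base_steps, base_batch, new_batch):
    """Rescale LR and iteration counts so the total number of images is fixed.

    Linear scaling: the learning rate grows with the batch size, and the
    iteration counts shrink by the same factor.
    """
    factor = new_batch / float(base_batch)
    new_lr = base_lr * factor
    new_max_iter = int(round(base_max_iter / factor))
    new_steps = [int(round(step / factor)) for step in base_steps]
    return new_lr, new_max_iter, new_steps


# Example: a 16-image, 90k-iteration schedule moved to 8 images per batch.
lr, max_iter, steps = scale_schedule(0.02, 90000, [60000, 80000], 16, 8)
print(lr, max_iter, steps)
# Images seen are unchanged: 16 * 90000 == 8 * 180000
```

Halving the batch size halves the learning rate and doubles every iteration count, so curves plotted against images seen (or epochs) stay comparable.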

Dataloader throws error during iter()

Hello, I'm trying to get the repo to work with PyTorch 0.4.
While most of the changes are rather trivial, the sampler this repo uses returns both an index and an aspect ratio (correct me if it is something else, but it is a tuple and the batch sampler assumes an integer), and there isn't any straightforward way to fix this with the new dataloader structure introduced in pytorch/pytorch#1867.
What do you think is the best way to make it compatible without breaking anything?
Thank you

Python 2 support

Will you add Python 2 support for this repo? In general, I have done the following three things to make the infer_simple.py script work for python2.

  1. fix super: 3to2 -f super -w .
  2. rename utils.collections to utils.collections2 to avoid conflicting with the official collections library
  3. pickle.load(fp, encoding='latin1') -> pickle.load(fp)
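
Step 3 could also be handled without maintaining two code paths by hand. A minimal sketch, assuming a small wrapper is acceptable (the name `load_pickle_compat` is hypothetical):

```python
import io
import pickle
import sys


def load_pickle_compat(fp):
    """Load a pickle on Python 2 or 3 (hypothetical helper).

    Python 3 needs encoding='latin1' to read pickles written by Python 2;
    Python 2's pickle.load does not accept that keyword at all.
    """
    if sys.version_info[0] >= 3:
        return pickle.load(fp, encoding="latin1")
    return pickle.load(fp)


# Round-trip check with an in-memory pickle in protocol 2,
# the highest protocol Python 2 can read.
buf = io.BytesIO(pickle.dumps({"blobs": [1, 2, 3]}, protocol=2))
print(load_pickle_compat(buf))
```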

An example repo is at https://github.com/taoari/Detectron.pytorch/commits/dev. Will you add full Python 2 support to this repo?

'Detectron.pytorch/lib/utils/detectron_weight_helper.py' can be used to inference masks,but can't inference keypoints.

Mask inference works correctly:
(cuipt) cui@DemonHunters:~/mask-rcnn.pytorch$ python tools/infer_simple.py --dataset coco --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --load_detectron data/model_final.pkl --image_dir demo/sample_images
Called with args:
Namespace(cfg_file='configs/e2e_mask_rcnn_R-101-FPN_2x.yaml', cuda=True, dataset='coco', image_dir='demo/sample_images', images=None, load_ckpt=None, load_detectron='data/model_final.pkl', merge_pdfs=True, output_dir='infer_outputs', set_cfgs=[])
load cfg from file: configs/e2e_mask_rcnn_R-101-FPN_2x.yaml
loading detectron weights data/model_final.pkl
img 0
person 0.999168
img 1
suitcase 0.741572
chair 0.996991
chair 0.995423
chair 0.974603
chair 0.902452
chair 0.748457
book 0.762648
chair 0.9888
clock 0.992333
img 2
train 0.99889
person 0.826093
img 3
car 0.994156
car 0.999019
truck 0.839317
car 0.995135
car 0.9096
traffic light 0.984154
car 0.99167
car 0.995001
car 0.981888

However, keypoint inference fails. How should 'detectron_weight_helper.py' be modified to run keypoint inference?
(cuipt) cui@DemonHunters:~/mask-rcnn.pytorch$ python tools/infer_simple.py --dataset keypoints_coco --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --load_detectron data/model_final.pkl --image_dir demo/sample_images_keypoints
Called with args:
Namespace(cfg_file='configs/e2e_mask_rcnn_R-101-FPN_2x.yaml', cuda=True, dataset='keypoints_coco', image_dir='demo/sample_images_keypoints', images=None, load_ckpt=None, load_detectron='data/model_final.pkl', merge_pdfs=True, output_dir='infer_outputs', set_cfgs=[])
load cfg from file: configs/e2e_mask_rcnn_R-101-FPN_2x.yaml
loading detectron weights data/model_final.pkl
Traceback (most recent call last):
  File "tools/infer_simple.py", line 176, in <module>
    main()
  File "tools/infer_simple.py", line 128, in main
    load_detectron_weight(maskRCNN, args.load_detectron)
  File "/home/cui/mask-rcnn.pytorch/lib/utils/detectron_weight_helper.py", line 21, in load_detectron_weight
    p_tensor.copy_(torch.Tensor(src_blobs[d_name]))
RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:51

ImportError: no defaultdict

When I run infer_simple.py, I get this error in "utils/misc.py" at the line
from collections import defaultdice,Iterable
How can I solve this problem?
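
Two things may be worth checking here. The standard-library name is `defaultdict`; `defaultdice` does not exist in `collections`, so if that spelling is really in the file, the import fails. Separately, importing `Iterable` directly from `collections` stopped working in Python 3.10, where it is only available from `collections.abc`. A version-tolerant form of the import, offered as a sketch and not as the repository's actual code:

```python
from collections import defaultdict

try:
    # Python 3.3+: abstract base classes live in collections.abc.
    from collections.abc import Iterable
except ImportError:
    # Python 2 fallback.
    from collections import Iterable

counts = defaultdict(int)
for word in ["a", "b", "a"]:
    counts[word] += 1
print(dict(counts), isinstance([1, 2], Iterable))
```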

Support for different class ckpt loaded?

Hi,
I used a customized dataset with 3 classes. Training is fine and the ckpt is generated.
But when it comes to testing, loading the ckpt fails: the output dimensions of the ckpt and the model do not match.

maskRCNN = Generalized_RCNN() assumes 81 classes (the COCO classes), while my ckpt is based on 3 classes.

What I usually do is change the output layer of the model to fit a different number of classes. But Mask R-CNN is more complicated than a "normal" model.

So could you show me which layers should be changed to fit the customized num_class?

RuntimeError: While copying the parameter named Mask_Outs.classify.weight, whose dimensions in the model are torch.Size([81, 256, 1, 1]) and whose dimensions in the checkpoint are torch.Size([3, 256, 1, 1]).

Thanks

Unable to Properly Load Classes

I am trying to train a model using a custom JSON dataset that I converted to the COCO format. I've adapted the code given in train.py, but I am unable to load the classes properly. Regardless of what number of classes I specify in the config file, I am getting this same error. Is there an obvious mistake that I am making? Thank you!

timers = defaultdict(Timer)

### Dataset ###
timers['roidb'].tic()
roidb, ratio_list, ratio_index = combined_roidb_for_training(cfg.TRAIN.DATASETS, cfg.TRAIN.PROPOSAL_FILES)
timers['roidb'].toc()
train_size = len(roidb)
logger.info('{:d} roidb entries'.format(train_size))
logger.info('Takes %.2f sec(s) to construct roidb', timers['roidb'].average_time)

sampler = MinibatchSampler(ratio_list, ratio_index)
dataset = RoiDataLoader(
    roidb,
    cfg.MODEL.NUM_CLASSES,
    training=True)
dataloader = torch.utils.data.DataLoader(
    dataset,
    batch_size=args.batch_size,
    sampler=sampler,
    num_workers=cfg.DATA_LOADER.NUM_THREADS,
    collate_fn=collate_minibatch)

assert_and_infer_cfg()

The output:

INFO:datasets.json_dataset:Loading cached gt_roidb from /home/cees2/Image Project/Code/mask-rcnn.pytorch/data/cache/init_data_gt_roidb.pkl
INFO:datasets.roidb:Appending horizontally-flipped training examples...
INFO:datasets.roidb:Loaded dataset: init_data
INFO:datasets.roidb:Filtered 120 roidb entries: 120 -> 0
INFO:datasets.roidb:Computing image aspect ratios and ordering the ratios...
INFO:datasets.roidb:done
INFO:datasets.roidb:Computing bounding-box regression targets...
INFO:datasets.roidb:done
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
[]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-f5a66a92f826> in <module>()
      3 ### Dataset ###
      4 timers['roidb'].tic()
----> 5 roidb, ratio_list, ratio_index = combined_roidb_for_training(cfg.TRAIN.DATASETS, cfg.TRAIN.PROPOSAL_FILES)
      6 timers['roidb'].toc()
      7 train_size = len(roidb)

~/Image Project/Code/mask-rcnn.pytorch/lib/datasets/roidb.py in combined_roidb_for_training(dataset_names, proposal_files)
     77     logger.info('done')
     78 
---> 79     _compute_and_log_stats(roidb)
     80 
     81     return roidb, ratio_list, ratio_index

~/Image Project/Code/mask-rcnn.pytorch/lib/datasets/roidb.py in _compute_and_log_stats(roidb)
    229 def _compute_and_log_stats(roidb):
    230     print(roidb)
--> 231     classes = roidb[0]['dataset'].classes
    232     char_len = np.max([len(c) for c in classes])
    233     hist_bins = np.arange(len(classes) + 1)

IndexError: list index out of range

Hi roytseng, I'd like to put a project based on your 'Detectron.pytorch' project in my GitHub repository, could I?

Inspired by '4K Video Demo by Karol Majek' at https://github.com/matterport/Mask_RCNN#projects-using-this-model, and based on your 'Detectron.pytorch' project, I built a toy project.
Compared with Karol Majek's, my project blends human masks and human keypoints together. It looked fun, so I'd like to publish this project, based on your 'Detectron.pytorch' project, in my GitHub repository. May I?
The demo video is below.
Can you access this demo video on 'youku.com'? http://v.youku.com/v_show/id_XMzU2MDYyNDQ5Mg==.html?spm=a2hzp.8244740.0.0
Looking forward to hearing from you soon.

loss_rcnn_box is Nan

I was able to successfully train a model with a custom dataset using the command line arguments and the train.py file given. I refactored the train.py code to run with hardcoded variables instead of command line arguments. Yet in my own script, after the first step, the loss_rcnn_bbox values are NaN, which then crashes the program. What could be the possible causes?

        outputs = maskRCNN(**input_data)

        rois_label = outputs['rois_label']
        cls_score = outputs['cls_score']
        bbox_pred = outputs['bbox_pred']
        loss_rpn_cls = outputs['loss_rpn_cls'].mean()
        loss_rpn_bbox = outputs['loss_rpn_bbox'].mean()
        loss_rcnn_cls = outputs['loss_rcnn_cls'].mean()
        print(outputs['loss_rcnn_bbox'].mean()) #this value is Nan
        loss_rcnn_bbox = outputs['loss_rcnn_bbox'].mean()

Poor training results

Hi, I have trained R-101-FPN with coco2017, using 4 GPUs, but only got mmAP=0.33 during test which is well below Detectron result of 0.40.

What can be the problem?

For training I used:
python tools/train_net.py --dataset coco2017 --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --use-tfboard --nw 8 --b 8
and for testing:
python tools/test_net.py --cfg configs/e2e_mask_rcnn_R-101-FPN_2x.yaml --load_ckpt Outputs/e2e_mask_rcnn_R-101-FPN_2x/Apr19-11-34-35_devbox/ckpt/model_7_29315.pth --dataset coco2017

The loss at the end was about 0.6 which also seems a bit high.

A bug when running train_net_step.py

Hi roytseng-tw, I ran into the following bug when running "train_net_step.py". Do you have any idea what causes it? Thanks.

    main()
  File "tools/train_net_step.py", line 227, in main
    dataiterator = iter(dataloader)
  File "/home/wxk/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 428, in __iter__
    return _DataLoaderIter(self)
  File "/home/wxk/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 244, in __init__
    self._put_indices()
  File "/home/wxk/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 292, in _put_indices
    indices = next(self.sample_iter, None)
  File "/home/wxk/anaconda/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 120, in __iter__
    batch.append(int(idx))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'tuple'
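
The `TypeError` comes from a batch sampler calling `int(idx)` on each item while the sampler yields tuples. One possible workaround, sketched under the assumption that the sampler yields `(index, aspect_ratio)` tuples and that the dataset's `__getitem__` accepts such a tuple, is to do the batching in a small custom class. The class name `TupleBatchSampler` is hypothetical:

```python
class TupleBatchSampler(object):
    """Group arbitrary sampler items (e.g. tuples) into batches.

    Items are passed through untouched instead of being cast with int(),
    so the dataset's __getitem__ receives the original tuple.
    """

    def __init__(self, sampler, batch_size, drop_last=False):
        self.sampler = sampler
        self.batch_size = batch_size
        self.drop_last = drop_last

    def __iter__(self):
        batch = []
        for item in self.sampler:
            batch.append(item)  # no int() cast
            if len(batch) == self.batch_size:
                yield batch
                batch = []
        if batch and not self.drop_last:
            yield batch

    def __len__(self):
        full, rest = divmod(len(self.sampler), self.batch_size)
        return full if self.drop_last or rest == 0 else full + 1


# Stand-in for a sampler yielding (index, aspect_ratio) tuples.
fake_sampler = [(0, 0.75), (1, 1.33), (2, 0.5)]
batches = list(TupleBatchSampler(fake_sampler, batch_size=2))
print(batches)
```

With PyTorch, an object like this would be handed to `DataLoader` through its `batch_sampler` argument, which is mutually exclusive with `batch_size`, `shuffle`, `sampler`, and `drop_last`.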

Compile error: 'cuda.h'

Not really an issue, just want to share my experience.

If you are using the code on a cluster, CUDA might not be installed under /usr/local/cuda/. In this case, in addition to modifying CUDA_PATH in make.sh, you might also need to specify CPATH=/path/to/your/cuda/include.

For example:
CPATH=/path/to/your/cuda/include ./make.sh

Trouble understanding the "training" attribute of the class "CollectAndDistributeFpnProposalOp()"

I was running "test_net.py". In the /lib/modeling/FPN.py file, there is the line "self.CollectAndDistributeFpnRpnProposals = CollectAndDistributeFpnRpnProposalsOp()" in the constructor of the "fpn_rpn_outputs" class. Since the CollectAndDistributeFpnRpnProposalsOp class inherits from nn.Module, which has an attribute named "training" that is "True" by default, the CollectAndDistributeFpnRpnProposals object's "training" attribute should also be "True".

But when I print "self.CollectAndDistributeFpnRpnProposals.training" in the "forward" function of the "fpn_rpn_outputs" class, I see "False".

Do you know when the "training" attribute of the CollectAndDistributeFpnRpnProposals object is set to be False?
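
For reference, in PyTorch `nn.Module.eval()` is equivalent to `train(False)`, and `train()` sets the `training` flag on the module and recursively on all of its children, so calling `.eval()` on the top-level model also flips the flag on nested modules. The toy classes below only mimic that propagation in plain Python so it can be run without PyTorch; `ToyModule` is a stand-in, not PyTorch code.

```python
class ToyModule(object):
    """Plain-Python stand-in mimicking how nn.Module propagates `training`."""

    def __init__(self):
        self.training = True  # modules start in training mode
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def train(self, mode=True):
        self.training = mode
        for child in self.children:
            child.train(mode)  # recurse into every submodule
        return self

    def eval(self):
        return self.train(False)


model = ToyModule()
head = model.add_child(ToyModule())
proposals_op = head.add_child(ToyModule())

before = proposals_op.training
model.eval()  # called once on the top-level model
after = proposals_op.training
print(before, after)
```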

inference time

Hi, have you compared the inference time against Caffe2? Which one is faster?
If I run inference on many images at once, would the average processing time per image be shorter?

data_parallel error?

Hi, thanks for contributing this Mask R-CNN implementation. I like the idea of building the model from separate modules (you can try different backbones, box heads, and mask heads), which leaves a lot of room for improvement.
I am trying to use my custom COCO-style dataset with this project. (It already works with matterport's TF + Keras Mask R-CNN.)
But I get the following error and have no clue why.
I guess it has something to do with data_parallel?
Any suggestions or ideas are welcome.

Namespace(batch_size=2, cfg_file='/home/ubuntu/skin_demo/Tooth/Detection/configs/e2e_mask_rcnn_R-50-C4_1x.yaml', cuda=True, dataset='coco2014', disp_interval=20, load_ckpt=None, load_detectron=None, lr=None, lr_decay_gamma=None, no_save=False, num_workers=1, optimizer=None, resume=False, set_cfgs=[], start_step=0, use_tfboard=True)
Batch size change from 1 (in config file) to 2
NUM_GPUs: 1, TRAIN.IMS_PER_BATCH: 2
Number of data loading threads: 1
Adjust BASE_LR linearly according to batch size change: 0.01 --> 0.02
loading annotations into memory...
Done (t=0.26s)
creating index...
index created!
INFO json_dataset.py: 298: Loading cached gt_roidb from /home/ubuntu/skin_demo/Tooth/Detection/Detectron.pytorch-master/data/cache/coco_2014_train_gt_roidb.pkl
INFO roidb.py:  50: Appending horizontally-flipped training examples...
INFO roidb.py:  52: Loaded dataset: coco_2014_train
INFO roidb.py: 143: Filtered 0 roidb entries: 578 -> 578
INFO roidb.py:  69: Computing image aspect ratios and ordering the ratios...
INFO roidb.py:  71: done
INFO roidb.py:  75: Computing bounding-box regression targets...
INFO roidb.py:  77: done
INFO train_net_step.py: 203: 578 roidb entries
INFO train_net_step.py: 204: Takes 1.24 sec(s) to construct roidb
INFO train_net_step.py: 319: Training starts !
INFO net.py:  72: Changing learning rate 0.000000 -> 0.006667
Traceback (most recent call last):
  File "tools/train_net_step.py", line 397, in <module>
    main()
  File "tools/train_net_step.py", line 364, in main
    net_outputs = maskRCNN(**input_data)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/skin_demo/Tooth/Detection/Detectron.pytorch-master/lib/nn/parallel/data_parallel.py", line 113, in forward
    outputs = [self.module(*inputs[0], **kwargs[0])]
  File "/home/ubuntu/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/skin_demo/Tooth/Detection/Detectron.pytorch-master/lib/modeling/model_builder.py", line 116, in forward
    roidb = list(map(lambda x: blob_utils.deserialize(x)[0], roidb))
  File "/home/ubuntu/skin_demo/Tooth/Detection/Detectron.pytorch-master/lib/modeling/model_builder.py", line 116, in <lambda>
    roidb = list(map(lambda x: blob_utils.deserialize(x)[0], roidb))
  File "/home/ubuntu/skin_demo/Tooth/Detection/Detectron.pytorch-master/lib/utils/blob.py", line 176, in deserialize
    return pickle.loads(arr.astype(np.uint8).tobytes())
AttributeError: 'list' object has no attribute 'astype'

Unpickling error when training e2e Mask R-CNN with ResNet-50-C4 (1x) from scratch

Conda 4.5, Python 3.6, Pytorch 0.3.1

Traceback (most recent call last):
  File "tools/train_net_step.py", line 391, in <module>
    main()
  File "tools/train_net_step.py", line 222, in main
    maskRCNN = Generalized_RCNN()
mask-rcnn.pytorch/lib/modeling/model_builder.py", line 98, in __init__
    self._init_modules()
mask-rcnn.pytorch/lib/modeling/model_builder.py", line 102, in _init_modules
    resnet_utils.load_pretrained_imagenet_weights(self)
/mask-rcnn.pytorch/lib/utils/resnet_weights_helper.py", line 21, in load_pretrained_imagenet_weights
    pretrianed_state_dict = convert_state_dict(torch.load(weights_file))
lib/python3.6/site-packages/torch/serialization.py", line 267, in load
    return _load(f, map_location, pickle_module)
lib/python3.6/site-packages/torch/serialization.py", line 410, in _load
    magic_number = pickle_module.load(f)
_pickle.UnpicklingError: invalid load key, '<'.

What am I missing?
Please help.
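
A possible first check, sketched under assumptions: the message "invalid load key, '<'" says the first byte the unpickler read was '<'. A file that starts with '<' is often an HTML page saved in place of the real weights (for example after a failed download), though that is a guess and not confirmed for this report. The helper name looks_like_html and the temporary file below are hypothetical and only illustrate the check.

```python
import os
import tempfile


def looks_like_html(path, probe_bytes=64):
    """Return True if the file starts with '<' after any leading whitespace."""
    with open(path, "rb") as f:
        head = f.read(probe_bytes)
    return head.lstrip().startswith(b"<")


# Demo: a temporary file standing in for a weights file that is really HTML.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pth") as tmp:
    tmp.write(b"<!DOCTYPE html><html>...</html>")
    bad_path = tmp.name

result = looks_like_html(bad_path)
os.remove(bad_path)
print(result)
```

If the check returns True for the pretrained weights file, downloading the file again would be the next thing to try.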

Eval code for COCO

Hi, can you provide some eval APIs so that we can test the performance on COCO?

Documentation

Hello!

Is it possible to add documentation for the model, for example for the parameters of forward?

Those two functions cannot be found

File "tools/train_net.py", line 25, in <module>
import utils.misc as misc_utils
File "/mnt/disk1/oujie/pytorch_mask/Detectron.pytorch-master/lib/utils/misc.py", line 3, in <module>
from collections import defaultdict, Iterable
ImportError: cannot import name defaultdict

RuntimeError: received 0 items of ancdata

I got the following error during training:

Traceback (most recent call last):
  File "tools/train_net.py", line 316, in main
    for step, input_data in zip(range(args.start_iter, iters_per_epoch), dataloader):
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 275, in __next__
    idx, batch = self._get_batch()
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 254, in _get_batch
    return self.data_queue.get()
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
    return recvfds(s, 1)[0]
  File "/home/rizhiy/miniconda3/envs/Detectron.pytorch/lib/python3.6/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

Low GPU utilization

I'm training on 4 GPUs with 8 workers but getting only about 50% GPU utilization.

What can be the problem?

Trouble understanding the __getitem__ method of the RoiDataLoader class

I am trying to understand the signature of the "__getitem__" method of the "RoiDataLoader" class in /lib/roi_data/loader.py. That class is a subclass of the abstract "Dataset" class in PyTorch. According to the definition of "Dataset", "__getitem__" supports integer indexing in the range from 0 to len(self), exclusive. But in RoiDataLoader, the parameter of "__getitem__" is an index_tuple. Could you explain how this works?

Thanks.
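
A minimal sketch of how this can work, in plain Python and under assumptions: a dataset only ever receives whatever its sampler yields, so if the sampler yields tuples, "__getitem__" can unpack them. The classes ToyRoiDataset and ToyTupleSampler and the helper batches are hypothetical, and the assumption that the tuple is (index, aspect ratio) is mine, not taken from loader.py.

```python
class ToyRoiDataset:
    """Hypothetical dataset whose __getitem__ takes an (index, ratio) tuple."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index_tuple):
        index, ratio = index_tuple  # unpack what the sampler produced
        return {"data": self.items[index], "ratio": ratio}


class ToyTupleSampler:
    """Hypothetical sampler yielding (index, ratio) tuples instead of ints."""

    def __init__(self, ratios):
        self.ratios = ratios

    def __iter__(self):
        return iter(enumerate(self.ratios))

    def __len__(self):
        return len(self.ratios)


def batches(sampler, batch_size):
    """Group sampler output into lists without converting entries to int."""
    batch = []
    for idx in sampler:
        batch.append(idx)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


dataset = ToyRoiDataset(["img0", "img1", "img2"])
sampler = ToyTupleSampler([0.75, 1.0, 1.33])
loaded = [[dataset[i] for i in batch] for batch in batches(sampler, 2)]
print(loaded)
```

The pairing matters: a batch sampler that calls int() on each entry would raise a TypeError on such tuples, so a tuple-yielding sampler needs a batching step that passes entries through unchanged.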
