I'm trying to train mx-rcnn on multiple GPUs (4 of them). The code I use is train




Custom Training: Infer_shape error when training rcnn about mx-rcnn HOT 7 CLOSED

commented on August 19, 2024

Custom Training: Infer_shape error when training rcnn

from mx-rcnn.

Comments (7)

ijkguo commented on August 19, 2024

There is only one _minus operation in get_vgg_rcnn. What changes did you make?


commented on August 19, 2024

@precedenceguo I only changed the class labels in pascal_voc.py to fit my dataset and set config.TEST.CXX_PROPOSAL = False, because 'Symbol' doesn't have the 'Proposal' attribute.
Here is the get_vgg_rcnn code I use:

data = mx.symbol.Variable(name="data")
rois = mx.symbol.Variable(name='rois')
label = mx.symbol.Variable(name='label')
bbox_target = mx.symbol.Variable(name='bbox_target')
bbox_weight = mx.symbol.Variable(name='bbox_weight')

# reshape input
rois = mx.symbol.Reshape(data=rois, shape=(-1, 5), name='rois_reshape')
label = mx.symbol.Reshape(data=label, shape=(-1, ), name='label_reshape')
bbox_target = mx.symbol.Reshape(data=bbox_target, shape=(-1, 4 * num_classes), name='bbox_target_reshape')
bbox_weight = mx.symbol.Reshape(data=bbox_weight, shape=(-1, 4 * num_classes), name='bbox_weight_reshape')

# shared convolutional layers
relu5_3 = get_vgg_conv(data)

# Fast R-CNN
pool5 = mx.symbol.ROIPooling(
    name='roi_pool5', data=relu5_3, rois=rois, pooled_size=(7, 7), spatial_scale=1.0 / config.RCNN_FEAT_SRTIDE)
# group 6
flatten = mx.symbol.Flatten(data=pool5, name="flatten")
fc6 = mx.symbol.FullyConnected(data=flatten, num_hidden=4096, name="fc6")
relu6 = mx.symbol.Activation(data=fc6, act_type="relu", name="relu6")
drop6 = mx.symbol.Dropout(data=relu6, p=0.5, name="drop6")
# group 7
fc7 = mx.symbol.FullyConnected(data=drop6, num_hidden=4096, name="fc7")
relu7 = mx.symbol.Activation(data=fc7, act_type="relu", name="relu7")
drop7 = mx.symbol.Dropout(data=relu7, p=0.5, name="drop7")
# classification
cls_score = mx.symbol.FullyConnected(name='cls_score', data=drop7, num_hidden=num_classes)
cls_prob = mx.symbol.SoftmaxOutput(name='cls_prob', data=cls_score, label=label, normalization='batch')
# bounding box regression
bbox_pred = mx.symbol.FullyConnected(name='bbox_pred', data=drop7, num_hidden=num_classes * 4)
bbox_loss_ = bbox_weight * mx.symbol.smooth_l1(name='bbox_loss_', scalar=1.0, data=(bbox_pred - bbox_target))
bbox_loss = mx.sym.MakeLoss(name='bbox_loss', data=bbox_loss_, grad_scale=1.0 / config.TRAIN.BATCH_ROIS)

# reshape output
cls_prob = mx.symbol.Reshape(data=cls_prob, shape=(config.TRAIN.BATCH_IMAGES, -1, num_classes), name='cls_prob_reshape')
bbox_loss = mx.symbol.Reshape(data=bbox_loss, shape=(config.TRAIN.BATCH_IMAGES, -1, 4 * num_classes), name='bbox_loss_reshape')

# group output
group = mx.symbol.Group([cls_prob, bbox_loss])
return group


ijkguo commented on August 19, 2024

We can see that the only _minus is bbox_pred - bbox_target.
Use sym.tojson() to check where the _minus2 is.
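
One way to follow this suggestion is to parse the string returned by sym.tojson() and look the node up by name. The sketch below is only an illustration: it assumes the JSON has a top-level "nodes" list and that each entry in "inputs" starts with the index of the producing node. SAMPLE is a hypothetical, heavily trimmed graph and describe_node is a made-up helper, not part of MXNet or mx-rcnn.

```python
import json

# Hypothetical, trimmed stand-in for the output of sym.tojson().
SAMPLE = json.dumps({"nodes": [
    {"op": "FullyConnected", "name": "bbox_pred", "inputs": []},
    {"op": "null", "name": "bbox_target", "inputs": []},
    {"op": "Reshape", "name": "bbox_target_reshape", "inputs": [[1, 0]]},
    {"op": "_Minus", "name": "_minus2", "inputs": [[0, 0], [2, 0]]},
]})


def describe_node(symbol_json, node_name):
    """Return the index, op and input names of the node called node_name."""
    nodes = json.loads(symbol_json)["nodes"]
    for index, node in enumerate(nodes):
        if node["name"] == node_name:
            # Assumption: the first element of each input entry is a node index.
            input_names = [nodes[entry[0]]["name"] for entry in node["inputs"]]
            return {"index": index, "op": node["op"], "inputs": input_names}
    return None  # no node with that name


info = describe_node(SAMPLE, "_minus2")
print(info)
```

With a real symbol, the same helper would be called as describe_node(sym.tojson(), "_minus2"); the first input name is the lhs and the second is the rhs.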


commented on August 19, 2024

@precedenceguo Yes, it indeed has only one 'minus', but its name is _minus2. Part of the .json file is shown below:

{
  "op": "null", 
  "param": {}, 
  "name": "bbox_target", 
  "inputs": [], 
  "backward_source_id": -1
}, 
{
  "op": "Reshape", 
  "param": {
    "keep_highest": "False", 
    "reverse": "False", 
    "shape": "(-1,84)", 
    "target_shape": "(0,0)"
  }, 
  "name": "bbox_target_reshape", 
  "inputs": [[83, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "_Minus", 
  "param": {}, 
  "name": "_minus2", 
  "inputs": [[82, 0], [84, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "smooth_l1", 
  "param": {"scalar": "1"}, 
  "name": "bbox_loss_", 
  "inputs": [[85, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "_Mul", 
  "param": {}, 
  "name": "_mul2", 
  "inputs": [[79, 0], [86, 0]], 
  "backward_source_id": -1
}, 
{
  "op": "MakeLoss", 
  "param": {
    "grad_scale": "0.0078125", 
    "normalization": "null", 
    "valid_thresh": "0"
  }, 
  "name": "bbox_loss", 
  "inputs": [[87, 0]], 
  "backward_source_id": -1
},
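
A possible reason the only subtraction in this graph is called _minus2 rather than _minus0: this is an assumption, not something confirmed in the thread, but if unnamed operators are named from a per-operator counter that is shared by every symbol built in the same process, then two subtractions created earlier (for example in another symbol) would push this one to index 2. The toy class below is a hypothetical stand-in written for illustration and is not MXNet code.

```python
from collections import defaultdict


class AutoNamer:
    """Toy per-operator name counter (illustrative only, not MXNet's API)."""

    def __init__(self):
        self._counter = defaultdict(int)

    def get(self, name, hint):
        # An explicit name always wins; otherwise append a running index
        # to the operator hint and advance that operator's counter.
        if name:
            return name
        auto = "%s%d" % (hint, self._counter[hint])
        self._counter[hint] += 1
        return auto


namer = AutoNamer()
# Two unnamed subtractions built earlier in the same process.
earlier = [namer.get(None, "_minus") for _ in range(2)]
# The single unnamed subtraction in the graph under inspection.
only_minus_in_graph = namer.get(None, "_minus")
print(earlier, only_minus_in_graph)
```

Under this assumption the suffix says nothing about how many subtractions the current graph contains, which is consistent with the JSON showing a single _Minus node.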


ijkguo commented on August 19, 2024

OK, so
mxnet.base.MXNetError: InferShape Error in _minus2's rhs argument
Shape inconsistent, Provided=(1536,12), inferred shape=(256,12)
means that lhs is shaped (1536,12) and rhs is shaped (256,12). lhs is bbox_pred and rhs is bbox_target.

Would you please check their shapes again? They come from the bbox_pred fully connected layer and the bbox_target data iterator, respectively.
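
The two shapes in the error message can be compared axis by axis to narrow down where to look. This is a minimal sketch in plain Python; compare_shapes is a hypothetical helper, and the numbers are copied from the error above. What the factor between the row counts corresponds to (images, devices, or ROIs per batch) is not stated in the thread and is left open here.

```python
def compare_shapes(provided, inferred):
    """List (axis, provided_dim, inferred_dim) for every axis that differs."""
    if len(provided) != len(inferred):
        raise ValueError("shapes have different ranks")
    return [(axis, p, i)
            for axis, (p, i) in enumerate(zip(provided, inferred))
            if p != i]


provided = (1536, 12)   # "Provided" shape from the error message
inferred = (256, 12)    # "inferred shape" from the error message

mismatches = compare_shapes(provided, inferred)

# The second axis agrees; it is 4 * num_classes in get_vgg_rcnn.
num_classes = provided[1] // 4
# The first axis differs by a whole-number factor.
row_factor = provided[0] // inferred[0]

print(mismatches, num_classes, row_factor)
```

Only the first axis disagrees, so the class count is consistent on both sides and the discrepancy is in the number of rows that reach the subtraction.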


ijkguo commented on August 19, 2024

Why not checkout coco as a different dataset?


commented on August 19, 2024

@precedenceguo Thank you! I will checkout coco and give it a try.

